Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
5755
De-Shuang Huang Kang-Hyun Jo Hong-Hee Lee Hee-Jun Kang Vitoantonio Bevilacqua (Eds.)
Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. 5th International Conference on Intelligent Computing, ICIC 2009, Ulsan, South Korea, September 16-19, 2009. Proceedings
Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany Volume Editors De-Shuang Huang Institute of Intelligent Machines Intelligent Computing Laboratory Chinese Academy of Sciences Hefei, Anhui, China E-mail:
[email protected] Kang-Hyun Jo Hong-Hee Lee Hee-Jun Kang University of Ulsan School of Electrical Engineering Ulsan, South Korea E-mail:
[email protected], {hhlee, hjkang}@ulsan.ac.kr Vitoantonio Bevilacqua Polytechnic of Bari eBIS and DEE Valenzano, Bari, Italy E-mail:
[email protected] Library of Congress Control Number: 2009932883
CR Subject Classification (1998): I.2.3, I.5.1, I.4, I.5, F.1, F.2
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-04019-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-04019-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12743049 06/3180 543210
Preface
The International Conference on Intelligent Computing (ICIC) was formed to provide an annual forum dedicated to the emerging and challenging topics in artificial intelligence, machine learning, bioinformatics, and computational biology. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems, and solutions related to the multifaceted aspects of intelligent computing.

ICIC 2009, held in Ulsan, Korea, September 16-19, 2009, constituted the 5th International Conference on Intelligent Computing. It built upon the success of ICIC 2008, ICIC 2007, ICIC 2006, and ICIC 2005, held in Shanghai, Qingdao, Kunming, and Hefei, China, in 2008, 2007, 2006, and 2005, respectively.

This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. Its aim was to unify the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. Therefore, the theme for this conference was "Emerging Intelligent Computing Technology and Applications." Papers focusing on this theme were solicited, addressing theories, methodologies, and applications in science and technology.

ICIC 2009 received 1,082 submissions from 34 countries and regions. All papers went through a rigorous peer-review procedure and each paper received at least three review reports. Based on the review reports, the Program Committee finally selected 257 high-quality papers for presentation at ICIC 2009, of which 214 papers have been included in two volumes of proceedings published by Springer: one volume of Lecture Notes in Computer Science (LNCS) and one volume of Lecture Notes in Artificial Intelligence (LNAI). The other 22 papers will be included in two international journals. This volume of Lecture Notes in Artificial Intelligence (LNAI) includes 106 papers.

The organizers of ICIC 2009, including the University of Ulsan, Korea, and the Institute of Intelligent Machines of the Chinese Academy of Sciences, made an enormous effort to ensure the success of ICIC 2009. We hereby would like to thank the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers. We would like to thank Alfred Hofmann, executive editor at Springer, for his frank and helpful advice and guidance throughout and for his support in publishing the proceedings. In particular, we would like to thank all the authors for contributing their papers. Without the high-quality submissions from the authors, the success of the conference would not have been possible. Finally, we are especially grateful to the IEEE Computational Intelligence Society, the International Neural Network Society and the National Science Foundation of China for their sponsorship.

July 2009
De-Shuang Huang Kang-Hyun Jo Hong-Hee Lee Hee-Jun Kang Vitoantonio Bevilacqua
Organization
General Co-chairs
Program Committee Co-chairs Organizing Committee Co-chairs
Award Committee Chair Publication Chair Special Session Chair
Tutorial Chair International Liaison Chair Publicity Co-chairs
Exhibition Co-chairs
De-Shuang Huang, China Honghee Lee, Korea Frank L Lewis, USA Kanghyun Jo, Korea Vitoantonio Bevilacqua, Italy Kang-Hyun Jo, Korea In-Soo Koo, Korea Youngsoo Suh, Korea Naoyuki Tsuruta, Japan Chun-Hou Zheng, China Daniel S. Levine, USA Heejun Kang, Korea Prashan Premaratne, Australia Tokuro Matsuo, Japan Vasily Gubarev, Russia Laurent Heutte, France Frank Neumann, Germany Kyungsook Han, Korea Vladimir Filaretov, Russia Zhongming Zhao, USA Maolin Tang, Australia Muhammad Khurram Khan, Saudi Arabia Valeriya Gribova, Russia Young-Soo Suh, Korea In-Soo Koo, Korea Jin Hur, Korea
Organizing Committee Members Myeong-Jae Yi, Korea Myung-Kyun Kim, Korea Byeong-Ryong Lee, Korea Won-Ho Choi, Korea
Sang-Bock Cho, Korea Munho Jeong, Korea Jongeun Ha, Korea
Dong-Joong Kang, Korea Jong-Bae Lee, Korea Sang-Moo Lee, Korea
Program Committee Members Andrea Francesco Abate, Italy Shafayat Abrar, UK Peter Andras, UK Sabri Arik, Turkey Vasily Aristarkhov, Russian Federation Costin Badica, Romania Vitoantonio Bevilacqua, Italy David B. Bracewell, USA Uday K. Chakraborty, USA Shih-Hsin Chen, Taiwan, China Wen-Sheng Chen, China Xiyuan Chen, China Yang Chen, China Yuehui Chen, China Sang-Bock Cho, Korea Won-Ho Choi, Korea Michal Choras, Poland Tommy Chow, Hong Kong, China Jose Alfredo F. Costa, Brazil Angelo Ciaramella, Italy Kevin Curran, UK Mingcong Deng, Japan Eng. Salvatore Distefano, Italy Karim Faez, Iran Jianbo Fan, China Minrui Fei, China Wai-Keung Fung, Canada Liang Gao, China Qing-Wei Gao, China Xiao-Zhi Gao, Finland Chandan Giri, India Dunwei Gong, China Valeriya Gribova, Russia Kayhan Gulez, Turkey Ping Guo, China Jongeun Ha, Korea
Aili Han, China Fei Han, China Kyungsook Han, Korea Haibo He, USA Laurent Heutte, France Wei-Chiang Hong, Taiwan, China Yuexian Hou, China Peter Hung, Ireland Chuleerat Jaruskulchai, Thailand Munho Jeong, Korea Li Jia, China Zhenran Jiang, China Jih-Gau Juang, Taiwan, China Dah-Jing Jwo, Taiwan, China Dong-Joong Kang, Korea Sanggil Kang, Korea Uzay Kaymak, The Netherlands Muhammad Khurram Khan, Saudi Arabia Myung-Kyun Kim, Korea Sungshin Kim, Korea In-Soo Koo, Korea Donald H. Kraft, USA Harshit Kumar, Ireland Yoshinori Kuno, Japan Takashi Kuremoto, Japan Wen-Chung Kuo, Taiwan, China Hak-Keung Lam, UK Byeong-Ryong Lee, Korea Jong-Bae Lee, Korea Sang-Moo Lee, Korea Vincent C.S. Lee, Australia Guo-Zheng Li, China Kang Li, UK
Li Li, China Peihua Li, China Hualou Liang, USA Chunmei Liu, USA Ju Liu, China Van-Tsai Liu, Taiwan, China Marco Loog, Denmark Ahmad Lotfi, UK Jinwen Ma, China Shiwei Ma, China Vishnu Vardhan Makkapati, India Cheolhong Moon, Korea Tarik Veli Mumcu, Germany Roman Neruda, Czech Republic Frank Neumann, Germany Minh Nhut Nguyen, Singapore Ben Niu, China Sim-Heng Ong, Singapore Francesco Pappalardo, Italy Caroline Petitjean, France Prashan Premaratne, Australia Shaoqi Rao, China Seeja K.R., India Angel Sappa, Spain Aamir Shahzad, China Li Shang, China Jiatao Song, China Nuanwan Soonthornphisaj, Thailand Joao Miguel Sousa, Portugal Min Su, USA Zhan-Li Sun, Singapore Maolin Tang, Australia
Antonios Tsourdos, UK Naoyuki Tsuruta, Japan Sergio Vitulano, Italy Anhua Wan, China Chao-Xue Wang, China Hong-Qiang Wang, USA Jinlian Wang, China Ling Wang, China Xueqin Wang, China
Yong Wang, China Xuesong Wang, China Ling-Yun Wu, China Shunren Xia, China Yu Xue, China Ching-Nung Yang, Taiwan, China Jun-Heng Yeh, Taiwan, China
Myeong-Jae Yi, Korea Zhi-Gang Zeng, China Jun Zhang, China Yong Zhang, China Xing-Ming Zhao, China Zhongming Zhao, USA Bo-Jin Zheng, China Fengfeng Zhou, USA Huiyu Zhou, Italy
Reviewers Li Ding, Alessandro Ibba, Al Savvaris, Antonio Celesti, Adam Ghandar, Adriao Duarte, Asit Das, Andreas Konstantinidis, Alaa Sagheer, Alan Ritz, Aldayr Araujo, Alessia D'Introno, Alessia Albanese, Alessio Ferone, Alexander Hogenboom, Jose Alfredo F. Costa, Rui Jorge Almeida, Andrey Logvinov, Soon-Min Hwang, Saleh Aly, Amar Khoukhi, Amar Balla, Amelia Badica, Asunción Mochón, Aimin Zhou, Anbumani Subramanian, Andreas Schmidt, Wen-Yuan Liao, Andrey Larionov, Angelo Ciaramella, Angelo Riccio, Anne Canuto, Wei Yu, Antonino Staiano, Anvita Bajpai, Alexander Ponomarenko, Xinping Xie, Aravindan Chandrabose, Joongjae Lee, Ardelio Galletti, Irene Artemieva, Arun D. Mahindrakar, Asaduzzaman, Asharaf S, Atsushi Shimada, Wee Keong Ng, Banu Diri, Bao Vo-Nguyen, Bo-Chao Cheng, Beilu Shao, Beilu Shao, Ibrahim Beklan Kucukdemiral, Bo-Hyeun Wang, Bijaya Ketan Panigrahi, Bin Qian, Bin Li, Shuhui Bi, Xiangrong Zhang, Bekir Karlik, Jiguang Wang, Bogdan Raducanu, Barbara Pizzileo, Ni Bu, Cheon Seong-Pyo, B.V. Babu, Alessia D'Introno, Galip Cansever, Jianting Cao, Karina Shakhgendyan, Carme Julia, Caroline Petitjean, Chia-Mei Chen, Guisheng Chen, Gang Chen, Kuei-Hsiang Chao, Tariq Chattha, Chungho Cho, Jianhua Che, bo chen, Chun Chen, Chengjian Wei, Yuhu Cheng, chen hui, Chenkun Qi, Yang Chen, Chen Asia, Chee-Meng Chew, Ching-Hung Lee, Chuang Ma, Cuco Cuistana, C.-H. Yang, Alessandro Cincotti, Chenn-Jung Huang, Ching-kun Chen, Chunlin Chen, Jimmy Lin, Chi-Min Li, Quang Nguyen, Carmelo Ragusa, Wenjie Li, Min-Chih Chen, Ching-Ti Liu, Chingti Liu, Chi Zhou, Chin-Chun Chang, Chang Wook Ahn, Joo Seop Yun, Chieh-yao Chang, Changyin Sun, dong yang, Louis Wu, Yu-Chen Lin, Ping-Min Hsu, Danfeng Zhu, Vincenzo Daniele Cunsolo, Peng Zhang, David Bracewell, Dario Bruneo, Dajun Du, David Geronimo, Liya Ding, Dmitry Serkin, Jiayin Zhou, Dongsheng Che, Yan Dong, Yongsheng Dong, Denis Orel, Jun Qin, WeiWu Wang, Woosung Yang, Ben Niu, derchian tsaih, Dunwei Gong, Wenyong Dong, Lipo Wang, Hong Fu, Tolga Ensari, Shaoli Wang, Eylem Yucel, Erkan Zergeroglu, Filippo Castiglione, Li-Jen Kao, Chonglun Fang, Ingo Feldmann, Fei Ge, Fengfeng Zhou, LingFeng Liu, Frederik Hogenboom, Chien-Yuan Lai, Wei-Chiang Hong, Francesco Longo, Francesco Napolitano, Francesco Camastra, Nuanwan Soonthornphisaj, Fu-Shiung Hsieh, Shaojing Fan, Francesco Tusa, Fu Yonggui, Lina Lu, Yen Ming Chiu, Zhaohui Gan, Xiao-Zhi Gao, Dingfei Ge, Gerrit K. Janssens, Gwang-Hyun Kim, Ginny Wong, Giuseppe Agrillo, Yaroslava Katueva, Giuseppe Mangioni, Fahad Muhaya, Guang-Ming Wu, Xiujun Gong, Gouhei Tanaka, Muhammad Khurram
Khan, Ge Lei, Zhongsheng Wang, Guo Weidong, Jie Gui, Guilherme Barreto, Tiantai Guo, Gurumurthy Swaminathan, Guangwei Zhang, Gwo-Ruey Yu, Moussa Haddad, Haibing Gao, H.K. Lam, Hanif Ullah, Hanlin He, Haini Qu, Chiung-Hua Huang, Houshang Darabi, Tomohiro Henmi, Herbert Iu, Tiefang He, Han-min Chien, Honorius Galmeanu, Hassan Taheri, Huang Ping, Wei Huang, Weitong Huang, Huifang Li, Huiyu Zhou, Junhao Hu, Hameed Ullah Khan, Rong-xiang Hu, Shahid Hussain, Bo Chen, Jaehyung Park, Hsiang-Yi Lee, Hoang-Yang Lu, Hyun-Sik Kim, Zhongkun He, Ibrahim Aliskan, Irene Artemieva, Indrajit Banerjee, Ing-Chyuan Wu, Ikhyeon Jang, Jianli Li, Seong-Joe Lim, Francesco Iorio, yaou zhao, Jin Zhou, Insoo Koo, Jian Xun Peng, John Economou, Jackson Souza, Jose Alvarez, James Cai, James Walton, James Yeh, Hasan Jamil, Janset Dasdemir, Jawid Azizi, Jayasudha John Suseela, Jianbo Fan, Jiande Sun, Jih-Gau Juang, Javad Haddadnia, Hongjun Jia, Jiajun Yan, Peilin Jiang, Changan Jiang, Jiang jl, Kai Jiang, Lihua Jiang, Wei Jia, Jindong Liu, Guang Jin, Jinsoo Kim, Jungkyu Rho, Josep M. Mirats Tur, Jun Liu, John Klein, Jong Min Lee, Ji-Hun Bae, Joydeb Mukherjee, Jianping Qiao, Jinn-Shing Cheng, Joaquin Torres-Sospedra, Joaquin Torres-Sospedra, Jyh-Ching Juang, Juan Jose Gonzalez de la Rosa, Junaid Ahmed, Jun Du, Junlin Chang, Kang Li, Kanghee Kim, Wei Jing, Kaushik Roy, Iroshi Awasaki, Tsung-Yi Chen, Ke Tang, Hyun-Deok Kang, Alexander Kleschev, Kunikazu Kobayashi, Krishna Chandramouli, Krishnanand Kaipa Narasimha, Seeja K.R.H. K, Lance C. Fung, Laks Raghupathi, Lalit Gupta, Chin-Feng Lin, Le Dong, Sungon Lee, Hong-Bo Lei, Jie Lei, Yingke Lei, Kok-Leong Ong, Lin Gao, Sun Cheol Bae, Laurent Heutte, Hualiang Li, Lijuan Xiao, Lin Li, Guohui Zhang, Lin Wang, Yuxi Liu, Bo Liu, Huiran Liu, Lei Liu, Wenyun Li, Xinyu Li, Ling-po Li, Linlin Shen, Leh Luoh, Lingling Wang, Peixing Li, Milan Lovric, Li Qingfeng, Liqing Zhang, Tian-Yu Liu, Liangxu Liu, Yixiang Lu, Marco Cortellino, Maciej Hrebien, Yasushi Mae, Sakashi Maeda, Sakashi Maeda, Margaret Knyaseva, Margarita Knyazeva, Manish Srivastava, Maqsood Mahmud, M. Loog, JeongHyun Kim, Mario Marinelli, Mario Marinelli, Markus Koskela, Kazuyuki Matsumoto, Maqsood Mahmud, Max Power, Maysam Abbod, Zhongqiang Wu, Mark Halling-Brown, Aizhong Mi, Mika Sulkava, Min Jiang, Min Wu, Mine Tsunenori, hai min, Meiling Hou, Hamid Abrishami Moghaddam, Mohammad Narimani, Monalisa Mazumdar, Lucia Moreno, Santo Motta, Marzio Pennisi, MinhTri Pham, Mutsumi Watanabe, Mingyu You, Naeem Ramzan, Naiara Aginako, Nestor Arana, Beijing Chen, Nelson Mascarenhas, Seref Naci Engin, Neyir Ozcan, Mingxiao Li, Li Nie, Xiushan Nie, Nataliya Nikiforova, Nataliya Nikifirova, Nitthinun Suphasetthawit, Nikolay Mikhaylov, Qun Niu, Nhan Nguyen-Thanh, Evgeni Nurminski, Bunyarit Uyyanonvara, Masaru Okumura, Olesya Kazakova, Won-Kyu Kim, Kazunori Onoguchi, Ajiboye Osunleke, Ertan Ouml Znergiz, Ping Zhang, Pallavi Vajinepalli, Pandu Devarakota, Yehu Shen, Chen Peng, Alessandro Perfetto, Hyun-Ju Park, Ping Wang, Peilin Jia, Litt Teen Hiew, Elvira Popescu, Roy Power, Roy Power, Pradip Ghanty, Pramod NC, Pramuditha Suraweera, Prashan Premaratne, Prashan Premaratne, Qi Yu, Qiao Wang, Qi Liu, Qingwei Gao, Quande Qin, Jinpeng Qi, Peng Qiu, Quanke Pan, Thanh Tho Quan, Quang Nguyen, Hai-Tao Zheng, Qi Wang, Ruhul Sarker, Rafal Kozik, Raffaele Montella, M. Rafiq Swash, M.K.M. 
Rahman, Randeep Singh, Peng Ren, Xianwen Ren, Romina Oliva, Rong Jin, Rosa de Duonni, Lijun Xu, Nidhi Arora, Ryuzo Okada, Shaomin Zhang, Chin-yuan Fan, Saad Bedros, Xin Hao, Sarif Naik, Mihnea Scafes, Sheng Chen, Chen Shao, Jong
Hyun Park, Sanggil Kang, Changho Yun, Shafayat Abrar, Elena Shalfeeva, Li Shang, Shao jj, Xiaojian shao, Sherif Sherif, Chuan Shi, Shaohui Liu, Shripad Kondra, S. Jamal H Zaidi, Shi-Jay Chen, Jiping SHI, Seokjoo Shin, Shiuh-Jeng Wang, Sawomir Lasota, Zhijun Tan, Mingguang Shi, Vitaliy Snytyuk, Xiaojing Song, Shengping Zhang, Sriparna Saha, Sibel Senan, Seokjin Sung, Eung Nam Ko, Sungshin Kim, S Kim, Xueqiang Zeng, Lei Zhang, Steve Ling, Steven Guan, Shih-Ting Yang, Zhang Li, Cheng Sun, Jie Sun, Tingxu Yan, You Ouyang, Supriya Rao, Susana Vieira, Suwon Lee, Yang Shi, Syed Ismail Shah, Peixing Li, Tiong Goh, Shin-ya Takahashi, Shinya Takahashi, Toshihisa Tanaka, Atsushi Yamashita, Weidong Xu, Zhi Teng, Zhu Teng, Thomas Tawiah, Thuc Kieu Xuan, Timo Honkela, Toshiaki Kondo, Tsang-Long Pao, ThanhVu Nguyen, Thomas O'Daniel, Tomasz Andrysiak, Tomasz Rutkowski, Toni Zgaljic, Gyung-Jin Hong, Tomoaki Tsuruoka, Naoyuki Tsuruta, Mengru Tu, U. Kaymak, Uttam Roy, Youngbae Hwang, Mario Rossi, Vanta Dimitrova, Vasily Aristarkhov, Venugopal Chakravarthy, Vinod Pathangay, Bae-guen Kwon, Vito Santarcangelo, Victor Jin, Vladimir Brusic, Wan-Jui Lee, Chih-Hung Wang, Chao Wang, Furong Wang, Wang Haili, Ling Wang, Xiaojuan Wang, Yongcui Wang, Zhengyou Wang, Wen-Chung Chang, Woochang Shin, Wuchuan Yang, Wudai Liao, Wei-Chih Yang, Weidong Li, Weifeng Li, Wenkai Li, Wen Shengjun, Yu-Chen Lin, Wangheon Lee, Wing-Kuen Ling, Shanwen Zhang, Wai-keung Fung, Worasait Suwannik, Takashi Kuremoto, Chao Wu, Yu Wu, Zikai Wu, Jun Zhang, Wei Xiong, Xin Zou, Xiaochun Cao, Chungui Xu, XiaoFeng Wang, Junfeng Xia, Xian-xia Zhang, Xiaomin Liu, Xianjun Shen, Xuemei Ren, De Xu, Bing Xue, Yu Xue, Huan Xu, Lu Xu, Ye Xu, Yun Xu, Xiaolei Xia, Xiaoyan Sun, Xiaoying Wang, Yang Song, Yago Saez, Yan Li, Banghua Yang, Yan Yang, Zhixia Yang, Yanmin Liu, Akira Yanou, Yasuhiro Taniguchi, Yuan-Chang Chang, Yu-Chiun Chiou, Ye Bin, Yeonsik Kang, Y.F. Xu, Yifeng Zhang, Zhao Yinggang, Yinglei Song, Lei Yang, Yangmin Li, Mi-ran Yun, Yoshinori Kobayashi, Yu-Qing Qiu, Yoon-Seok Nam, Yuanling Hao, Ming Yu, Yong Wang, Yue Wang, Yen-Wen Wang, Zhigang Wang, Zanchao Zhang, Zhenbing Zeng, Guowei Zhang, Hehua Zhang, Jun Zhang, Liang Zhao, Zhaohui Sun, Chunhou Zheng, Min Zheng, Zhigang Yan, Zhijun Yang, Lin Zhu, Zhong Jin, Zujun Hou, Dao Zhou, Sulan Zhang, Xiangbin Zhu, Shuanghe Zhu, Xuefen Zhu, Yihai Zhu, Zhang Liangsheng, Liu Zhiping, Guoyu Zuo, Zhongming Zhao.
Table of Contents
Neural Networks An Ensemble of Neural Networks for Stock Trading Decision Making . . . Pei-Chann Chang, Chen-Hao Liu, Chin-Yuan Fan, Jun-Lin Lin, and Chih-Ming Lai A SOM Based Stereo Pair Matching Algorithm for 3-D Particle Tracking Velocimetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kazuo Ohmi, Basanta Joshi, and Sanjeeb Prasad Panday Spiking Neural Network Performs Discrete Cosine Transform for Visual Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qingxiang Wu, T.M. McGinnity, Liam Maguire, Arfan Ghani, and Joan Condell Spam Detection Based on a Hierarchical Self-Organizing Map . . . . . . . . . Esteban Jos´e Palomo, Enrique Dom´ınguez, Rafael Marcos Luque, and Jos´e Mu˜ noz
1
11
21
30
The Analysis of the Energy Function of Chaotic Neural Network with White Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yaoqun Xu and Feng Qin
38
The Classification of a Simulation Data of a Servo System via Evolutionary Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . Asil Alkaya and G. Miraç Bayhan
48
Evolutionary Learning and Genetic Algorithms A New Source and Receiver Localization Method with Erroneous Receiver Positions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yingke Lei and Junfeng Xia Interactive Genetic Algorithms with Variational Population Size . . . . . . . Jie Ren, Dun-wei Gong, Xiao-yan Sun, Jie Yuan, and Ming Li
55 64
A Unified Direct Approach to Image Registration and Object Recognition with a Hybrid Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . Igor V. Maslov and Izidor Gertner
74
Two Step Template Matching Method with Correlation Coefficient and Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyeongdong Baek and Sungshin Kim
85
Granular Computing and Rough Sets A Framework on Rough Set-Based Partitioning Attribute Selection . . . . . Tutut Herawan and Mustafa Mat Deris
91
On Multi-soft Sets Construction in Information Systems . . . . . . . . . . . . . . Tutut Herawan and Mustafa Mat Deris
101
Fuzzy Theory and Models Knowledge Representation and Consistency Checking in a Norm-Parameterized Fuzzy Description Logic . . . . . . . . . . . . . . . . . . . . . . . . Jidi Zhao, Harold Boley, and Weichang Du
111
Using Intelligent System for Reservoir Properties Estimation . . . . . . . . . . Fariba Salehi and Arnoosh Salehi
124
Fuzzy Support Vector Classification Based on Fuzzy Optimization . . . . . Zhimin Yang, Xiao Yang, and Bingquan Zhang
134
Fuzzy Systems and Soft Computing An FIS for Early Detection of Defect Prone Modules . . . . . . . . . . . . . . . . . Zeeshan Ali Rana, Mian Muhammad Awais, and Shafay Shamail
144
Variable Precision Concepts and Its Applications for Query Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fei Hao and Shengtong Zhong
154
The Application of Intuitionistic Fuzzy Theory in Radar Target Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hong Wang and Jie Wang
166
On the Robustness of Type-1 and Type-2 Fuzzy Tests vs. ANOVA Tests on Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan C. Figueroa García, Dusko Kalenatic, and Cesar Amilcar Lopez Bello Combining Global Model and Local Adaptive Neuro-Fuzzy Network . . . . Yun-Hee Han and Keun-Chang Kwak On Some Properties of Generalized Symbolic Modifiers and Their Role in Symbolic Approximate Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Saoussen Bel Hadj Kacem, Amel Borgi, and Moncef Tagina
174
184
190
Swarm Intelligence and Optimization A New Differential Evolution Algorithm with Random Mutation . . . . . . . Yuelin Gao and Junmei Liu An Improved Harmony Search Algorithm for the Location of Critical Slip Surfaces in Slope Stability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . Liang Li, Guang-Ming Yu, Shi-Bao Lu, Guo-Yan Wang, and Xue-Song Chu
209
215
An Improved PSO Algorithm Encoding a priori Information for Nonlinear Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tong-Yue Gu, Shi-Guang Ju, and Fei Han
223
Multi-objective Oriented Search Algorithm for Multi-objective Reactive Power Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xuexia Zhang and Weirong Chen
232
Combined Discrete Particle Swarm Optimization and Simulated Annealing for Grid Computing Scheduling Problem . . . . . . . . . . . . . . . . . . Ruey-Maw Chen, Der-Fang Shiau, and Shih-Tang Lo
242
Supervised and Semi-supervised Learning Profile Based Algorithm to Topic Spotting in Reuter21578 . . . . . . . . . . . . Taeho Jo Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Viviane Palodeto, Hern´ an Terenzi, and Jefferson Luiz Brum Marques Rough Set Theory in Pavement Maintenance Decision . . . . . . . . . . . . . . . . Ching-Tsung Hung, Jia-Ruey Chang, Jyh-Dong Lin, and Gwo-Hshiung Tzeng Using Control Theory for Analysis of Reinforcement Learning and Optimal Policy Properties in Grid-World Problems . . . . . . . . . . . . . . . . . . . S. Mostapha Kalami Heris, Mohammad-Bagher Naghibi Sistani, and Naser Pariz
252
258
266
276
Kernel Methods and Supporting Vector Machines Adaptive Chaotic Cultural Algorithm for Hyperparameters Selection of Support Vector Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jian Cheng, Jiansheng Qian, and Yi-nan Guo
286
Application of a Case Base Reasoning Based Support Vector Machine for Financial Time Series Data Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . Pei-Chann Chang, Chi-Yang Tsai, Chiung-Hua Huang, and Chin-Yuan Fan Cost-Sensitive Supported Vector Learning to Rank Imbalanced Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiao Chang, Qinghua Zheng, and Peng Lin A Biologically Plausible Winner-Takes-All Architecture . . . . . . . . . . . . . . . Sebastian Handrich, Andreas Herzog, Andreas Wolf, and Christoph S. Herrmann
294
305 315
Combinatorial and Numerical Optimization Minimum Sum-of-Squares Clustering by DC Programming and DCA . . . Le Thi Hoai An and Pham Dinh Tao
327
An Effective Hybrid Algorithm Based on Simplex Search and Differential Evolution for Global Optimization . . . . . . . . . . . . . . . . . . . . . . . Ye Xu, Ling Wang, and Lingpo Li
341
Differential Evolution with Level Comparison for Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ling-po Li, Ling Wang, and Ye Xu
351
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nan Wang, Lin Wang, Yanlong Bu, Guozhong Zhang, and Lincheng Shen Stereo Vision Based Motion Parameter Estimation . . . . . . . . . . . . . . . . . . . Xinkai Chen Binary Sequences with Good Aperiodic Autocorrelations Using Cross-Entropy Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shaowei Wang, Jian Wang, Xiaoyong Ji, and Yuhao Wang
361
371
381
Systems Biology and Computational Biology Agent Based Modeling of Atherosclerosis: A Concrete Help in Personalized Treatments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Francesco Pappalardo, Alessandro Cincotti, Alfredo Motta, and Marzio Pennisi MotifMiner: A Table Driven Greedy Algorithm for DNA Motif Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K.R. Seeja, M.A. Alam, and S.K. Jain
386
397
Neural Computing and Optimization A New Method of Morphological Associative Memories . . . . . . . . . . . . . . . Naiqin Feng, Xizheng Cao, Sujuan Li, Lianhui Ao, and Shuangxi Wang A New Method of Color Map Segmentation Based on the Self-organizing Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhenqing Xue and Chunpu Jia
407
417
Knowledge Discovery and Data Mining A Quantum Particle Swarm Optimization Used for Spatial Clustering with Obstacles Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xueping Zhang, Jiayao Wang, Haohua Du, Tengfei Yang, and Yawei Liu Fuzzy Failure Analysis of Automotive Warranty Claims Using Age and Mileage Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SangHyun Lee and KyungIl Moon A Fuzzy-GA Wrapper-Based Constructive Induction Model . . . . . . . . . . . Zohreh HajAbedi and Mohammad Reza Kangavari Warning List Identification Based on Reliability Knowledge in Warranty Claims Information System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SangHyun Lee, ByungSu Jung, and KyungIl Moon Cluster Analysis and Fuzzy Query in Ship Maintenance and Design . . . . Jianhua Che, Qinming He, Yinggang Zhao, Feng Qian, and Qi Chen
424
434 440
450 458
A Semantic Lexicon-Based Approach for Sense Disambiguation and Its WWW Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vincenzo Di Lecce, Marco Calabrese, and Domenico Soldo
468
The Establishment of Verb Logic and Its Application in Universal Emergency Response Information System Design . . . . . . . . . . . . . . . . . . . . . Jian Tan and XiangTao Fan
478
Artificial Life and Artificial Immune Systems An AIS-Based E-mail Classification Method . . . . . . . . . . . . . . . . . . . . . . . . . Jinjian Qing, Ruilong Mao, Rongfang Bie, and Xiao-Zhi Gao A New Intrusion Detection Method Based on Antibody Concentration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jie Zeng, Tao Li, Guiyang Li, and Haibo Li
492
500
Ensemble Methods Research of the Method of Local Topography Rapid Reconstructed . . . . . Minrong Zhao, Shengli Deng, and Ze Shi A Novel User Created Message Application Service Design for Bidirectional TPEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sang-Hee Lee and Kang-Hyun Jo An Empirical Study of the Convergence of RegionBoost . . . . . . . . . . . . . . . Xinzhu Yang, Bo Yuan, and Wenhuang Liu Towards a Better Understanding of Random Forests through the Study of Strength and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simon Bernard, Laurent Heutte, and S´ebastien Adam
510
517 527
536
Machine Learning Theory and Methods Learning Hereditary and Reductive Prolog Programs from Entailment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shahid Hussain and M.R.K. Krishna Rao
546
A Novel Local Sensitive Frontier Analysis for Feature Extraction . . . . . . . Chao Wang, De-Shuang Huang, and Bo Li
556
Locality Preserving Discriminant Projections . . . . . . . . . . . . . . . . . . . . . . . . Jie Gui, Chao Wang, and Ling Zhu
566
Retracted: Using Bayesian Network and AIS to Perform Feature Subset Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Boyun Zhang Construction of the Ensemble of Logical Models in Cluster Analysis . . . . Vladimir Berikov
581
Ordinal Regression with Sparse Bayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiao Chang, Qinghua Zheng, and Peng Lin
591
GLRT Based Fault Detection in Sensor Drift Monitoring System . . . . . . . In-Yong Seo, Ho-Cheol Shin, Moon-Ghu Park, and Seong-Jun Kim
600
A Support System for Making Archive of Bi-directional Remote Lecture – Photometric Calibration – . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Naoyuki Tsuruta, Mari Matsumura, and Sakashi Maeda Conflict-Free Incremental Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rong-Lei Sun
610 618
Biological and Quantum Computing Supervised Isomap for Plant Leaf Image Classification . . . . . . . . . . . . . . . . Minggang Du, Shanwen Zhang, and Hong Wang
627
Integration of Genomic and Proteomic Data to Predict Synthetic Genetic Interactions Using Semi-supervised Learning . . . . . . . . . . . . . . . . . Zhuhong You, Shanwen Zhang, and Liping Li
635
A Method of Plant Leaf Recognition Based on Locally Linear Embedding and Moving Center Hypersphere Classifier . . . . . . . . . . . . . . . . Jing Liu, Shanwen Zhang, and Jiandu Liu
645
Intelligent Computing in Pattern Recognition Survey of Gait Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ling-Feng Liu, Wei Jia, and Yi-Hai Zhu
652
Fingerprint Enhancement and Reconstruction . . . . . . . . . . . . . . . . . . . . . . . Rabia Malik and Asif Masood
660
A Level Set Based Segmentation Method for Images with Intensity Inhomogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiao-Feng Wang and Hai Min Analysis of Enterprise Workflow Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . Cui-e Chen, Shulin Wang, Ying Chen, Yang Meng, and Hua Ma
670 680
On Dynamic Spectrum Sharing Systems Cooperative Spectrum Sensing Using Enhanced Dempster-Shafer Theory of Evidence in Cognitive Radio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nhan Nguyen-Thanh, Kieu Xuan Thuc, and Koo Insoo A Secure Distributed Spectrum Sensing Scheme in Cognitive Radio . . . . Nguyen-Thanh Nhan and Insoo Koo
688 698
An Optimal Data Fusion Rule in Cluster-Based Cooperative Spectrum Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hiep-Vu Van and Insoo Koo
708
Exact Bit Error Probability of Multi-hop Decode-and-Forward Relaying with Selection Combining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bao Quoc Vo-Nguyen and Hyung Yun Kong
718
A Cooperative Transmission Scheme for Cluster Based Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Asaduzzaman and Hyung Yun Kong
728
A Packet Scheduling Algorithm for IEEE 802.22 WRAN Systems and Calculation Reduction Method Thereof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Young-du Lee, Tae-joon Yun, and Insoo Koo
738
On New Particle Swarm Optimization and Its Applications Study on Multi-Depots Vehicle Scheduling Problem and Its Two-Phase Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Suxin Wang, Leizhen Wang, Huilin Yuan, Meng Ge, Ben Niu, Weihong Pang, and Yuchuan Liu Image Segmentation to HSI Model Based on Improved Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bo Zhao, Yajun Chen, Wenhua Mao, and Xiaochao Zhang
748
757
Emotional Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wei Wang, Zhiliang Wang, Xuejing Gu, and Siyi Zheng
766
Symbiotic Multi-swarm PSO for Portfolio Optimization . . . . . . . . . . . . . . . Ben Niu, Bing Xue, Li Li, and Yujuan Chai
776
A Novel Particle Swarm Optimization with Non-linear Inertia Weight Based on Tangent Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Li Li, Bing Xue, Ben Niu, Lijing Tan, and Jixian Wang Particle Swarm Optimizer Based on Dynamic Neighborhood Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yanmin Liu, Qingzhen Zhao, Zengzhen Shao, Zhaoxia Shang, and Changling Sui An Improved Two-Stage Camera Calibration Method Based on Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hongwei Gao, Ben Niu, Yang Yu, and Liang Chen
785
794
804
On Intelligent Signal Processing for Interactive Brain-Machine-Interfacing EMD Based Power Spectral Pattern Analysis for Quasi-Brain-Death EEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qi-Wei Shi, Ju-Hong Yang, Jian-Ting Cao, Toshihisa Tanaka, Tomasz M. Rutkowski, Ru-Bin Wang, and Hui-Li Zhu Proposal of Ride Comfort Evaluation Method Using the EEG . . . . . . . . . Hironobu Fukai, Yohei Tomita, Yasue Mitsukura, Hirokazu Watai, Katsumi Tashiro, and Kazutomo Murakami
814
824
On Advances in Intelligent Information Processing Image Reconstruction Using NMF with Sparse Constraints Based on Kurtosis Measurement Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Li Shang, Jinfeng Zhang, Wenjun Huai, Jie Chen, and Jixiang Du A Cyanobacteria Remote Monitoring System . . . . . . . . . . . . . . . . . . . . . . . . Zhiqiang Zhao and Yiming Wang
834 841
Study on Fault Diagnosis of Rolling Mill Main Transmission System Based on EMD-AR Model and Correlation Dimension . . . . . . . . . . . . . . . . Guiping Dai and Manhua Wu
849
Analysis of Mixed Inflammable Gases Based on Single Sensor and RBF Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yu Zhang, Meixing Qi, and Caidong Gu
858
Image Segmentation of Level Set Based on Maximization of Between-Class Variance and Distance Constraint Function . . . . . . . . . . . . Changxiong Zhou, Zhifeng Hu, Shufen Liu, Ming Cui, and Rongqing Xu Active MMW Focal Plane Imaging System . . . . . . . . . . . . . . . . . . . . . . . . . . Pingang Su, Zongxin Wang, and Zhengyu Xu Application of RBF Network Based on Immune Algorithm in Human Speaker Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yan Zhou and Xinming Yu Adaptive Immune Response Network Model . . . . . . . . . . . . . . . . . . . . . . . . . Tao Liu, Li Zhang, and Binbin Shi
865
875
882 890
Researches on Robust Fault-Tolerant Control for Actuator Failures in Time-Varying Delay System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dong Li
899
Design of a Single-Phase Grid-Connected Photovoltaic Systems Based on Fuzzy-PID Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fengwen Cao and Yiwang Wang
912
Ontology-Based Decision Support for Security Management in Heterogeneous Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michal Choraś, Rafal Kozik, Adam Flizikowski, Rafal Renk, and Witold Holubowicz
920
928
Weighted Small World Complex Networks: Smart Sliding Mode Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuequan Yang and Xinghuo Yu
935
On Computational Intelligence in Bioinformatics and Systems Biology A Novel Method to Robust Tumor Classification Based on MACE Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shulin Wang and Yihai Zhu
945
Ensemble Classifiers Based on Kernel PCA for Cancer Data Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jin Zhou, Yuqi Pan, Yuehui Chen, and Yang Liu
955
A Method for Multiple Sequence Alignment Based on Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fasheng Xu and Yuehui Chen
965
Inference of Differential Equation Models by Multi Expression Programming for Gene Regulatory Networks . . . . . . . . . . . . . . . . . . . . . . . . Bin Yang, Yuehui Chen, and Qingfang Meng
974
Function Sequence Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . Shixian Wang, Yuehui Chen, and Peng Wu
984
On Network-Based Intelligent Technologies Speech Emotion Recognition Research Based on Wavelet Neural Network for Robot Pet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yongming Huang, Guobao Zhang, and Xiaoli Xu
993
Device Integration Approach to OPC UA-Based Process Automation Systems with FDT/DTM and EDDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001 Vu Van Tan, Dae-Seung Yoo, and Myeong-Jae Yi A SOA-Based Framework for Building Monitoring and Control Software Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013 Vu Van Tan, Dae-Seung Yoo, and Myeong-Jae Yi Data Fusion Algorithm Based on Event-Driven and Minimum Delay Aggregation Path in Wireless Sensor Network . . . . . . . . . . . . . . . . . . . . . . . 1028 Tianwei Xu, Lingyun Yuan, and Ben Niu Handling Multi-channel Hidden Terminals Using a Single Interface in Cognitive Radio Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1039 Liang Shan and Myung Kyun Kim
Network Construction Using IEC 61400-25 Protocol in Wind Power Plants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049 Tae O Kim, Jung Woo Kim, and Hong Hee Lee Stability and Stabilization of Nonuniform Sampling Systems Using a Matrix Bound of a Matrix Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059 Young Soo Suh Implementation of Induction Motor Control System Using Matrix Converter Based on CAN Network and Dual-Port RAM . . . . . . . . . . . . . . 1067 Hong-Hee Lee and Hoang M. Nguyen
On 2D versus 3D Intelligent Biometric and Face Recognition Techniques Fuzzy Data Fusion for Updating Information in Modeling Drivers’ Choice Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075 Mauro Dell’Orco and Mario Marinelli Multi-view Ear Recognition Based on Moving Least Square Pose Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085 Heng Liu, David Zhang, and Zhiyuan Zhang Experimental Comparison among 3D Innovative Face Recognition Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096 Vitoantonio Bevilacqua, Giuseppe Mastronardi, Raffaele Piarulli, Vito Santarcangelo, Rocco Scaramuzzi, and Pasquale Zaccaglino Retinal Vessel Extraction by a Combined Neural Network–Wavelet Enhancement Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106 Leonarda Carnimeo, Vitoantonio Bevilacqua, Lucia Cariello, and Giuseppe Mastronardi
Erratum Using Bayesian Network and AIS to Perform Feature Subset Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Boyun Zhang
E1
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1117
An Ensemble of Neural Networks for Stock Trading Decision Making Pei-Chann Chang1,∗, Chen-Hao Liu2, Chin-Yuan Fan3, Jun-Lin Lin1, and Chih-Ming Lai1 1 Department of Information Management, Yuan Ze University, Taoyuan 32026, Taiwan
[email protected] 2 Department of Information Management, Kainan University, Taoyuan 32026, Taiwan 3 Department of Business Innovation and Development, Ming Dao University, Changhua 52345, Taiwan
Abstract. Stock turning signal detection is an interesting subject arising in numerous financial and economic planning problems. In this paper, an ensemble neural network system with intelligent piecewise linear representation (PLR) for stock turning point detection is presented. The intelligent piecewise linear representation method is able to generate numerous stock turning signals from the historical database, and the ensemble neural network system is then applied to learn these patterns by retrieving similar stock price patterns from the historical data for training. These turning signals represent short-term and long-term trading signals for selling or buying stocks in the market, and they are applied to forecast the future turning points in the set of test data. Experimental results demonstrate that the hybrid system can make a significant and consistent amount of profit when compared with other approaches using stock data available in the market. Keywords: Stock turning signals; Ensemble neural network; PLR method; Financial time series data.
1 Introduction

A stock turning signal is a local peak or valley during a stock price variation. These turning signals also represent short-term or long-term investment decisions. However, they are very difficult to detect, or even to observe, since the price variation is subject to the problems of high dimensionality and non-stationarity. Although some trading rules are clear, most of them are vague and fuzzy. Therefore, an investor cannot be the winner all the time with the same set of trading rules. In the literature, researchers have spent a lot of effort on forecasting the price variation and have come up with numerous sophisticated techniques for predicting the stock price or the price movement. Technical analysis is the most widely used method in this area. However, the results are still quite limited.

∗ Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 1–10, 2009. © Springer-Verlag Berlin Heidelberg 2009
The major contribution of this research is to take a different approach by evolving a piecewise linear representation method with an ensemble neural network system to predict the future turning signals of a stock price. Turning signals identified by PLR are represented through a nonlinear relationship between the stock closing price y_i and various technical indexes x_i, and this nonlinear relationship is studied intensively through the ensemble neural network. In other words, local stock turning signals can be detected using the PLR preprocessing technique. Later on, these turning signals, which govern the relationship between the stock closing price and the technical indexes, are input into the ensemble neural network system for training. The trained model can then be applied as a universal predictor for predicting the future turning points in the set of test data. Finally, an investor can apply these predicted signals as trading signals and make profits in both short-term and long-term trading decisions. The rest of the paper is divided into five sections. Section 2 reviews the literature in the area of stock forecasting. Section 3 describes the development of an evolving PLR model and an ensemble neural network for stock trading point decisions. Section 4 presents the experimental tests conducted to evaluate the effectiveness of the trading points generated by the evolving PLR approach. Finally, conclusions and future directions of the research are provided.
2 Literature Survey

Prediction of a financial market is rather challenging due to the chaos and uncertainty of the system. In the past, many statistical and artificial intelligence tools were applied to financial time series analysis. Kovalerchuk et al. (2000) classify the current methods into three categories: numerical models (ARIMA models, instance-based learning, neural networks, etc.), rule-based models (decision tree and DNF learning, naive Bayesian classifier, hidden Markov model, etc.), and relational data mining (inductive logic programming). Among these approaches, soft computing techniques (including fuzzy logic, neural networks, and genetic algorithms) have become the most popular forecasting models in this area because of their ability to handle uncertainty and noise in a stock market. These soft computing techniques are used with quantitative inputs, such as technical indices, and qualitative factors, such as political effects, to automate stock market forecasting and trend analysis. Neural networks (NNs) have been applied to this area (refer to Aiken and Bsat (1994), Baba et al. (2000), Brownstone (1996), Chen (2003), Pendharkar (2001), and Schapire (1990)). However, these models have their limitations owing to the tremendous noise and complex dimensionality of stock price data. The quantity of data itself and the input variables also interfere with each other. Therefore, the results may not be as convincing. Ensemble strategies have been used to overcome these limitations of neural networks in West et al. (2005). The basic concept of the ensemble method is that diverse perspectives on different aspects of a problem can be combined to produce a high-quality result. Three important strategies have been advanced for forming ensembles of predictors: cross-validation, bagging, and boosting. Cross-validation is the simplest ensemble rule, in which all ensemble members are trained with the same data.
Recently, AdaBoost has become another popular ensemble strategy that uses perturbation in an attempt to improve the performance of the learning algorithm (Schapire, 1990). The AdaBoost technique has become an attractive ensemble method in machine learning since it has a low error rate and performs well on low-noise data sets (Henley, 1996; Jensen, 1992). As a successor of the boosting algorithm, it is used to combine a set of weak classifiers to form a model with higher prediction outcomes (Henley, 1996). As a result, several research studies have successfully applied the AdaBoost algorithm to classification problems in object detection, including face recognition, video sequences, and signal processing systems. For example, Zhou and Wei (2006) utilized the AdaBoost algorithm to extract the top 20 significant features from the XM2VT face database. Their results showed that the AdaBoost algorithm reduces the computation time by 54.23%. Additionally, Sun et al. (2006) applied the AdaBoost algorithm to extract high-order pattern and weight-of-evidence rule based classifiers from the UCI Machine Learning Repository. In this work, the AdaBoost algorithm ensembles two different kinds of neural networks: traditional BPN neural networks and evolving neural networks. This technique is combined with PLR (piecewise linear representation) to forecast stock trading decisions. In previous research, only a few studies have focused on developing trading signals for effective stock trading. A stock trading method based on dynamic Bayesian networks is applied to model the dynamics of the trend of stock prices in Jangmin et al. (2004), using a three-level hierarchical hidden Markov model. This research tries to find the best way to solve these problems.
3 Applying the Ensemble Neural Network–PLR Approach to Generate Historical Turning Signals

This research first applies a piecewise linear representation technique to select turning points from the historical stock price database. An ensemble neural network is then trained on the resulting trading signals, and finally a new set of input data is processed to generate trading decision signals for the future. The flow chart of the ensemble neural network system is shown in Fig. 1.
Fig. 1. The flow chart of the ensemble neural network model
The detailed procedure is described as follows.

3.1 Candidate Stock Screening

A set of candidate stocks is selected based on the following criteria: 1. capital size; 2. monthly sales; 3. EPS (earnings per share); 4. transaction volume per day;
and 5. marginal cost of capital (MCC). According to these indices, seven stocks have been selected in this research.

3.2 Generating an Initial Threshold

An initial threshold is generated randomly for PLR in time series data segmentation.

3.3 Using PLR to Segment the Stock Data

PLR is used to segment the selected stock data using the initial threshold generated in the previous step. The segmented stock data are then transformed into trading signals. From previous studies (Abu-Mostafa, 1996; Baba et al., 2000), trading points of stocks are observed from the variation of technical indexes. This study attempts to develop an intelligent trading point prediction system, so that investors can form a good trading strategy by applying this intelligent model. It takes a different approach by using PLR to decide the troughs and peaks of the historical data and to derive the trading points from them. The main procedure of PLR in predicting the trading points has been shown in our previous work, and its pseudo code is described as follows (see Fig. 2):

Procedure BuildTree (S)
Input: A financial time series S. Let S be represented as x[1..n], y[1..n].
If (Max(y[1..n]) == y[1] OR Max(y[1..n]) == y[n] OR Min(y[1..n]) == y[1] OR Min(y[1..n]) == y[n])
    Create a node in the hierarchy for this segment;
    Draw a line between (x[1], y[1]) and (x[n], y[n]);
    Max_d = maximum Euclidean distance of (x[i], y[i]) to the line;
    If (Max_d < threshold (δ))
        This segment is good enough; no further work
    Else
        Let (x[j], y[j]) be the point with maximum Euclidean distance to the line.
        Break the segment S into S1 and S2 at the point (x[j], y[j]);
        PARENT (S1) = S; PARENT (S2) = S;
        BuildTree (S1); BuildTree (S2);
    End If
Else
    Break the segment at the maximum and/or minimum point(s) into smaller ones S1, ..., Sm;
    For i = 1 to m
        PARENT (Si) = PARENT (S);
    End For
    Delete (S);
    For i = 1 to m
        BuildTree (Si)
    End For
End If
Fig. 2. The pseudo code of the PLR
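As a rough illustration, a minimal Python sketch of this top-down segmentation is given below. The function names (build_tree, point_line_distance), the stopping rule, and the example threshold are our own simplifications of Fig. 2, not the authors' exact implementation; turning points are read off as the endpoints of the returned segments.

import numpy as np

def point_line_distance(x, y, x1, y1, x2, y2):
    # Euclidean distance from (x, y) to the line through (x1, y1) and (x2, y2)
    num = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)
    den = np.hypot(y2 - y1, x2 - x1)
    return num / den if den > 0 else 0.0

def build_tree(x, y, threshold, segments):
    # Recursively split the series until every segment fits a straight line within the threshold
    n = len(y)
    if n <= 2:
        segments.append((x[0], x[-1]))
        return
    d = [point_line_distance(x[i], y[i], x[0], y[0], x[-1], y[-1]) for i in range(1, n - 1)]
    j = int(np.argmax(d)) + 1
    if d[j - 1] < threshold:
        segments.append((x[0], x[-1]))      # good enough: endpoints become turning points
    else:
        build_tree(x[:j + 1], y[:j + 1], threshold, segments)
        build_tree(x[j:], y[j:], threshold, segments)

days = np.arange(11)
prices = np.array([176.29, 171.43, 169.98, 173.43, 171.65, 170.45,
                   178.69, 179.78, 176.47, 180.08, 176.29])
segments = []
build_tree(days, prices, threshold=2.0, segments=segments)
print(segments)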
3.4 Input Variable Selection Using Stepwise Regression Analysis (SRA)

As shown in Table 1, a set of technical indices affecting the stock price movement has been identified by Chang et al. (2004, 2008). These input factors are further selected using a stepwise regression analysis (SRA) model; an illustrative sketch of such a selection step is given after Table 1.
Table 1. Technical indices used as input variables

Moving Average (MA) (inputs: 5MA, 6MA, 10MA, 20MA): Moving averages are used to emphasize the direction of a trend and smooth out price and volume fluctuations that can confuse interpretation.

Bias (BIAS) (inputs: 5BIAS, 10BIAS): The difference between the closing value and the moving average line, which uses the stock price's nature of returning back to the average price to analyze the stock market.

Relative Strength Index (RSI) (inputs: 6RSI, 12RSI): RSI compares the magnitude of recent gains to recent losses in an attempt to determine overbought and oversold conditions of an asset.

Nine-day stochastic line (K, D) (inputs: 9K, 9D): The stochastic lines K and D are used to determine the signals of over-purchasing, over-selling, or deviation.

Moving Average Convergence and Divergence (MACD) (inputs: 9MACD): MACD shows the difference between a fast and a slow exponential moving average (EMA) of closing prices. Fast means a short-period average, and slow means a long-period one.

Williams %R (inputs: 12W%R): Williams %R is usually plotted using negative values; for the purpose of analysis and discussion, simply ignore the negative symbols. It is best to wait for the security's price to change direction before placing trades.

Transaction Volume (inputs: Transaction Volume): Transaction volume is a basic yet very important element of market timing strategy. Volume provides clues as to the intensity of a given price move.

Differences of technical index (Δ) (inputs: Δ5MA, Δ6MA, Δ10MA, Δ5BIAS, Δ10BIAS, Δ6RSI, Δ12RSI, Δ12W%R, Δ9K, Δ9D, Δ9MACD): Differences of the technical index between day t and day t+1.
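For illustration only, the sketch below shows one common way to realize such a stepwise selection (forward selection by p-value with statsmodels); the index names, the synthetic data, and the 0.05 entry threshold are hypothetical and not taken from the paper.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X, y, p_enter=0.05):
    # Greedy forward selection: repeatedly add the predictor with the smallest p-value below p_enter
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for col in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            pvals[col] = fit.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["MA5", "BIAS10", "RSI6", "W12R"])
y = 0.8 * X["MA5"] - 0.5 * X["RSI6"] + rng.normal(scale=0.1, size=200)
print(forward_stepwise(X, y))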
3.5 Inputting Normalized Variables to the Ensemble Neural Network for Trading Point Prediction

The set of normalized variables and trading points is input into the BPN for training the connection weights. Once the model is trained, it can be applied to future trading point prediction.

3.6 Ensemble Neural Network Training Strategy

AdaBoost is used as the major ensemble strategy in this forecasting system. AdaBoost is a popular boosting algorithm. In our research, we use the AdaBoost method to ensemble different kinds of learning neural networks, namely BPN neural network models and evolving neural network models. Details have been given in our previous research (Chang et al., 2004, 2006, 2008).

3.7 Ensemble Neural Network Strategy

In this section, the proposed hybrid model applies the AdaBoost model to ensemble three different neural network models to derive the highest profit from the financial series data. AdaBoost maintains a weight distribution over the training samples for these models. Assume W_t is the training sample weight distribution; AdaBoost calls the component learner repeatedly in a series of k cycles. In response, the component learner trains a classifier h_t using the total profit earned. The weight distribution W_t is updated after each cycle according to the total profit derived from the training samples: "low-profit" samples that are correctly classified by h_t get lower weights, and "high-profit" samples get higher weights. Thus, AdaBoost focuses on the samples with higher weights. This process continues for T cycles, and
finally, AdaBoost linearly combines all the component methods into a single final hypothesis f. Greater weights are given to component methods with lower training errors. The pseudo code of AdaBoost is described as follows:
Input: D, a set of d class-labeled training tuples; k, the number of rounds
Output: A composite model
Method:
(1) Initialize the weight of each tuple in D to 1/d
(2) for i := 1 to k do
(3)     sample D with replacement according to the tuple weights to obtain D_i;
(4)     use training set D_i to derive a model M_i;
(5)     compute error(M_i), the error rate of M_i;
(6)     if error(M_i) > 0.5 then
(7)         reinitialize the weights to 1/d
(8)         go back to step 3 and try again;
(9)     endif
(10)    for each tuple in D_i that was correctly classified do
(11)        multiply the weight of the tuple by error(M_i)/(1 − error(M_i));
(12)    normalize the weight of each tuple;
(13) endfor
Fig. 3. The pseudo code of AdaBoost
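For concreteness, a self-contained Python sketch of a resampling-based boosting loop in the spirit of Fig. 3 is given below. The learner interface (objects exposing fit/predict), the use of numpy arrays with binary labels, and the log-odds vote weights are our assumptions; in the paper's setting the base learners would be the BPN and evolving neural network models described above, and the error measure would be profit-oriented.

import numpy as np

class AdaBoostEnsemble:
    # Resampling-based boosting over a pool of base learners (cf. the steps in Fig. 3)
    def __init__(self, base_learners, rounds=10):
        self.base_learners = base_learners   # callables returning fresh, untrained models
        self.rounds = rounds
        self.models, self.alphas = [], []

    def fit(self, X, y):
        n = len(X)
        w = np.full(n, 1.0 / n)                              # step (1): uniform tuple weights
        for t in range(self.rounds):
            idx = np.random.choice(n, n, replace=True, p=w)  # step (3): weighted resampling
            model = self.base_learners[t % len(self.base_learners)]()
            model.fit(X[idx], y[idx])                        # step (4): train one component model
            pred = model.predict(X)
            err = float(np.sum(w[pred != y]))                # step (5): weighted error rate
            if err > 0.5:                                    # steps (6)-(8): restart with uniform weights
                w = np.full(n, 1.0 / n)
                continue
            err = max(err, 1e-10)
            w[pred == y] *= err / (1.0 - err)                # step (11): shrink correctly classified tuples
            w /= w.sum()                                     # step (12): normalize
            self.models.append(model)
            self.alphas.append(np.log((1.0 - err) / err))    # lower-error models get larger votes
        return self

    def predict(self, X):
        votes = sum(a * np.where(m.predict(X) == 1, 1.0, -1.0)
                    for a, m in zip(self.alphas, self.models))
        return (np.asarray(votes) > 0).astype(int)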
Through this process, we can keep the best model weights and earn the most profit in our hybrid system. But how are profits generated using the BPN and ENN? After generating the sub-segments by using PLR, the trading signals need to be transformed into 0 or 1 before they are fed into the ensemble models (including BPN and ENN). By adopting PLR, the sub-segments are divided and the trends of the time series data are defined as follows:
If Ci ≥ C + δ, trend = upward;
If Ci ≤ C − δ, trend = downward;
If Ci − δ < C < Ci + δ, trend = steady.
(1)
where Ci means the stock price of the i-th turning point, C is the stock price of the current turning point, and δ is the threshold for judging the trend of the stock price. The trend will be transferred as the output value of our BPN. If the trend changes from up to down, the trading signal will be changed from 0 to 0.5; if the trend changes from down to up, the trading signal will be changed from 1 to 0.5; otherwise, the signal will not be changed. However, the above definition of the trading signal is not closely related to the price variation. A trading signal should be able to reflect the price variation and provide more inside information for investors to make a precise decision for stock trading. Therefore, we redefine the trading signals according to the tendency of the stock as follows. If a stock is on an up-trend,
ti = [ (Ci − min{Ci, Ci+1, Ci+2}) / (max{Ci, Ci+1, Ci+2} − min{Ci, Ci+1, Ci+2}) ] ⋅ 0.5.
(2)
If a stock is on a down-trend,
ti = ( [ (Ci − min{Ci, Ci+1, Ci+2}) / (max{Ci, Ci+1, Ci+2} − min{Ci, Ci+1, Ci+2}) ] ⋅ 0.5 ) + 0.5.
(3)
where Ci means the stock price of the i-th transaction day. This new definition of trading signals is more proper for representing the momentum of a stock price. The trading signals in Table 2 are recalculated and they are shown in Table 3. Instead of 0 or 1, this new trading signal is more informative: it lies in the range of 0 to 1, which can provide more insightful information related to the movement of the stock price (take Google for example).
Table 2. The Trading Signals and t i of Google
Time series:     110    111    112    113    114    115    116    117    118    119    120
Stock price:     176.29 171.43 169.98 173.43 171.65 170.45 178.69 179.78 176.47 180.08 176.29
Trading signal:  1      1      1      1      1      0      0      1      0      0      1
ti:              0.5    0.7    0.9    0.9    0.7    0.5    0      0.5    0.5    0      0.5
Trend / trading point at the detected turning points: Down/Buy, Down/Buy, Down/Buy, Up/Sell, Down/Buy, Up/Sell, Down/Buy
Fig. 4. The Forecasted Trading points by Ensemble neural network in Google.(red is buy, black is sell).
All ti values will be fed into the ensemble model for training the network to learn the best connection weights. After the training process, the outputs of the ensemble models ([0.0, 1.0]) need to be transformed into trading decisions in the testing period. In this study, the average of ti in the training stage is regarded as the boundary when making a trading decision, as shown in Table 2. In the case example, 0.50 is the boundary.
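The following small Python sketch illustrates, under our own naming, how Equations (2) and (3) and the 0.5 boundary described above can be applied; it is only an illustrative reading of the text, not the authors' implementation, and which side of the boundary maps to buy or sell follows the trend definition in the paper.

def trading_signal(prices, i, uptrend):
    # Compute t_i from Eq. (2) (up-trend) or Eq. (3) (down-trend)
    # using the prices at days i, i+1 and i+2.
    window = prices[i:i + 3]
    lo, hi = min(window), max(window)
    ratio = 0.0 if hi == lo else (prices[i] - lo) / (hi - lo)
    return ratio * 0.5 if uptrend else ratio * 0.5 + 0.5

def above_boundary(model_output, boundary=0.5):
    # The ensemble output in [0.0, 1.0] is compared with the training-stage
    # average of t_i (0.50 in the Google example) to decide the trading point.
    return model_output >= boundary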
4 Numerical Examples
Seven different stocks are selected for performance comparisons of the system in this study. All these stocks were chosen from the Dow Jones Industrial Average (DJIA). These stocks span different industries and thus offer an opportunity to test our ensemble neural network on diverse stock data. These companies include B.A. (Boeing, Aerospace & Defense), Google, CAT (Caterpillar, Commercial Vehicles & Trucks), D.D (DuPont, Commodity Chemicals), GM (General Motors, Automobiles), IBM (IBM, Computer Services) and MSFT (Microsoft, Software); the overall comparison of all instances is presented in Table 3. In this research, we take IPLR in Chang et al. (2008) and the Genetic Program in Mallick et al. (2008) as our comparison benchmarks.
Table 3. The overall comparisons of ENN-PLR, Genetic Program (Mallick et al. 2008) and IPLR (Chang et al. 2008) in rate of profit

Stocks    ENN-PLR    GP         IPLR
B.A       166.93%    -63.79%    95.52%
Google    39.74%     N/A        25.23%
CAT       51.49%     -33.96%    45.16%
D.D       12.69%     122.79%    10.12%
GM        149.12%    102.93%    4.13%
IBM       103.08%    522.36%    59.62%
MSFT      110.65%    -89.92%    62.16%
Fig. 5. The trends of CAT in steady-state
Through this table comparison, our system shows very exciting results in B.A., Google, MSFT and GM. The profit gains in the other stocks do not perform very well and there is still room for our proposed model to improve. The reason for this is partly
because our model applies PLR to pre-define the trading points. Therefore, if a stock is in a steady state as shown in Figure 5, PLR will not be able to detect the trading signals properly, which will confuse our system and prevent it from making the right decisions.
5 Conclusion
A considerable amount of research has been conducted to study the behavior of stock price movements. However, investors are more interested in making profit from simple trading decisions such as Buy/Hold/Sell provided by the system than in predicting the stock price itself. Therefore, we take a different approach by applying PLR to decompose the historical data into different segments. As a result, turning signals (troughs or peaks) of the historical stock data can be detected and then input into the ensemble strategy to train the connection weights of the model. Then, a new set of input data can trigger the model when a buy or sell point is detected by the ensemble strategy. An intelligent piecewise linear representation model is further developed by integrating the genetic algorithm with the PLR to evolutionarily improve the threshold value of PLR and further increase the profitability of the model. The proposed system is tested on the popular stock IBM. The experimental results show that our approach can generate a significant and stable return on investment. In summary, the proposed system is very effective and encouraging in predicting the future trading points of a specific stock. However, there is one issue to be further discussed, namely the price variation of the stock. It is observed that if the price variation of the current stock to be forecasted is either in an up-trend or a down-trend, then it is better to train our model with a similar pattern, i.e., a similar up-trend or down-trend period, in the future.
References 1. Kovalerchuk, B., Vityaev, E.: Data Mining in Finance. Kluwer Academic Publisher, USA (2000) 2. Abu-Mostafa, Y.S., Atiya, A.F.: Introduction to financial forecasting. Applied Intelligence 6, 205–213 (1996) 3. Aiken, M., Bsat, M.: Forecasting Market Trends with Neural Networks. Information Systems Management 16(4), 42–48 (1994) 4. Baba, N., Inoue, N., Asakawa, H.: Utilization of Neural Networks & s for Constructing Reliable Decision Support Systems to Deal Stocks. In: IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), vol. 5, pp. 5111–5116 (2000) 5. Brownstone, D.: Using Percentage Accuracy to Measure Neural Network Predictions in Stock Market Movements. Neurocomputing 10, 237–250 (1996) 6. Chang, P.C., Liao, T.W.: Combing SOM and Fuzzy Rule Base for Flow Time Prediction in Semiconductor Manufacturing Factory. Applied Soft Computing 6(2), 198–206 (2006a) 7. Chang, P.C., Wang, Y.W.: Fuzzy Delphi and Back-Propagation Model for sales forecasting in PCB Industry. Expert Systems with Applications 30(4), 715–726 (2006b) 8. Chang, P.C., Fan, C.Y., Liu, C.H.: Integrating a Piecewise Linear Representation Method and a Neural Network Model for Stock Trading Points Prediction. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews (December 2008)
10
P.-C. Chang et al.
9. Chang, P.C., Wang, Y.W., Yang, W.N.: An Investigation of the Hybrid Forecasting Models for Stock Price Variation in Taiwan. Journal of the Chinese Institute of Industrial Engineering 21(4), 358–368 (2004) 10. Chen, A.S., Leung, M.T., Daouk, H.: Application of Neural Networks to an Emerging Financial Market: Forecasting and Trading the Taiwan Stock Index. Computers and Operations Research 30, 901–923 (2003) 11. West, D., Dellana, S., Qian, J.: Neural network ensemble strategies for financial decision applications. Computers and Operations Research 32(10), 2543–2559 (2005) 12. Hansen, L.K., Salamon, P.: Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 993–1001 (1990) 13. Pendharkar, P.C.: An empirical study of design and testing of hybrid evolutionary-neural approach for classiffcation. Omega-International Journal of Management Science 29, 361–374 (2001) 14. Schapire, R.E.: The strength of weak learnability. Machine Learning 1990.5, 197–227 (19905) 15. Henley, W.E., Hand, D.J.: A k-nearest neighbor classi&er for assessing consumer credit risk. Statistician 996.44, 77–95 (1990) 16. Jensen, H.L.: Using neural networks for credit scoring. Managerial Finance 18, 15–26 (1992) 17. Zhou, M., Wei, H.: Face Verification Using GaborWavelets and AdaBoost. In: The Eighteenth International Conference on Pattern Recognition, Hong Kong, pp. 404–407 (2006) 18. Sun, Y., Wang, Y., Wong, A.K.C.: Boosting an associative classifier. IEEE Trans. Knowledge and Data Engineering 18, 988–992 (2006) 19. Jangmin, O., Lee, J.W., Park, S.B., Zhang, B.T.: Stock Trading by Modelling Price Trend with Dynamic Bayesian Networks. In: Yang, Z.R., Yin, H., Everson, R.M. (eds.) IDEAL 2004. LNCS, vol. 3177, pp. 794–799. Springer, Heidelberg (2004) 20. Mallick, D., Lee, V.C.S., Ong, Y.S.: An empirical study of genetic programming generated trading rules in computerized stock trading service system. In: 5th International Conference Service Systems and Service Management - Exploring Service Dynamics with Science and Innovative Technology, ICSSSM 2008 (2008)
A SOM Based Stereo Pair Matching Algorithm for 3-D Particle Tracking Velocimetry Kazuo Ohmi1, Basanta Joshi2, and Sanjeeb Prasad Panday2 1
Dept. of Information Systems Engineering, Osaka Sangyo University, Daito-shi, Osaka 574-8530, Japan 2 Dept. of Information Systems Engineering, Graduate Student of Faculty of Engineering, Osaka Sangyo University, Daito-shi, Osaka 574-8530, Japan
[email protected] Abstract. A self-organizing map (SOM) based algorithm has been developed for 3-D particle tracking velocimetry (3-D PTV) in stereoscopic particle pairing process. In this process every particle image in the left-camera frame should be paired with the most probably correct partner in the right-camera frame or vice versa for evaluating the exact coordinate. In the present work, the performance of the stereoscopic particle pairing is improved by applying proposed SOM optimization technique in comparison to a conventional epipolar line analysis. The algorithm is tested with the 3-D PIV standard image of the Visualization Society of Japan (VSJ) and the matching results show that the new algorithm is capable of increasing the recovery rate of correct particle pairs by a factor of 9 to 23 % compared to the conventional epipolar-line nearest-neighbor method. Keywords: Particle pairing problem; Particle tracking velocimetry; PIV; PTV; Stereoscopic PIV; Neural network; Self-organizing map; SOM.
1 Introduction The basic algorithm of the 3-D particle tracking velocimetry is composed of two successive steps of particle pairing [1] (or identification of same particles) as depicted in Fig.1. The first one is the spatio-differential (parallactic) particle pairing, in which the particles viewed by two (or more) stereoscopic cameras with different viewing angles have to be correctly paired at every synchronized time stage. This is an indispensable procedure for computing the 3-D coordinates of individual particles. And the second one is the time-differential particle pairing, where the 3-D individual particles have to be correctly paired between two time stages in a short interval. Of these two steps of particle pairing, the second one is relatively rich in methodology because many of the known 2-D time-differential tracking algorithms can be extended into 3-D tracking without any additional complexity. However, the first step particle pairing is difficult when 3-D particle coordinates must be calculated with accuracy and with high recovery ratio. When the parallactic angle between the two camera axes is small (say 10 or less), some of the currently used temporal particle pairing algorithm can be applied to the spatial particle pairing, but in this D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 11–20, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Typical flow chart of 3-D particle tracking velocimetry: the left- and right-camera images at t = t0 and t = t1 undergo spatial particle matching, the 3-D positions of the particles at t0 and t1 are computed, and temporal particle matching then yields the displacement vector map.
case, the resultant measures of particle coordinates are not so well resolved in the depth direction as in the two planar directions. For more depth resolution, the parallactic angle has to be larger, and the most commonly used method for the particle pairing is the epipolar line nearest-neighbor analysis [2]. But with this method the recovery ratio of correct particle pairs is relatively low, especially with densely seeded particle images. One of the present authors and his coworkers have already tried to improve this low recovery ratio by using a genetic algorithm in the epipolar line nearest-neighbor analysis [3][4]. The basic concept of this method is to find a condition in which the sum of the normal distances between the epipolar lines and their pairing particle images is minimized. The accuracy of particle pairing was indeed increased to some extent, but even with a new-concept high-speed genetic algorithm, the computation time for particle pairing increased exponentially with the number of particles. For this reason, realistic computation time is only met with fewer than 1000 particles. Another problem of the genetic algorithm is that the calculation is not reproducible because the results of genetic operations depend on random numbers. In most cases, this drawback can be cancelled by setting up a strict terminal condition for the iterative computation, but even with this the pairing results are not always reproducible if the number of pairing particles is much increased. So, in the present work, a self-organizing map (SOM) neural network is applied to the epipolar line proximity analysis for more accuracy in the stereoscopic particle pairing with a large parallax.
2 SOM Neural Network
Neural networks are among the most effective and attractive algorithms for the particle matching problem of particle tracking velocimetry (PTV) because in many cases they work without preliminary knowledge of the flow field to be examined. Among others, the self-organizing map (SOM) model seems to have turned out to be a particularly useful tool for this purpose. The principles of the SOM neural network were originally proposed in [5], basically aimed at clumped distribution of like terms. There was room for application of this principle to the clumped distribution of correct particle
pairs between two time-differential or spatio-differential particle images. In this regard, the SOM neural network was used for 2-D time-differential particle tracking velocimetry by Labonté [6] and then improved by one of the authors [7] with successful results. In the present work the SOM neural network is applied to spatio-differential particle images. The SOM neural network consists of an input layer composed of a number of input vectors (multi-dimensional input signals) and an output (competitive) layer composed of network neurons, as shown in Fig. 2. All the neurons in the output layer are subject to learning from the input layer, and the connection between inputs and neurons is represented by weight vectors defined for each of their combinations. According to the SOM implementation by Labonté [6] for particle pairing of 2-D time-differential particle images, the input vectors are given by the 2-D coordinates of the particle centroids in one of the two image frames and the network neurons are the relocation of the particle centroids of the opposite image frame. In response to every input vector signal, the network neurons are subjected to Kohonen learning (displacement of neurons in reality). As a result of the iteration of this Kohonen learning, more probable particle pairs come into closer proximity and less probable ones are gradually kept away.
Fig. 2. SOM neural network architecture by Kohonen [5]: the input layer x1, x2, x3, …, xn is connected to the network neurons of the output (competitive) layer through the weight vectors mi1, mi2, mi3, …, min.
In spatio-differential particle matching, the SOM learning is based on the Epipolar line normal distance projected on one of the two stereoscopic camera screens. Theoretically, on either of these two camera screens, the particles viewed directly by one camera should be located exactly on their respective Epipolar lines derived from the particles viewed by the other camera. But in reality, the algebraic equation of the Epipolar line is determined through a camera calibration process, which is not free from experimental errors [1]. As a result, the particles viewed directly by each camera are not necessarily located on their respective Epipolar lines but slightly separated from them as shown Fig.3. So in order to match the particles on the two stereoscopic camera screens, minimization of the normal distance between direct-view particles and Epipolar lines is usually used as a particle match condition. But this condition is not always correctly applied as the number of particles in the image is increased. So
Fig. 3. Epipolar line normal distance for stereoscopic particle matching: a particle p(x, y, z) in 3-D space projects to PL(X1, Y1) on the left-camera screen and PR(X2, Y2) on the right-camera screen, and the 2-D normal distances to the corresponding epipolar lines are measured on the projected screens.
Fig. 4. Schematic illustration of one single step of Kohonen learning: a target particle (input signal) on the left-camera screen is presented to the right-camera screen as an epipolar line, and the winner neuron together with its neighbors within the learning area width is displaced toward the line.
the most probable coupling of the particles and their respective Epipolar lines must be found out by using some optimization method. And one of the best methods for this would be the use of the SOM neural network. In the present case of spatio-differential particle images, the network neurons are represented by the centroid of every individual particle in one of the two image frames. But the input signal to this network is not given by the particle centroids of the opposite image frame but by the epipolar lines derived from the related particle centroids. And the Kohonen learning is realized by the displacement of neurons in the normal direction to the epipolar line presented as an input. This learning is iterated until the best possible combinations of a single particle and a single epipolar line are established as shown in Fig.4.
3 Particle Pairing and SOM Implementation The mathematical form of the epipolar line in a stereoscopic camera arrangement can be formulated from the following perspective transform equations [8]:
c11 x + c12 y + c13 z + c14 − c31 x X1 − c32 y X1 − c33 z X1 = X1
c21 x + c22 y + c23 z + c24 − c31 x Y1 − c32 y Y1 − c33 z Y1 = Y1
d11 x + d12 y + d13 z + d14 − d31 x X2 − d32 y X2 − d33 z X2 = X2
d21 x + d22 y + d23 z + d24 − d31 x Y2 − d32 y Y2 − d33 z Y2 = Y2
(1)
where x, y and z are the physical-space 3-D coordinates of a particle centroid, X1 and Y1 the 2-D particle coordinates on the left-camera projection screen, and X2 and Y2 those on the right-camera projection screen. The two sets of matrix coefficients cxx and dxx are the camera parameters for left and right cameras, which are determined by means of calibration using a given number of calibrated target points viewed by the same two cameras. In these equations, if either set of (X1, Y1) or (X2, Y2) is given, the other set of X and Y comes into a linear relation, providing an arithmetic equation of the relevant epipolar line. Once the camera parameters are known, for any particle image in one of the two camera frames, the corresponding epipolar line in the other camera frame is mathematically defined and, then, the normal distance between the epipolar line and any candidate pairing particle is calculated by a simple geometric algebra.
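A minimal Python sketch of this geometry may be helpful; it assumes the 11-parameter camera model of Eq. (1) with parameters ordered (c11,…,c14, c21,…,c24, c31, c32, c33), and the function names are ours rather than the authors'.

import numpy as np

def project(c, xyz):
    # Project a 3-D point with the 11-parameter model of Eq. (1).
    x, y, z = xyz
    den = 1.0 + c[8] * x + c[9] * y + c[10] * z
    return np.array([(c[0] * x + c[1] * y + c[2] * z + c[3]) / den,
                     (c[4] * x + c[5] * y + c[6] * z + c[7]) / den])

def epipolar_points(c_left, c_right, X1, Y1, z_samples=(0.0, 1.0)):
    # Back-project (X1, Y1) through the left camera at two assumed depths and
    # re-project with the right camera: two points of the epipolar line.
    pts = []
    for z in z_samples:
        a = np.array([[c_left[0] - c_left[8] * X1, c_left[1] - c_left[9] * X1],
                      [c_left[4] - c_left[8] * Y1, c_left[5] - c_left[9] * Y1]])
        b = np.array([X1 - c_left[3] - (c_left[2] - c_left[10] * X1) * z,
                      Y1 - c_left[7] - (c_left[6] - c_left[10] * Y1) * z])
        x, y = np.linalg.solve(a, b)            # solve Eq. (1) for x, y at fixed z
        pts.append(project(c_right, (x, y, z)))
    return pts

def normal_distance(p, q, r):
    # Distance from candidate particle r to the line through epipolar points p and q.
    p, q, r = map(np.asarray, (p, q, r))
    d = q - p
    return abs(d[0] * (r - p)[1] - d[1] * (r - p)[0]) / np.linalg.norm(d)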
Fig. 5. SOM learning for stereoscopic particle pairing: (a) before learning; (b) after learning
The conventional stereoscopic particle pairing uses this normal distance as a definite index. For more assurance, the sum of the two normal distances for the same particle pair derived from different camera image frames is used but no more than that. By contrast, the SOM particle pairing is a kind of topological optimization process, in which more probable pairs of a particle centroid and an epipolar line come close together and others get away. The key factor of the Kohonen learning is assigning the neuron (particle centroid) with a minimum of this normal distance as winner but the learning itself applies to all the neighboring neurons. And this learning goes on with different input vector signals so that all the opposite-frame particle centroids are presented as epipolar lines as shown in Fig. 5 (a).This learning cycle is iterated until the fixed and unique combinations of a particle and an epipolar line are established as shown in Fig.5 (b).After the establishment of one to one relationship between the particle centroids, the 3D particle coordinates are computed by solving (1).
The SOM neural network system is implemented by considering two similar networks covering the particles of the two camera frames. Let xi (i = 1,…,N) and yj (j = 1,…,M) be the 2-D coordinate vectors of the particles in the left-camera and right-camera frames respectively. The left network has N neurons situated at xi and the right one has M neurons at yj. Each neuron has two weight vectors, corresponding to the two components of the coordinate vectors xi and yj, and is denoted by vi for the left network and by wj for the right one. These weight vectors are assigned the following initial values:
vi = xi (i = 1,…,N),   wj = yj (j = 1,…,M)
(2)
The weight vectors are updated in such a way that the neurons of one network work as stimuli for the other network. More concretely, the stimulus vector vi from the left network is presented to the right network in the form of the corresponding epipolar line. Then, a winner neuron is selected from the latter network as the one with the weight vector closest to the epipolar line derived from vi. Let c be the index of this winner neuron and wc its weight vector, and let ui be the intersection of the epipolar line and the normal line dropped from wc; then each neuron of the right network is subjected to the following displacement of weight vectors:
Δwj(c) = αj (ui − wc)   (j = 1,…,M),   where αj = α if neuron j ∈ Sc(r), and αj = 0 otherwise
(3)
where αj is a scalar variable between 0 and 1 and Sc(r) is the closed band region with a half width r centered on the epipolar line. The increment of the weight vector in (3) is given an important modification from the original Kohonen (1982) network model, in which the right-hand term is expressed as (ui − wc) instead of (ui − wj). Each time the input vector, or the epipolar line derived from vi, is presented to the right network, the weight vectors of the latter network are updated according to:
wj ← wj + Σ(i=1,…,N) Δwj(ci)   (j = 1,…,M)
(4)
In the next step, conversely, the stimulus vector wj from the right network is presented to the left network in the form of the corresponding epipolar line. A winner neuron is selected as the one closest to the epipolar line. Each time the weight vector wj is presented to the left network, the weight vectors of the latter network are updated according to:
vi ← vi + Σ(j=1,…,M) Δvi(cj)   (i = 1,…,N)
(5)
Each time the weight vectors from either network are updated, the width r of the band region, within which the weight vectors of neurons are changed, is altered by r ← β r (0 < β < 1) . At the same time, the amplitude α of the weight translation is altered by α ← α / β . These alternate steps are iterated until the width r of the band reaches a given threshold value of rf , which should be small enough to include only the winner neuron. Since the resultant correspondence between a left network neuron and its matching
right network neuron is not always reciprocally identical, a final nearest-neighbor check is done with a neighborhood criterion of small distance ε. Out of the probable neurons in the neighborhood, a tolerance distance ε is set, within which two neuron weight vectors will be considered equal. The solution time can be shortened by taking ε larger.
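The following Python sketch outlines one possible reading of the alternating learning of Eqs. (3)-(5), including the band shrinking r ← βr and amplitude update α ← α/β described above; the helper epipolar_foot (returning the foot of the normal on the epipolar line and the normal distance) is assumed, and the final nearest-neighbor check with tolerance ε is omitted.

import numpy as np

def som_pairing(v, w, epipolar_foot, r, r_f, alpha, beta):
    # v, w: (N, 2) and (M, 2) arrays of neuron weight vectors, initialised to the
    # particle centroids of the left and right frames as in Eq. (2).
    while r > r_f:
        delta_w = np.zeros_like(w)                      # left -> right stimuli, Eq. (4)
        for i in range(len(v)):
            feet = [epipolar_foot(v, i, w, j) for j in range(len(w))]
            dists = np.array([d for _, d in feet])
            c = int(np.argmin(dists))                   # winner neuron
            u, _ = feet[c]
            in_band = dists <= r                        # neighbourhood S_c(r)
            delta_w[in_band] += alpha * (u - w[c])      # modified increment of Eq. (3)
        w = w + delta_w
        delta_v = np.zeros_like(v)                      # right -> left stimuli, Eq. (5)
        for j in range(len(w)):
            feet = [epipolar_foot(w, j, v, i) for i in range(len(v))]
            dists = np.array([d for _, d in feet])
            c = int(np.argmin(dists))
            u, _ = feet[c]
            in_band = dists <= r
            delta_v[in_band] += alpha * (u - v[c])
        v = v + delta_v
        r, alpha = beta * r, alpha / beta               # shrink band, raise amplitude
    return v, w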
4 Results and Discussion The present new particle pairing algorithm is tested by using synthetic particle images, namely the PIV Standard Images by Okamoto[9], which are now available at the web site of the Visualization Society of Japan (http://vsj.or.jp/piv/). Out of 6 sets of 3D particle images offered at this site, only 4 image sets (Series #351, #352, #371 and #377) are selected for the present work. Table 1 gives the technical details of all these Standard Images. All of these are synthetic particle images showing different portions of a 3-D transient flow induced by an impinging jet. The parallax angle between the view axes of the two cameras is also different from series to series. In order to simulate particle refraction effect in a real experimental environment, the use of cylindrical volume illumination and water refractive index of 1.33 are taken into account. Fig.6 shows a sample pair of the 3-D PIV Standard Image tested in the present work and Fig.7 is the corresponding pair of marker particle images used for the Table 1. Summary of the tested 3-D PIV Standard Images
Series # / Frame #                          352/0000    351/0000    371/0000    377/0000
Number of existing particles                372         2092        366         939
Minimum/Mean particle diameter              1/5 pix     1/5 pix     1/5 pix     1/5 pix
Standard deviation of diameter              2 pix       2 pix       2 pix       2 pix
Volume of visualized flow (in situ)         2 cm3       2 cm3       1 cm3       0.5 cm3
Maximum flow rate (in situ)                 12 cm/sec   12 cm/sec   12 cm/sec   12 cm/sec
Refraction index                            1.33        1.33        1.33        1.33
Number of calibr. marker particles          27          27          125         27
Left camera, distance to origin center      20 cm       20 cm       20 cm       11.5 cm
Left camera, inclination from x-axis        -30 deg     -30 deg     -30 deg     -29.9 deg
Left camera, inclination from y-axis        0 deg       0 deg       -10 deg     -45 deg
Left camera, inclination from z-axis        0 deg       0 deg       0 deg       16.1 deg
Right camera, distance to origin center     20 cm       20 cm       20 cm       11.5 cm
Right camera, inclination from x-axis       30 deg      30 deg      30 deg      0 deg
Right camera, inclination from y-axis       0 deg       0 deg       -10 deg     -90 deg
Right camera, inclination from z-axis       0 deg       0 deg       0 deg       30 deg
Fig. 6. 3-D PIV Standard Images – Series #351: (a) left frame (frame #0000); (b) right frame (frame #2000)
Fig. 7. 3-D calibration marker images – Series #351: (a) left frame (frame #0999); (b) right frame (frame #2999)
stereoscopic camera calibration. In each calibration image, there are 27 calibration points in a cubic cell arrangement with known absolute coordinates in the real 3-D space, and they are used for computing the 11 camera parameters of each camera, which become the input to (1). The particle pairing results from these 3-D PIV Standard Images are shown in Table 2, in which the performance of the two algorithms (epipolar-line nearest-neighbor pairing with and without SOM) is compared in terms of the "correct pair rate". The SOM computation parameters for these four series of PIV images are kept within: r (initial radius) = 5, rf (final radius) = 0.01 to 0.001, α (initial translation rate) = 0.05 to 0.005, β (attenuation rate) = 0.8 to 0.9 and ε (max distance for final pairing) = 0.01. It can be seen from these results that the performance of the particle pairing is improved with the introduction of the SOM neural network strategy, and this improvement is more marked when the number of existing particles is increased up to 1000 or more. This is indicative of the practical-use effectiveness of the SOM neural
Table 2. Particle pairing results with or without SOM neural network
                      Particle pairing with SOM                            Particle pairing without SOM
Series # / Frame #    Existing pairs   Correct pairs   Correct pair rate   Existing pairs   Correct pairs   Correct pair rate
#352/000              283              263             92.93 %             283              239             84.45 %
#351/000              1546             1129            73.03 %             1546             986             63.78 %
#371/000              157              145             92.36 %             157              134             85.35 %
#377/000              352              286             81.25 %             352              233             66.19 %
network particle pairing for 3-D particle tracking velocimetry. The computational cost for Images #352, #371 and #377 is about 1 s and that of Image #351 is 20 s. However, the variation in computational cost of the proposed algorithm with reference to the previous algorithms is insignificant. Further, it was observed that the correct pairing rate is even better than that of the genetic algorithm particle pairing proposed earlier by one of the authors and his coworkers [3][4]. Another merit of the SOM neural network particle pairing is that the performance is considerably stable regardless of the optical conditions of particle imaging. Even with or without incidence, yaw and roll angles of the two stereoscopic cameras, the correct pair rate keeps a constantly high level. This is certainly important when the stereoscopic PTV has to be employed in industrial applications, where the positions of the cameras and of the laser light units are more or less restricted.
5 Conclusions
A SOM neural network based algorithm was successfully implemented for the stereoscopic particle pairing step in 3-D particle tracking velocimetry and tested with the PIV standard image data. With this scheme, the overall performance of particle pairing is improved and the correct pairing rate for two sets of the synthetic 3-D particle images goes up to 93%. The increase factor is not dramatic, but it should be noted that the accuracy of the subsequent time-series particle tracking is fairly sensitive to that of the parallactic particle pairing. A slight improvement of the parallactic particle pairing may cause a considerable increase in the correct time-series particle tracking in 3-D particle tracking velocimetry. Further efforts should be made to apply the methodology to more densely seeded particle images with larger numbers of particles. Moreover, the particle matching process can be made more accurate in the presence of loss-of-pair particles.
References 1. Mass, H.G., Gruen, A., Papantoniou, D.: Particle tracking velocimetry in three-dimensional flows. Experiments in Fluids 15, 133–146 (1993) 2. Nishino, K., Kasagi, N., Hirata, M.: Three-dimensional particle tracking velocimetry based on automated digital image processing. Trans. ASME, J. Fluids Eng. 111, 384–391 (1989) 3. Ohmi, K., Yoshida, N.: 3-D Particle tracking velocimetry using a genetic algorithm. In: Proc. 10th Int. Symposium Flow Visualization, Kyoto, Japan, F0323 (2002) 4. Ohmi, K.: 3-D particle tracking velocimetry with an improved genetic algorithm. In: Proc. 7th Symposium on Fluid Control, Measurement and Visualization, Sorrento, Italy (2003) 5. Kohonen, T.: A simple paradigm for the self-organized formation of structured feature maps. In: Competition and cooperation in neural nets. Lecture notes in biomathematics, vol. 45. Springer, Heidelberg (1982) 6. Labonté, G.: A new neural network for particle tracking velocimetry. Experiments in Fluids 26, 340–346 (1999) 7. Ohmi, K.: Neural network PIV using a self-organizing maps method. In: Proc. 4th Pacific Symp. Flow Visualization and Image Processing, Chamonix, France, F-4006 (2003) 8. Hall, E.L., Tio, J.B.K., McPherson, C.A., Sadjadi, F.A.: Measuring Curved Surfaces for Robot Vision. Computer 15(12), 42–54 (1982) 9. Okamoto, K., Nishio, S., Kobayashi, T., Saga, T., Takehara, K.: Evaluation of the 3D-PIV Standard Images (PIV-STD Project). J. of Visualization 3(2), 115–124 (2000)
Spiking Neural Network Performs Discrete Cosine Transform for Visual Images Qingxiang Wu, T.M. McGinnity, Liam Maguire, Arfan Ghani, and Joan Condell Intelligent Systems Research Centre, University of Ulster at Magee Campus Derry, BT48 7JL, Northern Ireland, UK {q.wu,tm.mcginnity,lp.maguire,a.ghani,j.condell}@ulster.ac.uk
Abstract. The human visual system demonstrates powerful image processing functionalities. Inspired by the principles from neuroscience, a spiking neural network is proposed to perform the discrete cosine transform for visual images. The structure and the properties of the network are detailed in this paper. Simulation results show that the network is able to perform the discrete cosine transform for visual images. Based on this mechanism, the key features can be extracted in ON/OFF neuron arrays. These key features can be used to reconstruct the visual images. The network can be used to explain how the spiking neuron-based system can perform key feature extraction. The differences between the discrete cosine transform and the spiking neural network transform are discussed. Keywords: spiking neural networks; visual system; discrete cosine transform; visual image.
1 Introduction The human visual system demonstrates powerful image processing functionalities. The retina contains complex circuits of neurons that extract salient information from visual inputs. Signals from photoreceptors are processed by retinal interneurons, integrated by retinal ganglion cells and sent to the brain by axons of retinal ganglion cells. Different cells respond to different visual features, such as light intensity, colour or moving objects [1–5]. Mammalian retinas contain approximately 55 distinct cell types, each with a different function [1]. The exact neuronal circuits for extraction of key features and identification of various visual images still need to be determined. Similarly, there is a need to simulate the neuronal circuits in electronic circuits as a precursor to their application in artificial intelligent systems. These are still open questions. The Discrete Cosine Transform (DCT) [6] is an efficient approach for key feature extraction in the image processing domain. Based on the spiking neuron model [7-12], a neuronal circuit is proposed to perform discrete cosine transform and explain how a spiking neural network can extract key features of a visual image. The neuronal circuit has been simulated using Matlab and the results show that the key features of a visual image can be extracted by the neural network and the visual image can be reconstructed using the key features. D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 21–29, 2009. © Springer-Verlag Berlin Heidelberg 2009
The remainder of this paper is organized as follows. In Section 2, the architecture of the neural network is proposed to extract the key features and the network is shown to be able to reconstruct the visual image. The network model is based on simplified conductance-based integrate-and-fire neurons. The behaviors of the neural network are governed by a set of equations discussed in Section 3. Simulation results are presented in Section 4. The network performance is discussed in Section 5.
2 Spiking Neural Network Model for Discrete Cosine Transform
The human visual system performs feature extraction very efficiently. Neuroscientists have found that there are various receptive fields from simple cells in the striate cortex to those of the retina and lateral geniculate nucleus [13]. The different pathways, which are composed of different types of neurons, play different roles in the visual system. ON/OFF pathways [3] were found in the visual system. Inspired by the mechanism of ON/OFF pathways, a spiking neural network model is proposed to perform the discrete cosine transform as shown in Fig. 1.
Fig. 1. Spiking neural network for feature extraction and image reconstruction: the M×N input neuron array is connected to the P×Q ON and OFF neuron arrays through the weights WN(m,n)→ON(p,q) and WN(m,n)→OFF(p,q), and the ON/OFF arrays drive the reconstruction neuron array through excitatory and inhibitory synapses with weights WON(p,q)→RN(m,n) and WOFF(p,q)→RN(m,n).
Suppose that each neuron in the input neuron array generates spikes induced by a synapse current of a photonic receptor according to the corresponding pixel brightness in a visual image. The dimension of the input neuron array is M×N. A neuron in the array is labeled with N(m, n), where m = 0,…,M−1 and n = 0,…,N−1. Each pixel of the image corresponds to a receptor. The intermediate layer is composed of two neuron arrays; one is the ON neuron array and the other is the OFF neuron array. The ON/OFF neuron arrays have the same dimension P×Q, where P ≤ M and Q ≤ N. Neurons in the ON/OFF neuron arrays are labeled with ON(p, q) and OFF(p, q), where p = 0,…,P−1 and q = 0,…,Q−1. Each neuron in the ON/OFF arrays receives spike trains from all neurons in the input array through excitatory synapses with specific strength distributions WN(m,n)→ON(p,q) for the ON neurons and WN(m,n)→OFF(p,q) for the OFF neurons. Based on the principle of the Discrete Cosine Transform (DCT) [6], the synapse strength distribution can be set as follows:
WN(m,n)→ON(p,q) = αp αq cos[π(2m+1)p / 2M] cos[π(2n+1)q / 2N],  if cos[π(2m+1)p / 2M] cos[π(2n+1)q / 2N] > 0;
WN(m,n)→ON(p,q) = 0,  if cos[π(2m+1)p / 2M] cos[π(2n+1)q / 2N] ≤ 0.
(1)
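A small NumPy sketch of Eq. (1) may make the weight construction concrete; the αp, αq factors are assumed here to follow the usual DCT normalisation (the excerpt does not define them), and routing the negative cosine lobes to the OFF array is our assumption about the mirrored OFF weights.

import numpy as np

def dct_synapse_weights(M, N, P, Q):
    # Cosine-basis synapse strengths of Eq. (1), split into ON and OFF arrays.
    m = np.arange(M)[:, None]          # input-row index
    n = np.arange(N)[None, :]          # input-column index
    w_on = np.zeros((P, Q, M, N))
    w_off = np.zeros((P, Q, M, N))
    for p in range(P):
        for q in range(Q):
            a_p = np.sqrt(1.0 / M) if p == 0 else np.sqrt(2.0 / M)   # assumed normalisation
            a_q = np.sqrt(1.0 / N) if q == 0 else np.sqrt(2.0 / N)
            basis = (np.cos(np.pi * (2 * m + 1) * p / (2 * M)) *
                     np.cos(np.pi * (2 * n + 1) * q / (2 * N)))
            w_on[p, q] = np.where(basis > 0, a_p * a_q * basis, 0.0)
            w_off[p, q] = np.where(basis < 0, -a_p * a_q * basis, 0.0)  # assumed OFF mirror
    return w_on, w_off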
π (2m + 1) p π (2n + 1)q π (2m + 1) p π (2n + 1)q ⎧ cos , if cos cos H r , xL+1,1 was randomly generated from its lower and upper bound, the abovementioned procedure was applied to other elements in V thereby obtaining a new harmony hL+1 . The iterative steps of harmony search algorithm are as follows:
Step 1: initialize the algorithm parameters Hr, Pr, L and randomly generate L harmonies (slip surfaces);
Step 2: generate a new harmony (as described above) and evaluate it, i.e., calculate the factor of safety using the unbalanced thrust force method;
Step 3: update the HM; i.e., if the new harmony hN+1 was better than the worst harmony in the HM in terms of the factor of safety, the worst harmony was replaced with the new harmony, and one iteration was finished;
Step 4: repeat steps 2 and 3 until the termination criterion (the number of iterations reached the maximum allowed value Tm) was achieved.
The values of parameters such as Hr and Pr were usually determined by the rule of thumb and there was no theoretical basis for the determination of these values, so a dynamic adaptation procedure for the determination of these two values was proposed in the improved harmony search algorithm. In addition, the substituting rule used by the original harmony search algorithm was changed in the improved harmony search algorithm.
3.2 Improved Harmony Search Algorithm
As mentioned above, there were two different procedures used in the improved harmony search algorithm. The dynamic adaptation procedure for the determination of the parameter values was based on the convergence degree among all the harmonies. The convergence degree was represented by the 'central distance' Cd of all the harmonies in the present HM, which was calculated by Equation (3):
Cd = Σ(i=1,…,L) Di,   Di = Σ(j=1,…,m) (hij − Cj)²,   Cj = (1/L) Σ(i=1,…,L) hij
(3)
Hr = Cd / Md  if Cd ≤ Md;   Hr = 1.0  if Cd > Md
(4)
Md = η · L · Σ(i=1,…,m) (ui − li)²
(5)
When one iteration was completed, Equation (4) was used to determine the value of Hr, where Md is a threshold value assigned by the researchers; it was defined in this study according to the lower and upper bounds of the elements in vector V. Assuming the lower and upper bounds of the ith element in vector V are li and ui respectively, Md was calculated using Equation (5), where η is a coefficient varying from 0 to 1 which is used to scale down the value of L·Σ(i=1,…,m)(ui − li)².
The second improvement was the substituting rule. In the original harmony search algorithm, the worst harmony was replaced with the newly generated harmony if the latter was better than the former in terms of the objective function. Such a rule only considered the improvement of the objective function; actually, the newly obtained harmony was always better than several harmonies in the current HM. A new substituting rule was proposed in this study, which replaces with the new harmony the member whose replacement yields the maximum 'central distance' Cd. In detail, each of the harmonies that were worse than the new harmony was replaced with the new one in turn, the corresponding 'central distance' was obtained by Equation (3), and the harmony yielding the maximum 'central distance' was replaced in the end. The improved harmony search algorithm was implemented as follows: (1) the harmony search algorithm with the dynamic adaptation procedure functioned in the first half of the total iterations prescribed by Tm; (2) the harmony search algorithm with the new substituting rule functioned in the second half of the total iterations prescribed by Tm, and in this half Hr was equal to 1.0. In the following case studies, 10 different values of the parameter Hr were adopted for the original harmony search algorithm, while the value of Pr was identical for both harmony search algorithms, namely 0.1. The parameter Tm was 10000, the number of control variables for the generation of a slip surface, i.e., m, was 20, and L was equal to 2m.
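A compact Python sketch of the procedure described above is given below; the pitch-adjustment step width and the use of a generic fitness function (here the factor of safety to be minimised) are our own assumptions, made only to obtain a runnable illustration.

import random

def improved_harmony_search(fitness, bounds, L, Pr, Tm, eta=0.5):
    m = len(bounds)
    HM = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(L)]
    Md = eta * L * sum((hi - lo) ** 2 for lo, hi in bounds)            # Eq. (5)

    def central_distance(hm):                                          # Eq. (3)
        centre = [sum(h[j] for h in hm) / len(hm) for j in range(m)]
        return sum(sum((h[j] - centre[j]) ** 2 for j in range(m)) for h in hm)

    for it in range(Tm):
        first_half = it < Tm // 2
        Cd = central_distance(HM)
        Hr = min(Cd / Md, 1.0) if first_half else 1.0                  # Eq. (4)
        new = []
        for j, (lo, hi) in enumerate(bounds):                          # improvise one harmony
            if random.random() < Hr:
                x = random.choice(HM)[j]
                if random.random() < Pr:                               # pitch adjustment (assumed width)
                    x = min(max(x + random.uniform(-1, 1) * 0.01 * (hi - lo), lo), hi)
            else:
                x = random.uniform(lo, hi)
            new.append(x)
        worse = [i for i, h in enumerate(HM) if fitness(h) > fitness(new)]
        if not worse:
            continue
        if first_half:                                                 # original rule: replace the worst
            out = max(worse, key=lambda i: fitness(HM[i]))
        else:                                                          # new rule: maximise diversity
            out = max(worse, key=lambda i: central_distance(HM[:i] + [new] + HM[i + 1:]))
        HM[out] = new
    return min(HM, key=fitness)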
4 Case Studies The focused example was a slope in layered soil and genetic algorithm with Morgenstern and Price method was used by Zolfaghari [10]. The geometric layout of the slope was shown in Fig.2 while Table 1 gave the geotechnical properties for soil layers 1 to 4. Table 1. Geotechnical parameters for example
Layers    γ (kN/m3)    c (kPa)    φ (degree)
1         19.0         15.0       20.0
2         19.0         17.0       21.0
3         19.0         5.00       10.0
4         19.0         35.0       28.0
Fig. 2. Cross section of example slope (slope width 0–30 m, slope height 40–50 m, soil layers 1–4 from top to bottom)
It was clearly noticed from Fig. 3 that in the original harmony search algorithm different values of the parameter Hr yielded different results varying from 1.103 to 1.145. However, a single result of 1.102 was obtained by the improved harmony search algorithm, which avoided the determination of the value of the parameter Hr and considered the diversity of the HM during the iteration steps. The initial harmony memory HM was identical during the experimental analysis for the ten different values of Hr and also during the implementation of the improved harmony search algorithm, thus the noise situation has an insignificant effect on the results obtained. The improved harmony search algorithm utilized different values of Hr according to the 'central distance' of the present HM to avoid being trapped in a local minimum, whereas the original harmony search algorithm regarded Hr as an invariant regardless of the 'central distance' of the present HM, which could become small enough to lead the algorithm to a local minimum.
Fig. 3. The curves of minimum factors of safety found by different values of H r
Fig. 4. The comparison of critical slip surfaces
Zolfaghari [11] gave a factor of safety of 1.24 by using a genetic algorithm. Although the number of slices used by Zolfaghari was not clearly stated, the differences in the factors of safety between the results by the authors and by Zolfaghari are not small, and such differences cannot be accounted for by a different number of slices used for computation. Referring to Fig. 4, it is noticed that greater portions of the failure surfaces obtained by the authors lie within soil layer 3 as compared with the solution by Zolfaghari. The lower factor of safety obtained by the authors is hence more reasonable than the solution by Zolfaghari. The comparison of the results proved that the improved harmony search algorithm was efficient and effective for this optimization problem, which could also provide promising experience for other related fields.
5 Conclusions
In this study an improved harmony search algorithm was proposed to determine the value of the parameter Hr, which was usually defined by the rule of thumb. The improved harmony search algorithm also considered the diversity of the current HM, which makes it possible to avoid being trapped in a local minimum. The case studies demonstrated that the improved harmony search algorithm was efficient within the same number of iterations.
Acknowledgments
The authors would like to thank the National Natural Science Foundation of China (Projects 50874064 and 50804026) and the Natural Science Foundation of Shandong Province (Key Project Z2007F10) for their support.
References 1. Chen, Z., Shao, C.: Evaluation of Minimum factor of safety in Slope Stability Analysis. Canadian Geotechnical Journal 25, 735–748 (1988) 2. Baker, R., Garber, M.: Theoretical Analysis of the Stability of Slopes. Geotechnique 28, 341–395 (1978)
3. Nguyen, V.U.: Determination of Critical Slope Failure Surfaces. Journal of Geotechnical Engineering, ASCE 111, 238–250 (1985) 4. Arai, K., Tagyo, K.: Determination of noncircular slip surfaces giving the minimum factor of safety in slope stability analysis 21, 43–51 (1985) 5. Baker, R.: Determination of the critical slip surface in slope stability computations. International Journal of Numerical and Analytical Methods in Geomechanics, 333–359 (1980) 6. Yamagami, T., Jiang, J.C.: A Search for the Critical Slip Surface in Three-Dimensional Slope Stability Analysis. Soils and Foundations 37, 1–6 (1997) 7. Greco, V.R.: Efficient Monte Carlo technique for locating critical slip surface. Journal of Geotechnical Engineering 122, 517–525 (1996) 8. Malkawi Abdallah, I.H., Hassan, W.F., Sarma, S.K.: Global search method for locating general slip surface using Monte Carlo techniques. Journal of Geotechnical and Geoenvironmental Engineering 127, 688–698 (2001) 9. Cheng, Y.M.: Location of Critical Failure Surface and some Further Studies on Slope Stability Analysis. Computers and Geotechnics 30, 255–267 (2003) 10. Bolton Hermanus, P.J., Heymann, G., Groenwold, A.: Global search for critical failure surface in slope stability analysis. Engineering Optimization 35, 51–65 (2003) 11. Zolfaghari, A.R., Heath, A.C., McCombie, P.F.: Simple genetic algorithm search for critical non-circular failure surface in slope stability analysis. Computers and Geotechnics 32, 139–152 (2005) 12. Li, L., Chi, S.C., Lin, G.: The complex method based on ant colony algorithm and its application on the slope stability analysis. Chinese Journal of Geotechnical Engineering 26, 691–696 (2004) 13. Cheng, Y.M., Li, L., Chi, S.C.: Determination of the critical slip surface using artificial fish swarms algorithm. Journal of Geotechnical and Geoenvironmental Engineering 134, 244–251 (2008) 14. Geem, Z.W., Kim, J.H., Loganathan, G.V.: Harmony search. Simulation 76, 60–68 (2001) 15. Geem, Z.W.: Optimal cost design of water distribution networks using harmony search. Engineeering optimization 38, 259–280 (2006)
An Improved PSO Algorithm Encoding a priori Information for Nonlinear Approximation Tong-Yue Gu, Shi-Guang Ju, and Fei Han School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
[email protected],
[email protected],
[email protected] Abstract. In this paper, an improved PSO algorithm for nonlinear approximation is proposed. The particle swarm optimization is easy to lose the diversity of the swarm and trap into the local minima. In order to resolve this problem, in the proposed algorithm, when the swarm loses its diversity, the current each particle and its historical optimium are interrupted by random function. Moreover, the a priori information obtained from the nonlinear approximation problem is encoded into the PSO. Hence, the proposed algorithm could not only improve the diversity of the swarm but also reduce the likelihood of the particles being trapped into local minima on the error surface. Finally, two real data in chemistry field are used to verify the efficiency and effectiveness of the proposed algorithm.
1 Introduction In past decades, feedforward neural networks (FNN) have been widely used to pattern recognition and function approximation [1-2]. There have many algorithms used to train the FNN, such as backpropagation algorithm (BP), genetic algorithm (GA) [3], particle swarm optimization algorithm (PSOA) [4], simulating annealing algorithm (SAA) [5], etc. Compared with genetic algorithm, PSO algorithm has some advantages. First, PSO algorithm has the memory, and the knowledge of good solution is retained by all the particles. As for the genetic algorithm, the good solutions will lose once the current iteration particle changes. Second, the particles in the PSO algorithm are interrelated with each other to share information. Third, the PSO algorithm can converge faster to the optimal solution, easy to implement, and does not require to encode/decode, hybrid, etc. Finally, the PSO algorithm can solve a lot of optimization problems, and its global search capabilities are much better [6]. Nevertheless, the PSO is easy to lose its diversity, and leads to converge very slowly around the global optimum, which is the phenomenon of the “premature convergence” [7-8]. In order to overcome the shortcoming of PSO, many corresponding algorithms have been proposed. In the literature [9], the passive PSO algorithm was proposed. In this algorithm, the particle tracked its historical optimal location and the global one, and on the mean time tracked the location of its neighbors to update the velocity and improved the diversity of particles. A hybrid PSO algorithm was proposed in the literature [10], and each particle would be given the propagation probability based on D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 223–231, 2009. © Springer-Verlag Berlin Heidelberg 2009
predetermined criteria. In each iteration, according to the high and low propagation probability, some particles are chosen into a pool. The particles in the pool are hybridized with each other and generate an equal number of offspring particles, and the offspring particles replace their parent ones in order to keep the swarm size constant. In the literature [11], a social clustering PSO algorithm regarded some particles as centers, grouped the particles near each center, and then calculated the center of each cluster and replaced the historical and global optimal positions with it. The algorithms mentioned above can improve the diversity of the swarm to some extent, but the particles in these algorithms may still be trapped into local minima in the course of training. In this paper, an improved PSO algorithm for nonlinear approximation is proposed. At the beginning of training, the a priori information is encoded into the initial particles. In the course of training, when the diversity of the swarm is lost, each particle and its historical optimum are interrupted by a random function, and at the same time the a priori information of the involved problem is encoded into the interrupted particles. The proposed algorithm can not only improve the diversity of the swarm but also reduce the likelihood of the particles being trapped into local minima on the error surface, and the convergence performance of the proposed algorithm is thereby improved.
2 Particle Swarm Optimization Algorithm
In PSO, a point in the problem space is called a particle, which is initialized with a random position and search velocity. Each particle flies with a certain velocity and finds the global best position after some iterations. At each iteration, each particle adjusts its velocity based on its momentum and the influence of its best position (Pb) as well as the best position of its neighbors (Pg), and then computes a new position that the particle is to fly to. Supposing the dimension of the searching space is m and the total number of particles is n, the position of the ith particle can be expressed as Xi = (xi1, xi2, …, xim); the best position found so far by the ith particle is denoted as Pi = (pi1, pi2, …, pim); the best position found so far by the total particle swarm is denoted as Pg = (pg1, pg2, …, pgm); and the velocity of the ith particle is represented as Vi = (vi1, vi2, …, vim). Then the original PSO is described as:
V i ( t + 1) = V i ( t ) + c 1 * r * ( P i ( t ) − X i ( t )) + c 2 * r * ( P g ( t ) − X i ( t ))
(1)
X i ( t + 1) = X i ( t ) + V i ( t + 1)
(2)
where c1, c2 are the acceleration constants worth positive values; rand() is a random number between 0 and 1; w is the inertia weight. In addition to the parameter c1, and c2 parameters, the implementation of the original algorithm also requires placing a limit on the velocity (vmax). After adjusting the parameters w and vmax, the PSO can achieve the best ability. The adaptive particle swarm optimization (APSO) algorithm is based on the original PSO algorithm, firstly proposed by Shi and Eberhart in 1998[12].The APSO can be described as follows:
V i (t + 1) = W (t ) * V i (t ) + c1 * r * ( P i (t ) − X i (t )) + c 2 * r * ( P g (t ) − X i (t ))
(3)
Xi (t + 1) = Xi (t ) + Vi (t + 1)
(4)
where w is a new inertia weight. This algorithm, by adjusting the parameter w, can make w reduce gradually as the generation increases. In the searching process of the PSO algorithm, the searching space will reduce gradually as the generation increases. The APSO algorithm is more effective, because the searching space reduces step by step nonlinearly. In this paper, we also use this strategy to modify the inertia weight w. When the PSO algorithm is used to modify the weights, each particle Xi = (w_{1j}^{2i}, w_{j1}^{3i}), j = 1, 2, ..., n, is defined as the connected weights in the feedforward neural network, which is shown in Fig. 1.
Fig. 1. The scheme of a single hidden layer feedforward neural network (input layer connected to the hidden layer S1, …, Sm by the weights W[2], and the hidden layer connected to the outputs t1, t2, t3 by the weights W[3])
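For concreteness, a minimal Python sketch of the APSO update of Eqs. (3)-(4) is given below; following common practice it draws two independent random numbers per dimension, whereas the equations above print a single r, and the velocity clamp vmax is the one mentioned in the text.

import random

def apso_step(x, v, pbest, gbest, w, c1, c2, vmax):
    # One APSO update per particle dimension: inertia-weighted velocity plus
    # cognitive and social terms, followed by the position update of Eq. (4).
    new_v, new_x = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = random.random(), random.random()
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)   # Eq. (3)
        vel = max(-vmax, min(vmax, vel))                           # velocity limit
        new_v.append(vel)
        new_x.append(xi + vel)                                     # Eq. (4)
    return new_x, new_v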
3 The Proposed Algorithm
The standard PSO algorithm has good global search ability, but the particle swarm easily loses its diversity and becomes trapped in local minima. The reason for the loss of diversity is that the current particle not only tracks its own historical best position pbest, but also tracks gbest to find a better position; therefore pbest approaches gbest and the diversity is bound to be lost. In this paper the proposed algorithm, named the IDP-PSO algorithm, applies a random function to interrupt the current iteration particle and its pbest so that the current particles disperse around the global optimum [13]. As the iteration number increases, the optimal position obtained by the PSO algorithm is nearer to the global optimum, so a constraint function is used to control the scope over which the current particles are dispersed. At the same time, the scope of particle dispersion gradually becomes smaller, which improves the convergence accuracy and rate. Because there are lots of local minima on the error surface, the particles are apt to be trapped into these local minima. In this paper the improved PSO algorithm is used to approximate nonlinear monotonic functions. The a priori information of the monotonicity is encoded into each particle.
In this paper, the diversity of the swarm is defined as follows [14]:
S(t) = (1/n) Σ(i=1,…,n) Σ(j=1,…,n) (xij(t) − x̄j(t))²
(5)
where t is the t-th generation of the current particle evolution; S(t) is the convergence degree of the current particle swarm and its historical best particles; n is the total number of particles in the swarm; xij(t) is the jth dimension value of the ith particle; and x̄j(t) is the average value of the jth dimension over all particles.
When the diversity value S(t) falls below a given threshold, a random function is used to disturb the current particles and their own Pb. The function that controls the scope over which the particles are dispersed is defined as follows:

X_{ij}(iter) = X_{ij}(iter) + \frac{rand}{e^{\arctan(iter)}}    (6)

where iter is the iteration number, X_{ij}(iter) is the jth dimension value of the ith particle, and rand stands for a random value generated between 0 and 1.

In the literature [15], Joerding and Meador presented the J.PF method to impose increasing monotonicity on the weights. In their method, a sufficient (but not necessary) condition for increasing monotonicity is deduced:
w_{j1}^{[2]} w_{1j}^{[3]} > 0,  j = 1, 2, ..., n,    (7)
Based on this, the J.PF method utilizes the following performance function to eliminate the nonmonotonic intervals in network models:

PE(w) = SSE(w) + L(w)    (8)

L(w) = \sum_j u\left(\theta w_{j1}^{[2]} w_{1j}^{[3]}\right) \left( e^{\theta w_{j1}^{[2]} w_{1j}^{[3]}} - 1 \right)    (9)

u\left(\theta w_{j1}^{[2]} w_{1j}^{[3]}\right) = \begin{cases} 0 & \theta w_{j1}^{[2]} w_{1j}^{[3]} > 0 \\ 1 & \text{otherwise} \end{cases}    (10)
where \theta is a modulatory constant. In practice, due to the introduction of the penalty function, the approximation accuracy and prediction ability of J.PF are not very good. In this paper, in order to satisfy the a priori information, the C.P.G method is presented, in which the position of the current iteration particle is obtained from its own pbest and the gbest. The current particle is X_i = (w_{1j}^{2i}, w_{j1}^{3i}), j = 1, 2, ..., n; the global optimal position is P_g = (w_{1j}^{2g}, w_{j1}^{3g}), j = 1, 2, ..., n; and the historical optimal position of each particle is P_i = (w_{1j}^{2pi}, w_{j1}^{3pi}), j = 1, 2, ..., n. If the weights of the current iteration satisfy w_{1j}^{2i} \cdot w_{j1}^{3i} < 0 (supposing that w_{1j}^{2i} > w_{j1}^{3i}), then w_{j1}^{3i} is set to (w_{j1}^{3g} + w_{j1}^{3pi})/2. It must satisfy the following penalty:

w_{j1}^{3i} = \begin{cases} -\dfrac{w_{j1}^{3g} + w_{j1}^{3pi}}{2} & \text{if } \mathrm{sgn}(w_{j1}^{3i}) \cdot \mathrm{sgn}\!\left(\dfrac{w_{j1}^{3g} + w_{j1}^{3pi}}{2}\right) < 0 \\[2mm] \dfrac{w_{j1}^{3g} + w_{j1}^{3pi}}{2} & \text{otherwise} \end{cases}    (11)
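A small sketch of the repair rule, as reconstructed in Eq. (11) above, is given below. The per-unit weight layout inside each particle, the helper names, and the use of NumPy arrays are assumptions made for illustration; the sign convention follows the reconstruction of (11), not a verified reference implementation.

import numpy as np

def repair_monotonicity(w2, w3, w3_g, w3_p):
    """Where a hidden unit violates the monotonicity prior w2*w3 > 0 of Eq. (7),
    replace w3 by the mean of the corresponding gbest (w3_g) and pbest (w3_p)
    weights, with the sign chosen as in the reconstructed Eq. (11)."""
    w2 = np.asarray(w2, dtype=float)
    w3 = np.asarray(w3, dtype=float).copy()
    mean = 0.5 * (np.asarray(w3_g, float) + np.asarray(w3_p, float))
    viol = w2 * w3 < 0                       # units violating the prior
    flip = np.sign(w3) * np.sign(mean) < 0   # first branch of Eq. (11)
    w3[viol] = np.where(flip[viol], -mean[viol], mean[viol])
    return w3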
Fig. 2 shows the concrete steps of the IDP-PSO algorithm: initialize the swarm X, V and set the constants c_1, c_2; update X and calculate the fitness and global fitness; use formula (2) to verify each X_i and calculate the particle fitness (1); if the result is less than 0, use formula (10) to update the particle to X_{ii} and calculate fitness (2); if fitness (1) > fitness (2), set X_i = X_{ii}; then use formula (6) to calculate S(t).
f(x_i) > f(x_j),    (2)
which means that there is an unobservable latent function value f(x_i) \in \mathbb{R} associated with each training sample x_i, and that the preference relation between any two instances depends on their latent function values. The rule for deducing the preference label r_i^* of a pair d_i takes the form

r_i^* = \begin{cases} +1, & f(x) > f(x') \\ 0, & f(x) = f(x') \\ -1, & \text{otherwise} \end{cases}    (3)
The task of learning to rank is to find a model f^* in the hypothesis space that takes the minimum error in predicting the preference relations of the instances in the training dataset. When r_i is not equal to r_i^*, the loss caused by the prediction error of model f(\cdot) is denoted l_{pref}(d_i, r_i, r_i^*), where l_{pref}(\cdot) is the loss function. The empirical risk of prediction error of f(\cdot) over all pairs in the set \pi is given as

R_{emp}(f; \pi) = \frac{1}{M} \sum_{i=1}^{M} l_{pref}(d_i, r_i, r_i^*).    (4)
mum empirical risk of prediction error as the form * f pref = argmin Remp ( f ; π) .
(5)
f
3.2 Ranks Imbalance Analysis The instance pair d i can be labeled with the rank label of one of two instances. The formal rule of labeling the pair is given in Definition 1.
(
)
Definition 1: A pair d = d (1) , d ( 2) can be labeled with a rank label r following the rule as form: ⎧ y (1) , if y d (1) y d ( 2) r= ⎨ d y , if y (1) ≺ y ( 2 ) d d ⎩ d ( 2)
where y
d( ) i
,
(6)
is the label of i-th instance in the pair d. ( i ∈ {1, 2}) The pair d can be called
a pair of rank r. According to the result of labeling the pairs in the set π following Definition 1, the empirical risk function Remp ( f ; π) can be decomposed into k − 1 sub-items as the form Remp ( f;π) =
1 M
Nk
∑l i =1
pref
( di , ri , ri* ) + Rk
+
1 M
N2
∑l i =1
pref
(di , ri , ri* ) ,
(7)
R2
where N_i denotes the number of instances of rank i (i = 2, ..., k), k is the number of ranks, and R_i denotes the risk of prediction error of rank i. In real-world datasets, the number of instances per rank differs. Usually, the instances of the 'important' rank are fewest, followed by the 'possibly important' rank, while the instances of the 'not important' rank are the most numerous. When these instances are combined into pairs and labeled following Definition 1, the pairs of the ranks are imbalanced as well. In this condition, the risk of the most important rank occupies a smaller proportion of the total risk R_{emp}(f; \pi) than that of the minor ranks. The optimal result f_{pref}^* of (5) will be biased toward the
rank that occupies the larger proportion of the total risk. The minimum prediction error cannot be attained on the sample pairs of the most important rank. Unfortunately, most real-world datasets are imbalanced, so researching approaches that improve the optimization result in this case is valuable work.
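To make the imbalance concrete, the following sketch builds preference pairs from a graded-relevance sample following Definition 1 and counts the pairs per rank; the function names, the data layout, and the example label counts (taken from Table 1 below) are assumptions for illustration.

from collections import Counter
from itertools import combinations

def build_pairs(labels):
    """Pair up instances with different relevance labels; each pair is tagged
    with the rank label of its more relevant member (Definition 1)."""
    pairs, rank_of_pair = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue                       # equal labels give no preference pair
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        pairs.append((hi, lo))
        rank_of_pair.append(labels[hi])
    return pairs, rank_of_pair

labels = [2] * 3 + [1] * 4 + [0] * 38      # per-query counts as in Table 1
_, ranks = build_pairs(labels)
print(Counter(ranks))                      # far fewer rank-2 pairs than rank-1 pairs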
4 Cost-Sensitive Supported Vector Learning to Rank Imbalanced Data

4.1 Cost-Sensitive Risk Model of Learning to Rank

The strategy for learning imbalanced data is to modify the error cost of the pairs of each rank. Following Definition 1, a dataset labeled with k ranks is split into k-1 subsets of pairs. The error cost of the pairs of a rank is adjusted according to the proportion they occupy in \pi. The cost-sensitive risk model is written as

R_{emp}(f; \pi) = \frac{1}{M} \eta_2 \sum_{i=1}^{N_2} l_{pref}(d_i, r_i, r_i^*) + \cdots + \frac{1}{M} \eta_K \sum_{i=1}^{N_K} l_{pref}(d_i, r_i, r_i^*),    (8)
where \eta_r is a cost parameter for the pairs of rank r, which is used to adjust their error cost. The cost parameter \eta_r can be computed by the formula

\eta_r = e_r \cdot \frac{N_m}{N_r},  r = 2, ..., k,    (9)
where e_r is an enlargement factor applied to the cost of the pairs of rank r, and rank m is the rank with the most pairs:

m = \arg\max_r (N_r),  r = 2, ..., k.    (10)
The error risk of the pairs of a rank in R_{emp}(f; \pi) can be adjusted by changing the value of e_r.

4.2 Cost-Sensitive One-Class Supported Vector Learning to Rank
In model (1), a binary classification model is employed. Following rule (3), the pairs are labeled with two classes. However, the pairs can be labeled with a single class by changing the order of the two instances in a pair; in this case, only the data of one class needs to be learned. Therefore, the cost-sensitive one-class supported vector learning to rank is given as

\min_w \sum_{r=2}^{K} \sum_{j=1}^{N_r} \eta_r \left[ 1 - \left\langle w, \Phi(d_j^{(1)}) - \Phi(d_j^{(2)}) \right\rangle \right]_+ + \lambda \|w\|^2,    (11)

where \Phi(\cdot) maps a sample pair d_i from the input space into the feature space.
X. Chang, Q. Zheng, and P. Lin
The model in (11) is equal to a quadratic programming model as the form M
min w
∑
1 2 w + Ci ⋅ ξi 2 i =1 subject to − Φ d (2) w, Φ d (1) j j ξi ≥ 0, i = 1, , M
( ) ( )
≥ 1 − ξi , i = 1,
,M
,
(12)
which is used in computation. Proposition 1: The problems in (11) and (12) are equivalent, when Ci = ηr where the 2λ
pair di belongs to rank r. The Lagrange method is used to solve the quadratic programming problem (12). The Lagrange dual form of problem (12) takes the form min α
1 2
M
M
∑∑ α α
i i'
i =1 i ' =1
( ) ( ) ( ) ( )
Φ di(1) − Φ d i(2) , Φ d i(1)' − Φ di(2) '
subject to 0 ≤ α i ≤ Ci , i = 1,
, (13)
,M
M
∑α
i
=1
i =1
where Ci is the upper limit of the value of α i . The upper limit Ci of the value of the Lagrange coefficient of the pairs of rank r will increase with the increasing of the enlargement factor er . It is possible that a large value is assigned to α i that corresponds to a large Ci .
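Before turning to the experiments, the cost-sensitive pairwise objective (11) with a linear kernel and the cost parameters of Eq. (9) can be sketched as follows. This is a minimal sketch under assumed names and data layout, with the optimization loop itself omitted; it is not the authors' implementation.

import numpy as np

def cs_pairwise_objective(w, X1, X2, pair_rank, eta, lam=0.5):
    """Objective in the spirit of Eq. (11) with Phi(x) = x:
    sum_i eta[rank_i] * [1 - <w, x1_i - x2_i>]_+  +  lam * ||w||^2,
    where X1[i] / X2[i] are the preferred / non-preferred members of pair i."""
    margins = (X1 - X2) @ w
    losses = np.maximum(0.0, 1.0 - margins)
    costs = np.array([eta[r] for r in pair_rank])
    return np.sum(costs * losses) + lam * np.dot(w, w)

def cost_parameters(pair_counts, enlargement):
    """Eq. (9): eta_r = e_r * N_m / N_r, with m the rank holding the most pairs."""
    n_m = max(pair_counts.values())
    return {r: enlargement.get(r, 1.0) * n_m / n for r, n in pair_counts.items()}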
5 Experiments

In the experiments, the performance of the cost-sensitive supported vector learning-to-rank approach proposed in this paper, called CSRankSVM, is compared with the conventional Ranking SVM, called RankSVM. OHSUMED [13], a dataset for document retrieval research, is used in the experiment. It has been processed and included in LETOR [14], a benchmark dataset built by MSRA for ranking algorithm research. This dataset is imbalanced. CSRankSVM and RankSVM are trained and tested on the dataset. The linear kernel function \kappa(x, x') = x^T x' is employed in both algorithms in our experiment. The trade-off parameter \lambda is set to 0.5 for both algorithms. The normalized discounted cumulative gain (NDCG) [15], which has been widely used by researchers in recent years, is used as the evaluation measure. NDCG can be used to evaluate the performance of a ranking method on a dataset labeled with more than two ranks.
NDCG@k = N(k) \cdot \sum_{j=1}^{k} \frac{2^{r_j} - 1}{\log(1 + j)},    (14)
where N(k) is computed from the ideal ranking list at the k-th position and is used as a normalization factor for the NDCG at position k of the predicted ranking list.

5.1 Experiment on Document Retrieval
The OHSUMED collection is used in document retrieval research. This dataset has been used in information filtering task of TREC 2000. The relevance judgments of documents in OHSUMED are either ‘d’ (definitely relevant), ‘p’ (possibly relevant), or ‘n’ (not relevant). Rank ‘n’ has the largest number of documents, followed by ‘p’ and ‘d’. The original OHSUMED collection consists of 348,566 records from 270 medical journals. There are 106 queries. For each query, there are a number of documents associated. The OHSUMED has been collected into a Benchmark dataset LETOR for ranking algorithm research. In this dataset, each instance is represented as a vector of features, determined by a query and a document. Every vector consists of 25 features. The value of features has been computed. The 20 folds experimental dataset is obtained by running following strategy 20 times: selecting the instances of two query randomly as training data, the instances of one half of remained queries as validate data and that of another half of queries as test data. The average instances number of three ranks in 20 folds is given in Table 1. The rank 2, 1 and 0 denote the rank of ‘definitely relevant’, ‘possibly relevant’ and ‘not relevant’ respectively. The rank 0 has the most instances, followed by rank 1 and 2. Table 1. Instances number of ranks
                   Rank 0    Rank 1    Rank 2
Instance Number      38         4         3
Following Definition 1, the instances of the three ranks are combined into pairs of two ranks. The average number of pairs of the two ranks is given in Table 2. There are fewer pairs of rank 2 than of rank 1. All of the pairs of rank 1 and rank 2 are assigned to one class: +1.

Table 2. Pairs number of ranks
                                   Rank 1    Rank 2
Total Number of Pairs of a Rank     159.5     138.3
The training pairs are fallen into two most relevant ranks. The error risk of prediction can be decomposed into two parts. A two dimensions enlargement factor vector E = ( e1 , e2 ) is used in this experiment. e1 is the enlargement factor of rank 1. e2 is
enlargement factor of rank 2, i.e., the 'definitely relevant' rank. The document retrieval problem is focused on the prediction precision for the 'definitely relevant' rank. In the experiment, therefore, e_1 is set to 1 and e_2 is adjusted from 1 to 5. The experimental results are given in Fig. 1. It can be seen that when e_2 is set from 1 to 5, the NDCG values of CSRankSVM are all significantly higher than those of RankSVM. The NDCG value increases as e_2 is raised from 1 to 4, but when e_2 changes from 4 to 5 the change in NDCG is no longer monotonic. The trend of the NDCG value with increasing e_2 can be seen more clearly in Fig. 2. When e_2 is increased from 1 to 4, the NDCG value improves significantly; when e_2 is set larger than 4, the change in NDCG is not significant.
Fig. 1. The results of CSRankSVM an RankSVM on OHSUMED
Fig. 2. NDCG at 1 to 5 vs. e2 of CSRankSVM on OHSUMED
This means that the prediction model will not be improved significantly once e_2 is larger than a certain value. The optimal setting of the vector E can be found by a cross-validation approach.
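For reference, the NDCG@k measure of Eq. (14) can be computed as in the sketch below; the function names are assumptions, and a logarithm of base 2 is assumed for the discount, since the paper does not state the base.

import numpy as np

def dcg_at_k(rels, k):
    rels = np.asarray(rels, dtype=float)[:k]
    return np.sum((2.0 ** rels - 1.0) / np.log2(np.arange(2, rels.size + 2)))

def ndcg_at_k(ranked_rels, k):
    """NDCG@k in the spirit of Eq. (14): DCG of the predicted ranking
    normalised by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 0, 1, 0, 2], k=5))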
6 Conclusion

In this paper, a cost-sensitive minimum-risk model is proposed for learning to rank imbalanced data. In this model, enlargement factors are used to adjust the error cost of the ranks. Following this model, a cost-sensitive supported vector learning approach is developed. The performance of the proposed approach is compared with the conventional Ranking SVM in experiments. The experimental results on the real-world dataset show that the performance of our approach in learning imbalanced data is better than that of the conventional Ranking SVM.
In this paper, a simple way is employed to improve the Ranking SVM so that it learns imbalanced data more efficiently. In future work, some advanced techniques that have been used for SVM classification of imbalanced data will be adopted for learning to rank imbalanced data [16] [17].
Acknowledgement The research was supported by the National High-Tech R&D Program of China under Grant No.2008AA01Z131, the National Science Foundation of China under Grant Nos.60825202, 60803079, 60633020, the National Key Technologies R&D Program of China under Grant Nos. 2006BAK11B02, 2006BAJ07B06.
References 1. Freund, Y., Iyer, R., Schapire, R.E., Singer, Y.: An Efficient Boosting Algorithm for Combining Preferences. In: 15th International Conference on Machine Learning, pp. 170– 178 (1998) 2. Herbrich, R., Graepel, T., Obermayer, K.: Support Vector Learning for Ordinal Regression. In: Nineth Ann. Conf. Artificial Neural Networks (ICANN 1999), pp. 97–102 (1999) 3. Crammer, K., Singer, Y.: Pranking with Ranking. In: Fourteenth Ann. Conf. Neural Information Processing Systems, NIPS 2001 (2001) 4. Shashua, A., Levin, A.: Ranking with Large Margin Principle: Two Approaches. In: 16th Ann. Conf. Neural Information Processing Systems (NIPS 2003), pp. 961–968 (2003) 5. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., Hullender, G.: Learning to Rank Using Gradient Descent. In: The 22nd International Conference on Machine Learning, pp. 89–96 (2005) 6. Chu, W., Ghahramani, Z.: Preference Learning with Gaussian Processes. In: 22nd International Conference on Machine Learning, pp. 137–144 (2005) 7. Chu, W., Ghahramani, Z.: Gaussian Processes for Ordinal Regression. Journal of Machine Learning Research 6, 23 (2005) 8. Lin, H.-T., Li, L.: Large-margin thresholded ensembles for ordinal regression: Theory and practice. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS, vol. 4264, pp. 319–333. Springer, Heidelberg (2006) 9. Tsai, M.-F., Liu, T.-Y., Qin, T., Chen, H.-H., Ma, W.-Y.: FRank: A Ranking Method with Fidelity Loss. In: The 30th Annual International ACM SIGIR Conference (2007) 10. Pahikkala, T., Tsivtsivadze, E., Airola, A., Boberg, J., Salakoski, T.: Learning to rank with pairwise regularized least-squares. In: The 30th International Conference on Research and Development in Information Retrieval -Workshop on Learning to Rank for Information Retrieval, pp. 27–33 (2007) 11. Cao, Y., Xu, J., Liu, T.Y., Li, H., Huang, Y., Hon, H.-W.: Adapting Ranking SVM to Document Retrieval. In: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 186–193 (2006) 12. Joachims, T.: Optimizing Search Engines Using Clickthrough Data. In: The ACM Conference on Knowledge Discovery and Data Mining, pp. 133–142 (2002) 13. Hersh, W., Buckley, C., Leone, T.J., Hickam, D.: OHSUMED: An Interactive Retrieval Evaluation and New Large Test Collection for Research. In: Seventeenth Ann. ACM-SIGIR Conf. Research and Development in Information Retrieval (SIGIR 1994), pp. 192–201|358 (1994)
14. Liu, T.Y., Xu, J., Qin, T., Xiong, W., Li, H.: Letor: Benchmark Dataset for Research on Learning to Rank for Information Retrieval. In: SIGIR 2007 Workshop on Learning to Rank for Information Retrieval (2007) 15. Kekalainen, J.: Binary and Graded Relevance in IR Evaluations - Comparison of the Effects on Ranking of IR Systems. Information Processing & Management 41, 1019–1033 (2005) 16. Raskutti, B., Kowalczyk, A.: Extreme re-balancing for SVMs: a Case Study. ACM SIGKDD Explorations Newsletter 6, 60–69 (2004) 17. Tao, Q., Wu, G.W., Wang, F.Y., Wang, J.: Posterior Probability Support Vector Machines for Unbalanced Data. IEEE Transactions on Neural Networks 16, 1561–1573 (2005)
A Biologically Plausible Winner-Takes-All Architecture

Sebastian Handrich, Andreas Herzog, Andreas Wolf, and Christoph S. Herrmann

Otto-von-Guericke-University Magdeburg, Institute for Psychology II, Department for Biological Psychology, Universitätsplatz 2, 39106 Magdeburg, Germany
[email protected]

Abstract. Winner-takes-all (WTA) is an important mechanism in artificial and biological neural networks. We present a two-layer WTA architecture with a biologically plausible spiking neuron model and conductance-based synapses. The excitatory neurons in the WTA layer receive spiking signals from an input layer and can inhibit other excitatory WTA neurons via related inhibitory neurons. The connections from the input layer to the WTA layer can be trained by Spike-Time-Dependent Plasticity to discriminate between different classes of input patterns. The overall input of the WTA neurons is controlled by synaptic scaling.
1 Introduction For some purposes, it is desirable to have exactly one out of a number of neurons fire while others remain silent, e.g. for binary classifications. The so called winner-takes-all (WTA) architecture ensures this behavior. The WTA mechanism occurs in biological neural systems and has become an important part of artificial neural networks. Due to variations of input or variations in the first processing steps (input accumulation, different weights) a single neuron is activated, while all other neurons or populations are inactive. In a more liberal version, called k-winner-take-all, a limited number (k) of neurons are activated while the others remain inactive [1]. The WTA mechanism is the basis of many algorithms for perceptual decision making [2], selection of attention [3], and pattern classification. In the field of classical artificial neural networks, WTA architectures are widely used in unsupervised learning, e.g. in self-organizing-maps [4], and as post-processing for supervised learning (classifiers). In these cases, the WTA mechanism can easily be realized via a formal maximum search. In more complex networks, the WTA behavior can be an intrinsic network property, implemented by recurrent connections, like lateral inhibition. Such networks can be used to sharpen inputs in visual systems [5] and auditory systems [6]. WTA mechanisms have also been realized by spiking networks with integrate and fire neurons. Oster and Liu show a hardware implementation of 64 integrate and fire neurons
This work was supported by the DFG HE 3353/6-1 Forschungsverbund ’Electrophysiological Correlates of Memory and their Generation’, SFB 779, and by BMBF Bernstein-group ’Components of cognition: small networks to flexible rules’.
in an analog VLSI array [7] with instantaneous non-conductance based synapses. The recurrently connected neuron array works like a filter. Only the neuron with the highest input firing rate passes its activity onto the next layer. However, when implementing WTA architectures in more complex networks, a number of problems arise. If we consider the output of a neuron as a mean spiking rate, the dynamic of such WTA networks can be analyzed analytically [8,9]. Mao and Massaquoui [9] investigate the WTA dynamics with lateral inhibition in an abstract model of only inhibitory neurons under the assumption that all output connections of an inhibitory neuron have the same weight. They found that a strong lateral inhibition improves the contrast between winner and losers, but it does not guarantee the neuron with the largest input to win, because of the possible existence of multiple stable equilibria. In contrast, a weak lateral inhibition may guarantee the uniqueness of an equilibrium but has relative poor WTA behavior. An additional aspect is the dynamics of the inhibition. Ermentrout [10] discusses the complex dynamics in WTA networks with slow inhibition. The neurons can synchronize to a global periodic solution, which are stable under certain conditions but they can also start to oscillate. Taken together, there are two major problems, if we realize a WTA with lateral inhibition based on biologically plausible neurons and synaptic connections: (I) there is no guarantee that one of the excitatory neurons will fire during an input, and (II) the delay between the firing of the first excitatory neuron and the time inhibition requires to shut down other neurons lead up to the problem that an other excitatory neuron may fire during the delay [11]. A possibility to overcome the delay problem is given in [12]. A part of the inhibitory neurons gets a direct input parallel to the excitatory neurons and reacts to an input signal without the delay of the excitatory to inhibitory connection. This works fine on predefined connection weights, but it is unclear how the parallel learning of the parallel connections to inhibitory and excitatory can be performed. In this paper, we introduce biologically plausible WTA (bpWTA) in a network of spiking neurons based on the model of Izhikevich [13]. The network consists of an input layer and a bpWTA output layer (see Methods). When training networks by Spike-TimeDependent Plasticity (STDP) [14], bpWTA is a prerequisite when reward is supposed to modulate STDP learning [15]. In recurrent networks STDP can synchronize or desynchronize the activity, depending of the network connectivity and additional plasticity mechanisms [16]. In the presented paper we train feed forward connections from input layer to WTA layer and have to ensure that there is a relevant number of nearly coincident spikes on the pre- and postsynaptic side of the WTA output layer neurons.
2 Methods 2.1 Network Configuration Our network architecture consists of two different layers, the input- and the WTA-layer (see Fig. 1). The input layer is used to receive the external stimulus as described below. It is composed of 64 excitatory neurons, which are not laterally connected. The WTA layer consists of two excitatory neurons, which on the one hand integrate their inputs
Fig. 1. Network structure of the proposed WTA architecture. The input layer is activated by stimuli (filled circles). Within the WTA layer, the two excitatory neurons (E) inhibit each other via an inhibitory neuron (I).
received from the input layer, and on the other hand inhibit each other via two inhibitory neurons. This results in a binary categorization after some time which allows to compute an error signal for a possible subsequent learning algorithm. Each input neuron i is connected to each excitatory WTA neuron j, with initial synaptic weights cij = r2 /64, where r is randomly chosen from the uniform distribution within the interval of [0, 0.5]. In the WTA layer all synaptic weights are set to cW T A = 1. There are two types of input to our network. The first one is the stimulation of the input layer defined by a grid of 8x8 sensitive areas that can be stimulated by the input patterns. Those patterns, if considered as images, show randomly distributed noise superimposed onto vertical or horizontal bars (Fig. 5). Each sensitive area can excite the corresponding input neuron by a spike train (firing-rate fs = 50Hz), which is either uniformly- (synchronous input) or Poisson-distributed. The two patterns are presented alternately for one second with an inter-stimulus-interval (ISI) of one second. Additionally, each neuron receives thalamic background activity, which is generated by a Poisson-point process with a mean firing rate of λ = 1Hz. 2.2 Neuron Model We used the neuron model by Izhikevich [13], which reduces the biophysically accurate Hodgkin-Huxley model to a two-dimensional system of ordinary differential equations. The advantages of this model are the fast computation and the configuration of
different behaviors via a few parameters. So, it is widely used e.g. to apply simulations of network development [17]. The neurons of the input layer and the inhibitory ones of the WTA-layer were modeled by: v˙ = 0.04v 2 + 5v + 140 − u − Isyn , u˙ = a (bv − u) ,
(1)
where v is the membrane potential, u the recovery variable and Isyn is the total synaptic current. The parameter a determines the decay of the recovery-variable and b defines the sensibility of u to the membrane potential. If v reaches a threshold of 30 mV, v and u are reset to c and d, respectively (see [13] for more detail). The dimensionless model parameters a, b, c, d are used to generate typical firing patterns, observed in neocortical neurons. To implement an integrator characteristic, we adapt the constants for the excitatory neurons of the WTA-layer [18]: v˙ = 0.04v 2 + 4.1v + 108 − u − Isyn , u˙ = a (bv − u) .
(2)
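For illustration, one Euler integration step of the two neuron models above can be sketched as follows. This is a minimal sketch under assumed function and parameter names, not the authors' simulation code; the spike reset (v to c, u incremented by d) follows the standard Izhikevich model cited in the text.

def izhikevich_step(v, u, I_syn, a, b, c, d, dt=0.1, integrator=False):
    """One 0.1 ms Euler step. Standard neurons follow Eq. (1); the WTA
    integrator neurons use the modified polynomial of Eq. (2)."""
    if integrator:
        dv = 0.04 * v * v + 4.1 * v + 108.0 - u - I_syn   # Eq. (2)
    else:
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u - I_syn   # Eq. (1)
    du = a * (b * v - u)
    v, u = v + dt * dv, u + dt * du
    spiked = v >= 30.0
    if spiked:
        v, u = c, u + d                                   # reset after a spike
    return v, u, spiked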
The standard time step is 0.1 ms.

2.3 Neuron Types

In our study, we used three different types of neurons:
1. Glutamate (excitatory) neurons with, according to the simulations in [19], (a, b) = (0.02, 0.2) and (c, d) = (-65, 8) + (15, -6) r^2, where r is randomly selected from the uniform distribution on [0, 0.5] to obtain a behavior between regular spiking (RS, r = 0) and intrinsically bursting (IB, r = 0.5). The square of r biases the distribution towards the RS cells.
2. GABAergic (inhibitory) neurons: (a, b) = (0.02, 0.25) and (c, d) = (-65, 2), to obtain low-threshold spiking (LTS) neurons.
3. Input-integrating (excitatory) neurons: (a, b) = (0.02, -0.1) and (c, d) = (-55, 6), used in the WTA layer (Fig. 2).

2.4 Synapse Model

The synaptic input current of a neuron is calculated in each time step by

I_{syn} = g_{AMPA} (v - 0) + g_{NMDA} \frac{[(v + 80)/60]^2}{1 + [(v + 80)/60]^2} (v - 0) + g_{GABA_A} (v + 70) + g_{GABA_B} (v + 90),    (3)
Fig. 2. Excitatory neurons of the WTA-Layer are parametrized as integrator. A slow spike train (upper trace) from a single input neuron excites the membrane potential (lower trace) and results in an action potential only after some input spikes have been integrated.
where g_k is the time-dependent synaptic conductance and v the actual membrane potential. The conductances change by first-order linear kinetics

\dot{g}_k = -\frac{g_k}{\tau_k},    (4)
with time constants τk = 5, 150, 6 and 150 ms for the simulated AM P A, N M DA, GABAA and GABAB receptors, respectively [19]. The rise time of currents is typically short and neglected. If a spike is transmitted from presynaptic neuron i to postsynaptic neuron j, after a delay-time tDelay , the conductances are updated depending on the type of presynaptic neuron, the synaptic efficiency Ri · wi and the synaptic weight cij : gk ← gk + cij ,
(5)
The transmission-delay tDelay results from the Euclidean distance between the neurons and an additional latency of 0.5 ms. It is set in the WTA layer (all connections) to 1 ms, and from the input layer to WTA layer to 10 ms. The relation of AM P A to N M DA channels and GABAA to GABAB channels is set to one. 2.5 Synaptic Plasticity Learning in neural circuits is considered as a change of synaptic strengths. To implement this, we used a well known form of hebbian learning, the Spike-Time-Dependent
Plasticity (STDP) [14], in which the temporal order of the presynaptic and postsynaptic spikes determines whether a synapse is potentiated (LTP) or weakened (LTD):

\Delta c_{ij} = \begin{cases} A_+ e^{-|\Delta t|/\tau_+} (1 - c_{ij}) - \frac{A_+}{10} & \text{if } \Delta t > 0 \\ -A_- e^{-|\Delta t|/\tau_-} c_{ij} & \text{if } \Delta t \le 0 \end{cases}    (6)

with A_+ = 0.005, A_- = 0.009, \tau_+ = 15 ms and \tau_- = 20 ms. This fulfills the relation A_+ \tau_+ \le A_- \tau_-, which ensures that uncorrelated pre- and postsynaptic spikes lead to an overall synaptic weakening [20]. The term A_+/10 causes LTD if the temporal difference between the pre- and postsynaptic spikes is too large, i.e. if there is a postsynaptic spike without a preceding presynaptic one. The use of plastic synapses may end in an unconstrained feedback cycle, because correlated pre- and postsynaptic firing leads to LTP, which again increases that correlation [21]. In biological neural networks this stability problem is avoided through a mechanism called homeostatic synaptic scaling, which changes the number of AMPA and NMDA receptors of the postsynaptic neuron depending on its activity [22]. This leads to a competitive scaling of the afferent synapses. In our model we achieved this competitive scaling by keeping the cumulative weight \sum_j c_{ij} of the synapses leading onto the same excitatory WTA neuron i at a constant level c_0:

c_{ij} \leftarrow c_{ij} \cdot \frac{c_0}{\sum_j c_{ij}}  \quad \forall i.    (7)
The STDP mechanism and the synaptic scaling interact dynamically. During the stimulation of the network with input patterns, the STDP increased the synaptic weights between stimulated neurons in the input layer and the winner neuron. The synaptic scaling prevents the winner neuron from excessive firing and down-regulates the weights of unused connections.
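The interaction of the STDP window (6) and the synaptic scaling (7) can be sketched as follows; the constants come from the text, while the default target level c0 = 1.0 and the matrix layout (one row of afferent weights per WTA neuron) are assumptions made for the example.

import numpy as np

A_PLUS, A_MINUS = 0.005, 0.009
TAU_PLUS, TAU_MINUS = 15.0, 20.0   # ms

def stdp_delta(dt, c_ij):
    """Weight change of Eq. (6) for a post-minus-pre spike-time difference dt (ms)."""
    if dt > 0:   # pre before post: potentiation minus the constant LTD term A_+/10
        return A_PLUS * np.exp(-abs(dt) / TAU_PLUS) * (1.0 - c_ij) - A_PLUS / 10.0
    return -A_MINUS * np.exp(-abs(dt) / TAU_MINUS) * c_ij

def synaptic_scaling(C, c0=1.0):
    """Eq. (7): rescale the afferent weights of each WTA neuron (rows of C)
    so that their cumulative sum stays at the constant level c0."""
    return C * (c0 / C.sum(axis=1, keepdims=True))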
3 Results The goal of the proposed WTA mechanism is the generation of an error signal for a learning mechanism. The learning should be performed by the Spike-Time Depended Plasticity (STDP) [14]. Only if a fast computation from the onset of a stimulus to the decision of the winner is achieved, it can be guaranteed that enough pairs of pre- and postsynaptic spikes can be computed at the winner neuron. The weights cij of the connections from the 64 input neurons to the two excitatory WTA neurons are initialized randomly. The input patterns stimulate both excitatory WTA neurons. Due to the different weights and different stimulation, there are different input currents to the excitatory WTA neurons. This is demonstrated in Fig. 3 (upper trace) by showing the integrated input currents received by each excitatory WTA neuron (reseted to zero on each stimulus onset). The excitatory WTA neuron with the highest input fires first (e.g in Fig. 3 first trial, neuron E2), the related inhibitory neuron (I2) is activated with a short transmission delay and inhibits the opposite excitatory neuron (E1). During the presentation of the
Fig. 3. The WTA mechanism. Upper traces show the integrated input current of the excitatory WTA neurons in arbitrary units (AU) (reset to zero at each stimulus onset). The black bars indicate the stimulation with a vertical (v) or horizontal (h) bar. The other traces show the membrane potentials of the WTA neurons. Note that only one excitatory neuron (E) is active at a time, i.e. the WTA requirement is fulfilled.
first input pattern, the neurons E1/I1 fire and the neurons E2/I2 are silent. This is the required WTA behavior. Thus, in each trial only one of the excitatory WTA neurons is activated. Therefore, there is a simultaneous activation of the input neurons, which are stimulated by the specific input pattern, and the winner of the excitatory WTA neurons resulting in an adaptation of the weights by STDP. While the input stimulation continues, the other neuron has no chance to fire until the input stops or the input pattern changes. There is a chance that both excitatory WTA neurons are stimulated nearly identically, so that they fire nearly synchronously. The feedback delay from the first firing of the winner neuron to an inhibitory postsynaptic potential (IPSP) in the other neuron is approximately 2.5 ms. If the delay between the firing of the first and second excitatory neuron is smaller than the inhibition feedback, both WTA neurons fire a short burst and potentially inhibit each other after that. However, such synchronous activation is unlikely and happens only on the first trials after a random initialization of the weights. Due to the noisy input patterns, a sequence with more than one coincident activation of both excitatory WTA-neurons is extreme unlikely, and one single bad trial has not much influence on STDP learning.
Fig. 4. Performance of the WTA network on low input contrasts. The network is untrained, the weights are randomly initialized.
In order to quantify the performance of our WTA architecture, we consider its response in relation to the input contrast. The WTA response in each trial is defined as the relative number of spikes of the winner neuron:

P_{WTA} = \frac{n_{HI}}{n_{HI} + n_{LI}},    (8)

where n_{HI} and n_{LI} are the numbers of spikes generated by the WTA neurons that received the higher and the lower input, respectively. The input contrast C_I is defined as

C_I = \frac{I_{max} - I_{min}}{I_{max} + I_{min}},    (9)

where I_{min} is the minimal and I_{max} the maximal cumulative current received by the WTA neurons. The cumulative current of each WTA neuron is integrated from the stimulus onset t_0 to the firing time t_{w0} of the first spike of the winner neuron:

I = \int_{t_0}^{t_{w0}} i(t)\,dt.    (10)
Figure 4 depicts the performance PW T A of an untrained WTA network (weights randomly initialized) in dependence of the input contrast CI . Contrast is divided into bins (width: 0.015 units). Performance of all trials (N=2700) in each bin is averaged. With rising contrast, the probability rapidly increases that the neuron, which received a higher input, fires more frequently. If the contrast is larger than 1.15, in every trial only the winner neuron is activated. Even if there is a very low input contrast (first bin from 1.0 to 1.015) there is a preference to the correct neuron. Thus, if STDP mechanism is applied, the synapses onto the correct WTA neuron are strengthened. This again leads to a higher input contrast.
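The two trial-wise quantities above are straightforward to compute; the sketch below is a minimal illustration with assumed function names, not the authors' analysis code.

def wta_response(n_hi, n_lo):
    """Eq. (8): fraction of spikes emitted by the neuron that received the
    higher cumulative input in a given trial."""
    return n_hi / (n_hi + n_lo) if (n_hi + n_lo) > 0 else 0.5

def input_contrast(i_max, i_min):
    """Eq. (9): contrast between the largest and smallest cumulative input
    currents received by the two WTA neurons up to the winner's first spike."""
    return (i_max - i_min) / (i_max + i_min)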
Fig. 5. Network response after training. Vertical and horizontal patterns are discriminated correctly by activation of either E1 (vertical) or E2 (horizontal).
Fig. 6. Weight matrix of excitatory WTA neurons. Left: random initialization. Middle and right: after 20 min training, with synaptic scaling (middle), without synaptic scaling (right).
Fig. 5 shows the behavior of the network after 20 minutes of learning. Vertical patterns activate neuron E1 and horizontal patterns activate E2. The mapping is not fixed but depends on initialization of weights and the noise of the input pattern. During the training the weights are adapted by the STDP rule. The matrix of input weights for each excitatory WTA neuron reflects the input pattern (Fig 6, left and middle column). However, the unsupervised classification of overlapping and noisy patterns are not unambiguous. There are two general problems. The first is the generalization: If the two patterns can be discriminated during the training by only one single pixel, it may be possible that the final classification is based only on this single pixel and only one very hight synaptic weight. The second problem is, that the overlapping area itself can be interpreted as an additional pattern (Fig 6, right column). Both problems can be solved by an additional synaptic mechanism, the synaptic scaling (eq. 7).
4 Discussion and Conclusion We show a WTA architecture with a biologically plausible neuron model and conductance based synaptic connections. The input and output signals to the neurons in the WTA layer in our model are spikes (events). So we have to define the winner in an event driven environment. Many models use as input a value, which is considered as a mean firing rate of a population or single cells (see e.g. [11,23,10]) and the winner is defined as the neuron or population with the highest mean firing rate. Thus, on short intervals in the vicinity of the interspike distance, a mean firing rate can not be defined, although some models, like the Wilson-Cowan oscillator, suggest there are. We have designed a biologically plausible model with spikes to compare it with biological EEG experiments via time frequency representations [24]. Therefore, we need a spiking implementation and have used STDP as learning mechanism. The selected integrator characteristic of the WTA neurons is biologically plausible and combines the mean firing rate of earlier spikes with a fast computation of present spikes (see Fig. 2). This means the winner is the excitatory WTA neuron which receives the most input from many selective areas within an integration time. The integration time depends on synaptic components (NMDA time constant) and neuron components (integrator). An additional point is the design of the synaptic transmission and the synaptic plasticity. All firing rate models and simpler spiking models use a strict superposition of the input signals. The values of input are weighted and integrated as continuous values (firing rate) or current pulses (spikes). Real biological systems are nonlinear. We use conductance based synapses which take the actual membrane potential into account. The effect of a spike depends on the synaptic time constants, the actual membrane potential, and the reversal potential of the synapses. In contrast to other model studies, we did not use a single global inhibitor neuron and no self-excitation of excitatory WTA neurons. Instead, every excitatory WTA neuron has its own inhibitory neuron. Authors of global inhibition argument that there is a higher number of excitatory than inhibitory neurons in the cortex [11]. Hence, some excitatory neurons have to share one inhibitory neuron. This may be correct for the entire cortex, but circuits with WTA mechanisms may be very small parts and it is possible
that for those small parts a one to one relation can be assumed. If we do not consider single neurons in WTA, but small populations of neurons, one population of excitatory neurons can be connected to one (smaller) population of inhibitory neurons [25]. However, a global inhibition requires a self excitation of the winner neuron to prevent it from inhibition. The parametrization of such recurrent networks are a difficult optimization problem with complex dynamics to prevent unstable behavior. It can be solved for a larger number of neurons by estimating the weights via a mean field model. Based on the working memory simulations of Brunel and Wang [26], Zylberberg et al. [27] present a network model of 2000 neurons with a global inhibition and self excitation of input selective excitatory populations which can work as a WTA network. The network provides load and retrieval mechanisms with a top-down control for two different inputs. But if we assume a larger number of input sources (e.g. the 8x8 input layer), which cause a higher input variability and a large number of weights for each of the excitatory WTA neurons, it seems more useful to use single neurons in the WTA layer instead of large populations. The input weights of each excitatory WTA neuron directly reflect the discriminated input pattern. We show that a WTA network consisting only of an input layer and a WTA layer can be trained in an unsupervised fashion to discriminate simple patterns with small areas of overlap. This network can be extended to different tasks. With one or more hidden layers it should be possible to discriminate more complex (spatiotemporal) input patterns, e.g. with more overlap. The hidden layer transforms the input pattern into another coordinate space. An unsupervised training of such a multilayer network by STDP requires a coincident activation of the WTA output neuron with the input neurons which can be a problem when input patterns are presented briefly. In [15] we show a network architecture with an additional feedback layer to extend the firing of the input layer after the end of external stimulation resembling the function of the hippocampus. The presented WTA architecture can also be applied for supervised learning by an external activation of the (winner) excitatory WTA neuron or activation of the opposite (non winner) inhibitory WTA neuron. It gives the chance to integrate supervised and unsupervised learning algorithm into a single mechanism (STDP) in one network architecture.
References 1. Yuille, A.L., Geiger, D.: Winner-take-all networks. In: Arbib, M.A. (ed.) The Handbook of Brain Therory and Neural Networks, 2nd edn., pp. 1228–1231. MIT Press, Cambridge (2003) 2. Salzman, C.D., Newsome, W.T.: Neural mechanisms for forming a perceptual decision. Science 264(5156), 231–237 (1994) 3. Lee, D.K., Itti, L., Koch, C., Braun, J.: Attention activates winner-take-all competition among visual filters. Nat. Neurosci. 2(4), 375–381 1097-6256 (Print) Journal Article (1999) 4. Kohonen, T.: Self-organizing maps. Springer, Heidelberg (2001) 5. Blakemore, C., Carpenter, R.H., Georgeson, M.A.: Lateral inhibition between orientation detectors in the human visual system. Nature 228(5266), 37–39 (1970) 6. Wu, G.K., Arbuckle, R., hua Liu, B., Tao, H.W., Zhang, L.I.: Lateral sharpening of cortical frequency tuning by approximately balanced inhibition. Neuron 58(1), 132–143 (2008)
326
S. Handrich et al.
7. Oster, M., Liu, S.: Spiking inputs to a winner-take-all network. In: Weiss, Y., Sch¨olkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems, vol. 18, pp. 1051–1058. MIT Press, Cambridge (2005) 8. Maass, W.: On the computational power of winner-take-all. Neural Computation 12(11), 2519–2535 (2000) 9. Mao, Z.H., Massaquoi, S.: Dynamics of winner-take-all competition in recurrent neural networks with lateral inhibition 18(1), 55–69 (2007) 10. Ermentrout, B.: Complex dynamics in winner-take-all neural nets with slow inhibition. Neural Networks 5, 415–431 (1992) 11. Coultrip, R., Granger, R., Lynch, G.: A cortical model of winner-take-all competition via lateral inhibition. Neural Networks 5(1), 47–54 (1992) 12. Jin, D.Z.: Spiking neural network for recognizing spatiotemporal sequences of spikes. Phys. Rev. E 69(2), 021905 (2004) 13. Izhikevich, E.M.: Simple model of spiking neurons. IEEE Transactions on neural networks 14(6), 1569–1572 (2003) 14. Bi, G.Q., Poo, M.M.: Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18(24), 10464–10472 (1998) 15. Handrich, S., Herzog, A., Wolf, A., Herrmann, C.S.: Prerequisites for integrating unsupervised and reinforcement learning in a single network of spiking neurons. In: IJCNN (2009) (in press) 16. Kube, K., Herzog, A., Michaelis, B., de Lima, A., Voigt, T.: Spike-timing-dependent plasticity in small world networks. Neurocomputing 71, 1694–1704 (2008) 17. Herzog, A., Kube, K., Michaelis, B., de Lima, A.Baltz, T., Voigt, T.: Contribution of the gaba shift to the transition from structural initializationto working stage in biologically realistic networks. Neurocomputing, 1134–1142 (2008) 18. Izhikevich, E.M.: Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. The MIT Press, Cambridge (2006) 19. Izhikevich, E.M., Gally, J.A., Edelman, G.M.: Spike-timing dynamics of neuronal groups. Cerebral Cortex 14, 933–944 (2004) 20. Song, S., Miller, K., Abbott, L.: Competitive hebbian learning through spike-timingdependent synaptic plasticity. Nat. Neurosci. 3(9), 919–926 (2000) 21. Abbott, L., Nelson, S.: Synaptic plasticity: taming the beast. Nat. Neurosci. 3(suppl.), 1178–1183 (2000) 22. Turrigiano, G.: The self-tuning neuron: synaptic scaling of excitatory synapses. Cell 135(3), 422–435 (2008) 23. Brandt, S., Wessel, R.: Winner-take-all selection in a neural system with delayed feedback. Biol. Cybern. 97(3), 221–228 (2007) 24. Fr¨und, I., Schadow, J., Busch, N., Naue, N., K¨orner, U., Herrmann, C.: Anticipation of natural stimuli modulates eeg dynamics: physiology and simulation. Cogn. Neurodyn. 2, 89–100 (2008) 25. Lumer, E.D.: Effects of spike timing on winner-take-all competition in model cortical circuits. Neural Comput. 12(1), 181–194 (2000) 26. Brunel, N., Wang, X.: Effects of neuromodulation in a cortical network model of object working memory dominated by recurrent inhibition. J. Comput. Neurosci. 11(1), 63–85 (2001) 27. Zylberberg, A., Dehaene, S., Mindlin, G., Sigman, M.: Neurophysiological bases of exponential sensory decay and top-down memory retrieval: a model. Front Comput. Neurosci. 3, 4 (2009)
Minimum Sum-of-Squares Clustering by DC Programming and DCA

Le Thi Hoai An (1) and Pham Dinh Tao (2)

(1) Laboratory of Theoretical and Applied Computer Science, UFR MIM, University of Paul Verlaine - Metz, Ile du Saulcy, 57045 Metz, France
(2) Laboratory of Modelling, Optimization & Operations Research, National Institute for Applied Sciences - Rouen, BP 08, Place Emile Blondel, F 76131 Mont Saint Aignan Cedex, France
[email protected], [email protected]

Abstract. In this paper, we propose a new approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via the minimum sum-of-squares Euclidean distance. The so-called Minimum Sum-of-Squares Clustering (MSSC in short) is first formulated in the form of a hard combinatorial optimization problem. It is afterwards recast as a (continuous) DC program with the help of exact penalty in DC programming. A DCA scheme is then investigated. The related DCA is original and very inexpensive because it amounts to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, which are all given in explicit form. Numerical results on real-world data sets show the efficiency of DCA and its great superiority with respect to K-means, a standard method of clustering.

Keywords: clustering, MSSC, combinatorial optimization, DC programming, DCA, nonsmooth nonconvex programming, exact penalty.
1 Introduction
Clustering, which aims at dividing a data set into groups or clusters containing similar data, is a fundamental problem in unsupervised learning and has many applications in various domains. In recent years, there has been significant interest in developing clustering algorithms for massive data sets (see e.g. [2] - [11], [17], [18], [23], [24] and the references therein). Two main approaches have been used for clustering: statistical and machine learning based on mixture models (see e.g. [2], [6]) and the mathematical programming approach that considers clustering as an optimization problem (see e.g. [3], [4], [7], [17], [18], [22], [26], [28], [32] and the references therein). The general term "clustering" covers many different types of problems. All consist of subdividing a data set into groups of similar elements, but there are many measures of similarity, many ways of measuring, and various concepts of subdivision (see [11] for a survey). Among these criteria, a widely used one is the minimum sum of squared Euclidean distances from each entity to the centroid of
its cluster, or minimum sum-of-squares (MSSC) for short, which expresses both homogeneity and separation. The MSSC problem consists in partitioning a given set of n entities into c clusters in order to minimize the sum of squared distances from the entities to the centroid of their cluster. This problem may be formulated mathematically in several ways, which suggest different possible algorithms. The two most widely used models are a bilevel programming problem and a mixed integer program. Many early studies were focused on the well-known K-means algorithm ([8],[10]) and its variants (see [29] for a survey). More recently, several works in optimization approaches have been developed for the MSSC with the mixed integer programming formulation (see e.g. [4], [24], [26], [32]). There are some attempts to solve this problem exactly through the branch and bound or cutting methods via reformulation-linearization techniques, 0-1 SDP formulation. However, these methods are not available for massive data sets (large number of samples and large number of features). We investigate in this paper an efficient nonconvex programming approach for the mixed integer programming formulation of the MSSC problem. Our method is based on DC (Difference of Convex functions) programming and DCA (DC Algorithms) that were introduced by Pham Dinh Tao in a preliminary form in 1985. They have been extensively developed since 1994 by Le Thi Hoai An and Pham Dinh Tao (see [13], [14], [15], [27] and the references therein). They are classic now, and used by many researchers (see e.g. [20] , [21], [25], [30], [31]). Our walk is motivated by the fact that DCA has been successfully applied to many (smooth or nonsmooth) large-scale nonconvex programs in various domains of applied sciences (see [13] - [21], [25], [27], [30] - [33] and the references therein), in particular in Machine Learning ([17] - [21], [25], [30] - [33]) for which they provide quite often a global solution and proved to be more robust and efficient than the standard methods. In [17] an efficient DCA scheme has been developed to MSSC problem with a bilevel programming formulation. The purpose of this paper is to demonstrate that, as shown for previous studies in Machine Learning, DCA is a promising approach for the MSSC problem with a mixed integer programming formulation. Using an exact penalty result in DC programming we reformulate the considered MSSC problem in the form of a DC program. A so-called DC program is that of minimizing a DC function f = g − h (with g and h being convex functions) over a convex set. The construction of DCA involves the convex DC components g and h but not the DC function f itself. Moreover, a DC function f has infinitely many DC decompositions g − h which have a crucial impact on the qualities (speed of convergence, robustness, efficiency, globality of computed solutions,...) of DCA. We propose in this work a nice DC program for a continuous formulation of the MSSC problem and develop the corresponding DCA scheme. We show that the effect of the DC decomposition is meaningful. Experimental results on several biomedical data sets show clearly the effectiveness of the proposed algorithms and their superiority with respect to the standard K-means algorithm in both running-time and accuracy of solutions.
The remainder of the paper is organized as follows. Section 2 is devoted to the mathematical formulation of the MSSC problem and a continuous reformulation via an exact penalty technique. DC programming and DCA for solving the resulting MSSC problem are developed in Section 3. Finally, numerical experiments are reported in Section 4.
2 Mathematical Formulations of the MSSC Problem

2.1 Mixed Integer Program
Let X := \{x_1, x_2, ..., x_n\} denote n entities to be partitioned into c (2 \le c \le n) homogeneous clusters C_1, C_2, ..., C_c, where x_k \in \mathbb{R}^d (k = 1, ..., n) represents a multidimensional data vector. Let U = (u_{i,k}) \in \mathbb{R}^{c \times n}, with i = 1, ..., c and k = 1, ..., n, be the matrix defined by

u_{i,k} := \begin{cases} 1 & \text{if } x_k \in C_i \\ 0 & \text{otherwise} \end{cases}.

Then a straightforward formulation of MSSC is

\min f(U, V) := \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k} \|x_k - v_i\|^2
s.t.  u_{i,k} \in \{0, 1\}  for i = 1, ..., c,  k = 1, ..., n,
      \sum_{i=1}^{c} u_{i,k} = 1,  k = 1, ..., n,    (1)
where \|\cdot\| is, in this paper, the Euclidean norm in \mathbb{R}^d, and V is the (c \times d)-matrix whose ith row is v_i \in \mathbb{R}^d, the center of C_i. In the sequel, to simplify the related computations, we will work on the vector space \mathbb{R}^{c \times n} \times \mathbb{R}^{c \times d} of real matrices. The variables are then (U, V) \in \mathbb{R}^{c \times n} \times \mathbb{R}^{c \times d}, where U \in \mathbb{R}^{c \times n}, whose kth column is denoted U^k, and V \in \mathbb{R}^{c \times d}, whose ith row is denoted V_i or v_i (v_i is a row vector in \mathbb{R}^d). The last constraint ensures that each point x_k is assigned to one and only one group. This is a mixed integer program where the objective function is nonconvex. It has recently been shown in [1] that the problem is NP-hard. Moreover, it is a global optimization problem with possibly many local minima. In real applications this is a very large scale problem (high dimension and large data set, i.e. d and n are very large), which is why global optimization approaches such as Branch & Bound or cutting algorithms cannot be used. Most of the methods for the model (1) are pure heuristics that can only locate a "good" local solution. In some previous works, by rewriting the center v_i of the cluster C_i as v_i = \sum_{l=1}^{n} u_{i,l} x_l / \sum_{l=1}^{n} u_{i,l}, one considered the following problem where only U is variable:

\min \left\{ \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k} \left\| x_k - \frac{\sum_{l=1}^{n} u_{i,l} x_l}{\sum_{l=1}^{n} u_{i,l}} \right\|^2 : u_{i,k} \in \{0, 1\},\ \sum_{i=1}^{c} u_{i,k} = 1,\ k = 1, ..., n \right\}.
In this work we maintain the model (1) because it is quite convenient for using DCA: (1) is a nice DC program having an appropriate DC decomposition for which the corresponding DCA is simple and inexpensive. Moreover, DCA computes U and V separately, so the use of both variables U and V does not actually increase the complexity of DCA. For applying DCA we reformulate (1) as a continuous optimization problem. First of all, since u_{i,k} \in \{0, 1\} we can replace u_{i,k} by u_{i,k}^2 and rewrite the objective function of (1) as

f(U, V) := \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k}^2 \|x_k - v_i\|^2.

We will see in what follows that using u_{i,k}^2 instead of u_{i,k} is useful for getting a good DC decomposition, and the resulting DCA is interesting.

2.2 A Continuous Reformulation
Our reformulation technique is based on the following result developed in [16].

Lemma 1. ([16]) Let K be a nonempty bounded polyhedral convex set, f be a DC function on K and p be a nonnegative concave function on K. Then there exists t_0 \ge 0 such that for all t > t_0 the following problems have the same optimal value and the same solution set:

(P)  \alpha = \inf\{f(x) : x \in K, p(x) \le 0\}
(P_t)  \alpha(t) = \inf\{f(x) + t p(x) : x \in K\}.

For applying this result to the reformulation of (1) we first show that f(U, V) is a DC function. Using the equation 2 f_1 f_2 = (f_1 + f_2)^2 - (f_1^2 + f_2^2) we can express f(U, V) as

f(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{i,k}^2 \|x_k - v_i\|^2
        = \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{c} \left( u_{i,k}^2 + \|x_k - v_i\|^2 \right)^2 - \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{c} \left( u_{i,k}^4 + \|x_k - v_i\|^4 \right).

Hence the following DC decomposition of f(U, V) seems to be natural:

f(U, V) := G_1(U, V) - H_1(U, V)    (2)

where

G_1(U, V) = \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{c} \left( u_{i,k}^2 + \|x_k - v_i\|^2 \right)^2    (3)

and

H_1(U, V) = \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{c} \left( u_{i,k}^4 + \|x_k - v_i\|^4 \right)    (4)

are clearly convex functions.
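The two DC components of Eqs. (3)-(4) can be evaluated with a few lines of array code. This is a minimal sketch under assumed shapes (U is c x n, V is c x d, X is n x d), not part of the paper's DCA implementation.

import numpy as np

def dc_components(U, V, X):
    """G1 and H1 of the natural DC decomposition f = G1 - H1:
    G1 = 0.5 * sum_{k,i} (u_ik^2 + ||x_k - v_i||^2)^2,
    H1 = 0.5 * sum_{k,i} (u_ik^4 + ||x_k - v_i||^4)."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # ||x_k - v_i||^2, shape (c, n)
    G1 = 0.5 * ((U ** 2 + D2) ** 2).sum()
    H1 = 0.5 * (U ** 4 + D2 ** 2).sum()
    return G1, H1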
In the problem (1) the variable U is a priori bounded in \mathbb{R}^{c \times n}. One can also find a constraint to bound the variable V. Indeed, let x_{k,j} be the jth component, j = 1, ..., d, of the vector x_k, and let

\alpha_j := \min_{k=1,...,n} x_{k,j},  \beta_j := \max_{k=1,...,n} x_{k,j}.

Hence v_i \in T_i := \Pi_{j=1}^{d} [\alpha_j, \beta_j] for all i = 1, ..., c. For each k \in \{1, ..., n\} let \Delta_k be the (c-1)-simplex in \mathbb{R}^c defined by

\Delta_k := \left\{ U^k := (u_{i,k})_i \in [0, 1]^c : \sum_{i=1}^{c} u_{i,k} = 1 \right\}

and \Delta := \Pi_{k=1}^{n} \Delta_k, T := \Pi_{i=1}^{c} T_i. The problem (1) can be rewritten as

\min \left\{ f(U, V) : U \in \Delta \cap \{0, 1\}^{c \times n},\ V \in T \right\}.    (5)

Consider the function p defined on \mathbb{R}^{c \times n} by

p(U) := \sum_{i=1}^{c} \sum_{k=1}^{n} u_{i,k} (1 - u_{i,k}).
Clearly, p is finite concave on \mathbb{R}^{c \times n}, nonnegative on \Delta, and

\Delta \cap \{0, 1\}^{c \times n} := \left\{ U \in \{0, 1\}^{c \times n} : \sum_{i=1}^{c} u_{i,k} = 1 \right\} = \{U \in \Delta : p(U) = 0\} = \{U \in \Delta : p(U) \le 0\}.
Using the Lemma above we can now write (5) in the form of the following nonconvex program in continuous variables: min {F (U, V ) := f (U, V ) + tp(U ) : (U, V ) ∈ Δ × T } ,
(6)
where t > t0 is called penalty parameter. In the sequel we will consider the MSSC problem via the continuous formulation (6) with t being sufficiently large. We will develop DC programming and DCA for solving (6).
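The penalized objective of Eq. (6) is easy to evaluate; the sketch below is a minimal illustration under assumed array shapes (U is c x n, V is c x d, X is n x d) and is not the solver itself.

import numpy as np

def penalized_objective(U, V, X, t):
    """F(U, V) = f(U, V) + t * p(U) of Eq. (6), with
    f = sum_{k,i} u_ik^2 ||x_k - v_i||^2 and p(U) = sum_{i,k} u_ik (1 - u_ik),
    which vanishes exactly on binary assignment matrices."""
    D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    f = (U ** 2 * D2).sum()
    p = (U * (1.0 - U)).sum()
    return f + t * p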
3 DC Programming and DCA for Solving the MSSC Problem

To facilitate the reader, we give below a brief introduction to DC programming and DCA.

3.1 Outline of DC Programming and DCA
DC programming and DCA constitute the backbone of smooth/nonsmooth nonconvex programming and global optimization. They address the problem of minimizing a function f which is the difference of two convex functions on the whole
space IRd or on a convex set C ⊂ IRd . Generally speaking, a DC program is an optimisation problem of the form : α = inf{f (x) := g(x) − h(x) : x ∈ IRd }
(Pdc )
where g, h are lower semi-continuous proper convex functions on IRd . Such a function f is called a DC function, and g − h a DC decomposition of f while g and h are the DC components of f. The convex constraint x ∈ C can be incorporated in the objective function of (Pdc ) by using the indicator function on C denoted by χC which is defined by χC (x) = 0 if x ∈ C, and +∞ otherwise: inf{f (x) := g(x) − h(x) : x ∈ C } = inf{χC (x) + g(x) − h(x) : x ∈ IRd }. Let
g^*(y) := sup{ ⟨x, y⟩ − g(x) : x ∈ IR^d }
be the conjugate function of a convex function g. Then, the following program is called the dual program of (Pdc ): αD = inf{h∗ (y) − g ∗ (y) : y ∈ IRd }.
(Ddc )
One can prove that α = α_D, and there is a perfect symmetry between primal and dual DC programs: the dual of (D_dc) is exactly (P_dc). For a convex function θ, the subdifferential of θ at x_0 ∈ dom θ := {x ∈ IR^d : θ(x) < +∞}, denoted by ∂θ(x_0), is defined by

∂θ(x_0) := { y ∈ IR^d : θ(x) ≥ θ(x_0) + ⟨x − x_0, y⟩, ∀x ∈ IR^d }.   (7)

The subdifferential ∂θ(x_0) generalizes the derivative in the sense that θ is differentiable at x_0 if and only if ∂θ(x_0) ≡ {∇θ(x_0)}. DCA is based on the local optimality conditions of (P_dc), namely

∂h(x^*) ∩ ∂g(x^*) ≠ ∅   (8)

(such a point x^* is called a critical point of g − h), and

∅ ≠ ∂h(x^*) ⊂ ∂g(x^*).   (9)
Note that (9) is a necessary local optimality condition for (P_dc). For many classes of DC programs, it is also a sufficient optimality condition (see [14], [15]). The idea of DCA is simple: each iteration l of DCA approximates the concave part −h by its affine majorization (which corresponds to taking y^l ∈ ∂h(x^l)) and minimizes the resulting convex function (which is equivalent to determining a point x^{l+1} ∈ ∂g^*(y^l)).

DCA scheme
Initialization: Let x^0 ∈ IR^d be a best guess, l ← 0.
Repeat
– Calculate y^l ∈ ∂h(x^l).
– Calculate x^{l+1} ∈ arg min{ g(x) − h(x^l) − ⟨x − x^l, y^l⟩ : x ∈ IR^d }   (P_l)
– l ← l + 1
Until convergence of {x^l}.

Note that (P_l) is a convex optimisation problem and insofar "easy" to solve. Convergence properties of DCA and its theoretical basis can be found in [13], [14], [15], [27]. For instance, it is important to mention that:
– DCA is a descent method: the sequences {g(x^l) − h(x^l)} and {h^*(y^l) − g^*(y^l)} are decreasing (without linesearch).
– If the optimal value α of problem (P_dc) is finite and the infinite sequences {x^l} and {y^l} are bounded, then every limit point x^* (resp. y^*) of the sequence {x^l} (resp. {y^l}) is a critical point of g − h (resp. h^* − g^*), i.e., ∂h(x^*) ∩ ∂g(x^*) ≠ ∅ (resp. ∂h^*(y^*) ∩ ∂g^*(y^*) ≠ ∅).
– DCA has a linear convergence rate for DC programs.

For a complete study of DC programming and DCA the reader is referred to [13], [14], [15], [27] and the references therein. The solution of a nonconvex program (P_dc) by DCA must be composed of two stages: the search for an appropriate DC decomposition of f and that for a good initial point. We shall apply all these DC enhancement features to solve problem (1) with the help of the equivalent DC program given in the next section. We note that the convex concave procedure (CCCP) for constructing discrete time dynamical systems mentioned in [33] is nothing else than a special case of DCA. In the last five years DCA has been successfully applied in several studies in Machine Learning, e.g., for SVM-based feature selection [19], [25], for improving boosting algorithms [12], for implementing ψ-learning [20], [31], [21], for Transductive SVMs [30] and for unsupervised clustering [17], [18].
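To make the generic scheme above concrete, here is a minimal Python sketch (not from the paper; the toy DC program and all names are illustrative) that runs DCA on f(x) = x^2 − 2|x| with DC components g(x) = x^2 and h(x) = 2|x|, for which y^l = 2 sign(x^l) and x^{l+1} = y^l / 2.

```python
import numpy as np

# Minimal DCA sketch on a toy one-dimensional DC program (illustrative only):
#   minimize f(x) = x**2 - 2*|x|  with DC components g(x) = x**2, h(x) = 2*|x|.
# Each iteration takes y_l in the subdifferential of h at x_l and minimizes the
# convex majorant g(x) - x*y_l, whose minimizer is x = y_l / 2.

def dca_toy(x0, max_iter=50, tol=1e-10):
    x = float(x0)
    for _ in range(max_iter):
        y = 2.0 * np.sign(x)        # y_l in dh(x_l); at x = 0 we pick the subgradient 0
        x_new = y / 2.0             # argmin_x { x**2 - x*y } = y / 2
        if abs(x_new - x) <= tol:   # convergence of the iterates
            return x_new
        x = x_new
    return x

print(dca_toy(3.7))                 # converges to the critical point x = 1, a global minimizer here
```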
3.2 DC Reformulation for the MSSC Model (6)
We first remark that, if f is a DC function with DC components G and H, then the function F(U, V) := f(U, V) + tp(U) is DC too, with DC components G and H − tp (recall that p is a concave function). Hence, the natural DC decomposition (2) of f yields the following DC decomposition of F:

F(U, V) := G_1(U, V) − [H_1(U, V) − tp(U)].
(10)
However, from a numerical point of view, the DCA scheme corresponding to this DC decomposition is not interesting because it requires an iterative algorithm for solving a convex program at each iteration. Instead, we introduce a nice DC reformulation of problem (6) for which the resulting DCA is determined explicitly via a very simple formula.

Firstly, we determine a ball of radius r and center 0 ∈ IR^d that necessarily contains the optimal centers v_i. The necessary first-order optimality conditions for (U, V) imply that ∇_V F(U, V) = 0, i.e.,

∂_{v_i} F(U, V) = \sum_{k=1}^{n} 2 u_{i,k} (v_i − x_k) = 0,   i = 1, ..., c,

or v_i \sum_{k=1}^{n} u_{i,k} = \sum_{k=1}^{n} u_{i,k} x_k. On the other hand, the condition that a cluster should not be empty imposes \sum_{k=1}^{n} u_{i,k} > 0 for i = 1, ..., c. Hence

\|v_i\|^2 ≤ \frac{\left( \sum_{k=1}^{n} u_{i,k} \|x_k\| \right)^2}{\left( \sum_{k=1}^{n} u_{i,k} \right)^2} ≤ \sum_{k=1}^{n} \|x_k\|^2 := r^2.

Let R_i (i = 1, ..., c) be the Euclidean ball centered at the origin and of radius r in IR^d, and let C := \prod_{i=1}^{c} R_i. We can reformulate problem (6) as:

min { F(U, V) : (U, V) ∈ Δ × C }.   (11)
A nice DC decomposition of F is inspired by the following result.

Theorem 1. There exists ρ > 0 such that the function

H(U, V) := \frac{ρ}{2} \|(U, V)\|^2 − f(U, V)   (12)

is convex on Δ × C.

Proof. Let us consider the function h : IR × IR → IR defined by

h(x, y) = \frac{ρ}{2} x^2 + \frac{ρ}{2n} y^2 − x^2 y^2.   (13)

The Hessian of h is given by:

J(x, y) = \begin{pmatrix} ρ − 2y^2 & −4xy \\ −4xy & \frac{ρ}{n} − 2x^2 \end{pmatrix}.   (14)

For all (x, y) with 0 ≤ x ≤ 1, |y| ≤ α, we have for the determinant of J(x, y) (denoted |J(x, y)|):

|J(x, y)| = (ρ − 2y^2)\left( \frac{ρ}{n} − 2x^2 \right) − 16 x^2 y^2 ≥ \frac{1}{n} ρ^2 − \left( \frac{2}{n} y^2 + 2x^2 \right) ρ − 16 x^2 y^2 ≥ \frac{1}{n} ρ^2 − \left( \frac{2}{n} α^2 + 2 \right) ρ − 16 α^2,

provided that α > 0. This lower bound is nonnegative whenever ρ is larger than the upper zero of this quadratic function, i.e., for

ρ ≥ n \left( \frac{1}{n} α^2 + 1 + \sqrt{ \left( \frac{1}{n} α^2 + 1 \right)^2 + \frac{16}{n} α^2 } \right).   (15)

It is clear that ρ > 0. With ρ as large as in (15), the function h is convex on [0, 1] × [−α, α]. Therefore, for x → u_{i,k} and y → \|x_k − v_i\|, the functions

θ_{i,k}(u_{i,k}, v_i) := \frac{ρ}{2} u_{i,k}^2 + \frac{ρ}{2n} \|x_k − v_i\|^2 − u_{i,k}^2 \|x_k − v_i\|^2

are convex on {0 ≤ u_{i,k} ≤ 1, \|v_i\| ≤ r} with ρ as in (15) and

α = r + \max_{1≤k≤n} \|x_k\|.   (16)

As a consequence, the function h_{i,k} defined by

h_{i,k}(u_{i,k}, v_i) = θ_{i,k}(u_{i,k}, v_i) + \frac{ρ}{n} ⟨x_k, v_i⟩ − \frac{ρ}{2n} \|x_k\|^2

is convex in (u_{i,k}, v_i). Finally, since H(U, V) := \frac{ρ}{2} \|(U, V)\|^2 − f(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} h_{i,k}(u_{i,k}, v_i), the function H(U, V) is convex on Δ × C with ρ as in (15) and α = r + \max_{1≤k≤n} \|x_k\|.
In the sequel, the function H is defined with a ρ satisfying condition (15). According to the proposition above, we can express our second DC decomposition of F as follows:

F(U, V) := G_2(U, V) − H_2(U, V)
(17)
with

G_2(U, V) := \frac{ρ}{2} \|(U, V)\|^2,   H_2(U, V) := H(U, V) − tp(U)

being clearly convex functions. Now, the optimisation problem (11) can be written as

min \left\{ χ_{Δ×C}(U, V) + \frac{ρ}{2} \|(U, V)\|^2 − H_2(U, V) : (U, V) ∈ IR^{c×n} × IR^{c×d} \right\}.   (18)

3.3 DCA Applied to (18)
To design a DCA according to the general scheme described above, we first need the computation of (Y^l, Z^l) ∈ ∂H_2(U^l, V^l) and then have to solve the convex program

min \left\{ \frac{ρ}{2} \|(U, V)\|^2 − ⟨(U, V), (Y^l, Z^l)⟩ : (U, V) ∈ Δ × C \right\}.   (19)

The function H_2 is differentiable and its gradient at the point (U^l, V^l) is given by (Y^l, Z^l) = ∇H_2(U^l, V^l), where

Y^l_{i,k} = ρ u^l_{i,k} − 2 u^l_{i,k} \|x_k − V^l_i\|^2 + 2t u^l_{i,k} − t,   i = 1, ..., c, k = 1, ..., n,
Z^l_i = ρ V^l_i − 2 \sum_{k=1}^{n} (V^l_i − x_k)(u^l_{i,k})^2,   i = 1, ..., c.   (20)
The solution of the auxiliary problem (19) is explicitly computed as (Proj stands for the projection)

(U^{l+1})^k = Proj_{Δ_k}\left( \frac{1}{ρ} (Y^l)^k \right), k = 1, ..., n,   V^{l+1}_i = Proj_{R_i}\left( \frac{1}{ρ} (Z^l)_i \right), i = 1, ..., c.   (21)

The algorithm can be described as follows.

DCA: DCA Applied to (18)
Initialization: Choose the memberships U^0 and the cluster centers V^0. Let ε > 0 be sufficiently small, l ← 0.
Repeat
– Compute Y^l and Z^l via (20).
– Define (U^{l+1}, V^{l+1}) by setting:
  (U^{l+1})^k = Proj_{Δ_k}\left( \frac{1}{ρ} (Y^l)^k \right) for k = 1, ..., n,
  V^{l+1}_i = Proj_{R_i}\left( \frac{1}{ρ} (Z^l)_i \right) = \frac{1}{ρ} (Z^l)_i if \|(Z^l)_i\| ≤ ρ r, and = r \frac{(Z^l)_i}{\|(Z^l)_i\|} otherwise, (i = 1, ..., c).
– l ← l + 1
Until \|(U^{l+1}, V^{l+1}) − (U^l, V^l)\| ≤ ε.

Remark 1. The DC decomposition (17) is interesting because the resulting DCA is simple: each iteration of DCA consists of computations of the projections of points onto a simplex and/or onto a ball, which are all computed explicitly. So DCA does not require an iterative method at each iteration, as in the DCA scheme applied to the first DC decomposition (10).
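To make the projections in (21) concrete, the following NumPy sketch implements one possible realization of the above DCA (it is not the authors' C++ code; ρ and t are simply passed in and assumed to satisfy (15) and t > t_0, and the initialization shown is only one admissible choice).

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex {u >= 0, sum(u) = 1}."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    idx = np.nonzero(u + (1.0 - css) / (np.arange(len(y)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[idx]) / (idx + 1.0)
    return np.maximum(y + theta, 0.0)

def project_ball(z, radius):
    """Euclidean projection of z onto the ball of the given radius centered at 0."""
    nrm = np.linalg.norm(z)
    return z if nrm <= radius else (radius / nrm) * z

def dca_mssc(X, c, rho, t, max_iter=200, eps=1e-6, seed=0):
    """X: (n, d) data matrix; c: number of clusters. Returns memberships U (c, n) and centers V (c, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    r = np.sqrt((X ** 2).sum())                  # radius of the ball containing the optimal centers
    V = rng.uniform(-r / np.sqrt(d), r / np.sqrt(d), size=(c, d))
    U = np.zeros((c, n))
    U[np.argmin(((X[None, :, :] - V[:, None, :]) ** 2).sum(-1), axis=0), np.arange(n)] = 1.0
    for _ in range(max_iter):
        dist2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)          # (c, n): ||x_k - v_i||^2
        Y = rho * U - 2.0 * U * dist2 + 2.0 * t * U - t                 # gradient w.r.t. U, Eqn. (20)
        Z = rho * V - 2.0 * ((U ** 2)[:, :, None] * (V[:, None, :] - X[None, :, :])).sum(1)
        U_new = np.apply_along_axis(project_simplex, 0, Y / rho)        # Eqn. (21): columns onto simplices
        V_new = np.array([project_ball(z / rho, r) for z in Z])         # Eqn. (21): centers onto the ball
        done = np.linalg.norm(U_new - U) + np.linalg.norm(V_new - V) <= eps
        U, V = U_new, V_new
        if done:
            break
    return U, V
```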
4 An Interpretation of DCA: Why DCA Is Better than K-Means?
In the MSSC formulation (1), the variable U corresponds to the assignment of objects to clusters, while the variable V stands for the centers of the clusters. The computation of U^{l+1} at iteration l of DCA can be interpreted as the assignment of objects to the clusters with centers V^l_i, i = 1, ..., c (which are defined at iteration l − 1), while the calculation of V^{l+1} is nothing but the determination of the new cluster centers. There are actually some similarities between DCA and the K-means algorithm. However, DCA enjoys two main advantages that might explain why DCA is better than K-means.
i) Firstly, the sequence {(U^l, V^l)} in DCA is determined in such a way that the sequence of objective values of (1), say {f(U^l, V^l)} (the cluster cost at iteration l), is decreasing. This means that each iteration of DCA certainly improves the clustering.
ii) Secondly, in DCA, U^{l+1} and V^{l+1} are computed separately but simultaneously, in such a way that U^{l+1} as well as V^{l+1} depend on both U^l and V^l, while K-means determines them alternately and V^{l+1} depends on U^{l+1}. In other words, DCA determines the clusters and their centers at the same time.
5 Numerical Experiments
Our algorithms are coded in C++ and run on a Pentium 2.93 GHz with 1024 MB DDRAM. We have tested our code on 8 real data sets.
• "PAPILLON" is a well-known dataset called "jeux de papillon". Several articles on clustering have discussed this dataset (see Revue Modulad - Le Monde Des Utilisateurs de L'Analyse de Données, numéro 11, 1993, 7-44).
• "IRIS" is the classical IRIS dataset, which is perhaps the best-known dataset in the pattern recognition literature. The dataset consists of 3 classes of 50 instances each and 4 numeric attributes, where each class refers to a type of iris plant, namely Iris Setosa, Iris Versicolor, Iris Virginica. The first class is linearly separable from the other two, while the latter two are not linearly separable from each other. The measurements consist of the sepal and petal lengths and widths in centimeters.
• "GENE" is a Gene Expression dataset containing 384 genes that we obtained from http://faculty.washington.edu/kayee/cluster/.
• "VOTE" is the Congressional Votes dataset (Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc., Washington, D.C., 1985). It consists of the votes of each of the U.S. House of Representatives Congressmen on 16 key votes on different subjects (handicap, religion, immigration, army, education, . . . ). For each vote, three answers are possible: yes, nay, and unknown. The individuals are separated into two clusters: democrats (267) and republicans (168).
• "ADN" is the ADN dataset (ftp://genbank.bio.net) that consists of 3186 genes, described by 60 DNA sequence elements, called nucleotides or base pairs, with 4 possible modalities (A, C, G or T). These genes are divided into three different clusters: "intron → exon" or ie (sometimes called donors, 767 objects), "exon → intron" or ei (sometimes called acceptors, 765 objects), and "neither", noted n.
• "YEAST" is composed of 2945 genes in the space of 15 dimensions and can be downloaded from http://genomics.stanford.edu.
• "SERUM" is composed of 517 genes in the space of 12 dimensions; the entire dataset is available at http://genome-www.stanford.edu/serum.
• "HUMAN CANCER" is composed of 60 human cancer cell lines in the space of 768 dimensions and is available at http://discover.nci.nih.gov/nature2000/.

We used the following parameters in our code: the exact penalty parameter t is in the interval [10, 100] while the accuracy ε is equal to 10^{-6}. The starting point V^0 is randomly chosen in the ball C and U^0_{i,k} corresponds to the assignment of the point x_k to the closest cluster center v^0_i. In Table 1 we present the comparative numerical results provided by our algorithm and K-means, which is available on the web site http://www.fas.umonteral.ca/biol/legendre/. Here "iter" denotes the number of iterations. All CPU times are given in seconds.
Table 1. Comparative results between DCA and K-means. PWPO (in the first five problems, where the clustering is known a priori) means the percentage of well-placed objects. Jc (in the last three problems) denotes the so-termed "cluster cost", i.e., Jc := \sum_{i=1,...,n} \min_{k=1,...,c} \|x_i − v_k\|^2.
Real Data       n     p   c   K-means: iter  CPU    PWPO / Jc    DCA: iter  CPU    PWPO / Jc
PAPILLON        23    4   5   3              0.01   97%          10         0.002  100%
IRIS            150   4   3   4              0.81   92%          5          0.01   99.33%
GENE            384   17  5   24             0.73   75%          8          0.20   91.55%
VOTE            435   16  2   4              0.03   82%          3          0.01   98.34%
ADN             3186  60  3   15             1.95   79%          8          0.62   94.52%
YEAST           2945  15  16  156            6.36   43424.88     28         7.81   43032.01
SERUM           517   12  8   120            50.98  6199.56      39         4.50   6009.37
HUMAN CANCER    768   60  9   118            44.3   139696.02    65         16.03  129622.05
From the numerical experiments we observe that:
– DCA is better than K-means in both running time and quality of solutions.
– DCA is efficient for all datasets: for the PAPILLON dataset, DCA gives exactly the right clustering; for the IRIS dataset only one element is misclassified; likewise, for all other datasets the percentage of well-placed objects is large.
6 Conclusion

We have proposed a new and efficient approach based on DC programming and DCA for solving the MSSC problem. The hard combinatorial optimization MSSC model has been recast as a DC program in its elegant matrix formulation and with a nice DC decomposition, in order to make the computations in the resulting DCA simpler and much less expensive. It fortunately turns out that the corresponding DCA consists in computing, at each iteration, the projections of points onto a simplex and/or a ball, which are all given in explicit form. Preliminary numerical experiments on real-world datasets show the efficiency and the superiority of DCA with respect to K-means. Moreover, it is possible to further improve DCA (in both solution quality and running time) by investigating efficient procedures to obtain a good starting point for DCA. On the other hand, DCA may be interesting for a kernelized version of the MSSC model. Work in these directions is in progress.
References

1. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean Sum-of-squares Clustering. Cahiers du GERAD, G-2008-33 (2008)
2. Arora, S., Kannan, R.: Learning Mixtures of Arbitrary Gaussians. In: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 247–257 (2001)
3. Bradley, B.S., Mangasarian, O.L.: Feature Selection via Concave Minimization and Support Vector Machines. In: Shavlik, J. (ed.) Machine Learning Proceedings of the Fifteenth International Conference (ICML 1998), pp. 82–90. Morgan Kaufmann, San Francisco (1998)
4. Brusco, M.J.: A Repetitive Branch-and-bound Procedure for Minimum Within-cluster Sum of Squares Partitioning. Psychometrika 71, 347–363 (2006)
5. Dhilon, I.S., Korgan, J., Nicholas, C.: Feature Selection and Document Clustering. In: Berry, M.W. (ed.) A Comprehensive Survey of Text Mining, pp. 73–100. Springer, Heidelberg (2003)
6. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, Chichester (1972)
7. Feder, T., Greene, D.: Optimal Algorithms for Approximate Clustering. In: Proc. STOC (1988)
8. Fisher, D.: Knowledge Acquisition via Incremental Conceptual Clustering. Machine Learning 2, 139–172 (1987)
9. Forgy, E.: Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications. Biometrics, 21–768 (1965)
10. Jancey, R.C., Botany, J.: Multidimensional Group Analysis. Australian, 14–127 (1966)
11. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: a Review. ACM Comput. Surv. 31(3), 264–323 (1999)
12. Krause, N., Singer, Y.: Leveraging the Margin More Carefully. In: International Conference on Machine Learning ICML (2004)
13. Le, T.H.A.: Contribution à l'optimisation non convexe et l'optimisation globale: Théorie, Algorithmes et Applications. Habilitation à Diriger des Recherches, Université de Rouen (1997)
14. Le, T.H.A., Pham, D.T.: Solving a Class of Linearly Constrained Indefinite Quadratic Problems by DC Algorithms. Journal of Global Optimization 11, 253–285 (1997)
15. Le, T.H.A., Pham, D.T.: The DC (Difference of Convex Functions) Programming and DCA Revisited with DC Models of Real World Nonconvex Optimization Problems. Annals of Operations Research 133, 23–46 (2005)
16. Le, T.H.A., Pham, D.T., Huynh, V.N.: Exact Penalty in DC Programming. Technical Report, LMI, INSA-Rouen (2005)
17. Le, T.H.A., Belghiti, T., Pham, D.T.: A New Efficient Algorithm Based on DC Programming and DCA for Clustering. Journal of Global Optimization 37, 593–608 (2007)
18. Le, T.H.A., Le, H.M., Pham, D.T.: Optimization Based DC Programming and DCA for Hierarchical Clustering. European Journal of Operational Research (2006)
19. Le, T.H.A., Le, H.M., Nguyen, V.V., Pham, D.T.: A DC Programming Approach for Feature Selection in Support Vector Machines Learning. Journal of Advances in Data Analysis and Classification 2, 259–278 (2008)
20. Liu, Y., Shen, X., Doss, H.: Multicategory ψ-Learning and Support Vector Machine: Computational Tools. Journal of Computational and Graphical Statistics 14, 219–236 (2005)
21. Liu, Y., Shen, X.: Multicategory ψ-Learning. Journal of the American Statistical Association 101, 500–509 (2006)
22. Mangasarian, O.L.: Mathematical Programming in Data Mining. Data Mining and Knowledge Discovery 1, 183–201 (1997)
23. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
24. Merle, O.D., Hansen, P., Jaumard, B., Mladenović, N.: An Interior Point Algorithm for Minimum Sum of Squares Clustering. SIAM J. Sci. Comput. 21, 1485–1505 (2000)
25. Neumann, J., Schnörr, C., Steidl, G.: SVM-based Feature Selection by Direct Objective Minimisation. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 212–219. Springer, Heidelberg (2004)
26. Peng, J., Xiay, Y.: A Cutting Algorithm for the Minimum Sum-of-Squared Error Clustering. In: Proceedings of the SIAM International Data Mining Conference (2005)
27. Pham, D.T., Le, T.H.A.: DC Optimization Algorithms for Solving the Trust Region Subproblem. SIAM J. Optimization 8, 476–505 (1998)
28. Hansen, P., Jaumard, B.: Cluster Analysis and Mathematical Programming. Mathematical Programming 79, 191–215 (1997)
29. Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
30. Ronan, C., Fabian, S., Jason, W., Léon, B.: Trading Convexity for Scalability. In: International Conference on Machine Learning ICML (2006)
31. Shen, X., Tseng, G.C., Zhang, X., Wong, W.H.: ψ-Learning. Journal of the American Statistical Association 98, 724–734 (2003)
32. Sherali, H.D., Desai, J.: A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem. Journal of Global Optimization 32, 281–306 (2005)
33. Yuille, A.L., Rangarajan, A.: The Convex Concave Procedure (CCCP). In: Advances in Neural Information Processing Systems, vol. 14. MIT Press, Cambridge (2002)
An Effective Hybrid Algorithm Based on Simplex Search and Differential Evolution for Global Optimization

Ye Xu, Ling Wang, and Lingpo Li

Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing, 100084, P.R. China
{xuye05,llp03}@mails.tsinghua.edu.cn, [email protected]

Abstract. In this paper, an effective hybrid NM-DE algorithm is proposed for global optimization by merging the searching mechanisms of the Nelder-Mead (NM) simplex method and differential evolution (DE). First, a reasonable framework is proposed to hybridize the NM simplex-based geometric search and the DE-based evolutionary search. Second, the NM simplex search is modified to further improve the quality of the solutions obtained by DE. By interactively using these two searching approaches with different mechanisms, the searching behavior can be enriched and the exploration and exploitation abilities can be well balanced. Based on a set of benchmark functions, numerical simulation and statistical comparison are carried out. The comparative results show that the proposed hybrid algorithm outperforms some existing algorithms, including hybrid DE and hybrid NM algorithms, in terms of solution quality, convergence rate and robustness.

Keywords: global optimization, Nelder-Mead simplex search, differential evolution, hybrid algorithm.
1 Introduction

Global optimization has always gained extensive research and applications in science and a variety of engineering fields [1]. During the past two decades, evolutionary computation has been a hot topic in the fields of Operations Research and Computer Science and has become a promising tool for global optimization. As a comparatively new optimization algorithm, differential evolution (DE) has fewer control parameters than other evolutionary algorithms like the Genetic Algorithm (GA) [2], [3], [4]. DE performs its search under the guidance of the swarm intelligence produced by cooperation and competition between the individual vectors in the population. Also, the specific memory ability of DE makes it possible to track the current search situation dynamically and adjust the search strategy. Although it has some advantages in finding the global optimum, the local search ability of DE is weak and sometimes the searching process of DE may be trapped in local minima [5]. As a direct search method, the Nelder-Mead simplex search method [6] is easy to implement and has been widely applied to unconstrained optimization problems with low dimensions.
However, it has been shown that the performance of the NM simplex search depends on the initial simplex, and it is especially difficult to achieve good results for multidimensional problems [7], [8]. Recently, memetic algorithms that reasonably combine evolutionary algorithms with local search have shown good performance for both combinatorial and numerical optimization. In this paper, the NM simplex-based geometric search and the DE-based evolutionary search are hybridized within a reasonable framework. In particular, the NM simplex search is modified and incorporated into DE to enhance and balance the exploration and exploitation abilities. By merging these two approaches with different searching mechanisms, the hybrid algorithm achieves good performance in terms of solution quality, convergence rate and robustness. Numerical simulation and statistical comparison are carried out based on a set of benchmarks, which verifies the superiority of the proposed hybrid algorithm. The remainder of the paper is organized as follows: DE and the NM simplex method are briefly introduced in Section 2. In Section 3, the NM-DE hybrid algorithm is proposed. Computational results and comparisons are provided in Section 4. Finally, we end the paper with some conclusions in Section 5.
2 Introduction to DE and NM Simplex Method

2.1 Problem Statement

Generally, an unconstrained optimization problem is formulated as follows [9]:

min f(X), X = [x_1, ..., x_n], s.t. x_j ∈ [a_j, b_j], j = 1, 2, ..., n,   (1)

where f is the objective function, X is the decision vector consisting of n variables, and a_j and b_j are the lower and upper bounds of each decision variable, respectively.

2.2 Differential Evolution

Differential Evolution is a population-based stochastic optimization technique [10]. According to the comprehensive study in [11], DE outperforms many other optimization methods like GA in terms of convergence speed and robustness over some benchmark problems and real-world applications. While DE shares some similarities with other evolutionary algorithms, it differs significantly in the sense that distance and direction information from the current population is used to guide the search process [12]. The DE denoted as DE/rand/1/bin is adopted in this paper, whose searching power is accredited to parent choice (selection), the differential operator (mutation), discrete crossover (crossover) and greedy selection (decision) [13].

Selection: All the individuals in the population have the same chance to generate candidate individuals. For each individual X_i, three other individuals are randomly selected from the current population such that the four individuals are different from each other. Thus, a pool of four parents is formed.
Mutation: A mutated individual V_i is generated with the three individuals randomly chosen by Selection as follows:

V_i = X_{r1} + F(X_{r2} − X_{r3}),   (2)

where r1, r2, r3 (r1 ≠ r2 ≠ r3 ≠ i) are three distinct randomly selected indexes, and F is a positive real number that is usually less than 1.0.

Crossover: A trial individual T_i is formed by recombination of the elements of X_i and V_i one by one as follows:

t_{i,j} = v_{i,j} if r ≤ cr or j = sn, and t_{i,j} = x_{i,j} otherwise, j = 1, ..., n,   (3)

where t_{i,j} is the j-th element of the trial individual T_i, r is a uniform random number between 0 and 1, cr is a real number between 0 and 1 that controls the ratio of selection between the parent and the mutated vector, and sn is a randomly chosen index that ensures at least one element of V_i will be inherited.

Decision: Greedy selection between the trial vector T_i and X_i is applied, and the one with the better function value f_i is promoted to the next generation. The procedure of DE is illustrated in Fig. 1.

Step 1: Initialize the population randomly and find the best individual X^{best};
Step 2: For each individual X_i in the population:
  Step 2.1: Randomly select three indexes r1, r2, r3;
  Step 2.2: Get the mutation vector V_i by Eqn. (2);
  Step 2.3: Get the trial vector T_i by Eqn. (3);
  Step 2.4: Greedy selection between T_i and X_i.
Step 3: Go back to Step 2 if the stopping criterion is not satisfied, else stop.

Fig. 1. Procedure of DE
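As an illustration of Fig. 1 and Eqns. (2)-(3), here is a minimal NumPy sketch of DE/rand/1/bin (illustrative only; the bound handling by clipping and the parameter values are assumptions, not taken from the paper).

```python
import numpy as np

def de_rand_1_bin(f, bounds, pop_size=30, F=0.7, cr=0.2, max_gen=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    n = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, n))            # Step 1: random initialization
    fit = np.apply_along_axis(f, 1, pop)
    for _ in range(max_gen):
        for i in range(pop_size):                            # Step 2: evolve each individual
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])             # mutation, Eqn. (2)
            v = np.clip(v, lo, hi)
            mask = rng.random(n) <= cr
            mask[rng.integers(n)] = True                      # at least one element comes from v
            trial = np.where(mask, v, pop[i])                 # crossover, Eqn. (3)
            f_trial = f(trial)
            if f_trial <= fit[i]:                             # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = np.argmin(fit)
    return pop[best], fit[best]

# usage: minimize the sphere function in 10 dimensions
x_best, f_best = de_rand_1_bin(lambda x: float(np.sum(x * x)),
                               (np.full(10, -5.0), np.full(10, 5.0)))
```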
2.3 The Nelder-Mead Simplex Search Method

The Nelder-Mead simplex search method is a direct search method designed for unconstrained optimization without using gradient information [14]. The procedure of this method is to rescale the simplex based on the local behavior by using four basic operators: reflection, expansion, contraction and shrink. With these operators, the simplex can successfully improve itself and get closer to the optimum. The original NM simplex procedure is outlined in Fig. 2, where all the steps are described as follows.
Step 1: Randomly initialize n+1 individuals, and order them s.t. f(X_1) ≤ f(X_2) ≤ ... ≤ f(X_{n+1});
Step 2: Calculate the reflection point X_r;
  If f(X_1) ≤ f(X_r) < f(X_n), accept X_r and terminate;
  If f(X_r) < f(X_1), go to Step 3;
  If f(X_r) ≥ f(X_n), go to Step 4;
Step 3: Generate X_e;
  If f(X_e) ≤ f(X_r), accept X_e and terminate;
  If f(X_e) > f(X_r), accept X_r and terminate;
Step 4: If f(X_r) < f(X_{n+1}), generate X_c;
    If f(X_c) ≤ f(X_r), accept X_c and terminate; else, go to Step 5;
  If f(X_r) ≥ f(X_{n+1}), generate X_c;
    If f(X_c) ≤ f(X_{n+1}), accept X_c and terminate; else, go to Step 5;
Step 5: For each individual i except 1 in the population: evaluate V_i = X_1 + δ(X_i − X_1) and set X = V;

Fig. 2. Procedure of the NM simplex method
Initialization: To minimize a function with n variables, create n+1 vertex points randomly to form an initial n-dimensional simplex. Evaluate the function value at each vertex point and order the n+1 vertices to satisfy f(X_1) ≤ f(X_2) ≤ ... ≤ f(X_{n+1}).

Reflection: Calculate the reflection point X_r as follows:

X_r = \bar{X} + α(\bar{X} − X_{n+1}),   (4)

where \bar{X} = \sum_{i=1}^{n} X_i / n is the centroid of the n best points (all vertices except for X_{n+1}), and α (α > 0) is the reflection coefficient. If f(X_1) ≤ f(X_r) < f(X_n), then accept the reflected point X_r and terminate the iteration.

Expansion: If f(X_r) < f(X_1), then calculate the expansion point as follows:

X_e = \bar{X} + β(X_r − \bar{X}),   (5)

where β (β > 1) is the expansion coefficient. If f(X_e) ≤ f(X_r), then accept the expanded point X_e and terminate the iteration; otherwise accept X_r and terminate the iteration.

Contraction: If f(X_r) ≥ f(X_n), then perform a contraction between \bar{X} and the better of X_{n+1} and X_r.
(a) If f(X_r) < f(X_{n+1}), then perform outside contraction:

X_c = \bar{X} + γ(X_r − \bar{X}),   (6)

where γ (0 < γ < 1) is the contraction coefficient. If f(X_c) ≤ f(X_r), then accept X_c and terminate the iteration; otherwise go to the Shrink step.
(b) If f(X_r) ≥ f(X_{n+1}), then perform inside contraction:

X_c = \bar{X} − γ(\bar{X} − X_{n+1}).   (7)

If f(X_c) ≤ f(X_{n+1}), then accept X_c and terminate the iteration; otherwise go to the Shrink step.

Shrink: The shrink operator is performed as follows:

V_i = X_1 + δ(X_i − X_1),   i = 2, ..., n+1,   (8)

where δ (0 < δ < 1) is the shrinkage coefficient. The simplex obtained by shrink is denoted as V = {X_1, V_2, ..., V_{n+1}} for the next iteration.
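A minimal Python sketch of one iteration built from the operators (4)-(8) is given below (illustrative only; the coefficient values are assumptions, and the repeated function evaluations are kept for clarity).

```python
import numpy as np

def nm_iteration(f, simplex, alpha=1.0, beta=1.5, gamma=0.5, delta=0.5):
    """One Nelder-Mead iteration; 'simplex' is an (n+1, n) array of vertices."""
    order = np.argsort([f(x) for x in simplex])
    simplex = simplex[order]                                  # f(X1) <= ... <= f(X_{n+1})
    x_best, x_worst, x_second_worst = simplex[0], simplex[-1], simplex[-2]
    centroid = simplex[:-1].mean(axis=0)                      # centroid of the n best points
    x_r = centroid + alpha * (centroid - x_worst)             # reflection, Eqn. (4)
    if f(x_best) <= f(x_r) < f(x_second_worst):
        simplex[-1] = x_r
    elif f(x_r) < f(x_best):                                  # expansion, Eqn. (5)
        x_e = centroid + beta * (x_r - centroid)
        simplex[-1] = x_e if f(x_e) <= f(x_r) else x_r
    else:                                                     # contraction, Eqns. (6)-(7)
        if f(x_r) < f(x_worst):
            x_c = centroid + gamma * (x_r - centroid)         # outside contraction
            threshold = f(x_r)
        else:
            x_c = centroid - gamma * (centroid - x_worst)     # inside contraction
            threshold = f(x_worst)
        if f(x_c) <= threshold:
            simplex[-1] = x_c
        else:                                                 # shrink, Eqn. (8)
            simplex[1:] = simplex[0] + delta * (simplex[1:] - simplex[0])
    return simplex
```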
3 Hybrid NM-DE Algorithm

To fuse the searching behaviors of the NM and DE methods, we propose a hybrid search framework illustrated in Fig. 3. It can be seen from Fig. 3 that the NM simplex search and the DE-based evolutionary search are hybridized. During each generation, the population is improved both by differential evolution and by simplex search. The information of the best individuals is used to guide the local search of the algorithm. Thus, the advantages of these two methods can be utilized together. Moreover, the classic NM simplex search is modified to guide the search towards promising regions, which can enhance the local search ability.

Suppose the population size is P; the initial P points are randomly generated from the n-dimensional space. First, the P individuals are ranked from best to worst according to their objective values. Then, the top Q individuals are used to calculate the initial simplex centroid for the NM method. For the remaining P−Q individuals, each will perform an iteration of simplex search with the calculated centroid:

\bar{X} = \sum_{i=1}^{Q} X_i / Q.   (9)
When dealing with the shrink step, we shrink only the single point towards the best point until it reaches a better objective value, while keeping the positions of the other points fixed, instead of shrinking all simplex points as the original NM simplex method does. For example, for the point X_j, generate the shrink point as follows:

V_j = X_1 + δ(X_j − X_1).   (10)

If f(V_j) ≤ f(X_j), then accept V_j; otherwise, keep on shrinking until the termination condition is satisfied.
Fig. 3. Schematic representation of the NM-DE hybrid algorithm
After performing the modified NM simplex search method, P-Q new individuals are generated. Then the P-Q new individuals and the original Q individuals from the initial population are merged to form the initial population of DE-based evolutionary search. Via a generation of DE-based search, P new individuals are generated. After ranking these new individuals, the modified NM simplex search is performed once again. The above interactive search process based on simplex search and DE based search is repeated until a predefined stopping criterion is satisfied. By combining the NM simplex search and DE search, the exploration and exploitation searching abilities could be enhanced and balanced, which may result in a more effective optimization algorithm for global optimization.
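The following self-contained sketch illustrates this interaction (it is only a simplified reading of Fig. 3: the per-individual NM move is compressed to a reflection about the centroid of Eqn. (9) with the single-point shrink of Eqn. (10) as a fallback, rather than the full operator set of Fig. 2, and all parameter values are assumptions).

```python
import numpy as np

def nm_de(f, lo, hi, P=40, F=0.7, cr=0.2, alpha=1.0, delta=0.5, max_gen=300, seed=0):
    rng = np.random.default_rng(seed)
    n = lo.size
    Q = n                                                     # top-Q individuals define the centroid
    pop = rng.uniform(lo, hi, size=(P, n))
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        order = np.argsort(fit)
        pop, fit = pop[order], fit[order]                     # rank from best to worst
        centroid = pop[:Q].mean(axis=0)                       # Eqn. (9)
        for j in range(Q, P):                                 # modified NM move for the remaining P-Q
            x_r = np.clip(centroid + alpha * (centroid - pop[j]), lo, hi)
            fr = f(x_r)
            if fr < fit[j]:
                pop[j], fit[j] = x_r, fr
            else:                                             # single-point shrink, Eqn. (10)
                v = np.clip(pop[0] + delta * (pop[j] - pop[0]), lo, hi)
                fv = f(v)
                if fv < fit[j]:
                    pop[j], fit[j] = v, fv
        for i in range(P):                                    # one DE/rand/1/bin generation
            r1, r2, r3 = rng.choice([k for k in range(P) if k != i], 3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            mask = rng.random(n) <= cr
            mask[rng.integers(n)] = True
            trial = np.where(mask, v, pop[i])
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = np.argmin(fit)
    return pop[best], fit[best]

# usage: minimize the Rosenbrock function in 10 dimensions
rosen = lambda x: float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2))
print(nm_de(rosen, np.full(10, -5.0), np.full(10, 5.0)))
```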
4 Numerical Simulation and Comparisons

4.1 Experiment Setup

In this section, the proposed hybrid algorithm is tested on the ten benchmarks [13] shown in Table 1. These benchmarks are widely used to test and compare the performance of newly designed optimization algorithms. Moreover, to compare our algorithm with hybrid DE and hybrid NM methods, we compare NM-DE with the recent hybrid DEahcSPX algorithm and DE in [13] and NM-PSO in [6]. For the NM-DE algorithm, we set F=0.7, Cr=0.2, α=1, β=1.5, γ=0.5, δ=0.5, Q=n. As for the stopping criterion, we set the maximum number of function evaluations to 10000*n, which is the same as that in [13]. The experiments were carried out on an Intel Pentium Processor 1600 MHz with 2 GB RAM in the Visual Studio 2005 environment. For each problem, NM-DE is independently run 50 times. In addition, an accuracy level is set to 1e-6 for all the functions. That is, a test is viewed as a successful run if the deviation between the function value obtained by the algorithm and the theoretical optimal value is less than this level.
Table 1. Test functions [13]

Function number  Function name                         Optimal OBJ value
1                Sphere Function                       0
2                Rosenbrock's Function                 0
3                Ackley's Function                     0
4                Griewank's Function                   0
5                Rastrigin's Function                  0
6                Generalized Schwefel's Problem 2.26   0
7                Salomon's Function                    0
8                Whitely's Function                    0
9                Generalized Penalized Function 1      0
10               Generalized Penalized Function 2      0
4.2 Simulation Results and Comparisons

Firstly, we compare NM-DE (P=2n) with DE and the hybrid DEahcSPX. For all the test problems with 30 dimensions, the resulting average function values and variance values of DE, DEahcSPX and NM-DE are listed in Table 2. In addition, the average numbers of function evaluations and the numbers of successful runs (data in brackets) of these algorithms are summarized in Table 3.

Table 2. Comparison of DE, DEahcSPX, NM-DE (n = 30)
Function no.  DE [13]              DEahcSPX [13]        NM-DE
1             5.73E-17±2.03E-16    1.75E-31±4.99E-31    4.05E-299±0.00E+00
2             5.20E+01±8.56E+01    4.52E+00±1.55E+01    9.34E+00±9.44E+00
3             1.37E-09±1.32E-09    2.66E-15±0.00E+00    8.47E-15±2.45E-15
4             2.66E-03±5.73E-03    2.07E-03±5.89E-03    8.87E-04±6.73E-03
5             2.55E+01±8.14E+00    2.14E+01±1.23E+01    1.41E+01±5.58E+00
6             4.90E+02±2.34E+02    4.70E+02±2.96E+02    3.65E+03±7.74E+02
7             2.52E-01±4.78E-02    1.80E-01±4.08E-02    1.11E+00±1.91E-01
8             3.10E+02±1.07E+02    3.06E+02±1.10E+02    4.18E+02±7.06E+01
9             4.56E-02±1.31E-01    2.07E-02±8.46E-02    8.29E-03±2.84E-02
10            1.44E-01±7.19E-01    1.71E-31±5.35E-31    2.19E-04±1.55E-03
From Table 2, it can be seen that for most problems NM-DE outperforms DE and DEahcSPX in terms of average value and variance. So, it is concluded that the proposed NM-DE has better search quality and robustness.

Table 3. Comparison of DE, DEahcSPX, NM-DE in terms of average number of function evaluations and number of successful runs
Fun. no.  DE [13]          DEahcSPX [13]    NM-DE
1         148650.8 (50)    87027.4 (50)     8539.4 (50)
2         -                299913.0 (2)     74124.9 (40)
3         215456.1 (50)    129211.6 (50)    13574.7 (29)
4         190292.5 (38)    121579.2 (43)    9270.2 (36)
9         160955.2 (43)    96149.0 (46)     7634.3 (44)
10        156016.9 (48)    156016.9 (48)    7996.1 (42)
From Table 3, it can be found that NM-DE requires much less computational effort than DE and DEahcSPX, while the number of successful runs of NM-DE is close to that of DEahcSPX. So, it is concluded that NM-DE is more efficient than the comparative algorithms.

Secondly, we compare NM-DE with NM-PSO [6], which combines the NM simplex search with particle swarm optimization (PSO). Since only three high-dimensional functions were used to test NM-PSO in [6], here we compare our NM-DE with NM-PSO using those problems. The population size is set to 3n+1, the same as in [6]. The results are summarized in Table 4. From the table it can be clearly seen that the average performance of NM-DE is superior to that of NM-PSO. Compared with the pure PSO and NM methods, our NM-DE performs much better.

Table 4. Comparison of NM, PSO, NM-PSO, NM-DE
Function no.  Dimension  NM [6]     PSO [6]    NM-PSO [6]   NM-DE
2             10         2.429e-8   1013.251   3.378e-9     9.783e-21
1             30         726.704    4824.621   2.763e-11    4.516e-155
4             50         1.230      6.575      9.969e-12    1.184e-16
Finally, we test the effect of the population size on the performance of the algorithms with a fixed total number of function evaluations (300,000). From Table 5, it can be seen that with the growth of the population size, the performance of DE and DEahcSPX gradually gets worse, while the population size has little effect on the performance of NM-DE. That is, our NM-DE is much more robust with respect to the population size. All in all, the proposed hybrid algorithm is a potentially powerful tool for global optimization.

Table 5. Comparison of DE, DEahcSPX, NM-DE with different population sizes
PopSize=50
Fun. no.  DE [13]              DEahcSPX [13]        NM-DE
1         2.31E-02±1.92E-02    6.03E-09±6.86E-09    8.46E-307±0.00E+00
2         3.07E+02±4.81E+02    4.98E+01±6.22E+01    2.34E+00±1.06E+01
3         3.60E-02±1.82E-02    1.89E-05±1.19E-05    8.26E-15±2.03E-15
4         5.00E-02±6.40E-02    1.68E-03±4.25E-03    2.12E-03±5.05E-03
5         5.91E+01±2.65E+01    2.77E+01±1.31E+01    1.54E+01±4.46E+00
6         7.68E+02±8.94E+02    2.51E+02±1.79E+02    3.43E+03±6.65E+02
7         8.72E-01±1.59E-01    2.44E-01±5.06E-02    1.16E+00±2.36E-01
8         8.65E+02±1.96E+02    4.58E+02±7.56E+01    3.86E+02±8.39E+01
9         2.95E-04±1.82E-04    1.12E-09±2.98E-09    4.48E-28±1.64E-31
10        9.03E-03±2.03E-02    4.39E-04±2.20E-03    6.59E-04±2.64E-03

PopSize=100
Fun. no.  DE [13]              DEahcSPX [13]        NM-DE
1         3.75E+03±1.14E+03    3.11E+01±1.88E+01    1.58E-213±0.00E+00
2         4.03E+08±2.59E+08    1.89E+05±1.47E+05    2.06E+01±1.47E+01
3         1.36E+01±1.48E+00    3.23E+00±5.41E-01    8.12E-15±1.50E-15
4         3.75E+01±1.26E+01    1.29E+00±1.74E-01    3.45E-04±1.73E-03
5         2.63E+02±2.79E+01    1.64E+02±2.16E+01    1.24E+01±5.80E+00
6         6.56E+03±4.25E+02    6.30E+03±4.80E+02    3.43E+03±6.65E+02
7         5.97E+00±6.54E-01    1.20E+00±2.12E-01    8.30E-01±1.27E-01
8         1.29E+14±1.60E+14    3.16E+08±4.48E+08    4.34E+02±5.72E+01
9         6.94E+04±1.58E+05    2.62E+00±1.31E+00    6.22E-03±2.49E-02
10        6.60E+05±7.66E+05    4.85E+00±1.59E+00    6.60E-04±2.64E-03

PopSize=200
Fun. no.  DE [13]              DEahcSPX [13]        NM-DE
1         4.01E+04±6.26E+03    1.10E+03±2.98E+02    5.05E-121±2.44E-120
2         1.53E+10±4.32E+09    1.49E+07±7.82E+06    2.04E+01±8.49E+00
3         2.02E+01±2.20E-01    9.11E+00±7.81E-01    7.83E-15±1.41E-15
4         3.73E+02±6.03E+01    1.08E+01±2.02E+00    3.45E-04±1.73E-03
5         3.62E+02±2.12E+01    2.05E+02±1.85E+01    1.23E+01±6.05E+00
6         6.88E+03±2.55E+02    6.72E+03±3.24E+02    4.61E+03±6.73E+02
7         1.34E+01±8.41E-01    3.25E+00±4.55E-01    6.36E-01±9.85E-02
8         2.29E+16±1.16E+16    5.47E+10±6.17E+10    4.16E+02±5.40E+01
9         2.44E+07±7.58E+06    9.10E+00±2.42E+00    4.48E-28±1.55E-31
10        8.19E+07±1.99E+07    6.18E+01±6.30E+01    4.29E-28±2.59E-31

PopSize=300
Fun. no.  DE [13]              DEahcSPX [13]        NM-DE
1         1.96E+04±2.00E+03    6.93E+02±1.34E+02    5.55E-86±7.59E-86
2         3.97E+09±8.92E+08    5.35E+06±2.82E+06    2.25E+01±1.16E+01
3         1.79E+01±3.51E-09    7.23E+00±4.50E-01    7.19E-15±1.48E-15
4         1.79E+02±1.60E+01    7.26E+00±1.74E+00    6.40E-04±3.18E-03
5         2.75E+02±1.27E+01    2.03E+02±1.49E+01    1.30E+01±7.48E+00
6         6.87E+03±2.72E+02    6.80E+03±3.37E+02    4.41E+03±6.41E+02
7         1.52E+01±5.43E-01    3.59E+00±4.54E-01    5.32E-01±8.19E-02
8         2.96E+16±1.09E+16    1.83E+11±1.72E+11    4.28E+02±5.47E+01
9         3.71E+07±1.29E+07    1.09E+01±3.76E+00    4.48E-28±1.64E-31
10        1.03E+08±1.87E+07    3.42E+02±4.11E+02    4.29E-28±5.44E-43
5 Conclusion

In this paper, the NM simplex method and DE were reasonably hybridized for global optimization. With the local search via the modified NM method and the evolutionary search via DE, the NM-DE algorithm obtained better results than some existing hybrid DE and hybrid NM algorithms in terms of search quality, convergence rate and robustness. Future work is to develop the hybrid algorithm with an adaptive mechanism and to apply the algorithm to some engineering problems.
Acknowledgments

This research is partially supported by National Science Foundation of China (Grant No. 60774082, 70871065, 60834004) and the National 863 Program under the grant number 2007AA04Z155, as well as the Project sponsored by SRF for ROCS, SEM.
References

1. Nash, S.G., Sofer, A.: Linear and Nonlinear Programming. McGraw-Hill, New York (1996)
2. Huang, F.Z., Wang, L.: A Hybrid Differential Evolution with Double Populations for Constrained Optimization. In: IEEE CEC, pp. 18–25. IEEE Press, New York (2008)
3. Price, K., Storn, R.M., Lampinen, J.A.: Differential Evolution: A Practical Approach to Global Optimization. Springer, New York (2005)
4. Rahman, S., Tizhoosh, H.R., Salama, M.M.A.: Opposition-Based Differential Evolution. IEEE Transactions on Evolutionary Computation 12(1), 64–79 (2008)
5. Isaacs, A., Ray, T., Smith, W.: A Hybrid Evolutionary Algorithm with Simplex Local Search. In: IEEE CEC, pp. 1701–1708. IEEE Press, New York (2007)
6. Fan, S.K.S., Zahara, E.: A Hybrid Simplex Search and Particle Swarm Optimization for Unconstrained Optimization. European J. Operational Research 181, 527–548 (2007)
7. Nelder, J.A., Mead, R.: A Simplex Method for Function Minimization. Computer J. 7(4), 308–313 (1965)
8. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions. SIAM J. Optimization 9(1), 112–147 (1999)
9. Ali, M.M.: Differential Evolution With Preferential Crossover. European J. Operational Research 181, 1137–1147 (2007)
10. Storn, R., Price, K.: Differential Evolution - A Simple and Efficient Heuristic for Global Optimization Over Continuous Spaces. J. Global Optimization 11(4), 341–359 (1997)
11. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Real-Parameter Optimization with Differential Evolution. Evolutionary Computation 2(1), 506–513 (2005)
12. Omran, M.G.H., Engelbrecht, A.P., Salman, A.: Bare Bones Differential Evolution. European J. Operational Research 196(1), 128–139 (2009)
13. Noman, N., Iba, H.: Accelerating Differential Evolution Using an Adaptive Local Search. IEEE Transactions on Evolutionary Computation 12(1), 107–125 (2008)
14. Cohen, G., Ruch, P., Hilario, M.: Model Selection for Support Vector Classifiers via Direct Simplex Search. In: FLAIRS Conference, Florida, pp. 431–435 (2005)
Differential Evolution with Level Comparison for Constrained Optimization

Ling-po Li, Ling Wang, and Ye Xu

Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing, 100084, P.R. China
{llp03,xuye05}@mails.tsinghua.edu.cn, [email protected]

Abstract. The effectiveness of a constrained optimization algorithm depends on both the searching technique and the way constraints are handled. In this paper, a differential evolution (DE) with level comparison is put forward to solve constrained optimization problems. In particular, the α (comparison level) constrained method is adopted to handle constraints, while the DE-based evolutionary search is used to find promising solutions in the search space. In addition, the scale factor of the DE mutation is set to be a random number to vary the searching scale, and a certain percentage of the population is replaced with random individuals to enrich the diversity of the population and to avoid being trapped at local minima. Moreover, we let the level increase exponentially along with the searching process to stress feasibility of solutions at the later searching stage. Experiments and comparisons based on 13 well-known benchmarks demonstrate that the proposed algorithm outperforms or is competitive with some typical state-of-the-art algorithms in terms of quality and efficiency.

Keywords: differential evolution, level comparison, constrained optimization.
1 Introduction

Generally speaking, a constrained optimization problem is described as follows: find X to minimize f(X)

subject to:   g_i(X) ≤ 0, i = 1, 2, ..., m,
              h_j(X) = 0, j = 1, 2, ..., q,   (1)
              l_k ≤ x_k ≤ u_k, k = 1, 2, ..., d,

where X = [x_1, x_2, ..., x_d]^T denotes a d-dimensional solution vector, m denotes the number of inequality constraints, q denotes the number of equality constraints, and u_k and l_k are the upper and lower bounds of x_k, respectively.

During the past two decades, much attention has been devoted to solving constrained optimization problems via evolutionary algorithms (EAs) [1], [2]. For constrained optimization, it is known that the balance of searching between the objective and the constraints greatly affects the effectiveness of the algorithm. However, there is no
any theoretical metric criterion on this subject. Numerous methods have been proposed for handling constraints, such as the adaptive penalty approach based on genetic algorithms (GA) [3], co-evolutionary particle swarm optimization (PSO) [4], multi-objective optimization [5], comparison criteria [6], the stochastic-ranking (SR) based method [7], and the α constrained method [8]. As a population-based EA, differential evolution (DE) [9] is conceptually simple, easy to implement and quick to converge. So far, DE has been successfully applied to a variety of unconstrained optimization problems and real-world applications, such as filter design [10] and production scheduling [11]. Since the work by Lampinen [12], DE has gained increasing attention in the field of constrained optimization [13], [14], [15]. In this paper, a DE with level comparison is put forward for constrained problems. In particular, the α constrained method is adopted to handle constraints, while DE is used to perform the evolutionary search in the search space. In addition, some modifications are made to DE to enhance the global searching ability and to enrich the diversity of the population. Moreover, we let the level vary with the searching process to stress the feasibility of solutions to different extents. Experiments and comparisons based on 13 well-known benchmarks are carried out, and the results demonstrate that the proposed algorithm outperforms or is competitive with some effective state-of-the-art approaches in terms of quality and efficiency. The remaining content is organized as follows. DE and the α constrained method are briefly introduced in Section 2 and Section 3, respectively. In Section 4, the DE with level comparison is proposed. Simulation results and comparisons are presented in Section 5. Finally, we conclude the paper in Section 6.
2 Differential Evolution

Differential Evolution is a population-based stochastic optimization technique proposed for unconstrained problems [16]. According to the comprehensive study [9], DE outperforms many other EAs like GA in terms of convergence speed and robustness over common benchmark problems and real-world applications. Many variants of the classic DE have been proposed. The DE denoted as DE/rand/1/bin [16] is adopted in this paper, whose key operators are introduced as follows. The mutation operator is performed in the following way:

u_i = X_{r1} + F(X_{r2} − X_{r3}),   (2)

where u_i = [u_{i1}, u_{i2}, ..., u_{id}] is the trial individual, X_{r1}, X_{r2}, X_{r3} are three different individuals in the population, and F is a scale factor between 0 and 1. The crossover is performed as follows:

v_{ij} = u_{ij} if rand(j) ≤ cr or j = sn, and v_{ij} = x_{ij} otherwise, j = 1, 2, ..., d,   (3)

where cr is a real parameter between 0 and 1 that controls the crossover rate between the trial individual and the old individual, and sn is a random integer in {1, 2, ..., d} to
assure that at least one component of the new individual is inherited from the trial individual, and v_{ij} is the j-th dimension of the new individual V_i. After crossover, a greedy selection is performed as follows:

X_{g+1} = X_g if f(X_g) ≤ f(V_g), and X_{g+1} = V_g if f(V_g) < f(X_g),   (4)

where X_{g+1} denotes the individual in the next generation, X_g denotes the old individual and V_g denotes the new individual. The main procedure of DE is summarized as follows:

Step 1: Initialize a population of individuals with random solutions.
Step 2: Evaluate all individuals.
Step 3: For each individual:
  Step 3.1: Perform mutation;
  Step 3.2: Perform crossover;
  Step 3.3: Greedy selection.
Step 4: If a stopping criterion is met, then output the best solution found so far; otherwise go back to Step 3.
3 α Constrained Method

Inspired by fuzzy control rules, Takahama and Sakai [17] proposed the α constrained method by introducing the level comparison to compare search solutions based on a predefined satisfaction level. Using the level comparison, the constrained problem is converted to an unconstrained problem. Thus, unconstrained optimization techniques can be applied.

3.1 Satisfaction Level of Constraints
The satisfaction level of constraints is used to indicate how well a solution satisfies the constraints, and is defined as follows:

μ(x) = 1 if g_i(x) ≤ 0 and h_j(x) = 0 for all i, j;   0 ≤ μ(x) ≤ 1 otherwise.   (5)
To define a satisfaction level, the level of each constraint should be defined and combined. For example, each constraint in Eqn. (1) can be transformed into one of the following satisfaction levels defined by piecewise linear functions on g i and h j .
μ_{g_i}(X) = 1 if g_i(X) ≤ 0;   1 − \frac{g_i(X)}{b_i} if 0 ≤ g_i(X) ≤ b_i;   0 otherwise,   (6)

μ_{h_j}(X) = 1 − \frac{|h_j(X)|}{b_j} if |h_j(X)| ≤ b_j;   0 otherwise,   (7)
where bi and b j are proper positive fixed numbers. The satisfaction level of the solution is obtained by combining the satisfaction levels of all constraints. In this paper, the levels are combined as follows:
μ(X) = \min_{i,j} \{ μ_{g_i}(X), μ_{h_j}(X) \}.   (8)
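A small sketch of how the satisfaction level (5)-(8) can be evaluated is given below (illustrative only; g_list, h_list, b_g and b_h are hypothetical containers for the constraint functions and the fixed numbers b_i, b_j).

```python
import numpy as np

def satisfaction_level(x, g_list, h_list, b_g, b_h):
    levels = []
    for g, b in zip(g_list, b_g):                     # inequality constraints, Eqn. (6)
        val = g(x)
        levels.append(1.0 if val <= 0 else max(0.0, 1.0 - val / b))
    for h, b in zip(h_list, b_h):                     # equality constraints, Eqn. (7)
        levels.append(max(0.0, 1.0 - abs(h(x)) / b))
    return min(levels) if levels else 1.0             # combination by the min operator, Eqn. (8)

# usage: one inequality g(x) = x1 + x2 - 1 <= 0 and one equality h(x) = x1 - x2 = 0
mu = satisfaction_level(np.array([0.7, 0.5]),
                        [lambda x: x[0] + x[1] - 1.0], [lambda x: x[0] - x[1]],
                        b_g=[1.0], b_h=[1.0])
print(mu)   # both constraint levels equal 0.8 here, so mu = 0.8
```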
3.2 The α Level Comparison
The α level comparison is defined as an order relationship on the set of solutions to be used in the tournament selection, in the spirit of Deb's rules [6]. If the satisfaction level of a solution is less than 1, it is viewed as infeasible. The α level comparisons are defined by a lexicographic order in which μ(X) precedes f(X), because feasibility is more important than the minimization of the objective value. Let f_1, f_2 and μ_1, μ_2 be the function values and the satisfaction levels of two solutions X_1, X_2. Then, for any α satisfying 0 ≤ α ≤ 1, the α level comparison ≤_α between (f_1, μ_1) and (f_2, μ_2) is defined on this lexicographic order.
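The remainder of this definition is not reproduced here; the sketch below encodes the α level comparison in the form it is usually stated for the α constrained method (this exact formulation is our assumption, not text taken from this paper): objective values are compared when both satisfaction levels reach α (or are equal), and satisfaction levels are compared otherwise.

```python
def leq_alpha(f1, mu1, f2, mu2, alpha):
    """Return True if (f1, mu1) precedes (f2, mu2) under the alpha level comparison (assumed standard form)."""
    if (mu1 >= alpha and mu2 >= alpha) or (mu1 == mu2):
        return f1 <= f2            # both solutions are "alpha-feasible": compare objectives
    return mu1 >= mu2              # otherwise the more feasible solution wins

# with alpha = 1 this reduces to Deb's feasibility rules; with alpha = 0 it is a plain
# comparison of objective values.
print(leq_alpha(3.0, 0.9, 2.0, 0.4, alpha=0.8))   # True: only the first solution is above the level
```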
Fig. 1. The flowchart of αDE
As shown in Fig. 1, αDE has a procedure similar to that of traditional DE. In particular, a population is initialized with random individuals, and the differential mutation and crossover operators are almost the same as those of traditional DE. To diversify the mutation operator, the scale factor is set as a random value between 0.2 and 0.9, which is different from the classic DE with a fixed factor. The selection in αDE is based on the α level comparison. To take advantage of the information of infeasible solutions at the earlier searching stage while stressing feasibility at the later stage, the value of α is dynamically updated by Eqn. (11). In addition, the level is adjusted with period T_p from an initial value.
α(g) = (max(μ) + average(μ))/2 if g = 0;   (1 − β)α(g − 1) + β if g % T_p = 0 and g < Gmax/2;   α(g − 1) if g % T_p ≠ 0 and g < Gmax/2;   1 if g ≥ Gmax/2,   (11)

where g denotes the generation number, Gmax is the maximum generation number, β is used to control the increasing rate of α, and % denotes the modulus operator. In order to avoid converging to infeasible local optima when the value of α is less than 1, a few individuals of the population are replaced with randomly generated individuals after certain generations. In our procedure, the population is divided into two parts. The first part contains feasible individuals, from which the individuals with the largest objective values are eliminated. The second part contains infeasible individuals, from which the individuals with the lowest satisfaction levels are eliminated. In the following simulation, 1/3 of the solutions in each part are eliminated.
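A small sketch of the control rule (11) is given below (illustrative only; mu0 stands for the satisfaction levels of the initial population, and the default parameter values simply mirror the settings reported in Section 5).

```python
def update_alpha(alpha_prev, g, mu0, Tp=5, beta=0.05, Gmax=4500):
    if g == 0:
        return (max(mu0) + sum(mu0) / len(mu0)) / 2.0      # (max + average) / 2 at the start
    if g >= Gmax // 2:
        return 1.0                                          # only fully feasible solutions count later on
    if g % Tp == 0:
        return (1.0 - beta) * alpha_prev + beta             # raise the level every Tp generations
    return alpha_prev

# example: starting level from the initial satisfaction levels, then a few updates
alpha = update_alpha(None, 0, mu0=[0.2, 0.5, 0.9])
for g in range(1, 16):
    alpha = update_alpha(alpha, g, mu0=None)
print(round(alpha, 4))
```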
5 Experiments and Comparisons

In order to verify the effectiveness and efficiency of αDE, we carry out experiments based on the 13 well-known benchmarks for constrained optimization [18]. These benchmarks are widely used to test and compare the performance of newly designed algorithms. In our experiments, we set the population size N = 50, Gmax = 4500 (i.e., the total number of function evaluations is 225,000), cr = 0.9 as commonly used in the literature, T_p = 5, and β = 0.05. The fixed numbers b_i, b_j used to define the α level of each constraint are set as the median value of the initial constraint violations. Moreover, this value is limited to 1e4 if it is too large, and it is set to the length of the search space if it is 0. Simulation is carried out in MATLAB 2008b on an AMD Athlon 64 Processor 2800+ 1.81 GHz with 1 GB RAM. Each problem is independently run 30 times.

5.1 Performances of αDE
The statistical results of α DE are summarized in Table 1, which include the best, median, mean, worst, and standard deviation of the objective values. Table 1. Performances of α DE
f     Optimal      best         median       mean         worst        std
g01   -15.000      -15.00000    -15.00000    -15.00000    -15.00000    6.3364E-15
g02   0.803619     0.803619     0.785274     0.7785236    0.694402     2.5319E-02
g03   -1.000       -1.0000      -1.0000      -1.0000      -1.0000      3.3599E-07
g04   -30665.539   -30665.539   -30665.539   -30665.539   -30665.539   2.0108E-11
g05   5126.498     5126.4981    5126.4981    5126.4981    5126.4981    1.8736E-07
g06   -6961.814    -6961.814    -6961.814    -6961.814    -6961.814    4.9548E-11
g07   24.3062      24.3062      24.3062      24.3062      24.3062      6.7723E-06
g08   -0.095825    -0.095825    -0.095825    -0.095774    -0.094043    3.2510E-04
g09   680.630      680.6301     680.6301     680.6301     680.6301     2.8946E-13
g10   7049.248     7049.2480    7049.2480    7049.2480    7049.2480    1.1030E-05
g11   0.7500       0.7500       0.7500       0.7500       0.7500       1.9259E-11
g12   -1.000       -1.0000      -1.0000      -1.0000      -1.0000      0.0000E+00
g13   0.053950     0.05395      0.05395      0.05395      0.05395      1.6332E-11
It can be seen from the table that the best solution is almost identical to the optimal solution for every problem, even for g13, which contains three equality constraints and is regarded as very hard to optimize. For g01, g04, g06, g07, g09, g10 and g12, the optimal solutions can be found consistently in all 30 runs. For g03, g05, g11 and g13, which have equality constraints, the results obtained are very close to the optimal values. For g02, the optimal solution could not be consistently found since there are many local optima with high peaks near the global optimum. For g08, we find that the optimal solution can be found in 29 out of 30 runs.

5.2 Comparison with Other Approaches
Next, we compare αDE with some typical state-of-the-art approaches, including the Improved Stochastic Ranking (ISR) [18], the Simple Multimember Evolution Strategy (SMES) [19], the Diversity DE (DDE) [20], DE with Dynamic Stochastic Selection (MDESS) [11] and the α simplex [8]. The main parameter settings of the six approaches are listed in Table 2. Note that the equality constraints should be converted to inequality constraints with a certain tolerance when using ISR, SMES, DDE and MDESS, while the conversion is not needed when using αDE and the α simplex.

Table 2. The main parameters of the algorithms for comparison

Approach    Total evaluation times  Run times  Tolerance  Population size
ISR         350,000                 100        0.0001     400
SMES        240,000                 30         0.0004     300
DDE         225,000                 100        0.0001     450
MDESS       225,000                 100        0.0001     250
α simplex   290,000~330,000         30         0          90
αDE         225,000                 30         0          50
The results of all the algorithms are listed in Table 3. From Table 3, it is clear that αDE outperforms or is competitive with the other five algorithms in terms of quality and efficiency, except for g08, where αDE was trapped at a local optimum in 1 out of the 30 runs. All six algorithms can find the optima consistently for g01, g04, g06, g07, g09, g10 and g12. As for g02, αDE was able to find the same optimum as the other five algorithms, whereas the worst result of αDE is the worst among the six algorithms. However, the "mean" of αDE is better than that of ISR, SMES and
α simplex. For problems g03, g05 and g11, ISR, MDESS and the α simplex found better results than the optima because of using the tolerance to treat the equality constraints; SMES could not find the optima consistently, whereas αDE was capable of finding the real optima consistently. With respect to g13, only αDE can find the optimum consistently; ISR, SMES, DDE and the α simplex were trapped in local optima, and MDESS could only find the optimum consistently after relaxing the equality constraints. As for the computational costs, the maximum number of function evaluations (NFE) of our algorithm is 225,000, whereas those of ISR, SMES, DDE, MDESS and the α simplex are 350,000, 240,000, 225,000, 225,000 and 330,000, respectively. The computational cost of our algorithm is the smallest. Thus, our algorithm is a potentially promising tool for constrained optimization, especially for problems with equality constraints.

Table 3. Comparisons between different algorithms
f g01
g02
g03
g04
g05
g06
g07
g08
Sta best mean worst std best mean worst std best mean worst std best mean worst std best mean worst std best mean worst std best mean worst std best mean worst std
ISR[18]
SMES[19]
DDE[20]
MDESS[13] α simplex[8]
-15.000 -15.000 -15.000 5.8E-14 0.803619 0.782715 0.723591 2.2E-02 1.001 1.001 1.001 8.2E-09 -30665.539 -30665.539
-15.000 -15.000 -15.000 0 0.803619 0.785238 0.751322 1.7E-02 1.000 1.000 1.000 2.1E-04
-15.000 -15.000 -15.000 1.0E-09 0.803619 0.798079 0.751742 1.1E-02 1.000 1.000 1.000 0 -30665.539
-15.000 -15.000 -15.000 1.3E-10 0.803619 0.786970 0.728531 1.5E-02 1.0005 1.0005 1.0005 1.9E-08 -30665.539
-15.000 -15.000 -15.000 6.4E-06 0.803619 0.7841868 0.7542585 1.3E-02 1.0005 1.0005 1.0005 8.5E-14 -30665.539
α DE -15.000 -15.000 -15.000 6.3E-15 0.803619 0.785274 0.694402 2.5E-02 1.000 1.000 1.000 3.36E-07 -30665.539
-30665.539 -30665.539
-30665.539 -30665.539
-30665.539 -30665.539
-30665.539 -30665.539
0 5126.497 5126.497 5126.497 0 6961.814 6961.814 6961.814 0 24.306 24.306 24.306 8.22E-09 0.095825 0.095825 0.095825 0
2.7E-11 5126.497 5126.497 5126.497 0 6961.814 6961.814 6961.814 0 24.306 24.306 24.306 7.5E-07 0.095825 0.095825 0.095825 4.0E-17
4.2E-11 5126.497 5126.497 5126.497 3.5E-11 6961.814 6961.814 6961.814 1.3E-10 24.306 24.306 24.307 1.3E-04 0.095825 0.095825 0.095825 3.8E-13
2.0E-11 5126.498 5126.498 5126.498 1.9E-07 6961.814 6961.814 6961.814 5.0E-11 24.306 24.306 24.306 6.8E-06 0.095825 0.095774 0.094043 3.3E-04
-30665.539
-30665.539 -30665.539 -30665.539
1.1E-11 5126.497 5126.497 5126.497 7.2E-13 6961.814 6961.814 6961.814 1.9E-12 24.306 24.306 24.306 6.3E-05 0.095825 0.095825 0.095825 2.7E-17
0 5126.599 5174.492 5304.167 5.0E+01 6961.814 6961.284 6952.482 1.9E+00 24.327 24.475 24.843 1.3E-01 0.095825 0.095825 0.095825 0
Differential Evolution with Level Comparison for Constrained Optimization
359
Table 3. (continued)
f g09
g10
g11
g12
g13
Sta best mean worst std best mean worst std best mean worst std best mean worst std best mean worst std
ISR[18]
SMES[19]
DDE[20]
MDESS[13] α simplex[8]
α DE
680.630 680.630 680.630 3.2E-13 7049.248 7049.248 7049.248 3.2E-03 0.7500 0.7500 0.7500 1.1E-16 1.000 1.000 1.000 1.2E-09 0.053942 0.06677 0.438803 7.0E-02
680.632 680.643 680.719 1.6E-02 7049.248 7049.248 7049.248 1.4E-02 0.7500 0.7500 0.7500 1.5E-04 1.000 1.000 1.000 0 0.053986 0.166385 0.468294 1.8E-01
680.630 680.630 680.630 0 7049.248 7049.248 7049.248 4.45E-02 0.7500 0.7500 0.7500 0 1.000 1.000 1.000 0 0.053941 0.069336 0.438803 7.58E-02
680.630 680.630 680.630 2.9E-13 7049.248 7049.248 7049.248 1.4E-03 0.7499 0.7499 0.7499 0 1.000 1.000 1.000 0 0.053942 0.053942 0.053942 1.0E-13
680.630 680.630 680.630 2.9E-13 7049.248 7049.248 7049.248 1.1E-05 0.7500 0.7500 0.7500 1.9E-11 1.000 1.000 1.000 0 0.53950 0.53950 0.53950 1.6E-11
680.630 680.630 680.630 2.9E-10 7049.248 7049.248 7049.248 4.7E-06 0.7499 0.7499 0.7499 4.9E-16 1.000 1.000 1.000 3.9E-10 0.053942 0.066770 0.438803 6.9E-02
6 Conclusion In this paper, level comparison was combined with differential evolution to develop an approach for constrained optimization. To the best of our knowledge, this was the first report to incorporate α constrained method into DE for constrained optimization. Simulation results based on some well-known benchmarks and comparisons with some typical state-of-art approaches demonstrated the effectiveness and efficiency of our algorithm. The future work is to further improve the algorithm by incorporating diversity maintains and adaptive mechanism and to apply the algorithm for some constrained engineering design problems.
Acknowledgments This research is partially supported by National Science Foundation of China (Grant No. 60774082, 70871065, 60834004) and the National 863 Program under the grant number 2007AA04Z155 as well as the Project-sponsored by SRF for ROCS, SEM.
References 1. Wang, L.: Intelligent Optimization Algorithms with Application. Tsinghua University & Springer Press, Beijing (2001) 2. Coello, C.A.C.: Theoretical and Numerical Constraint-Handling Techniques Used with Evolutionary Algorithms: A Survey of the State of the Art. Comput. Methods Appl. Mech. Eng. 191, 1245–1287 (2002)
360
L.-p. Li, L. Wang, and Y. Xu
3. Rasheed, K.: An Adaptive Penalty Approach for Constrained Genetic-Algorithm Optimization. In: Genetic Programming 1998. Proceedings of the Third Annual Conference, Madison, WI, USA, pp. 584–590 (1998) 4. He, Q., Wang, L.: An Effective Co-evolutionary Particle Swarm Optimization for Constrained Engineering Design Problems. Eng. Appl. Artif. Intell. 20, 89–99 (2007) 5. Cai, Z., Wang, Y.: A Multiobjective Optimization-Based Evolutionary Algorithm for Constrained Optimization. IEEE Trans. Evol. Comput. 10, 658–675 (2006) 6. Deb, K.: An Efficient Constraint Handling Method for Genetic Algorithms. Comput. Methods Appl. Mech. Eng. 186, 311–338 (2000) 7. Runarsson, T.P., Yao, X.: Stochastic Ranking for Constrained Evolutionary Optimization. IEEE Trans. Evol. Comput. 4, 284–294 (2000) 8. Takahama, T., Sakai, S.: Constrained Optimization by Applying the Alpha Constrained Method to the Nonlinear Simplex Method with Mutations. IEEE Trans. Evol. Comput. 9, 437–451 (2005) 9. Price, K., Storn, R.M., Lampinen, J.A.: Differential Evolution: A Practical Approach to Global Optimization (Natural Computing Series). Springer, New York (2005) 10. Storn, R.: Designing Nonstandard Filters with Differential Evolution. IEEE Signal Process. Mag. 22, 103–106 (2005) 11. Qian, B., Wang, L., Huang, D., Wang, X.: Multi-objective Flow Shop Scheduling Using Differential Evolution. LNCIS, vol. 345, pp. 1125–1136 (2006) 12. Lampinen, J.: A Constraint Handling Approach for the Differential Evolution Algorithm. In: Proceedings 2002 IEEE Congress on Evolutionary Computation, Honolulu, Hawaii, pp. 1468–1473 (2002) 13. Zhang, M., Luo, W., Wang, X.: Differential Evolution with Dynamic Stochastic Selection for Constrained Optimization. Inf. Sci., 3043–3074 (2008) 14. Huang, F.Z., Wang, L., He, Q.: An Effective Co-evolutionary Differential Evolution for Constrained Optimization. Appl. Math. Comput. 186, 340–356 (2007) 15. Becerra, R.L., Coello, C.A.C.: Cultured Differential Evolution for Constrained Optimization. Comput. Meth. Appl. Mech. Eng. 195, 4303–4322 (2006) 16. Storn, R., Price, K.: Differential Evolution - A Simple and Sfficient Heuristic for Global Optimization Over Continuous Spaces. J. Glob. Optim. 11, 341–359 (1997) 17. Takahama, T., Sakai, S.: Tuning Fuzzy Control Rules by Alpha Constrained Method Which Solves Constrained Nonlinear Optimization Problems. Trans. Inst. Elect. Inf. Commun. Eng. 82-A, 658–668 (1999) 18. Runarsson, T.P., Yao, X.: Search Biases in Constrained Evolutionary Optimization. IEEE Trans. Syst., Man, Cybern. C, Appl. Rev. 35, 233–243 (2005) 19. Mezura-Montes, E., Coello, C.A.C.: A Simple Multimembered Evolution Strategy to Solve Constrained Optimization Problems. IEEE Trans. Evol. Comput. 9, 1–17 (2005) 20. Mezura-Montes, E., Velazquez-Reyes, J., Coello, C.A.C.: Promising Infeasibility and Multiple Offspring Incorporated to Differential Evolution for Constrained Optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005), New York, pp. 225–232 (2005)
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization Nan Wang, Lin Wang, Yanlong Bu, Guozhong Zhang, and Lincheng Shen Mechatronics and Automation School of National University of Defense Technology, 410073, Changsha, China
[email protected] Abstract. This paper proposes a collaborative optimization based method for tactical aircraft pop-up attack planning. First, the planning problem is described by four laws: delivery maneuver, ballistics, survivability and detection. Then the collaborative optimization (CO) framework is introduced to model the complex coupled problem, in which the problem is divided into two levels: system level optimization and laws level optimization. Following the planning procedure of CO, the genetic algorithm combined with response surface method is used to solve the above optimization problem. Finally, the proposed method is validated through experiments and appears to be effective and feasible to solve the pop-up attack planning problem. Keywords: pop-up attack, weapon delivery, planning, law, collaborative optimization, response surface.
1 Introduction Nowadays air-to-ground attack is playing a more and more important role in modern wars. Given current surface-to-air threat capabilities, pop-up attack (Fig. 1) is widely used by pilots to engage well-protected targets. Before executing, the attack profile should be deliberately planned to ensure the survivability of the plane and the success of engagement, which includes determination and evaluation of the attack parameters (ingress height, speed, dive angle, release height, etc). To solve the pop-up attack planning (PAP) problem, two difficulties must be faced. First, the design space, formed by the attack parameters, is large and heterogeneous. There are more than 20 variables to be determined and evaluated, some of which are continuous, and some are discrete, with different bounds. Second, the analysis and evaluation is performed by different laws: the delivery maneuver law, the ballistics law, the survivability law and the sensor detection law. Each law computes its own constrains and objects and is coupled with each other, making the total problem a complex multi-objective, multidisciplinary optimization problem. In this paper, the collaborative optimization (CO) method combined with evolutionary optimization techniques is introduced to solve the pop-up attack planning problem. Detailed algorithms and experimental results are also presented. D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 361–370, 2009. © Springer-Verlag Berlin Heidelberg 2009
362
N. Wang et al.
Fig. 1. Typical profile of pop-up attack
2 Problem Formulation
①
As described above, the PAP problem contains four laws (Fig. 2): delivery maneuver; ballistics; survivability; sensor detection. Table 1 lists the design variables and their bounds in the problem. Table 2 lists the state variables and their relations with each law. When the laws are considered as disciplines, the pop-up attack planning is inherently a multidisciplinary process. Traditionally multidisciplinary optimization (MDO) methods have focused on disciplines within the field of engineering analysis, such as aerodynamics, solid mechanics and so on. However, in planning domain, similar problem encounters, and may become a potential application domain to be explored.
②
③
④
all design variables
Delivery Maneuver Law
ٛIPǃPUPǃAPǃRP
ٛih
Survivability Law
ٛrs ٛbr ٛdaǃra
Ballistics Law
ballistics accuracy
threat exposure
Optimization ٛisǃiaǃipd
Detection Law
detection performance
Fig. 2. Structure of the pop-up planning problem, where IP, PUP, AP and RP stand for the key positions in the attack profile
2.1 Delivery Maneuver Law In the delivery maneuver law, the attack profile is calculated based on the attack parameters, which are treated as design variables, such as is, ia, ipd, da, etc. The output
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization
363
Table 1. List of design variables in PAP problem
is ia ipd da lf tt ra ah tu
Name (Unit) ingress speed (KTAS) ingress altitude (ft) IP-PUP distance (nm) dive angle ( ) load factor (G) tracking time (s) release altitude (ft) attack heading ( ) turn
°
°
Bound [300, 550] [150, 500] [1, 5] [10, 45] {3, 4, 5} [1, 10] [2000, 8000] [0, 360) {left, right}
Table 2. List of state variables in PAP problem
IP PUP AP RP tet sp rs ba br ih dp
Name (Unit) initial point position pop-up point position apex point position release point position threat exposure time (s) survivability (%) release speed (m/s) ballistic accuracy (%) bomb range (ft) ingress heading ( ) detection probability (%)
°
Output From DML DML DML DML SL SL DML BL BL DML DL
Input to SL SL SL SL
BL DML DL
profile contains variables used by other laws, which are treated as state variables, such as rs, ih, etc. The calculation is based on the pop-up formulas in [1]. 2.2 Ballistic Law The ballistic law analyzes the release parameters (da, ra, rs) and calculates the bomb range br as well as the ballistic accuracy ba. The aod, which is input to the delivery maneuver law to calculate the RP position and ra, is calculated through simulation of the delivered bomb trajectory using (1), where v is the value of bomb velocity, is the obliquity angle of bomb axis, x and y are the horizontal and vertical distance traveled by the bomb.
θ
⎧ dv ⎪⎪ dt = g sin θ − aR ⎨ ⎪ dθ = g cos θ / v ⎪⎩ dt
ba
⎧ dx ⎪⎪ dt = v cos θ ⎨ ⎪ dy = v sin θ ⎪⎩ dt
(1)
The ballistic accuracy is attained by looking up the statistical table of typical bombing errors under different release parameters for special bomb types and delivery types, and is one of the objects to be optimized (minimized) in the PAP problem.
364
N. Wang et al.
2.3 Survivability Law The survivability law takes the IP, PUP, AP, RP position and altitude into calculation and outputs the tet. In the survivability law, threats are modeled as 3-D envelopes. As the threats in the target area are mainly A-A guns and short range missiles, detection and engagement areas are no distinguished. The tet is attained by analyzing the IP-PUP, PUP-AP, AP-RP segment respectively. The average speed vi in each segment is calculated and is considered constant through the analysis. In the analyzing procedure, dichotomy is used to calculate the threat exposure trajectory length Li in each segment. Then the tet is calculated as follows, which is another object to be minimized. tet = ∑ Li / vi
(2)
2.4 Detection Law Before carrying out the delivery maneuver, the target must be found and locked, and these are done in the ingress phase. In the detection law, the detection performance (dp, rt) during the ingress is calculated using sensor detection and observer search models. For EO sensors, the resolvable cycles across the target is used to evaluate the imaging quality in slope range R, which is calculated as follows, where Lb is the resolving line number, and Se is the target equivalent dimension, fc is the camera focus, bc is the imaging surface height. The line-of-sight between the plane and the target should be maintained during the detection course, otherwise the detection is considered to be failed.
δ
δ = Lb × Se × f c /(bc × R)
(3)
For different targets, δM represents the minimum resolvable cycles needed to find the target with 50%, which is the function of target sizes and orientations. The instantaneous probability of detection is the function of δ and δM, as shown in (4). And the ensemble detection probability PN by time step N can be calculated by (5). Then dp can be calculated based on the ingress time and the expected detection probability, and dp-1 is the third object to be minimized. Pi = f (δ / δ M ) N
PN = 1 − ∏ (1 − Pi )
(4) (5)
i =1
3 Methodology The structure of the PAP problem makes it difficult to be solved by conventional planning methods. First, the design space is large, highly non-linear and contains different variable types, both continuous and discrete. And due to the complex battle environment, local minima are inevitable. Second, the problem involves complex analysis and evaluation procedures carried out by different laws, and laws are coupled
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization
365
with each other. This leads to the time-consuming analysis and iteration in planning procedures. Inherently, the PAP problem is a complex design problem with a set of design variables to be determined. If the laws are considered as disciplines, the PAP problem is inherently a MDO problem. Therefore, the collaborative optimization (CO) method is adopted to determine the optimal solution of this complicated problem. The collaborative optimization method is first proposed by Kroo et al., [3] and Balling and Sobieszczanski-Sobieski [4], and has been successfully applied to a number of different MDO problems. CO is a two level optimization method specifically created for largescale distributed-analysis applications. The basic architecture of CO is made up by two-level optimizers (Fig. 3), which facilitates concurrent optimization at the discipline design level. In CO, variables are divided into shared design variables xsh, local variables xi, state variables yi and auxiliary variables (xaux)ij. Auxiliary variables are introduced as additional design variables to replace the coupling variables (yij) output from disciplinary i and input to disciplinary j, so that the analyses in disciplines i and j can be executed concurrently. Interdisciplinary compatibility constraints d are added to ensure that consistency such that (xaux)ij=yij. The system level optimizer attempts to minimize the design objectives function F while satisfying all the compatibility constrains d. The system level design variables xsys consist of not only the variables xsh but also the auxiliary variables xaux, which are specified by the system level optimizer and sent down to subspaces as targets to be matched. Each subspace, as a local optimizer, operates on its local design variables xi with the goal of matching target values posed by the system level as well as satisfying local constraints gi. The optimization procedure runs through the iteration between the system level and subspace optimizers, and the optimal solution is found when the design variables converge. When applying the CO method to solve the PAP problem, a system optimizer is founded and an optimizer is integrated within each law. The resulting framework rigorously simulates the existing relationship of the laws in the PAP problem as shown in Fig. 4. The design and state variables are organized as follows.
Fig. 3. Basic collaborative optimization architecture (Braun et al., [5])
366
N. Wang et al. Table 3. Design vectors and functions for PAP problem
CA1
CA2
CA3
CA4
Vector or Function xsh xaux targets to be matched x1 (xsh)1 (xaux)1 analysis targets to be matched x2 (xsh)2 (xaux)2 analysis targets to be matched x3 (xsh)3 (xaux)3 analysis targets to be matched x4 (xsh)4 (xaux)4 analysis
xsh0, DML Optimizer d1* D. V. : xss1={(xsh)1, (xaux)1, x1}
Variables or Content {is, ia, ipd, da, ra} {IP, PUP, AP, RP, rs, ih, aod} (xsys0)1={is, ia, ipd, da, ra, aod} {lf, tt, ah, tu} {is, ia, ipd, da, ra} empty set {IP, PUP, AP, RP, rs, ih} (xsys0)2={da, ra, rs} empty set {da, ra} {rs} {ba, aod} (xsys0)3={IP, PUP, AP, RP} empty set empty set {IP, PUP, AP, RP} {tet} (xsys0)4={is, ia, ipd, ih} empty set {is, ia, ipd} {ih} {dp, dt}
System Level Optimizer ba, tet, dp-1 Subject to: di*=0 D. V. : x={xsh, xaux} Min:
xaux0 d1*
xsh0, xaux0 d4*
Min:
xss1 CA1
DL Optimizer d4* D. V. : xss4={(xsh)4, (xaux)4, x4} Min:
xsh0,
xaux
0
xsh0, d2*
xaux0
d3*
xss4
y1 BL Optimizer Min: d2* D. V. : xss2={(xsh)2, (xaux)2, x2} xss2
y2 CA2
SL Optimizer Min: d3* D. V. : xss3={(xsh)3, (xaux)3, x3} xss3
y4 CA4
y3 CA3
Fig. 4. The PAP problem collaborative optimization framework
In the CO optimization process, the system level optimizer adjusts the design variables to minimize the objectives while satisfies the compatibility constrains. The system objectives are scaled by weight factors, as shown in (6), where each weight represents the significance of the corresponding objective. Due to the non-linear characteristic of the problem, the genetic algorithms (GA) are used to solve the optimization needs in both system and laws level. However, the fitness evaluation is based on the analyses results of the laws, which may cause intolerable evaluation time during
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization
367
iteration. The most common way to handle it in MDO is to calculate the post-optimal sensitivities at the subspace optimum as the gradients of the system level objectives and constrains. However, when the problem is highly non-linear, the calculation gradients fails to work. For a better choice, the response surface method (RSM) is used to approximate the fitness (state variables) changes in response to the design variable variation [rsearico]. There are several techniques in building the response surface, including polynomial, variable complexity modeling (VCM) and neural networks (NN). In this article, the RBF neural network is adopted. Details of the RSM methods are provided in [6][7][8].
f obj = w1 × be + w2 × tet + w3 × dt −1
(6)
The planning procedure of the PAP problem using CO is as follows: Procedure: Collaborative Optimization based PAP with RSM (1) Construct response surface (2) Initialize system level decide variables (3) Repeat (4) For i=1 to 4 Send system design variables [xshi xauxi] to ith law (5) (6) End For (7) Perform laws level optimization respectively (8) For i=1 to 4 Return ith law objective value di* to system level (9) (10) End For (11) Update response surface (12) Perform system level optimization using response surface (13) Until all compatibility constrains are satisfied or terminal iteration condition is reached
4 Experiments and Results The introduced CO method is tested in two different PAP planning scenarios, as shown in Figure 5. The simulation is run by Matlab on a standard PC with Pentium 4 CPU of 2.8GHz. The main parameters’ values are shown in Table 4. The training samples for building the RBF network are randomly selected among the design space. The output of the network includes the optimization objectives (tet, ba, dp-1) as well as the compatibility constrains (d1-d4). The initial number of training samples is as small as 37, and the network is updated through every iteration in the optimization process. The average running time of the proposed method with RSM is compared to that without a RSM as shown in Table 6. It can be seen that using the RBF network a significant improvement in planning speed is achieved. The results found by the CO method are shown in Table 5, and the delivery maneuver trajectory is shown in Figure 6. The quality of the results is compared with general GA methods in which all design variables are searched on a single level (allat-once). From the comparison results, we can find that the proposed CO method can find out better solutions while maintain efficiency in solving the PAP problem.
368
N. Wang et al.
(a)
(b)
Fig. 5. Two different scenarios for pop-up attack weapon delivery tasks
(a)
(c)
(b)
(d)
Fig. 6. The result delivery maneuver trajectory found by the CO method. Subfigure (a) and (b) show the plane profile and subfigure (c) and (d) show the vertical profile in different scenarios. The area in red shows the terrain mask to threat at ingress altitude
Tactical Aircraft Pop-Up Attack Planning Using Collaborative Optimization
369
Table 4. The main parameters’ values used by the experiments
Parameters
Lb fc
Values 400 120
Units
Parameters
/ mm
bc δM
Values 9.6 4
Units mm /
Table 5. The results found by the CO method
Design variables is ia ipd da lf tt ra ah tu
Scenario a
Scenario b
400 410 7.686 17 3 8 5429 328 right
357 170 2.67 24 3 5 2229 205 right
Table 6. The running time of the method with and without RSM
with RSM without RSM
Scenario a
Scenario b
35s 250s
40s 291s
Table 7. Comparison of the proposed CO method and general GA
proposed CO method Scenario a Scenario b tet ba
dp constrains iteration rounds planning time
0 79.9% 100% satisfied 711 35s
12 60.70% 100% satisfied 791 40s
general GA Scenario a Scenario b 0 69.7% 100% satisfied 1002 120s
10 49.9% 92% satisfied 1122 133s
5 Conclusion In this paper, four different laws (DML, BL, SL, and DL) are presented to describe the tactical aircraft pop-up attack planning problem. Then, to solve the aforementioned problem, a collaborative optimization method with response surface is designed and implemented. And the efficiency of the method is verified by simulations. The simulation results demonstrate that the proposed method do well in PAP planning. Further, based on the proposed method, we can implement parallel algorithm to further improve the method for near real-time planning, which can make the algorithm more practical for the applications of tactical aircraft mission planning and is an important direction for our research.
370
N. Wang et al.
References 1. F-16 combat aircraft fundamentals. HQ ACC/DOT (1996) 2. Yang, M.Z., Yin, J., Yu, L.: Research on Operation Distance of TV Homer. Electronics Optics & Control 10, 27–30 (2003) (in Chinese) 3. Kroo, I., Altus, S., Braun, R.: Multidisciplinary Optimization Methods for Aircraft Preliminary Design. In: AIAA-96-4018, 5th AIAA Symposium on Multidisciplinary Analysis and Optimization (1994) 4. Balling, R.J., Sobieski, J.: Optimization of Coupled Systems: A Critical Overview of Approaches. In: AIAA-94-4330-CP, 5th AIAA Symposium on Multidisciplinary Analysis and Optimization (1994) 5. Braum, R.D., Gage, P., Kroo, I.M., Sobieski, I.P.: Implementation and Performance Issues in Collaborative Optimization. In: AIAA 94-4325-CP, 6th AIAA Symposium on Multidisciplinary Analysis and Optimization (1994) 6. Sobieski, I.P., Manning, V.M.: Response surface estimation and refinement in collaborative optimization. In: AIAA-98-4758, pp. 359–370 (1998) 7. Sobieski, I.P., Kroo, I.M.: Collaborative Optimization Using Response Surface Estimation. In: AIAA Aerospace Sciences Meeting (1997) 8. Marc, A.S., Stephen, M.B.: Neural network approximation of mixed continuous discrete systems in multidisciplinary design. In: AIAA-98-0916, 36th Aerospace Science Meeting and Exhibit (1998) 9. Luis, V.S., Victor, A.S., Carlos, A.: Coello Coello: Use of Radial Basis Functions and Rough Sets for Evolutionary Multi-Objective Optimization (2007) 10. Rana, A.S., Zalzala, A.M.S.: A Neural Networks Based Collision Detection Engine for Multi-Arm Robotic Systems. In: International conference on artificial neural networks 5th, pp. 140–145 (1997) 11. Jon, C.L., Ronald, G.D.: Surveillance and Reconnaissance Imaging Systems. Artech House Publishers, Beijing (2007) 12. Russell, D.N.: Evaluation of Air-to-Ground Weapon Delivery Systems Performance. Naval Air Test Center Report, TM 80-1 SA (1980) 13. Gu, X.Y., John, E.R., Leah, M.A.: Decision-Based Collaborative Optimization. Journal of Mechanical Design 124 (2002) 14. Kroo, I.M., Manning, V.M.: Collaborative optimization - Status and directions. In: AIAA2000-4721, 8th AIAA Symposium on Multidisciplinary Analysis and Optimization (2000) 15. Garza, A.D., Darmofa, D.L.: An all-at-once approach for multidisciplinary design optimization. AIAA (1998)
Stereo Vision Based Motion Parameter Estimation Xinkai Chen Department of Electronic and Information Systems, Shibaura Institute of Technology, 307 Fukasaku, Minuma-ku, Saitama-shi, Saitama 337-8570, Japan
[email protected] Abstract. The motion parameter estimation for a class of movements in the space by using stereo vision is considered by observing a group of points. The considered motion equation can cover a wide class of practical movements in the space. The observability of this class of movement is clarified. The estimation algorithm for the motion parameters which are all time-varying is developed based on the second method of Lyapunov. The assumptions about the perspective system are reasonable and have apparently physical interpretations. The proposed recursive algorithm requires minor a priori knowledge about the system. Experimental results show the proposed algorithm is effective even in the presence of measurement noises. Keywords: Stereo vision, motion parameter, estimation.
1 Introduction Estimation of the motion and the structure of a moving object in the space by using the image data with the aid of CCD camera(s) has been studied recently. The motion treated in the literature field is composed of a rotation part and a translation part. A very typical method is the application of the extended Kalman filter (EKF) [5][13][14]. Numerous successful results have been reported until now where the formulation is based on a discrete expression of the motion, and the observability conditions are derived based on the perspective observations of a group of points [1][4]. Such a recursive algorithm obviously alleviates the noises in the image data in contrast to the non-recursive methods [9] based on solving a set of nonlinear algebraic equations. It should be mentioned that some theoretical convergence conditions of discrete EKF have been established both as observer and filter [11]. For continuous time perspective systems, the observation problem has been studied in the point of view of dynamical system theory in [2][3][7][10]. A necessary and sufficient condition for the perspective observability is given in [6] for the case that the motion parameters are constants. For the movements with piecewise constant motion parameters, the perspective observability problems are clarified in [13] for the cases of observing one point or a group of points. Furthermore, for the observer design, some simple formulations for observing the position of a moving object are proposed in [2][3][8]. The proposed observers are guaranteed to converge in an arbitrarily large (but bounded) set of initial conditions, and since the convergence is D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 371–380, 2009. © Springer-Verlag Berlin Heidelberg 2009
372
X. Chen
exponential it is believed that the performance of the new observers are reliable, robust and would quickly compute the position on real data. This paper considers the problem of motion parameter estimation for a class of movements under perspective observation. Naturally, the motions are formulated in continuous-time settings and the so-called motion parameters are assumed to be all time-varying. The motion parameters are estimated by using image data observed through pin-hole camera with constant focal length (normalized to unity). The basic and important idea is to analyze the extent to which we can develop a scheme that is guaranteed to converge by observing minimum number of points. A dynamical systems approach is employed since it provides us with powerful mathematical tools, and a nonlinear observer is developed based on the second method of Lyapunov [12]. In this paper, the considered motion equation can cover a wide class of practical movements in the space. The observability of this class of movement is clarified by observing at least three points. The estimation algorithm of the motion parameter is developed. The formulated problem can be converted into the observation of a dynamical system with nonlinearities. It should be noted that smoothened image data instead of the measured one is used in the proposed formulation in order to alleviate the noises in the image data. The assumptions about the perspective system are reasonable, and the convergence conditions are intuitive and have apparently physical interpretations. The attraction of the new method lies in that the algorithm is very simple, easy to be implemented practically. Furthermore, the proposed method requires minor a priori knowledge about the system and can cope with a much more general class of perspective systems. It should be noted that the changing of focal length is not considered in this paper. Experimental results show the proposed algorithm is effective.
2 Problem Statement Consider the movement of the object described by
ω1 (t ) ω 2 (t )⎤ ⎡ x1 (t ) ⎤ ⎡b1 (t ) ⎤ ⎡ x1 (t ) ⎤ ⎡ 0 d ⎢ ⎥ ⎢ x 2 (t )⎥ = ⎢ − ω1 (t ) ω 3 (t )⎥⎥ ⎢⎢ x 2 (t )⎥⎥ + ⎢⎢b2 (t )⎥⎥ , 0 dt ⎢ ⎢⎣ x3 (t )⎥⎦ ⎢⎣− ω 2 (t ) − ω 3 (t ) 0 ⎥⎦ ⎢⎣ x3 (t )⎥⎦ ⎢⎣b3 (t )⎥⎦
(1)
where x(t ) = [x1 , x 2 , x3 ] is the position; ω i (t ) and bi (t ) (i = 1, 2, 3) are the motion parameters. It is supposed that the observed position by Camera 1 is defined by T
y (t ) = [ y1 (t ),
⎡x y 2 (t ) ] = ⎢ 1 , ⎣ x3
x2 ⎤ ⎥, x3 ⎦
(2)
x2 − n ⎤ ⎥, x3 ⎦
(3)
and the observed position by Camera 2 is defined by
[
y * (t ) = y1* (t ),
]
⎡x −m y 2* (t ) = ⎢ 1 , ⎣ x3
Stereo Vision Based Motion Parameter Estimation
373
where m and n are constants. The perspective observations are defined in (2) and (3). The combination of the observations in (2) together with (3) is called “stereo vision”. In this paper, we make the following assumptions. (A1). m and n are known constants with m 2 + n 2 ≠ 0 . (A2). The motion parameters ω i (t ) and bi (t ) (i = 1, 2, 3) are bounded. (A3). x3 (t ) meets the condition x3 (t ) > η > 0 , where η is a constant. (A4). y (t ) and y * (t ) are bounded. Remark 1. It is easy to see that assumptions (A3) and (A4) are reasonable by referring to the practical systems. The purpose of this paper is to estimate the motion parameters ω i (t ) and bi (t ) (i = 1, 2, 3) by using the perspective observations.
3 Formulation of the Motion Identification Define
y 3 (t ) =
1 . x3 (t )
(4)
Then, equation (1) can be transformed as ⎧ y&1 (t ) = ω 2 + ω1 y2 + ω 2 y12 + ω3 y1 y2 + b1 y3 − b3 y1 y3 ⎪ 2 ⎨ y& 2 (t ) = ω3 − ω1 y1 + ω 2 y1 y2 + ω3 y2 + b2 y3 − b3 y2 y3 . ⎪ y& (t ) = ω y y + ω y y − b y 2 2 1 3 3 2 3 3 3 ⎩ 3
(5)
T θ (t ) = [b1 , b2 , b3 , ω1 , ω 2 , ω 3 ]T Δ = [θ 1 , θ 2 , θ 3 , θ 4 , θ 5 , θ 6 ] ,
(6)
Let
and ⎡φ ( t ) ⎤ ⎡ y φ (t ) = ⎢ 1 ⎥ = ⎢ 3 ⎣φ 2 (t )⎦ ⎣⎢ 0
0
− y1 y3
y2
1 + y1
y3
− y 2 y3
− y1
y1 y 2
2
y1 y 2 ⎤ . 2⎥ 1 + y 2 ⎦⎥
(7)
Thus, the first two equations in (5) can be rewritten as
⎡ y&1 (t ) ⎤ ⎢ y& (t )⎥ = φ (t ) ⋅ θ (t ) . ⎣ 2 ⎦
(8)
⎡ y&1* (t )⎤ * ⎢ * ⎥ = φ (t ) ⋅ θ (t ) . & y t ( ) ⎣ 2 ⎦
(9)
Similarly, for y * (t ) , it gives
374
X. Chen
with ⎡φ * (t )⎤ φ * (t ) = ⎢ 1* ⎥ ⎣φ 2 (t )⎦ ⎡y =⎢ 3 ⎣0
0 y3
− y1* y 3 − y 2* y 3
y 2* + ny 3 1 + y1* ( y1* + my 3 ) y1* ( y 2* + ny 3 ) ⎤ ⎥, y 2* ( y1* + my 3 ) 1 + y 2* ( y 2* + my 3 )⎦ − y1* − my 3
(10)
From (2) and (3), y 3 (t ) can be calculated by the average y 3 (t ) = m
y1 − y1* m +n 2
2
+n
y 2 − y 2* m2 + n2
.
(11)
Thus, φ (t ) and φ * (t ) are available. In the following, the vectors φ (t ) ⋅ θ (t ) and φ * (t ) ⋅ θ (t ) are estimated in section 3.1 by using the perspective observations defined in (2) and (3). Then, the motion parameters ω i (t ) and bi (t ) (i = 1, 2, 3) are estimated in section 3.2 by using the stereo observation of at least three points. 3.1 Identification of φ (t )θ (t ) and φ * (t )θ (t )
In the following, the observer of system (8) is formulated. We consider the system described by
⎡ yˆ&1 (t ) ⎤ ⎡ w1 (t ) ⎤ ⎢& ⎥=⎢ ⎥, ⎣⎢ yˆ 2 (t ) ⎦⎥ ⎣ w2 (t ) ⎦
⎡ yˆ 1 (0) ⎤ ⎡ y1 (0) ⎤ ⎢ˆ ⎥=⎢ ⎥, ⎣ y 2 (0) ⎦ ⎣ y 2 (0) ⎦
(12)
w& i (t ) = −( f i + α i ) wi (t ) + λˆi (t ) ⋅ sign( y i − yˆ i ) + f iα i ( y i − yˆ i ) ,
(13)
λˆi (t ) = β i ( y i − yˆ i + α i ri (t )) ,
(14)
r&i (t ) = yi − yˆ i ,
(15)
where f i ,α i , β i are positive constants, wi (0) can be any small constants, and ri (0) is chosen as ri (0) = 0 . Let ⎡ w (t ) ⎤ w(t ) = ⎢ 1 ⎥ . ⎣ w2 (t )⎦
The next theorem is obtained.
(16)
Stereo Vision Based Motion Parameter Estimation
375
Theorem 1. All the generated signals in (12)-(15) are uniformly bounded and w(t ) is the asymptotic estimate of φ (t ) ⋅ θ (t ) , i.e. lim (φ (t ) ⋅ θ (t ) − w(t ) ) = 0 .
(17)
t →∞
Proof. For simplicity, we only give the proof for i=1. Let e1 (t ) = y1 (t ) − yˆ 1 (t ) .
(18)
e&1 (t ) = φ1 (t ) ⋅ θ (t ) − w1 (t ), e1 (0) = 0 .
(19)
r1 (t ) = e&1 (t ) + α 1e1 (t ) .
(20)
Differentiating e1 (t ) yields
Now, define
Differentiating r (t ) yields
r&1 (t ) =
(
)
d (φ1θ − w1 ) + α 1 (φ1θ − w1 ) = η1 (t ) − f 1r1 (t ) + λˆ1 (t )sign(e1 ) , dt
(21)
d (φ1θ ) + ( f 1 + α 1 )(φ1θ ) . dt
(22)
with
η1 (t ) =
The uniformly boundedness of η1 (t ) and η&1 (t ) can be easily derived by using the assumptions. Thus, there exist constants λ1 > 0 such that
η1 +
1
α1
η&1 < λ1 .
(23)
Now, consider the Lyapunov candidate V (t ) =
(
Differentiating V (t ) yields
(
V& (t ) = r1 (t ) η1 (t ) − f 1r1 (t ) − λˆ1 (t ) ⋅ sign(e1 (t ))
(
)
)
2 1 2 1 ˆ r1 (t ) + λ1 (t ) − λ1 . 2 2β 1
(24)
)
+ λˆ1 (t ) − λ1 (e&1 (t ) ⋅ sign(e1 (t )) + α 1 e1 (t ) ) = − f1 r12 (t ) + r1 (t )η1 (t ) − (e&1 (t ) + α 1e1 (t ) )λˆ1 (t ) ⋅ sign(e1 (t ))
(
)
+ λˆ1 (t ) − λ1 (e&1 (t ) ⋅ sign(e1 (t )) + α 1 e1 (t ) ) = − f1 r12 (t ) + r1 (t )η1 (t ) − λ1e&1 (t ) ⋅ sign(e1 (t )) − α 1λ1 e1 (t )
(25)
376
X. Chen
Integrating the both sides of (25) from 0 to t yields V (t ) = V (0) − f1 ∫ r12 (τ )dτ + ∫ (e&1 (τ ) + α 1e1 (τ ) )η1 (τ )dτ t
t
0
0
t
− λ1 e1 (t ) − α1λ1 ∫ e1 (τ ) dτ 0
t
= V (0) − f1 ∫ r12 (τ )dτ + e1 (t )η1 (t ) − e1 (0)η1 (0) 0
+ ∫ e1 (τ )(− η&1 (τ ) − α1η1 (τ ) )dτ − λ1 e1 (t ) − α1λ1 ∫ e1 (τ ) dτ t
t
0
0
≤ V (0) − f1 ∫ r12 (τ )dτ + e1 (t ) (η1 (t ) − λ1 ) t
0
−e1 (0)η1 (0) + ∫ e1 (τ ) (η&1 (τ ) + α 1 η1 (τ ) − α 1λ1 (τ ) )dτ t
0
≤ V (0) − f
t
∫r
2 1 0 1
(τ )dτ − e1 (0)η1 (0)
(26)
Thus, it can be seen that V (t ) and the integral
t 2 0 1
∫r
(τ )dτ are bounded. Therefore,
r1 (t ) → 0 as t → ∞ . By the definition of r1 (t ) , it gives e1 (t ) → 0 and e&1 (t ) → 0 as t → ∞ . The theorem is proved. Similarly to (10), construct the equation
⎡ yˆ& * (t ) ⎤ ⎡ w* (t ) ⎤ ⎡ yˆ * (0) ⎤ ⎡ y * (0) ⎤ yˆ& * (t ) = ⎢ 1* ⎥ = ⎢ 1* ⎥ , ⎢ 1* ⎥ = ⎢ 1* ⎥ , ⎢⎣ yˆ& 2 (t ) ⎥⎦ ⎣ w2 (t ) ⎦ ⎣ yˆ 2 (0) ⎦ ⎣ y 2 (0)⎦
(27)
⎡ w * (t ) ⎤ * where ⎢ 1* ⎥ Δ = w (t ) can be defined by referring (13)-(15) by using the obtained w ( t ) 2 ⎣ ⎦ image data y * (t ) from Camera 2. Similar to Theorem 1, it can be concluded that w * (t ) is uniformly bounded and
(
)
lim φ * (t ) ⋅ θ (t ) − w * (t ) = 0 , t →∞
(28)
i.e. w * (t ) is the asymptotic estimate of φ * (t )θ (t ) . 3.2 Identification of θ (t )
Relations (17) and (28) tell us that, by observing one point via stereo vision, four relations about θ (t ) can be obtained. It can be easily checked that the rank of the ⎡ φ (t ) ⎤ matrix ⎢ * ⎥ is three. It can be argued that the relations about θ (t ) can be increased ⎣φ (t ) ⎦ by increasing the observation points. Since there are six entries in θ (t ) , it can be argued that at least two points are needed to get a solution of θ (t ) .
Stereo Vision Based Motion Parameter Estimation
377
Now, suppose p points are observed. For the j-th point, we denote the obtained ⎡ φ ( j ) (t ) ⎤ ⎡ φ (t ) ⎤ ⎡ w ( j ) (t ) ⎤ ⎡ w(t ) ⎤ and as ⎢ *( j ) ⎥ and ⎢ *( j ) ⎥ , respectively. ⎢ * ⎥ ⎢ ⎥ * ⎣φ (t )⎦ ⎣ w (t ) ⎦ ⎣φ (t )⎦ ⎣ w (t ) ⎦ Define
⎡ φ (1) (t ) ⎤ ⎡ w (1) (t ) ⎤ ⎢ *(1) ⎥ ⎢ *(1) ⎥ ⎢ φ (t ) ⎥ ⎢ w (t ) ⎥ Φ (t ) = ⎢ M ⎥ , W (t ) = ⎢ M ⎥ . ⎢ ( p) ⎥ ⎢ ( p) ⎥ ⎢ φ (t ) ⎥ ⎢ w (t ) ⎥ ⎢φ *( p ) (t ) ⎥ ⎢ w*( p ) (t )⎥ ⎦ ⎣ ⎣ ⎦
(29)
By Theorem 1, it gives
lim(Φ (t ) ⋅ θ (t ) − W (t ) ) = 0 .
(30)
t →∞
About the rank of the matrix Φ (t ) , we have the next lemma. Lemma 1. The matrix Φ (t ) is of full rank if and only if at least three points are not on a same line. ⎡ φ ( j ) (t ) ⎤ Proof. First, it can be easily checked that rank ⎢ *( j ) ⎥ = 3 . Then, by some basic ⎣φ (t )⎦ (i ) ⎡ φ (t ) ⎤ ⎢ *( i ) ⎥ φ (t ) calculations, it can be concluded that rank ⎢ ( j ) ⎥ = 5 if the i-th point and the j-th ⎢ φ (t ) ⎥ ⎢ *( j ) ⎥ ⎢⎣φ (t ) ⎥⎦ points are not same. Then, by some calculations, the lemma can be proved.
Lemma 1 means that at least three points are needed in the proposed formulation. Theorem 2. If at least three observed points are not on a same line, then the motion parameters are observable and it holds
{
(
lim θ ( t ) − Φ T ( t )Φ ( t ) t→ ∞
(
i.e. Φ T ( t )Φ ( t )
)
−1
}
Φ T ( t )W ( t ) = 0 ,
(31)
)
−1
Φ T ( t )W ( t ) is the asymptotic estimate of the vector θ (t ) . Since the image data is directly used in Φ (t ) , the measurement noise will directly
influence the accuracy of the estimation. In the practical application of the proposed algorithm, the image data y ( j ) (t ) and y *( j ) (t ) can be respectively replaced by the generated smooth signals
yˆ ( j ) (t )
and
yˆ *( j ) (t ) , since
y ( j ) (t ) − yˆ ( j ) (t ) → 0
and y *( j ) (t ) − yˆ *( j ) (t ) → 0 . As to the value of y3( j ) (t ) in Φ (t ) , although it can be
378
X. Chen
calculated in (11) by using the image data, we use a smoothed signal yˆ 3( j ) (t ) to replace it in order to mitigate the influence of measurement noises. The signal yˆ 3( j ) (t ) is generated as follows. ( j) ⎛ y ( j ) − y *( j) ⎞ y ( j) − y* ˆy&3( j) = λˆ(3 j ) (t ) sign ⎜ m 1 2 1 2 + n 2 2 22 − yˆ 3( j ) ⎟ , ⎜ ⎟ m +n m +n ⎝ ⎠
( j)
y ( j) − y* & λˆ(3 j ) (t ) = γ ⋅ m 1 2 1 2 m +n
(32)
( j)
y ( j) − y* + n 2 2 22 m +n
− yˆ 3( j ) ,
(33)
where γ is a positive constant. It can be easily proved that yˆ 3( j ) (t ) and λˆ3 (t ) are uni-
(
)
formly bounded and lim y 3( j ) (t ) − yˆ 3( j ) (t ) = 0 . t →∞
The recursive algorithms of deriving yˆ ( j ) (t ) , yˆ *( j ) (t ) and yˆ 3( j ) (t ) obviously alleviate the noises in the image data. By replacing y ( j ) (t ) , y *( j ) (t ) and y3( j ) (t ) in the matrix Φ (t ) with yˆ ( j ) (t ) , yˆ *( j ) (t ) and yˆ 3( j ) (t ) recpectively, we get a matrix Φˆ (t ) . If Φ (t ) is of full rank, then Φˆ (t ) is of full rank when t is large enough. Then, the relation (31) still holds if Φ (t ) is replaced by Φˆ (t ) .
4 Experimental Results In the experiment, an object is fixed on the tip of robot manipulator. The motion of the object is observed by two cameras. The generated motion of the object is described by
⎡ x1 (t ) ⎤ ⎡ 0 0.3 0.4 ⎤ ⎡ x1 (t ) ⎤ ⎡- 35.9 - 3cos(2t)⎤ d ⎢ x 2 (t )⎥⎥ = ⎢⎢ − 0.3 0 − 0.6⎥⎥ ⎢⎢ x 2 (t )⎥⎥ + ⎢⎢ 62.4 + 0.1sin(t) ⎥⎥ . dt ⎢ ⎥⎦ ⎢⎣ x3 (t ) ⎥⎦ ⎢⎣− 0.4 0.6 0 ⎥⎦ ⎢⎣ x 3 (t ) ⎥⎦ ⎢⎣ 11.8
(34)
The image data is obtained every 0.06 seconds. Four points starting at [23, 13, 98] , T
[23, − 13, 98]T , [− 23, 13, 98]T ,
and [− 23, − 13, 98] are observed. The differences between the estimated parameters and the corresponding genuine parameters are shown in Figures 1-2. The simulation results of the differences ω1 (t ) − ωˆ 1 (t ) and ω 2 (t ) − ωˆ 2 (t ) are very similar to that in Figure 1. The simulation results of b (t ) − bˆ (t ) and b (t ) − bˆ (t ) is very similar to that in Fig. 2. It can be seen 1
1
3
T
3
that very good estimates for the motion parameters are obtained based on the obtained image data.
Stereo Vision Based Motion Parameter Estimation
379
Fig. 1. The difference between ω 3 (t ) and ωˆ 3 (t )
Fig. 2. The difference between b2 (t ) and bˆ2 (t )
5 Conclusions The motion parameter estimation for a class of movements in the space by using stereo vision has been considered based on the observation of multiple (at least three) points. The considered motion equation can cover a wide class of practical movements in the space. The estimations of the motion parameters which are all timevarying have been developed based on the second method of Lyapunov. The assumptions about the perspective system are reasonable, and the convergence conditions are intuitive and have apparently physical interpretations. The proposed method requires minor a priori knowledge about the system and can cope with a much more general class of perspective systems. Experimental results have shown that the proposed algorithm is effective.
380
X. Chen
References 1. Calway, A.: Recursive estimation of 3D motion and surface structure from local affine flow parameters. IEEE Trans. on Pattern Analysis and Machine Intelligence 27, 562–574 (2005) 2. Chen, X., Kano, H.: A new state observer for perspective systems. IEEE Trans. Automatic Control 47, 658–663 (2002) 3. Chen, X., Kano, H.: State Observer for a class of nonlinear systems and its application to machine vision. IEEE Trans. Aut. Control 49, 2085–2091 (2004) 4. Chiuso, A., Favaro, P., Jin, H., Soatto, S.: Structure from motion causally integrated over time. IEEE Trans Pattern Analysis & Machine Intelligence 24, 523–535 (2002) 5. Doretto, G., Soatto, S.: Dynamic shape and appearance models. IEEE Trans. on Pattern Analysis and Machine Intelligence 28, 2006–2019 (2006) 6. Dayawansa, W., Ghosh, B., Martin, C., Wang, X.: A necessary and sufficient condition for the perspective observability problem. Systems & Control Letters 25, 159–166 (1994) 7. Ghosh, B.K., Inaba, H., Takahashi, S.: Identification of Riccati dynamics under perspective and orthographic observations. IEEE Trans. on Automatic Control 45, 1267–1278 (2000) 8. Jankovic, M., Ghosh, B.K.: Visually guided ranging from observation of points, lines and curves via an identifier based nonlinear observer. Systems & Control Letters 25, 63–73 (1995) 9. Kanatani, K.: Group-Theoretical Methods in Image Understanding. Springer, Heidelberg (1990) 10. Loucks, E.P.: A perspective System Approach to Motion and Shape Estimation in Machine Vision. Ph.D Thesis, Washington Univ. (1994) 11. Reif, K., Sonnemann, F., Unbehauen, R.: An EKF-based nonlinear observer with a prescribed degree of stability. Automatica 34, 1119–1123 (1998) 12. Satry, S., Bodson, M.: Adaptive Control, Stability, Convergence, and Robustness. Prentice Hall, Englewood Cliffs (1989) 13. Soatto, S.: 3-D structure from visual motion: Modelling, representation and observability. Automatica 33, 1287–1321 (1997) 14. Xirouhakis, Y., Delopoulos, A.: Least squares estimation of 3D shape and motion of rigid objects from their orthographic projections. IEEE Trans. on Pattern Analysis and Machine Intelligence 22, 393–399 (2000)
Binary Sequences with Good Aperiodic Autocorrelations Using Cross-Entropy Method Shaowei Wang1 , Jian Wang1 , Xiaoyong Ji1 , and Yuhao Wang2 1 2
Department of Electronic Science and Engineering, Nanjing University, Nanjing, Jiangsu, 210093, P.R. China Information Engineering School, Nanchang University, Nanchang, Jiangxi, 330031, P.R. China {wangsw,wangj,jxy}@nju.edu.cn,
[email protected] Abstract. Cross Entropy (CE) has been recently applied to combinatorial optimization problems with promising results. In this short paper a CE based algorithm is presented to search for binary sequences with good aperiodic autocorrelation properties. The algorithm proposed can explore and exploit the solution space efficiently. In most cases, it can frequently find out binary sequences with higher merit factor and lower peak sidelobe level very quickly.
1
Introduction
A binary sequence S of length n is an n − tuple(s0 , s1 , . . . , sn−1 ), where each si takes the value −1 or +1. The aperiodic autocorrelation of the binary sequence S at shift k is given by the autocorrelation function (ACF) Rk =
n−k−1
si si+k ,
for
k = 0, 1, ..., n − 1.
(1)
i=0
Generally, binary sequences whose aperiodic autocorrelations are collectively as small as possible are suitable for application in synchronization, pulse compression and especially radar. Researchers endeavor to search for such sequences since the 1950s. Barker sequence [1], which has the peak sidelobe level (PSL) of unity (|Rk | = 1 for 0 < k < n − 1), is obviously the perfect one. But the longest Barker sequence found by now is of length 13. It has long been conjectured that no other Barker sequence exists [2]. Since it is unlikely to achieve the ideal behavior given by a Barker sequence when the length of a binary sequence is beyond 13, attentions turned to other measures of how closely the aperiodic autocorrelations of a binary sequence of length can collectively approach the ideal behavior. There are two measures which are commonly used to evaluate the merit figures of the ACF. One is PSL mentioned above, which is the maximum magnitude of the out of phase ACF PSL(S) = max |Rk |. 0 t p m p ⎩
(9)
<
tp is the initial value of simulated annealing, tm is the end value, 0 η0≤1. At the beginning of training, in order to gain the probability structure of input sample space fast, so when the training time t≤tp ,choose max rate η0 from η(t).when t tp, make η(t) into 0 uniform to adjust the weight subtly, to accordance the probability structure of sample space.
>
5 Results And Discussions In this paper we took segmentation to a color map using the above algorithm improved. In this operation we selected I1-I2-I3 as Color space, and took I1-I2 -I3 as input mode. We used 8X8 matrix on output layer, Fig.3 was a scanning color map which had been erase noise. Fig.4 and Fig.5 were the results of segmentation.
422
Z. Xue and C. Jia
We may draw the following conclusions from the above segmentation results: (1) The competition learning network structure and the learning algorithm are simpler, the calculation is high efficiency when we took segmentation to the color map, small interference by the noise, and better accuracy. (2) The competition learning network is sometimes unstable. And longer time study. The experiment indicates that the competition learning network improved overcome this flaw in a certain extent. (3) When we used the SOM algorithm to take segmentation to the color map, Majority of pixels could be effectively distinguished. But some individual category division is not to be good. It is mainly because the samples are not typical, not many, etc
Fig. 3. Color map
Fig. 4. Blue element
A New Method of Color Map Segmentation
423
Fig. 5. Brown element
Acknowledgment We wish to extend our heartfelt thanks to Sen Yang, Wenjun Yang, Wenli Zhang who devoted considerable time to this thesis, knowing that they would derive very little in the way of tangible benefits in return.
References 1. Li, H.J.: SOM network in the Image Segmentation. Computing Technology and Automation, 19(3), 21–25(2000) 2. Moreira, J., Costa, L.D.F.: Neural-based Color Image Segmentation and Classification using Self-organizing Maps. In: Anais do IX SIBGRAPI 1996, pp. 47–54 (1996) 3. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43(1), 59–69 (1982) 4. Liqun, H.: Artificial neural network theory. Design and Application, 63–72 (2002) 5. Vesanto, J.: Data mining techniques based on the self-organizing map, Finland, pp. 4–10 (1997) 6. Hagan, M.T.: Neural network design, 14th edn., pp. 12–16. China Machine Press (2002) 7. Ohta, Y.C.: Color Information for Recog Segmentation. Computer Graphics and Image Processing 13, 222–241 (1990) 8. Martin, T., Hagan Howard, B., Demuth, M.B.: Neural Network Design. China Machine Press, 14(12)–14(16) (2002)
A Quantum Particle Swarm Optimization Used for Spatial Clustering with Obstacles Constraints Xueping Zhang1,2, Jiayao Wang1,3, Haohua Du4, Tengfei Yang1, and Yawei Liu1 1
School of Information Science and Engineering, Henan University of Technology, Zhengzhou 450052, China 2 Key Laboratory of Spatial Data Mining & Information Sharing of Ministry of Education, Fuzhou University, Fuzhou, 350002, China 3 School of Surveying and Mapping, PLA Information Engineering University, Zhengzhou 450052, China 4 School of computer science and engineering, Beihang University,Beijing 100191, China
[email protected] Abstract. In this paper, a more effective Quantum Particle Swarm Optimization (QPSO) method for Spatial Clustering with Obstacles Constraints (SCOC) is presented. In the process of doing so, we first proposed a novel Spatial Obstructed Distance using QPSO based on Grid model (QPGSOD) to obtain obstructed distance, and then we developed a new QPKSCOC based on QPSO and K-Medoids to cluster spatial data with obstacles constraints. The contrastive experiments show that QPGSOD is effective, and QPKSCOC can not only give attention to higher local constringency speed and stronger global optimum search, but also get down to the obstacles constraints and practicalities of spatial clustering; and it performs better than Improved K-Medoids SCOC (IKSCOC) in terms of quantization error and has higher constringency speed than Genetic K-Medoids SCOC. Keywords: Spatial clustering, Obstacles constraints, Quantum particle swarm optimization, Spatial obstructed distance.
1 Introduction Spatial clustering has been an active research area in the data mining community. Spatial clustering is not only an important effective method but also a prelude of other task for Spatial Data Mining (SDM). As reported in surveys on data clustering, clustering methods can be classified into Partitioning approaches, Hierarchical methods, Density-based algorithms, Probabilistic techniques, Graph theoretic, Grid-based algorithms, Model-based approaches, Genetic Algorithms, Fuzzy methods, Rough Set methods etc. Some algorithms have also integrated two or three kinds of clustering methods. As pointed out earlier, these techniques have focused on the performance in terms of effectiveness and efficiency for large databases. However, few of them have taken into account constraints that may be present in the data or constraints on the clustering. These constraints have significant influence on the results of the clustering D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 424–433, 2009. © Springer-Verlag Berlin Heidelberg 2009
A Quantum Particle Swarm Optimization Used for SCOC
425
process of large spatial data. In order to improve the practicability of spatial clustering, studying spatial clustering with constraints is essential, and moreover, has important practical meaning. Its achievements will have more practical value and extensive application prospect. Spatial clustering with constraints has two kinds of forms [1]. One kind is Spatial Clustering with Obstacles Constraints (SCOC).An obstacle is a physical object that obstructs the reach ability among the data objects, such as bridge, river, and highway etc. whose impact on the result should be considered in the clustering process of large spatial data. As an example, Fig.1 shows clustering data objects in relation to their neighbors as well as the physical obstacle constraints. Ignoring the constraints leads to incorrect interpretation of the correlation among data points. The other kind is spatial clustering with handling operational constraints [2], it consider some operation limiting conditions in the clustering process. In this paper, we mainly discuss SCOC. Handling these obstacles constraints can lead to effective and fruitful data mining by capturing application semantics [3-8]. Since K.H.Tung put forward a clustering question COE (Clustering with Obstacles Entities) [3] in 2000, a new studying direction in the field of clustering research have been opened up. To the best of our knowledge, only three clustering algorithms for clustering spatial data with obstacles constraints have been proposed very recently: C3 C2 Bridge
C1 River
Mountain
(a) Data objects and constraints
C4
(b) Clusters ignoring constraints
Fig. 1. Clustering data objects with obstacles constraints
COD-CLARANS [3] based on the Partitioning approach of CLARANS, AUTOCLUST+ [4] based on the Graph partitioning method of AUTOCLUST, and DBCluC [5]-[8] based on the Density-based algorithm. Although these algorithms can deal with some obstacles in the clustering process, many questions exist in them. CODCLARANS algorithm inherits the shortcoming of CLARANS algorithm, which only gives attention to local constringency. AUTOCLUST+ algorithm inherits the limitation of AUTOCLUST algorithm, which builds a Delaunay structure to cluster data points with obstacles costly and is unfit for a large number of data. DBCluC inherits the shortcoming of DBSCAN algorithm, which cannot run in large high dimensional data sets etc. We proposed GKSCOC (Genetic K-Medoids Spatial Clustering with Obstacles Constraints) based on Genetic algorithms (GAs) and IKSCOC (Improved K-Medoids Spatial Clustering with Obstacles Constraints) in the literature [9]. The experiments show that GKSCOC is effective but the drawback is a comparatively slower speed in clustering. Particle Swarm Optimization (PSO) is relatively a newer addition to a class of population based search technique for solving numerical optimization problems. PSO has undergone a plethora of changes since its development in 1995. One of the recent developments in PSO is the application of Quantum laws of mechanics to observe the
426
X. Zhang et al.
behavior of PSO. Such PSO’s are called Quantum PSO (QPSO), and has faster convergence and global optima [10-13]. Recently, QPSO has been applied to data clustering [14]. In this paper, we presented a more effective QPSO method for SCOC. In the process of doing so, we first proposed a novel Spatial Obstructed Distance using QPSO based on Grid model (QPGSOD) to obtain obstructed distance, and then we developed QPKSCOC algorithm based on QPSO and K-Medoids to cluster spatial data with obstacles constraints. The contrastive experiments show that QPGSOD is effective, and QPKSCOC is better than IKSCOC in terms of quantization error and has higher constringency speed than GKSCOC. The remainder of the paper is organized as follows. Section 2 introduces QPSO algorithm. QPGSOD is developed in Section 3. Section 4 presents QPKSCOC. The performances of QPKSCOC are showed in Section 5, and Section 6 concludes the paper.
2 Quantum Particle Swarm Optimization

2.1 Classical PSO

Particle Swarm Optimization (PSO) is a population-based optimization method first proposed by Kennedy and Eberhart [15]. In order to find an optimal or near-optimal solution to the problem, PSO updates the current generation of particles (each particle is a candidate solution to the problem) using the information about the best solution obtained by each particle and by the entire population. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle has a set of attributes: current velocity, current position, the best position discovered by the particle so far, and the best position discovered by the particle and its neighbors so far. The user can define the size of the neighborhood. The mathematical description of PSO is as follows. Suppose the dimension of the search space is D and the number of particles is n. The vector X_i = (x_i1, x_i2, ..., x_iD) represents the position of the i-th particle, pbest_i = (p_i1, p_i2, ..., p_iD) is the best position it has found so far, and the whole swarm's best position is represented as gbest = (g_1, g_2, ..., g_D). The vector V_i = (v_i1, v_i2, ..., v_iD) is the position change rate (velocity) of the i-th particle. Each particle updates its position according to the following formulas:
v_id(t+1) = w·v_id(t) + c_1·rand()·[p_id(t) − x_id(t)] + c_2·rand()·[g_d(t) − x_id(t)]    (1)

x_id(t+1) = x_id(t) + v_id(t+1),  1 ≤ i ≤ n, 1 ≤ d ≤ D    (2)

where c_1 and c_2 are positive constant parameters, rand() is a random function with the range [0, 1], and w is the inertia weight. Equation (1) is used to calculate the particle's new velocity; the particle then flies toward a new position according to equation (2). The performance of each particle is
measured according to a predefined fitness function, which is usually proportional to the cost function associated with the problem. This process is repeated until user-defined stopping criteria are satisfied. A disadvantage of the global PSO is that it tends to be trapped in a local optimum under some initialization conditions [16].

2.2 Quantum PSO
The development of quantum mechanics is mainly due to the findings of Bohr, de Broglie, Schrödinger, Heisenberg and Born in the early twentieth century. Their studies forced scientists to rethink the applicability of classical mechanics and the traditional understanding of the nature of the motion of microscopic objects [17]. In classical PSO, a particle is described by its position vector X_i and velocity vector V_i, which determine its trajectory. The particle moves along a determined trajectory following Newtonian mechanics. In quantum mechanics, however, the term trajectory is meaningless, because X_i and V_i of a particle cannot be determined simultaneously according to the uncertainty principle. Therefore, if individual particles in a PSO system have quantum behavior, the performance of the PSO will be far different from that of classical PSO [18]. In the quantum model of PSO, the state of a particle is described by a wave function ψ(x, t) instead of position and velocity. The dynamic behavior of the particle is widely divergent from that of the particle in traditional PSO systems. In this context, the probability of the particle appearing at position X_i is obtained from the probability density function |ψ(x, t)|², the form of which depends on the potential field the particle lies in [10]. The particles move according to the following iterative equations [11-13]:
x(t+1) = p + β·|mbest − x(t)|·ln(1/u)   if k ≥ 0.5
x(t+1) = p − β·|mbest − x(t)|·ln(1/u)   if k < 0.5    (3)

p = (c_1·p_id + c_2·p_gd) / (c_1 + c_2)    (4)

mbest = (1/M) Σ_{i=1}^{M} P_i = ( (1/M) Σ_{i=1}^{M} P_i1, (1/M) Σ_{i=1}^{M} P_i2, …, (1/M) Σ_{i=1}^{M} P_iD )    (5)
The mean best (mbest) of the population is defined as the mean of the best positions of all particles; u, k, c_1 and c_2 are uniformly distributed random numbers in the interval [0, 1]. The parameter β is called the contraction-expansion coefficient. The QPSO is presented as follows:
1. Initialize population;
2. For t = 1 to t_max do {
3.   Calculate mbest by equation (5);
4.   Update particles' positions using equation (3);
5.   Evaluate fitness of each particle;
6.   Update Pbest;
7.   Update Pgbest }
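A minimal sketch of the QPSO loop above may help make equations (3)-(5) concrete. The code below is an illustrative implementation, not the authors' original program; the fitness function, population size, bounds, and the fixed contraction-expansion coefficient are placeholder assumptions.

```python
import numpy as np

def qpso(fitness, dim, n_particles=50, t_max=100, beta=0.75, bounds=(-10.0, 10.0)):
    """Illustrative QPSO loop following equations (3)-(5); minimization is assumed."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))     # particle positions
    pbest = x.copy()                                       # personal best positions
    pbest_val = np.array([fitness(p) for p in pbest])
    gbest = pbest[np.argmin(pbest_val)].copy()             # global best position

    for _ in range(t_max):
        mbest = pbest.mean(axis=0)                         # equation (5)
        for i in range(n_particles):
            c1, c2 = np.random.rand(2)
            p = (c1 * pbest[i] + c2 * gbest) / (c1 + c2)   # equation (4)
            u = np.random.rand(dim)
            step = beta * np.abs(mbest - x[i]) * np.log(1.0 / u)
            x[i] = p + step if np.random.rand() >= 0.5 else p - step  # equation (3)
            val = fitness(x[i])
            if val < pbest_val[i]:                         # update Pbest
                pbest[i], pbest_val[i] = x[i].copy(), val
        gbest = pbest[np.argmin(pbest_val)].copy()         # update Pgbest
    return gbest

# Example: minimize the sphere function in 5 dimensions
best = qpso(lambda v: float(np.sum(v ** 2)), dim=5)
```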
3 Spatial Obstructed Distance Using QPSO

To derive a more efficient algorithm for SCOC, the following definitions are first introduced.
Definition 1 (Obstructed distance). Given a point p and a point q, the obstructed distance d_o(p, q) is defined as the length of the shortest Euclidean path between p and q that does not cut through any obstacles. Computing the spatial obstructed distance with QPSO is divided into two stages: first, the environment is modeled with a grid model; then, QPSO is adopted to obtain the shortest obstructed path.

3.1 Environment Modeling Based on Grid Model
The basic idea is that the space is divided into many small areas of the same size, and every small area is considered a grid cell. If an obstructed area has an irregular shape, it is filled up to its verges. A cell is considered a free cell if it contains no obstacles, and an obstructed cell otherwise. Every cell has a corresponding coordinate and a sequence number, which identifies the coordinate uniquely. The grid model thus divides the space into two-valued cells, where 0 represents a free cell and 1 represents an obstructed cell. An example of a 20×20 grid model is shown in Fig. 2(a); the shaded areas indicate obstructed cells, and the number in each cell is its sequence number. The relationship between the coordinate (x_p, y_p) and the sequence number p is defined as follows:
x_p = [(p − 1) mod m] + 1,
y_p = int[(p − 1) / m] + 1,    (6)
where m is the number of cells in each row. Our task is to search for a route from point S to point E that avoids obstacles. The objective function is:

L = Σ_{i=2}^{n_p} √[(x_i − x_{i−1})² + (y_i − y_{i−1})²],    (7)
where (x_i, y_i) are the coordinates of the i-th route point and n_p is the number of route points.

3.2 Obstructed Distance by QPSO
In the PSO system, each particle represents a route from the starting point to the target point, for example x_i = (x_i1, x_i2, ..., x_iD), where D is the dimension of the particle. Each dimension of a particle represents a grid cell; the first dimension holds the sequence number of the starting cell and the last one the sequence number of the target cell. A route is thus formed when the sequence numbers are connected in ascending order. The fitness function is defined as follows:
f = 1 / [(1 + 1/(n − 1)) · L],    (8)
where n is the number of grid cells the route passes through and L is the total distance along the route, calculated according to equation (7). The QPGSOD is developed as follows:
1. Establish the environment model using the grid model;
2. Initialize each particle, ensuring that every particle's position is a free cell;
3. For t = 1 to t_max do {
4.   Calculate mbest by equation (5);
5.   Update particles' positions using equation (3);
6.   Evaluate fitness of each particle according to equation (8);
7.   Update Pbest;
8.   Update Pgbest }
9. Output the obstructed distance.
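As a rough illustration of the grid model and route evaluation above, the following sketch implements equations (6)-(8). The helper names and the toy grid are assumptions for illustration, and the obstacle check (keeping route cells in free cells) is deliberately left out.

```python
import math

def cell_to_xy(p, m):
    """Sequence number -> grid coordinate, equation (6)."""
    return ((p - 1) % m) + 1, ((p - 1) // m) + 1

def route_length(cells, m):
    """Path length of a route given as a list of cell sequence numbers, equation (7)."""
    pts = [cell_to_xy(p, m) for p in cells]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def route_fitness(cells, m):
    """Fitness of a candidate route, equation (8)."""
    n = len(cells)
    L = route_length(cells, m)
    return 1.0 / ((1.0 + 1.0 / (n - 1)) * L)

# Toy 20x20 grid: a route through cells 1 -> 22 -> 43 (a short diagonal)
print(route_fitness([1, 22, 43], m=20))
```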
The simulation result is shown in Fig. 2(b), where the solid line represents the optimal obstructed path obtained by QPGSOD.
(a) Environment modeling
(b) Obstructed path by QPGSOD
Fig. 2. Spatial Obstructed Distance using QPSO
4 Spatial Clustering with Obstacles Constraints Using QPSO

4.1 IKSCOC Based on K-Medoids
Typical partitioning-based algorithms are K-Means, K-Medoids and CLARANS. Here, the K-Medoids algorithm is adopted for SCOC to avoid cluster centers falling on obstacles. The square-error function is adopted to estimate the clustering quality, and it is defined as:

E = Σ_{j=1}^{N_c} Σ_{p∈C_j} (d(p, m_j))²,    (9)

where N_c is the number of clusters, m_j is the cluster centre of cluster C_j, and d(p, q) is the direct Euclidean distance between two points p and q.
To handle obstacle constraints, the criterion function for estimating the quality of spatial clustering with obstacles constraints is accordingly revised as:

E_o = Σ_{j=1}^{N_c} Σ_{p∈C_j} (d_o(p, m_j))²,    (10)
where d_o(p, q) is the obstructed distance between points p and q. The IKSCOC method is adopted as follows [5]:
1. Select N_c objects to be cluster centers at random;
2. Assign the remaining objects to the nearest cluster center;
3. Calculate E_o according to equation (10);
4. While (E_o changes) do { Let current E = E_o;
5.   Select a non-center point to replace a cluster center m_j at random;
6.   Assign objects to the nearest center;
7.   Calculate E according to equation (9);
8.   If E > current E, go to 5;
9.   Calculate E_o;
10.  If E_o < current E, form new cluster centers }.
IKSCOC still inherits two shortcomings: first, selecting initial values at random may produce different clustering results and may even fail to find a solution; second, it attends only to local convergence and is sensitive to outliers.

4.2 QPKSCOC Based on QPSO and K-Medoids
QPSO has been applied to data clustering [12]. In the context of clustering, a single particle represents the N_c cluster centroids. That is, each particle X_i is constructed as follows:

X_i = (m_i1, …, m_ij, …, m_iN_c),    (11)

where m_ij refers to the j-th cluster centroid of the i-th particle, defining cluster C_ij. Here, the objective function is defined as follows:

f(x_i) = 1 / J_i,    (12)

J_i = Σ_{j=1}^{N_c} Σ_{p∈C_ij} d_o(p, m_j).    (13)
The QPKSCOC is developed as follows:
1. Execute the IKSCOC algorithm to initialize one particle to contain N_c selected cluster centroids;
2. Initialize the other particles of the swarm to contain N_c selected cluster centroids at random;
3. For t = 1 to t_max do {
4.   For i = 1 to no_of_particles do {
5.     For each object p do {
6.       Calculate d_o(p, m_ij);
7.       Assign object p to cluster C_ij such that d_o(p, m_ij) = min_{∀c=1,…,N_c} {d_o(p, m_ic)} };
8.     Calculate the fitness according to equation (12) }
9.   Calculate mbest by equation (5);
10.  Update particles' positions using equation (3);
11.  Update Pbest;
12.  Update Pgbest }
13. Select two other particles j and k (i ≠ j ≠ k) at random;
14. Optimize new individuals using IKSCOC }
15. Output.
Here t_max is the maximum number of iterations for QPSO. Step 14 is included to improve the local convergence speed of QPSO.
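The assignment and fitness steps of QPKSCOC (steps 6-8 and equations (12)-(13)) can be sketched as follows. Here `obstructed_dist` stands in for the QPGSOD routine (or any distance callable); it is an assumption of this sketch, not the authors' code.

```python
import numpy as np

def qpkscoc_assign(points, centroids, obstructed_dist):
    """Assign each point to the cluster whose centroid has the smallest
    obstructed distance (step 7), and return J_i of equation (13)."""
    labels, J = [], 0.0
    for p in points:
        d = [obstructed_dist(p, m) for m in centroids]
        j = int(np.argmin(d))
        labels.append(j)
        J += d[j]
    return labels, J

def particle_fitness(points, centroids, obstructed_dist):
    """Fitness of one particle (a set of N_c centroids), equation (12)."""
    _, J = qpkscoc_assign(points, centroids, obstructed_dist)
    return 1.0 / J
```

In a full implementation the centroid coordinates held by each particle would be updated with the QPSO equations (3)-(5) sketched earlier; for quick testing, `obstructed_dist` could simply be the plain Euclidean distance.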
5 Results and Discussion

We carried out experiments separately with K-Medoids, IKSCOC, GKSCOC, and QPKSCOC, using n = 50, c_1 = c_2 = 2, and t_max = 100. Fig. 3 shows the results on a real dataset. Fig. 3(a) shows the original data with river obstacles. Fig. 3(b) shows the 4 clusters found by K-Medoids without considering obstacles constraints. Fig. 3(c) shows the 4 clusters found by IKSCOC, Fig. 3(d) those found by GKSCOC, and Fig. 3(e) those found by QPKSCOC. Clearly, the clustering results in Fig. 3(c), (d) and (e) are more practical than that in Fig. 3(b), and the results in Fig. 3(d) and (e) are both superior to that in Fig. 3(c). It can therefore be concluded that QPKSCOC is effective and yields more practical results. Fig. 4 shows the value of J obtained in every experiment on Dataset1 by IKSCOC and QPKSCOC, respectively. It shows that IKSCOC is sensitive to the initial values and converges to different, strongly local optima when started from different initial values, whereas QPKSCOC converges to nearly the same optimum each time. Fig. 5 shows the convergence speed in one experiment on Dataset1: QPKSCOC converges in about 12 generations while GKSCOC needs nearly 25 generations. It can therefore be concluded that QPKSCOC is effective and converges faster than GKSCOC. Overall, QPKSCOC has stronger global convergence ability than IKSCOC and higher convergence speed than GKSCOC.
Fig. 3. Clustering Dataset (panels (a)-(e))
Fig. 4. QPKSCOC vs. IKSCOC
Fig. 5. QPKSCOC vs. GKSCOC
6 Conclusions

In this paper, we developed a more effective Quantum Particle Swarm Optimization (QPSO) approach for Spatial Clustering with Obstacles Constraints (SCOC) by proposing a novel spatial obstructed distance using QPSO and the QPKSCOC algorithm. The proposed method is also compared with several other algorithms to demonstrate its efficiency, and the experimental results are satisfactory.
Acknowledgments. This work is partially supported by the Program for New Century Excellent Talents in University (NCET-08-0660), the Supporting Plan of Science and Technology Innovation Talent of Colleges in Henan Province (No. 2008HASTIT012), and the Opening Research Fund of the Key Laboratory of Spatial Data Mining & Information Sharing of the Ministry of Education (No. 200807).
References 1. Tung, A.K.H., Han, J., Lakshmanan, L.V.S., Ng, R.T.: Constraint-based clustering in large databases. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 405–419. Springer, Heidelberg (2000) 2. Tung, A.K.H., Ng, R.T., Lakshmanan, L.V.S., Han, J.: Geo-spatial Clustering with UserSpecified Constraints. In: Proceedings of the International Workshop on Multimedia Data Mining (MDM/KDD 2000), Boston USA, pp. 1–7 (2000) 3. Tung, A.K.H., Hou, J., Han, J.: Spatial Clustering in the Presence of Obstacles. In: Proceedings of International Conference on Data Engineering (ICDE 2001), Heidelberg Germany, pp. 359–367 (2001) 4. Castro, V.E., Lee, I.J.: AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles. In: Proceedings of the International Workshop on Temporal, Spatial and Spatial-Temporal Data Mining, Lyon France, pp. 133–146 (2000) 5. Zaïane, O.R., Lee, C.H.: Clustering Spatial Data When Facing Physical Constraints. In: Proceedings of the IEEE International Conference on Data Mining (ICDM 2002), Maebashi City Japan, pp. 737–740 (2002) 6. Wang, X., Hamilton, H.J.: DBRS: A Density-Based Spatial Clustering Method with Random Sampling. In: Proceedings of the 7th PAKDD, Seoul Korea, pp. 563–575 (2003) 7. Wang, X., Rostoker, C., Hamilton, H.J.: DBRS+: Density-Based Spatial Clustering in the Presence of Obstacles and Facilitators (2004), http://Ftp.cs.uregina.ca/Research/Techreports/2004-09.pdf 8. Wang, X., Hamilton, H.J.: Gen and SynGeoDataGen Data Generators for Obstacle Facilitator Constrained Clustering (2004), http://Ftp.cs.uregina.ca/Research/Techreports/2004-08.pdf 9. Zhang, X.P., Wang, J.Y., Fang, W., Fan, Z.S., Li, X.Q.: A Novel Spatial Clustering with Obstacles Constraints Based on Genetic Algorithms and K-Medoids. In: Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA 2006) [C], Jinan Shandong China, pp. 605–610 (2006) 10. Liu, J., Sun, J., Xu, W.-b.: Quantum-behaved particle swarm optimization with adaptive mutation operator. In: Jiao, L., Wang, L., Gao, X.-b., Liu, J., Wu, F. (eds.) ICNC 2006. LNCS, vol. 4221, pp. 959–967. Springer, Heidelberg (2006) 11. Sun, J., Feng, B., Xu, W.: Particle Swarm Optimization with particles having Quantum Behavior. In: Proceedings of Congress on Evolutionary Computation, Portland, OR, USA, pp. 325–331 (2004) 12. Liu, J., Sun, J., Xu, W.-b.: Improving quantum-behaved particle swarm optimization by simulated annealing. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCS (LNBI), vol. 4115, pp. 130–136. Springer, Heidelberg (2006) 13. Sun, J., Lai, C.H., Xu, W.-b., Chai, Z.: A novel and more efficient search strategy of quantumbehaved particle swarm optimization. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4431, pp. 394–403. Springer, Heidelberg (2007) 14. Chen, W., Sun, J., Ding, Y.R., Fang, W., Xu, W.B.: Clustering of Gene Expression Data with Quantum-Behaved Particle Swarm Optimization. In: Proceedings of IEA/AIE 2008, vol. I, pp. 388–396 (2008) 15. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, Perth, Australia, vol. IV, pp. 1942–1948 (1942) 16. van de Frans, B.: An Analysis of Particle Swarm Optimizers. Ph.D. thesis, University of Pretoria (2001) 17. Pang, X.F.: Quantum mechanics in nonlinear systems. World Scientific Publishing Company, River Edge (2005) 18. 
Feng, B., Xu, W.B.: Adaptive Particle Swarm Optimization Based on Quantum Oscillator Model. In: Proceedings of the 2004 IEEE Conf. on Cybernetics and Intelligent Systems, Singapore, pp. 291–294 (2004)
Fuzzy Failure Analysis of Automotive Warranty Claims Using Age and Mileage Rate SangHyun Lee and KyungIl Moon Department of Computer Engineering, Honam University, Korea
[email protected],
[email protected] Abstract. There are many situations where several characteristics are used together as criteria for judging the eligibility of a failed product. The warranty analysis characterized by a region in a two-dimensional plane with one axis representing age and the other axis representing mileage is known as warranty plan. A classical warranty plan requires crisp data obtained from strictly controlled reliability tests. However, in a real situation these requirements might not be fulfilled. In an extreme case, some warranty claims data come from users whose reports are expressed in a vague way. It might be caused by subjective and imprecise perception of failures by a user, by imprecise records of warranty data, or by imprecise records of the rate of mileage. This paper suggests different tools appropriate for modeling a two-dimensional warranty plan, and a suitable fuzzy method to handle some vague data. Keywords: Mileage accumulation ratio, Use period, Warranty claims reoccurrence, Fuzzy reasoning.
1 Introduction

Verifying the reliability of products, during development, in the production process, and after the products are sold and used in the field, has become an important issue for manufacturers. Life testing of products in the laboratory gives accurate data, but it is costly and does not reflect the actual use environment, which is a disadvantage. Warranty claims data obtained from the field, on the other hand, contain inaccurate records, but they cost little to collect and reflect the actual use environment, which is their biggest advantage. As systems that collect post-sale warranty claims data spread through companies, the demand for using these warranty claims data keeps growing. Automotive warranty claims are usually obtained from after-service centers, and they include repair records of product failures together with warranty data. Using these data, the use period at which a failure occurred is obtained by subtracting the sale date from the failure date. Kalbfleisch et al. (1991) proposed a log-linear Poisson model using the number of warranty claims. Baxter (1994), however, pointed out that such an approach cannot observe first product use when estimating the distributions of a quasi life table; the model can be applied only if the product is the same in each period. For the accumulation of mileage under warranty, Lawless et al.
(1995) suggested an optimal model based on warranty data and supplementary information. Hu and Lawless (1996) developed non-parametric estimation of means and rates of occurrences from truncated recurrent event data. Suzuki (1985) examined the efficiency of a consistent estimator relative to the maximum likelihood estimator when, along with a complete record of failures, a partial record of non-failures can also be obtained. Nelson (1988, 1995) gave simple graphical methods for analyzing repairable systems data based on the number and costs of repairs and also provided confidence limits for the same. Automotive warranty claims databases, which contain data on actual product performance in real use environments, are a very rich resource of information. However, failure modes may be reported loosely or incorrectly, the age within the warranty period may be uncertain, and mileage records may be inaccurate. In this paper, data from a previous model year vehicle, available through an automotive warranty database, are used for the assessment. A simple methodology based on a fuzzy system is provided that uses the mileage and warranty claims data of a vehicle sub-system to arrive at estimates over the use period. The methodology also helps to evaluate the impact of changes in the vehicle's warranty coverage period and mileage limit. In this paper, we use warranty data received from a car company, reflecting actual field use and mileage, in a fuzzy system to estimate the failure rate of a part.
2 The Estimation of Warranty Claims

We develop a method to estimate the number of claims for a specific sub-system of a vehicle at various combinations of use period and mileage limits of the warranty. Although the method focuses on analysis at the component and sub-system level, estimates at the system or vehicle level can easily be obtained by pooling the component and sub-system level estimates. A main objective of the estimation method being discussed is to be able to assess the impact of changes in time and/or mileage limits on the number of claims. For this purpose, in the following section, we discuss some issues of mileage accumulation, repeat failures, and claims per thousand vehicles sold. Let M_i denote the time (in months from the sale of the automobile) and K_i the mileage for the i-th (i = 1, 2, …, N) automobile in a population of the same type of vehicles. Let α_i (in miles per month) denote the mileage accumulation rate of the i-th automobile, and let G(α) and g(α) be the distribution function (df) and probability density function (pdf), respectively, of α_i. Estimating the parameters of the pdf and df is a crucial step in the analysis. A warranty database usually contains mileage accumulation data only for those cars that fail within the warranty period. Repeat claims could be the result of either a new failure or difficulty in root cause elimination during the previous repair. The expected number of total claims can be obtained by combining estimates of repeat claims with the estimates for the first claims. Repeat claims as a proportion of the first claims can be estimated using the following formula:

p̂_{rc, m/K0} = [n(m) − n_f(m)] / n_f(m),
where:
1) p̂_{rc, m/K0}: estimate of repeat claims as a proportion of the first claims at m_0 months-in-service and K_0 mileage warranty limit;
2) n(m): total number of claims up to m_0 months-in-service;
3) n_f(m): number of first claims up to m_0 months-in-service, i.e., the number of cars with at least one claim up to m_0 months-in-service.
When increments of 5 or 10 months-in-service are used for arriving at p̂_{rc, m/K0} values, the curve fitted to the data points can be used to arrive at intermediate p̂_{rc, m/K0} values. Using p̂_{rc, m/K0} and the parameters of G(α), p̂_{rc, m/unlimited} is estimated as:

p̂_{rc, m/unlimited} = (x − υ) / P[α_i ≤ (K_0 / M_0)·m].
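As an illustration of the repeat-claims proportion above, the following sketch computes n(m), n_f(m), and their ratio from a toy claims list; the data layout (vehicle id, month-in-service pairs) is a hypothetical assumption made only for this example.

```python
def repeat_claim_proportion(claims, m0):
    """Estimate repeat claims as a proportion of first claims (hedged sketch).

    `claims` is assumed to be a list of (vehicle_id, month_in_service) tuples for
    claims within the warranty limits; `m0` is the month-in-service cutoff."""
    within = [(vid, m) for vid, m in claims if m <= m0]
    n_total = len(within)                       # n(m): all claims up to m0
    n_first = len({vid for vid, _ in within})   # n_f(m): cars with at least one claim
    return (n_total - n_first) / n_first if n_first else 0.0

# Toy example: vehicle "A" claims twice, "B" once -> one repeat claim per two first claims
print(repeat_claim_proportion([("A", 3), ("A", 7), ("B", 10)], m0=12))  # 0.5
```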
3 Fuzzy Reasoning of Age/Mileage Failure Rate Fuzzy logic was originally introduced by Zadeh as a mathematical way to represent vagueness in everyday life. The proposed overall procedure for approximating the impact of corrective actions on two-dimensional warranty is shown in Fig. 1, which consists of four components, namely, fuzzy rule base, fuzzy inference process, fuzzification process, and defuzzification process. The basic unit of any fuzzy system is the fuzzy rule base. All other components of the fuzzy logic system are used to implement these rules in a reasonable and efficient manner. Fuzzy rules are usually formulated as IF-THEN statements, with one or more antecedents connected to a consequent via operators like AND, OR, etc. IF (Antecedent1) OP (Antecedent2) … OP (Antecedentn) THEN (Consequent) (w). Where n is an integer, OP stands for operators like AND, OR, etc., and w represents a weight value indicating the importance of a rule.
Fig. 1. An overview of fuzzy logic system for a failure rate
Now, imagine a fuzzy rule where 2 antecedents apply to the same consequent (n = 2). Further, let Antecedent1 be activated to a degree of 0.8, and Antecedent2 to a degree of 0.7. The weight value is usually 1.0, and OP is usually an OR operator defined as:
Consequent = max[Antecedent1, Antecedent2]. In this situation, Consequent would be activated to a degree of max[0.8, 0.7] = 0.8. There is nothing extraordinary here; the process described is standard. Now imagine a more complex scenario where 5 antecedents apply to a Consequent (n = 5). A possible scenario may look like: Consequent = max[0.7, 0.8, 0.6, 0.5, 0.9] = 0.9. In this situation, we are probably less confident in the applicability and usefulness of the rule. Furthermore, as there are many different weights indicating the importance of the rule, an aggregation of the consequents across the rules is probably less confident. The formulation of rules showing such complexity, however, might be common in some domains. To approach the problem, this paper presents a method that aims to include each activated rule antecedent more actively in the reasoning process of the fuzzy rule base. OP is usually an OR operator defined as Consequent = max[Antecedent1, Antecedent2]; if an AND operator is defined, Consequent = min[Antecedent1, Antecedent2]. To explain the method, we describe a simple fuzzy rule system where two antecedents (Age, Mileage) relate to an output (Failure rate). The following two rules shall be included in the fuzzy rule system:
IF (Mileage) IS (normal) OR (Age) IS (normal) THEN (Number of Warranty Claims) IS (normal)
IF (Mileage) IS (low) OR (Age) IS (low) THEN (Number of Warranty Claims) IS (low)
Further, both rules shall carry the weight 1.0, and the OR operator employed shall be the same as used before.
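The max/min aggregation just described can be written directly; the membership degrees used below are made-up numbers for illustration only and would normally come from membership functions over the mileage and age inputs.

```python
def fuzzy_or(*degrees):
    """OR aggregation of antecedent activation degrees (max operator)."""
    return max(degrees)

def fuzzy_and(*degrees):
    """AND aggregation of antecedent activation degrees (min operator)."""
    return min(degrees)

# Assumed membership degrees: mileage_normal=0.8, age_normal=0.7,
# mileage_low=0.2, age_low=0.4 (hypothetical values for this example).
rule1 = 1.0 * fuzzy_or(0.8, 0.7)   # "claims normal" activated to 0.8
rule2 = 1.0 * fuzzy_or(0.2, 0.4)   # "claims low" activated to 0.4
print(rule1, rule2)
```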
4 The Automotive Example

The estimate of the stress failure rate is obtained by a two-way fuzzy analysis of the mileage and age data. In this example, the vehicle information regarding the particular model year, vehicle, sub-system name, and failure mode is not disclosed to protect the proprietary nature of the information. Table 1 summarizes the number and amount of warranty claims issued against vehicle A shipped in the 2004~2005 model years.
Fig. 2. Mileage distribution
Using reliability analysis, the mileage data can be mapped accurately to monthly production and sales, and the claims analysis to the claims data. However, it is difficult to calculate the mileage of all the vehicles in production. As Fig. 2 and Table 1 show, the total mileage of the vehicles with claims can instead be used to calculate a mileage-based reliability function.

Table 1. Mileage reliability analysis
ΣM_{0~2000} = m̂_{0~2000}(%) × ΣSale
Total stress can be defined as a function of mileage and age. Here, the stress and the failure rate of production are analyzed. Using the proposed fuzzy reasoning, the stress failures are estimated at 40,000 and 60,000 km (Fig. 3):

Stress (%) = Σ_ij Claim_ij / (Σ_i Sale_i × Σ_j Mileage_j) × 100(%),   i: Age, j: Mileage.
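A small sketch of the stress computation may be useful. The exact normalization in the paper is not fully recoverable from the formula above, so the version below is an assumption-laden illustration with toy numbers rather than the authors' procedure.

```python
def stress_percent(claims, sales, mileages):
    """Stress (%): claims summed over age x mileage cells, normalized by total
    sales times total mileage (hedged sketch of the formula above)."""
    total_claims = sum(sum(row) for row in claims)   # sum over i (age) and j (mileage)
    return total_claims / (sum(sales) * sum(mileages)) * 100.0

# Toy numbers: 2 age bands x 3 mileage bands of claim counts
claims = [[4, 2, 1],
          [6, 3, 2]]
print(stress_percent(claims, sales=[1000, 1200], mileages=[20000, 40000, 60000]))
```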
Fig. 3. Estimation of stress failure rate
5 Conclusion In this study, the change in age and the limit for usage are critical elements for evaluating the effect on the number and failure rate of the warranty claims. In general, when the accumulated ratio of mileage is high, the expectation of the warranty claim
is higher than when the ratio is low. In the conventional method for the estimation of the failure rate for warranty claims, the result is relatively higher. However, the method for the analysis of age and mileage warranty based on fuzzy deduction shows more reasonable results. In other words, the rate of an increase in the number of claims changes more rapidly while the accumulated ratio of Mileage is higher. Finally, since there exist various elements affecting warranty claims and it is necessary to perform a multi-dimensional analysis in consideration of these elements simultaneously, there is room for future improvements on this study. It is critical to probe deeply into the formation of the fuzzy groups in terms of input and output variables. In addition, it is desirable to obtain more accurate measurements for the weighted ratio of this fuzzy rule.
References 1. Baxter, L.A.: Estimation from Quasi Life Table. Biometrika 81(3), 567–577 (1994) 2. Kalbfleisch, J.D., Lawless, J.F., Robinson, J.A.: Methods for the Analysis and Prediction of Warranty Claims. Technometrics 33(3), 273–285 (1991) 3. Lawless, J.F., Hu, J., Cao, J.: Methods for the estimation of failure distributions and rates from automobile warranty data. Lifetime Data Anal., 227–240 (1995) 4. Suzuki, K.: Estimation of lifetime parameters from incomplete field data. Technometrics 27, 263–271 (1985) 5. Hu, X.J., Lawless, J.F.: Estimation of rate and mean functions from truncated recurrent event data. J. Am. Statist. Assoc. 91, 300–310 (1996) 6. Nelson, W.: Graphical analysis of system repair data. J. Qual. Technol. 20, 24–35 (1988) 7. Nelson, W.: Confidence limits for recurrence data applied to cost or number of product repairs. Technometrics 37, 147–157 (1995)
A Fuzzy-GA Wrapper-Based Constructive Induction Model
Zohreh HajAbedi 1,* and Mohammad Reza Kangavari 2
1 Science and Research Branch, Islamic Azad University, Tehran, Iran
2 Department of Computer, Iran University of Science and Technology, Tehran, Iran
Abstract. Constructive Induction is a preprocessing step applied to representation space prior to machine learning algorithms and transforms the original representation with complex interaction into a representation that highlights regularities and is easy to be learned. In this paper a Fuzzy-GA wrapper-based constructive induction system is represented. In this model an understandable real-coded GA is employed to construct new features and a fuzzy system is designed to evaluate new constructed features and select more relevant features. This model is applied on a PNN classifier as a learning algorithm and results show that integrating PNN classifier with Fuzzy-GA wrapper-based constructive induction module will improve the effectiveness of the classifier. Keywords: Constructive induction, Feature construction, Feature selection, Fuzzy, GA.
1 Introduction One of the most fundamental machine-learning tasks is inductive machine learning where a generalization is obtained from a set of samples, and it is formalized using different techniques and models. The ability of an inductive learning algorithm to find an accurate concept description depends strongly on the representation space [6]. Representation space is a space in which, training samples, hypotheses and domain knowledge are represented. Learning algorithms encounter difficulties when data is complex and includes irrelevant features, high accuracy in values and correlated features. If the representation space is well-designed, then learning results will tend to be satisfactory with almost any method [11]. As the importance of the representation space for machine learning problems was proved, the idea of CI was introduced by Michalski in 1986 [3]. Gradually, this idea was expanded and different artificial intelligent tools were employed to implement it. A learning process that includes two phases, one for constructing best representation space and the other for generating best hypothesis in new space, is known as CI. Most CI methods such as GALA1 [26], LFC [27] 2 and MRP3 [28], apply a greedy search to ∗
* Corresponding author. 1 Lookahead Feature Construction. 2 Multidimensional Relational Projection. 3 Similarity Based Learning Algorithm.
find new features. Due to the attributes interaction, the search space for constructing new features has more variation, and therefore, a greedy method may find a local optimal solution. Recent works on problems with interaction show that a global search strategy such as GA4 is more likely to be successful in searching through the intractable and complicated search spaces. Moreover, GA provides the ability to construct and evaluate several features as one single individual. Evaluating a combination of features is essential for the success of a CI method when complex interactions among several subsets of attributes exist. Some CI methods such as Fringe, SymFringe and DCFringe [30] use hypotheses that are generated with an SBL. These methods rely strongly on generated hypotheses. In cases where representation space is very complex, A SBL cannot learn effective hypotheses and as a result, CI methods depended on these hypotheses, cannot generate optimized representation space. Other CI methods such as CITRE [5] and Fringe are applicable for specific learning algorithm such as decision trees. Many earlier methods such as GALA use greedy approaches to implement CI and as we know greedy methods may be converged to local optima in complex search spaces. Evolutionary approaches are successful global search strategies in complex spaces. In recent works, evolutionary algorithms are employed to implement constructive induction. One example of these systems is MFE2/GA [4] system. In this system a GA-based Constructive Induction method is applied to representation space and satisfactory result are taken. Another example of evolutionary based constructive induction systems is GAP [13] system. In this system Genetic Programming and a Genetic Algorithm to pre-process data before it is classified using the C4.5 decision tree learning algorithm. Genetic Programming is used to construct new features from those available in the data, a potentially significant process for data mining since it gives consideration to hidden relationships between features. A Genetic Algorithm is used to determine which set of features is the most predictive. In this paper, a Fuzzy-GA wrapper-based approach is employed to implement CI. This model uses a simple and understandable real coding scheme for encoding chromosomes; it is also applicable for all types of learning algorithms. This model is tested on a PNN classifier and the results reveal that with the present CI module, PNN classifier accuracy can be increased considerably. The organization of this paper is as follows: section 2 introduces constructive induction, a brief explanation of fuzzy is represented in section 3, section 4 represents GA briefly, section 5 discusses Fuzzy-GA wrapper-based constructive induction system, in section 6 implementation results are illustrated and section 7 concludes the paper.
2 Constructive Induction Learning algorithms depend on information which is provided by user, in order to construct descriptions and hypotheses from data. So data analyst should determine adequate features for the learner. Features will be inadequate for learning if they are 4
Probabilistic Neural Network.
weakly or indirectly inter relevant or if they are not measured correctly and with appropriate accuracy. As the importance of the representation space for machine learning problems was proved, the idea of CI was introduced by Michalski [3] in 1986. Gradually, this idea was expanded and different artificial intelligent tools were employed for implementing it. CI is the process of transforming the original representation of hard concepts with complex interaction into a representation that highlights regularities [4]. The basic premise of research on CI is that results of a learning process directly depend on the quality of the representation space. If the representation space is well-designed, then learning results will be satisfactory with almost any method [11]. Similarity-Based Learning methods learn concepts by discovering similarities. They achieve high accuracy when the data representation of the concept is good enough to maintain the closeness of instances of the same class. Hard concepts with complex interaction are difficult to be learned by an SBL. Interaction means the relation between one feature and the target concept depends on another feature[19]. Due to interaction, each class is scattered through the space, and therefore, regularities are hidden to the learner. The interaction problem arises when shortage of domain knowledge exists and only low-level primitive features are available to represent data. In the previous works, CI concentrated only on constructing new goal relevant features. Later it was revealed that constructing new features is only one of the ways to optimize representation space. In fact, constructing new features is one type of expanding the space. Feature selection and features’ values abstraction are types of destruction of representation space. More recent works have viewed CI more generally, as a double-search process, in which one search is for an improved representation space and the other for best hypothesis in this space [11]. Operations for improving representation space for learning are: removing irrelevant features, constructing new goal relevant features and abstracting features’ values. We can also define constructive induction as a preprocessing process that is applied on representation space before learning algorithm, to provide a more suitable space for the learner. Place of CI is demonstrated in Fig. 1 [8].
Fig. 1. Place of CI process
Constructive induction can be divided into a filter approach or a wrapper [3]. In the filter approach feature selection is performed as a pre-processing step to induction. Because it is separated from the induction algorithm, filters are fast, they can be used with any induction algorithms once filtering is done, and can be used on large datasets. However, they may not agree on the relevancy of certain features with the induction algorithm. Methods for filtering mostly include those based on information theory or probabilistic methods. The wrapper approach uses the induction algorithm itself to make estimates of the relevance of the given features.
3 Fuzzy-GA Wrapper-Based Constructive Induction Model

In this paper a wrapper-based constructive induction model is designed and implemented using a hybrid method that combines GA and fuzzy logic. Fuzzy set theory has been widely used to represent uncertain or flexible information in many types of applications, such as scheduling, engineering design, and optimization problems. It may provide an alternative and convenient framework for handling uncertain parameters when there is a lack of certainty in the data or even a lack of available historical data. In this model a genetic algorithm is also employed. A genetic algorithm is a problem-solving technique motivated by Darwin's theory of natural selection in evolution [22]. The basic framework of GAs was established by Holland [21]. Most optimization problems are complex, and solving them with classic methods is difficult or impossible; genetic algorithms are powerful tools for these kinds of problems. In the presented model we combine fuzzy logic and GA as a hybrid model to use the advantages of both. The model presented in this paper discretizes feature values, constructs new goal-relevant features, and selects the best features, thus providing an improved representation space for learning algorithms. Fig. 2 shows a block diagram of the wrapper-based FGACI (Fuzzy GA Constructive Induction) model.
Fig. 2. Wrapper Based FGACI Model block diagram
As demonstrated in Fig. 2, in the first step a real-coded genetic algorithm is employed to construct new features. The data set is updated on the basis of the new features' formulas. The EWD discretization algorithm is then applied to the new data set and the feature values are discretized; EWD (Equal Width Discretization) is a simple discretization algorithm that divides feature values into a predefined number of intervals [18]. In the next step, a PNN classifier is applied to the data set, and the classifier precision is considered as a criterion for evaluating the new features. PNN (Probabilistic Neural Network) [29] is a two-layer feed-forward neural network that is widely used in classification applications. In the Fuzzy-GA wrapper-based CI model a fuzzy system is designed for the final evaluation of feature sets. This fuzzy system has two inputs and one output: the inputs are "average length of features" and "PNN classifier accuracy", and the output represents the quality of the feature set.

3.1 GA Coding Scheme

In the wrapper-based FGACI model a real coding scheme is considered. Real coding converges faster than binary coding [20]. In this model each
chromosome has a structure as shown in Fig. 3.a. As it is depicted in Fig. 3.a, chromosome structure is in matrix form. In this matrix, the number of rows determines the number of new constructed features and the number of columns determines the length of constructed features.
Fig. 3. a: FGACI Chromosome Structure, b: A RGACI chromosome example
Each chromosome, i.e. a problem solution, includes one set of new features. Each row in the chromosome matrix is a new feature formula. The number of new features varies and is determined randomly. The length of each feature formula, and consequently the number of original features that take part in constructing the new feature, also varies and is selected at random. In this matrix structure, odd columns contain original feature numbers and even columns contain operator codes. The arithmetic operators +, -, /, * are used to construct new features. As an example, a row like the one in Fig. 4 means that the 8th feature's value must be subtracted from the value of the 7th feature to construct a new feature (45 is the code for subtraction).
Fig. 4. A new feature formula
As another example, consider Fig. 3b. In this chromosome, the number of rows is 3 and thus there are 3 new features, each row representing a new feature formula. The length of the first new feature (row 1) is seven, the length of the second (row 2) is 3, and the length of the third (row 3) is 5. Feature lengths are variable, so in each row the unused cells are filled with zero.
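To make the encoding concrete, here is a hedged sketch of how a feature-formula row could be decoded and evaluated on one data sample. The operator codes other than 45 (subtraction, stated in the text) are assumptions, chosen here as the ASCII codes of the operator characters.

```python
# Assumed operator codes; the paper only states that 45 denotes subtraction.
OPS = {43: lambda a, b: a + b,   # '+'
       45: lambda a, b: a - b,   # '-'
       42: lambda a, b: a * b,   # '*'
       47: lambda a, b: a / b}   # '/'

def eval_feature_row(row, sample):
    """Evaluate one feature-formula row of the chromosome on one data sample.
    Odd columns hold 1-based feature numbers, even columns hold operator codes,
    and trailing zeros mark unused cells."""
    cells = [c for c in row if c != 0]
    value = sample[cells[0] - 1]
    for op_code, feat in zip(cells[1::2], cells[2::2]):
        value = OPS[op_code](value, sample[feat - 1])
    return value

# The row [7, 45, 8, 0, 0, 0, 0] means: new feature = feature7 - feature8
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 4.0]
print(eval_feature_row([7, 45, 8, 0, 0, 0, 0], sample))  # 6.0
```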
3.2 GA Crossover Operator

In FGACI, one-point crossover [1] is employed. Depending on the crossover rate, a number of chromosomes are selected as parents, two by two at random. In the next step, one row is selected randomly in each parent as the crossover point. Finally, the parents' rows are exchanged from the crossover points onward. Although various types of crossover operators are used in different applications, the FGACI model employs a one-point crossover operator.
Fig. 5. Crossover operator
The crossover operation is illustrated in Fig. 5. As depicted in Fig. 5, the first parent is a feature subset consisting of five new feature formulas, and the second parent consists of three new feature formulas. In the first parent the fifth row is selected as the crossover point and in the second parent the second row is selected. The parents' rows are exchanged through the crossover points, and two offspring are generated that have six and two rows, respectively.

3.3 GA Mutation Operator

Mutation is used to transform the chromosome by randomly changing one of its genes [23]. In FGACI, one chromosome is selected randomly, and one cell is selected in its matrix. If the column number of this cell is odd, the cell contains a feature number and it is replaced by a new feature number; if the column number is even, it is replaced by a new operator code.
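A compact sketch of the two operators just described is given below. The exact cut-point conventions and the operator-code set are assumptions of this illustration, not the authors' implementation.

```python
import random

def one_point_crossover(parent1, parent2):
    """Exchange rows of two parent chromosomes from randomly chosen crossover rows
    onward (Section 3.2); chromosomes are lists of feature-formula rows."""
    cut1 = random.randrange(len(parent1))
    cut2 = random.randrange(len(parent2))
    child1 = parent1[:cut1] + parent2[cut2:]
    child2 = parent2[:cut2] + parent1[cut1:]
    return child1, child2

def mutate(chromosome, n_original_features, operator_codes=(43, 45, 42, 47)):
    """Replace one randomly chosen non-zero cell: a feature number if its column
    is odd (1-based), an operator code if it is even (Section 3.3)."""
    row = random.randrange(len(chromosome))
    used = [i for i, c in enumerate(chromosome[row]) if c != 0]
    col = random.choice(used)
    if (col + 1) % 2 == 1:                      # odd column -> feature number
        chromosome[row][col] = random.randrange(1, n_original_features + 1)
    else:                                       # even column -> operator code
        chromosome[row][col] = random.choice(operator_codes)
    return chromosome
```

With a five-row and a three-row parent and cut rows 4 and 1 (the fifth and second rows), this crossover yields offspring of six and two rows, matching the example in the text.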
Fig. 6. Mutation operator
As demonstrated in Fig. 6, the cell in row three and column five is selected. The column number of this cell is odd, and therefore the feature number seven is replaced by five.

3.4 GA Selection Method

The selection operation determines which parent chromosomes participate in producing offspring for the next generation. Usually members are selected with a probability proportional to their fitness value [1]. In the FGACI model, 90% of the next generation is selected in proportion to fitness values and 10% is selected randomly. Thus, in addition to strong solutions having a greater chance of being selected, weak solutions still have some chance to be included and improved in the next generation.

3.5 GA Fitness Function

A genetic algorithm uses a fitness function to calculate the degree of fitness of a chromosome [24]. Within every generation, the fitness function is used to evaluate the quality of every chromosome and to determine the probability of its surviving to the next generation; usually the chromosomes with larger fitness have a higher survival probability [25].
In wrapper-based FGACI, the fitness of each chromosome is evaluated with the fuzzy system designed for this purpose. This fuzzy system has two inputs, "average length of features" and "PNN classifier accuracy", and one output, which determines the quality of the feature set based on these two inputs. "PNN classifier accuracy" is calculated by applying the PNN to the new data set. "Average length of features" is a criterion that estimates the complexity of a feature set: the lengths of the new features' formulas in the feature set are summed and the sum is divided by the number of features in the set. As an example, consider the chromosome in Fig. 7.
Fig. 7. A chromosome with 3 features formula
Fig. 7 displays a feature set with 3 features: the first feature's length is 7, the second's is 3, and the third's is 5. The average length of features in this feature set is calculated as:

(7 + 3 + 5) / 3 = 5.    (1)
The fuzzy system is a Mamdani fuzzy system with two inputs and one output: its inputs are "PNN classifier precision" and "average length of features", and its output is "quality of feature set". The rule base of this fuzzy system has 25 fuzzy rules, some of which are listed below:
If (ClassifierPrecision is very low) and (length is very short) then (Performance is Xpoor)
If (ClassifierPrecision is medium) and (length is very short) then (Performance is XXgood)
If (ClassifierPrecision is high) and (length is very short) then (Performance is excellent)
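The fitness evaluation can be illustrated with a drastically simplified stand-in for the 25-rule Mamdani system. The membership functions, the two rules, and the weighted-average defuzzification below are all illustrative assumptions rather than the authors' exact design.

```python
def avg_feature_length(chromosome):
    """Average formula length over the feature rows of a chromosome."""
    lengths = [sum(1 for c in row if c != 0) for row in chromosome]
    return sum(lengths) / len(lengths)

def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_fitness(accuracy, avg_len):
    """Toy stand-in for the Mamdani evaluator: two illustrative rules scored
    with min (AND) and combined by a weighted-average defuzzification."""
    r_good = min(tri(accuracy, 0.6, 1.0, 1.4), tri(avg_len, 0, 1, 5))    # high accuracy, short features
    r_poor = min(tri(accuracy, -0.4, 0.0, 0.6), tri(avg_len, 3, 9, 15))  # low accuracy, long features
    if r_good + r_poor == 0:
        return 0.5
    return (r_good * 1.0 + r_poor * 0.0) / (r_good + r_poor)

print(fuzzy_fitness(accuracy=0.92, avg_len=3.0))
```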
The wrapper-based FGACI model is applied on five data sets and results are shown in the following section.
4 Implementation Results

Five well-known data sets from the UCI repository are used to examine the wrapper-based FGACI performance. Information on these data sets is shown in Table 1.

Table 1. Datasets Information
Table 2 displays the results of running the PNN classifier before and after integrating it with the wrapper-based FGACI module. Each dataset is divided into training and test data: 60% of the data are used for training and 40% for testing. In Table 2, the accuracy columns show the classifier accuracy on the test data. The GA parameters, i.e. population size, maximum number of epochs, crossover rate and mutation rate, are selected through a trial-and-error approach.

Table 2. PNN classifier accuracy prior to integrating it with the wrapper-based FGACI module and after integrating it with the model
As illustrated in Table 2, on the "Wine" and "Glass" data sets the classifier accuracy increases up to 100%, and thus AFN, i.e. the average number of samples that are incorrectly labeled negative, is zero. The "Ionosphere" dataset has 354 samples; 60% of the data, i.e. 212 samples, are used for training and 142 samples for testing the system. The "Ionosphere" dataset has 2 classes. To calculate ATP (average true positives), the number of samples correctly predicted positive is counted in each class, and the average over the 2 classes is reported as ATP. AFN (average false negatives) is calculated in the same way, except that in each class the number of samples falsely predicted negative is counted. As shown in Table 2, on the "Ionosphere" dataset AFN decreases from 19 to 15 after integrating the PNN with the constructive induction module. The results reveal that classifier accuracy increases considerably after applying the FGACI module. In Table 3, the results of applying wrapper-based FGACI on the datasets are compared with the results of a system named GAP [13]. As noted in the Introduction, GAP uses GP (Genetic Programming) for constructing features and a GA for feature selection.

Table 3. Comparison of Fuzzy-GA wrapper-based CI with GAP
As shown in Table 3, in most cases the Fuzzy-GA wrapper-based CI system produces better solutions. Furthermore, this system also considers feature complexity in the evaluation, whereas the GAP system considers only the learning algorithm accuracy when evaluating feature sets.
5 Conclusion

In this paper a Fuzzy-GA wrapper-based model for constructive induction is presented. In this model we combine fuzzy logic and GA to take advantage of both in tackling the optimization problem of finding the best representation space for learning algorithms. Wrapper-based approaches use the induction algorithm itself to estimate the relevance of the given features, and we therefore employed the wrapper-based approach to obtain better results. The model is applied to data sets prior to the learning algorithm and improves its accuracy by optimizing the representation space. In this work, a PNN classifier is selected as the learning algorithm and its accuracy is evaluated before and after integrating it with the presented constructive induction model. The results reveal that classifier accuracy improves after integrating the PNN classifier with the Fuzzy-GA wrapper-based constructive induction module.
References 1. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, Chichester (2003) 2. Zhou, E., Khotanzad, A.: Fuzzy Classifier Design Using Genetic Algorithms. Elsevier, Pattern Recognition 40 (2007) 3. Bloedorn, E.: Multistrategy Constructive Induction. George Mason University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (1996) 4. Shila, S.L., Pérez, E.: Constructive Induction and Genetic Algorithms for Learning Concepts with Complex Interaction. ACM, New York (2005) 5. Matheus, C., Rendell, L.: Constructive Induction On Decision Tree. University of Illinois at Urbana-Champaign (1989) 6. Hu, Y., Kibler, D.: A Wrapper Approach For Constructive Induction. University of California (1993) 7. Callan, J.: Knowledge-Based Feature Generation for Inductive Learning. University of Massachusetts (1993) 8. Fawcett, T.: Feature Discovery for Inductive Concept Learning. University of Massachussets (1993) 9. Yang, S.: A Scheme for Feature Construction and a Comparison of Empirical Methods. University of Illinois (1991) 10. Rendell, L., Ragavan, H.: Improving the Design of Induction Methods by Analyzing Algorithm Functionality and Data-Based Concept Complexity. University of Illinois (1993) 11. Bloedorn, E., Michalski, R.: The AQ17-DCI System for Data-Driven Constructive Induction and Its Application to the Analysis of World Economic. In: Ninth International Symposium on Methodologies for Intelligent Systems (ISMIS-1996), Zakopane, Poland, June 10–13 (1996)
12. Gang, Q., Garrison, W., Greenwood, D., Liu, C., Sharon, H.: Searching for Multiobjective Preventive MaintenanceSchedules: Combining Preferences with Evolutionary Algorithms. Elsevier, European Journal of Operational Research (2007) 13. Ghosh, A., Jain, L.C.: Evolutionary Computation in Data Mining. Springer, Heidelberg (2005) 14. Thomas, B.B.: Evolutionary Computation in Data Mining. Springer, Heidelberg (2006) 15. Theodoridis, S.: Pattern Recognition, 2nd edn. Elsevier, Amsterdam 16. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Purdue University (2003) 17. Wesley, C., Tsau, Y.L.: Foundations and Advances in Data Mining. Springer, Heidelberg (2005) 18. Yang, Y., Webb, G.I.: A Comparative Study of Discretization Methods for Naive-Bayes Classifiers. In: The 2002 Pacific Rim Knowledge Acquisition Work-shop, Tokyo, Japan (2002) 19. Watanabe, L., Rendell, L.: Learning Structural Decision Trees From Examples. University of Illinois (1991) 20. Cho, H.J., Wang, B.H.: Automatic Rule Generation for Fuzzy Controllers Using Genetic Algorithms: a Study on Representation Scheme and Mutation rate. In: IEEE World Congress on Computational Intelligence Fuzzy Systems (1998) 21. Holland, J.: Adaption in Natural and Artificial. SystemsUniversity of Michigan Press (1975) 22. Whitley, D.: A genetic algorithm tutorial (1994) 23. Guo, Z.X., Wong, W.K., Leung, S.Y.S., Fan, J.T., Chan, S.F.: Genetic Optimization of Order Scheduling with Multiple Uncertainties. Elsevier, Expert Systems with Applications 35 (2008) 24. Chen, S.M., Huang, C.M.: A New Approach to Generate Weighted Fuzzy Rules Using Genetic Algorithms for Estimating Null Values. Elsevier, Expert Systems with Applications 35 (2008) 25. Chen, L.H., Hsiao, H.D.: Feature Selection to Diagnose a Business Crisis by Using a Real GA-based Support Vector Machine: An Eempirical Study. Elsevier, Expert Systems with Applications 35 (2008) 26. Hu, Y., Kibler, D.F.: Generation of Attributes for Learning Algorithms. In: Proc. of the Thirteenth National Conference on Artificial Intelligence, pp. 806–811. AAAI, The MIT Press (1996) 27. Ragavan, H., Rendell, L.A.: Lookahead Feature Construction for Learning Hard Concepts. In: Proc. Of the Tenth International Conference on Machine Learning, June 1993, pp. 252–259. University of Massachusetts, Amherst, MA, USA (1993) 28. Perez, E.: Learning Despite Complex Interaction: An Approach Based on Relational Operators. PhD thesis, university of Illinois, Urbana-Champaign (1997) 29. Specht, D.F.: Probabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification. IEEE Transactions on Neural Networks 1, 111–121 (1990) 30. Pagallo, G.: Adaptive Decision Tree Algorithms for Learning from Examples. PhD thesis, University of California at Santa Cruz (1990)
Warning List Identification Based on Reliability Knowledge in Warranty Claims Information System SangHyun Lee1, ByungSu Jung2, and KyungIl Moon1 1 Department of Computer Engineering, Honam University, Korea Department of Computer and Electric Information, Nambu University, Gwangju, Korea
[email protected],
[email protected],
[email protected] 2
Abstract. The purpose of this paper is to propose a process, method and knowledge for objectively identifying warning lists based on reliability knowledge in a warranty claims information system. The study was designed as a combination of a literature study (process, method, knowledge) and a case study. First, the process, method and knowledge for the identification were proposed and then applied to an automobile manufacturing firm. The identification process uses three kinds of reliability knowledge: false alarm knowledge of warranty claims, inventory knowledge of products, and variation knowledge due to warranty claims factors. Based on the case study of an automobile manufacturing firm, we found that identifying warning lists using reliability knowledge results in improvements in the production process of units. The main idea has been applied to a single practical case, an automobile manufacturing firm. For warning list identification of warranty claims, this paper uses three kinds of knowledge from a warranty claims system that have not been treated before: false warning possibility, inventory aspects, and variation knowledge among warranty claims factors. Keywords: Warranty claims information system, Reliability knowledge, Warning list identification, VIN list.
1 Introduction

Warranty claims are knowledge obtained by analyzing field data. The field data provides important information used to evaluate reliability, to assess new design and manufacturing changes, to identify causes of failure and to compare designs, vendors, materials, or manufacturing methods. The age-specific analysis of product failure data has engendered considerable interest in the literature. Field data is usually obtained from requests for repair or replacement when failures occur within the warranty period. For example, if an automobile under warranty fails, it is repaired by the original manufacturer, and the manufacturer obtains information such as failure times, causes of failures, manufacturing characteristics of items (e.g., model, place or time of manufacture, etc.), and environmental characteristics during use (e.g., personal characteristics of users, climatic conditions, etc.) (Junga and Bai, 2007).
Many factors contribute to product failures in warranty claims. The most important factors are the age (the time in service, etc.) of the product and the effects of the manufacturing characteristics, the time of manufacture and the operating seasons or environments. The main problem is knowledge related to age-based warranty claims, which is obtained by analyzing unanticipated failure modes, a harsher than expected operating environment, an unknown change in raw material properties or supplier quality, an improperly verified design change, etc. (Wu and Meeker, 2001). The age-based analysis has several problems in statistically estimating the reliability. In particular, the calculation of the false alarm probability can be unreasonable in aggregated warranty claims analysis. Majeske (2003) proposes a general mixture model framework for automobile warranty data that includes parameters for product field performance, the manufacturing and assembly process, and the dealer preparation process. Identifying warning lists in a warranty claims system is a sort of warranty planning that needs reliability knowledge (tolerance error of fraction nonconforming, degree of error variation, deviance of control limit, etc.) and artificial intelligence in the reasoning of system risk. There are many studies in connection with this problem: data mining modeling in the automotive industry (Hotz et al., 1999); warranty claims process modeling of the automobile (Hipp and Lindner, 1999); a software cost model for quantifying the gain and loss associated with warranty claims (Teng and Pham, 2004); software-based reliability modeling (Jeske and Zhang, 2005). These studies are related to forecasts of warranty claims. Sometimes warranty claims may depend on knowledge such as manufacturing conditions or the environment in which the product is used (such as production or operating periods). In regard to this problem, detecting change points using the adjacent distribution of warranty data, in terms of identifying sub-units of the product serviceable during the warranty period, is a reasonable and useful method (Karim and Suzuki, 2005). In warranty claims analysis using these covariates, inventory knowledge can be extended in the usual way to allow covariate analysis. Inventory knowledge for countable data such as claims is required in forecasting warranty claims. There are three problems in warning list identification from warranty claims data. The first is that the possibility that the warranty claims data themselves may be wrong is excluded. The second is that the warning list is identified by using only a known quantity of production, which is not reasonable in practical application. The final one is that variation due to chance is included among the warranty claims variations. In this study, we focus on these problems. The warning list identification of warranty claims is similar to data mining for identifying the reliability knowledge.
2 Reliability-Based Warning List Identification Procedure Many factors contribute to product failures that result in warranty claims. The most important factors are the age (the time in service) of the product and the effects of the manufacturing characteristics, time of manufacture and the operating seasons or environments. Thus, the age-specific analysis of product failure data has engendered considerable interest in the warranty system development. Specifically, the age-specific analysis is related to reliability knowledge discovery which can be defined as the nontrivial extraction of implicit, unknown, and potentially useful information from the
warranty claims system. The reliability knowledge is very important to early warning/detection of bad design, poor production process, defective parts, poor materials, etc. Thus, this section describes a process of the reliability knowledge discovery in the warranty claims system and represents a reliability-based procedure for warning list identification. 2.1 Reliability Knowledge in Warranty Claims System The reliability of warranty claims data can be measured by operation group and parts group. Figure 1 represents a process for measuring reliability in each operation group. The applied unit contains unit code, previous models, applicability of previous models and point of application. New units interlock the number of units and that of warranty claims with each other in the previous model to identify data. Unit sales data includes unit number, classification code, model code, model index and point of production and sales. Filtering engine outputs an XML or any other formatted document that contains claim ID, age (in days), usage (in hours), failure mode, downtime (in days), repair cost (adjusted), etc. Application discovery defines some terminology and concepts of group reliability, which are related to life, failure mode, repair and downtime (the time elapsed between the occurrence of failure and the completion of repair). The warranty claims count is very important to group reliability characterization. The warranty claims count by operation typically shows use period, number of warranty claims and total sales amount. The purpose of operation data is to multiply the number of individual operations that comprise the operation group by the number of units sold for the sake of correcting defective percentages. Group reliability characterization includes model codes, operation codes, points of production, and the shape and scale parameter of probability distribution as the reliability function. A standardization of Fraction nonconforming is frequently used in the reliability definition.
Fig. 1. The process of reliability discovery by the operation group
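To make the correction of defective percentages described above concrete, the following minimal sketch computes a corrected fraction nonconforming per operation group and standardizes it. The record fields, figures and helper names are illustrative assumptions, not taken from the warranty claims system itself.

```python
from dataclasses import dataclass

@dataclass
class OperationGroupRecord:
    """Illustrative record for one operation group (fields are assumptions)."""
    group_code: str
    units_sold: int           # units sold for the group
    operations_per_unit: int  # individual operations that make up the group
    claims: int               # warranty claims reported for the group

def corrected_fraction_nonconforming(rec: OperationGroupRecord) -> float:
    """Defective percentage corrected by the number of operations per group,
    as described for the operation data above (illustrative formula)."""
    exposed = rec.units_sold * rec.operations_per_unit
    return rec.claims / exposed if exposed else 0.0

def standardize(values):
    """Simple standardization (z-score) of fraction-nonconforming values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5
    return [(v - mean) / std if std else 0.0 for v in values]

# Example usage with made-up figures
records = [
    OperationGroupRecord("OP-A", units_sold=1200, operations_per_unit=3, claims=18),
    OperationGroupRecord("OP-B", units_sold=800,  operations_per_unit=2, claims=4),
]
fractions = [corrected_fraction_nonconforming(r) for r in records]
print(standardize(fractions))
```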
The reliability discovery for each parts group is similar to Figure 1 except that the parts group data file is used instead of the operation data. The parts data file includes unit division, parts group, parts code and parts name. Warranty claims count
per parts group includes unit division, model code, model index, group code, use duration and number of warranty claims. As the reliability characterization, a function of time in service and fraction nonconforming is usually used, and it is represented as a Poisson model or a Weibull distribution. In particular, the Weibull distribution is frequently used when the failure rate increases as the product wears out. Therefore, the Weibull distribution is most suitable for the warning/detection problem of warranty claims. To estimate the parameters of the Weibull distribution for each operation group, data must be identified under the assumption of stable quality over a constant past period.

2.2 Reliability-Specific Warranty Claims Identification Process

Figure 2 shows a method for analyzing and predicting warranty claims using the reliability knowledge discovery. In the reliability knowledge discovery, some incorrect facts are regarded as random variables. From this viewpoint, the false warning (alarm) problem of warranty claims is included in the reliability analysis. The second step is to build up a warranty claims table through an appropriate estimation of the inventory period. In most cases, the warranty claims table is based on production and marketing levels with no regard to the inventory period. This makes the deviation between production amount and marketing amount remarkable, and thus leads to a significant error in the reliability sense. The third step is to provide early-warning lists of the warranty claims and to improve the imperfect units. Past warranty repair units are checked, and improvement effects are analyzed. The analysis of the improvement effect is equivalent to the reduced operation of fraction nonconforming based on the warranty claims. Here, safety and security units are excluded. The fourth step is to apply variation knowledge due to disturbing factors. Thus, a desirable warning degree of the warranty claims can be determined.
Fig. 2. Reliability-specific procedure for warning list identification
3 Constructing Reliability Knowledge Matrix

A reliability-specific procedure aims to objectify the current failure state and to predict future states. Basically, this procedure requires modules which regard a certain
failure phenomenon as a random variable and calculate the failure distribution. Also, a method of sales amount prediction is needed for every warning activity of the warranty claims, and it is usually accomplished by using time series analysis. Here, there are several problems: the warranty claims data themselves may be wrong, the inventory aspects are set aside, the improvement effect is not applied, and variation due to chance is included among the warranty claims variations. In this section, a reliability knowledge matrix is proposed to resolve these problems.

3.1 False Warning Knowledge of Warranty Claims

Suppose that $n_i$ is the number of units produced at period $i$; $n_{ij}$ is the number of units produced at period $i$ and sold at period $i+(j-1)$; and $R_{ijk}$ is the number of warning reports at the $k$-th period with regard to the warranty claims code under consideration. The warranty claims monitoring is based on the nonconforming fraction of the past warranty claims. It is necessary to allocate a power function that allows us to answer questions about the service life of different units. Assume that $R_{ijk}$ follows a Poisson distribution with independent parameter $n_{ij}\lambda_k$. Here, $\lambda_k$ denotes the expected number of reports per unit sold during the $k$-th serviceable period for the warranty claims code under consideration, and $\lambda_k^0$, the reference value of $\lambda_k$, may be obtained on the basis of past reports. The process of detecting lower reliability refers to testing the hypotheses $\lambda_1 \le \lambda_1^0, \lambda_2 \le \lambda_2^0, \ldots, \lambda_M \le \lambda_M^0$. In terms of the overall false alarm rate, increasing $M$ decreases the power function. Since $R_{ijk}$ is independent of $R_{ijl}$ ($l \ne k$), the test for the hypotheses $\lambda_1 \le \lambda_1^0, \ldots, \lambda_M \le \lambda_M^0$ may be conducted in the form of individual tests. The test for the hypothesis $\lambda_1 \le \lambda_1^0$ (corresponding to a report during the first serviceable period) can come to the conclusion that $\lambda_1 > \lambda_1^0$ at period $i+1$ if $R_{i11} \ge C_{i11}$ for an appropriate critical value. During the follow-up period, additional information can still be accumulated on the basis of this information. In general, if $S_{ij1} \ge C_{ij1}$ at period $i+j$, we can conclude that $\lambda_1 > \lambda_1^0$. Here, $S_{ij1}$ is the cumulative frequency of warranty claims reported during the first serviceable period for units manufactured at period $i$. The probability of a type 1 error can be expressed as follows:

$$\alpha_1^* = 1 - P[S_{i11} < C_{i11}, \ldots, S_{iM1} < C_{iM1}] . \qquad (1)$$

This is less than or equal to the probability of actual type 1 errors. At the $j$-th period after information is available at service time $k$, the hypothesis $\lambda_k \le \lambda_k^0$ can be rejected if $S_{ijk} \ge C_{ijk}$. So the type 1 error can be expressed as follows:

$$\alpha_k^* = 1 - P[S_{i1k} < C_{i1k}, \ldots, S_{i,M-k+1,k} < C_{i,M-k+1,k}] \le \alpha_k . \qquad (2)$$

Thus, the false warning probability is calculated as follows:

$$\alpha^* = 1 - \prod_{k=1}^{M} (1 - \alpha_k^*) \le 1 - \prod_{k=1}^{M} (1 - \alpha_k) = \alpha . \qquad (3)$$
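As an illustration of Eqs. (1)–(3), the sketch below evaluates per-period type 1 error rates as Poisson tail probabilities and combines them into the overall false-warning probability α*. The reference rates, sales counts and critical values are placeholder inputs, and SciPy is assumed only for the Poisson survival function.

```python
from scipy.stats import poisson

def type1_error(n_sold: float, lambda0: float, critical: int) -> float:
    """P(S >= critical) when cumulative claims follow Poisson(n_sold * lambda0),
    i.e. the chance of raising an alarm although the reference rate holds."""
    return poisson.sf(critical - 1, mu=n_sold * lambda0)

def false_warning_probability(alphas):
    """Overall false-warning probability alpha* = 1 - prod(1 - alpha_k), per Eq. (3)."""
    prod = 1.0
    for a in alphas:
        prod *= (1.0 - a)
    return 1.0 - prod

# Placeholder reference rates, sales counts and critical values per service period
lambda0 = [0.002, 0.004, 0.003]
n_sold = [1500, 1500, 1500]
critical = [8, 12, 10]

alphas = [type1_error(n, l, c) for n, l, c in zip(n_sold, lambda0, critical)]
print(alphas, false_warning_probability(alphas))
```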
To present this probability concept easily, a monitoring chart of specific warranty claims codes can be used (see Wu and Meeker, 2001). The monitoring chart shows alarms arising during the first service period from units manufactured at the second production period. As the production period passes by, monitoring goes on, but some tests terminate. Here, there is an unacceptable problem: $1 - \prod \alpha^*$ becomes zero when only a few units are sold, even without any warranty claims. In this situation, the Poisson probability detection is not desirable. This problem can be solved by applying a Z-test for the failure rate in parallel with the Poisson test, so as to calculate the optimal interval of production and sales based on the inventory point.

3.2 Warning List Identification

Warning list identification requires aggregate information such as point of production, use period and report frequency of warranty claims data. The warning list of warranty claims is obtained by calculating the statistical power function and predicting the fraction nonconforming using this knowledge. A Poisson distribution is usually used for the fraction nonconforming. First, the differences of the Poisson parameter (λ) are calculated based on previous and current warranty claims. Next, the product of the differences and the statistical significance level is calculated. The warning list can then be obtained with the following rule: IF ∑C(R_ijk) > α* THEN the corresponding units are placed on the warning list. A more accurate warning list reflects the improvement effect analysis, and the rule is rewritten as follows: IF (AV_Ratio > threshold value) AND (average of pre-improvement > average of post-improvement) THEN positive effects = Yes, ELSE positive effects = No. Here, AV_Ratio refers to the pure variation of the improvement effect in terms of warranty claims variations; it is the ratio of reproducibility variation in the warranty claims data. The threshold value refers to a value offered by users.
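The two decision rules above can be read as simple threshold checks. The following sketch is one hedged rendering of them; the input quantities (the accumulated statistic, AV_Ratio, the user threshold and the pre/post-improvement averages) are assumed to be computed elsewhere.

```python
def warning_flag(cumulative_statistic: float, alpha_star: float) -> bool:
    """Warning rule sketched above: flag a unit group when the accumulated
    test statistic exceeds the false-warning level alpha*. The exact form of
    C(R_ijk) is application-specific; here it is just a supplied number."""
    return cumulative_statistic > alpha_star

def improvement_effect(av_ratio: float, threshold: float,
                       pre_mean: float, post_mean: float) -> str:
    """Improvement-effect rule: a positive effect only if the pure variation
    ratio exceeds the user threshold and the claim level actually dropped."""
    if av_ratio > threshold and pre_mean > post_mean:
        return "Yes"
    return "No"

# Example with made-up numbers
print(warning_flag(0.12, 0.05))                 # True -> goes on the warning list
print(improvement_effect(0.7, 0.5, 3.1, 1.8))   # "Yes"
```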
4 The Automotive Example

This section discusses a practical case of applying the warning list identification. In the case of an automobile company, warning list identification corresponds to identification of the VIN (Vehicle Identification Number) list. To extract the VIN list, this study used warranty claims data reported by an automobile company, in particular warranty claims related to a pad kit (front disk), a part of the front brake system of cars, dating from September 2004 to August 2005. Figure 4 shows the warranty claims data loaded from the point of view of production, sales and the serviceable point with regard to the pad kit. By estimating the parameters of the Weibull distribution, it is found that m = 2.0056, η = 44.6, and the intercept value of the least-squares regression line t0 = 2031.98. Using these parameter values, we obtain an average life cycle of 11 months, a viability at 3 years (t = 36) of 47.8%, and a time for multiple failures
of 2.6 years. Poisson probability detection is disadvantageous in the sense that, when only a few units are sold, $1 - \prod \alpha^*$ becomes zero even without any warranty claims, as discussed in Section 3 above.
Fig. 3. Optimal interval of production and sales using inventory knowledge
Fig. 4. Results of improvement effect analysis
To compensate for this shortcoming, this study applied the Z-test method for the failure rate in parallel with the Poisson test so as to calculate the optimal interval of production and sales based on the inventory point, with regard to a 1- to 2-month period of use (see Figure 5). Finally, Figure 6 shows a VIN list on the reliability knowledge matrix based on the production month and use month, obtained with the corresponding warning rule.
The VIN list shows that all units produced appeared in warning status at a certain point just after 7 months of use, which indicates considerable reliability faults. Follow-up ongoing process improvement has contributed to a remarkably better quality of the units produced since October 2004.
Fig. 5. VIN list extraction
5 Conclusion

In this paper, we presented a process for identifying the warning lists of warranty claims based on reliability knowledge. This knowledge is related to the incorrect warning of warranty claims. As a practical application of warning list identification, we extracted the VIN list from the warranty claims database of an automobile manufacturing firm. The result showed that the proposed process was better than using the existing identification lists of the warranty claims data.
References
1. Bai, J., Pham, H.: Repair-limit Risk-free Warranty Policies with Imperfect Repair. IEEE Transactions on Systems, Man, and Cybernetics Part A 35(6), 765–772 (2005)
2. Jeske, D.R., Zhang, X.: Some Successful Approaches to Software Reliability Modeling in Industry. Journal of Systems and Software 74(1), 85–99 (2005)
3. Junga, M., Bai, D.S.: Analysis of Field Data under Two-dimensional Warranty. Reliability Engineering & System Safety 92, 135–143 (2007)
4. Karim, M.R., Suzuki, K.: Analysis of Warranty Claim Data: A Literature Review. International Journal of Quality & Reliability Management 22(7), 667–686 (2005)
5. Majeske, K.D.: A Mixture Model for Automobile Warranty Data. Reliability Engineering & System Safety 81(1), 71–77 (2003)
6. So, M.W.C., Sculli, D.: The Role of Trust, Quality, Value and Risk in Conducting E-business. Industrial Management & Data Systems 102(9), 503–512 (2002)
7. Teng, X., Pham, H.: A Software Cost Model for Quantifying the Gain with Considerations of Random Field Environments. IEEE Trans. Computers 53(3), 380–384 (2004)
8. Wu, H., Meeker, W.Q.: Early Detection of Reliability Problems Using Information from Warranty Databases. Technometrics 44(2), 120–133 (2001)
Cluster Analysis and Fuzzy Query in Ship Maintenance and Design

Jianhua Che1, Qinming He1, Yinggang Zhao2, Feng Qian1, and Qi Chen1

1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  {chejianhua,hqm,qfeng,chenqi}@zju.edu.cn
2 College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454003, China
[email protected]

Abstract. Cluster analysis and fuzzy query have gained widespread application in modern intelligent information processing. Considering the features of ship maintenance data, a variant of hypergraph-based clustering algorithm, i.e., Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed to analyze the bulky data rooted in the ship maintenance process, discover unknown rules and help ship maintainers make decisions on various device fault causes. At the same time, revising or renewing an existing design of a ship or device may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale and complicated ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.

Keywords: Cluster Analysis; Fuzzy Query; Data Mining; Ship Maintenance and Design.
1 Introduction

Cluster analysis and fuzzy query are always the pivotal arts in data mining and information retrieval. In recent decades, they have received a lot of focus by community researchers. Cluster analysis is often applied in data compression and vector quantization, image processing and pattern recognition, business analysis and bioinformatics mining [13] etc. Clustering, in nature, is a kind of unsupervised classification process, which partitions a given set of patterns (observations, data items, or feature vectors) into disjoint groups (clusters), and makes the proximity between patterns in the same group maximal, the proximity between patterns in different groups minimal [7]. The proximity between patterns may be figured out according to their diversified properties, and distance between patterns is one of the most common measures. However, clustering is also a difficult
problem, as its efficiency and agility vary across applications. For example, there is a mass of device fault statistics in ship maintenance, characterized by device type diversity, large data scale and so on. At the same time, ship maintainers often need to examine various clustering results to find out the actual fault causes. Many traditional clustering algorithms fail to offer an agile and convenient manner for ship maintainers, so appropriate clustering algorithms will be helpful for this peculiar application. On the other hand, fuzzy query plays an important role in intelligent information processing, especially in intelligent information retrieval [11]. Designed to run in conjunction with plain database management systems (DBMSs), fuzzy query explores the endless possibilities in a database and shows the approximate candidates that best meet a given criteria record, so as to conduct the next level of information analysis. By setting an elastic threshold that relaxes or restricts how deeply into the data the query process probes, fuzzy query can provide information beyond the harsh restrictions of algebra and crisp relational operators. But similar to cluster analysis, fuzzy query currently suffers from two issues: 1) how to compute the similarity degree of database record properties to the criteria record properties; 2) how to compute the similarity degree of database records to the criteria record according to the similarity degree of their homologous properties. Therefore, it is requisite to design a reasonable fuzzy query mechanism for such applications. This paper presents a variant of hypergraph-based clustering algorithm for ship maintainers to analyze the existing fault data, and an unbiased and usable fuzzy query mechanism for ship designers to retrieve helpful data records. Specifically, the contributions of this paper are: 1) we have proposed a hypergraph-based clustering algorithm, Correlation Coefficient-based Minimal Spanning Tree (CC-MST), to group the peculiar ship maintenance data by investigating their features; the CC-MST algorithm owns merits such as intuition, flexibility and efficiency; 2) we have designed an unbiased and serviceable fuzzy query mechanism to find the data records most similar to a given criteria record. This fuzzy query mechanism efficiently resolves two issues, i.e., computing the similarity degree of different property data such as numeric, character and boolean, and the similarity degree of database records to the given criteria record. The rest of this paper is organized as follows. We begin in Section 2 with related work. Then, we introduce the taxonomy of common clustering algorithms and our proposed Correlation Coefficient-based Minimal Spanning Tree (CC-MST) algorithm in Section 3, and our proposed fuzzy query mechanism in Section 4. In Section 5, we describe the experiment setup and result evaluation. Finally, we conclude with discussion in Section 6.
2 Related Work
Cluster analysis and fuzzy query in common applications have been studied extensively for decades, and lots of clustering algorithms and fuzzy query mechanisms have been brought forward [6,12,10,3,13,5,2]. Tapas Kanungo et al. [6]
gives a detailed introduction to the analysis and implementation of the K-MEANS clustering algorithm. Hui Xiong et al. [12] investigate how data distributions can impact the performance of K-means clustering, and provide a necessary criterion to validate the clustering results, i.e., the coefficient of variation (CV). Pavan and Pelillo [10] present a new graph-theoretic approach for clustering and segmentation. Inderjit S. Dhillon et al. [3] develop a fast high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives to achieve weighted graph cuts without eigenvectors. Rui Xu and Donald Wunsch II [13] survey the scope of cluster analysis, popular clustering algorithms and their applications to some benchmark data sets, and discuss several tightly related topics like proximity measures and cluster validation. Hoque et al. [5] design a sample fuzzy database and show that fuzzy query costs the same time as classical query on a classical database, but less time on the sample fuzzy database. Charalampos Bratsas et al. [2] provide an extension of the ontology-based model to fuzzy logic, as a means to enhance the information retrieval (IR) procedure in semantic management of Medical Computational Problems (MCPs). As for cluster analysis in ship model design and industry information, Li et al. [8] propose a fuzzy c-means clustering algorithm to detect ships using fully polarimetric SAR data. Fang et al. [4] explore the application of key data mining technologies in computer-aided conceptual design of ship engine rooms through analyzing the principle and way of ship engine room conceptual design. Benito et al. [1] analyze the roles of each section in the Norwegian marine industry and their influence on the whole industry with clustering algorithms. Liu and Wu [9] have analyzed the human operational factors in shipwreck investigation reports to find the relation between shipwrecks and human misoperations. All of these works are classic and significant. However, they do not involve device fault analysis with clustering methods in ship maintenance or ship model design with fuzzy query in ship design. At present, there is little work on cluster analysis in ship maintenance and fuzzy query in ship design.
3 Cluster Analysis in Ship Maintenance
In ship maintenance, there exists a large amount of statistical data about various device faults. For example, the marine pump is an important unit to pump water and oil, and its reliability directly impacts the quality of ship products. Each type of marine pump may exhibit various fault phenomena, which arise from design and fabrication factors, type selection or human operations. When maintaining them, it is hard to properly classify these phenomena and identify the fault causes. Cluster analysis is an efficient means to take active precautions, diagnose failures that have arisen and improve the original design.

3.1 Classification of Clustering Algorithms
Multitudinous algorithms have been proposed in the literature for clustering. These algorithms can be divided into seven categories: hierarchical methods, partitional methods, grid-based methods, constraint-based methods, co-occurrence
categorical data-based methods, methods used in machine learning, and methods for high-dimensional data [13]. Further, hierarchical methods can be classified into agglomerative algorithms and divisive algorithms, where agglomerative algorithms contain single-link algorithms, complete-link algorithms and minimum-variance algorithms (e.g., BIRCH, CURE, ROCK, CHAMELEON). Partitional methods also include squared-error clustering (e.g., SNN, PAM, CLARA, CLARANS, X-means), probabilistic clustering (e.g., EM), density-based clustering (e.g., DBSCAN, DENCLUE, OPTICS, DBCLASD, CURD), graph-theoretic clustering, and mode-seeking clustering. Grid-based methods include STNG, STNG+, CLIQUE, Wave-Cluster, etc. Constraint-based methods include COD and so on. Methods used in machine learning enclose artificial neural networks (ANN) (e.g., SOM, LVQ) and evolutionary algorithms (e.g., SNICC). Methods for high-dimensional data include subspace clustering (e.g., CACTUS), projection techniques and co-clustering techniques.

3.2 Correlation Coefficient-Based Minimal Spanning Tree
In all clustering algorithms, hypergraph-based clustering algorithms hold simple, intuitive and flexible properties [10]. Taking into account the special characteristics of device fault data in ship maintenance, we propose a variant of hypergraph-based clustering algorithms, i.e., Correlation Coefficient-based Minimal Spanning Tree (CC-MST). The primary philosophy of CC-MST includes two steps: building a minimal spanning tree and partitioning it into clusters. To build the minimal spanning tree, we randomly select an observation and start from it to compute the correlation coefficient between any two observations, and then take each observation as a node and the correlation coefficient between two observations as the weight value of their edge to connect all observations. The obtained connected graph is a hypergraph, which has many minimal spanning trees. By comparing the edge weight value sums of all minimal spanning trees, a minimal spanning tree with all nodes and the smallest edge weight value will be used for clustering. After that, one or more connected subgraphs can be obtained by partitioning the minimal spanning tree according to a given threshold value λ (λ ∈ [0, 1]), and all observations in a connected subgraph make up a cluster. To cluster all observations, we first need to figure out the proximity degree of all observations. Definition 1: Assume that the domain U = {X1, X2, ..., Xn} is the set of observations, and element Xi represents an observation. Each observation Xi has m feature indices that can be denoted by a vector: Xi = (xi1, xi2, ..., xim), where i = 1, 2, ..., n. In CC-MST, we use the correlation coefficient method to compute the proximity degree of two observations as follows:
$$ R_{ij} = r(X_i, X_j) = \frac{\sum_{k=1}^{m} |x_{ik} - \bar{x}_i|\,|x_{jk} - \bar{x}_j|}{\sqrt{\sum_{k=1}^{m} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{m} (x_{jk} - \bar{x}_j)^2}} \qquad (1) $$

where $\bar{x}_i = \frac{1}{m}\sum_{k=1}^{m} x_{ik}$ and $\bar{x}_j = \frac{1}{m}\sum_{k=1}^{m} x_{jk}$. If $R_{ij}$ is negative, then adjust the values of all $R_{ij}$ with $\hat{R}_{ij} = \frac{R_{ij} + 1}{2}$ to ensure $\hat{R}_{ij} \in [0, 1]$.
Algorithm. Correlation Coefficient-based Minimal Spanning Tree (CC-MST)
Input: Xi = (xi1, xi2, ..., xim) and λ; // the feature data sequence of the observations and the partitioning threshold value
Proc:
1) Compute the proximity r(Xi, Xj) of all observations with equation (1), and build the feature proximity matrix Λ;
2) Take each observation as a node, connect all observations according to the value of r(Xi, Xj) in Λ in descending order, mark each r(Xi, Xj) as the weight value of its edge and extract the minimal spanning tree with all nodes;
3) Traverse the minimal spanning tree, delete all edges whose weight values are smaller than λ (i.e., r(Xi, Xj) < λ) and obtain one or more connected subgraphs, where each subgraph means a cluster;
Output: one or more clusters;
From the algorithm, we can find that the clustering results will be different according to different λ values. This presents an opportunity for ship maintainers to make a comparison between multiple categories of clustering results without any additional computation. The λ is often set to 0.5 as default.
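The sketch below illustrates the CC-MST steps as described: correlation-coefficient weights from Eq. (1), a spanning tree grown by inserting edges in descending order of proximity (Kruskal-style, with a small union-find), and clusters obtained by deleting tree edges whose weight falls below λ. It is a minimal rendering of the procedure under those assumptions, not the authors' implementation; the data and threshold are illustrative.

```python
import math

def correlation(xi, xj):
    """Proximity of two observations per Eq. (1); shifted to [0, 1] if negative."""
    m = len(xi)
    mi, mj = sum(xi) / m, sum(xj) / m
    num = sum(abs(a - mi) * abs(b - mj) for a, b in zip(xi, xj))
    den = math.sqrt(sum((a - mi) ** 2 for a in xi)) * math.sqrt(sum((b - mj) ** 2 for b in xj))
    r = num / den if den else 0.0
    return (r + 1) / 2 if r < 0 else r

def cc_mst_clusters(observations, lam=0.5):
    """Grow the spanning tree by adding edges in descending order of proximity,
    then drop tree edges with weight < lam; connected components are the clusters."""
    n = len(observations)
    edges = [(correlation(observations[i], observations[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    edges.sort(reverse=True)                      # descending order of proximity

    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in edges:                         # Kruskal-style tree growth
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # Partition: keep only tree edges with weight >= lam, then collect components
    parent = list(range(n))
    for w, i, j in tree:
        if w >= lam:
            parent[find(i)] = find(j)
    clusters = {}
    for k in range(n):
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())

# Toy feature vectors; the resulting grouping depends on the data and on lam
data = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 0.8, 0.9]]
print(cc_mst_clusters(data, lam=0.98))
```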
4 Fuzzy Query in Ship Design
As is well known, shipbuilding has a long history. Large-scale technical data about various types of ships and devices have been piled up during their design and manufacture. From another perspective, ship designers often want some reference data to improve or renew their original design after determining the fault causes. The massive legacy data will give a lot of help at the moment. For example, when designing a bow, the technical and testing data of shape-alike bows own a high reference value. But, these technical data of ships and devices hold very complicated structure and characteristics, how to get the relevant data for ship designers from a large database? Fuzzy query is an ideal resolution. In fuzzy query, we call the criteria data record presented by ship designers as criteria object, and the data record existed in the database as query object, which both have the same property(i.e., field in a database) structure. A key difficulty is how to compute the similarity degree between query object and criteria object. In our proposed mechanism, we first compute the similarity degree of each property between query object and criteria object, and then figure out the similarity degree between both with a weighting method. Note that the algorithms to compute the similarity degree between different types of properties data will be
diverse, because the property fields of record objects may be numeric, character or boolean, etc.

4.1 Similarity Computing of Numeric Property Data
For numeric property data, we compute the similarity degree with the gray correlation coefficient method. Assume that the criteria object presented by ship designers is X0 = (x0(1), x0(2), ..., x0(n)), and the query object existing in the data table is Xi = (xi(1), xi(2), ..., xi(n)) (i = 1, 2, ..., n), where xi(1), xi(2), ..., xi(n) are the property fields corresponding to x0(1), x0(2), ..., x0(n), i.e., both have the same numeric property structure. The similarity degree of numeric property data between X0 and Xi is:
$$ s(x_0(k), x_i(k)) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \varepsilon \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \varepsilon \max_i \max_k |x_0(k) - x_i(k)|} $$
where ε is the resolution ratio, with ε = 0.5 as default. In addition, we can normalize all numeric property data of the query object and the criteria object by initialization, equalization and reciprocal transformation to ensure that the value of s(x0(k), xi(k)) falls into [0, 1].
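A minimal sketch of the numeric-property similarity above (the gray correlation coefficient) follows; the criteria and query values are illustrative, and a guard is added for the degenerate case in which all differences are zero.

```python
def numeric_similarity(criteria, candidates, eps=0.5):
    """Grey-relational similarity of each candidate's k-th numeric property to
    the criteria object's k-th property, following the formula above.
    criteria: list of n numbers; candidates: list of lists of the same length."""
    diffs = [[abs(c0 - ci) for c0, ci in zip(criteria, cand)] for cand in candidates]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0:                        # all candidates identical to the criteria
        return [[1.0] * len(row) for row in diffs]
    return [[(d_min + eps * d_max) / (d + eps * d_max) for d in row] for row in diffs]

# Example: one criteria object, two query objects with three numeric properties
crit = [10.0, 4.5, 0.8]
cands = [[11.0, 4.0, 0.7], [15.0, 2.0, 0.3]]
print(numeric_similarity(crit, cands))
```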
4.2 Similarity Computing of Character Property Data
For character property data, we design a compromise algorithm, STRCOMPAR, to compute the similarity degree of character property data. The algorithm deals with two cases:
1. If the character property data in the criteria object is an abbreviation or synonym of the one in the query object (or vice versa), then the similarity degree of both is set to 1;
2. Otherwise, the similarity degree of both is:

r(S1, S2) = STREQUAL(S1, S2) / MAX(S1, S2) × 100%
where STREQUAL(S1, S2) returns the number of common characters in S1 and S2, and MAX(S1, S2) returns the maximal character number of S1 and S2.
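A possible rendering of the STRCOMPAR idea follows. The synonym/abbreviation check is stubbed with a small placeholder table, since the paper does not specify how that lookup is built; common characters are counted with multiplicity, which is one reasonable reading of STREQUAL.

```python
from collections import Counter

# Placeholder synonym/abbreviation pairs; the paper does not specify this lookup
SYNONYMS = {("univ.", "university"), ("corp.", "corporation")}

def common_char_count(s1: str, s2: str) -> int:
    """Number of shared characters, counted with multiplicity."""
    c1, c2 = Counter(s1), Counter(s2)
    return sum(min(c1[ch], c2[ch]) for ch in c1)

def string_similarity(s1: str, s2: str) -> float:
    """STRCOMPAR-style similarity: 1.0 for known synonyms/abbreviations,
    otherwise shared characters over the longer string length."""
    a, b = s1.lower(), s2.lower()
    if (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 1.0
    longest = max(len(a), len(b))
    return common_char_count(a, b) / longest if longest else 1.0

print(string_similarity("univ.", "university"))   # 1.0 via the synonym table
print(string_similarity("tanker", "trawler"))     # character-overlap ratio
```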
4.3 Similarity Computing of Boolean Property Data
Compared with numeric and character property data, the similarity computing of boolean property data is intuitive: if the boolean property data values of query objects are equal to criteria object, then the similarity degree of both is set to 1; otherwise, it is set to 0.
After figuring out the similarity degree of each property between the query objects and the criteria object with the three above methods, the similarity degree of a query object and the criteria object can be computed by superimposing the similarity degrees of each property, allowing for their weight values in the whole record object. Finally, all record data are listed for ship designers in descending order of their similarity degrees.
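The weighted superimposition and the final descending-order listing can be sketched as below; the per-property similarity values are assumed to come from routines like the numeric and character sketches above, and the record identifiers and weights are illustrative.

```python
def record_similarity(prop_sims, weights):
    """Weighted superimposition of per-property similarities (both lists
    aligned by property); weights are assumed to sum to 1."""
    return sum(s * w for s, w in zip(prop_sims, weights))

def fuzzy_query(candidates_sims, weights, top_k=None):
    """Rank query objects by similarity to the criteria object, descending.
    candidates_sims: {record_id: [sim of property 1, sim of property 2, ...]}."""
    ranked = sorted(
        ((rid, record_similarity(sims, weights)) for rid, sims in candidates_sims.items()),
        key=lambda item: item[1], reverse=True)
    return ranked[:top_k] if top_k else ranked

# Illustrative per-property similarities (numeric, string, boolean) and weights
sims = {"hull-017": [0.92, 0.67, 1.0], "hull-204": [0.55, 0.80, 0.0]}
print(fuzzy_query(sims, weights=[0.5, 0.3, 0.2]))
```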
5 Experiments Setup and Evaluation
To validate the agility and efficiency of the CC-MST algorithm, we have conducted two experiments based on a real ship device fault statistics dataset. These ship device fault statistics originate from the historical records of maintaining various failed ship devices by ship maintainers. As limited by the privacy of these data, we have taken only a part of this dataset and applied some processing that does not damage it. The probability data is shown in Figure 1, where P1 ∼ P4 represent four types of marine propellers, and X1 ∼ X20 denote twenty kinds of faults; e.g., the first ten kinds of faults (namely X1 ∼ X10) mean respectively an over-tightly enclosed swivel, a loose collar bearing, the break-away of the axis traction rod, a powerless drawbar pull, unstable rotation of the swivel, intermittent rotation of the propeller blades, stopped rotation of the propeller blades, an attrited swivel, abnormally slow gyration of the swivel, and slack coping bolts for the propeller blades. For the sake of a clear look, Figure 2 shows only a minimal spanning tree with 10 nodes and the edges whose weight value is bigger than λ = 0.5. With our CC-MST algorithm, clusters of different granularity can be gained by setting different λ values. For example, Figure 3 and Figure 4 show two different clustering results corresponding to two λ values. The resulting clusters are {X1, X3, X8, X9}, {X2, X5}, {X4, X6, X7, X10} with λ = 0.7 and {X1, X8, X9}, {X3}, {X2, X5}, {X4, X6}, {X7, X10} with λ = 0.8. These results will contribute a lot to helping ship maintainers discover the fault causes. Further, the value of λ for an optimized clustering result may be set by ship maintainers with their experiential knowledge, or computed automatically with a machine learning method based on the existing fault diagnosis data. In the second experiment, we have compared the clustering speed of CC-MST against the classic K-MEANS algorithm on parts of the whole dataset of different sizes, as shown in Figure 5. The CPU times for each data size demonstrate that CC-MST is faster than K-MEANS, especially as the dataset gets bigger. In addition, we have implemented our fuzzy query mechanism based on a practical database of ship model technical and performance data, which derives from the practical process of designing and testing numerous ship models. By carrying out many fuzzy queries for different criteria objects, we compute the real distance between the obtained fuzzy query objects and the criteria object by hand and compare the real distance with the similarity degree computed by the fuzzy query engine. The consistency of both proves the availability of our fuzzy query mechanism.
Fig. 1. The fault probability data of four ship bows
Fig. 2. The MST with only 10 nodes and λ = 0.5
Fig. 3. The clustering result of CC-MST with λ = 0.7
Fig. 4. The clustering result of CC-MST with λ = 0.8
Fig. 5. Running time versus data size of CC-MST and K-MEANS
6 Conclusion and Future Work
Existing clustering algorithms are competent for most applications, but not agile enough for some special fields such as ship maintenance data analysis. This paper proposed a hypergraph-based clustering algorithm, namely Correlation Coefficient-based Minimal Spanning Tree (CC-MST), to diagnose the causes of ship device faults based on a detailed analysis of ship maintenance data features. The algorithm holds intuitive, flexible and efficient properties. Two experiments on a real ship maintenance dataset indicate the agility and efficiency of the CC-MST algorithm. In addition, we also design a fuzzy query mechanism to retrieve the most proximate ship technical and testing data records for ship designers to assist their design. Its unbiasedness and usability are verified by matching the two kinds of similarity degrees computed by hand and by the fuzzy query engine. In the future, we will go further into computing the weight value of each property in fuzzy query with machine learning methods.
References
1. Benito, G.R.G., Berger, E., Shum, J.: A Cluster Analysis of the Maritime Sector in Norway. International Journal of Transport Management, 203–215 (2003)
2. Bratsas, C., Koutkias, V., Kaimakamis, E., Bamidis, P., Maglaveras, N.: Ontology-Based Vector Space Model and Fuzzy Query Expansion to Retrieve Knowledge on Medical Computational Problem Solutions. In: IEEE 29th Annual International Conference on Engineering in Medicine and Biology Society, pp. 3794–3797 (2007)
3. Dhillon, I.S., Guan, Y., Kulis, B.: Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1944–1957 (2007)
4. Fang, X.F., Wu, H.T.: Application of Data Mining Technology to Concept Design of Ship Engine Room. Shipping Engineering, 28–30 (2003)
5. Hoque, A.H.M., Sajedul, M.S., Aktaruzzaman, M., Mondol, S.K., Islam, B.: Performance Comparison of Fuzzy Queries on Fuzzy Database and Classical Database. In: International Conference on Electrical and Computer Engineering, ICECE 2008, pp. 654–658 (2008)
6. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y., Center, A.R., San Jose, C.A.: An Efficient K-means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 881–892 (2002)
7. Kotsiantis, S.B., Pintelas, P.E.: Recent Advances in Clustering: A Brief Survey. WSEAS Transactions on Information Science and Applications, 73–81 (2004)
8. Li, H., He, Y., Shen, H.: Ship Detection with the Fuzzy C-mean Clustering Algorithm Using Fully Polarimetric SAR. In: IEEE International Geoscience and Remote Sensing Symposium, pp. 1151–1154 (2007)
9. Liu, Z.J., Wu, Z.L.: Data Mining to Human Factors Based on Ship Collision Accident Survey Reports. Navigation of China, 1–6 (2004)
10. Pavan, M., Pelillo, M.: A New Graph-Theoretic Approach to Clustering and Segmentation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2003)
11. Ribeiro, R.A., Moreira, A.M.: Fuzzy Query Interface for a Business Database. International Journal of Human-Computer Studies, 363–391 (2003)
12. Xiong, H., Wu, J., Chen, J.: K-means Clustering versus Validation Measures: A Data Distribution Perspective. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 779–784 (2006)
13. Xu, R., Wunsch II, D.: Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 645–678 (2005)
A Semantic Lexicon-Based Approach for Sense Disambiguation and Its WWW Application

Vincenzo Di Lecce1,*, Marco Calabrese1, and Domenico Soldo2

1 Polytechnic of Bari, II Faculty of Engineering – DIASS, Taranto, 74100, Italy
2 myHermes S.r.l., Taranto, 74100, Italy
{v.dilecce,m.calabrese}@aeflab.net, [email protected]
* Corresponding author.

Abstract. This work proposes a basic framework for resolving sense disambiguation through the use of Semantic Lexicon, a machine readable dictionary managing both word senses and lexico-semantic relations. More specifically, polysemous ambiguity characterizing Web documents is discussed. The adopted Semantic Lexicon is WordNet, a lexical knowledge-base of English words widely adopted in many research studies referring to knowledge discovery. The proposed approach extends recent works on knowledge discovery by focusing on the sense disambiguation aspect. By exploiting the structure of WordNet database, lexico-semantic features are used to resolve the inherent sense ambiguity of written text with particular reference to HTML resources. The obtained results may be extended to generic hypertextual repositories as well. Experiments show that polysemy reduction can be used to hint about the meaning of specific senses in given contexts.

Keywords: semantic lexicon, sense disambiguation, WordNet, polysemy, Web Mining.
1 Introduction

Since its origin, the World Wide Web (hereinafter WWW or simply the Web) has quickly increased in the number of available resources. Most current search engines, however, seem not to keep pace with this evolution due to the low-semantics information retrieval approaches they implement. Everyone has experience of this when, in the search for a Web resource, he/she specifies an ambiguous (polysemous or general-purpose) query word. The access to high-quality information on the Web may thus be problematic for unskilled users. Several approaches have been proposed in the literature with the purpose of semantically organizing Web knowledge from Web resources. In many cases they require human experience to control a part (i.e. semi-supervised techniques) or the whole (i.e. supervised techniques) of the knowledge process. Research on the Semantic Web still remains far from Tim Berners-Lee's vision [1], appearing achievable only in
the long run. In the short term, instead, Web ontology representation in restricted domains can be a better target to pursue. Web ontologies have to deal with a large range of questions spanning from the inherent ambiguity of language to context dependency, the presence of incoherent statements, the difficulty of ontology matching and so on. Nevertheless, it seems that these issues involve both the semantic (concept) level and the lexical (language) level. This observation hints at the need to tackle both conceptual and linguistic aspects when engineering knowledge. Currently, a good mean between the two approaches seems to be represented by the newly emerging golden standard-based ontologies [2][3], embodied by a particular type of enhanced thesaurus called Semantic Lexicon (SL for short). The SL proves to be highly feasible and reliable thanks to the recent progress in developing broad-coverage dictionaries like WordNet [4]. WordNet is a broadly used tool for Natural Language Processing (NLP) that, as will be shown further in the text, is particularly suited to the Word Sense Disambiguation (WSD) task. WSD is an ontology-based technique that consists in assigning the correct semantic interpretation to ambiguous words in a given context. This work proposes a model for resolving sense disambiguation using the WordNet taxonomy. In particular, the adopted knowledge-based system refers to the IS-A hierarchy, i.e. the hyponymy/hypernymy semantic relation. The model is based on the taxonomical and ontological layers of the SL. The information retrieval system uses a recently published procedure to extract relevant information from website documents; it was introduced in other works by the authors [5][6][7], whose good experimental results support the innovation of the proposed system. The outline of this paper is as follows: Section 2 sketches some well-known methods used to disambiguate word senses in text; Section 3 introduces some theoretical aspects of word sense ambiguity; Section 4 presents the proposed WSD framework; Section 5 shows the experiments; Section 6 reports the conclusions.
2 Related Work It is noteworthy in the literature that the supervised learning from examples is a valid approach in WSD. It consists of statistical or Machine Learning classification models induced from semantically annotated corpora [8]. In [9] a system that uses a statistical language model to evaluate the potential substitutions in a given context is presented. Through this model the likelihoods of the various substitutes are used to select the best sense for the analyzed word. Always studying the statistical substitution model for WSD some authors [10][11] evaluated the use of related monosemous words, resulting in extensive labelled examples for the semi-supervised techniques. Instead an unsupervised system has been proposed in [12], evaluating the relevance of the sense gathered by the substitution model for the analyzed word by means of the query responses of a search engine. Moreover they have extended the statistical model to its uses for polysemy ambiguity. In [13] another WSD method used to disambiguate polysemous word for a given context by means of dictionary definitions is presented. This technique consists in counting the number of common words between the “context bag” (i.e. the set of words presented in the definitions of other context words) and each “sense bag”
(i.e. the set of words presented in the definition of a single sense) for a specific word. The sense gaining the maximum common words is considered as the “winner” sense. In [14] Sussna’s WSD method is based on the use of a semantic distance between topics in WordNet. In [15] a WSD method grounded on the idea of conceptual density in WordNet is proposed. In text representation domain authors in [16] propose a technique based on the representation in terms of the synsets through the IS-A relations of the tokenized words from text. Scott’s model presents a pre-filtering module aiming at the sense disambiguation. In [17] a noun disambiguation method also using IS-A relations was presented. The method evaluates the semantic similarity between two words and gives back the most informative common subsumer from hypermym hierarchy of both words. A similar method was presented in [18][19]. They proposed a measure of the semantic similarity by calculating the length of the path between the two nodes in the hierarchy. Some of the above systems were evaluated in a real-word spelling correction system in [20].
3 Managing Word Sense Ambiguity In a dictionary any lexical element refers to one or more senses and typically for each sense some examples are reported. The word “sense” is generally defined as the meaning of a word or phrase in a specific context. This implies that, a word can be used with different senses depending on the context in which it is used (giving the polysemous ambiguity) or different words can be used with the same sense (driving to the synonymous ambiguity). A human user of a dictionary is generally able to find the correct word form in a context only by means of information gathered from a thesaurus: this task appears instead challenging for a machine. WordNet provides the following definitions for the two types of before mentioned ambiguities: − Polysemy: the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings; − Synonymy: the semantic relation between two words that can (in a given context) express the same meaning. Starting by these definitions, it is clear that the meaning of a word is tightly related to the meanings of the surrounding elements, which in turn may be ambiguous. It is interesting enough referring to [21] underlining that the meaning is highlighted by means of mutual sense reinforcement of possibly ambiguous words. Then, for the sense definition the concept of context is of paramount importance. This is what WordNet reports for “context”: − Sense #1: discourse that surrounds a language unit and helps to determine its interpretation; − Sense #2: the set of facts or circumstances that surround a situation or event. 3.1 Lexico-Semantic Synthesis: Sense Matrix In the aim of formalizing the concept of sense two distinct entities can be identified. The first one is the lexical element (i.e. lexicon) and the second one is the semantic
element (i.e. concept). According to WordNet 3.0 data model, the sense is the relation existing between a lexical entity and a concept. The concept is derived by the semantic relations. Then, the sense collection can be represented by a matrix whose rows and columns are lexical set and semantic set respectively. This matrix is known in the literature [22][23] as lexical matrix. Referring to the sense concept we prefer to call it sense matrix. Table 1 depicts an example of a sense matrix. The sense matrix element can be expressed as a binary relation between a word form and a concept. Table 1. Example sense matrix. Senses are defined as matches between lexical entities (rows) and concepts (columns).
Sense Matrix        Concepts
Lexicon        c1   c2   c3   c4
l1              0    1    0    1
l2              0    0    0    1
l3              1    1    0    0
l4              1    0    1    1
l5              0    0    1    0
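A small sketch of how the sense matrix of Table 1 can be held as a set of (lexical entity, concept) pairs, with the row and column lookups used later by the disambiguation loop; the entries mirror the example table.

```python
# Sense matrix of Table 1 as a set of (lexical entity, concept) pairs
SENSES = {("l1", "c2"), ("l1", "c4"), ("l2", "c4"), ("l3", "c1"), ("l3", "c2"),
          ("l4", "c1"), ("l4", "c3"), ("l4", "c4"), ("l5", "c3")}

def concepts_of(word):
    """Concepts a lexical entity can express (row lookup)."""
    return {c for w, c in SENSES if w == word}

def words_of(concept):
    """Lexical entities that can express a concept (column lookup)."""
    return {w for w, c in SENSES if c == concept}

print(concepts_of("l4"))   # {'c1', 'c3', 'c4'} -> a polysemous entry
print(words_of("c4"))      # {'l1', 'l2', 'l4'} -> synonymous entries
```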
3.2 Semantic Relations: Core Ontology

The semantic relation between concepts has been represented in the literature through a particular type of directed acyclic graph (DAG) known as a single-rooted tree. In [24] the authors use this DAG to define the Core Ontology and formalize the concept of taxonomy. The structure CO := (C, c_root, ≤_C) is called a core ontology, where C is a set of conceptual entities and c_root ∈ C is a designated root concept for the partial order ≤_C on C. This partial order is called the concept hierarchy or taxonomy. The relation ∀c ∈ C: c ≤_C c_root holds for this concept hierarchy. Hence, the Core Ontology refers to the semantic level, while it lacks any mention of lexical entities.

3.3 Semantic Lexicon and Lexico-Semantic Features

The sense matrix and the Core Ontology are widely used in knowledge-based systems. In addition, the SL is the third relevant concept in this application domain. The SL can be simply defined as the join between the lexical and concept sets (the sense matrix) on the one hand and the set of recursive relations on the concept set on the other hand. Lexical recursive relations may be considered as well but, for the sake of simplicity, they are not considered in the proposed model. In a previous work [7] the concept of Web minutia was defined as a sub-graph extracted from the above mentioned SL. Minutiae are actually lexico-semantic features (relevant senses) of the explored Web site and represent an effective way to deal with WSD in Web documents.
4 Proposed WSD Model

The authors have chosen WordNet 3.0 as the SL. WordNet is one of the most widely adopted golden standard-based ontologies in knowledge-based systems. Furthermore, it is a
broad-coverage dictionary purposely engineered for text mining and information extraction. WordNet is based on the concept of synset; in version 3.0, its data model is developed around the concept of sense, i.e. the one-to-one relation between the synset and the lexical entity. In WordNet data model (part of which is depicted in Figure 1) one group of tables (‘word’, ‘sense’ and ‘synset’) is related to lexico-semantic meaning and another one (‘lexlinkref’, ‘semlinkref’ and ‘linkdef’) is related to lexico-semantic relations. The underpinning knowledge structure is then represented by contextdependent lexico-semantic chains organized according to given hierarchies (e.g. hypernymy). In this work only semantic relations are adopted to disambiguate word forms retrieved by a semantic feature extractor. The other elements provided by the WordNet data model have not been considered and will be examined in future research: Fragos et al. for example use WordNet relations to disambiguate tokenized text by means of the WordNet glosses.
Fig. 1. An extract of WordNet data model
4.1 WSD Abstract Architecture

Figure 2 shows the proposed architecture. Two main blocks can be identified:
1. Lexical entity extraction: it is composed of a crawling/parsing system that retrieves texts from websites.
2. Sense disambiguation: it carries out the WSD task.
The lexical entity array is the input buffer for the feature extraction process. First, some considerations are necessary to understand the WSD process. A pair of synchronous switches controlled by the semantic feature extractor (SFE for short) drives the data flow to/from the sense matrix.
Fig. 2. Proposed WSD abstract architecture. It can be divided into a lexical entity extraction module (left side) and a sense disambiguation module (right side). The sense disambiguation module takes lexical entities parsed from Web sites as input and returns disambiguated lexical entities in a given context. The sense disambiguation is resolved by the two-step interaction between the SFE and the Sense Matrix.
Lexical entries deriving from the parsing phase act as triggers for the WSD system. These inputs (rows in the sense matrix) intercept concepts (columns in the sense matrix) when there is a match between lexical entity and concept (corresponding to a unit value in the example matrix shown in Table 1). When the match occurs, the retrieved concepts are used as inputs for the SFE. SFE iterates on recursive semantic relations (i.e. hypernymy) until it finds stop conditions depending on the implemented procedures. Web minutiae are an example of such procedures [7]. This process ends with a set of newly found concepts, which are fed back to the sense matrix (e.g. column entries) in order to intercept new lexical entries. All the matches found during these steps produce a subset of the original sense matrix. This subset can be statistically analyzed to infer on polysemy by simply assessing the number of retrieved senses (reduced core ontology) for each retrieved lexical entity (reduced vocabulary). It is interesting to note that the proposed architecture is very scalable since it adopts WordNet semantic-lexicon taxonomies as the indexing structure. This structure of course does not increase as the amount of indexed documents increases. According to the proposed model, WordNet can be decomposed into the blocks depicted in Figure 3. It is plain that the “sense table” and the “synset relation table” refer exactly to the corresponding tables in WordNet E-R model (named ‘sense’ and ‘semlinkref’ in WordNet 3.0).
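A minimal sketch of this two-step interaction follows, using NLTK's WordNet interface as a stand-in for the WordNet 3.0 relational tables accessed by the system; the depth limit and the way retrieved concepts are matched back against the site vocabulary are simplifying assumptions.

```python
# requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def reduced_senses(lexical_entities, max_depth=2):
    """For each parsed word: intercept its synsets (sense-matrix rows -> columns),
    climb hypernyms up to max_depth, and keep only those senses whose hypernym
    chain leads back to other words parsed from the same site."""
    vocab = set(lexical_entities)
    kept = {}
    for word in vocab:
        kept[word] = []
        for syn in wn.synsets(word):
            frontier, related = [syn], set()
            for _ in range(max_depth):
                frontier = [h for s in frontier for h in s.hypernyms()]
                for s in frontier:
                    related.update(name.lower() for name in s.lemma_names())
            # keep the sense if its generalized concepts intercept the site vocabulary
            if related & (vocab - {word}):
                kept[word].append(syn.name())
    return kept

words = ["university", "faculty", "student", "campus"]
print(reduced_senses(words, max_depth=2))
```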
Fig. 3. SL instance using WordNet
5 Experiments and Results Several websites have been crawled to test the proposed model. We have purposely pre-clustered these sites in four distinct semantic contexts reported in Table 2. Table 2. Semantic contexts and number of inspected websites for each one
Semantic Context     # inspected websites
University           17
Airport              13
Low-cost Airline     10
Seaport               8
The crawling and parsing processes have been limited to the first one hundred crawled webpages for each website. The authors consider this limit sufficient to demonstrate the ability of the presented model. The semantic matrix depicted in Figure 2 represents the IS-A relations (i.e. the hypernym/hyponym hierarchy). The maximum number of iterations on the semantic matrix has been fixed to 5 for two reasons: 1. higher depth levels in the taxonomy return context-independent concepts; 2. the computational effort may increase considerably as the depth level increases. As mentioned in Section 4, the input for the proposed disambiguation system is the set of lexical entities extracted from webpages, while the output is the set of disambiguated input lexical entities. Thus, given a word form having different senses in WordNet, the system returns the same word, possibly with a reduced number of senses. This reduction is more evident when a word form is strongly context-dependent. Figure 4 depicts part of the results obtained on the Web minutiae extracted from the “university” context. It is clearly visible that, as the number of iterations in the WordNet taxonomy increases, the system shows lower WSD performance. This can be explained by analyzing the hypernym hierarchy, which has been used at the
base of the Web minutiae feature extraction process. In fact, crossing the IS-A hierarchy in hypernymy direction represents a shift towards generalized synsets (the maximum abstraction for any lexical entity is considered in WordNet with the “entity” synset).
Fig. 4. Plot over the first 200 most polysemous lexical entities in the ‘university’ benchmark. Icons account for the different numbers of iterations used in the minutiae feature extraction process
Table 3 reports the average ratio obtained by dividing, for each lexical entity, the number of senses retrieved by the proposed WSD system by the number of all senses of that lexical entity in WordNet. Results are grouped according to the chosen contexts and the number of iterations on the synset relation table. The table highlights the good performance of the proposed system with one iteration and, for two contexts, even with two iterations; in these cases the senses pruned away amount on average to about 85% of the whole sense inventory.

Table 3. Polysemy ratios experiment results

# iterations   University   Airport   Low-cost Airline   Seaport
1              0.18         0.23      0.10               0.13
2              0.30         0.34      0.17               0.18
3              0.35         0.36      0.27               0.28
4              0.41         0.38      0.37               0.38
5              0.48         0.41      0.47               0.45
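For clarity, the ratio underlying Table 3 can be expressed as a short computation; the function names below are illustrative only.

```python
def polysemy_ratio(retrieved_senses, wordnet_senses):
    # Senses kept by the WSD system over all WordNet senses of one lexical entity.
    return len(retrieved_senses) / len(wordnet_senses)

def average_ratio(results):
    # results: list of (retrieved_senses, wordnet_senses) pairs for one context
    ratios = [polysemy_ratio(r, w) for r, w in results if w]
    return sum(ratios) / len(ratios)
```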
6 Conclusions
A basic framework for resolving WSD (polysemy in particular) through the exploitation of a Semantic Lexicon has been presented as an extension of a semantic approach first
presented in [6]. The proposed WSD framework enables sense disambiguation without human intervention. The authors used the WordNet 3.0 dictionary as the Semantic Lexicon. The model is suitable for any kind of hypertextual dataset; in this work the system has been tested on datasets from different contexts. The system is highly modular and suitable for distributed knowledge extraction because it relies on a well-studied layered architecture. The system can be adapted to use other relations or modules of WordNet, thus possibly increasing its performance. Furthermore, the proposed model is sufficiently robust to be extended to languages other than English without affecting the system generality; studies on multi-lingual dictionaries employing the WordNet structure are currently being carried out by several authors [25, 26]. As future work, further experiments will be conducted on a more extensive set of different contexts in order to better define the dependency between Web minutiae and context lexical descriptors.
References 1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American (2001) 2. Zavitsanos, E., Paliouras, G., Vouros, G.A.: A Distributional Approach to Evaluating Ontology Learning Methods Using a Gold Standard. In: Proc. of 3rd Workshop on Ontology Learning and Population (OLP3) at ECAI 2008, Patras, Greece (2008) 3. Farrar, S., Langendoen, D.T.: A Linguistic Ontology for the Semantic Web. GLOT International 7(3), 97–100 (2003) 4. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998) 5. Di Lecce, V., Calabrese, M., Soldo, D.: Mining context-specific web knowledge: An experimental dictionary-based approach. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 896–905. Springer, Heidelberg (2008) 6. Di Lecce, V., Calabrese, M., Soldo, D.: Fingerprinting Lexical Contexts over the Web. Journal of Universal Computer Science 15(4), 805–825 (2009) 7. Di Lecce, V., Calabrese, M., Soldo, D.: Semantic Lexicon-Based Multi-agent System for Web Resources Markup. In: The Fourth International Conference on Internet and Web Applications and Services (ICIW 2009), Mestre, Italy, pp. 143–148 (2009) 8. Màrquez, L., Escudero, G., Martínez, D., Rigau, G.: Supervised Corpus-Based Methods for Wsd. In: Agirre, E., Edmonds, P. (eds.) Word Sense Disambiguation: Algorithms and Applications. Text, Speech and Language Technology, vol. 33, Springer, Heidelberg (2007) 9. Yuret, D.: KU: Word Sense Disambiguation by Substitution. In: Proc. of the 4th International Workshop on Semantic Evaluations (SemEval 2007), Prague, Czech Republic, pp. 207–214 (2007) 10. Mihalcea, R.: Bootstrapping Large Sense Tagged Corpora. In: Proc. of the 3rd International Conference on Languages Resources and Evaluations (LREC 2002), Las Palmas, Spain, pp. 1407–1411 (2002) 11. Leacock, C., Chodorow, M., Miller, G.: Using Corpus Statistics and WordNet Relations for Sense Identification. Computational Linguistics 24(1), 147–165 (1998)
12. Martinez, D., Agirre, E., Wang, X.: Word Relatives in Context for Word Sense Disambiguation. In: Proc. of the 2006 Australasian Language Technology Workshop (ALTW 2006), Sydney, Australia, pp. 42–50 (2006) 13. Lesk, M.: Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In: Proc. of the of the 5th Annual International Conference on Systems Documentation (SIGDOC 1986), New York, USA, pp. 24–26 (1986) 14. Sussna, M.: Word Sense Disambiguation for Free-Test Indexing Using a Massive Semantic Network. In: Proc. of the 2nd International Conference on Information and Knowledge Management (CIKM 1993), Arlington, Virginia, USA, pp. 67–74 (1993) 15. Agirre, E., Rigau, G.: Word Sense Disambiguation Using Conceptual Density. In: Proc. of the 16th International Conference on Computational Linguistics (COLING 1996), Copenhagen, Denmark, vol. 1, pp. 16–22 (1996) 16. Scott, S., Matwin, S.: Text Classification Using the WordNet Hypernyms. In: Proc. of the Workshop on Usage of the WordNet in Natural Language Processing Systems (COLING-ACL 1998), Montreal, Canada, pp. 45–52 (1998) 17. Resnik, P.: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In: Proc. of the 14th International Joint Conference on Artificial Intelligence (IJCAI 1995), Montreal, Canada, pp. 448–453 (1995) 18. Lee, J., Kim, H., Lee, Y.: Information Retrieval Based on Conceptual Distance in IS-A Hierarchies. Journal of Documentation 49(2), 188–207 (1993) 19. Leacock, C., Chodorow, M.: Combining Local Context and WordNet5 Similarity for Word Sense Disambiguation. In: [6] 20. Budanitsky, A., Hirst, G.: Semantic Distance in WordNet: An Experimental, ApplicationOriented Evaluation of Five Measures. In: Proc. of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001), Pittsburgh, PA, USA, pp. 29–34 (2001) 21. Ramakrishnanan, G., Bhattacharyya, P.: Text Representation with WordNet Synsets Using Soft Sense Disambiguation. In: Proc. of the 8th International Conference on Application of Natural Language to Information Systems (NLDB 2003), Burg, Germany, pp. 214–227 (2003) 22. Ruimy, N., Bouillon, P., Cartoni, B.: Inferring a Semantically Annotated Generative French Lexicon from an Italian Lexical Resource. In: Proc. of the Third International Workshop on Generative Approaches to the Lexicon, Geneva, Switzerland, pp. 27–35 (2005) 23. Magnini, B., Strapparava, C., Ciravegna, F., Pianta, E.: A Project for the Construction of an Italian Lexical Knowledge Base in the Framework of WordNet. IRST Technical Report #9406-15 (1994) 24. Dellschaft, K., Staab, S.: On how to perform a gold standard based evaluation of ontology learning. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg (2006) 25. Ramanand, J., Ure, A., Singh, B.K., Bhattacharyya, P.: Mapping and Structural Analysis of Multilingual Wordnets. IEEE Data Engineering Bulletin 30(1) (2007) 26. Ordan, N., Wintner, S.: Representing Natural Gender in Multilingual Lexical Databases. International Journal of Lexicography 18(3), 357–370 (2005)
The Establishment of Verb Logic and Its Application in Universal Emergency Response Information System Design Jian Tan and XiangTao Fan Laboratory of Digital Earth Sciences, Center for Earth Observation and Digital Earth, Chinese Academy of Sciences, Beijing, 100101, China
[email protected]
Abstract. It is always a challenge to build a stable, high-level integrated system capable of handling different types of emergencies. The biggest obstacle is building a universal workflow model for the different events. To solve this problem, our research adopts an unusual approach based on the premise that a full text description of a phenomenon is a complete map of it. The subject of system analysis can therefore be shifted from the real emergency response to its text description, so semantic annotation using the semantic labels of PropBank can be employed in the analysis process. The annotation subjects are documents, each of which describes a full emergency response process for a different emergency type. After classification and statistics, three linguistic rules are found: first, every sentence has a predicate verb which indicates an executable action and belongs to a fixed set; second, each verb coexists with the semantic role Arg0 (actor); third, all the complement roles of predicate verbs converge into a fixed subset of semantic roles. These conclusions are named together verb logic. It is a highly abstract semantic model, for it not only covers domains but also tells the relations among them. Based on verb logic, a universal workflow model is constructed, and a universal emergency response system can be built. The design of the system is also presented in this paper.
Keywords: Universal emergency response system, Verb logic, Semantic annotation, Propbank.
1 Introduction
Research on emergency systems has been booming since the 9/11 attack, which was a reminder of the necessity of efficient, integrated emergency response. Most research on computer-aided emergency response derives from earlier expert decision systems, which focused on extracting decision models from domain ontology and knowledge. Other work contributes to system construction technology for specific event types. But these solutions are not yet integrated in a universal framework, because the business logics differ among the different types of emergency, and the relations between these emergency response systems are not stable: if the business logic changes in one of the collaborating systems, the integration is broken.
So the basic challenge is the construction of a universal emergency response workflow model. In the practice of building a first emergency response system, which must integrate with other emergency systems for the national security department of China, we had to give priority to compatibility in the system design. Traditionally, the first step of software design in the object-oriented programming era is extracting objects from the concrete business process, so that they can be redefined and saved in a computer format. This procedure commonly exists only in the mind of the designer, who digests the business flow and then expresses it in ways that are more familiar to the programmer, such as UML. R.A. Meersman put forward a semantic way to speed this procedure up: in his work, a "global" common ontology (lexicon, thesaurus) is conjectured and some desirable properties are proposed [1]. He argues that large public lexicons should be simple, suggests that their semantics become implicit by agreement among "all" users, and should ideally be completely application independent. In short, the lexicon or thesaurus then becomes the semantic domain for all applications. In designing these lexicons, the grammar can be conceived by computer yet expressed in natural language, e.g. choosing "has a" or "of a" to denote the relation between an ontology object and its property, while other words or phrases are used for the rest of the domain ontology. A subsequent, similar line of research by Meenakshi concerns extracting domain ontology entities through the Semantic Web. That research used an ontology database to aid entity extraction in web documents; after a disambiguation process, the semantic annotation result could be formatted like this: <Entity id="494S05">Hewlett-Packard</Entity> <Entity id="3'7S349">HPQ: up $0.13 to $18.03</Entity> [2]. In the form of XML, it can be delivered and processed by different systems for object reconstruction. The studies mentioned above are both innovative; despite their different methods and results, they both use semantic annotation to extract a domain model so that other applications can automatically perceive and integrate the meaningful entities. On the other hand, neither of them can yet figure out the interoperation details between entities, because there are uncountably many relations that cannot be unambiguously identified. This shortcoming makes the results suitable for class construction rather than system design. In our research, the goal is, first, the construction of a universal emergency response system; second, based on the axiom that a full text description of a phenomenon is a complete map of it, as is a full workflow of the same phenomenon executed in a computer, the subject of system analysis can be shifted to the text description, which acts as an agent of the real phenomenon. This way may be indirect, but it can find useful rules that cannot be found in ordinary ways. Therefore, in our research the analysis is carried out directly on the linguistic descriptions of the emergency response process, and semantic annotation using the PropBank semantic labels is adopted. The subjects are documents, each of which describes a full emergency response process. After the annotation, statistics are computed, and finally three linguistic rules are found: first, every sentence has a predicate verb; second, each predicate verb coexists with the semantic role Arg0 (actor); third, all the complement roles of the predicate
verbs converge into a fixed subset of semantic roles. We call these rules together verb logic. It is a more abstract semantic model than domain ontology, for it not only spans domains but also tells the relations among entities. Based on verb logic, the universal emergency response system can be built. In this paper, first the derivation of verb logic is stated, second the system design based on it is described, then the system's capability is discussed, and finally three cases of different event types are used to demonstrate this capability.
2 Verb Logic
2.1 Semantic Labels and Propbank
Representing the predicate-argument structure has been one of the focal points in recent efforts to develop semantic resources. This is generally achieved by specifying the semantic roles for the arguments anchored by the predicate, but the specific semantic role labels used differ from project to project. They range from very general role labels such as agent, theme, beneficiary adopted in the VerbNet (Kipper et al. 2000; Kipper et al. 2004; Kipper et al. 2006) and the semantic component of the Sinica Treebank (Chen et al. 2004), to labels that are meaningful to a specific situation, like the role labels used in the FrameNet (Baker et al. 1998) and the Salsa Project for German (Burchardt et al. 2006), to predicate-specific labels used in the English PropBank (Palmer et al. 2005) and the Nombank (Meyers et al. 2004). The difference between the various approaches can be characterized in terms of levels of abstraction. The Propbank style of annotation can be considered to be the least abstract, as it uses argument labels (Arg0, Arg1, etc.) that are meaningful only with regard to a specific predicate.

Table 1. Semantic roles in Propbank labels
Core labels
  Arg0   Agent, Experiencer
  Arg1   Theme, Topic, Patient
  Arg2   Recipient, Extent, Predicate
  Arg3   Asset, Theme2, Recipient
  Arg4   Beneficiary
  Arg5   Destination

Additional labels
  ArgM-ADV   Adverbials
  ArgM-BNE   Beneficiary
  ArgM-CND   Condition
  ArgM-DIR   Direction
  ArgM-DGR   Degree
  ArgM-EXT   Extent
  ArgM-FRQ   Frequency
  ArgM-LOC   Locative
  ArgM-MNR   Manner
  ArgM-PRP   Purpose or Reason
  ArgM-TMP   Temporal
  ArgM-TPC   Topic
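As an illustration of how such labels can be handled in software, the sketch below represents one PropBank-style annotated sentence as a plain data structure; the class name, the field names and the example spans are illustrative assumptions, not part of PropBank or of the authors' system.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedSentence:
    predicate: str                               # the predicate verb, e.g. "evacuate"
    roles: dict = field(default_factory=dict)    # label -> text span

example = AnnotatedSentence(
    predicate="evacuate",
    roles={
        "Arg0": "the fire company",      # agent / experiencer
        "Arg1": "the residents",         # theme / patient
        "ArgM-LOC": "in the building",   # locative
        "ArgM-TMP": "immediately",       # temporal
    },
)
```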
2.2 Statistics and Analysis
There are many kinds of linguistic documents describing the emergency response process, such as Emergency Processing Conclusions, counterplans or exercise instruments; even journalistic reports can provide details of the response. In our study, we collected 16 documents of different kinds. As Table 2 shows, each of them has at least 30 sentences and fully covers an emergency response process, whether finished or just in preparation.
Table 2. The subject documents

Index   Event type             Document type           Source organization            Number of sentences
1       Fire incident          Handling norms          Community property companies   52
2       Drug trafficking       Handling norms          Public security bureau         33
3       Smuggling              Counterplan             Border Corps                   75
4       Public health crisis   Handling norms          Health Department              46
5       Bombings               Counterplan             Public security Department     118
6       Earthquake             Counterplan             Seismological Bureau           81
7       Refugees               Counterplan             Border Corps                   87
8       Human smuggling        Processing Conclusion   Border Corps                   48
9       Smuggling              Processing Conclusion   Border Corps                   103
10      Human smuggling        Exercise instruments    Border Corps                   35
11      Drug trafficking       Exercise instruments    Coast guard                    105
12      Robberies              Processing Conclusion   Public security bureau         74
13      Power grids damaged    Counterplan             Power company                  36
14      First aid              Handling norms          Medical emergency center       66
15      Fire incident          Processing Conclusion   Fire Squadron                  64
16      Mass riots             Handling norms          Public security Department     79
Because the samples are few, manual semantic annotation could be done in an acceptable time, and no existing program can do this work unambiguously, owing to the frequent phenomenon that one sense is referenced in multiple forms, such as verb nominalization phrases, in the documents. We therefore assigned the semantic role labels defined in the PropBank system to the constituents ourselves. Fig. 1 illustrates the annotation of an isolated sentence. It is not difficult to judge the semantic roles in a single sentence, but in a written document some semantic roles are often omitted because the context implicitly provides their meaning. This phenomenon would introduce errors into the statistics, so we applied a pre-treatment that regenerates the documents by appending the implicit roles from context, so that each sentence describes a complete and unambiguous meaning independently. When the annotation work was finished, we counted the number of appearances of each semantic role in all samples and obtained Table 3. From this table we confirm that these documents consist of predicate-centered sentences, and that each verb always coexists with an Arg0 role, which denotes the doer of the action. But no regularity of the other arguments could be seen yet. Little was revealed from detailed statistics computed in different ways until we classified the annotations by the sense of the core verb. Since the same meaning can be expressed in different forms, to simplify the statistics we chose the most common expression of each independent meaning, named it the predicate verb (PV), and filtered out the roles conveying dispensable meaning or of negligible frequency, such as Arg4 and ArgM-EXT.
[Figure 1 annotates the example sentence “Tai boshi zuotian zai shitang chi le yige cai” (Dr. Tai ate a dish at the dining room yesterday): chi/eat is the predicate (V), Taiboshi/Dr. Tai is Arg0, yige cai/one dish is Arg1, zuotian/yesterday is ArgM-TMP, and zai shitang/at the dining room is ArgM-LOC.]

Fig. 1. Semantic annotation in propbank labels

Table 3. Appearance frequency of all semantic roles

Semantic role   Meaning                        Appearance frequency   Possibility of appearance in a sentence (total of sentences: 1102)
V               Verb                           1102                   1
Arg0            Agent, Experiencer             1102                   1
Arg1            Theme, Topic, Patient          484                    0.439201
Arg2            Recipient, Extent, Predicate   327                    0.296733
Arg3            Asset, Theme2, Recipient       59                     0.053539
Arg4            Beneficiary                    4                      0.00363
Arg5            Destination                    75                     0.068058
ArgM-ADV        Adverbials                     0                      0
ArgM-BNE        Beneficiary                    0                      0
ArgM-CND        Condition                      0                      0
ArgM-DIR        Direction                      5                      0.004537
ArgM-DGR        Degree                         0                      0
ArgM-EXT        Extent                         2                      0.001815
ArgM-FRQ        Frequency                      0                      0
ArgM-LOC        Locative                       752                    0.682396
ArgM-MNR        Manner                         5                      0.004537
ArgM-PRP        Purpose or Reason              0                      0
ArgM-TMP        Temporal                       578                    0.524501
ArgM-TPC        Topic                          0                      0
ArgM-CAU        Cause                          0                      0
ArgM-NEG        Negation                       0                      0
ArgM-MOD        Modal                          0                      0
The next step is to compute the appearance frequency of the complement roles of each PV; we then find that one PV and its coherent semantic roles have a fixed relationship. Part of the result is shown in Table 4.

Table 4. Appearance frequency of some verbs' coexisting roles
Predicate verb      Semantic role   Meaning                        Coexist frequency   Predicate verb frequency   P(V|Arg*)
ganfu/go for        Arg0            Agent, Experiencer             123                 123                        1
                    Arg5            Destination                    123                 123                        1
                    ArgM-LOC        Locative                       123                 123                        1
                    ArgM-TMP        Temporal                       123                 123                        1
jiuhu/cure          Arg0            Agent, Experiencer             75                  75                         1
                    Arg2            Recipient, Extent, Predicate   75                  75                         1
                    Arg3            Asset, Theme2, Recipient       75                  75                         1
                    ArgM-LOC        Locative                       75                  75                         1
sushan/evacuate     Arg0            Agent, Experiencer             18                  18                         1
                    Arg1            Theme, Topic, Patient          18                  18                         1
                    ArgM-LOC        Locative                       18                  18                         1
                    ArgM-TMP        Temporal                       18                  18                         1
daibu/arrest        Arg0            Agent, Experiencer             37                  37                         1
                    Arg1            Theme, Topic, Patient          37                  37                         1
                    ArgM-LOC        Locative                       37                  37                         1
                    ArgM-TMP        Temporal                       37                  37                         1
fengshuo/blockade   Arg0            Agent, Experiencer             53                  53                         1
                    ArgM-LOC        Locative                       53                  53                         1
                    ArgM-TMP        Temporal                       53                  53                         1
Given these results, the semantic roles that can coexist with one particular predicate verb are limited, and all of them must show up with the predicate verb while the other roles are absent. We thus confirm the conclusion that each predicate verb must cohere with several fixed semantic roles to express a complete and independent meaning. Furthermore, when we collect all the semantic roles that appear, the result is interesting: the complement roles are not all the roles other than the verb; they converge into seven types: ArgM-TMP, ArgM-LOC, Arg0, Arg1, Arg2, Arg3 and Arg5. This means that in the description of an emergency response process, fixed types of arguments must be addressed; in other words, these types are enough for the linguistic description of a whole response process of any event type. But why are these roles necessary to the documents, and why not the others? Some points are presented.
First, Arg0 is the actor, and Arg1 is the objects or persons that receive or are affected by the action of the predicate verb. In a rescue and response process, every action must be executed by specific persons or forces, and the action must have a specific effect on objects; otherwise the action is meaningless. Second, Arg2 is the influence or goal of the action of the predicate verb. It is a basic complement that lasting actions work towards, and it implicitly lets the emergency manager estimate the effect in order to decide whether the action should continue. Third, Arg3 is the equipment or methods employed by the actions. This is reasonable, since a response team cannot accomplish a mission empty-handed and must prepare for the different situations on the spot. Fourth, ArgM-TMP is the argument indicating a temporal period or point in a sentence. Emergency response has a basic requirement of proceeding in an efficient, time-ordered manner, so the time point when an action takes place or finishes is remarkably important to the accuracy of the whole process. Fifth, Arg5 is the destination and ArgM-LOC is the location. They are actually the most important complements to a response action, because both an emergency and its response are spatial, which means any event has a limited effective area over its whole life cycle. Their spatial properties are the foundation of where the response actions are implemented; furthermore, they are necessary elements of most decision-making models, which can tell the suitable route an emergency troop should take to reach the spot or another target place.

2.3 Expression of VL
As a summary of the statistics above, three useful points are stated. First, the documents describing emergency response consist of sentences, each of which has a core predicate verb without exception. Second, with each verb the appearance of the semantic role Arg0 is a certainty. Third, all the complement semantic roles in the documents converge into a fixed and countable set of seven PropBank roles: ArgM-TMP, ArgM-LOC, Arg0, Arg1, Arg2, Arg3 and Arg5. We name these points together verb logic in emergency response, VL for short. It is a semantic abstraction at a higher level than domain ontology, because the discipline has nothing to do with the specific type of event and has neither a fixed analytic model nor fixed readers or users, while in domain ontology both the entities and their relations are concrete and cannot extend to other domains, which makes systems based on them hardly interoperate with each other. So VL captures the rules of all kinds of emergency response, and it can be the foundation of a universal emergency response system design.
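To make the three rules concrete, the sketch below encodes them as a small data structure and a validity check over annotated sentences. The verb frames shown are taken from Table 4, while the constant names and the function are illustrative assumptions rather than part of the authors' implementation.

```python
# The seven complement roles allowed by verb logic (VL).
VL_ROLES = {"ArgM-TMP", "ArgM-LOC", "Arg0", "Arg1", "Arg2", "Arg3", "Arg5"}

# Fixed role frames observed per predicate verb (from Table 4).
VERB_FRAMES = {
    "ganfu/go for":      {"Arg0", "Arg5", "ArgM-LOC", "ArgM-TMP"},
    "jiuhu/cure":        {"Arg0", "Arg2", "Arg3", "ArgM-LOC"},
    "sushan/evacuate":   {"Arg0", "Arg1", "ArgM-LOC", "ArgM-TMP"},
    "daibu/arrest":      {"Arg0", "Arg1", "ArgM-LOC", "ArgM-TMP"},
    "fengshuo/blockade": {"Arg0", "ArgM-LOC", "ArgM-TMP"},
}

def satisfies_vl(predicate, roles):
    """Check an annotated sentence against the three VL rules."""
    if predicate not in VERB_FRAMES:        # rule 1: the PV belongs to a fixed set
        return False
    if "Arg0" not in roles:                 # rule 2: Arg0 is always present
        return False
    complements = set(roles) - {"V"}
    # rule 3: complements stay within the seven VL roles and match the verb's frame
    return complements <= VL_ROLES and complements == VERB_FRAMES[predicate]
```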
3 Universal Emergency Response System Design
Before the system design, a review of former emergency response systems shows that they almost all aim at one type of emergency, and only a few can handle several kinds of emergency. The reason is that every concrete emergency is itself part of a specific domain ontology and is involved with other entities in that domain.
Fig. 2. System coverage comparison
In other words, the domain ontology which includes a specific emergency event determines the concrete properties of the event and its relatives. On the contrary, VL is not domain-specific, because it is a higher abstraction of all domains and can apply to any emergency response. It can be regarded as the metadata of every emergency domain, while one type of domain ontology can be regarded as the metadata of a concrete emergency response. So a system that implements VL has the capacity for any emergency domain. That is the theoretical foundation of the capability of the universal emergency response system.

3.1 Any Concrete Emergency Itself Belongs to a Domain
Since every concrete emergency has a domain ontology as its metadata, classifying emergency events by their superior domain is an apt taxonomy, and it is the foundation that lets the universal emergency response system preserve particularity while being capable of every kind of emergency. To build these classes in practice, a tree structure, which has a root node named abstract emergency as the parent of any emergency, is helpful. The child nodes of the tree can be specific domains and other emergency types, but only a domain node can have descendants. If a concrete emergency happens, there must be a sole domain node that can contain it and all it involves, and that indicates the type of the emergency. If an emergency is complex, there is likewise a higher-level domain node, which includes other detailed domains, that can be its superior. In addition, the events belonging to the same domain should be grouped by quantitative properties such as the number of involved persons or the acreage of the spot, since the superior domain ontology is more of a qualitative classification.
Fig. 3. Emergency classification
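A minimal sketch of such a classification tree is given below; the node fields and the example domains are illustrative assumptions based on Fig. 3, not the authors' data model.

```python
class DomainNode:
    """Node of the emergency (or task force) classification tree."""
    def __init__(self, name, criterion=None):
        self.name = name              # e.g. "Fire emergency"
        self.criterion = criterion    # qualitative or quantitative criterion
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

root = DomainNode("Abstract emergency")
fire = root.add(DomainNode("Fire emergency"))
ordinary = fire.add(DomainNode("Ordinary fire emergency"))
ordinary.add(DomainNode("Small fire emergency", "nobody in fire field"))
ordinary.add(DomainNode("Medium fire emergency", "1~10 persons in fire field"))
ordinary.add(DomainNode("Large fire emergency", "10 more persons in fire field"))
fire.add(DomainNode("Chemical fire emergency"))
root.add(DomainNode("Medical emergency", "a sudden serious situation in health"))
root.add(DomainNode("Earthquake emergency"))
```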
3.2 The Core Verb Turns into the Emergency Command
Emergency response and rescue is not a civilian affair; it is always implemented by the government, and all the actions are implicitly backed by the force and pressure guaranteed by governmental or military power. Another point is that no civilian is responsible for handling the emergency; civilians are passive and are supposed to obey the commands. The response actions are executed unconditionally by firemen, military troops, medical teams and civilians, so we call the core verb the emergency command. By analogy with a military command, we can also extract an unambiguous abbreviation, a single word or short phrase, for any specific action; it can easily be defined and stored as a string data type in a computer. The emergency command is the central concept in the universal emergency response system, because it has been stated in VL that the response workflow can be considered as a set of actions arranged in parallel or in series.

3.3 The Arg0 Turns into the Task Force
The semantic role Arg0, which denotes the actor or sponsor of the predicate, always coexists with the core verb and indicates who takes the action. In the real world, it corresponds to the persons or special equipment which can implement the response and rescue actions; for instance a medical team can rescue (the injured), a battle plane can destroy (the enemy), a salvage corps can extinguish (the fire). Arg3 is the role which indicates the tools employed by the action, and it does not always appear in those sentences. But whenever it shows up, it actually means that the tools and
equipment must be used or handled by people who may not be mentioned. Take this sentence: a battle plane is going to lock the hijacked plane with an air-to-air missile. It implies at least one battle plane, one missile and one pilot to do the “lock” job together. From these arguments, we define any inseparable group, built up from organized persons and special equipment, which can implement at least one response action independently, as a task force, TF for short. Despite the clear definition, it is hard to directly integrate all TFs in the response system, for there are various TFs made up of different persons and equipment, and no uniform format is available for these domain entities. So, by analogy with the emergency type classification, a tree is the most suitable structure to organize all types of TF in one emergency response system. The TF tree has the same domain nodes as the emergency type tree; each domain node may have descendants including detailed domain nodes and TF nodes, and each TF node can also find a sole superior node marked with a specific domain. The leaf nodes can also be grouped into different classes by the measurable properties of their members and equipment, like regiment, battalion, company and battery in military classification.
Fig. 4. Task force classification
3.4 The arg-1…n Collected as Command Arguments
There are some important complement roles to a response action in the documents: ArgM-TMP, ArgM-LOC, Arg1, Arg2 and Arg5. From the information they convey, what the concrete action is can be determined, both by the reader of the documents and by the TF who needs to execute the action. So we call them action arguments, the variables needed to produce a real action.
In the interest of clarity and comprehensibility, we map them into our universal emergency response system as follows. First, ArgM-TMP is a time point or period, which can easily be stored by numbered Date-type parameters, commonly one for a time point and two for a period. Second, ArgM-LOC and Arg5 both indicate location. In a geographic information system, a location can be defined by three types of spatial data, point, arc (line) and polygon, and these spatial data types already have precise definitions after a long period of development in the geographic field. Third, Arg2 is the influence or target of an action. Its description is more variable than the other arguments, for it can be qualitative or quantitative and it is action-dependent. Although it is hard to extract a class from it directly, we can use its original appearance in the documents, which can easily be saved in the computer as a String type; we call this the Description, which is the best solution so far. Fourth, Arg1 denotes the substantives or persons that receive or are affected by the action of the predicate verb. We cannot use a single simple class as their deputy, and it is even more difficult to build a composition model of persons and substantives that we cannot know until the emergency occurs. A compromise class named ActionObject, which has the compulsory fields name and description and an optional field location, can therefore fulfil the role. Name is the basic property of any object, while the description can be the container of its individual characteristics or can be regarded as the serialized result of the concrete character at runtime; occupation of space is an implicit property of any substance or person and is always useful in emergency response, so the optional location can be assigned whenever required. Briefly speaking, all the emergency command arguments in the universal emergency response system can be realized by these classes: TimePoint, Location, Description and ActionObject. These classes are the superclasses of any concrete emergency command argument, and they are composed of the essential data types String, point, arc (line), polygon and Date. In this way the essential data types can be transplanted to different hardware, which guarantees the capability of the response system.

3.5 Counterplan or Precaution Plan Is a Set or List of Actions
After the mapping work of the VL, the main benefit of the universal emergency response system should be reviewed: it keeps the storage of the involved information and assists the response process. Although the emergency-related semantic roles are stored, this is not enough for an emergency manager to make decisions at the scene; there is not even sufficient time to search the information fragments in order to choose an action. What is really efficient and needed is the preparation for every emergency: the counterplan. A counterplan is the suppositional response process for a type of emergency, which can tell in advance when, where and who should do something for emergency rescue, but it must be able to handle the uncertainty during the emergency life cycle, otherwise it would be useless. As argued in the semantic analysis section, the response process can be wholly described in predicate-verb-centered sentences, which means the process is made up of actions without exception. Then the counterplan is also made up of actions. The capability of handling
emergency uncertainty can be regarded as a set of conditional options covering all the possibilities at the scene. These options are also actions, and at the end of each of them there can again be a set of conditional options; from the workflow point of view, the counterplan can be defined as a list of actions arranged in series and in parallel. More precisely, one set of parallel actions is named an Emergency Step in our system. The design work of a counterplan in the universal emergency response system is then: define a step first, add actions to it, then define the next step for each action, and go through this approach iteratively until the response is supposed to be finished. The fundamental action, which can be defined by the limited set of semantic roles, is disassembled into computer-capable classes: emergency command, task force and command arguments. So we can build up a counterplan from the bottom to the top, as depicted in Fig. 5.
Fig. 5. Structure of counterplan
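The class structure described in Sections 3.4 and 3.5 can be sketched roughly as below; the class and field names follow the text (TimePoint and Date-type fields, Location, Description, ActionObject, Emergency Step), but the exact composition is an illustrative assumption rather than the authors' implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Location:
    geometry_type: str        # "point", "arc" or "polygon"
    coordinates: list

@dataclass
class ActionObject:
    name: str                 # compulsory
    description: str          # compulsory
    location: Optional[Location] = None      # optional spatial property

@dataclass
class EmergencyCommand:
    verb: str                                 # abbreviation of the predicate verb
    task_force: str                           # TF node that executes the command
    time: Optional[datetime] = None           # ArgM-TMP
    place: Optional[Location] = None          # ArgM-LOC / Arg5
    target: Optional[ActionObject] = None     # Arg1
    goal_description: str = ""                # Arg2 kept as free text

@dataclass
class EmergencyStep:
    commands: List[EmergencyCommand] = field(default_factory=list)   # executed in parallel

@dataclass
class Counterplan:
    emergency_type: str
    steps: List[EmergencyStep] = field(default_factory=list)         # executed in series
```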
4 Integration of Decision Models
Decision models in emergency response are not the main theme of this paper, but they have been developed for years and many of them are fruitful. Like the emergency itself, a decision model is domain-related, for its inputs and algorithms vary among domains. But decision models are all quantitative processes, so their results are quantitative too, and these results must be used in an emergency response process that, according to VL, has only the quantitative semantic roles ArgM-TMP, ArgM-LOC and Arg5. Furthermore,
these three roles have already been defined as command arguments in the universal emergency response system. So it can be deduced that all results of decision models in emergency response are command arguments. This conclusion means decision models can be stored as plugins and applied at the moment a command argument needs to be assigned. That is the way to integrate any decision model.
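One possible way to realize the plugin idea is sketched below; the interface name and the registration mechanism are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod

class DecisionModelPlugin(ABC):
    """A domain-specific decision model whose result fills a command argument."""

    # which command argument this model produces: "time", "location" or "destination"
    produces = "location"

    @abstractmethod
    def evaluate(self, scene_data):
        """Return a quantitative value (e.g. a route end point) from scene data."""

_registry = {}

def register(name, plugin):
    _registry[name] = plugin

def fill_argument(name, scene_data):
    # Called when a counterplan step needs a concrete argument value.
    return _registry[name].evaluate(scene_data)
```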
5 The Workflow of a Concrete Emergency Response
After the preparations listed above, the workflow of a concrete emergency response can be constructed as follows.
Fig. 6. Workflow in universal emergency response system
6 Conclusion
Facing the challenge of the complexity of emergencies, paths divide. Most emergency response system research focuses on the interoperation among different information systems, each aiming at one type of emergency or one part of the response procedure. We instead choose the construction of a universal emergency response system. Enlightened by semantic analysis, which has achieved a great deal in building domain ontologies and the Semantic Web, we studied documents in which the response process is wholly described, using methods of semantic annotation. After statistics and analysis, we put forward three semantic rules as VL, which indicate the inner logic of all kinds of emergency response. Then, on the basis of VL, the detailed design of a universal emergency response system is worked out; this design is capable of storing all the information fragments in the emergency response process, and moreover it can accommodate all types of emergency and the related decision models.
7 Future Issues
First, VL is extracted from limited samples; it would be more persuasive if more documents on other kinds of emergency were analyzed. Second, the mapping of the semantic roles into the computer could be more accurate and quantitative: in the present system, the semantic roles for the goal and the object of an action use String for compatibility. The assumption that deeper semantic analysis of the predicate verb's arguments could lead to more accurate and quantitative definitions of the arguments still needs proof. Third, software engineering technology could make the system development more convenient. In our programming, some design patterns and aspect-oriented programming are adopted to provide flexibility; other efficient technical methods need to be found.
Acknowledgements We gratefully acknowledge the financial support of National Basic Research Program of China (973 Program, NO. 2009CB723906) and CAS Knowledge Innovation Program, NO.092101104A.
References 1. Meersman, R.A.: Semantic Ontology Tools in IS Design. In: Proceedings of the 11th International Symposium on Foundations of Intelligent Systems, vol. 1609, pp. 30–45 (1999) 2. Nagarajan, M., Scale, L.: Semantic Annotations in Web Services. Semantic Web Services. In: Semantic Web Services Processes and Applications, ch.2. Springer, Heidelberg (2003) 3. Xue, N.: A Chinese Semantic Lexicon of Senses And Roles. Language Resources and Evaluation 40, 3–4 (2006) 4. Abdalla, R., Tao, C.V., Li, J.: Challenges for the Application of GIS Interoperability in Emergency Management Geomatics Solutions for Disaster Management, pp. 389–405. Springer, Heidelberg (2007) 5. Allen, E., Edwards, G., Bedard, Y.: Qualitative causal modeling in temporal GIS. Spatial Information Theory 988, 397–412 (1995) 6. Bishr, Y.: Overcoming the Semantic And Other Barriers to GIS Interoperability. International Journal of Geographical Information Science 12(4), 299–314 (1998) 7. Briggs, D.: The Role of GIS: Coping with Space (and Time) in Air Pollution Exposure Assessment. Journal of Toxicology and Environmental Health-Part a-Current Issues 68 (13-14), 1243–1261 (2005) 8. Christakos, G., Bogaert, P., Serre, M.: Temporal GIS: Advanced Field-Based Applications, p. 217. Springer, Heidelberg (2002) 9. ElAwad, Y., Chen, Z., et al.: An Integrated Approach for Flood Risk Assessment for the Red River in Southern Manitoba. In: Annual Conference of the Canadian Society for Civil Engineering, Toronto (2005) 10. Erharuyi, N., Fairbairn, D.: Mobile Geographic Information Handling Technologies to Support Disaster Management. Geography 88, 312–318 (2003) 11. Farley, J.: Disaster Management Scenarios. Univ. of Arkansas, OGC Discussion Paper (1999)
An AIS-Based E-mail Classification Method* Jinjian Qing1, Ruilong Mao1, Rongfang Bie1,**, and Xiao-Zhi Gao2 1
College of Information Science and Technology, Beijing Normal University, Beijing 100875, P.R. China 2 Department of Electrical Engineering, Helsinki University of Technology, Espoo 02150, Finland
[email protected] Abstract. This paper proposes a new e-mail classification method based on the Artificial Immune System (AIS), which is endowed with good diversity and self-adaptive ability by using the immune learning, immune memory, and immune recognition. In our method, the features of spam and non-spam extracted from the training sets are combined together, and the number of false positives (non-spam messages that are incorrectly classified as spam) can be reduced. The experimental results demonstrate that this method is effective in reducing the false rate. Keywords: Artificial Immune System (AIS), Spam, E-mail Classification.
* Supported by the National Natural Science Foundation of China (90820010) and the Academy of Finland under Grant 214144.
** Corresponding author.

1 Introduction
During the past decade, the flood of e-mail spam has become extremely severe. It can take up many resources such as transmission, storage, and computation, congest mail servers, spread pornography, publish reactionary remarks, deceive money, mislead the public, and jeopardize social security. It also affects the economy both directly and indirectly, bringing about inestimable losses in countries all over the world. To fight against e-mail spam, researchers have proposed numerous methods to identify spam messages [1, 2, 3], such as black lists, manual rule sets [4], challenge-response systems, honeypots, etc. In general, these methods have restricted the spread of spam to a certain degree. However, due to their self-imposed limitations as well as the constant variation of spam characteristics, their performance in practice is not so satisfactory. Recently, Artificial Immune Systems (AIS) have been used for fault detection [13], spam detection [11, 12] and e-mail classification [7]. This paper designs and implements a new e-mail classification method based on the Artificial Immune System (AIS). Firstly, it generates a set of spam detectors by training. Secondly, it generates non-spam detectors so as to optimize the detection space. After these two steps, it can classify e-mails by utilizing the non-spam and spam detectors successively and achieve
the goal of reducing the false rate. In our experiments, we choose standard Chinese e-mail samples from the CERNET Computer Emergency Response Team (CCERT), and compare our optimized classification method with the traditional technique. The experimental results show that this new method is effective in reducing the number of false positives.
2 Background
2.1 Artificial Immune System
Inspired by the biological immune system, the Artificial Immune System (AIS) [5, 6] is an emerging learning technique, which provides evolutionary learning mechanisms such as noise tolerance, non-supervised learning, self-organization, and memorizing. It combines certain advantages of computational intelligence methods like neural networks, machine learning, etc. The AIS is endowed with many features, e.g., diversity, adaptability, initiative, dynamics and robustness. When we use the AIS to deal with textual classification, we define information of interest as self and non-interest as non-self [7]. The classification scheme includes two stages. One is the learning phase, in which the system constructs a series of information detectors based on the determined samples. The other is the application phase, where we compare the new information, considered as an intruding antigen, with the detectors generated previously so that we can determine whether it is self or non-self.
2.2 Evaluation Indicators of E-mail Classification Methods
Generally, the recall rate and false rate are two evaluation indicators that can be used to assess any e-mail classification method. The following definitions are given. totalSum denotes the total number of e-mails we will check, among which the number of spam is spamSum, and that of non-spam is hamSum. The number of non-spam messages falsely classified as spam is represented by falseSpamNum, and rightHamNum counts legitimate e-mails correctly classified. Similarly, rightSpamNum indicates the number of spam messages with correct categorization, and falseHamNum denotes the number of spam messages considered legitimate. According to these definitions, the recall rate and false rate are defined as below:

a. recall rate:
Precall = rightSpamNum / spamSum × 100% ,    (1)

b. false rate:

Pvir = falseSpamNum / hamSum × 100% .    (2)
2.3 E-mail Classification Method Based on AIS
E-mail classification is a special textual problem with two categories. In general, an AIS-based classification method includes the following phases.
Phase I: Pre-processing. It consists of code transformation, separation of the text body and header, Chinese word segmentation, and e-mail vector extraction. The extraction is the process of choosing a vector set which best reflects the message characteristics [8]. The components of a vector can be the words in the message, the size of the text and attachments, the property of whether it is HTML coded, etc. The components are assigned distinct weights according to their different importance, and they can represent the characteristics of the e-mails. Phase II: Training. Generate a set of spam detectors based on the determined spam, i.e., find the most representative spam. A detector has a center and a radius, whose values can be adjusted over several experiments to find the most appropriate ones. We compare each newly arriving e-mail with the existing detectors to decide whether to add it to the detector set or not. If it is too close to the center of any detector, it is disregarded. The purpose of this approach is to give the detectors the largest coverage while avoiding their excessive aggregation. Because the length of an e-mail vector is indeterminate, it is difficult to calculate the affinity of two variable-length vectors. Generally, their affinity can be defined as follows:
aff(e[i], e[j]) = (e[i] ∩ e[j]) / minL(e[i], e[j]) ,    (3)
aff(e[i], e[j]) represents the affinity of e-mails i and j, e[i] ∩ e[j] indicates the number of matching components of the two vectors, and minL(e[i], e[j]) denotes the shorter of their lengths. Equation (3) ensures that the affinity ranges in [0, 1]. When two vectors have the same length it works well and the results are satisfactory; however, it becomes less effective as the difference between their lengths grows. Phase III: Application. The incoming e-mails are classified with the trained detectors. When a new e-mail arrives, its feature vector is obtained after pre-processing, and the affinity between this e-mail and the detectors is calculated. If the affinity is greater than the threshold, it is spam; otherwise non-spam. Phase IV: Feedback. It includes manual intervention by the users, error correction, and re-training. This phase guarantees the accuracy and timeliness of the detectors. To summarize, the AIS-based method utilizes the spam detectors in a straightforward way: in the detection space, only the e-mails falling within the radius range of a detector are classified as spam.
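A small sketch of the affinity in Eq. (3) is shown below; representing an e-mail vector as a list of features is an assumption made for illustration.

```python
def affinity(email_i, email_j):
    """Affinity of Eq. (3): shared components over the shorter vector length."""
    shared = len(set(email_i) & set(email_j))       # number of matching components
    shorter = min(len(email_i), len(email_j))
    return shared / shorter if shorter else 0.0     # value always falls in [0, 1]

# Example: two tokenized e-mails represented as feature lists
print(affinity(["cheap", "meds", "offer"], ["cheap", "offer", "click", "now"]))
```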
3 Our Approach – An Optimized E-mail Classification Method Based on AIS
As we know, the evaluation of an e-mail classification method depends not only on the recall rate, but also on the false rate, which is an even more important indicator. From the users' point of view, it is unacceptable to categorize legitimate messages as spam. Hence, how to reduce the false rate of e-mail classification becomes an important research topic.
In this paper, we design a new classification method, which employs two detector sets. One consists of non-spam detectors and the other spam detectors. By using both the non-spam and spam detectors successively, we can achieve efficient e-mail classification, and reduce the false rate. With the deployment of the non-spam detectors, this method divides the whole detection space into three parts, which are non-spam region, spam region, and blank region. Figure 1 shows the detection spaces of the traditional AIS-based and our classification methods.
Fig. 1. Detection spaces of the traditional AIS-based and our classification methods. (a) Detection space of the traditional AIS-based method: spam detector (circle with a round dot as its center), spam region (inside the circles) and non-spam region (outside the circles). (b) Detection space of our optimized method: spam detector (circle with a round dot as its center), non-spam detector (circle with a triangle as its center), spam region (inside the round-center circles except the shaded parts), non-spam region (inside the triangle-center circles), and blank region (outside all the circles). The marked circles are meaningless and can be eliminated.
Figure 1(a) represents the detection space of a traditional AIS-based method. Here, a circle denotes a spam detector; the spam region is within the circles, while the outside is the non-spam area. Figure 1(b) is the detection space of our classification method, where a spam detector is denoted by a circle with a round dot as its center and a non-spam detector by a circle with a triangle. More precisely, we use the non-spam detectors for the first-tier classification and the spam detectors for the second tier. This ensures that the overlapping parts of the spam and non-spam regions are treated as non-spam rather than spam; that is to say, the non-spam detectors take priority over the spam detectors. In summary, during the training procedure, two sets of detectors, spam and non-spam, are generated. The flow chart is given in Fig. 2.

Fig. 2. Data flow chart of the optimized method
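The two-tier decision just described can be sketched as follows; the sketch reuses the affinity function from the earlier example and assumes each detector exposes a word_list attribute, while the handling of the blank region is our own simplification, since the paper leaves its label implicit.

```python
def classify(email_vector, nonspam_detectors, spam_detectors, threshold):
    """Two-tier classification: non-spam detectors are consulted first."""
    # First tier: if any non-spam detector matches, the mail is legitimate.
    for det in nonspam_detectors:
        if affinity(email_vector, det.word_list) > threshold:
            return "non-spam"
    # Second tier: otherwise fall back to the spam detectors.
    for det in spam_detectors:
        if affinity(email_vector, det.word_list) > threshold:
            return "spam"
    # Blank region: no detector covers the mail; treated here as non-spam (an assumption).
    return "non-spam"
```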
4 Experiments
4.1 Background
We choose the e-mail sample set 2005-Jun (2005-Jun.tar.gz) from the CCERT as the experimental samples; it includes 25,088 spam and 9,272 non-spam messages. Lucene 2.0 and IK_Canalyzer 1.4 are used as segmentation tools for the Chinese words in our experiments. We take the 1st to the 9,000th spam messages, dividing them into spamGroup1 to spamGroup9 with 1,000 messages each. For the non-spam, the same pre-processing is made, except that the group labels are hamGroup1 to hamGroup9. By training our scheme with spamGroup1 and hamGroup1, we finally generate 100 spam detectors and no more than 100 non-spam detectors. The detailed procedures are described as follows. We first generate a set of spam detectors by training. For example, select an e-mail from spamGroup1 for word segmentation, and generate an undetermined detector (denoted by det), which consists of a word list (wordList), the number of matched e-mails (matchedCount) and the number of e-mails already detected (detedCount). We calculate the affinity on the wordList property between det and every det[i] in the detector set (detGroup). If it is greater than the threshold, matchedCount of det[i] is increased by one. Meanwhile, detedCount of det[i] is increased no matter whether the affinity is below or above the threshold. If the size of detGroup is less than 100, det is added directly; if it equals 100, det is added after eliminating the detector with the smallest weight. Next, we generate a set of non-spam detectors. Similar to the previous step, we circularly select an e-mail from hamGroup1 for word segmentation and generate an undetermined detector (also denoted by det). With regard to the affinity, the objects to be compared with det are the spam detectors generated previously rather than the non-spam ones. If any of the values is greater than the threshold, det is added to the non-spam detector set. In particular, we point out that the reason for comparing a non-spam detector with the spam detectors is that we aim to search for detectors which are similar to spam but actually belong to non-spam; thus we can discover the overlapping parts shown in Fig. 1(b). The definition of the weights is given as follows:
pValue = matchedCount / detedCount .    (4)
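A rough sketch of the detector-generation loop described above is given below; the Detector class, the helper functions segment_words and affinity (the latter from the earlier example), and the way the initial weight of 1/20 is encoded are simplifications for illustration, not the authors' code.

```python
class Detector:
    def __init__(self, word_list):
        self.word_list = word_list
        self.matched_count = 1
        self.deted_count = 20          # initial weight 1/20, as stated in the text

    @property
    def p_value(self):                 # weight of Eq. (4)
        return self.matched_count / self.deted_count

def train_spam_detectors(spam_emails, threshold, capacity=100):
    det_group = []
    for email in spam_emails:
        det = Detector(segment_words(email))       # Chinese word segmentation step
        for existing in det_group:
            existing.deted_count += 1               # always counted as detected
            if affinity(det.word_list, existing.word_list) > threshold:
                existing.matched_count += 1
        if len(det_group) >= capacity:
            det_group.remove(min(det_group, key=lambda d: d.p_value))
        det_group.append(det)
    return det_group
```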
For each newly added detector, the initial weight is set to 1/20, a value which has a close relationship with the quality of the detector sets. During the training process, to avoid excessive aggregation, we ensure that over-similar detectors are not added to the set simultaneously. After training, we carry out the other eight experiments for testing, using these trained detectors and the sixteen remaining groups.

4.2 Results and Analysis
Fig. 3. Recall rates of classification
Fig. 4. False rates of classification
The experimental results of the optimized classification method and the traditional AIS-based method are as follows. The optimized method has a best recall rate of 83.1%, a worst of 66.3%, and an average of 71.93%; concerning its false rate, the best is 1.4%, the worst 2.6%, and the average 1.89%. As for the traditional AIS-based classification method, the best recall rate is 85.2%, the worst 68.9%, and the average 74.05%; the corresponding best, worst, and average false rates are 3.2%, 5.0%, and 4.01%, respectively. Figure 3 shows the performance comparison between the optimized and traditional methods with respect to their recall rates, and the comparison of the false rates is shown in Fig. 4. Figure 3 shows that in our eight experiments the recall rate of the optimized classification method is lower than that of the traditional one; the difference ranges from 0.4% to 2.3%, and the average drops from 76.4% to 75.2%. Figure 4 shows that the false rate of the optimized classification method also decreases in varying degrees; the difference ranges from 2% to 4%, and the average drops from 4.76% to 2.18%.
5 Conclusions
This paper proposes a new e-mail classification method based on the AIS, which is endowed with good diversity and self-adaptive ability. Our method makes use of both the spam and non-spam messages from the training sets, and the number of false positives is reduced. Based on the experiments and analysis, it is very effective in reducing the false rate. However, the disadvantage is that it also decreases the recall rate. How to make an appropriate trade-off between the false rate and the recall rate will be a future research topic.
A New Intrusion Detection Method Based on Antibody Concentration Jie Zeng, Tao Li, Guiyang Li, and Haibo Li School of Computer Science, Sichuan University, Chengdu 610065, China
[email protected]

Abstract. Antibody is a kind of protein that fights against harmful antigens in the human immune system. In modern medical examination, the health status of a human body can be diagnosed by detecting the intrusion intensity of a specific antigen and the concentration indicator of the corresponding antibody in the body's serum. In this paper, inspired by the principle of antigen-antibody reactions, we present a New Intrusion Detection Method Based on Antibody Concentration (NIDMBAC) to reduce the false alarm rate without affecting the detection rate. In our proposed method, the basic definitions of self, nonself, antigen and detector in the intrusion detection domain are given. Then, according to the antigen intrusion intensity, the change of antibody number is recorded during the clone proliferation process of detectors, based on antigen classified recognition. Finally, building upon the above, a probabilistic calculation method for intrusion alarm production, based on the correlation between the antigen intrusion intensity and the antibody concentration, is proposed. Our theoretical analysis and experimental results show that our proposed method performs better than traditional methods.

Keywords: Antibody concentration, Antigen intrusion intensity, Antigen-antibody reactions, Intrusion detection, False alarm rate, Detection rate.
1 Introduction

Intrusion detection is a process used to identify abnormal activities in a computer system. Detection Rate (DR) and False Alarm Rate (FAR) are two key attributes for evaluating the performance of an Intrusion Detection System (IDS). Generally, traditional methods often sacrifice FAR to strengthen DR, which means much time and many system resources have to be wasted dealing with the resulting flood of false alarms [1]. Therefore, finding an effective way to reduce FAR is significant for promoting IDS accuracy. Recently, many techniques have been proposed for reducing FAR. In 2007, Hamsici et al. [2] proposed a detector training method relying on Naive Bayes (NB). It provides the one-dimensional subspace where the detection error is minimized for the intrusion detection problem with a homoscedastic Gaussian distribution. In 2008, Hu et al. [3] presented an intrusion detection algorithm based on a multilevel classifier (AdaBoost). In this algorithm, decision stumps are used as weak classifiers, and decision rules are provided for both categorical and continuous features. By combining the
weak classifiers for categorical features into a strong classifier, the relations between these two different types of features are handled naturally, without forced conversions between continuous and categorical features. In 2009, Hu et al. [4] presented an anomaly detection method using a Hidden Markov Model (HMM), studied with emphasis on system-call-based HMM training. An enhanced incremental HMM training framework was proposed that incorporates a simple data preprocessing method for identifying and removing similar and false subsequences of system calls in real time. In order to reduce FAR, the above methods aim at optimizing the detector lifecycle to fit the changing network running environment. However, all of them lack an anomaly tolerance mechanism: their intrusion alarms are produced by a "once detected, once triggering" strategy, which means the IDS produces an intrusion alarm immediately when an anomaly is detected. As a result, the system treats every detected anomaly as a harmful intrusion, and FAR increases rapidly. Furthermore, without considering the destructive effect of an intrusion and the importance of the invaded computer, the system produces many low-accuracy intrusion alarms, which raises the FAR. Using the ideas of immunology from the Human Immune System (HIS) in the intrusion detection domain has had a satisfying effect on reducing FAR [5]. In 1994, Forrest et al. [6] presented a Negative Selection Algorithm (NSA) for simulating the tolerance process of new immune cells in the HIS, to avoid matching any normal activity. In 2002, Kim [7] proposed the Dynamic Clonal Selection algorithm (DynamiCS), in which the self definition can be changed in real time; it has greatly promoted research on FAR reduction. In 2005, Li [8] provided an Immune-based Dynamic Intrusion Detection model (Idid). Compared with traditional methods, Idid has better adaptability to changes in the network running environment and a lower FAR. In modern medical examination, the health status of a human body can be diagnosed by detecting the intrusion intensity of a specific antigen and the concentration indicator of the corresponding antibody in the body's serum [9]. This diagnosis method is based on Burnet's Clonal Selection Theory [10]: the HIS consists of a complex set of cells and molecules that protect organs against infection. With many different kinds of lymphocytes (B cells, T cells and so on) distributed all over the human body, the HIS can distinguish nonself from self and then eliminate nonself [11]. Once the affinity with which a B cell matches an antigen reaches a certain threshold, the B cell activates. At this moment, the concentration level of the corresponding antibody is so low that the human body does not show any symptom. As the antigen intrusion intensity expands, the B cell clones itself and secretes many antibodies to catch more antigens, resulting in a rapid increase of the antibody concentration. When the antibody concentration reaches a certain level, the human body shows some symptoms to warn of the disease, such as cough, inflammation and fever. After the elimination of the antigens, the clone proliferation process is suppressed and the antibody concentration decreases simultaneously [12]. After that, the human body recovers. Therefore, the health status of a human body can be diagnosed by combining the intrusion intensity of all types of antigens and the concentration of all kinds of antibodies.
In this paper, inspired by this immune response in HIS, we present a New Intrusion Detection Method Based on Antibody Concentration (NIDMBAC) to reduce FAR without affecting DR. Our method takes advantage
of three processes: the intrusion confirmation, the harmful intrusion confirmation and the intrusion alarm production, whose results come from the correlation between the antigen intrusion intensity and the antibody concentration. The method is evaluated using the DARPA intrusion detection dataset [13]. Compared with NB, AdaBoost and HMM, the FAR of our proposed method is reduced by 8.66%, 4.93% and 6.36%, respectively. Our experimental results show that our proposed method has a better performance than previous methods. The rest of the paper is organized as follows. In Section 2 we describe the definitions and the proposed model of NIDMBAC. Section 3 presents the experimental results of NIDMBAC, followed by conclusions.
2 Definitions and Proposed Model of NIDMBAC

Certain basic terms employed in this paper are defined below.

Definition 1. Antigens (Ag) are defined as the n-dimensional vectors of network activity features composed of Internet Protocol (IP) addresses, port number, protocol type, TCP/UDP/ICMP fields, the length of IP packets, etc. The structure of an antibody is the same as that of an antigen. It is given by
Ag = {a | a ⊂ f, f = {f_1, f_2, …, f_n}^l, f_n ∈ [0, 1], l > 0}. (1)
Definition 2. Self antigens (Self) are normal elements in IP packets, including the normal sanctioned network service transaction dataset, the non-malicious background clutter dataset, and so on. It is given by

Self = {⟨s, rd⟩ | s ∈ f, rd ∈ ℝ}, (2)

where s is an n-dimensional vector of network activity features, rd is the self radius, and ℝ is the real number set. The variability that the system allows for each self element is specified by its self radius.

Definition 3. Nonself antigens (Nonself) are intrusions in IP packets. We have

Nonself = {⟨s, rd⟩ | s ∈ f, rd ∈ ℝ}, (3)

where s is an n-dimensional vector of network activity features, rd is the nonself radius, and ℝ is the real number set, such that

Self ∪ Nonself = Ag, (4)
Self ∩ Nonself = ∅. (5)
Definition 4. The detector set (D) simulates B cells in the HIS for detecting antigens; it is defined as

D = {⟨ab, rd, aff, den, cos, count, age⟩ | ab ∈ f, (rd, aff, den, cos) ∈ ℝ, (count, age) ∈ ℕ}, (6)
where ab is the antibody, which represents an n-dimensional vector of network activity features, rd is the detection radius of the detector, aff is the affinity of the detector, den is the density of the detector, cos is the costimulation of the detector, count is the number of antigens matched by ab, age is the age of the detector, ℝ is the real number set and ℕ is the natural number set. The above parameters are calculated in our previous work (see ref. [14]).

2.1 Antigen Classified Recognition
The task of antigen classified recognition is to discriminate intrusions (nonself antigens) from normal elements (self antigens) by the primary immune response in NIDMBAC.

Detector Maturation. In order to avoid matching any self antigen, newly created detectors in the immature detector population should be tested by a negative selection operator immediately after their birth. As in related work, the negative selection operator compares a new detector to the given self antigens. From this comparison, if an immature detector matches any self antigen during a predefined time period, this immature detector should be removed from the immature detector population; otherwise, the detector matures. The process of detector maturation is given by Equation (7).
$$\mathrm{Mature}(x)=\begin{cases}1, & x.age>\omega \ \wedge\ \forall y\in Self:\ \sum_{k=1}^{n}\big(x.ab_k-y.s_k\big)^2 > x.rd,\\ 0, & \text{otherwise},\end{cases} \qquad (7)$$

where x ∈ D and ω is the predefined time period for the process of detector maturation. Equation (7) shows that if the function Mature(x) is equal to 0, meaning the detector lies within the range of the normal space, the detector is invalid and should be removed. Let T be the mature detector population, given by

T = {x | x ∈ D, Mature(x) = 1}. (8)
Detector Activation. Mature detectors have gained tolerance against self antigens. Each mature detector attempts to match new antigens, but when a mature detector matches a new antigen, it does not immediately regard the detected antigen as nonself. Instead, it continues attempting to match new antigens until it has matched a sufficient number of them. After that, the detector is activated. When detectors are activated, they send "anomaly appearance" signals to other detectors. The process of detector activation is given by Equation (9).
$$\mathrm{Activate}(x)=\begin{cases}1, & x.age\le\sigma\ \wedge\ x.count\ge\tau,\\ 0, & \text{otherwise},\end{cases} \qquad (9)$$

where x ∈ T, σ is the predefined time period for the process of detector activation, and τ is the detector activation threshold. If the antigen-matched count reaches the activation threshold before the age of a mature detector exceeds the predefined time period, the mature detector activates.
Detector Memorization. An activated mature detector provides an anomaly detection signal and the detected antigens to the system for further checking. If the costimulation signal that the detector receives from the system is greater than a specified value within a limited time, the mature detector immediately becomes a memory detector. After that, the memory detector can definitely recognize this anomaly as a type of nonself antigen. The process of detector memorization is given by Equation (10).
$$\mathrm{Memorize}(x)=\begin{cases}1, & x.age\le\theta\ \wedge\ x.cos\ge\lambda,\\ 0, & \text{otherwise},\end{cases} \qquad (10)$$

where x ∈ T, θ is the predefined time period for the process of detector memorization, and λ is the detector memorization threshold. Let M be the memory detector population, given by

M = {x | x ∈ D, Mature(x) = 1 ∧ Activate(x) = 1 ∧ Memorize(x) = 1}. (11)
At the same time, we have

T ∪ M = D, (12)
T ∩ M = ∅. (13)
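A minimal sketch of the three lifecycle tests in Equations (7), (9) and (10) follows. The detector is represented as a plain dictionary and the distance uses the squared-sum form shown in Eq. (7); both choices are simplifying assumptions.

```python
def mature(x, self_set, omega):
    """Eq. (7): a detector matures if, past the tolerance period omega, it lies
    outside the radius of every self antigen (squared-distance form as in Eq. (7))."""
    return int(x["age"] > omega and
               all(sum((a - s) ** 2 for a, s in zip(x["ab"], y["s"])) > x["rd"]
                   for y in self_set))

def activate(x, sigma, tau):
    """Eq. (9): activation requires enough antigen matches before the deadline sigma."""
    return int(x["age"] <= sigma and x["count"] >= tau)

def memorize(x, theta, lam):
    """Eq. (10): memorization requires enough costimulation within the period theta."""
    return int(x["age"] <= theta and x["cos"] >= lam)
```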
2.2 The Change of Antibody Number
According to the secondary immune response in the HIS, we present a calculation method to compute the change of antibody number in real time. We divide the secondary immune response into two phases: antibody number accumulation and antibody number decrement. In NIDMBAC, when a memory detector clones, we increase its antibody number. If a memory detector does not match any antigen during the antibody number maintaining period (γ), its antibody number is decreased to zero, which means this type of intrusion has been eliminated. However, if a memory detector matches an antigen again during a period of γ, its antibody number is accumulated, which means this type of intrusion continues to increase.

Antibody Number Accumulation. If intrusions are continuous, the antibody number rapidly accumulates under the excitation of the clone proliferation process in the memory detector population. Consider a system composed of detector subpopulations $D_i$, i = 1, …, n. Each subpopulation, when memorized, grows according to
$$\frac{\partial D_i}{\partial t} = S_{new} + S_{clone}R_{mem}D_i, \qquad (14)$$

where $S_{new}$ is the rate of generation of new $D_i$ detectors in the system, $S_{clone}$ is the clone rate of memory detectors, and $R_{mem}$ is the memorization ratio of $D_i$.
When memorized, $D_i$ secretes antibody $ab_i$ at rate $S_i(t)$. Because there is a memorization period before antibody secretion, we assume

$$\frac{\partial S_i}{\partial t} = S_{mem}\big(S_{max\_secrete}R_{mem} - S_i\big), \qquad (15)$$

where $S_{max\_secrete}$ is the maximum rate of antibody secretion and $S_{mem}$ determines the rate of memorization. At times long compared with $1/S_{mem}$, the secretion rate is approximately $S_{max\_secrete}R_{mem}$. According to the definition of the density of a detector, the accumulation of the antibody number $num_i$ of $ab_i$ is given by Equation (16):

$$\frac{\partial num_i}{\partial t} = S_iD_i - S_{abb}\,den_i\,num_i, \qquad (16)$$

where $S_{abb}$ is the rate at which $ab_i$ is bound by other antibodies. From Equations (14) to (16), we know that the larger and more diversified the memory detector population is, the more specific antibodies can be secreted to eliminate antigens.

Antibody Number Decrement. If intrusions stop, the antibody number decreases under the suppression of the clone proliferation process in the memory detector population. The decrement of the antibody number $num_i$ of $ab_i$ is given by Equation (17).
$$\frac{\partial num_i}{\partial t} = \Big(1 - \frac{1}{\gamma - age_i}\Big)num_i, \qquad (17)$$

where $age_i$ is the age of $D_i$.

2.3 Intrusion Alarm Production
In modern medical examination, the health status of a human body can be diagnosed by detecting the intrusion intensity of a specified antigen and the concentration indicator of the corresponding antibody from a human body’s serum. Synthesizing the intrusion intensity of all types of antigens and the concentration of all kinds of antibodies, a disease can be analyzed. According to this relationship between the antigen intrusion intensity and the antibody concentration, a probabilistic calculation method for intrusion alarm production is established. The probability of intrusion alarm production is related to four factors, including the antibody concentration of memory detector, the antigen matched count of memory detector, the intrusion destructive effect and the invaded computer importance. At time t, the probability of intrusion alarm production is given by Equation (18), where v is the adjusted coefficient, con(t) is the antibody concentration of memory detector population at time t and intru(t) is the antigen matched count of memory detector population at time t.
$$P_{alarm}(t) = \min\{1,\ \upsilon\cdot con(t)\cdot intru(t)\}, \qquad (18)$$

where con(t) and intru(t) are given by Equations (19) and (20):

$$con(t) = \frac{2}{1 + e^{-\mu\sum_{j=1}^{m}x_j\sum_{i=1}^{n}y_i\,num_i(t)}} - 1, \qquad (19)$$

where μ is the antibody concentration adjustment coefficient, $x_j$ is the importance weight of the jth invaded computer, and $y_i$ is the destructive-effect weight of the ith type of intrusion;

$$intru(t) = \sum_{j=1}^{m}\sum_{i=1}^{n}\sum_{x\in M_i}x.count(t), \qquad (20)$$

where $M_i$ is the ith type of memory detector population.
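A small numerical sketch of Equations (18)–(20) is given below. The computer and intrusion weights reuse the values quoted in Section 3.1, but the antibody numbers, the adjustment coefficients μ and υ, and the detector records are placeholder values, and the per-computer sum of Eq. (20) is folded into a single list of memory-detector sets.

```python
import math

def con(num, x_weights, y_weights, mu):
    """Eq. (19): antibody concentration from the per-type antibody numbers num[i]."""
    expo = mu * sum(x_weights) * sum(y * n for y, n in zip(y_weights, num))
    return 2.0 / (1.0 + math.exp(-expo)) - 1.0

def intru(memory_sets):
    """Eq. (20): total antigen-matched count over all memory detector populations."""
    return sum(d["count"] for m in memory_sets for d in m)

def p_alarm(upsilon, con_t, intru_t):
    """Eq. (18): probability of producing an intrusion alarm."""
    return min(1.0, upsilon * con_t * intru_t)

# illustrative call using the weights of Section 3.1 (x: computers, y: intrusion types)
x_w, y_w = [0.6, 0.1, 0.3], [0.6, 0.4]
print(p_alarm(0.001, con([120, 40], x_w, y_w, mu=0.01), 160))
```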
3 Experimental Results

The simulation experiments are carried out on the DARPA intrusion detection dataset to test the performance of the IDS.

3.1 The Change of Antibody Concentration Experiment
The NIDMBAC monitors three computers (computer1, computer2, computer3), whose invaded-computer importance weights are x1 = 0.6, x2 = 0.1 and x3 = 0.3. We launch a LAND intrusion (y1 = 0.6) against computer1 and computer2; at the same time, computer3 is assaulted by a DoS intrusion (y2 = 0.4). Figs. 1 to 4 show the change of antibody concentration produced by the stimulation of these intrusions, for the whole network and for each specific computer, calculated by Equation (19). From Figs. 1 to 4 we see that when an antigen intrusion happens and the antigen intrusion intensity increases, the corresponding antibody concentration also increases synchronously. Furthermore, when the antigen intrusion intensity decreases, the corresponding antibody concentration also decreases synchronously.
Fig. 1. (a) Antigen intrusion intensity on Computer1, (b) Antibody concentration on Computer1
Fig. 2. (a) Antigen intrusion intensity on Computer2, (b) Antibody concentration on Computer2
Fig. 3. (a) Antigen intrusion intensity on Computer3, (b) Antibody concentration on Computer3
Fig. 4. (a) Antigen intrusion intensity on Network, (b) Antibody concentration on Network
3.2 Intrusion Alarm Production Experiment
We calculate the probability of intrusion alarm production from the experimental results of Section 3.1 according to Equation (18). The probabilities of intrusion alarm production for the invaded computers, the intrusions and the whole network are shown in Figs. 5 to 8; their curves change with the antibody concentration and the antigen intrusion intensity. According to the experimental results, the antibody concentration and the antigen intrusion intensity can provide reliable evidence for producing intrusion alarms.
Fig. 5. Probability of intrusion alarm production on Computer1

Fig. 6. Probability of intrusion alarm production on Computer2
Fig. 7. Probability of intrusion alarm production on Computer3

Fig. 8. Probability of intrusion alarm production on Network
3.3 FAR Reduction Experiment
We carried out a comparison experiment to show that NIDMBAC can reduce FAR effectively. Table 1 shows the comparison among NIDMBAC, NB, AdaBoost and HMM; the FAR of NIDMBAC is reduced by 8.66%, 4.93% and 6.36%, respectively.

Table 1. The comparison between NIDMBAC and other methods on FAR

Method     Detection Rate      False Alarm Rate
NIDMBAC    95.62% ~ 96.06%     1.95% ~ 4.04%
NB         96.81% ~ 97.21%     7.94% ~ 12.70%
AdaBoost   90.04% ~ 90.88%     6.55% ~ 8.97%
HMM        95.16% ~ 96.00%     1.33% ~ 10.40%
4 Conclusions

In NIDMBAC, the concept of antibody concentration in the HIS has been abstracted and extended, and a new method based on antibody concentration for intrusion detection has been presented. In contrast to traditional methods, NIDMBAC not only has the ability to detect intrusions while the system is being invaded, but can also effectively reduce the false alarm rate without affecting the detection rate, and has a strong ability to detect DoS intrusions. Concretely, our proposed method can calculate the types, number and intensity of intrusions quantitatively in real time, produce intrusion alarm signals
according to the correlation between the antibody concentration and the antigen intrusion intensity to warn of harmful intrusions, and avoid the alarm flood caused by the "once detected, once triggering" strategy of traditional methods.
Acknowledgements This paper was supported by the National Natural Science Foundation of China (Nos. 60573130, 60502011 and 60873246), the National High-Tech Research and Development Plan of China (No. 2006AA01Z435) and the National Research Foundation for Doctoral Program of Higher Education of China (No. 20070610032).
References

1. Kemmerer, R.A., Vigna, G.: HI-DRA: Intrusion Detection for Internet Security. IEEE Transactions on Signal Processing 93, 1848–1857 (2005)
2. Hamsici, O.C., Martinez, A.M.: Bayes Optimality in Linear Discriminant Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 647–657 (2008)
3. Hu, W.H., Hu, W., Maybank, S.: Adaboost-Based Algorithm for Network Intrusion Detection. IEEE Transactions on Systems, Man and Cybernetics 38, 577–583 (2008)
4. Hu, J.K., Yu, X.H., Qiu, D.: A Simple and Efficient Hidden Markov Model Scheme for Host-Based Anomaly Intrusion Detection. IEEE Network 23, 42–47 (2009)
5. Forrest, S., Hofmeyr, S.A.: Computer Immunology. Communications of the ACM 40, 88–96 (1997)
6. Forrest, S., Perelson, A.S.: Self-nonself Discrimination in a Computer. In: 1994 IEEE International Symposium on Security and Privacy, pp. 202–212. IEEE Press, Oakland (1994)
7. Kim, J., Bentley, P.: Toward an Artificial Immune System for Network Intrusion Detection: An Investigation of Dynamic Clonal Selection. In: 2002 IEEE Congress on Evolutionary Computation, pp. 1015–1020. IEEE Press, Honolulu (2002)
8. Li, T.: An Immune Based Dynamic Intrusion Detection Model. Chinese Science Bulletin 50, 2650–2657 (2005)
9. Mullighan, C.G., Philips, L.A., Su, X.P.: Genomic Analysis of the Clonal Origins of Relapsed Acute Lymphoblastic Leukemia. Science 322, 1377–1380 (2008)
10. Burnet, F.M.: The Clonal Selection Theory of Acquired Immunity. Cambridge University Press, New York (1959)
11. Han, B.R., Herrin, B.R., Cooper, M.D.: Antigen Recognition by Variable Lymphocyte Receptors. Science 321, 1834–1837 (2008)
12. Wrammert, J., Smith, K., Miller, J.: Rapid Cloning of High Affinity Human Monoclonal Antibodies against Influenza Virus. Nature 453, 667–672 (2008)
13. Aydin, M.A., Zaim, A.H., Ceylan, K.G.: A Hybrid Intrusion Detection System Design for Computer Network Security. Computers and Electrical Engineering 35, 517–526 (2009)
14. Zeng, J., Zeng, J.Q.: A Novel Immunity-Based Anomaly Detection Method. In: 2008 International Seminar on Future Biomedical Information Engineering, pp. 195–198. IEEE Press, Wuhan (2008)
Research of the Method of Local Topography Rapid Reconstructed

Minrong Zhao¹, Shengli Deng², and Ze Shi¹

¹ The Science Institute, Air-Force Engineering University, Xi'an, 710038, China
² Air-Force 93861 Army, Sanyuan, 713800, P.R. China
[email protected]

Abstract. To obtain quickly and conveniently a camouflage regional model that is based on the geomorphic characteristics of the environment and copes with complex topography, this article analyzes the advantages and limitations of a variety of terrain modeling methods, discusses a hybrid modeling method built on these methods, and integrates existing research results to generate all-terrain results whose landform details can be controlled. The resulting adaptive local-terrain modeling method produces a camouflage regional model that adapts to the local topography of the region, giving a good camouflage effect.

Keywords: Terrain model, Surface integration, Fractal model, Disturbance.
1 Introduction

Accurate three-dimensional modeling of all-terrain environments is one of the key technologies in simulation and virtual reality, and terrain models are widely needed in many different areas. As reconnaissance and surveillance technologies develop, targets protected by traditional camouflage methods can easily be identified and attacked by precision-guided weapons, so people have begun to study camouflage methods based on the environmental characteristics of the topography. Fast, accurate and efficient generation of regional geomorphology that integrates with the geomorphic environment around a position, and that matches the camouflaged object, is of great significance for the self-adaptive camouflage of weapons and equipment. Because of the characteristics and limitations of traditional camouflage algorithms and technologies, together with the complexity and irregularity of the topography around a position, existing terrain simulation algorithms are applicable only under certain conditions or for certain objects. To this end, in accordance with the landform characteristics of the position and combined with existing algorithms and technologies, fractal techniques are applied directly to perturb surfaces generated by existing methods, producing a camouflage terrain model that integrates with the surrounding environment and meets the needs of good camouflage.
2 Analysis of Currently Used Terrain Modeling Methods

2.1 Mathematical Characteristics of Real Terrain

So far there is no theory that derives the characteristics of real terrain in a purely mathematical way. Many geophysicists have summarized the mathematical characteristics of real terrain from analysis of topographic data over a wide range of temporal and spatial scales [1], as follows:

- fractal dimension; multi-level fractality;
- anisotropy;
- statistical self-similarity within a certain range of scales: $p(X(t) < x) = p(X(\gamma t) < \gamma^{a}x)$;
- the variogram satisfies $E[X(t+h) - X(t)]^{2} = k\,h^{2a}$;
- the profile power spectral density satisfies $G(\omega) = 2\pi k\omega^{-a}$;
- non-stationary process;
- stationary increments.
The problem that topography simulation must solve is how to construct a terrain model such that the model possesses the above characteristics as far as possible, or the mathematical characteristics required by the practical application field.

2.2 Commonly Used Terrain Modeling Methods [2]
1) The simplest terrain model takes sparsely distributed points with elevation values and connects them into a number of triangular planes to form a topographic framework, onto which a mountain texture map is affixed. The result can be controlled through the locations of the sparse points, but because the resulting framework is too coarse, the final graphics look obviously cartoon-like and lose the feel of real terrain.

2) Fractal geometry, whose typical characteristics are infinite detail and statistical self-similarity, uses recursive algorithms so that complex features can be generated from simple rules. Among the existing implementations of fractional Brownian motion, the MPD (midpoint displacement) method is widely used because the algorithm is simple, easy to implement and fast.

3) Surface fitting methods. Spline interpolation and surface fitting fit the data points to form a smooth connection, which can improve the effect, but they lack the description of the irregular terrain details that characterize realistic terrain.

4) Improved algorithms. In order to increase realism, perturbations are applied to the fitted surface, forming a more realistic surface topography. Perturbation methods include fractal interpolation and the multi-resolution stochastic relaxation approach. The fractal interpolation method uses IFS (Iterated Function System) techniques to construct a fractal interpolation
function. The multiresolution stochastic relaxation method combines this with the structure of spline surfaces; it requires an energy optimization method to construct the spline surfaces, involves a large amount of calculation, and is more complex.

The study of these various modeling methods has had a positive impact on the development of terrain model construction and provides a theoretical reference and methods for further research. But there are some inadequacies and weaknesses, mainly embodied in the following ways:

1) For different landform types there is a lack of systematic study of which method to use. The vast majority of modeling algorithms are designed for specific regions and applications, but actual terrain is varied, and different landforms place different requirements on the model, so how to use these methods in a reasonable and effective way has become a question that cannot be ignored.

2) There is a lack of research on the adaptability of the different modeling methods. For a given landform type, there has been no examination, using the same modeling algorithm, of which modeling algorithm is best for that geomorphological type. It is therefore necessary to study the adaptability of modeling methods, to provide a valuable reference for future terrain modeling.
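As an illustration of the midpoint-displacement (MPD) idea referred to in Section 2.2, a minimal one-dimensional sketch is shown below; the roughness parameter and the initial endpoint heights are arbitrary choices, not values from this paper.

```python
import random

def midpoint_displacement(levels, roughness=0.5, seed=0):
    """1-D midpoint displacement: repeatedly insert midpoints and perturb them
    with noise whose amplitude shrinks by 2**(-roughness) at each level."""
    random.seed(seed)
    heights = [0.0, 0.0]                 # initial endpoints (arbitrary)
    scale = 1.0
    for _ in range(levels):
        refined = []
        for a, b in zip(heights, heights[1:]):
            mid = (a + b) / 2 + random.uniform(-scale, scale)
            refined += [a, mid]
        refined.append(heights[-1])
        heights = refined
        scale *= 2 ** (-roughness)       # amplitude decay controls fractal roughness
    return heights

profile = midpoint_displacement(levels=6)
```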
3 FBM Mathematical Characteristics [3]

We require the simulated camouflage regional topography to follow FBM, whose increments are stationary and self-similar, so that the statistical properties of the generated topography are consistent. FBM is defined on a probability space as a stochastic process X: [0, ∞) → R that satisfies the following conditions:

(a) With probability 1, X(t) is continuous and X(0) = 0.
(b) For any t ≥ 0 and h ≥ 0, the increment X(t+h) − X(t) obeys the following distribution:

$$P\Big(\frac{X(t+h)-X(t)}{h^{\alpha}} < x\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-u^{2}/2}\,du$$
$$\frac{p(x;\hat{\mu}_1, H_1)}{p(x;\hat{\mu}_0, H_0)} > \gamma, \qquad (12)$$

where $\hat{\mu}_1$ is the maximum likelihood estimate (MLE) of $\mu_1$ assuming $H_1$ is true, and $\hat{\mu}_0$ is the MLE of $\mu_0$ assuming $H_0$ is true.

When the variance of the residual is known, the MLE of $\mu$ is $\hat{\mu} = \sum_{i=1}^{n} x_i/n = \bar{x}$ and the GLR statistic is

$$\ln\frac{p(\hat{H}_1)}{p(\hat{H}_0)} = \frac{n(\bar{x})^{2}}{2\sigma^{2}}, \qquad (13)$$

or we decide $H_1$ if $\bar{x} > \gamma'$. When the variance of the residual is unknown, the MLEs of $\mu$ and $\sigma_1^2$, which satisfy $\partial p(H_1)/\partial\mu = 0$ and $\partial p(H_1)/\partial\sigma^{2} = 0$ simultaneously, are

$$\hat{\mu} = \sum_{i=1}^{n} x_i/n = \bar{x}, \qquad \hat{\sigma}_1^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^{2}, \qquad \hat{\sigma}_0^{2} = \frac{1}{n}\sum_{i=1}^{n} x_i^{2},$$

and the GLR statistic is

$$\ln\frac{p(\hat{H}_1)}{p(\hat{H}_0)} = \frac{n}{2}\ln\Big(1 + \frac{(\bar{x})^{2}}{\hat{\sigma}_1^{2}}\Big). \qquad (14)$$
In this case, we decide the sensor is degraded when $(\bar{x})^{2}/\hat{\sigma}_1^{2} > \gamma$. One possibility for implementing the GLRT would be, at time t, to test for faults occurring at all prior timesteps. The computational expense, however, would be excessive. A more manageable approach, taken in this paper, is to test only for faults occurring in a window of length w (a user-defined quantity), i.e., between timesteps t−w+1 and the current time t. Thus, to implement the test one need only consider the w most recent residuals.
The null hypothesis is that no fault has occurred, and each of the w alternative hypotheses is that the fault occurred at one of the w timesteps within the window. At the current time t, the Generalized Likelihood Ratio (GLR) statistic for the case that the variance of the residual is known, associated with the hypothesis that the fault occurred at time t−k+1 (k = 1, 2, ..., w), is given by

$$G_t = \max\big[\mathrm{GLR}_t(k),\ 1 \le k \le w\big] = \max\big[k(\bar{x}_t(k))^{2}/\sigma^{2},\ 1 \le k \le w\big], \qquad (15)$$

where $\mathrm{GLR}_t(k) = k(\bar{x}_t(k))^{2}/\sigma^{2}$ and $\bar{x}_t(k) = \sum_{i=t-k+1}^{t} x_i/k$, with a user-defined threshold $\gamma$ chosen to provide a desired in-control Average Run Length (ARL). If $G_t < \gamma$ it is
concluded that no drift has occurred. If G t > γ it is concluded that the drift occurred at time k ∗ , where k ∗ is the value of k that maximizes (15). To implement the GLRT, the user must select the test parameters w and γ . For all examples in this paper w=50 was chosen. The required γ which produces a desired in-control ARL will depend on the fault signature and w. But Monte Carlo simulations for determining γ cannot be used because the data are acquired from the real nuclear power plant during startup period.
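A compact sketch of the windowed GLR test built from Equations (13)–(15) follows (NumPy). The window length, threshold and residual array are user-supplied; the variance-unknown branch applies the form of Eq. (14) to the window, which is an assumption about the implementation rather than something spelled out in the text.

```python
import numpy as np

def glr_window(residuals, t, w, sigma2=None):
    """Return (G_t, k_star) over the window of the w most recent residuals.
    If sigma2 is given, use Eq. (15); otherwise use the variance-unknown form."""
    best_g, best_k = -np.inf, None
    for k in range(1, min(w, t) + 1):
        seg = residuals[t - k:t]
        xbar = seg.mean()                              # mean of the k newest residuals
        if sigma2 is not None:
            g = k * xbar ** 2 / sigma2                 # known-variance GLR_t(k)
        else:
            s1 = ((seg - xbar) ** 2).mean()            # sigma_1^2 estimate on the window
            g = 0.5 * k * np.log1p(xbar ** 2 / s1)     # variance-unknown GLR
        if g > best_g:
            best_g, best_k = g, k
    return best_g, best_k

# usage: declare a drift when G_t exceeds the chosen UCL (e.g. 25 or 15)
# g, k_star = glr_window(res, t=len(res), w=50, sigma2=0.0108 ** 2)
```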
3 Application to the NPP Measurements

3.1 Experimental Data
The proposed algorithm was confirmed with real plant startup data of the Kori Nuclear Power Plant Unit 3. These data are values measured from the primary and secondary systems of the NPP, derived from the following 11 measured signals: the reactor power (the ex-core neutron detector signal, Sensor 1); the pressurizer water level (Sensor 2); the SG steam flow rate (Sensor 3); the SG narrow-range level (Sensor 4); the SG pressure (Sensor 5); the SG wide-range level (Sensor 6); the SG main feedwater flow rate (Sensor 7); the turbine power (Sensor 8); the charging flow rate (Sensor 9); the residual heat removal flow rate (Sensor 10); and the reactor head coolant temperature (Sensor 11). The data were sampled at a rate of 1 minute for about 38 hours. The total number of observations is 2,290, which is divided into five subsets of equal size, i.e., one training subset, one test subset and three optimization subsets. The total data set was indexed using Arabic numerals, i.e., i = 1, 2, ..., 2,290. The 458 patterns with indices i = 5j+3, j = 0, 1, ..., 457, named KR3, were used to train the SVR to capture the quantitative relation between the 11 inputs and outputs. KR1, which has indices 5j+1, j = 0, 1, ..., 457, was used for testing the model, while the remaining three subsets (KR2, KR4, KR5) were used for optimization. Let (θ1, θ2, …, θ11) denote the principal components (PCs) obtained by applying PCA to the above plant data. As mentioned earlier, variance is used in selecting dominant PCs. We found that θ1 is the most dominant PC and explains about 84.12% of the total
variation in the original data. However, in order to minimize the loss of information, the first seven PCs are considered in this study. The selected PCs explain more than 99.98% of the total variation; the loss of information is less than 0.1%. Parameter regularization is an important step in AASVR modeling. There are three parameters to be determined in AASVR: the bandwidth of the Gaussian radial basis kernel (σ), the insensitive band of the loss function (ε), and the regularization parameter (C). In this study, they are assumed to be common to every model of AASVR. This paper adopts a statistical design of experiments called response surface methodology (RSM), which has been widely used in the fields of product design and process optimization. The subsequent analysis shows that σ and ε have a significant effect on MSE, while C is of little significance. A response surface plot of log(MSE) versus σ and ε is depicted in Fig. 3. The optimum point of the response surface is obtained as (σ, ε, C) = (1.4, 0.0005, 6.8).
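A sketch of the PC-selection step is shown below (scikit-learn); the data array is a placeholder for the 11 plant signals and the 99.98% retention target is the figure quoted above.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(2290, 11)                # placeholder for the 11 measured signals

pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.9998) + 1)   # smallest k explaining >= 99.98% variance
theta = pca.transform(X)[:, :k]             # first k principal components (AASVR inputs)
```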
Fig. 3. Response surface plot of log(MSE) versus σ and ε
Fig. 4. RMS of estimated sensors
3.2 Test Results
Empirical model building is now done using the PCSVR proposed in this paper. The numbers of support vectors needed for each SVR are 381 (83.2%), 230 (50.2%), 381 (83.2%), 108 (23.6%), 333 (72.7%), 206 (45.0%), 420 (91.7%), 384 (83.8%), 163 (35.6%), 105 (22.9%), and 436 (95.2%). The average number of support vectors is 286.1 (62.5%). Fig. 4 shows the root mean square (RMS) error for the test data of each sensor, representing the PCSVR model accuracy. The relative RMS errors for all sensors are computed with respect to their rated values: 100%, 100%, 2.0 Mkg/hr, 100%, 100%, 100%, 2.0 Mkg/hr, 1000 MW, 30 m³/hr, 100 m³/hr, 330 °C. For the feedwater flow rate, the relative RMS errors compared with the rated value (2.0 Mkg/hr) are 0.3856%, 0.4791%, and 0.5621% for the training data, the optimization data, and the test data, respectively. In order to verify the failure detection algorithm, we artificially degraded the SG main feedwater flow rate signal in test data KR1. The degraded signal is shown in
Fig. 6(a); it linearly increases at a rate of 3.14% per day from the first observation, i.e., a 5% positive drift at the end of the observation. The estimated feedwater flow rate is almost the same as the actual feedwater flow rate although the measured feedwater flow rate is degraded (see Fig. 6(a)). The residuals used for the GLRT are shown in Fig. 5(b) and Fig. 6(b) for the normal and degraded sensor, respectively. The mean and standard deviation of the H0 residual are −0.0011 and 0.0108, respectively.
Fig. 5. Model output for normal state: (a) sensor signal, (b) residual

Fig. 6. Model output for drift state: (a) sensor signal, (b) residual
Fig. 7. Gt statistics for four cases: (a) Gt for normal, σ known; (b) Gt for drift, σ known; (c) Gt for normal, σ unknown; (d) Gt for drift, σ unknown
Fig. 8. Hypothesis test results: (a) Case 1 (σ known), (b) Case 2 (σ unknown)
Fig. 7 shows the Gt statistics for four cases. The Gt statistics for the normal sensor are shown in the left column, while the right column is for the drift case; the upper row is for the σ-known case and the lower row for the σ-unknown case. From the figure we notice that the Gt statistics for the σ-known cases are larger than those for the σ-unknown cases. In particular, when the drift occurs, the Gt statistic for the variance-known case increases exponentially, while for the variance-unknown case it increases linearly. In this example we choose 25 and 15 as the upper control limits (UCL) for the σ-known and σ-unknown cases, respectively. If Gt is greater than the UCL, we decide the sensor is degraded. Fig. 8 shows the hypothesis test results decided by these UCLs. The drift was detected at timesteps 61 and 53, which correspond to drift amounts of 0.67% and 0.59% for the σ-known case (case 1) and the σ-unknown case (case 2), respectively. The fault detection point for case 2 therefore occurs 8 timesteps, i.e., 40 minutes, earlier than that for case 1. We conclude that the GLRT can be used for the detection of sensor degradation in NPPs. Moreover, we can detect the sensor drift earlier by assuming the variance of the residual is unknown than by assuming it is known.
4 Conclusions

In this work, PCSVR, which utilizes PCA for extracting predominant feature vectors and Auto-Associative SVR for data-based statistical learning, together with the GLRT algorithm, was used for signal validation and calibration monitoring of an NPP. The proposed PCSVR model was applied to the data of Kori Nuclear Power Plant Unit 3. The input signals to the AASVR are preprocessed by principal component analysis, and the first seven feature components are used as its inputs. The averaged relative RMS error for all predicted test output signals is about 0.3%, which is small enough. The model residual was tested for several cases using the GLRT method. We conclude that the GLRT can be used for the detection of sensor degradation in NPPs. Moreover, we can detect the sensor drift earlier by assuming the variance of the residual is unknown than by assuming it is known. Although Monte Carlo simulation would be needed to obtain a better in-control ARL and UCL, it is impossible in a real NPP. For future work we need to test using several sets of real plant startup data instead of Monte Carlo simulation.
A Support System for Making Archive of Bi-directional Remote Lecture – Photometric Calibration – Naoyuki Tsuruta, Mari Matsumura, and Sakashi Maeda Department of Electronics Engineering and Computer Science, Fukuoka University 8-19-1 Nanakuma Jonan-ku, Fukuoka, 814-0180, Japan
[email protected], {matumura,maeda}@mdmail.tl.fukuoka-u.ac.jp http://www.tl.fukuoka-u.ac.jp/~ tsuruta/
Abstract. We are developing a system that supports making lecture movie archives. This system enables us to combine CG or another movie with a lecture scene movie by using intelligent computer vision techniques, and supports us in generating an effective lecture archive. In this paper, we concentrate on a scenario in which a movie that seems to be a lecture done in one lecture room is generated from two movies of lectures done in different remote lecture rooms. Because the source movies are captured with different camera work and under different illumination conditions, not only geometric calibration but also photometric calibration is important to make the target movie realistic. To overcome this problem, we propose a photometric calibration technique based on the "fast separation of direct and global components" method. By using this method, estimation of the color of illumination on the scene becomes more stable than with our conventional method. Keywords: Mixed Reality, Computer Vision, Computer Graphics, Lecture Archive.
1 Introduction
Realistic content that combines real images and computer graphics is well known as mixed reality. Fig. 1 shows an example of mixed reality used in a certain TV news program. The display panel synthesized by CG is shown in a geometrically correct position even when the camera moves. The reason why mixed reality is frequently used in TV programs is that it is very effective for giving people information. This effectiveness must also be applicable to teaching aids such as lecture movie archives. On the other hand, the requirement for providing lecture movie archives has become very high, to enable students to study whenever and wherever they want. Fig. 2 shows an example of an automatic lecture archive generator [1]. By using intelligent computer vision technology, this system automatically generates a summary
Fig. 1. An example of mixed reality used in a TV news program
Fig. 2. An example of automatic lecture archive generator
movie from the original lecture video captured by an HDTV camera. Additionally, several studies have investigated lecture support systems using intelligent computer vision technology [4][5]. We are developing a system that supports making lecture archives. This system enables us to combine CG or another movie with a lecture scene movie by using intelligent computer vision techniques, and supports generating an effective lecture archive. In this paper, we concentrate on a scenario to generate a movie that seems to be a lecture done in one lecture room, based on two lecture movies done in different remote lecture rooms. Because the source movies are captured under different illumination conditions and with different camera work, not only geometric calibration but also photometric calibration is important to make the target movie realistic. To overcome this problem, we propose a photometric calibration technique based on the "fast separation of direct and global components" method. By using this method, estimation of the color of illumination on the scene becomes more stable than with our conventional method. In Section 2, we give a brief overview of the archive generator. After that, in Section 3, we focus on the photometric calibration and discuss our conventional method and its problem. Then we clarify the necessity of the "separation of direct and global components" technique, and propose a new method. In Section 4, we show experimental results that compare the conventional method and the proposed method.
Fig. 3. A scenario of bi-directional remote lecture and how to make archive of it
2 Overview of Archive Generator

2.1 Scenario

As an example of a bi-directional remote lecture, let us consider the scenario described in Fig. 3. In this scenario, two lecturers discuss while each stays in his own lecture room and points at his own screen, on which the same content is projected. Two movies are captured for making an archive. If two archives are generated from the two movies separately, their resolutions might be low and it is difficult for students to watch them simultaneously. We therefore consider combining them into a movie that seems to be a lecture done in one lecture room. To do this, we need two techniques: geometric calibration and photometric calibration. The geometric calibration is applied to one movie so that it can be overlapped onto a geometrically correct region in the other movie. The photometric calibration is applied to equalize the color of illumination of the two scenes.

2.2 Geometric Calibration
A projection transformation is applied to the source image. The projection parameters are estimated from more than four pairs of corresponding points selected from the source and destination images, respectively. After this transformation, background subtraction is applied to the source image, and the image regions of the lecturer are detected. Finally, the source image is overlaid onto the destination image. If the camera that captured the source image moved during the lecture, the source image is also geometrically transformed using the saved pan and tilt camera parameters. This transformation is based on a panorama generation technique for a fixed-focus camera [2].
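A minimal OpenCV sketch of this geometric calibration step is given below. The point pairs, file names and the differencing threshold are placeholders; the paper's actual background-subtraction and pan/tilt handling are not reproduced here.

```python
import cv2
import numpy as np

# >= 4 corresponding points picked in the source and destination frames (placeholders)
src_pts = np.float32([[10, 20], [300, 25], [295, 240], [15, 235]])
dst_pts = np.float32([[50, 60], [340, 70], [330, 280], [55, 270]])

H, _ = cv2.findHomography(src_pts, dst_pts)            # projection (homography) parameters
src = cv2.imread("source_frame.png")
dst = cv2.imread("destination_frame.png")

warped = cv2.warpPerspective(src, H, (dst.shape[1], dst.shape[0]))

# background subtraction isolates the lecturer region before overlaying
bg = cv2.imread("source_background.png")
bg_warped = cv2.warpPerspective(bg, H, (dst.shape[1], dst.shape[0]))
mask = cv2.absdiff(warped, bg_warped).max(axis=2) > 30  # crude lecturer mask
out = dst.copy()
out[mask] = warped[mask]
```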
3 Photometric Calibration

3.1 A Conventional Method
For the photometric calibration, we must know the color of the global illumination lighting the target scene. The total radiance L measured at a camera pixel is given by the equation:

L = N I, (1)

where N and I are the object color and the illumination color measured at a camera pixel, respectively. If we know the object color N at each pixel, we can calculate I from the input image L. But, generally, N is unknown. So, we use a known controllable light P. If we use a digital projector as P, we can estimate its light color in advance. When we add the light P, the input image L′ is given by the equation:

L′ = N (I + P). (2)

By subtracting equation (1) from equation (2), N is given by the following equations:

(L′ − L) = N (I + P) − N I = N P, (3)

N = (L′ − L) / P. (4)
Fig. 4. A problem on estimation of global illumination based on auxiliary light. (a):A condition that our conventional method supposed. (b):Irregular conditions frequently occurred.
projector light on the area is not known. The illumination is described by (Pd + Pg), where Pd and Pg are the direct component and the indirect (global) component of the projector light, respectively. It is therefore necessary to separate the illumination into a direct component and a global component.

3.2 Separation of Direct Component and Global Component
This method [3] restricts us to the use of a single camera and a digital projector as a single light source. The total radiance L measured at a camera pixel is the sum of the direct component and the global component:

L = Ld + Lg, (6)

where Ld and Lg are the direct component and the global component, respectively. Each point of the surface lit by the source scatters rays in the direction of the camera. The radiance of the point measured by the camera is referred to as the direct component Ld. The remaining radiance measured by the camera pixel is caused by inter-reflections and is referred to as the global component Lg. Let us divide the surface into a total of N patches, each of which corresponds to a single pixel of the source. We denote the radiance of the patch i measured by the camera c as L[c; i], and its two components as Ld[c; i] and Lg[c; i], so that L[c; i] = Ld[c; i] + Lg[c; i]. The global component of i due to inter-reflections from all other surface patches can be written as:

$$L_g[c; i] = \sum_{j \in P} A[i; j]\,L[i; j], \qquad (7)$$

where P = {j | 1 ≤ j ≤ N, j ≠ i}, L[i; j] is the radiance of patch j in the direction of patch i, and A[i; j] incorporates the BRDF of i as well as the relative geometric configuration of the two patches. Now let us assume that half of the projector's pixels are activated and that these activated pixels are well distributed over the entire scene to produce a high-frequency illumination pattern, such as the checker flag pattern shown in Fig. 5.
Fig. 5. An example of the checker flag pattern used in the "Separation of Direct Component and Global Component" method
Then, consider two captured images of the scene, where in the first image L+ the scene is lit with high-frequency illumination that has a fraction 1/2 of the projector pixels activated, and in the second image L− it is lit with the complementary illumination that has the remaining fraction (1 − 1/2) of the projector pixels activated. If the patch i is lit directly by the projector in the first image, then it is not lit by the projector in the second image, and we get:

L+[c; i] = Ld[c; i] + (1/2) Lg[c; i], (8)

L−[c; i] = (1 − 1/2) Lg[c; i]. (9)
Therefore, if we subtract L− from L+, we obtain Ld. Furthermore, by subtracting Ld from L+, we obtain (1/2) Lg.

3.3 Proposed Method

By replacing L in equation (1) and L′ in equation (2) by L− and L+, respectively, we can obtain the correct object color N:

L− = N (I + (1/2) Pg), (10)
L+ = N (I + Pd + (1/2) Pg), (11)
(L+ − L−) = N (I + Pd + (1/2) Pg) − N (I + (1/2) Pg) = N Pd, (12)
N = (L+ − L−) / Pd. (13)

As mentioned above, Pd is observable if we use a digital projector as an auxiliary light.
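A sketch combining the separation of Equations (8)–(9) with the object-color estimate of Equation (13) is shown below; the two checker-pattern captures L+ and L− and the projector's direct color Pd are assumed to be available as per-pixel arrays, and the epsilon is only a numerical safeguard.

```python
import numpy as np

EPS = 1e-6

def separate(L_plus, L_minus):
    """Eqs. (8)-(9): direct and global components from the two
    complementary high-frequency (checker) illumination images."""
    Ld = L_plus - L_minus
    Lg = 2.0 * (L_plus - Ld)          # from L+ = Ld + Lg/2  =>  Lg = 2(L+ - Ld)
    return Ld, Lg

def object_color(L_plus, L_minus, Pd):
    """Eq. (13): N = (L+ - L-) / Pd, using only the direct projector component."""
    return (L_plus - L_minus) / (Pd + EPS)

def scene_illumination(L, N):
    """Global illumination color I recovered from an image L lit by I alone (Eq. (1))."""
    return L / (N + EPS)
```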
4 Experimental Results
In this section, we describe two experimental results. In the first experiment, we evaluated the accuracy of the illumination color estimation. The target area is a corner
Fig. 6. Environments for Experiment
of our lecture room. We estimated illumination color respectively twice for both conventional method and proposed method. The first estimation was done under the regular condition shown in the left image of Fig. 6. The second estimation was done under the same target area, the same illumination condition and the same camera position with the first one, except for wall color. The second environment is shown in the right image of Fig. 6 Therefore, the illumination color measured in both conditions must be same. Table 1 shows RGB component and brightness of measured illumination color in both environments. The leftist column shows the difference between the estimations. Obviously, the difference for the proposed method is smaller than the conventional method. In the second experiment, we synthesized a bi-directional remote lecture movie. For simplification of scenario, we used a scene, in which two lecturers played OX game. Fig. 7 shows views of different lecture room. The brightness, color of illumination and geometrical aspect were different. Fig. 8 shows results of synthesize by both conventional photmetric calibration method without geometric calibration (left side) and proposed method (right side). In the left image, the region of right lecturer is too bright and pointing position is also wrong. On the other hand, the right image is subjectively better than the left one. Table 1. Comparison Accuracy of Proposed Method with Comventional Method. (Conv. Method:Conventional Method, Prop. Method: Proposed Method, Env.1: Environment 1, Env.2: Environment 2, Diff=—Env.1-Env.2—, V: Brightness). Method
Method         Color   Env.1   Env.2   Diff
Conv. Method   R       0.58    0.63    0.05
               G       0.55    0.51    0.04
               B       0.59    0.53    0.06
               V       1.74    3.58    1.84
Prop. Method   R       0.54    0.51    0.03
               G       0.60    0.60    0.00
               B       0.58    0.59    0.01
               V       1.22    1.12    0.10
Fig. 7. Environments for Experiment
Fig. 8. Synthesized images. The left image is synthesized by the conventional photometric calibration without geometric calibration. The right image is synthesized by the proposed method.
5
Conclusion
In this paper, we described a scenario for generating a lecture movie that looks like a lecture given in a single room, based on two lecture movies recorded in different remote lecture rooms. In this scenario, the original movies are captured under different illumination conditions and with different camera work. To make the target movie realistic, we proposed a photometric calibration technique based on the "fast separation of direct and global components" method. In the experimental results, we showed that the estimation of the color of illumination on the scene became more stable than with our conventional method.
References 1. http://www.i-collabo.jp/autorec/ 2. Wada, T.: Fixed Viewpoint Pan-Tilt-Zoom Camera and Its Applications. The Transactions of the Institute of Electronics, Information and Communication Engineers J81DII(6), 1182–1193 (1998) 3. Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast Separation of Direct and Global Components of a Scene using High Frequency Illumination. In: Proceedings of ACM SIGGRAPH, pp. 935–944 (2006) 4. Marutani, T., Nishiguchi, S., Kakusho, K., Minoh, M.: Making a lecture content with deictic information about indicated objects in lecture materials. In: The 3rd AEARU Workshop on Network Education, pp. 70–75 (2005) 5. Yueh, H.P., Liu, Y.L., Lin, W.J., Shoji, T., Minoh, M.: Integrating face recognition techniques with blog as a distance education support system (DESS) in international distance learning. In: Sixth International Conference on Advanced Learning Technologies (IEEE ICALT2006), pp. 507–511 (2007)
Conflict-Free Incremental Learning
Rong-Lei Sun
The State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected]
Abstract. Most machine learning algorithms are essentially generalized learning schemes. When they are used incrementally, conflicts between the concepts already learned and the new events to be learned need to be eliminated. This paper presents a new incremental learning scheme that divides induction into two phases: incrementally accurate learning and generalization. The new learning scheme is conflict-free and thus suitable for incremental induction.
Keywords: Incremental induction, Machine learning, Artificial intelligence, Boolean algebra.
1 Introduction
Machine learning, being an efficient means of acquiring knowledge such as if-then rules, attracts a lot of researchers. Theoretical studies of machine learning demonstrate that inductive learning is an NP-hard problem; that is, the computing complexity increases exponentially with the complexity of the learning problem. Considering the problem from the opposite direction, one can conclude that the computing complexity will decrease exponentially as the complexity of the learning problem drops. Therefore it is easy to deal with a machine learning problem if the problem complexity is limited. The incremental learning strategy provides one of the potential solutions for complicated real-world learning problems, because it limits the complexity of each learning epoch [1]. For AQ-family incremental learning [2], the new observed events used in the following learning epochs may conflict with the concepts already learned. Therefore consistency maintenance becomes one of the central problems in the incremental learning fashion. The concept conflict problem results from the generalized learning scheme, i.e., generalizing while learning in each learning epoch. To deal with this problem, Sanchez [4] presents an incremental learning algorithm for constructing Boolean functions from positive and negative examples. Sun [5] proposed a non-expandable DNF (disjunctive normal form) for accurate incremental learning; however, the generalization problem is missing there. This paper presents a generalization principle and a generalization algorithm, which make the non-expandable DNF a truly conflict-free incremental learning model. The rest of the paper is organized as follows. First, in the next section, the methodology and basic concepts adopted in the article are introduced through a simple learning example. The incremental learning algorithm and the generalization algorithm are introduced in
Sections 3 and 4, respectively. A simple but illustrative example is used to demonstrate the conflict-free incremental inductive learning algorithm in Section 5. Finally, in the last section, we conclude the whole paper.
2 Basic Concepts and Methodology
Generalized learning is a scheme in which learning and generalization are tightly coupled. That is, generalization is performed while learning is conducted. Because generalization introduces learning errors, concept consistency becomes a core problem when such a scheme is used incrementally. To address the problem, we divide the learning procedure into two separate phases: accurate learning and generalization. The aim of accurate learning is to record the observed events exactly and to simplify the learning results. This procedure emphasizes the logical equivalency between the learning results and the observed events. The results obtained are called accurate concepts. After the events available at the current learning epoch have been learned, the generalization procedure starts and the generalized concepts are obtained. The learning system keeps both the accurate concepts and the generalized concepts. When new observed events become available, a new learning epoch starts with the accurate concepts and learns the new events in the incremental fashion. The updated accurate concepts are then generalized. Because of the logical equivalency between the accurate concepts and the observed events, the proposed incremental learning model can certainly maintain consistency over all observed events. It is a conflict-free learning algorithm for incremental induction. Consider a concept that can be described using attribute-value pairs. Furthermore, as most attributes are discrete or of limited accuracy, we can use a Boolean set containing two different elements to represent each of them. For example, suppose that an attribute F has 4 possible values; it can then be characterized by two Boolean variables a1 and a2, each of which belongs to the Boolean set. After such a transformation, a concept can be represented as a DNF over the simplest Boolean set. Consequently the above accurate learning and generalization are transformed into simplification and generalization of Boolean expressions, respectively. For the simplification of complex Boolean expressions, it is common to adopt the Quine-McCluskey procedure [3]. This method, however, cannot be used incrementally. In this paper we deal with the problem as follows: each time only one observed event is accessed, and when the event is added to the learning system, the Boolean expression is simplified immediately. Before introducing the accurate learning algorithm and the generalization algorithm, a simple example is used to demonstrate the principle of the proposed learning algorithm.
Example 1. Suppose that a concept can be expressed by four binary attributes. Let e1 = 0101, e2 = 0000, e3 = 0111, e4 = 1111, e5 = 1101 be observed events that are represented by Boolean products or vectors. The aim of the accurate learning is to simplify the Boolean expression (e1 + e2 + e3 + e4 + e5), where + represents Boolean sum. Suppose that the initial concept R is an empty set. The incremental learning procedure is illustrated as follows.
(1) Learn e1; thus R = e1 = 0101. Denote 0101 by r1, so we have R = r1.
(2) Learn e2; thus R = r1 + e2 = 0101 + 0000. Denote 0000 by r2, so we have R = r1 + r2.
(3) Learn e3; thus R = r1 + r2 + e3 = 0101 + 0000 + 0111 = 01∗1 + 0000. Denote 01∗1 by r3, so we have R = r3 + r2.
(4) Learn e4; thus R = r3 + r2 + e4 = 01∗1 + 0000 + 1111 = 01∗1 + 0000 + ∗111. Denote ∗111 by r4, so we have R = r3 + r2 + r4.
(5) Learn e5; thus R = r3 + r2 + r4 + e5 = 01∗1 + 0000 + ∗111 + 1101 = ∗1∗1 + 0000. Denote ∗1∗1 by r5, so we have R = r5 + r2.
In the above example, ∗ indicates that the corresponding variable does not occur in a Boolean product. The number of ∗ symbols occurring in a vector r is defined as the degree of the vector, denoted by degree(r). Each time a new event e is learned, at least one new Boolean product is generated, such as r1 ~ r5 in the example. Each new product must include the new observed event being learned. It should also include as many already-learned events as possible so as to obtain a simpler concept expression. According to its relation to the original products of R, a new product can be classified into two categories.
(1) If a new product includes original product(s) of R, it is called a covering product. For example, after R learns e3, the new product r3 includes the original product r1 of R; after R learns e5, the new product r5 includes the original products r3 and r4 of R.
(2) If a new product does not include any product of R, it is called a cross product. For example, after R learns e4, the new product r4 includes neither r2 nor r3 of R.
Definition. Let a, b ∈ {0, 1, ∗}. Define general equality (denoted by a ⇔ b) as follows:
0⇔0, 1⇔1, ∗⇔∗, 0⇔∗, 1⇔∗.
Note that general equality is not transitive. Define the general distance (or distance for short) between a and b as:
dist(a, b) = dist(b, a) = \begin{cases} 0, & a \Leftrightarrow b \\ 1, & \text{otherwise} \end{cases}

Let r1 = a1a2…an and r2 = b1b2…bn be two vectors. The general distance between r1 and r2 is defined as:

dist(r_1, r_2) = dist(r_2, r_1) = \sum_{i=1}^{n} dist(a_i, b_i).
The non-expandable DNF, maximally cross vector, intsect(v, r), N(e|R), S(e|R), and other relative concepts for the incremental learning can be found in [5].
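The definitions above can be spelled out in a few lines of code; the following sketch (ours, not the author's code) represents vectors as strings over {0, 1, *}, with the character '*' standing for ∗.

def gen_equal(a, b):
    """General equality a <=> b over {0, 1, *}: equal symbols, or either is '*'."""
    return a == b or a == '*' or b == '*'

def gen_dist(r1, r2):
    """General distance: number of positions that are not generally equal."""
    assert len(r1) == len(r2)
    return sum(0 if gen_equal(a, b) else 1 for a, b in zip(r1, r2))

def degree(r):
    """Degree of a vector: number of '*' symbols it contains."""
    return r.count('*')

# Example 1: dist(01*1, 1101) = 1, and degree(*1*1) = 2.
assert gen_dist('01*1', '1101') == 1
assert degree('*1*1') == 2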
3 Accurate Learning Algorithm
In incremental learning, new products belong to only one of two categories: covering products or maximally cross vectors. To obtain a more concise concept expression, the first choice is to produce a covering product whenever possible. In this case the new product not only learns (or includes) the event but also absorbs at least one original product of R, which makes R more concise. If no covering product can be produced, a maximally cross product needs to be generated so as to keep the new product as simple as possible. Therefore the core problem of incremental learning is to generate the covering products or the maximally cross product while keeping the concept expressed by the DNF non-expandable. Consider the following accurate learning problem.
Given: (1) An initial concept R (expressed with a non-expandable DNF) about a concept C. (2) An observed event set E about concept C with E ∩ R = ∅.
Find: A concept expressed with a non-expandable DNF which is logically equivalent to (R + E).
We adopt the following strategy to solve the learning problem. The learning procedure is divided into a series of learning epochs. Each epoch deals with only one observed event. When learning an event e, the non-expandability of (R + e) must be kept. Let R = r1 + r2 + … + rm be the initial concept expressed with a non-expandable DNF. Without loss of generality, suppose that ∀ 1 ≤ i < j ≤ m, degree(ri) ≥ degree(rj). For a given event e ∈ E, let k = |S(e|R)|. The accurate algorithm of incremental learning is described as follows [5].
for (select an event e ∈ E){              //each learning epoch deals with only one event
    for (i = 1 to m){                     //test for each vector of R if it can produce a covering vector
        if (dist(e, ri) > 1) {continue;}  //cannot produce a covering vector including ri
        if (degree(ri) ≥ k) {continue;}   //cannot produce a covering vector including e
        construct a vector v including both e and ri according to Theorem 3 [5].
        if (v is a covering vector){
            add v to R.
            delete from R all the vectors included in v.
            delete e from E.
            modify m accordingly.
        } else {discard v}
    }
    if (no covering vector is obtained){  //produce the maximally cross vector
        for (i = k to 1, k = k-1){
            construct all vectors with degree i according to Theorem 5 [5]. Denote each of them by v.
            for (each v){
                if (v is a cross vector){
                    add v to R.
                    delete e from E.
                    m = m+1.
                    break;
                }
            }
            if (cross vector is obtained) {break;}
        }
    }
    re-order the vectors of R according to their degrees.
}
It can be shown that, each time after learning an event according to the above learning algorithm, the R obtained is still a non-expandable DNF.
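For readers who want to experiment with these definitions, the following brute-force sketch (ours, not the paper's efficient algorithm, which relies on the distance and degree tests of Theorems 3 and 5 in [5]) checks whether a candidate vector is a valid new product and whether it is a covering or a cross product, by enumerating the concrete events it covers.

from itertools import product

def includes(v, r):
    """True if pattern v covers pattern r (every position of v is '*' or matches r)."""
    return all(a == '*' or a == b for a, b in zip(v, r))

def events_of(v):
    """Enumerate all concrete events (strings over {0,1}) covered by pattern v."""
    slots = [('0', '1') if c == '*' else (c,) for c in v]
    return {''.join(bits) for bits in product(*slots)}

def is_valid_new_product(v, e, R):
    """v may be added while learning event e only if every event it covers
    is e itself or is already covered by some vector of R (logical equivalence)."""
    return all(ev == e or any(includes(r, ev) for r in R) for ev in events_of(v))

def classify_new_product(v, e, R):
    """Return 'covering', 'cross', or None, following the two categories in the text."""
    if not is_valid_new_product(v, e, R):
        return None
    return 'covering' if any(includes(v, r) for r in R) else 'cross'

# Example 1, step (3): with R = {0101, 0000}, learning e = 0111,
# the candidate v = 01*1 absorbs 0101, i.e. it is a covering product.
assert classify_new_product('01*1', '0111', {'0101', '0000'}) == 'covering'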
4 Generalization Algorithm
The aim of inductive learning is to abstract general formulas from the observed events so as to obtain the essential concepts, which are more universal and cover more events. Therefore generalization is a crucial aspect of inductive learning. Now let us deal with the generalization aspect based on the accurate concepts obtained by the learning algorithm described in the last section. Let R+ = r1+ + r2+ + … + rm+ represent the accurate concept obtained from positive events; aik+ represents the kth component of the ith vector of R+. Let R− = r1− + r2− + … + rl− represent the accurate concept obtained from negative events; aik− represents the kth component of the ith vector of R−. The generalization algorithm is described as follows.
(1) If ∃k ∈ {1, 2, …, n} such that ∀i, j ∈ {1, 2, …, l}: aik− ⇔ ajk−, and if ∃h ∈ {1, 2, …, m} such that ∀i ∈ {1, 2, …, l}: ahk+ ⇔ aik−, then perform the generalization operation ahk+ = ∗. Suppose that the generalized vector is denoted by rh+. Delete from R+ those vectors included in rh+.
(2) If ∃k ∈ {1, 2, …, n} such that ∀i, j ∈ {1, 2, …, m}: aik+ ⇔ ajk+, and if ∃h ∈ {1, 2, …, l} such that ∀i ∈ {1, 2, …, m}: ahk− ⇔ aik+, then perform the generalization operation ahk− = ∗. Suppose that the generalized vector is denoted by rh−. Delete from R− those vectors included in rh−.
(3) Repeat step (1) and step (2) until R+ and R− cannot be generalized any more.
It is clear that the generalized R+ and R− do not intersect, that is, R+ ∩ R− = ∅. The concepts learned are complete and consistent over the observed event set E.
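A compact sketch of step (1) only is given below (our illustration; step (2) is symmetric, with the roles of R+ and R− exchanged, and step (3) simply iterates both until neither changes). The two helpers are redefined locally so the sketch is self-contained.

def gen_equal(a, b):          # general equality over {0, 1, *} (see the earlier sketch)
    return a == b or '*' in (a, b)

def includes(v, r):           # pattern v covers pattern r
    return all(a == '*' or a == b for a, b in zip(v, r))

def generalize_positive_once(R_pos, R_neg):
    """One application of generalization step (1): if every negative vector is
    pairwise generally equal at some position k, and a positive vector is
    generally equal to all of them there, replace that component by '*'.
    Returns the updated positive concept and a flag telling whether it changed."""
    n = len(next(iter(R_pos)))
    for k in range(n):
        if not all(gen_equal(u[k], w[k]) for u in R_neg for w in R_neg):
            continue
        for r in sorted(R_pos):
            if r[k] != '*' and all(gen_equal(r[k], u[k]) for u in R_neg):
                g = r[:k] + '*' + r[k + 1:]
                # keep the generalized vector and drop every vector it now includes
                return {v for v in R_pos if not includes(g, v)} | {g}, True
    return R_pos, False

# Example 2: R+ = {*0000, 1*000, 000*0}, R- = {*0001, *0100, 101*0};
# the first pass generalizes 1*000 into **000, which absorbs *0000.
R_pos, R_neg = {'*0000', '1*000', '000*0'}, {'*0001', '*0100', '101*0'}
R_pos, changed = generalize_positive_once(R_pos, R_neg)
assert R_pos == {'**000', '000*0'} and changed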
5 Numerical Example An illustrative example is used to demonstrate the incrementally inductive learning algorithm based on the non-expandable DNF. Example 2. Suppose that people can be characterized using three kinds of features, each of which has discrete attribute values. Height ∈ {Short, Tall}; Hair ∈ {Blond, Black, Red}; Eyes ∈ {Blue, Dark, Gray}. There are nine persons who are divided into two groups according to their features (see Table 1-a). Find concept descriptions using the incremental learning algorithm. Table 1. Positive and negative observed events with different features
(a) Original observed events and (b) coded events

No.  Height  Hair   Eyes  Class    a1  a2a3  a4a5
1    Short   Blond  Blue    +       0   00    00
2    Tall    Red    Blue    +       1   10    00
3    Tall    Blond  Blue    +       1   00    00
4    Short   Blond  Gray    +       0   00    10
5    Tall    Blond  Dark    -       1   00    01
6    Short   Black  Blue    -       0   01    00
7    Tall    Black  Blue    -       1   01    00
8    Tall    Black  Gray    -       1   01    10
9    Short   Blond  Dark    -       0   00    01
First we formulate the problem using the following Boolean variables. Let a1 represent Height: a1 = 0 means Height = Short; a1 = 1 means Height = Tall. Let a2a3 represent Hair: a2a3 = 00 means Hair = Blond; a2a3 = 01 means Hair = Black; a2a3 = 10 means Hair = Red. Let a4a5 represent Eyes: a4a5 = 00 means Eyes = Blue; a4a5 = 01 means Eyes = Dark; a4a5 = 10 means Eyes = Gray. The coded events are listed in Table 1-b.
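The coding step can be written down directly; the following small sketch (ours) maps each observation of Table 1-a to its 5-bit string in Table 1-b.

HEIGHT = {'Short': '0', 'Tall': '1'}
HAIR   = {'Blond': '00', 'Black': '01', 'Red': '10'}
EYES   = {'Blue': '00', 'Dark': '01', 'Gray': '10'}

def encode(height, hair, eyes):
    """Encode one observation as the Boolean vector a1 a2a3 a4a5."""
    return HEIGHT[height] + HAIR[hair] + EYES[eyes]

# First two rows of Table 1: (Short, Blond, Blue) -> 00000 and (Tall, Red, Blue) -> 11000.
assert encode('Short', 'Blond', 'Blue') == '00000'
assert encode('Tall', 'Red', 'Blue') == '11000'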
Learning Let R be an empty set. First consider the positive events listed in Table 1. (1) Learn the event e = 00000. 00000 → R. R = 00000 = r1. (2) Learn the event e = 11000. Q dist(e, r1) = dist(11000, 00000) = 2. ∴Covering vector v including r1 can not be produced after learning e = 11000. 11000 → R. R = 00000+11000 = r1 + r2. (3) Learn the event e = 10000. Q dist(e, r1) = dist(10000, 00000) = 1, dist(e, r2) = dist(10000, 11000) = 1. ∴Construct vectors: v = ∗0000, v = 1∗000. Let us consider v = ∗0000 first. v1 = intsect(v, r1) = intsect(∗0000, 00000) = 00000. |v| - |e| - |v1| = |∗0000| - |10000| - |00000| = 2 – 1 –1 = 0. ∴v = ∗0000 is a covering vector. ∗0000 → R. Q r1 = 00000 ⊂ ∗0000. Delete from R the vector included in ∗0000 and rename the vectors of R, we have: R = ∗0000+11000 = r1 + r2. Similarly, we can verify that v = 1∗000 is also a covering vector. 1∗000 → R. Q r2 = 11000 ⊂ 1∗000. Deleting from R the vector included in 1∗000 and rename the vectors of R, we have: R = ∗0000 + 1∗000 = r1 + r2. (4) Learn the event e = 00010. Qdist(e, r1) = dist(00010, ∗0000) = 1, dist(e, r2) = dist(00010, 1∗000) = 2. ∴Construct a vector: v = ∗00∗0. Let v1 = intsect(v, r1) = intsect(∗00∗0, ∗0000) = ∗0000; v2 = intsect(v, r2) = intsect(∗00∗0, 1∗000) = 10000; v12 = intsect(v1, v2) = intsect(∗0000, 10000) = 10000. |v|- |e| - |v1| - |v2| + |v12| = |∗00∗0| - |00010| - |∗0000| - |10000| + |10000| = 4 – 1 –2 –1 +1 = 1 ≠ 0. ∴v = ∗00∗0 is not a covering vector and should be discarded. Now let us find the maximally cross vector. QN(e|R) = N(00010|R) = {00000}. ∴|S(e|R)| = |S(00010|R)| = |{00000}| = 1.
So the maximal degree of the cross vector equals one. The vector including the observed event e = 00010 and its neighboring event (00000) is of the form v = 000∗0. In a manner similar to that used in (3), we can verify that v = 000∗0 is a cross vector. 000∗0 → R. ∴ R = ∗0000 + 1∗000 + 000∗0. Denote R by R+ = r1+ + r2+ + r3+ = ∗0000 + 1∗000 + 000∗0. Likewise, the negative events listed in Table 1 can also be learned, and we have: R− = r1− + r2− + r3− = ∗0001 + ∗0100 + 101∗0.
Generalization
Let aik+ represent the kth component of the ith vector of R+, and aik− the kth component of the ith vector of R−. Generalization of the above accurate concepts proceeds as follows.
Qa11- ⇔ 1, a21- ⇔ 1, a31- ⇔ 1 and a21+ ⇔ 1.
∴Generalize r2+ as: r2+ = ∗∗000.
Deleting from R+ all the vectors included in (∗∗000) and rename the vectors, we have: R+ = r1+ + r2+ = ∗∗000 + 000∗0.
Qa12- = a22- = a32- = 0 and a22+ = 0.
∴Generalize r2+ as: r2+ = 0∗0∗0.
Deleting from R+ all the vectors included in (0∗0∗0) and rename the vectors, we have: R+ = r1+ + r2+ = ∗∗000 + 0∗0∗0.
Qa14- ⇔ 0, a24- ⇔ 0, a34- ⇔ 0 and a14+ ⇔ 0.
∴Generalize r1+ as: r1+ = ∗∗0∗0.
Deleting from R+ all the vectors included in (∗∗0∗0) and rename the vectors, we have: R+ = r1+ = ∗∗0∗0. The negative concept R- can be generalized in the same way as which is used for positive concept. The generalized negative concept is: R- = r1- + r2- = ∗∗∗∗1 + ∗∗1∗∗. Expressing the above concepts using attribute-valued pairs, we have: R+ = (Hair = Blond or Red) and (Eyes = Blue or Gray). R- = (Hair = Black) or (Eyes = Dark).
6 Conclusions For most learning algorithms, learning and generalization are tightly coupled. The learning procedure is also a generalization procedure. Such a learning scheme is
called generalized learning in this paper. The generalized learning results in concept inconsistency when used in incremental induction. To deal with the problem this paper divides the learning procedure into two separate phases: accurate incremental learning and generalization. The accurate learning procedure is to record the observed events exactly and simplify the learning results in incremental fashion. This phase emphasizes the logical equivalency between learning results and observed events. When learning is completed, generalization starts. Because of the logical equivalency between accurate concept and observed events, the proposed incremental learning method can maintain the consistency over all observed events. It is a conflict-free learning algorithm and thus it is more suitable for incremental induction.
Acknowledgement The research work is supported by the National Natural Science Foundation of China (Grant No. 50575079, 50875100) and the National Key Technology R&D Program of China (Grant No. 2008BAI50B04).
References 1. Maloof, M.A., Michalski, R.S.: Incremental learning with partial instance memory. Artificial Intelligence 154, 95–126 (2004) 2. Kaufman, K.A., Michalski, R.S.: An adjustable description quality measure for pattern discovery using the AQ methodology. J. of Intelligent Information Systems 14, 199–216 (2000) 3. Rosen, K.H.: Discrete mathematics and its applications. McGraw-Hill, Inc., New York (1988) 4. Sanchez, S.N., Triantaphyllou, E., Chen, J., Liao, T.W.: An incremental learning algorithm for constructing Boolean functions from positive and negative examples. Computers and Operations Research 29, 1677–1700 (2002) 5. Sun, R.-L.: Study on the non-expandability of DNF and its application to incremental induction. In: Xiong, C.-H., Liu, H., Huang, Y., Xiong, Y.L. (eds.) ICIRA 2008. LNCS (LNAI), vol. 5314, pp. 699–706. Springer, Heidelberg (2008)
Supervised Isomap for Plant Leaf Image Classification
Minggang Du1, Shanwen Zhang2, and Hong Wang1
1 Shanxi Normal University, Linfen 041004, China
2 Faculty of Science, Zhongyuan University of Technology, Zhengzhou 450007, China
[email protected]
Abstract. Plant classification is very important and necessary with respect to agricultural informatization, ecological protection and automatic plant classification systems. Compared with other methods, such as cell and molecular biology methods, classification based on leaf images is the first choice for plant classification. Plant recognition and classification is a complex and difficult problem, and is very important in Computer-Aided Plant Species Identification technology. Feature extraction is a key step in plant classification. This paper presents a method to extract discriminant features from plant leaf images by using supervised Isomap. Experiments on a leaf image dataset have been performed. The experimental results show that supervised Isomap is very effective and feasible. Keywords: Plant classification, Isomap, Supervised Isomap, Plant leaf image,
K-nearest neighbor.
1 Introduction
With the development of digital image processing and pattern recognition, computer-based plant species identification has become achievable. Since the characteristics can be obtained in the form of numerical images, the efficiency of plant species classification can be improved with image processing and pattern recognition. Computer-Aided Plant Species Identification technology tries to recognize known plant species by the salient features of the leaf. The focus of such a system is to extract stable features of the plants, which are discriminable from other species, and then to classify and recognize the plant species. It contributes significantly to plant digital museum systems and to systematic botany, which is the groundwork for the research and development of plants. Plants are basically classified according to the shapes, colors and structures of their leaves and flowers. However, if we want to recognize a plant based on 2D images, it is difficult to analyze the shapes and structures of flowers since they have complex 3D structures. On the other hand, the colors of leaves are always green; moreover, shade and the variety of changes in atmosphere and season cause the color feature to have low reliability. The plant leaf shape feature is one of the most important features for characterizing a plant, and it is commonly used in plant recognition, matching and registration. In addition, plant leaf recognition is also an important part of machine intelligence that is useful for both decision-making and data processing.
More importantly, plant recognition based on the plant leaf shape feature is also a central problem in fields such as pattern recognition, image technology and computer vision, etc., which have received considerable attention from agricultural scientists in recent years. Plant recognition, leaf image preprocessing, computer vision, handwriting analysis, and medical diagnosis, etc., are some of the common application areas of plant leaf shape recognition. For plant leaf shape recognition, a wide range of methods have been proposed [1,2]: structural methods organizing local features into graphs, trees, or strings, as well as fuzzy methods, statistical methods, transform methods such as the Fourier transform [3] or Hough transforms, and neural network methods [4]. But most of these approaches are confined to specific image types and require that all plant leaf shapes be preprocessed before recognition. Therefore, we decided to recognize various plants by the grey-level leaf image of the plant. The leaf of a plant carries useful information for the classification of various plants, for example, aspect ratio, shape and texture. Some recent work [5-8] has focused on leaf feature extraction for the recognition of plants. Im et al. [5] used a hierarchical polygon approximation representation of leaf shape to recognize the Acer family variety. Wang et al. [6] presented the combination of different shape-based feature sets such as the centroid-contour distance curve, eccentricity, and angle code histogram, and adopted the fuzzy integral for leaf image retrieval. Moreover, Saitoh et al. [7] required two pictures, a frontal flower image and a leaf image, to recognize the plant. According to the features utilized in object recognition, past research can be broadly classified into two categories [9]: contour-based [10,11] and region-based approaches [12-14]. Most plant recognition methods [5-8] used contour-based features. The disadvantage of contour-based features is that correct curvature points are hard to find. Thus, our method adopts a region-based approach. Although there are plenty of region-based features [12-15], projection, as used in optical character recognition and object recognition, is effective because its recognition performance is good and it can be computed in real time [13,14]. This study tries to adopt projection as the shape feature. Dimensionality reduction is an essential step involved in many research areas such as gene expression data analysis, face recognition and leaf image processing. It is a procedure of finding intrinsic low dimensional structures hidden in the high dimensional observations. Dimension reduction can be achieved by keeping only the most important dimensions, i.e., the ones that hold the most useful information for the task at hand, or by projecting the original data into a lower dimensional space that is most expressive for the task. For example, the goal of dimensionality reduction for visualization is to map a set of observations into a (two- or three-dimensional) space that preserves as much information as possible. Three methods have been proposed to tackle the nonlinear dimensionality reduction problem, namely Laplacian Eigenmaps (LE), Isomap and Locally Linear Embedding (LLE) [16-18], which are extensions of MDS with the assumption that only the distance between two local points can be approximated by the Euclidean distance, while the distance between two points far away from each other should be inferred from local distances. Among all those, the most widely used one is LE, which is also a spectral mapping method.
Usually, LE aims to find a low dimensional representation that preserves the local properties of the data lying on a low dimensional manifold. However, LE is a nonlinear dimensionality reduction technique, whose
generalization ability is very weak. That is to say, the projection results in low dimensional space of the test image can not be easily obtained. Learning a kernel matrix for dimensionality reduction is another proposed method which is to learn a kernel matrix whose implicit mapping into feature space “unfolds” the manifold from which the data was sampled. Both of these methods attempt to preserve as well as possible the local neighborhood of each object while trying to obtain highly nonlinear embeddings. The central idea of Local Embeddings is using the locally linear fitting to solve the globally nonlinear problems, which is based on the assumption that data lying on a nonlinear manifold can be viewed as linear in local areas. Although, both Isomap and LLE have been used in visualization and classification, they are unsupervised and not so powerful when confronted with noisy data, which is often the case for real-world problems. In this paper, a supervised Isomap method based on the idea of Isomap is proposed to deal with such situation. Unlike the unsupervised learning scheme of Isomap, supervised Isomap follows the supervised learning scheme, i.e. it uses the class labels of the input data to guide the manifold learning. The rest of the paper is organized as follows. Section 2 introduces Isomap and supervised Isomap. Experimental results and analysis are illustrated in Section 3. A conclusion is given in Section 4.
2 Isomap and Supervised Isomap For data lying on a nonlinear manifold, the “true distance” between two data points is the geodesic distance on the manifold, i.e. the distance along the surface of the manifold, rather than the straight-line Euclidean distance. The main purpose of Isomap is to find the intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points. The approximation of geodesic distance is divided into two cases. In case of neighboring points, Euclidean distance in the input space provides a good approximation to geodesic distance. In case of faraway points, geodesic distance can be approximated by adding up a sequence of “short hops” between neighboring points. Isomap shares some advantages with PCA, LDA and MDS, such as computational efficiency and asymptotic convergence guarantees, but with more flexibility to learn a broad class of nonlinear manifolds. The Isomap algorithm takes as input the distances d ( xi , x j ) between all pairs xi , x j from N data points in the
high-dimensional input space. The algorithm outputs coordinate vectors y_i in a d-dimensional Euclidean space that best represent the intrinsic geometry of the data. The detailed steps of Isomap are as follows:
Step 1. Construct the neighborhood graph: define the graph G over all data points by connecting points x_i and x_j if they are closer than a certain distance ε, or if x_i is one of the K nearest neighbors of x_j. Set the edge lengths equal to d(x_i, x_j).
Step 2. Compute shortest paths: initialize d_G(x_i, x_j) = d(x_i, x_j) if x_i and x_j are linked by an edge, and d_G(x_i, x_j) = +∞ otherwise. Then for each value of k = 1, 2, …, N in turn, replace all entries d_G(x_i, x_j) by min{d_G(x_i, x_j), d_G(x_i, x_k) + d_G(x_k, x_j)}. The matrix of final values D_G = {d_G(x_i, x_j)} will contain the shortest-path distances between all pairs of points in G (this procedure is known as Floyd's algorithm).
Step 3. Construct the d-dimensional embedding: let λ_p be the p-th eigenvalue (in decreasing order) of the matrix τ(D_G). The operator τ is defined by τ(D) = −HSH/2, where S is the matrix of squared distances {S_ij = D_ij^2}, H is the "centering matrix" {H_ij = δ_ij − 1/N}, and δ_ij is the Kronecker delta. Let V_p^i be the i-th component of the p-th eigenvector. Then set the p-th component of the d-dimensional coordinate vector y_i equal to \sqrt{λ_p} V_p^i. This is actually a procedure of applying classical MDS to the matrix of graph distances D_G.
(
)
D( xi , x j ) = xi − x j + λ ⋅ max xi − x j ⋅ (1 − S ( xi , x j )) i, j
(1)
where λ is adjustable parameter, 0 ≤ λ ≤ 1 , S ( X i , X j ) is expressed as follows
If both X i and X j are k nearest neighbors ⎧ 1 ⎪ each other and have the same label; ⎪ 2 ⎪ ⎪ X −Xj If both X i and X j are k nearest neighbors S ( X i , X j ) = ⎨exp(− i ) β each other and have different labels; ⎪ ⎪ ⎪ ⎪⎩ 0 otherwise Based on above distance metric, we proposed a supervised Isomap, the steps as follows: (1) Build a sparse graph with K-nearest neighbors, where above distance metric is adopted. (2) Infer other interpoint distances by finding shortest paths on the graph (Dijkstra's algorithm). (3) Build a low dimensionality embedded space to best preserve the complete distance matrix. The error function is expressed: E = τ ( DG ) − τ ( DY ) , where τ ( D ) is inner product G
distances in graph, τ ( D ) is inner product distances in new coordinate system. Y
Supervised Isomap for Plant Leaf Image Classification
631
Solution: set points Y to top eigenvectors of D . The shortest distance on a graph is easy to compute. It is shown in Fig1 (see www.combinatorica.com). G
Fig. 1. The sketch map of Dijkstra’s algorithm
3 Experiments The purpose of image segmentation is to separate leaf objects from background so that we can extract leaves’ shape features exactly in the later procedures, and the output of image segmentation is a binary image in which the leaf objects are numerically displayed with 1 and the background is with 0. After segmentation we can locate the leaf objects in binary images. Notice that there exist some variance on length and curvature of leafstalks. To keep the precision of shape features extraction these leafstalks should be removed. Therefore, we consider applying opening operation of mathematical morphology to binary images, which is defined as an erosion operation followed by a dilation operation using a same structuring element. By performing opening operation several times, we can successfully remove the leafstalks while preserving the main shape characteristics of leaf objects. There are many kinds of leaf image features such as shape features, color features and texture features, etc, which can be used for leaf images classification. In our experiments, we use supervised Isomap to extract classification features from leaf images which have been pre-processed including segmentation, denoising. In our work, a leaf image sub-database is used in the following experiment, which was collected and built by segmentation in our lab. This database includes 20 species of different plants. Each species includes at least 100 leaves images, 50 of which are used as training samples, the rest as testing set. The leaf images of 20 species are shown in Fig.2. Generally speaking, plant classification depends not only on the class
632
M. Du, S. Zhang, and H. Wang
distribution but also on the classifier to be used. The combining classifiers are commonly used for plant recognition. The 1-NN classifier is adopted for its simplicity.
Fig. 2. The leaf images of 20 kinds of plants
Then k nearest neighbor criterion is adopted to construct the adjacency graph and k is set to 3 to 10 with step 0.1, the adjustable parameter λ is set to 0.1 to 1 with step 0.1. After supervised Isomap has been applied to extract feature, the 1-NN classifier is adopted to predict the labels of the test data. The statistical experimental result is shown in Table 1 comparing with the other classification methods. The algorithm is programming with Matlab 6.5, and run on Pentium 4 with the clock of 2.6 GHz and the RAM of 256M under Microsoft Windows XP environment. Table 1. Classification rate of plant Method Classification rate (%)
Reference [19] 88.89
Reference [20] 89.47
Our proposed 92.25
4 Conclusion The plant leaf shape feature is one of the most important features for characterizing a plant, which is commonly used in plant recognition, matching and registration. In this paper, a plant recognition method is presented. The proposed method produces better classification performance. The experimental result illustrates the effectiveness of this method. The result of the proposed algorithm not only takes a further step to the technique of computer-aided plant species identification, but also remarkably enhances the correct rate of computer-aided plant species identification. At the end of this paper we have a prospect of technique of computer-aided plant species identification. The extensive application in the future of the plant species identification technique will also attracts more algorithms.
Supervised Isomap for Plant Leaf Image Classification
633
Future work should be directed to the following topics. (1) More difference leaf images in variety cases should be collected to generalize the proposed classification method. (2) More features of leaf should be included to improve the classification performance of the proposed method. Further study should be study a supervised robust feature extraction algorithm for plant classification. (3) The features of different leaf classes in the world should be taken into account to improve the classification accuracy. (4) The proposed method should be extended to handle distortion problems such as broken leaves and oriented leaves.
Acknowledgment This work was supported by the grant of the National Natural Science Foundation of China, No. 60805021, the grant from the National Basic Research Program of China (973 Program), No.2007CB311002.
References 1. Loncaric, S.: A Survey of Shape Analysis Techniques. Pattern Recognition 31(8), 983–1001 (1998) 2. Zhang, D.S., Lu, G.J.: Review of Shape Representation and Description Techniques. Pattern Recognition 37(1), 1–19 (2004) 3. Rui, Y., She, A.C., Huang, T.S.: Modified Fourier Descriptors for Shape Representation-a practical Approach. In: First International Workshop on Image Databases and Multi Media Search, Amsterdam, The Netherlands (August 1996) 4. Papadourakis, G.C., Bebis, G., Orphanoudakis, S.: Curvature Scale Space Driven Object Recognition with an Indexing Scheme Based on Artificial Neural Networks. Pattern Recognition 32(7), 1175–1201 (1999) 5. Im, C., Nishida, H., Kunii, T.L.: Recognizing Plant Species by Leaf Shapes—a Case Study of the Acer Family. Proc. Pattern Recognition 2, 1171–1173 (1998) 6. Wang, Z., Chi, Z., Feng, D.: Fuzzy Integral for Leaf Image Retrieval. Proc. Fuzzy Systems 1, 372–377 (2002) 7. Saitoh, T., Kaneko, T.: Automatic Recognition of Wild Flowers. Proc. Pattern Recognition 2, 507–510 (2000) 8. Wu, S.G., Bao, F.S., et al.: Recognition Algorithm for Plant Classification Using Probabilistic Neural Network. arXiv:0707 4289 (2007) 9. Huang, D.S.: Systematic Theory of Neural Networks for Pattern Recognition. Publishing House of Electronic Industry of China, Beijing (1996) 10. He, Y., Kundu, A.: 2-D Shape Classification Using Hidden Markov Model. IEEE Transaction on Pattern Recognition and Machine Intelligence 13(11), 1172–1184 (1991) 11. Nishida, H.: Matching and Recognition of Deformed Closed Contours Based on Structural Transformation Models. Pattern Recognition 31(10), 1557–1571 (1998) 12. Agazzi, O.E., Kuo, S.S.: Hidden Markov Model Based Optical Character Recognition in the Presence of Deterministic Transformations. Pattern Recognition 26(12), 1813–1826 (1993) 13. Landraud, A.M.: Image Restoration and Enhancement of Characters Using Convex Projection Methods. Computer Vision, Graphics and Image Processing 3, 85–92 (1991)
634
M. Du, S. Zhang, and H. Wang
14. Fuh, C.S., Liu, H.B.: Projection for Pattern Recognition. Image and Vision Computing 16, 677–687 (1998) 15. McCollum, A.J., Bowman, C.C., Daniels, P.A., Batchelor, B.G.: A Histogram Modification Unit for Real-time Image Enhancement. Computer Vision, Graphics and Image Processing 12, 337–398 (1988) 16. Dietterich, T.G., Becker, S., Ghahramani, Z.: Advances in Neural Information Processing Systems, vol. 14, pp. 585–591. MIT Press, Cambridge (2002) 17. Roweis, S.T., Saul, L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290, 2323–2326 (2000) 18. Tenenbaum, J.B., Silva, V., de Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290, 2319–2323 (2000) 19. Gu, X., Du, J.-X., Wang, X.-F.: Leaf recognition based on the combination of wavelet transform and gaussian interpolation. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 253–262. Springer, Heidelberg (2005) 20. Wang, X.-F., Du, J.-X., Zhang, G.-J.: Recognition of leaf images based on shape features using a hypersphere classifier. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 87–96. Springer, Heidelberg (2005)
Integration of Genomic and Proteomic Data to Predict Synthetic Genetic Interactions Using Semi-supervised Learning Zhuhong You1,2, Shanwen Zhang1, and Liping Li3 1
Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei Anhui 230031, China 2 Department of Automation, University of Science and Technology of China, Hefei 230027, China 3 The Institute of Soil and Water Conservation of Gansu, Lanzhou 730020, China
[email protected],
[email protected],
[email protected] Abstract. Genetic interaction, in which two mutations have a combined effect not exhibited by either mutation alone, is a powerful and widespread tool for establishing functional linkages between genes. However, little is known about how genes genetic interact to produce phenotypes and the comprehensive identification of genetic interaction in genome-scale by experiment is a laborious and time-consuming work. In this paper, we present a computational method of system biology to analyze synthetic genetic interactions. We firstly constructed a high-quality functional gene network by integrating protein interaction, protein complex and microarray gene expression data together. Then we extracted the network properties such as network centrality degree, clustering coefficient, etc., which reflect the local connectivity and global position of a gene and are supposed to correlate with its functional properties. Finally we find relationships between synthetic genetic interactions and function network properties using the graph-based semi-supervised learning which incorporates labeled and unlabeled data together. Experimental results showed that Semi-supervised method outperformed standard supervised learning algorithms and reached 97.1% accuracy at a maximum. Especially, the semi-supervised method largely outperformed when the number of training samples is very small.
,
Keywords: Genetic Interaction, Functional Gene network, Network Property Semi-supervised learning.
1 Introduction In the post-genomic era, one of the most important steps in modern molecular biology is to understand how gene products or proteins, interact to perform cellular functions. In [1] the proteins in the same pathway tend to share similar synthetic lethal partners, therefore for two genes the number of common genetic interaction partners can be used to measure the probability of physical interaction or sharing a biological function. Therefore, identifying gene pairs which participated in synthetic genetic interaction D.-S. Huang et al. (Eds.): ICIC 2009, LNAI 5755, pp. 635–644, 2009. © Springer-Verlag Berlin Heidelberg 2009
636
Z. You, S. Zhang, and L. Li
(SGI), especially synthetic sick and lethal (SSL), is very important for understanding cellular interaction and determining functional relationships between genes. However, little is known about how genes interact to produce phenotypes. We can experimentally test the phenotype of all double concurrent perturbation to identify whether pairs of gene have the relation of SGI [1]. However, the comprehensive identification of SGI in genome-scale by experiment is a laborious and time-consuming work. Therefore, it is crucial that SGI is predicted using the computational method. In this paper, we proposed a novel in-silico model to predict the synthetic genetic interaction gene pairs. Specifically, we firstly constructed a systematic functional gene network by integrating protein interaction data, protein complex data and microarray gene expression data together and employed the tool of network analysis to address the issue of SGI prediction. To predict synthetic genetic interactions gene pairs, we choose data source which will be helpful in identifying the SGI. For example, protein interaction data or microarray gene expression data can provide a great deal of information about the functional relationship between genes. The protein complex data also contain rich information about functional relationship among involved proteins. PPI data can be modeled as a connectivity graph in which the nodes represent proteins and the edges represent physical interactions. In PPI network, the interactions reflect the direct collaboration of proteins. Gene expression profiles do not describe direct physical interaction but measure the expression levels of certain gene in whole genome scale. Protein complexes means groups of proteins perform a certain cellular task together. In previous work [2], it has proven that genes corresponding to the interacting proteins tend to have similar expression patterns. Following this observation, a natural idea is to integrate PPI data, protein complex data and microarray gene expression data to utilize more information for genetic interaction prediction. In recent years, it has been a growing and hot topic to integrate diverse genomic data to protein-protein network and improve the coverage and accuracy of the network. Previous works which integrate diverse of data source indeed proved that such integrated function network is more reliable than network just based on a single data source [3]. However, most previous works for SGI prediction only consider protein interaction data or microarray gene expression data [4, 5]. In our study, we integrated genomic and proteomic data to build a biological network. Comparing with previous work, one contribution of our method is to build a probabilistic functional network where a probabilistic weight is assigned to each edge based on the projected functional linkage. Network analysis is a quantitative method, which originates from the social science, to study the nodes’ properties related to connectivity and position in the network. It has become increasingly popular to be applied to diverse areas, such as molecular biology, computational biology, etc. Network analysis is a powerful tool that allows one to study the relationships between two nodes in the network, providing information on the relationship that two nodes have similar properties. 
In this work, we extracted the network properties such as network centrality degree, clustering coefficient, betweenness centrality, etc., which reflect the local connectivity and global position of a gene and are supposed to correlate with its functional properties. Then we find relationships between synthetic genetic interactions and function network properties based on the machine learning algorithms.
Integration of Genomic and Proteomic Data to Predict Synthetic Genetic
637
Many machine learning algorithms including support vector machine (SVM), probabilistic decision tree have been developed to predict the SGI of gene pairs. However, most of previously mentioned learning algorithms predict the SGI only from labeled samples. When the number of labeled samples which used as training set is small, these traditional approaches do not work well. Usually obtaining unlabeled samples is much easier than getting labeled samples. Therefore, it is very desirable to develop a predictive learning algorithm to achieve high performance using both labeled samples and unlabeled samples. In this study, we proposed a method for predicting the genetic interactions based on the functional gene network which integrated genomic and proteomic data. The prediction of SGI is performed by using the graphbased semi-supervised learning which incorporates labeled and unlabeled data together and overcomes all the drawbacks mentioned above. To sum up, we presented a computational method of system biology to analyze synthetic genetic interactions. This in-silico model can be used to find relationships between synthetic genetic interactions and function network properties. The proposed semi-supervised method only needs a few number of labeled training data and can approach a high performance. Experimental results showed that Semi-supervised method outperformed standard supervised learning algorithms, i.e., SVM, and reached 97.1% accuracy at a maximum. Especially, the semi-supervised method largely outperformed when the number of training samples is very small.
2 Methodology 2.1 General Procedure In our approach, we first combined protein interaction data, protein complex data and microarray gene expression profiles of saccharomyces cerevisiae to build a high coverage and high precision weighted biological network. Specifically, PPI and protein complex data are used to determine the topology of the network. The weights of the interaction are calculated based on the gene expression profile. The weights are assigned as the confidence score that represents their functional coupling of the biological network in this way. Secondly several graphbased network properties are extracted based on the biological network. Then these network properties which correspond to single gene or gene pair, the experimentally obtained gene pairs which have been confirmed to have or not have the synthetic genetic interaction are inputted to a semi-supervised classifier to prediction some other unknown interaction gene pairs. The overall workflow is illustrated in Fig. 1. The details of above procedure will be described as below. 2.2 Multiple Data Sources Although each kind of genome or proteome data source can provide information on gene function insights, each data always contain severe noise and only reflect certain information. Therefore integrating different kinds of data source to build a network has been a hot topic and is expected to provide more and reliant information. The high-throughput data such as microarray gene expression data, protein interaction data and protein complexes data are becoming essential resource for understanding the dynamics of the cell. In this study, these three data source were integrated into a functional biology network.
638
Z. You, S. Zhang, and L. Li
Fig. 1. Schematic diagram for predicting the Synthetic Genetic Interactions by integrating of Genomic and Proteomic Data. The Semi-supervised Learning is used to utilize both labeled samples and unlabeled samples.
2.2.1 Protein–Protein Interaction Data Proteins play crucial roles, such as transferring signals, controlling the function of enzymes, in cellular machinery. One protein interacts with other proteins, DNA or other molecules. The Protein–protein interaction (PPI) data is believed to contain valuable insight for the inner working of cells; therefore it may provide useful clues for the function of individual proteins or signaling pathways. In our work, all the PPI data of yeast are from the BioGrid database [6]. The BioGrid contains many protein and genetic interaction curated from other literatures or databse. To study the protein interaction from a global perspective, we generated the PPI by extracting physical interactions reported in BioGrid, altogether with 12990 unique interactions among 4478 proteins. 2.2.2 Protein Complexes In the protein complexes, although it is unclear which proteins are in physical contact, the protein complex data contain rich information about functional relationship among involved proteins. For simplicity, we assigned binary interactions between any two proteins participating in a complex. Thus in general, if there are n proteins in a protein complex, we add n(n-1)/2 binary interactions. We got the protein complex data from [7, 8]. Altogether about 49000 interactions are added to the protein interaction network. 2.2.3 Microarray Gene Expression Data Protein-protein interaction data produced by current state of art methods contain many false positives, which can influence the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use microarray gene expression data as another data source [9, 10]. Gene expression profile contains information on regulation between regulators and their targets. For this study we downloaded gene expression microarrays from the
Integration of Genomic and Proteomic Data to Predict Synthetic Genetic
639
Gene Expression Omnibus (GEO) database at the NCBI. This database contains data from over 300 large-scale microarray and SAGE experiments. However, when selecting datasets a tradeoff between reliability and computational costs has to be made. In this work we use 76 experimental conditions for all the genes in yeast related to cell regulation and migration. For each experiment, if there was a missing point, we substituted its gene expression ratio to the reference state with the average ratio of all the genes under that specific experimental condition. 2.3 Construction of Probabilistic Functional Network As mentioned above each node pair linkage in a FGN carries a confidence score to represent the functional coupling between the two biological entities that it represents. The successful applications of integrating heterogeneous data have proved that integrating PPI and microarray gene expression data is superior to methods which just use PPI data [11]. For the protein-protein interaction, the interaction reflects the directly physical interaction of proteins to implement certain function. However, the PPI data is flooded with many false positive or false negative interaction mistakes. Therefore the prediction of synthetic genetic interactions just based on PPI data is often not very accurate and can not approach good performance. We suppose that genes with similar expression profiles are involved in the control of the same transcriptional factors and thus functional associated. The microarray gene expression data can provide useful information Following the previous work [9, 10], we calculated the weighted score of two genes which indicate the function similarity based on different kinds of highthroughput data. We consider a microarray gene expression data set as X={x1,x2,…,xM}, where xi={xi1,xi2,…,xiM} is a N dimensional vector representing gene i with N conditions. We firstly used the clustering algorithm to group the genes into S different clusters C1,C2,…,CS. Specifically, the Pearson Correlation Coefficient (PCC) is adopted to measure the similarity or dissimilarity of the expression pattern of two genes. Let us consider genes xi and xj and the PCC can be calculated as
∑ (x N
PPC ( x , x ) = i
j
ik
− x )( x − x ) i
jk
j
k =1
∑ (x − x ) N
ik
k =1
i
∑ (x − x )
(1)
N
2
jk
2
j
k =1
where xik and xjk are the expression values of the kth condition of the ith and jth genes respectively. xi , x j are the mean values of the ith and jth genes respectively. PCC is
always in the range [-1,1]. A positive value of PCC means that the two genes are coexpressed and negative value denotes that they are the opposite expressed gene pairs. As proposed in [12], we used the Pearson correlation coefficient as a measure of similarity or dissimilarity to cluster genes with similar or different expression patterns, which means genes with coexpressed pattern are assigned to same cluster and vice versa. At first, all genes of the microarray data are considered in a single cluster and the cluster is partitioned into two disjoint clusters. Partitioning is done in such a way that xi and xj which have the most negative value of PCC will be assigned in two
different clusters. Genes having larger value of PCC with xi compared with xj are assigned in the cluster that contains xi . Otherwise, they are placed in the cluster that contain xj. In the next iteration, a cluster having a pair of gene (xi, xj) with the most negative PCC value will be selected and the above partitioning procession is repeated until there is no negative PCC value present between any pair of genes inside any cluster. This kind of cluster method ensures that all pairs of genes in any cluster are only positively correlated. In [12] it has been proven that this method is able to obtain clusters with higher biological significance than that obtained by some other algorithms such as Fuzzy K-means, GK and PAM clustering methods. Based on the above obtained gene expression profile which has been partitioned into a couple of clusters, we calculated the weight scores of the interactions between two proteins as below formula: W (x , x ) = L * ( X − C i
j
1
i
2 Xi
+ X −C j
2 Xj
) + (1 − L ) * C i − C 1
X
2 Xj
(2)
where x_i and x_j represent genes i and j with N conditions, respectively; C_{X_i} and C_{X_j} denote the centroids of the clusters in which genes x_i and x_j are located, respectively; and \|\cdot\| denotes the Euclidean distance. In equation (2), the constant L_1 is a trade-off parameter used to tune the ratio of the first and second terms in the weight function. We chose L_1 = 0.3 because we supposed that the distance between the centroids of two clusters is more significant than the distance of each gene from its centroid. The outcome of the integration method is a weighted undirected graph, and graph-based properties will be extracted from it and then used to predict synthetic genetic interactions.
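As an illustration of the clustering and scoring steps described above, the Python sketch below computes the PCC of equation (1), performs the divisive splitting on the most negatively correlated pair, and evaluates the weight score of equation (2). It is a minimal sketch with illustrative names and toy data; the authors' actual implementation and stopping criteria follow [12] and may differ.

```python
import numpy as np

def pcc(x, y):
    # Pearson correlation coefficient between two expression profiles (Eq. 1)
    xc, yc = x - x.mean(), y - y.mean()
    den = np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)) + 1e-12
    return float(np.dot(xc, yc) / den)

def divisive_cluster(X):
    # Repeatedly split any cluster that still contains a negatively correlated
    # gene pair, using the most negative PCC pair as the two seeds.
    clusters = [list(range(X.shape[0]))]
    while True:
        split_done = False
        for c in clusters:
            pairs = [(pcc(X[i], X[j]), i, j) for i in c for j in c if i < j]
            if not pairs:
                continue
            worst, i, j = min(pairs)
            if worst >= 0:
                continue  # all pairs in this cluster are already positively correlated
            a = [g for g in c if g != j and pcc(X[g], X[i]) >= pcc(X[g], X[j])]
            b = [g for g in c if g not in a]
            clusters.remove(c)
            clusters.extend([a, b])
            split_done = True
            break
        if not split_done:
            return clusters

def weight_score(xi, xj, ci, cj, L1=0.3):
    # Weight score of Eq. (2); ci, cj are the centroids of the genes' clusters.
    within = np.sum((xi - ci) ** 2) + np.sum((xj - cj) ** 2)
    between = np.sum((ci - cj) ** 2)
    return L1 * within + (1 - L1) * between

X = np.random.default_rng(0).normal(size=(6, 10))   # toy data: 6 genes x 10 conditions
print(divisive_cluster(X))
```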
2.4 The Properties of Probabilistic Functional Network for Predicting SGI (Synthetic Genetic Interaction)
Graph theory enables us to analyze structural properties of the network and link them to other information, such as function. The construction of a validated probabilistic functional network allows an in-depth analysis of the local connectivity and global position of a gene, which are supposed to correlate with its functional properties. Here the network properties of the PFN, such as local connectivity and global position, are examined with the aim of discovering the relationship between the network properties of a pair of genes and the existence of a synthetic genetic interaction between them. We used a graph-based semi-supervised classifier to model this correlation. Several network properties of a gene or gene pair are input to the semi-supervised classifier. In this study, we integrated microarray gene expression, protein-protein interaction and protein complex data of Saccharomyces cerevisiae in the form of an undirected weighted graph G(V, E), which formally consists of a set of vertices V and edges E between them; an edge connects a pair of vertices. Most previous works only consider the topological aspects of the protein interaction network and ignore the underlying functional relationships which can be reflected by the microarray gene expression data [5]. Therefore, in this work we extend the concepts of network
properties from unweighted networks to weighted networks. The network properties used in this study are degree centrality, clustering coefficient, betweenness centrality, closeness centrality and eigenvector centrality. All of the above are properties of single nodes in a network. The first two are sensitive only to the local network structure around the node, while the others are sensitive to the global network topology. Furthermore, the last two properties depend not only on shortest paths through the network, but on other paths as well. In addition to these single-node properties, we also computed a set of two-node properties, for example the inverse of the shortest distance d(p, q) between proteins p and q and the number of mutual neighbors of proteins p and q.
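For illustration, the node and pair properties listed above can be computed on a small weighted graph with standard NetworkX calls, as in the sketch below; the toy graph, node names and the conversion of confidence weights into distances for the path-based measures are our own assumptions, not part of the original method.

```python
import networkx as nx

# Toy probabilistic functional network; edge weights are confidence scores.
G = nx.Graph()
G.add_weighted_edges_from([("g1", "g2", 0.9), ("g2", "g3", 0.4),
                           ("g1", "g3", 0.7), ("g3", "g4", 0.2)])

# Path-based measures expect distances, so convert confidence into a distance.
for u, v, d in G.edges(data=True):
    d["dist"] = 1.0 - d["weight"]

# Single-node properties (weighted variants where NetworkX supports them).
degree      = dict(G.degree(weight="weight"))
clustering  = nx.clustering(G, weight="weight")
betweenness = nx.betweenness_centrality(G, weight="dist")
closeness   = nx.closeness_centrality(G, distance="dist")
eigenvector = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)

# Two-node properties for a candidate gene pair (p, q).
p, q = "g1", "g4"
d_pq = nx.shortest_path_length(G, p, q, weight="dist")
inv_shortest_distance = 1.0 / d_pq if d_pq > 0 else 0.0
mutual_neighbors = len(set(G[p]) & set(G[q]))

features = [degree[p], degree[q], clustering[p], clustering[q],
            betweenness[p], betweenness[q], closeness[p], closeness[q],
            eigenvector[p], eigenvector[q],
            inv_shortest_distance, mutual_neighbors]
print(features)
```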
2.5 Graph-Based Semi-supervised Classifier
Generally speaking, predicting synthetic genetic interactions from a biological network is challenging for two reasons. First, only a small number of synthetic genetic interaction or non-interaction gene pairs have been experimentally determined, which means that the available label knowledge is usually very sparse; it is therefore difficult to obtain a sufficient training data set for a supervised algorithm. Second, a biological network integrated from heterogeneous data is inherently very noisy, which makes methods that exploit the network properties prone to noise propagation. In this study, we address these two issues by applying graph-based semi-supervised learning to synthetic genetic interaction prediction from the functional biological network, based on the analysis of its network properties. Semi-supervised learning (SSL) is halfway between supervised and unsupervised learning. In this study, the prediction of synthetic genetic interactions is modeled as a classification problem. The candidate gene pairs are considered as points with a set of features, which correspond to the properties of the two genes in the biological network. Consider the data set X = (X_l, X_u), consisting of labeled inputs X_l = {x_1, x_2, ..., x_l} and unlabeled inputs X_u = {x_{l+1}, x_{l+2}, ..., x_n}, along with a small portion of corresponding labels {y_1, y_2, ..., y_l}. In our study the labels are binary: y_i ∈ {1, 0}. Consider a connected weighted graph G = (V, E) with vertices V corresponding to the above n data points, with nodes L = {1, 2, ..., l} corresponding to the labeled points with labels y_1, y_2, ..., y_l and U = {l+1, l+2, ..., n} corresponding to the unlabeled points. For semi-supervised learning, the objective is to infer the labels {y_{l+1}, y_{l+2}, ..., y_n} of the unlabeled data {x_{l+1}, x_{l+2}, ..., x_n}, typically with l ≪ n.

\phi(x, y) \begin{cases} > 0 & \text{if } (x, y) \text{ is inside } C \\ = 0 & \text{if } (x, y) \in C \\ < 0 & \text{if } (x, y) \text{ is outside } C \end{cases}   (3)
Accordingly, the global term in (2) can be rewritten so as to evaluate the level set function φ on the domain Ω:
E^G(c_1, c_2, \phi) = \int_\Omega |u_0(x, y) - c_1|^2 H(\phi(x, y))\,dx\,dy + \int_\Omega |u_0(x, y) - c_2|^2 (1 - H(\phi(x, y)))\,dx\,dy   (4)
where H(z) is the Heaviside function. Usually, when (4) reaches a steady state, or becomes approximately zero, the evolving curve C (the zero level set of φ) will separate the object from the background. However, for images with intensity inhomogeneity, the finally obtained curve C can hardly divide the image into object and background regions even after a long iteration time. The reason is that the global term assumes that the image intensity is piecewise constant, like the CV model. Thus, the averages c1 and c2 act only as global information and cannot represent the inhomogeneous intensities of the object and background regions in images with intensity inhomogeneity. So, to achieve good performance in segmenting images with intensity inhomogeneity, local image information needs to be included.
3.2 Local Term
Here, the local term is introduced in (5), which uses local statistical information as the key to improve the segmentation capability of our model on images with intensity inhomogeneity.
E^L(d_1, d_2, C) = \int_{inside(C)} |g_k * u_0(x, y) - u_0(x, y) - d_1|^2\,dx\,dy + \int_{outside(C)} |g_k * u_0(x, y) - u_0(x, y) - d_2|^2\,dx\,dy,   (5)
where g_k is an averaging convolution operator with a k × k window, and d_1 and d_2 are the intensity averages of the difference image (g_k * u_0(x, y) - u_0(x, y)) inside C and outside C, respectively. The assumption behind the proposed local term is that smaller image regions are more likely to have approximately homogeneous intensity and that the intensity of the object is statistically different from the background. It is meaningful to statistically analyze each pixel with respect to its local neighborhood. The simplest and fastest statistical information function is the average of the local intensity distribution, the rationale being that if the object pixels are brighter than the background, they should also be brighter than the local average. By subtracting the original image from the averaging convolution image, the contrast between foreground and background intensities can be significantly increased. Note that the difference image (g_k * u_0(x, y) - u_0(x, y)), despite its higher contrast, is still not easy to segment due to weak object boundaries and complicated topological structure; it still needs to undergo a level set evolution to obtain a better segmentation result. In the same manner as the global term, the local term (5) can also be reformulated in terms of the level set function φ(x, y) as follows:
E^L(d_1, d_2, \phi) = \int_\Omega |g_k * u_0(x, y) - u_0(x, y) - d_1|^2 H(\phi(x, y))\,dx\,dy + \int_\Omega |g_k * u_0(x, y) - u_0(x, y) - d_2|^2 (1 - H(\phi(x, y)))\,dx\,dy.   (6)
3.3 Regularization Term
In order to control the smoothness of the zero level set and further avoid the occurrence of small, isolated regions in the final segmentation, we add to the regularization term a length penalty term defined in terms of the length of the evolving curve C. Here, by replacing the curve C with the level set function φ(x, y), the length penalty term can be reformulated as:
L(\phi = 0) = \int_\Omega |\nabla H(\phi(x, y))|\,dx\,dy = \int_\Omega \delta(\phi(x, y))\,|\nabla\phi(x, y)|\,dx\,dy.   (7)
The use of the length penalty term implies that the evolving curve C which minimizes the overall energy functional should be as short as possible. It imposes a penalty on the length of the curve that separates the two phases of the image, i.e., foreground and background, across which the energy functional makes a transition from one of its values, c1 (d1), to the other, c2 (d2). Re-initialization has been extensively used as a numerical remedy for maintaining stable curve evolution and ensuring desirable results in level set methods. Unfortunately, there is an obvious disagreement between the theory of the level set method and its implementation, since re-initialization has the undesirable side effect of moving the zero level set away from its original location. Moreover, it is quite complicated and time-consuming, and when and how to apply it is still a serious problem [11]. In this paper, we do not use a re-initialization step to keep the level set function as a signed distance function, but instead add to the regularization term a penalty term [12] as follows:
P(\phi) = \int_\Omega \frac{1}{2}\left(|\nabla\phi(x, y)| - 1\right)^2 dx\,dy,   (8)
which can force the level set function to be close to a signed distance function. Actually, this penalty term is more like a metric which characterizes how close a function φ is to a signed distance function. The metric plays a key role in the elimination of re-initialization in our method. So, in the proposed model, the regularization term E^R should be composed of two terms:
E^R(\phi) = \mu \cdot L(\phi = 0) + P(\phi) = \mu \int_\Omega \delta(\phi(x, y))\,|\nabla\phi(x, y)|\,dx\,dy + \int_\Omega \frac{1}{2}\left(|\nabla\phi(x, y)| - 1\right)^2 dx\,dy,   (9)
where μ is the parameter which can control the penalization effect of length term: if μ is small, then smaller objects will be detected; if μ is larger, then larger objects are detected. 3.4 Level Set Formulation
In the level set formulation, the curve C is represented by the zero level set of a Lipschitz function φ . The overall energy functional can be described as follows:
E^{HLS}(c_1, c_2, d_1, d_2, \phi) = \int_\Omega \left(\alpha\,|u_0(x, y) - c_1|^2 + \beta\,|g_k * u_0(x, y) - u_0(x, y) - d_1|^2\right) H_\varepsilon(\phi(x, y))\,dx\,dy + \int_\Omega \left(\alpha\,|u_0(x, y) - c_2|^2 + \beta\,|g_k * u_0(x, y) - u_0(x, y) - d_2|^2\right)(1 - H_\varepsilon(\phi(x, y)))\,dx\,dy + \mu \int_\Omega \delta_\varepsilon(\phi(x, y))\,|\nabla\phi(x, y)|\,dx\,dy + \int_\Omega \frac{1}{2}\left(|\nabla\phi(x, y)| - 1\right)^2 dx\,dy,   (10)
where α and β are two positive parameters which govern the tradeoff between the global term and the local term. g k is the averaging convolution operator with k × k size window for local information detection. H ε ( z ) is the noncompactly supported, smooth and strictly monotone approximation of H ( z ) and δ ε ( z ) is the regularized
approximation of δ ( z ) which are respectively computed by:
H_\varepsilon(z) = \frac{1}{2}\left(1 + \frac{2}{\pi}\arctan\frac{z}{\varepsilon}\right), \qquad \delta_\varepsilon(z) = \frac{1}{\pi}\cdot\frac{\varepsilon}{\varepsilon^2 + z^2}.   (11)
By calculus of variations, it can be shown that the constant functions c1(φ ) , c 2(φ ) , d1(φ ) and d 2(φ ) that minimize E HLS (c1 , c2 , d1 , d 2 , φ ) for a fixed function φ are given by:
c_1(\phi) = \frac{\int_\Omega u_0(x, y)\,H_\varepsilon(\phi(x, y))\,dx\,dy}{\int_\Omega H_\varepsilon(\phi(x, y))\,dx\,dy}, \qquad c_2(\phi) = \frac{\int_\Omega u_0(x, y)\,(1 - H_\varepsilon(\phi(x, y)))\,dx\,dy}{\int_\Omega (1 - H_\varepsilon(\phi(x, y)))\,dx\,dy}.   (12)
d_1(\phi) = \frac{\int_\Omega (g_k * u_0(x, y) - u_0(x, y))\,H_\varepsilon(\phi(x, y))\,dx\,dy}{\int_\Omega H_\varepsilon(\phi(x, y))\,dx\,dy}.   (13)
d_2(\phi) = \frac{\int_\Omega (g_k * u_0(x, y) - u_0(x, y))\,(1 - H_\varepsilon(\phi(x, y)))\,dx\,dy}{\int_\Omega (1 - H_\varepsilon(\phi(x, y)))\,dx\,dy}.   (14)
Keeping c1 , c 2 , d1 and d 2 fixed, and minimizing the overall energy function E HLS in (10) with respect to φ , we can deduce the associated Euler–Lagrange equation for φ . The minimization of (10) can be done by introducing an artificial time variable t ≥ 0 , and moving φ in the steepest descent direction to a steady state with the initial condition defined in (16) and boundary condition defined in (17):
\frac{\partial\phi}{\partial t} = \delta_\varepsilon(\phi)\left[-\left(\alpha\,(u_0 - c_1)^2 + \beta\,(g_k * u_0(x, y) - u_0(x, y) - d_1)^2\right) + \left(\alpha\,(u_0 - c_2)^2 + \beta\,(g_k * u_0(x, y) - u_0(x, y) - d_2)^2\right)\right] + \left[\mu\,\delta_\varepsilon(\phi)\,\mathrm{div}\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right) + \left(\nabla^2\phi - \mathrm{div}\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right)\right)\right],   (15)
\phi(0, x, y) = \phi_0(x, y) \ \text{in } \Omega,   (16)
\frac{\partial\phi}{\partial \vec{n}} = 0 \ \text{on } \partial\Omega,   (17)
where \vec{n} denotes the exterior normal to the boundary ∂Ω. In fact, α and β should be set according to the intensity inhomogeneity present in the images. For images without intensity inhomogeneity, the value of α is suggested to be near or equal to that of β. If images present distinct intensity inhomogeneity, the value of α should be selected less than that of β so as to restrict the effect of the intensity inhomogeneity. It should be noticed that the case α = 0 may be acceptable for segmenting some images; however, it is not suggested, since the global term can sometimes restrict noise and maintain the boundary details. In our experiments, we usually fixed β = 1 and then dynamically adjusted the value of α according to the intensity property of the images. The partial differential equation in the continuous domain defined in (15) can be solved by a finite difference method in a numerical scheme. All the spatial partial derivatives are approximated by central differences and the temporal partial derivative by a forward difference as follows:
\frac{\phi_{i,j}^{n+1} - \phi_{i,j}^{n}}{\Delta t} = \delta_\varepsilon(\phi_{i,j}^{n})\left\{-\left(\alpha\,(u_{i,j} - c_1(\phi^n))^2 + \beta\,(g_k * u_{i,j} - u_{i,j} - d_1(\phi^n))^2\right) + \left(\alpha\,(u_{i,j} - c_2(\phi^n))^2 + \beta\,(g_k * u_{i,j} - u_{i,j} - d_2(\phi^n))^2\right)\right\} + \left[\mu\,\delta_\varepsilon(\phi_{i,j}^{n})\,\kappa + \left(\phi_{i+1,j}^{n} + \phi_{i-1,j}^{n} + \phi_{i,j+1}^{n} + \phi_{i,j-1}^{n} - 4\phi_{i,j}^{n} - \kappa\right)\right],   (18)
where Δt is the time-step and h is the grid spacing. The curvature κ can be discretized using a second-order central differencing scheme:
\kappa = \mathrm{div}\!\left(\frac{\nabla\phi}{|\nabla\phi|}\right) = \frac{\phi_{xx}\phi_y^2 - 2\phi_{xy}\phi_x\phi_y + \phi_{yy}\phi_x^2}{(\phi_x^2 + \phi_y^2)^{3/2}}.   (19)
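The discrete update (18)-(19) maps directly onto array operations. The following NumPy sketch of a single iteration is purely illustrative (the paper's implementation was in Matlab 7); the function names are ours, and the boundary handling is simplified (periodic, via np.roll) rather than the Neumann condition of (17).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def heaviside(z, eps=1.0):
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(z / eps))      # Eq. (11)

def dirac(z, eps=1.0):
    return (eps / np.pi) / (eps ** 2 + z ** 2)                   # Eq. (11)

def curvature(phi):
    # Second-order central-difference curvature, Eq. (19)
    fy, fx = np.gradient(phi)
    fxy = np.gradient(fx, axis=0)
    fxx = np.gradient(fx, axis=1)
    fyy = np.gradient(fy, axis=0)
    den = (fx ** 2 + fy ** 2) ** 1.5 + 1e-10
    return (fxx * fy ** 2 - 2 * fxy * fx * fy + fyy * fx ** 2) / den

def hls_step(phi, u0, alpha=0.1, beta=1.0, mu=0.01 * 255 ** 2,
             dt=0.1, eps=1.0, k=15):
    diff = uniform_filter(u0, size=k) - u0        # g_k * u0 - u0
    H = heaviside(phi, eps)
    # Region averages, Eqs. (12)-(14)
    c1 = (u0 * H).sum() / (H.sum() + 1e-10)
    c2 = (u0 * (1 - H)).sum() / ((1 - H).sum() + 1e-10)
    d1 = (diff * H).sum() / (H.sum() + 1e-10)
    d2 = (diff * (1 - H)).sum() / ((1 - H).sum() + 1e-10)
    kappa = curvature(phi)
    lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
           np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4 * phi)   # discrete Laplacian
    data = (-(alpha * (u0 - c1) ** 2 + beta * (diff - d1) ** 2)
            + (alpha * (u0 - c2) ** 2 + beta * (diff - d2) ** 2))
    dphi = dirac(phi, eps) * data + mu * dirac(phi, eps) * kappa + (lap - kappa)  # Eq. (18)
    return phi + dt * dphi

# usage sketch: u0 is the image as a float array, phi is an initial signed distance map
# for _ in range(25): phi = hls_step(phi, u0)
```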
4 Experimental Results
In this section, we present the experimental results of our hybrid level set (HLS) model on some synthetic and real images. The proposed model was implemented in Matlab 7 on a computer with an Intel Core 2 Duo 2.2 GHz CPU, 2 GB RAM and the Windows XP operating system. The processing times reported later in this section start after choosing the initial contour. We used the same parameters for all the experiments in this section: time-step Δt = 0.1, grid spacing h = 1, ε = 1 (for H_ε(z) and δ_ε(z)), μ = 0.01 × 255², and window size k = 15 for the averaging convolution operator. As described in Section 3.4, we fixed β = 1 and dynamically adjusted the value of
α. In our experiments, α took two corresponding values: 0.1 and 1 for images with and without intensity inhomogeneity, respectively. We first considered the simplest case: segmentation of images with intensity homogeneity. Fig. 1 shows the segmentation of a noisy synthetic image using the proposed HLS model. The robustness of our model is due to the use of the global term and the length penalty term. To show the high efficiency of our model, the initial contour was placed at the bottom left corner (as shown in Fig. 1(b)), unlike other methods in which the initial contour is usually placed in the center of the image or touching the target objects. It can be seen from Fig. 1(c) that the evolving curve successfully reaches the true boundary of the flower shape after 25 iterations.
Fig. 1. Noisy synthetic image segmentation using the proposed HLS model. (a) Original noisy image. (b) Initial contour. (c) Final segmentation result at the 25th iteration. Size= 98 × 90 , α = β = 1 . Processing time = 2.6s.
In the next two experiments (Fig. 2 and Fig. 3), we illustrate the ability of the proposed HLS model to segment images with intensity inhomogeneity. Fig. 2 shows the segmentation results for the well-known synthetic image using both the CV model and the proposed HLS model. It can be seen from Fig. 2(a) that the intensity decreases gradually from left to right. The initial contour was placed at the intersection of the high-intensity and low-intensity areas, as shown in Fig. 2(b). The failed segmentation of this synthetic image using the CV model is illustrated in Fig. 2(c), which shows that the evolving curve of the CV model cannot pass through the intersection of the high-intensity and low-intensity areas even after 100 iterations. Fig. 2(d) shows the segmentation result of the HLS model, where the evolving curve successfully reaches the boundary of the object after 15 iterations. Here, the value of the controlling parameter α = 0.1 is less than that of β = 1 to maintain a restriction effect on the intensity inhomogeneity. Fig. 3 shows the segmentation results for two real blood vessel images with intensity inhomogeneity using both the CV model and the proposed HLS model. It can be seen from the third column of Fig. 3 that the CV model failed to segment both images with intensity inhomogeneity, as we anticipated; the reason is the inherent disadvantage of not using local information. The fourth column of Fig. 3 shows that the proposed HLS model can successfully segment the images in the first column. It should be noticed that the number of iterations decreases considerably if the initial contours are placed on part of the objects.
Fig. 2. The comparisons of the CV model and the proposed HLS model on segmenting a synthetic image with the intensity inhomogeneity. (a) Original image. (b) Initial contour. (c) Final segmentation result using the CV model. (d) Final segmentation result using the proposed HLS model. Size= 88 × 85 , α = 0.1 , β = 1 . Processing time=5.3s (CV), 0.9s (HLS).
Fig. 3. The comparisons of the CV model and the proposed HLS model on segmenting two real blood vessel images with the intensity inhomogeneity. The first column: Original images. The second column: Initial contours. The third column: Final segmentation results using the CV model. The fourth column: Final segmentation results using the proposed HLS model. Size=103 × 131 ,110 × 110 . α = 0.1 , β = 1 .
5 Conclusions In this paper, we propose a new hybrid level set (HLS) model for segmenting the images with intensity inhomogeneity. The total energy functional for the proposed model consists of global term, local term and regularization term. By incorporating the local image information into the proposed model, the images with intensity inhomogeneity can be efficiently segmented. To avoid the time-consuming re-initialization step, a penalizing energy is introduced into the regularization term. Finally, experiments on some synthetic and real images have demonstrated the desired segmentation performance of our proposed model for the images with or without intensity inhomogeneity.
Acknowledgments. This work was supported by the grants of the National Science Foundation of China, Nos. 60873012 & 60805021, the grant from the National Basic Research Program of China (973 Program), No.2007CB311002, the grant from the National High Technology Research and Development Program of China (863 Program), No. 2007AA01Z167, the grant of the Graduate Students’ Scientific Innovative Project Foundation of CAS (X.F. Wang).
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. Int. J. Comput. Vision 1(4), 321–331 (1987)
2. Caselles, V., Catte, F., Coll, T., Dibos, F.: A Geometric Model for Active Contours in Image Processing. Numer. Math. 66(1), 1–31 (1993)
3. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. Int. J. Comput. Vision 22(1), 61–79 (1997)
4. Malladi, R., Sethian, J.A., Vemuri, B.C.: Shape Modeling with Front Propagation: A Level Set Approach. IEEE Trans. Patt. Anal. Mach. Intell. 17(2), 158–175 (1995)
5. Chan, T.F., Vese, L.A.: Active Contours without Edges. IEEE Trans. Image Processing 10(2), 266–277 (2001)
6. Tsai, A.Y., Willsky, A.S.: Curve Evolution Implementation of the Mumford-Shah Functional for Image Segmentation, Denoising, Interpolation, and Magnification. IEEE Trans. Image Processing 10(8), 1169–1186 (2001)
7. Paragios, N., Deriche, R.: Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation. Int. J. Comput. Vision 46(4), 223–247 (2002)
8. Gao, S., Bui, T.D.: Image Segmentation and Selective Smoothing by Using Mumford–Shah Model. IEEE Trans. Image Processing 14(10), 1537–1549 (2005)
9. Vovk, U., Pernuš, F., Likar, B.: A Review of Methods for Correction of Intensity Inhomogeneity in MRI. IEEE Trans. Med. Imag. 26(3), 405–421 (2007)
10. Hou, Z.J.: A Review on MR Image Intensity Inhomogeneity Correction. International Journal of Biomedical Imaging 2006, 1–11 (2006)
11. Gomes, J., Faugeras, O.: Reconciling Distance Functions and Level Sets. J. Visual Communic. and Imag. Representation 11(2), 209–222 (2000)
12. Li, C.M., Xu, C.Y., Gui, C.F., Fox, M.D.: Level Set Evolution without Re-initialization: A New Variational Formulation. In: Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 430–436 (2005)
Analysis of Enterprise Workflow Solutions
Cui-e Chen1, Shulin Wang2, Ying Chen1, Yang Meng1, and Hua Ma3
1 Changsha Social Work College, Changsha, Hunan, 410004, China
2 School of Computer and Communication, Hunan University, Changsha, Hunan, 410082, China
3 Hunan International Economics University, Changsha, Hunan, 410082, China
[email protected]
Abstract. Since the 1990s, workflow technology has been widely applied in various industries, such as office automation (OA), manufacturing, telecommunications services, banking, securities, insurance and other financial services, and research institutes and education services, to improve business process automation and integration capabilities. In this paper, based on workflow theory, the authors propose a policy-based workflow approach to support dynamic workflow patterns. By extending the functions of Shark, a workflow engine component, OAShark, which supports a recovery/rollback function, was implemented, and the related classes were programmed. The technology was applied to the OA system of an enterprise project. The realization of the enterprise workflow solution greatly improved the efficiency of office automation. Keywords: Workflow, dynamic workflow, workflow pattern, office automation, Shark.
1 Introduction
Enterprise solutions, such as an online shopping process (start -> browse -> options -> payment -> collection -> end), printing business processes (receiving and preparing documents -> proof -> make-up and ready to print -> RIP and printing -> post-processing -> archive -> printing business -> supply chain management), and the document-issuing business process in office automation (OA) (Fig. 1) (draft -> review -> check -> sign -> issue -> review circulation -> proof -> document number maintenance -> archive -> form red-head document), pursue "a shorter life cycle, less turn-around time, lower cost, and higher cost-effectiveness". "Better, faster and cheaper" hardware is of course important, but it is not the ultimate solution for enterprises. There is a growing concern about the operation of business processes, from the initial demand of the users to the service of the final product. Producing more with the same resources improves the efficiency of the entire process and is the key to obtaining more profit. To obtain such efficiency, a reasonable and logical business process path must be guaranteed while the degree of automation and process integration is increased as much as possible; this is what workflow is all about.
Fig. 1. Issued business processes in OA
2 Basic Concepts of Workflow
2.1 The Origin of Workflow
The concept of workflow was proposed in 1968 by Fritz Nordsieck. From the 1970s to the 1980s, workflow technology was under research, and only a few early workflow systems were successful. After the 1990s, as the technical conditions matured, research and development of workflow systems entered a new boom. In August 1993, the industry organization for the standardization of workflow technology, the Workflow Management Coalition (WfMC), was established. In 1994, the WfMC issued a workflow reference model for cooperation between workflow management systems and formulated a series of industry standards. Since then, workflow technology has been widely applied in various industries, such as office automation, manufacturing, telecommunications services, the consumer goods industry, banking, securities, insurance and other financial services, logistics services, property services, large and medium-sized import and export trade companies, government utilities, research institutes and education services, and especially in large multinational enterprises.
2.2 The Definition of Workflow
The Workflow Management Coalition gives the following definition of workflow [1]: "Workflow is concerned with the automation of procedures where documents, information or tasks are passed between participants according to a defined set of rules to achieve, or contribute to, an overall business goal. Whilst workflow may be manually organised, in practice most workflow are normally organised within the context of an IT system to provide computerised support for the procedural automation and it is to this area that the work of the Coalition is directed."
In short, a workflow is a series of interrelated operational activities or tasks that are carried out automatically. We can regard the entire business as a river, with the workflow being the water flowing through it.
The working efficiency of many companies is very low because they use manual transmission of paper forms for level-by-level signature approval, and statistical reports cannot be produced. By using workflow software, a user can simply fill out the relevant forms on his or her computer. The person at the next level of approval will receive the relevant information and may modify, track, manage, query, count, or print it if necessary. Companies can thus obtain greater efficiency, realize knowledge management, and enhance their core competitiveness.
2.3 Workflow Management System
The definition of a workflow management system given by the Workflow Management Coalition is [1]: "A system that completely defines, manages and executes workflows through the execution of software whose order of execution is driven by a computer representation of the workflow logic." A workflow management system is one which provides procedural automation of a business process by management of the sequence of work activities and the invocation of appropriate human and/or IT resources associated with the various activity steps. A workflow management system is not a business system, but a software supporting environment provided for business systems. It can benefit businesses in the following ways: improving and optimizing business processes to improve operational efficiency, achieving better control of business processes to improve customer service quality, improving the flexibility of business processes, etc.
2.4 Workflow Engine
The workflow engine [2-3] is the core of a workflow management system. It is responsible for the execution of the various process instances. It determines the transmission route and content of information according to the different roles, divisions of labour and conditions, which plays a decisive role in the application system. In recent years, a large number of workflow engines have emerged, such as Enhydra Shark, Gray Fox Willow, OSWorkflow, OFBiz, Xi'an synergy, etc.
3 Workflow Patterns
3.1 Basic Control Workflow Patterns
In workflow modelling, a workflow pattern [4] is usually considered as a prototype. A prototype pattern of the workflow can be used to test the capacity of the workflow server, that is, how well the workflow meets the business needs. Based on the principles of Petri nets, 21 kinds of workflow patterns [5-6] have been studied for workflow modelling and analysis. There are mainly five basic control workflow patterns: Sequence, Parallel Split, Synchronization, Exclusive Choice and Simple Merge, shown in Fig. 2.
Fig. 2. Basic control workflow patterns: (1) Sequence, (2) Parallel Split, (3) Synchronization, (4) Exclusive Choice, (5) Simple Merge
3.2 Dynamic Workflow Patterns
Because of the uncertainty and variability of modern business processes, a workflow management system needs flexibility and dynamic processing power, and this need has led to the study of dynamic workflow [7-8]. If a workflow management system supports modifying a running workflow process instance, we call it a dynamic workflow system; the workflow which has been modified is called a dynamic workflow. In order to support dynamic changes and flexible control in workflow, we need to add operations (insert, delete, modify, jump back, roll back, etc.) on the work process procedure and the organization model, together with rules for the related operations. The dynamic characteristics of workflow are illustrated in Fig. 3, which shows an "approval" process. When execution reaches "trial", the rollback list is ("draft"). When execution reaches "review 1", the rollback list is empty. When execution reaches "review 3", the rollback list is ("review 1"). When execution reaches "review 7", the rollback list is ("review 3", "review 4", "review 1"). By the same token, when execution reaches "review 8", the rollback list is ("review 5", "audit 6", "review 2"). When execution reaches "general review", the rollback list is ("draft", "sent"). From this figure we can see that the approval process includes the five basic control workflow patterns plus a dynamic pattern for rollback.
Fig. 3. “Approval” flowchart
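The rollback lists described above can be maintained with very simple bookkeeping. The Python sketch below is an illustrative, framework-independent outline of that idea; the class and method names are hypothetical and are not Shark or OAShark APIs.

```python
class RollbackTracker:
    """Keeps, for each active branch, the list of activities it may roll back to."""

    def __init__(self):
        self.rollback = {}                 # branch id -> completed activities, newest first

    def split(self, new_branches):
        # A parallel split starts each new branch with an empty rollback list,
        # matching the empty list observed at "review 1" in Fig. 3.
        for b in new_branches:
            self.rollback[b] = []

    def complete(self, branch, activity):
        # Completing an activity makes it a rollback target for the rest of the branch.
        self.rollback.setdefault(branch, []).insert(0, activity)

    def join(self, branches, merged):
        # A synchronizing merge concatenates the histories of the joined branches.
        merged_list = []
        for b in branches:
            merged_list.extend(self.rollback.pop(b, []))
        self.rollback[merged] = merged_list

    def targets(self, branch):
        return list(self.rollback.get(branch, []))

# Minimal usage resembling the start of the "approval" flow in Fig. 3:
t = RollbackTracker()
t.complete("main", "draft")
print(t.targets("main"))          # ['draft']  -- the rollback list seen at "trial"
```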
4 Shark-Based Dynamic Workflow Strategy Set
4.1 Dynamic Workflow Based on Strategy Set
In order to support dynamic workflow patterns, a workflow method based on static and dynamic strategy sets is proposed, as shown in Fig. 4.
Fig. 4. The workflow method based on static and dynamic strategy set
4.2 Analysis of Instances An instance based on static strategy set is shown in Fig. 5. An instance based on the dynamic strategy set is shown in Fig. 6.
Fig. 5. An instance based on static strategy set
Fig. 6. An instance based on the dynamic strategy set
5 The Application Approach of Enterprise Workflow
5.1 System Architecture
The OA system is designed and developed to improve office efficiency and accuracy. The system's main functions include document processing, file management, a personal secretary subsystem, public information management, administrative office management, system management, etc. Since there are a large number of business flow relations between the modules, the application of workflow is inevitable. The OA system was built with Shark-based workflow technology, so it obtains better system flexibility and dynamic characteristics. The architecture of the system is shown in Fig. 7. As shown in Fig. 7, the OA system modules (document flow, administrative logistics, personal information and information portal) are built on the "workflow engine" component. The workflow engine realizes the core functions such as process control, resource scheduling and task distribution, and meets the practical needs of the system by extending the open-source Shark workflow engine. The "workflow engine" unifies and coordinates five basic components: unified user management, system management, Web content management, the e-mail system and Domino/Notes. These components consist of a number of general-purpose modules for website and office automation systems. On the basis of these six components, we can quickly develop office automation systems and portal systems according to different circumstances.
Fig. 7. OA system architecture
5.2 The Component Model of the Shark Workflow Engine
A recovery/rollback process manager was added to the workflow framework. It manages recovery/rollback processes, which includes accessing the rollback list, executing the rollback, and maintaining workflow transaction and data consistency. In the process of recovery/rollback, the transactional properties of the business data should be guaranteed, so the operations on the business data and the recovery/rollback operations of the processes are placed in the same transaction in order to ensure the overall transactional properties of the
recovery/rollback. For this reason a recovery/rollback data manager was added. Through the extended functions of Shark, a workflow engine component, OAShark, which supports the recovery/rollback function, has been implemented; its component model is shown in Fig. 8.
Fig. 8. Workflow engine component model supporting the retrieval / rollback
5.3 ActivityBackward Class
From the business system's point of view, the workflow system only needs to provide a method for rolling back activities in order to implement the access control of recovery and rollback. A class specially dealing with activity rollback, ActivityBackward.java, has been created; it is called by the core classes of the engine to achieve the recovery/rollback functions. The class diagram is shown in Fig. 9.
Fig. 9. ActivityBackward class diagram
6 Conclusions
In this paper, workflow patterns, dynamic workflow and the Shark open-source workflow engine have been studied on the basis of current workflow technology, in order to achieve flexibility and dynamic characteristics in the workflow. A set of strategy-set-based workflow approaches has been realized. Through the extended functions of Shark, solutions were found to the problem of how recovery/rollback can be supported dynamically. The technology was applied to the OA system of an enterprise project. After the project was developed, it was put into operation and achieved the desired goal; the system runs stably.
References
[1] Fan, Y.H., Luo, H.B., Lin, H.P., et al.: Workflow management technology infrastructure, pp. 28–78. Tsinghua University Publisher, Springer Publisher, Beijing (2001)
[2] Wang, H.J., Fan, L.Q., Yang, L.F., Feng, Y.D., Cai, Y.: Research on Shark-based workflow process solution. Mechanical Engineering & Automation 2, 28–31 (2005)
[3] Wan, D.S., Yu, C.H.: Design and implementation of distributed workflow system based on shark system. Microelectronics and Computer 22(2), 96–99 (2005)
[4] Zhang, L., Yao, S.Z.: Research on workflow patterns based on Petrinets. Computer Integrated Manufacturing Systems (CIMS) 12(1), 54–58 (2006)
[5] Workflow Patterns, http://is.tm.tue.nl/research/patterns/patterns.htm
[6] Swaminathan, V.: Amazing Race analogy. The Architecture Journal 4(7), 8–15 (2006), http://msdn.microsoft.com/en-us/architecture/bb410935.aspx
[7] Mulle, J.A., Bohm, K., Roper, N., Sunder, T.: Building conference proceedings requires adaptable workflow and content management. In: Proceedings of the 32nd international conference on very large data bases, pp. 1129–1139 (2006)
[8] Chen, C.E.: Research and application of retrieval / rollback dynamic workflow pattern based on shark system. Application of Computer Systems 10, 85–87 (2007)
Cooperative Spectrum Sensing Using Enhanced Dempster-Shafer Theory of Evidence in Cognitive Radio Nguyen-Thanh Nhan, Xuan Thuc Kieu, and Insoo Koo School of Electrical Engineering, University of Ulsan 680-749 San 29, Muger 2-dong, Ulsan, Republic of Korea
[email protected] http://mcsl.ulsan.ac.kr
Abstract. Cooperation is an appropriate method for improving the performance of spectrum sensing when a cognitive radio system operates in a deep shadowing and fading environment. The Dempster-Shafer theory of evidence for fusion has reasoning logic similar to that of humans. Thus, an enhanced scheme for cooperative spectrum sensing based on an enhanced Dempster-Shafer theory of evidence is proposed in this paper. Our scheme utilizes the signal-to-noise ratios to evaluate the degree of reliability of each local spectrum sensing terminal in a distributed cognitive radio network and adjusts the sensing data accordingly before fusion by the Dempster-Shafer theory of evidence. Simulation results show that a significant improvement of the cooperative spectrum sensing gain is achieved. Keywords: Cognitive radio, Cooperative spectrum sensing, Data fusion, Dempster-Shafer theory of evidence.
1
Introduction
Recently, Cognitive Radio (CR) has been proposed as a promising technology to improve spectrum utilization. In order to exploit licensed bands, which are underutilized either temporally or spatially, Cognitive Radio users (CUs), the unlicensed users, must have the capability of sensing the spectrum environment. CUs are allowed to use the licensed bands opportunistically when such bands are not occupied, and must abandon the current band and seek a new idle spectrum when the frequency band is accessed by the licensed user (LU). Therefore the key role in Cognitive Radio is played by spectrum sensing. Generally, spectrum sensing can be achieved by a single CU. The detection techniques often used in local sensing are energy detection, matched filtering and cyclostationary feature detection [1], [2]. In [1], it is shown that the received signal strength can be seriously weakened at a particular geographical location due to multi-path fading and the shadowing effect. Under these circumstances, a single sensing
Corresponding author.
node can hardly distinguish between an idle band and a deeply faded one. In order to overcome this problem, cooperative spectrum sensing has been considered [5], [8], [10]. It uses a distributed detection model to cope with the severe degradation of received signal strength at some locations of the network. In distributed detection, a variety of cooperative spectrum sensing schemes have been studied. Reference [8] proposes an optimal half-voting rule; however, this rule only works well when the thresholds of all CUs are identical, which is usually not the case in practice. In [10], the optimal data fusion rule, first presented in [9] by Z. Chair and P. K. Varshney, combined with a counting rule used for estimating the probability of detection and the probability of false alarm as in [11], [12], has been proposed to obtain an adaptive cooperative spectrum sensing scheme. Although it gives good performance, it needs time to adapt when the system changes. In [5], a method to improve the cooperative gain by combining both the decisions and the credibility of all sensing nodes via the Dempster-Shafer theory of evidence (D-S theory) is proposed. Nevertheless, under the practical conditions of distributed cooperative spectrum sensing in the CR context, where there is a lot of conflicting data, the D-S theory shows a problem which has been mentioned in many studies [6], [7], [13]. In this paper, we propose an enhanced cooperative spectrum sensing scheme based on D-S theory and reliability source evaluation. Compared with Bayesian theory, the D-S theory is closer to the natural decision-making logic of humans. Its capability to assign uncertainty or ignorance to propositions is a powerful tool for dealing with a large range of problems that otherwise would seem intractable, especially in the case of the fast-changing RF environment of a CR network. The combination rule of evidence theory supplies a method to merge all of the mass functions, which represent the credibility of each CU's sensing decision. Therefore, the mass function sent from a sensing node plays a very important role in the final decision, so we propose a new, more appropriate method to obtain it at the source. Furthermore, the credibility of each sensing node can be made more accurate if the nodes are measured on the same scale; in other words, the credibility should be adjusted based on the measurement relationship between nodes. From this aspect, we propose a Reliability Source Evaluation that utilizes all of the available information to obtain an adjustment weight that quantifies the measurement correlation, in order to enhance the performance of the fusion scheme. The rest of this paper is organized as follows. In Section 2, we describe the system model briefly. Section 3 illustrates the evidence theory and its problem. In Section 4, we develop our proposal for an enhanced cooperative spectrum sensing scheme. In Section 5, the simulation results are shown and analyzed. Finally, we conclude this paper in the last section.
2
System Description
For LU detection, we consider the cooperative spectrum sensing scheme shown in Figure 1.
Fig. 1. Cooperative spectrum sensing scheme
Each CU conducts its local sensing process and calculates and estimates some necessary information that will be transmitted to the AP, where the final decision will be made. Generally, the whole process of the scheme includes two steps:
– Local spectrum sensing at the CU
– Data fusion at the AP
2.1 Local Spectrum Sensing
Each Cognitive Radio user (CU) conducts a spectrum sensing process, called local spectrum sensing in the distributed scenario, for detecting the Licensed User's (LU) signal. Local spectrum sensing is essentially a binary hypothesis testing problem:
H_0: x(t) = n(t), \qquad H_1: x(t) = h(t)\,s(t) + n(t)   (1)
where H_0 and H_1 correspond to the hypotheses of absence and presence of the LU's signal, respectively, x(t) represents the received data at the CU, h(t) denotes the amplitude gain of the channel, s(t) is the signal transmitted from the primary user and n(t) is additive white Gaussian noise. Additionally, channels corresponding to different CUs are assumed to be independent, and all CUs and LUs share a common spectrum allocation.
2.2 Energy Detection
Among the various methods for spectrum sensing, energy detection is quite simple, fast and able to detect the primary signal even if its features are unknown. Here we consider energy detection for local spectrum sensing. Figure 2 shows the block diagram of the energy detection scheme.
[Fig. 2: Block diagram of the energy detection scheme: x(t) -> BPF -> ADC -> squaring and summation over N samples -> test statistic x_{Ei} compared with a decision threshold.]
The local test statistic is the collected energy
x_E = \sum_{j=1}^{N} x_j^2   (2)
When the number of samples N is relatively large (e.g., N > 200), x_E can be well approximated as a Gaussian random variable under both hypotheses H_1 and H_0, with means μ_1, μ_0 and variances σ_1^2, σ_0^2, respectively [3], such that
\mu_0 = N, \ \sigma_0^2 = 2N; \qquad \mu_1 = N(\gamma + 1), \ \sigma_1^2 = 2N(2\gamma + 1)   (3)
where γ is the signal-to-noise ratio (SNR) of the primary signal at the CU.
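The test statistic and the Gaussian approximation of (2)-(3) are easy to reproduce numerically. The short Python sketch below is illustrative only; the sample count, SNR value and the toy Gaussian primary signal are our own assumptions.

```python
import numpy as np

def energy_statistic(x):
    # x_E: sum of squared samples collected during the sensing time (Eq. 2)
    return np.sum(x ** 2)

def gaussian_params(N, snr_linear):
    # Approximate mean/variance of x_E under H0 and H1 for large N (Eq. 3),
    # assuming unit-variance noise.
    mu0, var0 = N, 2.0 * N
    mu1, var1 = N * (snr_linear + 1.0), 2.0 * N * (2.0 * snr_linear + 1.0)
    return (mu0, var0), (mu1, var1)

rng = np.random.default_rng(0)
N = 300                                   # number of samples (N > 200, so Eq. 3 applies)
gamma = 10.0 ** (-12.0 / 10.0)            # -12 dB SNR in linear scale

noise = rng.standard_normal(N)                      # H0: noise only
signal = np.sqrt(gamma) * rng.standard_normal(N)    # toy Gaussian primary signal
print(energy_statistic(noise), energy_statistic(signal + noise))
print(gaussian_params(N, gamma))
```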
3
The Dempster-Shafer Theory of Evidence
The D-S theory was first introduced by Dempster in the 1960s and was later extended by Shafer. In D-S theory, a representation of ignorance is provided by assigning a non-zero mass to hypotheses. The mass function m, also called the basic probability assignment, is defined for every hypothesis A such that the mass value m(A) belongs to the interval [0, 1] and satisfies the following conditions:
m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1   (4)
where Θ is the frame of discernment, a fixed set of q mutually exclusive and exhaustive elements. Assigning a non-zero mass to a compound hypothesis A ∪ B means that we have the option not to decide between A and B but to leave the case in the A ∪ B class. In D-S theory, two functions named belief (Bel) and plausibility (Pls) are defined to characterize the uncertainty and the support of certain hypotheses. Bel measures the minimum or necessary support whereas Pls reflects the maximum or potential support for that hypothesis. These two
measures, derived from mass values, are respectively defined as maps from the set of hypotheses to the interval [0, 1]:
Bel(A) = \sum_{B \subseteq A} m(B)   (5)
Pls(A) = \sum_{B \cap A \neq \emptyset} m(B)   (6)
Mass functions from different information sources, m_j (j = 1, ..., d), are combined with Dempster's rule, also called the orthogonal sum. The result is a new mass function, m(A_k) = (m_1 ⊕ m_2 ⊕ ... ⊕ m_d)(A_k), which incorporates the joint information provided by the sources as follows:
m(A_k) = (1 - K)^{-1} \sum_{A_1 \cap A_2 \cap \dots \cap A_d = A_k}\ \prod_{1 \le j \le d} m_j(A_j)   (7)
K = \sum_{A_1 \cap A_2 \cap \dots \cap A_d = \emptyset}\ \prod_{1 \le j \le d} m_j(A_j)   (8)
where K is often interpreted as a measure of conflict between the different sources and is introduced as a normalization factor. The larger K is, the more the sources are conflicting and the less sense their combination makes. The factor K indicates the amount of evidential conflict: if K = 0, there is complete compatibility, and if 0 < K < 1, there is partial compatibility. The orthogonal sum does not exist when K = 1; in this case the sources are totally contradictory, and it is no longer possible to combine them. When sources are highly conflicting, the normalization used in the Dempster combination rule can be misleading, since it artificially increases the masses of the compromise hypotheses. Such a case is actually possible in our setting, where environmental conditions differ greatly between sources. It is therefore necessary to enhance the combination rule, as suggested in [7], [13].
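As an illustration of equations (5)-(8) and of the conflict problem just described, the following Python sketch (our own, not part of the paper) combines two strongly disagreeing mass functions over the frame {H0, H1} and reports the conflict factor K together with the resulting belief and plausibility.

```python
from itertools import product

FRAME = frozenset({"H0", "H1"})

def bel(m, A):
    # Belief: total mass of the subsets of A (Eq. 5)
    return sum(v for B, v in m.items() if B <= A)

def pls(m, A):
    # Plausibility: total mass of the sets intersecting A (Eq. 6)
    return sum(v for B, v in m.items() if B & A)

def combine(m1, m2):
    # Dempster's rule for two sources (Eqs. 7-8)
    K = sum(v1 * v2 for (B1, v1), (B2, v2) in product(m1.items(), m2.items())
            if not (B1 & B2))
    m = {}
    for (B1, v1), (B2, v2) in product(m1.items(), m2.items()):
        inter = B1 & B2
        if inter:
            m[inter] = m.get(inter, 0.0) + v1 * v2 / (1.0 - K)
    return m, K

# Two conflicting sensing reports over {H0, H1, Omega}:
mA = {frozenset({"H1"}): 0.85, frozenset({"H0"}): 0.05, FRAME: 0.10}
mB = {frozenset({"H0"}): 0.80, frozenset({"H1"}): 0.10, FRAME: 0.10}
m, K = combine(mA, mB)
print("conflict K =", round(K, 3))
print("combined:", {tuple(sorted(k)): round(v, 3) for k, v in m.items()})
print("Bel(H1) =", round(bel(m, frozenset({"H1"})), 3),
      "Pls(H1) =", round(pls(m, frozenset({"H1"})), 3))
```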
4
Cooperative Spectrum Sensing Ultilizing Enhanced D-S Theory of Evidence at AP
For LU detection, we consider the enhanced cooperative spectrum sensing scheme shown in Figure 3. After the local spectrum sensing process, each CU calculates and estimates some necessary information and transmits the data to the data fusion center, where the final decision is made.
4.1 Basic Probability Assignment Estimation
In order to apply the D-S theory of evidence to make the final decision, the frame of discernment A is defined as {H_1, H_0, Ω}, where Ω denotes that either hypothesis is
Fig. 3. Enhanced cooperative spectrum sensing scheme
true. After the sensing time, each CU estimates its self-assessed decision credibility, which is equivalent to a basic probability assignment (BPA), for the two hypotheses. We propose a more appropriate BPA function in the form of the cumulative distribution function instead of the probability density function [5], as follows:
m_i(H_0) = \int_{x_{Ei}}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma_{0i}}\, e^{-\frac{(x - \mu_{0i})^2}{2\sigma_{0i}^2}}\,dx   (9)
m_i(H_1) = \int_{-\infty}^{x_{Ei}} \frac{1}{\sqrt{2\pi}\,\sigma_{1i}}\, e^{-\frac{(x - \mu_{1i})^2}{2\sigma_{1i}^2}}\,dx   (10)
where m_i(H_0) and m_i(H_1) are the BPAs of hypotheses H_0 and H_1 of the i-th CU, respectively. Using these functions, the BPAs of hypotheses H_0 and H_1 are unique for each test statistic value x_{Ei} and vary in such a way that the larger x_{Ei} is, the larger m_i(H_1) and the smaller m_i(H_0) are, and vice versa.
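Equations (9) and (10) are simply Gaussian tail probabilities, so they can be evaluated with the normal CDF. The SciPy sketch below is illustrative; the parameter values (N, SNR) and the sample test-statistic values are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm

def bpa(x_e, N, snr_linear):
    # Self-assessed masses of Eqs. (9)-(10), with the Gaussian parameters of Eq. (3).
    mu0, sigma0 = N, np.sqrt(2.0 * N)
    mu1, sigma1 = N * (snr_linear + 1.0), np.sqrt(2.0 * N * (2.0 * snr_linear + 1.0))
    m_h0 = norm.sf(x_e, loc=mu0, scale=sigma0)    # upper-tail probability under H0
    m_h1 = norm.cdf(x_e, loc=mu1, scale=sigma1)   # lower-tail probability under H1
    return m_h0, m_h1

N = 300
gamma = 10 ** (-12 / 10)                  # -12 dB SNR, linear scale
for x_e in (0.95 * N, float(N), N * (1 + gamma)):
    m0, m1 = bpa(x_e, N, gamma)
    print(f"x_E = {x_e:7.1f}:  m(H0) = {m0:.3f}, m(H1) = {m1:.3f}")
```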
4.2
Reliability Source Evaluation
Instead of combining all the CUs' self-assessed BPAs, which means treating all nodes equally, the BPA of each CU should be adjusted by the relative reliability relationship between nodes for more accuracy. A lower weight is allocated the less reliable the CU's sensing data is, and vice versa. Subsequently, we propose an enhancing stage at the fusion center called Reliability Source Evaluation. The reliability of a CU's sensing information should be evaluated considering its relative relationship with the other CUs in the system. From this aspect, we consider again all of the available information to obtain an adjustment weight with which we can quantify the measurement relationship between nodes. Since the BPA of both hypotheses of each CU depends on the value of the test statistic and on its mean and variance, and since the larger the distance between the mean values of the two hypotheses is, the more reliability should be assigned to the source, we utilize the distance between the two mean values, D_i:
D_i = \mu_{1i} - \mu_{0i}   (11)
D_i is subsequently calculated by substituting (3) into (11):
D_i = N(\gamma_i + 1) - N = N\gamma_i   (12)
where γ_i is the SNR of the primary signal at the i-th CU. Finally, the weight w_i of the i-th CU is obtained by normalizing the distance:
w_i = \frac{D_i}{\max_i(D_i)} = \frac{N\gamma_i}{\max_i(N\gamma_i)} = \frac{\gamma_i}{\max_i(\gamma_i)}   (13)
4.3 Data Fusion
The BPAs of the i-th CU for both hypotheses, m_i(H_0) and m_i(H_1), are adjusted with the corresponding weight w_i as follows:
m_i'(H_0) = w_i\,m_i(H_0), \quad m_i'(H_1) = w_i\,m_i(H_1), \quad m_i'(\Omega) = 1 - m_i'(H_1) - m_i'(H_0)   (14)
As mentioned in Section 3, the combination of the adjusted BPAs can be obtained by D-S theory as follows:
m(H_0) = m_1' \oplus m_2' \oplus \dots \oplus m_n'(H_0) = \frac{\sum_{A_1 \cap A_2 \cap \dots \cap A_n = H_0} \prod_{i=1}^{n} m_i'(A_i)}{1 - K}   (15)
m(H_1) = m_1' \oplus m_2' \oplus \dots \oplus m_n'(H_1) = \frac{\sum_{A_1 \cap A_2 \cap \dots \cap A_n = H_1} \prod_{i=1}^{n} m_i'(A_i)}{1 - K}   (16)
where K = \sum_{A_1 \cap A_2 \cap \dots \cap A_n = \emptyset} \prod_{i=1}^{n} m_i'(A_i).
In conclusion, the final decision is made based on the following simple strategy:
m(H_0) \ \underset{H_0}{\overset{H_1}{\lessgtr}} \ m(H_1).   (17)
5
Simulation Results and Analysis
For our simulation, we assume the LU signal is DTV signal as in [4], and the probability of presence and absence LU signal are both 0.5. The bandwidth of LU signal is 6 MHz, and AWGN channel is considered. Five sensing nodes are spread in the network to perform local sensing. The local sensing time is 50 μs. Our scheme of data fusion has been tested with many case of CUs’ SNR. In figure 4, our algorithm have been experienced under condition that all the forth first CUs have same AWGN channel with SNR = -12dB, and the fifth CU’s channel condition is changed from -22dB to -6dB, which is reasonable for spectrum sensing problem in CR context. Under such condition, probability of 1
0.9
0.8
Pd Or rule
0.7
Pd And rule Pd D-S theory fusion (Reference [5])
Pd & Pf
0.6
Pd Optimal fusion rule (Reference [9]) Pd Enhance D-S rule (Our Scheme)
0.5
Pf Or rule Pf And rule Pf D-S theory fusion (Reference [5])
0.4
Pf Optimal fusion rule (Reference [9]) Pf Enhance D-S rule (Our Scheme)
0.3
0.2
0.1
0 -22
-20
-18
-16
-14 SNR[dB] of CU5
-12
-10
-8
-6
Fig. 4. Probability of detection and probability of false alarm comparison between proposed scheme and other combination rule under condition that SNR of CU1 - CU4 are -12dB, SNR of CU5 is changed from -22dB to -6dB
696
N.-T. Nhan, X.T. Kieu, and I. Koo 1 0.9
Pd & Pf
0.8 0.7
Pd Or rule
0.6
Pd D-S theory fusion (Reference [5])
Pd And rule Pd Optimal fusion rule (Reference [9]) Pd Enhance D-S rule (Our Scheme)
0.5
Pf Or rule Pf And rule
0.4
Pf D-S theory fusion (Reference [5]) Pf Optimal fusion rule (Reference [9])
0.3
Pf Enhance D-S rule (Our Scheme) 0.2 0.1 0 -22
-20
-18
-16
-14 SNR[dB] of CU5
-12
-10
-8
-6
Fig. 5. Probability of detection and probability of false alarm comparison between proposed scheme and other combination rule under condition that SNR of CU1 - CU4 are -21dB, -17dB, -13dB, -9dB, respectively and SNR of CU5 is changed from -22dB to -6dB
detection Pd of “And” rule is always largest but its probability of false alarm Pf also is similar, and vice versal for “Or” rule, i.e. both Pd and Pf are smallest, which means that both rules indicate bad performance. For our proposed scheme, both Pd and Pf always gives a remarkable improvement, compared to “D-S theory fusion” in [5] and even with “Optimal fusion rule” in [9]. In order to evaluate our proposed scheme with a more practical situation, where distributed CUs endure difference channel condition, we consider the condition that the received SNR of forth first CUs (CU1 - CU4 ) are -21dB, -17dB, -13dB, -9dB, respectively and SNR of CU5 is changed from -22dB to -6dB. Under such condition, the figure 5 also shows that our proposed scheme is outperform the other rules. Numerous other situations have also been tested and our scheme have given similar best result.
6
Conclusions
In this paper, we have proposed an enhanced cooperative spectrum sensing scheme based on the combination between the D-S theory, using a more appropriate BPA function and a novel reliability source evaluation by utilizing SNR of LU signal. Simulation results have shown that this proposed scheme can achieve a better gain of combination for cooperative spectrum sensing than other rules like “And”, “Or”, and even better than “Optimal fusion rule”.
Cooperative Spectrum Sensing Using Enhanced Dempster-Shafer Theory
697
Acknowledgement This work was supported in part by the Ministry of Commerce, Industry, and Energy and in part by Ulsan Metropolitan City through the Network-based Automation Research Center at the University of Ulsan.
References 1. Cabric, D., Mishra, S.M., Brodersen, R.W.: Implementation Issues in Spectrum Sensing for Cognitive Radios. In: Conf. Record of the 38th Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 772–776 (2004) 2. Akyildiz, I.F., Lee, W., Vuran, M.C., Mohanty, S.: NeXt Generation/Dynamic Spectrum Access/Cognitive Radio Wireless Networks: a survey. Computer Network 50, 2127–2159 (2006) 3. Urkowitz, H.: Energy Detection of Unknown Deterministic Signals. Proceedings of the IEEE 55, 523–531 (1967) 4. Shellhammer, S.J., Shankar, S., Tandra, R., Tomcik, J.: Performance of Power Detector Sensors of DTV Signals in IEEE 802.22 WRANs. In: Proc. of the 1st int. Workshop on Technology and Policy For Accessing Spectrum, vol. 222. ACM Press, New York (2006) 5. Peng, Q., Zeng, K., Wang, J., Li, S.: A Distributed Spectrum Sensing Scheme Based on Credibility and Evidence Theory in Cognitive Radio Context. In: IEEE 17th Int. Symposium on Personal, Indoor and Mobile Radio Communications, pp. 1–5 (2006) 6. Wentao, Z., Tao, F., Yan, J.: Data Fusion Using Improved Dempster-Shafer Evidence Theory for Vehicle Detection. In: 4th Int. Conf. on Fuzzy Systems and Knowledge Discovery, vol. 1, pp. 487–491 (2007) 7. Smets, P.: The Combination of Evidence in the Transferable Belief Model. Pattern Analysis and Machine Intelligence, IEEE Transactions, 12, 447–458 (1990) 8. Wei, Z., Mallik, R.K., Letaief, K.B.: Cooperative Spectrum Sensing Optimization in Cognitive Radio Networks. In: IEEE Int. Conf. on Communications, pp. 3411–3415 (2008) 9. Chair, Z., Varshney, P.K.: Optimal Data Fusion in Multiple Sensor Detection Systems. IEEE Trans. Aerospace and Electronic Systems AES-22, 98–101 (1986) 10. Chen, L., Wang, J., Li, S.: An Adaptive Cooperative Spectrum Sensing Scheme Based on the Optimal Data Fusion Rule. In: 4th Int. Symposium on Wireless Communication Systems, pp. 582–586 (2007) 11. Ansari, N., Chen, J.G., Zhang, Y.Z.: Adaptive Decision Fusion for Unequiprobable Sources. IEEE Proceedings on Radar, Sonar and Navigation 144, 105–111 (1997) 12. Mansouri, N., Fathi, M.: Simple counting rule for optimal data fusion. In: Proceedings of 2003 IEEE Conference on Control Applications, vol. 2, pp. 1186–1191 (2003) 13. Wang, P.: The Reliable Combination Rule of Evidence in Dempster-shafer Theory. In: Congress on Image and Signal Processing, CISP 2008, vol. 2, pp. 166–170 (2008)
A Secure Distributed Spectrum Sensing Scheme in Cognitive Radio Nguyen-Thanh Nhan and Insoo Koo School of Electrical Engineering, University of Ulsan 680-749 San 29, Muger 2-dong, Ulsan, Republic of Korea
[email protected] http://mcsl.ulsan.ac.kr
Abstract. Distributed spectrum sensing provides an improvement in primary user detection but introduces a new security threat into the CR system: malicious users performing spectrum sensing data falsification can degrade the cooperative sensing performance. In this paper, we propose a distributed scheme in which the distributions of the received power under the presence and absence hypotheses of the primary signal are estimated from past sensing data by robust statistics, and data fusion is performed according to the estimated parameters using the Dempster-Shafer theory of evidence. Our scheme achieves a powerful capability of malicious user elimination owing to the abnormality of the malicious users' distributions compared with those of the legitimate users. In addition, the performance of our data fusion scheme is enhanced by supplementing it with the nodes' reliability weights. Keywords: Cognitive radio, Spectrum sensing, Distributed, Malicious user, Robust statistics.
1 Introduction
Cognitive Radio (CR) has been proposed as a promising technology to improve spectrum utilization. Cognitive users (CUs) are allowed to use the licensed bands opportunistically when such bands are not occupied, and must abandon the current band and seek a new idle spectrum segment when the band is accessed again by the licensed user (LU). The key enabler of CR is therefore spectrum sensing. Generally, spectrum sensing can be performed by a single CU; the detection techniques commonly used for local sensing are energy detection, matched filtering, and cyclostationary feature detection. In [1], it was shown that the received signal strength can be seriously weakened at a particular geographical location due to multi-path fading and shadowing. Under these circumstances, a single sensing node can hardly distinguish an idle band from a deeply faded one. To overcome this problem, distributed spectrum sensing has been considered [4], [5], [7]. In [7], a data fusion scheme based on the Dempster-Shafer theory of evidence (D-S theory) was described, which yields a significant improvement in detection probability as well as a reduction of the false alarm rate. Numerous other fusion schemes
have also been proposed for such a distributed model, e.g., the "AND rule", the "OR rule", and the "optimal fusion rule". However, few works have considered a new security threat to distributed spectrum sensing called spectrum sensing data falsification [3], in which an attacker or malicious user sends false local spectrum sensing results to the data fusion center and can thereby cause it to make a wrong spectrum sensing decision. This kind of security attack was first mentioned in [2] and further considered in [3], [4]. In [4], the spectrum sensing data falsification problem was addressed by a Weighted Sequential Probability Ratio Test, which gives good performance. However, this method requires knowledge of the physical locations of the sensing terminals and the position of the LU in order to obtain some required prior probabilities. This makes it unsuitable for mobile CR systems and for systems in which no information about the primary user is available. In this paper, we propose a robust, secure distributed spectrum sensing scheme that uses robust statistics to estimate, separately for each node, the distributions of its past reported data under both hypotheses. The estimated parameters are used to test for malicious users and to compute the quantities needed for data fusion by means of D-S theory. Our algorithm, which combines an appropriate data fusion method with the outlier resistance of robust statistics applied to the two estimated distributions, can operate without any knowledge of the primary system, even in severe scenarios where numerous malicious users are present. Furthermore, a reliability evaluation stage based on each node's past performance, using a counting rule [11], is added to the scheme to improve the cooperative gain of data fusion and the capability of malicious user detection. The rest of this paper is organized as follows. Section 2 briefly describes the system model. In Section 3, we develop the proposed scheme based on evidence theory and robust statistics. In Section 4, the simulation results are shown and analyzed. Finally, we conclude the paper in the last section.
2 System Description
We consider a distributed spectrum sensing scheme in which CR users (CUs) experience different channel conditions, such as shadowing and multi-path fading, as shown in Fig. 1. Each CU performs energy detection and transmits its received signal power to the Data Fusion Center (DFC). At the DFC, the global decision is made based on the combination of the local data. Further, we assume that our distributed CR network operates in an adversarial environment in which malicious users that send incorrect spectrum sensing data are present. Local spectrum sensing at each CU is essentially a binary hypothesis testing problem:

H_0 : x(t) = n(t), \qquad H_1 : x(t) = h(t)\,s(t) + n(t) ,   (1)

where H_0 and H_1 correspond to the hypotheses of absence and presence of the LU's signal, respectively, x(t) represents the received signal at the CU, h(t) denotes the
Fig. 1. Distributed spectrum sensing scheme (CU: cognitive radio user, LU: licensed user, DFC: data fusion center)
amplitude gain of the channel, s(t) is the signal transmitted from the primary user, and n(t) is the additive white Gaussian noise. Additionally, the channels corresponding to different CUs are assumed to be independent, and all CUs and LUs share a common spectrum allocation. In our system model, the CU sensing technique is energy detection. Each CU's test statistic is an estimate of the received signal power:

x_E = \sum_{j=1}^{N} |x_j|^2 ,   (2)
where x_j is the j-th sample of the received signal and N = 2TW, with T and W corresponding to the detection time and the signal bandwidth, respectively. When N is relatively large (e.g. N > 200), x_E can be well approximated as a Gaussian random variable under both hypotheses H_1 and H_0 [10].
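The test statistic of eqn. (2) is straightforward to evaluate numerically. The following minimal sketch (Python with NumPy) is illustrative only; the sample model, sample count, and SNR are assumptions, not parameters taken from the paper.

```python
import numpy as np

def energy_statistic(x):
    """Eqn. (2): x_E = sum over j of |x_j|^2."""
    return np.sum(np.abs(x) ** 2)

# Illustrative only: N = 2TW real samples, unit-variance noise,
# unit-power stand-in for h(t)s(t), i.e. roughly 0 dB SNR.
rng = np.random.default_rng(0)
N = 400
noise = rng.normal(size=N)
signal = rng.normal(size=N)
x_E_H0 = energy_statistic(noise)            # hypothesis H0: noise only
x_E_H1 = energy_statistic(signal + noise)   # hypothesis H1: signal + noise
print(x_E_H0, x_E_H1)
```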
3 Secure Distributed Spectrum Sensing Scheme
After sensing, each CU sends its received power data to the DFC, where the global sensing decision is made. To improve both security and the cooperative sensing gain, we consider the robust secure distributed spectrum sensing scheme shown in Fig. 2.

3.1 Data Fusion at DFC
Basic Probability Assignment Estimation. In order to apply the D-S theory of evidence to make the final decision, the frame of discernment A is defined as {H_1, H_0, Ω}, where Ω denotes that either hypothesis may be true. At each sensing time k, each CU sends its received power value to the fusion center, where each
Fig. 2. Secure distributed spectrum sensing scheme
node’s decision credibility corresponding to basic probability assignment (BPA) of two hypotheses are estimated. We propose a more appropriate BPA function as a form of the cumulative density function instead of probability density function [7] as follows: +∞ ˆ 0i )2 (x−μ − 1 σ ˆ2 0i √ mi (H0 ) = e dx, (3) 2πˆ σ0i xEi
xEi mi (H1 ) = −∞
1 √ e 2πˆ σ1i
−
ˆ 1i )2 (x−μ σ ˆ2 1i
dx,
(4)
where m_i(H_0) and m_i(H_1) are the BPAs of hypotheses H_0 and H_1 at the i-th CU, respectively, and (\hat\mu_{0i}, \hat\sigma_{0i}) and (\hat\mu_{1i}, \hat\sigma_{1i}) are the estimated means and standard deviations described in subsection 3.2. With eqns. (3) and (4), the BPAs of H_0 and H_1 are unique for each test statistic value x_{E_i} and vary such that the larger x_{E_i} is, the larger m_i(H_1) and the smaller m_i(H_0) become, and vice versa.

BPA Adjustment. Instead of combining all the CUs' BPAs directly, which would treat all nodes equally, the BPA of each CU should be adjusted according to the relative reliability rel_i among the nodes, which will be described in subsection 3.2. The weight w_i of the i-th CU is obtained by normalizing the node's reliability rel_i(n):

w_i(n) = \frac{rel_i(n)}{\max_i rel_i(n)} .   (5)
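Since eqns. (3)-(5) are just Gaussian tail/CDF evaluations and a normalization, they can be sketched directly. The snippet below (Python with SciPy) is a minimal illustration, not the authors' implementation; the function and variable names are assumptions.

```python
import numpy as np
from scipy.stats import norm

def bpa(x_E, mu0, sigma0, mu1, sigma1):
    """Eqns. (3)-(4): BPAs from the estimated H0 and H1 Gaussians."""
    m_H0 = norm.sf(x_E, loc=mu0, scale=sigma0)   # area from x_E to +infinity
    m_H1 = norm.cdf(x_E, loc=mu1, scale=sigma1)  # area from -infinity to x_E
    return m_H0, m_H1

def weight(rel_i, rel_all):
    """Eqn. (5): reliability normalized by the largest reliability."""
    return rel_i / np.max(rel_all)
```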
The BPAs of the i-th CU for both hypotheses, m_i(H_0) and m_i(H_1), are adjusted with the corresponding weight w_i as follows:

m_i'(H_0) = w_i\, m_i(H_0), \quad m_i'(H_1) = w_i\, m_i(H_1), \quad m_i'(\Omega) = 1 - m_i'(H_1) - m_i'(H_0) .   (6)

From eqn. (6), the influence of a less reliable (lower-performance) node is decreased and vice versa, while the local spectrum sensing decision itself is preserved.

D-S Theory Combination. According to the D-S theory of evidence, the combination of the adjusted BPAs is obtained by

m(H_0) = m_1 \oplus m_2 \oplus \cdots \oplus m_n (H_0) = \frac{\sum_{A_1 \cap A_2 \cap \cdots \cap A_n = H_0} \prod_{i=1}^{n} m_i(A_i)}{1-K} ,   (7)

m(H_1) = m_1 \oplus m_2 \oplus \cdots \oplus m_n (H_1) = \frac{\sum_{A_1 \cap A_2 \cap \cdots \cap A_n = H_1} \prod_{i=1}^{n} m_i(A_i)}{1-K} ,   (8)

where K = \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = \emptyset} \prod_{i=1}^{n} m_i(A_i), and m(H_0) and m(H_1) are the final
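For the three-element frame {H_0, H_1, Ω}, Dempster's rule can be applied pairwise, which is equivalent to the n-source combination of eqns. (7)-(8). The sketch below is one possible implementation under that assumption; it is not taken from the paper.

```python
def ds_combine(masses):
    """Combine a list of (m_H0, m_H1, m_Omega) triples with Dempster's rule."""
    m0, m1, mO = masses[0]
    for n0, n1, nO in masses[1:]:
        K = m0 * n1 + m1 * n0                     # conflicting mass (H0 vs H1)
        m0, m1, mO = ((m0 * n0 + m0 * nO + mO * n0) / (1.0 - K),
                      (m1 * n1 + m1 * nO + mO * n1) / (1.0 - K),
                      (mO * nO) / (1.0 - K))
    return m0, m1, mO

# decision rule of eqn. (9): declare H1 when m(H1) > m(H0)
```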
credibility of the CR system for each hypothesis, in the form of a BPA. From these results, a simple decision strategy is adopted; the final decision is made as

decide H_0 if m(H_0) > m(H_1), \qquad decide H_1 if m(H_0) < m(H_1) .   (9)

3.2 Parameters Update
Without any other knowledge about the primary system, our algorithm exploits the combination of the D-S theory fusion rule and the outlier resistance of robust statistics. Therefore, the parameter updating stage plays a very important role in the performance of the scheme.

Reliability Source Evaluation. In distributed spectrum sensing, the global decision is usually more reliable than the local decisions. Therefore, we can use it as a supervisor to estimate the nodes' reliability in the form of a weight factor: if a local decision agrees with the global decision, we regard that local decision as correct; if it contradicts the global decision, we regard it as incorrect. By counting the local and global decisions, we can estimate each node's reliability. For the i-th sensing node at the k-th sensing time, S_i(k) denotes the current decision state, which takes one of the following four values:
S_{11}: the global decision is H_1 and the local decision is H_1;
S_{00}: the global decision is H_0 and the local decision is H_0;
S_{10}: the global decision is H_1 and the local decision is H_0;
S_{01}: the global decision is H_0 and the local decision is H_1.
Then the cumulative state J_i(n) over n detection time slots is

J_i(n) = \sum_{k=1}^{n} S_i(k) = n_{11i}(n)S_{11} + n_{00i}(n)S_{00} + n_{10i}(n)S_{10} + n_{01i}(n)S_{01} ,   (10)

where n_{11i}(n), n_{00i}(n), n_{10i}(n), and n_{01i}(n) represent the number of times that S_{11}, S_{00}, S_{10}, and S_{01} have occurred over the n time slots, respectively. J_i(n) can be rewritten in iterative form as

J_i(n) = J_i(n-1) + S_i(n) .   (11)

To adapt to a complicated RF environment, we use a window of fixed length over the observed data:

J_i(n) = \sum_{k=n-L+1}^{n} S_i(k) ,   (12)

where L is the length of the observation window. Intuitively, a good node, i.e., one with both a low false alarm rate and a high detection rate, should be assigned higher reliability. Consequently, the reliability of each node is defined as

rel_i = \tilde P_d (1 - \tilde P_f) ,   (13)

where \tilde P_d and \tilde P_f are the detection rate and the false alarm rate, respectively. From eqn. (13) and based on J_i(n), the reliability of a node can be estimated as

rel_i(n) = \frac{n_{11i}(n)}{n_{11i}(n) + n_{10i}(n)} \cdot \frac{n_{00i}(n)}{n_{00i}(n) + n_{01i}(n)} .   (14)
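Eqn. (14) only requires the four state counters accumulated over the observation window. A minimal sketch follows (the counter names are assumptions); it simply guards against empty counters.

```python
def node_reliability(n11, n00, n10, n01):
    """Eqn. (14): estimated reliability rel_i(n) from the four state counters."""
    detection_term = n11 / (n11 + n10) if (n11 + n10) > 0 else 0.0
    no_false_alarm_term = n00 / (n00 + n01) if (n00 + n01) > 0 else 0.0
    return detection_term * no_false_alarm_term
```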
With eqn. (14), the reliability of the CUs can be obtained by a simple counting rule over their past performance.

Estimation of the Hypothesis Distribution Parameters. As mentioned in Section 2, when the time-bandwidth product N is large enough, the test statistic under both hypotheses H_1 and H_0 can be approximated as Gaussian. Theoretically, according to [10], the means and variances of these distributions are

\mu_0 = N, \quad \sigma_0^2 = 2N, \qquad \mu_1 = N(\gamma + 1), \quad \sigma_1^2 = 2N(2\gamma + 1) ,   (15)

where \gamma is the signal-to-noise ratio (SNR) of the primary signal at the CU. However, in an adversarial environment it is more secure to estimate the hypothesis distribution parameters from the past sensing energy data. In our scheme, the means and variances of the distributions under H_1 and H_0 are estimated from the available received power data by robust statistics.
In distributed spectrum sensing, the global decision is usually more reliable than the local decisions. Therefore, after each fusion interval, the received power report of each node is assigned to one of two data sets, {x_{E_i} | H_0} or {x_{E_i} | H_1}, according to whether the global decision was H_0 or H_1. After this step, the parameters (mean and variance) of each hypothesis distribution are estimated by Huber's method [8], [9] as follows:

– Step 1: Set the initial values of the mean and scale of the observation data set Y:

\mu^{(0)} = \mathrm{MED}(Y), \qquad \sigma^{(0)} = 1.4826\,\mathrm{MAD}(Y) ,   (16)

where MED(Y) is the sample median of Y and MAD(Y) is the sample median absolute deviation of Y.

– Step 2: Test and adjust outlying data. Instead of removing such data as in [5], robust statistics allows the dubious data to be used after winsorisation:

Y_i = \mu^{(0)} - 1.5\sigma^{(0)} if Y_i < \mu^{(0)} - 1.5\sigma^{(0)}; \quad Y_i = \mu^{(0)} + 1.5\sigma^{(0)} if Y_i > \mu^{(0)} + 1.5\sigma^{(0)}; \quad Y_i otherwise ,   (17)

where the multiplier 1.5 is the value chosen for our winsorisation process.

– Step 3: Calculate an improved estimate of the mean as \mu^{(1)} = \mathrm{mean}(Y_i) and of the standard deviation as \sigma^{(1)} = 1.134\,\mathrm{stdev}(Y_i), where mean and stdev denote the sample mean and standard deviation, and the factor 1.134 is derived from the normal distribution.

– Step 4: Repeat steps 2 and 3. After j iterations the process converges to an acceptable degree of accuracy, i.e., when no more values have to be adjusted in step 2. The resulting values \mu^{(j)} and \sigma^{(j)} are the robust estimates of the mean and standard deviation, respectively.

As in the previous part, in order to adapt well to the RF environment, we also use a fixed-length window L of observed data when estimating the distribution parameters. Consequently, the parameters (\hat\mu_{1i}, \hat\sigma_{1i}) and (\hat\mu_{0i}, \hat\sigma_{0i}) are estimated from the most recent L samples of the sets {x_{E_i} | H_1} and {x_{E_i} | H_0}, respectively. If a data set belongs to a malicious node, its estimated distribution will differ from those of normal nodes because of its abnormal data.
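Steps 1-4 can be sketched as the following iteration (Python with NumPy). This is an illustrative reading of the procedure, not the authors' code; in particular, the convergence test below is simplified to parameter stability rather than checking that no sample is clipped in step 2.

```python
import numpy as np

def robust_mean_std(y, max_iter=50):
    """Winsorised (Huber-style) estimate of mean and standard deviation."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)                               # step 1: initial location
    sigma = 1.4826 * np.median(np.abs(y - mu))      # step 1: MAD-based scale
    for _ in range(max_iter):
        clipped = np.clip(y, mu - 1.5 * sigma, mu + 1.5 * sigma)  # step 2
        mu_new = clipped.mean()                     # step 3: improved mean
        sigma_new = 1.134 * clipped.std(ddof=1)     # step 3: improved std
        if np.isclose(mu_new, mu) and np.isclose(sigma_new, sigma):
            return mu_new, sigma_new                # step 4: converged
        mu, sigma = mu_new, sigma_new
    return mu, sigma
```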
3.3 Malicious User Detection
We consider two kinds of malicious nodes: "always-yes" nodes and "always-no" nodes. An "always-no" user always reports the absence of the primary signal, whereas an "always-yes" node always reports the presence of the LU. "Always-yes" users increase the probability of false alarm P_f, while "always-no" users decrease the probability of detection P_d.
As already described in subsection 3.2, a malicious user will have abnormal estimated parameters. Based on this feature, we can detect consistent malicious users with the test condition |\hat\mu_{1i} - \hat\mu_{0i}| < \varepsilon_1, where \varepsilon_1 is a detection threshold predefined on the basis of N (see Section 2) so that malicious users can be removed reliably. This test detects "consistent malicious" nodes, which generate false sensing data from a single hypothesis. An "always-yes" or "always-no" node will have a very small difference between the estimated means (and deviations) of its two hypotheses, since its data sets {x_{E_i} | H_0} and {x_{E_i} | H_1} are drawn from one hypothesis distribution, or even from a constant value. If the distance between the two estimated mean values of a node is smaller than a minimum tolerable value, the node is considered a consistent malicious user. In our scheme, the threshold \varepsilon_1 is defined as the theoretically lowest distance between the two mean values and is derived from eqn. (15) as

\varepsilon_1 = (\mu_1 - \mu_0)_{min} = N\gamma_{min} ,   (18)

where \gamma_{min} is the minimum SNR at which the energy detector can operate.
4 Simulation Results and Analysis
For our simulations, we assume the LU signal is a DTV signal as in [6]. The bandwidth of the LU signal is 6 MHz, and an AWGN channel is considered. The local sensing time is 50 μs, and 10 sensing nodes are spread over the network to perform local sensing. The fixed window length for both the reliability evaluation stage and the hypothesis distribution parameter estimation stage is chosen to be 100, and the presence and absence probabilities of the LU signal are both 0.5. In Fig. 3, we observe the false alarm and detection probabilities versus the number of "always-yes" malicious users. Each "always-yes" malicious user randomly generates a large value drawn only from a high-SNR (0 dB) distribution of hypothesis H_1. From the figure, we can see that both the detection probability P_d and the false alarm probability P_f of the "OR rule" are approximately one, which means that the "OR rule" is strongly affected by "always-yes" malicious users. For the "AND rule", the performance is better than that of the "OR rule" but is still not the best. With the "D-S theory fusion rule", in which the BPAs are obtained from the theoretical hypothesis distributions, the detection probability P_d is always approximately one due to the effect of the "always-yes" nodes; however, the false alarm probability P_f grows very rapidly as the number of "always-yes" nodes increases. For our scheme, we use the detection and false alarm probabilities of the "D-S theory fusion rule" without malicious nodes as the reference bound of data fusion. The results indicate that our scheme stays approximately at this bound until the number of malicious users reaches eight. Similar results, not presented in this paper, are obtained for "always-yes" users that generate a large constant value.
Fig. 3. Probability vs. number of "always-yes" users with various fusion rules
Fig. 4. Probability vs. number of "always-no" users with various fusion rules
In Fig. 4, the effect of "always-no" malicious users is considered. Similarly to the previous case, the other data fusion rules, which have no malicious node identification capability, show degraded performance, while our scheme, with its effective malicious user detection, achieves the same performance as the data fusion bound of the "D-S theory fusion rule" without malicious nodes.
5 Conclusions
In this paper, a secure distributed spectrum sensing scheme has been proposed and analyzed. The scheme, which is based only on the sensing nodes' past received
power data, without any other knowledge of the primary system, combines the advantage of the D-S theory fusion rule with an enhanced weighting stage and the powerful capability of robust statistics for malicious user elimination. Numerical results indicate that our scheme achieves both a high data fusion gain and powerful malicious user elimination.
Acknowledgement. This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2009-0063958).
References 1. Cabric, D., Mishra, S.M., Brodersen, R.W.: Implementation Issues in Spectrum Sensing for Cognitive radios. In: Conf. Record of the 38th Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 772–776 (2004) 2. Mishra, S.M., Sahai, A., Brodersen, R.W.: Cooperative Sensing among Cognitive Radios. In: IEEE International Conference on Communications, ICC 2006, pp. 1658–1663 (2006) 3. Ruiliang, C., Jung-Min, P., Hou, Y.T., Reed, J.H.: Toward Secure Distributed Spectrum Sensing in Cognitive Radio Networks. IEEE Communications Magazine 46, 50–55 (2008) 4. Ruiliang, C., Jung-Min, P., Kaigui, B.: Robust Distributed Spectrum Sensing in Cognitive Radio Networks. In: IEEE The 27th Conference on Computer Communications, INFOCOM 2008, pp. 1876–1884 (2008) 5. Kaligineedi, P., Khabbazian, M., Bhargava, V.K.: Secure Cooperative Sensing Techniques for Cognitive Radio Systems. In: IEEE International Conference on Communications, ICC 2008, pp. 3406–3410 (2008) 6. Stephen, J.S., Sai Shankar, N., Rahul, T., James, T.: Performance of Power Detector Sensors of DTV Signals in IEEE 802.22 WRANs. In: Proceedings of the first international workshop on Technology and policy for accessing spectrum. ACM, Boston (2006) 7. Peng, Q., Zeng, K., Wang, J., Li, S.: A Distributed Spectrum Sensing Scheme Based on Credibility and Evidence Theory in Cognitive Radio Context. In: IEEE 17th International Symposium on in Personal, Indoor and Mobile Radio Communications, pp. 1–5 (2006) 8. Rousseeuw, P.J.: Robust Regression and Outlier Detection. John Wiley & Sons, Inc., Chichester (1987) 9. Huber, P.J.: Robust Statistics. John Wiley & Sons, Inc., Chichester (1981) 10. Urkowitz, H.: Energy detection of unknown deterministic signals. Proceedings of the IEEE 55, 523–531 (1967) 11. Mansouri, N., Fathi, M.: Simple counting rule for optimal data fusion. In: Proceedings of 2003 IEEE Conference on in Control Applications, CCA 2003, vol. 2, pp. 1186–1191 (2003)
An Optimal Data Fusion Rule in Cluster-Based Cooperative Spectrum Sensing Hiep-Vu Van and Insoo Koo School of Electrical Engineering, University of Ulsan 680-749 San 29, Muger 2-dong, Ulsan, Republic of Korea
[email protected] http://mcsl.ulsan.ac.kr
Abstract. In this paper, we consider a cluster-based cooperative spectrum sensing approach to improve the sensing performance of a cognitive radio (CR) network. In cluster-based cooperative spectrum sensing, CR users at similar locations are grouped into a cluster. In each cluster, the most favorable user, called the cluster header, is chosen to collect the data from all CR users and send the cluster decision to a common receiver, which makes the final decision on the presence of the primary user. In this setting, the data fusion rule used in the cluster plays an important role in reducing the reporting error rate. Accordingly, we propose an optimal fusion rule for each cluster header that minimizes the sum of the probability of false alarm and the probability of missed detection at each cluster header. Keywords: Cognitive radio, Cooperative spectrum sensing, Optimal data fusion, Sensing performance improvement.
1 Introduction
Nowadays, wireless communication is used in more and more applications across many fields of modern life, such as the military, entertainment, and communications. In practice, licensed devices occupy almost the entire frequency range, yet they use those frequency bands at well below 100% capacity; in some cases the utilization is only a few percent [1]. Spectrum is undoubtedly a limited resource, so the frequency bands should be used more effectively by increasing their utilization. Recently, CR technology has emerged as a useful tool for dealing with limited spectrum [2], [3]. With this technology, frequencies left unused by the primary user (PU) can be detected and used by CR users, and the CR users must vacate the occupied frequency when the presence of the PU is detected. Therefore, sensing the status of the PU is a prerequisite of CR technology. Good sensing performance lets every CR user know exactly whether the PU is present and use the PU's free frequency band without causing harmful interference.
Corresponding author.
In practice, the performance of individual sensing in a CR network can be deteriorated by deep fading and shadowing. This problem can be addressed by allowing CR users to perform cooperative spectrum sensing [4]-[6]. In cooperative spectrum sensing, the variability of the signal strength at the various CR user locations is exploited, by integrating the sensing information of neighboring users, to detect the PU's signal more reliably in a large CR network than with individual sensing [7]. The cooperative spectrum sensing process typically has three steps: sensing, reporting, and decision making. In the sensing step, all CR users perform spectrum sensing individually; in the reporting step, all local sensing observations are sent to the common receiver; and in the final step, the common receiver uses a data fusion rule to fuse all local observations into a global decision about the presence of the PU. When CR users cooperate in spectrum sensing, more accurate detection can be obtained. However, when the local observations are forwarded to the common receiver through fading channels, the sensing performance can be severely degraded. To overcome this problem, Sun et al. proposed a cluster-based cooperative sensing method [8], in which CR users at similar locations are collected into a cluster and, in each cluster, a favorable user is selected as the cluster header. The cluster header receives the local sensing information from all CR users, makes a cluster decision, and reports it to the common receiver. This approach improves the sensing performance compared with the conventional method. However, in [8] the OR-rule is used at both the cluster header and the common receiver, which is not optimal. In order to improve the performance of cluster-based cooperative sensing, in this paper we propose an optimal data fusion rule for the cluster header, with which we find optimal thresholds for both the CR users in the cluster and the cluster header so as to minimize the sum of the probability of false alarm and the probability of missed detection at each cluster header. We also consider three data fusion rules at the common receiver (the half-voting rule, the AND-rule, and the OR-rule) to demonstrate the efficiency of the proposed optimal fusion rule for the cluster header.
2 System Model
This paper considers a CR network comprising K clusters, each with n_j (j = 1, 2, ..., K) CR users, and a common receiver. Within a cluster, all CR users are located close together; therefore, we assume that all CR users in the same cluster have similar channels to the PU (the same SNR, γ_j) and to the common receiver (the same SNR, ρ_j), as shown in Fig. 1. The common receiver functions as a base station (BS) that manages the cognitive radio network and all associated cognitive radios. The network aims to sense the presence of the PU by performing cooperative spectrum sensing. Each CR user senses individually using one of the detection methods such as matched filter detection, energy detection, or feature detection [9], [10]. Among these detection methods, if the CR
Fig. 1. System Model
user has limited information about the PU's signal, energy detection is optimal [10]. In energy detection, the radio frequency energy in the sensing channel is collected over a fixed bandwidth W and an observation time window T to decide whether the channel is utilized or not. We assume that each CR user performs local spectrum sensing independently using an energy detector and that the sensing channel is time-invariant during the sensing process. The i-th (i = 1, 2, ..., n_j) CR user decides between the following two hypotheses:

H_0 : x_i(t) = n_i(t), \qquad H_1 : x_i(t) = h_i s(t) + n_i(t) ,   (1)

where x_i(t) is the observed signal at the i-th CR user, s(t) is the signal of the PU, n_i(t) is the additive white Gaussian noise (AWGN), and h_i is the complex channel gain of the sensing channel between the PU and the i-th CR user. In energy detection, the collected energy in the frequency domain is denoted by E_i, which serves as the decision statistic and has the following distribution [11]:

H_0 : E_i \sim \chi^2_{2u}, \qquad H_1 : E_i \sim \chi^2_{2u}(2\gamma_i) ,   (2)

where \chi^2_{2u} denotes a central chi-square distribution with 2u degrees of freedom and \chi^2_{2u}(2\gamma_i) denotes a non-central chi-square distribution with 2u degrees of freedom and non-centrality parameter 2\gamma_i. The instantaneous SNR of the received signal at the i-th CR user is \gamma_i, and u = TW is the time-bandwidth product. In this paper, we use u = 10.
3 Cluster-Based Cooperative Spectrum Sensing
To enhance the reliability of the sensing performance, we consider a cluster-based cooperative spectrum sensing method that proceeds through the following steps [1]:
– Step 1: All CR users in each cluster perform local spectrum sensing individually using an energy detector and send their local observations to the cluster header.
– Step 2: The cluster header receives these local observations and makes a cluster decision.
– Step 3: The cluster decision of each cluster is reported to the common receiver by its cluster header, after which the global decision is made.

3.1 Local Spectrum Sensing with Energy Detection
Using the energy detector, the average probability of false alarm (P_{f,i,j}), the average probability of detection (P_{d,i,j}), and the average probability of missed detection (P_{m,i,j}) of the local decision are given, respectively, by [11]

P_{f,i,j} = \Pr\{E_i > \lambda_j | H_0\} = \frac{\Gamma(u, \lambda_j/2)}{\Gamma(u)} ,   (3)

P_{d,i,j} = \Pr\{E_i > \lambda_j | H_1\} = Q_u\!\left(\sqrt{2\gamma_j}, \sqrt{\lambda_j}\right) ,   (4)

P_{m,i,j} = 1 - P_{d,i,j} ,   (5)

where \lambda_j and \gamma_j denote the energy threshold and the instantaneous SNR of the CR users in the j-th cluster, respectively, \Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\, dt is the incomplete gamma function, \Gamma(a) is the gamma function, and Q_u(a, b) is the generalized Marcum Q-function, given by Q_u(a, x) = \frac{1}{a^{u-1}} \int_x^{\infty} t^u e^{-\frac{t^2+a^2}{2}} I_{u-1}(at)\, dt, where I_{u-1}(\cdot) is the modified Bessel function of the first kind and order u-1.
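Eqns. (3)-(4) can be evaluated through the chi-square distributions of eqn. (2): under H_0 the statistic is central chi-square with 2u degrees of freedom, and under H_1 it is non-central with non-centrality 2γ, whose tail equals the Marcum Q-function of eqn. (4). The following sketch (Python with SciPy, illustrative only) uses that equivalence.

```python
from scipy.stats import chi2, ncx2

def local_pf_pd(lam, gamma, u=10):
    """Eqns. (3)-(4): local false-alarm and detection probabilities."""
    p_f = chi2.sf(lam, df=2 * u)                 # Gamma(u, lam/2) / Gamma(u)
    p_d = ncx2.sf(lam, df=2 * u, nc=2 * gamma)   # Q_u(sqrt(2*gamma), sqrt(lam))
    return p_f, p_d
```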
3.2 Cluster Header Decision
All 1-bit decisions from the CR users in a cluster are fused according to the following logic rule:

B_j = 1 if \sum_{i=1}^{n_j} G_{j,i} \ge th_j, \qquad B_j = 0 if \sum_{i=1}^{n_j} G_{j,i} < th_j, \qquad th_j \in \{1, 2, \ldots, n_j\} ,   (6)

where th_j is the threshold of the j-th cluster header and G_{j,i} \in \{0, 1\} is the local decision of the i-th user in the j-th cluster.
It can easily be seen that if th_j = 1 or th_j = n_j, the data fusion rule reduces to the OR-rule or the AND-rule, respectively. From the assumption that all CR users in the j-th cluster have the same SNR, the probability of false alarm (P_{f,j}), the probability of detection (P_{d,j}), and the probability of missed detection P_{m,j} = 1 - P_{d,j} are the same for all CR users of the j-th (j = 1, 2, \ldots, K) cluster. Therefore, the probability of false alarm at the j-th cluster header (Q_{f,j}) is given by [12]

Q_{f,j} = \Pr\{H_1 | H_0\} = \sum_{l=th_j}^{n_j} C_{n_j}^{l}\, P_{f,j}^{l} (1 - P_{f,j})^{n_j - l} .   (7)

Similarly, the probability of missed detection at the j-th cluster header (Q_{m,j}) is given by

Q_{m,j} = \Pr\{H_0 | H_1\} = 1 - \sum_{l=th_j}^{n_j} C_{n_j}^{l}\, P_{d,j}^{l} (1 - P_{d,j})^{n_j - l} .   (8)
In this paper, one of our objectives is to find the optimal energy threshold \lambda_j for all CR users in the j-th cluster (j = 1, 2, \ldots, K) as well as the optimal threshold th_j of the j-th cluster header. We define the optimal thresholds \lambda_j^{opt} and th_j^{opt} as those minimizing (Q_{f,j} + Q_{m,j}) at the j-th cluster header:

[\lambda_j^{opt}, th_j^{opt}] = \arg\min_{\lambda_j,\, th_j} (Q_{f,j} + Q_{m,j}) .   (9)
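One simple way to solve eqn. (9) is an exhaustive search over a grid of energy thresholds and all integer values of th_j, evaluating Q_{f,j} + Q_{m,j} through eqns. (3)-(4) and (7)-(8). The sketch below (Python with SciPy/NumPy) illustrates this; the grid range and the example SNR are assumptions.

```python
import numpy as np
from scipy.stats import binom, chi2, ncx2

def cluster_error(lam, th, n, gamma, u=10):
    """Q_f,j + Q_m,j of eqns. (7)-(8) for n identical CUs in one cluster."""
    p_f = chi2.sf(lam, df=2 * u)
    p_d = ncx2.sf(lam, df=2 * u, nc=2 * gamma)
    q_f = binom.sf(th - 1, n, p_f)          # P(at least th local '1' decisions | H0)
    q_m = 1.0 - binom.sf(th - 1, n, p_d)    # 1 - P(at least th local '1' decisions | H1)
    return q_f + q_m

def optimal_lambda_th(n, gamma, lam_grid):
    """Grid search for eqn. (9); lam_grid is an assumed search range."""
    best = min((cluster_error(lam, th, n, gamma), lam, th)
               for lam in lam_grid for th in range(1, n + 1))
    return best[1], best[2], best[0]

# example: 10 CUs at 8 dB SNR, threshold grid chosen arbitrarily
lam_opt, th_opt, err = optimal_lambda_th(10, 10 ** 0.8, np.linspace(5.0, 80.0, 200))
```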
In general, \lambda_j^{opt} and th_j^{opt} are found by such a numerical search.

3.3 Global Decision
At the common receiver, the global decision is made by combining all received cluster decisions according to the following logic rule:

H = H_1 if \sum_{j=1}^{K} B_j \ge th_g, \qquad H = H_0 otherwise ,   (10)

where B_j \in \{0, 1\} is the decision of the j-th cluster and th_g is the threshold of the common receiver. To demonstrate the efficiency of the optimal rule in the cluster, we consider three values of the threshold th_g:

No.  th_g       Fusion rule
1    1          OR-rule
2    K/2 + 1    Half-voting rule
3    K          AND-rule
Here, we let Q_{e_j} be the error probability that the cluster decision B_j is reported to the common receiver but a different decision is obtained. For BPSK and a given \rho_j, the error probability of the j-th cluster header over Rayleigh fading channels is given by [8]

Q_{e_j} = \int_0^{\infty} Q_{e_j | \rho_{max,j}}\, f(\rho_{max,j})\, d\rho_{max,j} = \sum_{m=0}^{n-1} C_{n-1}^{m}\, \frac{(-1)^{n-m-1}}{2(n-m)} \left(1 - \sqrt{\frac{\rho_j}{n-m+\rho_j}}\right) ,   (11)
where Q_{e_j} is the error probability of the channel between the j-th cluster header and the common receiver. Commonly, the CR user with the highest SNR is chosen as the cluster header; however, since all CR users in a cluster are assumed to have similar SNR on the channel to the common receiver, the cluster header is chosen randomly among the CR users in the cluster. At the common receiver, the global decision is made by combining all cluster decisions, including their channel errors, according to the respective data fusion rule, as follows.

– OR-rule (th_g = 1). With this rule, the global false alarm probability (Q_f) and the global missed detection probability (Q_m) at the common receiver are given, respectively, by

Q_{f,or} = 1 - \prod_{j=1}^{K} \big[(1 - Q_{f,j})(1 - Q_{e,j}) + Q_{f,j} Q_{e,j}\big] ,   (12)

Q_{m,or} = \prod_{j=1}^{K} \big[Q_{m,j}(1 - Q_{e,j}) + (1 - Q_{m,j}) Q_{e,j}\big] .   (13)
– Half-voting rule (th_g = K/2 + 1). When the half-voting rule is used for the global decision, computing the global probabilities of false alarm (Q_{f,half}) and missed detection (Q_{m,half}) for the general case of K clusters is cumbersome. We therefore consider the case K = 4 and derive Q_{f,half} and Q_{m,half} as follows:

Q_{f,half} = Q_{fe,1} Q_{fe,2} Q_{fe,3} (1 - Q_{fe,4}) + Q_{fe,1} Q_{fe,2} (1 - Q_{fe,3}) Q_{fe,4} + Q_{fe,1} (1 - Q_{fe,2}) Q_{fe,3} Q_{fe,4} + (1 - Q_{fe,1}) Q_{fe,2} Q_{fe,3} Q_{fe,4} + Q_{fe,1} Q_{fe,2} Q_{fe,3} Q_{fe,4}   (14)
and

Q_{m,half} = Q_{me,1} Q_{me,2} (1 - Q_{me,3})(1 - Q_{me,4}) + Q_{me,1} (1 - Q_{me,2}) Q_{me,3} (1 - Q_{me,4}) + Q_{me,1} (1 - Q_{me,2})(1 - Q_{me,3}) Q_{me,4} + (1 - Q_{me,1}) Q_{me,2} Q_{me,3} (1 - Q_{me,4}) + (1 - Q_{me,1}) Q_{me,2} (1 - Q_{me,3}) Q_{me,4} + (1 - Q_{me,1})(1 - Q_{me,2}) Q_{me,3} Q_{me,4} + Q_{me,1} Q_{me,2} Q_{me,3} (1 - Q_{me,4}) + Q_{me,1} Q_{me,2} (1 - Q_{me,3}) Q_{me,4} + Q_{me,1} (1 - Q_{me,2}) Q_{me,3} Q_{me,4} + (1 - Q_{me,1}) Q_{me,2} Q_{me,3} Q_{me,4} + Q_{me,1} Q_{me,2} Q_{me,3} Q_{me,4} ,   (15)

where

Q_{fe,j} = (1 - Q_{f,j}) Q_{e_j} + (1 - Q_{e_j}) Q_{f,j}, \qquad Q_{me,j} = (1 - Q_{m,j}) Q_{e_j} + (1 - Q_{e_j}) Q_{m,j}, \qquad (j = 1, 2, 3, 4) .   (16)
– AND-rule (th_g = K). In this case, the global probabilities of false alarm (Q_{f,and}) and missed detection (Q_{m,and}) at the common receiver are given by

Q_{f,and} = \prod_{j=1}^{K} \big[Q_{f,j}(1 - Q_{e,j}) + (1 - Q_{f,j}) Q_{e,j}\big] ,   (17)

Q_{m,and} = 1 - \prod_{j=1}^{K} \big[(1 - Q_{m,j})(1 - Q_{e,j}) + Q_{m,j} Q_{e,j}\big] .   (18)
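Eqns. (12)-(13) and (17)-(18) are products over the clusters of the per-cluster decision probabilities corrupted by the reporting error Q_{e_j}. A minimal sketch (Python with NumPy, function and argument names assumed) is:

```python
import numpy as np

def global_or(qf, qm, qe):
    """Eqns. (12)-(13): OR-rule fusion of K cluster decisions with reporting errors."""
    qf, qm, qe = map(np.asarray, (qf, qm, qe))
    Qf = 1.0 - np.prod((1.0 - qf) * (1.0 - qe) + qf * qe)
    Qm = np.prod(qm * (1.0 - qe) + (1.0 - qm) * qe)
    return Qf, Qm

def global_and(qf, qm, qe):
    """Eqns. (17)-(18): AND-rule fusion."""
    qf, qm, qe = map(np.asarray, (qf, qm, qe))
    Qf = np.prod(qf * (1.0 - qe) + (1.0 - qf) * qe)
    Qm = 1.0 - np.prod((1.0 - qm) * (1.0 - qe) + qm * qe)
    return Qf, Qm
```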
4 Simulation Results
In the simulations, our concern is the sensing performance within a cluster as well as at the common receiver. We first consider the j-th cluster with ten CR users (n = 10) and assume the SNR at each CR user lies uniformly within the range of 5 dB to 10 dB. At the j-th cluster header, the optimal data fusion rule is used to form the cluster decision; for comparison, we also provide the sensing performance with the AND-rule and the OR-rule. Figure 2 shows the sensing performance of the cluster for the optimal rule, the OR-rule, and the AND-rule. We can see that the optimal rule significantly reduces the in-cluster error probability Q_{f,j} + Q_{m,j}, given by eqns. (7) and (8), compared with the OR-rule or the AND-rule at every SNR value. When the SNR is about 10 dB, the reporting error reaches an acceptable value of less than 0.002 with the optimal rule, but remains above 0.015 with both the AND-rule and the OR-rule.
Fig. 2. Reporting error Q_{f,j} + Q_{m,j} in the j-th cluster for different decision fusion rules and different values of SNR
Fig. 3. Reporting error Qf + Qm in common receiver versus different values of average SNR and different decision fusion rules in Cluster-Case 1
Fig. 4. Reporting error Qf + Qm in common receiver versus different values of average SNR and different decision fusion rules in Cluster-Case 2
Fig. 5. Reporting error Qf + Qm in common receiver versus different values of average SNR and different decision fusion rules in Cluster-Case 3
On the other hand, to examine the sensing performance at the common receiver, we consider a network with four clusters, each composed of ten CR users, where the SNRs of the channels between the PU and the four clusters are 7.5 dB, 8.0 dB, 8.5 dB, and 9.0 dB, respectively. Moreover, the SNRs of the channels between the four clusters and the common receiver are varied according to the following table:

Channel     1       2        3      4
SNR (dB)    0.5*m   0.75*m   1*m    1.25*m

where m takes values between 1 and 15. We consider three cases of data fusion rules applied in the cluster and at the common receiver:

Case   In cluster                          In common receiver
1      Optimal-rule, OR-rule, AND-rule     OR-rule
2      Optimal-rule, OR-rule, AND-rule     AND-rule
3      Optimal-rule, OR-rule, AND-rule     Half-voting rule

In all cases, the cooperative spectrum sensing performance shown in Figs. 3-5 is better when the optimal rule is used in the cluster. In particular, Fig. 5 shows that the best performance is obtained with the optimal rule in the cluster and the half-voting rule at the common receiver.
5 Conclusion
In this paper, we have considered cluster-based cooperative spectrum sensing and its data fusion rules to improve the sensing performance. Within the cluster, we derived an optimal rule, which is shown to be the best rule, giving the smallest reporting error. Moreover, the optimal rule in the cluster combined with the half-voting rule at the common receiver is the best combination, yielding the minimum reporting error.
Acknowledgement. This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2009-0063958).
References 1. Federal Communications Commission: Spectrum Policy Task Force. Rep. ET Docket, 02–135 (2002) 2. Mitola, J., Maguire, G.Q.: Cognitive Radio: Making Software Radios More Personal. IEEE Pers. Commun. 6, 138 (1999) 3. Haykin, S.: Cognitive Radio: Brain-empowered Wireless Communications. IEEE J. Select. Areas Commun. 23, 201–220 (2005) 4. Ganesan, G., Y. Li, G.: Cooperative Spectrum Sensing in Cognitive Radio Networks. In: Proc. IEEE Symp. New Frontiers in Dynamic Spectrum Access Networks (DySPAN5), Baltimore, USA (2005) 5. Ghasemi, A., Sousa, E.S.: Collaborative Spectrum Sensing for Opportunistic Access in Fading Environments. In: Proc. IEEE Symp. New Frontiers in Dynamic Spectrum Access Networks (DySPAN5), Baltimore, USA, vol. 81, pp. 131–136 (2005) 6. Mishra, S.M., Sahai, A., Brodersen, R.: Cooperative Sensing Among Cognitive Radios. In: Proc. IEEE Int. Conf. Commun., Turkey, vol. 4, pp. 1658–1663 (2006) 7. Cabric, D., Mishra, S.M., Brodersen, R.W.: Implementation Issues in Spectrum Sensing for Cognitive Radios. In: Proc. of Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, USA, pp. 772–776 (2004) 8. Sun, C., Zhang, W., Letaief, K.B.: Cluster-based Cooperative Spectrum Sensing for Cognitive Radio Systems. In: Proc. IEEE Int. Conf. Commun., Glasgow, Scotland, UK, pp. 2511–2515 (2007) 9. Hur, Y., Park, J., Woo, W., Lim, K., Lee, C.H., Kim, H.S., Laskar, J.: A Wideband Analog Multi-resolution Spectrum Sensing (MRSS) Technique for Cognitive Radio (CR) Systems. In: Proc. IEEE Int. Symp. Circuit and System, Greece, pp. 4090–4093 (2006) 10. Sahai, A., Hoven, N., Tandra, R.: Some Fundamental Limits on Cognitive Radio. In: Proc. Allerton Conf. on Communications, control, and computing, Monticello (2004) 11. Digham, F.F., Alouini, M.S., Simon, M.K.: In the Energy Detection of Unknown Signals Over Fading Channels. In: Proc. IEEE Int. Conf. Commun., Anchorage, AK, USA, pp. 3575–3579 (2003) 12. Zhang, W., Mallik, R.K., Letaief, K.B.: Cooperative Spectrum Sensing Optimization in Cognitive Radio Networks. In: Proc. IEEE Int. Conf. on Commun., Beijin, pp. 3411–3415 (2008)
Exact Bit Error Probability of Multi-hop Decode-and-Forward Relaying with Selection Combining Bao Quoc Vo-Nguyen and Hyung Yun Kong Department of Electrical Engineering, University of Ulsan, San 29 of MuGeo Dong, Nam-Gu, Ulsan, 680-749 Korea {baovnq,hkong}@mail.ulsan.ac.kr http://wcomm.ulsan.ac.kr
Abstract. In this paper, an exact closed-form bit error rate expression for M-PSK is presented for the multi-hop decode-and-forward relaying (MDFR) scheme, in which the selection combining technique is employed at each node. We show that the proposed protocol offers a remarkable diversity advantage over direct transmission as well as over the conventional decode-and-forward relaying (CDFR) scheme. Simulations are performed to confirm our theoretical analysis. Keywords: Bit Error Rate (BER), Decode-and-Forward Relaying, Rayleigh fading, Selection Combining, M-PSK, cooperative communication.
1 Introduction
Recently, dual-hop relaying transmission has gained increasing attention in the form of cooperative communications and is treated as one of the candidates for overcoming channel impairments such as fading, shadowing, and path loss [1]. The main idea is that, in a multi-user network, two or more users share their information and transmit jointly as a virtual antenna array, which enables them to obtain higher diversity than they could individually [1-9]. In the past, relatively few contributions have evaluated the performance of the DF relaying protocol with multiple relays and maximal ratio combining (MRC) or selection combining (SC) [2-9]. In particular, in [2], Hu and Beaulieu derived a closed-form expression for the outage probability of CDFR networks with SC when the statistics of the channels between the source, relays, and destination are independent and identically distributed (i.i.d.) or independent but not identically distributed (i.n.d.). In [4, 5], the performance of CDFR with maximal ratio combining at the destination, in terms of outage probability and bit error probability over independent but not identically distributed channels, was also examined. In [3, 6-9], a class of multi-hop cooperative schemes employing decode-and-forward relaying with MRC, called multi-hop decode-and-forward relaying (MDFR), was proposed, and various performance metrics were provided.
However, to the best of the authors' knowledge, no exact expression has been published for the bit error rate of MDFR with selection combining over either i.i.d. or i.n.d. Rayleigh fading channels. In this paper, we focus on selective decode-and-forward relaying, in which each relay makes an independent decision on whether or not to decode and forward the source information [1]. In addition, a cooperative diversity protocol for multi-hop wireless networks is applied that allows relay nodes to exploit all the information they overhear from the preceding nodes along the route to the destination, increasing the chance of cooperation. Accordingly, the receiver at each node can employ a variety of diversity combining techniques to obtain diversity from the multiple signal replicas available from its preceding relaying nodes and the source. Although optimum performance is desirable, practical systems often sacrifice some performance to reduce complexity. Instead of using maximal ratio combining, which requires exact knowledge of the channel state information, a system may use selection combining, the simplest combining method: it selects only the best signal out of all replicas for further processing and neglects the remaining ones. The benefit of using SC instead of MRC is reduced hardware complexity at each node in the network. It also reduces the computational cost and may even lead to better performance than MRC, because in practice channels with very low SNR cannot be estimated accurately and contribute mostly noise. The contributions of this paper are as follows. We derive an exact closed-form bit error rate expression for M-PSK in the MDFR scheme. In addition, the performance of MDFR is compared with that of CDFR [2], confirming that the proposed protocol outperforms CDFR over the whole range of operating SNRs. The rest of this paper is organized as follows. In Sect. 2, we introduce the model under study and describe the proposed protocol. Section 3 presents the formulas for evaluating the average BER of the system. In Sect. 4, we contrast the simulations with the results predicted by theory. Finally, the paper is concluded in Sect. 5.
2 System Model
We consider a wireless relay network consisting of one source, K relays, and one destination operating over slow, flat Rayleigh fading channels, as illustrated in Fig. 1. The source terminal (T_0) communicates with the destination (T_{K+1}) via K relay nodes denoted T_1, ..., T_k, ..., T_K. Due to Rayleigh fading, the channel powers, denoted \alpha_{T_i,T_j} = |h_{T_i,T_j}|^2, are independent exponential random variables, where h_{T_i,T_j} is the fading coefficient from node T_i to node T_j, with i = 0, ..., K, j = 1, ..., K+1, and i < j. We define \lambda_{T_i,T_j} as the expected value of \alpha_{T_i,T_j}. The average transmit powers of the source and the relays are denoted \rho_{T_i}, i = 0, ..., K. We further define \gamma_{T_i,T_j} = \rho_{T_i} \alpha_{T_i,T_j} as the instantaneous SNR per bit of the link T_i \to T_j.
Fig. 1. A MDFR system with 3 relays (K = 3)
For medium access, a time-division channel allocation scheme with K+1 time slots is used to realize orthogonal channelization, so no inter-relay interference is considered in the signal model. According to the selective DF relaying protocol [1], each relay decides whether to cooperate with the source in its own time slot based on the quality of its received signals. Since selection combining is used, the relay adaptively chooses the strongest available signal (in terms of instantaneous SNR) to demodulate and then checks whether the received data are correct. If they are correct, the relay cooperates with the source in its transmission time slot; otherwise, it keeps silent. We define a decoding set D(T_k) for node T_k, k = 1, 2, ..., K+1, whose members are the preceding relays that decoded successfully, so D(T_k) is a subset of C = {T_1, T_2, ..., T_K}. In a real scenario, the decoding set is determined after receiving one frame by employing a cyclic redundancy check (CRC). However, in this paper, we assume that the decoding set can be decided symbol by symbol, for mathematical tractability of the BER calculation [4]. We further assume that the receivers at the destination and relays have perfect channel state information (CSI), while no transmitter CSI is available at the source or relays.
3 BER Analysis
Similarly to [2-7], by applying the theorem of total probability, the bit error rate of multi-hop decode-and-forward relaying can be derived as a weighted sum of the bit error rates for SC at the destination, B_D[D(T_{K+1})], corresponding to each decoding set D(T_{K+1}). Thus the end-to-end bit error rate for M-PSK of the system, P_b, can be written as

P_b = \sum_{D(T_{K+1}) \in 2^C} \Pr[D(T_{K+1})]\, B_D[D(T_{K+1})] ,   (1)

where 2^C denotes the power set of C, that is, the set of all subsets of C.
Since selection combining is exploited at each relay and at the destination, the signal with the largest SNR is always selected among the signals received from the decoding set and from the source. Let us define \{\gamma_i\}_{i=1}^{n_k} as the instantaneous SNRs per bit of the paths received by node T_k from the set D^*(T_k), with expected values \{\bar\gamma_i\}_{i=1}^{n_k}, where D^*(T_k) = D(T_k) \cup \{T_0\} and n_k = |D^*(T_k)| is the cardinality of D^*(T_k). Under the assumption that all links experience independent fading, the cumulative distribution function (CDF) of

\beta_k = \max_{T_i \in D^*(T_k)} \rho_{T_i} \alpha_{T_i, T_k} = \max_{i=1,\ldots,n_k} \gamma_i

can be determined by [10]

F_{\beta_k}(\gamma) = \Pr[\gamma_1 < \gamma, \ldots, \gamma_i < \gamma, \ldots, \gamma_{n_k} < \gamma] = \prod_{i=1}^{n_k} \left(1 - e^{-\gamma/\bar\gamma_i}\right) .   (2)
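Eqn. (2) is simply a product of exponential CDFs, one per available branch; a small numerical sketch (Python with NumPy; the branch SNRs are arbitrary example values) is:

```python
import numpy as np

def sc_cdf(gamma, gamma_bars):
    """Eqn. (2): CDF of the selection-combiner output SNR."""
    gamma_bars = np.asarray(gamma_bars, dtype=float)
    return float(np.prod(1.0 - np.exp(-gamma / gamma_bars)))

print(sc_cdf(3.0, [2.0, 4.0, 6.0]))   # three i.n.d. branches, linear-scale average SNRs
```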
Hence, the joint pdf of \beta_k is obtained by differentiating (2) with respect to \gamma [11]:

f_{\beta_k}(\gamma) = \frac{\partial}{\partial\gamma} F_{\beta_k}(\gamma) = \sum_{i=1}^{n_k} (-1)^{i-1} \sum_{\substack{m_1, \ldots, m_i = 1 \\ m_1 < \cdots < m_i}}^{n_k} \omega_i\, e^{-\omega_i \gamma} .   (3)
0 are a lot, the amount of transmittable packets is limited by the number of transmittable packets. \Phi_k(t) also compares which is smaller: the number of packets transmittable on the subchannel alone, or the estimated amount of packet loss. For all packets in the queue, therefore, it is not necessary to check whether the maximum delay time is exceeded; it suffices to check from the head of the buffer up to the number of packets transmittable on subchannel n of user k plus one. If a user feeds back information on two or more subchannels, the largest number of transmittable packets among them is used, since there is only one estimated amount of packet loss even when the user has several subchannels. The limited value of user k, C_k(t), is defined as

C_k(t) = A_{k,bsc}(t) + 1 ,   (4)
where A_{k,bsc}(t) denotes the number of transmittable packets on the subchannel with the best channel state, and bsc denotes the subchannel with the best channel state among the subchannels of user k, defined as

bsc = \arg\max_{n} A_{k,n}(t) .   (5)
4 Simulation Environments and Results

4.1 Voice Packet Model

The voice source generates a pattern of "active" and "non-active" durations that are independent of each other, with average lengths of 1 and 1.35 seconds, respectively. Voice packets are generated only during active durations, at a rate of 16 kbps and in units of 320 bits. Assuming 50 frames per second, the QoS of the voice service can be satisfied even if only one voice packet is transmitted per frame. The lengths of the ON and OFF periods are exponentially distributed. If t_1 is the average active duration and the active duration ends within a frame of length T, the transition probability \gamma is

\gamma = 1 - e^{-T/t_1} .   (6)

Similarly, if t_2 is the average non-active duration, the transition probability \sigma is

\sigma = 1 - e^{-T/t_2} .   (7)
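Under the assumed exponential ON/OFF model, eqns. (6)-(7) give per-frame transition probabilities, so the voice source can be simulated as a two-state chain. The sketch below (Python with NumPy) is illustrative only; the frame period of 1/50 s follows the 50-frames-per-second remark above, and the 320-bit packet size is taken from the model.

```python
import numpy as np

def voice_source(num_frames, frame_T=1.0 / 50, t_on=1.0, t_off=1.35, seed=0):
    """Per-frame packet sizes (bits) of the ON/OFF voice model of Sect. 4.1."""
    gamma = 1.0 - np.exp(-frame_T / t_on)    # eqn. (6): P(active period ends)
    sigma = 1.0 - np.exp(-frame_T / t_off)   # eqn. (7): P(silent period ends)
    rng = np.random.default_rng(seed)
    active, packets = True, []
    for _ in range(num_frames):
        packets.append(320 if active else 0)
        active = (rng.random() >= gamma) if active else (rng.random() < sigma)
    return packets
```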
4.2 Video Packet Model

The real-time video streaming traffic model consists of continuously generated frames of constant duration, each of which is divided into a fixed number of packets. The size of each packet is determined by a Pareto distribution [11]. In a video session, video frames arrive continuously, and 8 variable-length video packets are generated per frame. The length of each packet and the inter-arrival time between packets are determined by the Pareto distribution, denoted Pareto(k, m, α), where k is the minimum value, m is the maximum value, and α is the shape parameter. The average and peak values of the Pareto distribution follow from these parameters as given in eqns. (8) and (9).
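Packet sizes from the Pareto(k, m, α) model can be drawn by inverse-CDF sampling of a Pareto tail and clipping at the maximum m; the truncation convention and the parameter values in this sketch are assumptions for illustration.

```python
import numpy as np

def truncated_pareto(k, m, alpha, size, seed=0):
    """Samples from Pareto(k, m, alpha): Pareto with minimum k, clipped at m."""
    rng = np.random.default_rng(seed)
    u = rng.random(size)
    x = k / (1.0 - u) ** (1.0 / alpha)    # inverse CDF of the Pareto(k, alpha) tail
    return np.minimum(x, m)

sizes = truncated_pareto(k=20, m=2000, alpha=1.2, size=8)  # 8 packets per video frame
```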
Table 1. The system parameters
Parameter                                      Value
System                                         OFDMA / TDD
Downlink channel bandwidth                     20 MHz
Number of slots per frame                      10
Number of subchannels for data transmission    12
Number of subcarriers per subchannel           128
Frame duration                                 10 ms
Slot duration                                  1 ms
4.3 Simulation Result Analysis

The simulation is based on the requirements of IEEE 802.22 WRAN and uses a simulator written in C. The superframe structure of the WRAN system consists of 16 frames; to analyze the system from the viewpoint of resource allocation, the control frame part is excluded. A QP frame exists in each superframe. To support voice and video users, we consider two service classes: the number of video users is fixed at 20, while the number of voice users is increased from 50 to 100 in steps of 10. We assume that all users feed back the best 3 of their 12 subchannel states to the base station. The required packet loss rates of the voice and video services are set as specified in [2], [3], and the maximum delay times for voice and video are 20 ms and 40 ms, respectively. The other system parameter values are given in Table 1. To analyze the performance of the proposed scheduling algorithm, we compare the existing M-LWDF and the existing PLFS with the proposed scheduling algorithm, in terms of the packet loss rate and the overall throughput. Fig. 3 shows the PLR of the voice service. In the figure, "PLA is applied" denotes the case in which the proposed scheduling algorithm is applied. Parts of all curves except the PLFS are not drawn in the figure because their PLR values are very low.
Fig. 3. Packet loss rate of the voice service
Fig. 4. Packet loss rate of the video service
Fig. 5. The throughput of the whole users
The PLFS draws a smooth curve because it takes the PLR into account, but it cannot satisfy the required PLR. The reason that the PLR of the PLFS is much larger than that of the other scheduling algorithms is that resource allocation in the PLFS is driven by the instantaneous PLR among users. The M-LWDF satisfies the required PLR up to 80 voice users, but not beyond that, whereas the proposed scheduling algorithm satisfies it completely. Fig. 4 shows the PLR of the video service. Although the required PLR level is much lower than for the voice service, the video service shows lower performance because it generates many more packets per frame than the voice service. We can see that none of the scheduling algorithms satisfies the required PLR, but the proposed scheduling algorithm comes closest. Comparing Figs. 3 and 4, we find that the proposed scheduling algorithm has similar PLR performance for both the voice and the video service. From this we can infer that the PLA factor has a greater influence on a user's PLR than the packet delay at the head of the queue (HOL). Fig. 5 shows the throughput of all users, calculated as (the number of packets successfully transmitted) / (the number of packets generated). As shown in the figure, the performance gap is larger for the M-LWDF than for the PLFS. If the performance of the proposed scheduling algorithm is hardly affected by the scheduling algorithm used for the second resource allocation, a simpler scheduling algorithm could be used for that stage. From Fig. 5, we find that the performance of the existing scheduling algorithms is enhanced when the proposed scheduling algorithm is applied. From now on, to denote the proposed packet scheduling algorithm more concisely, we use EPLA (the estimated PLA). Figs. 6 and 7 show the number of checked packets in the queues storing voice traffic and video traffic, respectively. In the voice traffic case, the number of checks differs little between the conventional EPLA and the EPLA with the simple method applied, because one packet per frame is enough for data transmission given the characteristics of voice traffic, and in most cases the number of packets in the queue is less than the number of packets transmittable. In the video traffic case, the number of checked packets in the queue for the EPLA with the simple method is reduced by about half compared with the conventional EPLA, because the amount of video traffic generated is much greater than that of voice traffic, and in most cases it
746
Y.-d. Lee, T.-j. Yun, and I. Koo
Fig.6. The comparison for the number of checking packet in the queue in voice traffic case
Fig.7. The comparison for the numberr of checking packet in the queue in video traaffic case
is greater than the number of o packets transmittable so that it will increase the num mber of packets remaining in the queue.
Fig. 8. The compaarison for the performance through the packet loss rate
Fig. 8 shows that even th hough the number of checking packets in queue is reducced, the performance of packet loss rate is not reduced. Since the packet loss rate has direct relation to the through hput, we can think that the whole performance is not degraded.
5 Conclusion

A WRAN system inevitably experiences a deep delay due to the QP, and real-time traffic with a short lifetime requires urgent transmission; this situation generates many packet losses. Therefore, the WRAN system needs a scheduling algorithm that is sensitive to packet loss. In this paper, we have proposed a scheduling algorithm based on the estimation of the PLA for supporting real-time traffic in IEEE 802.22 WRAN systems. The proposed scheduling algorithm provides better PLR and throughput than existing scheduling algorithms such as the M-LWDF and the PLFS. In order to reduce the complexity of estimating the PLA, a simple calculation
reduction method has also been proposed, such that the number of packets checked in the queue is reduced by about half in the video traffic case without any performance degradation.
Acknowledgement This work was supported in part by the Ministry of Commerce, Industry, and Energy and in part by Ulsan Metropolitan City through the Network-based Automation Research Center at the University of Ulsan.
References

1. Mitola, J.: Cognitive Radio for Flexible Mobile Multimedia Communications. In: Proc. of IEEE Workshop on Mobile Multimedia Comm., pp. 3–10 (1999)
2. IEEE 802.22: IEEE 802.22/D0.2 Draft Standard for Wireless Regional Area Networks Part 22: Cognitive Wireless RAN Medium Access Control and Physical Specifications: Policies and Procedures for Operation in the TV Bands (2006)
3. IEEE 802.22: IEEE 802.22/D0.3.7 Draft Standard for Wireless Regional Area Networks Part 22: Cognitive Wireless RAN Medium Access Control and Physical Specifications: Policies and Procedures for Operation in the TV Bands (2007)
4. Shakkottai, S., Stolyar, A.L.: Scheduling for Multiple Flows Sharing a Time-varying Channel: The Exponential Rule. Bell Laboratories Technical Report (2000)
5. Shakkottai, S., Stolyar, A.L.: Scheduling Algorithms for a Mixture of Real and Non-real-time Data in HDR. Bell Laboratories Technical Report (2000)
6. Shin, S.J., Ryu, B.H.: Packet Loss Fair Scheduling Scheme for Real-time Traffic in OFDMA System. ETRI Journal 26(5), 391–396 (2004)
7. Park, T.W., Shin, O.S., Lee, K.B.: Proportional Fair Scheduling for Wireless Communication with Multiple Transmit and Receive Antennas. In: Vehicular Technology Conference, vol. 3, pp. 1573–1577 (2003)
8. Shakkottai, S., Stolyar, A.L.: Scheduling Algorithm for a Mixture of Real-Time and Non-Real-Time Data in HDR. In: 17th International Teletraffic Congress (2001)
9. Andrews, M., Kumaran, K., Ramanan, K., Stolyar, A., Whiting, P., Vijayakumar, R.: Providing Quality of Service over a Shared Wireless Link. IEEE Comm. Mag., 150–154 (2001)
10. Andrews, M., Kumaran, K., Ramanan, K., Stolyar, A., Vijayakumar, R., Whiting, P.: CDMA Data QoS Scheduling on the Forward Link with Variable Channel Conditions. Bell Labs Tech. Memo (2000)
Study on Multi-Depots Vehicle Scheduling Problem and Its Two-Phase Particle Swarm Optimization∗

Suxin Wang1, Leizhen Wang1, Huilin Yuan1, Meng Ge1, Ben Niu2, Weihong Pang1, and Yuchuan Liu1

1
Northeastern University at Qinhuangdao, Qinhuangdao Hebei, 066004 China 2 College of Management, Shenzhen University, Shenzhen, 518060 China
[email protected],
[email protected],
[email protected]

Abstract. To obtain a global solution to the multi-depot vehicle scheduling problem (MDVSP), MDVSP models are established, and a two-phase particle swarm optimization (TPPSO) is proposed to solve them. The optimization proceeds as follows. In the first phase, a particle position vector whose dimension equals the number of goods items is set up; each column of the vector corresponds to one goods item and each element is a random vehicle serial number, so that goods can be assigned to vehicles. In the second phase, a particle position matrix is set up; its number of columns equals the number of goods items carried by the vehicle, each column corresponds to one goods item, and the matrix has two rows, the first corresponding to the start depot of the goods and the second to its end depot. The matrix elements are random numbers between 0 and 1 and are sorted in ascending order according to the sorting rules, which yields a single vehicle route. The particles are then evaluated and filtered according to the optimization objective, and the procedure iterates until the termination condition is met. TPPSO can assign all freight to the vehicles and readily obtains an optimized solution.

Keywords: Multi-depot vehicle scheduling problem (MDVSP), Particle swarm optimization (PSO), Two-phase method.
1 Introduction

The vehicle routing problem (VRP) was first proposed by Dantzig and Ramser in 1959. The multi-depot vehicle scheduling problem (MDVSP) is more complex than the single-depot VRP, and many algorithms have been used to deal with it, such as a one-stage approach [1], an exact algorithm [2], variable neighborhood search (VNS) [3], genetic algorithms [4], and so on. These algorithms transform the MDVSP into several single-depot VRPs in which each depot is visited only once, and they easily fall into local minima. ∗
This work is partially supported by National Scientific and Technical Supporting Programs funded by the Ministry of Science & Technology of China (No. 2006BAH02A09), the National Natural Science Foundation of China (70431003), the Hebei Province Technical Research and Development Instruct Programs (072135214), the Shenzhen-Hong Kong Innovative Circle project (Grant no. SG200810220137A), and Project 801-000021 supported by the SZU R/D Fund.
To deal with this issue and obtain the global solution, a two-phase particle swarm optimization (TPPSO) is proposed in this paper for the MDVSP.
2 VSP Parameter Definitions and Model

2.1 Multi-Depot Freight Modes

The multi-depot vehicle freight modes are shown in Fig. 1(a).
[Figure 1 illustrates the freight modes: panel (a) lists the load/unload sequences of modes 1-3 across depots 1, 2, 3, ...; panels (b), (c), and (d) plot vehicle load against depots 1-7, bounded by the vehicle capacity, for modes 1, 2, and 3, respectively.]
Fig. 1. Multi-depots vehicle freight modes
In the multi-depot vehicle freight modes, a vehicle departs from a depot and returns to its start depot or moves to another depot after completing its deliveries. In mode 1, the vehicle loads continuously and then unloads continuously, so its load first rises and then falls, as shown in Fig. 1(b). In mode 2, the vehicle loads goods, then unloads them, and repeats; its load changes like a rectangular wave, as shown in Fig. 1(c). In mode 3, the vehicle loads and unloads goods in arbitrary order, so its load fluctuates, as shown in Fig. 1(d). Mode 3 includes modes 1 and 2; the vehicle's position is arbitrary and it need not return to its start point, which makes good use of both the goods and the vehicle resources. In the MDVSP, the start and end depots of each goods item should be visited once; if the goods exceed the vehicle capacity, the depot must be visited more than once.
2.2 Model Parameter Definitions

Let the graph be $G=(V,A)$, where $V$ is the set of depots, with depots $i, j, u \in V$, and $A$ is the arc set. $Q_v$ is the vehicle capacity. $N_v$ is the number of vehicles, and $n_v \in N_v$ is a vehicle serial number. If vehicle $n_v$ travels from depot $i$ to depot $j$, then $x_{ij}^{n_v}=1$; otherwise $x_{ij}^{n_v}=0$. $d_{ij}$ is the distance from depot $i$ to depot $j$. $L_{v\max}$ is the vehicle route length limit. $O_{ij}$ denotes the goods to be consigned from depot $i$ to depot $j$; the start and end depots of $O_{ij}$ are linked by an arrow. $q_{ij}$ is the quantity of $O_{ij}$, $q_{j}^{n_v}$ is the load of vehicle $n_v$ when it departs from depot $j$, and $N_O$ is the total number of goods items $O_{ij}$. If $O_{ij}$ is on vehicle $n_v$, then $o_{ij}^{n_v}=1$; otherwise $o_{ij}^{n_v}=0$. If $O_{ju}$ is to be loaded onto vehicle $n_v$, then $y_{ju}^{n_v}=1$; otherwise $y_{ju}^{n_v}=0$.
2.3 MDVSP Model

Assumptions: (1) Once $O_{ij}$ is loaded onto vehicle $n_v$, it remains on the vehicle until $n_v$ reaches the end depot of $O_{ij}$, i.e., goods are not transferred between vehicles. (2) Vehicle types are varied, there are enough vehicles, and vehicle starting positions are arbitrary. (3) When a vehicle has finished consigning all of its goods $O_{ij}$, it need not return to its start point.

Objective function:
$$\min \sum_{n_v \in N_v} \sum_{i\in V} \sum_{j\in V} d_{ij}\, x_{ij}^{n_v}, \qquad (1)$$

Subject to:
$$0 < q_{ij} \le Q_v, \qquad (2)$$
$$0 \le q_{j}^{n_v} \le Q_v, \qquad (3)$$
$$\sum_{i\in V} \sum_{j\in V} d_{ij}\, x_{ij}^{n_v} \le L_{v\max}, \qquad (4)$$
$$\sum_{n_v \in N_v} o_{ij}^{n_v} = 1, \qquad (5)$$

and when $x_{ij}^{n_v}=1$,
$$q_{j}^{n_v} = q_{i}^{n_v} - \sum_{i\in V} o_{ij}^{n_v} q_{ij} + \sum_{u\in V} q_{ju}\, y_{ju}^{n_v}. \qquad (6)$$
In the above model, the objective function (1) minimizes the total vehicle route length. The constraints on vehicle load and goods quantity are given in (2) and (3). The constraint on vehicle route length is given in (4). Constraint (5) ensures that each goods item is carried by exactly one vehicle. The vehicle loading and unloading balance is expressed in (6).
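As a concrete illustration, the following sketch (data structures and names are assumptions, not part of the paper) evaluates the objective (1) and checks constraints (2)-(4) for one candidate set of vehicle routes; constraint (5) is enforced by construction when each goods item is assigned to exactly one vehicle.

```python
# Illustrative sketch of evaluating objective (1) and checking constraints (2)-(4).
def route_length(route, dist):
    """Sum of d_ij over consecutive depots in one vehicle's route."""
    return sum(dist[i][j] for i, j in zip(route, route[1:]))

def feasible_and_cost(routes, loads, dist, Qv, Lv_max):
    """routes: {vehicle: [depot, ...]}, loads: {vehicle: [load after each depot]}."""
    total = 0.0
    for v, route in routes.items():
        length = route_length(route, dist)
        if length > Lv_max:                          # constraint (4)
            return False, float("inf")
        if any(q < 0 or q > Qv for q in loads[v]):   # constraints (2)-(3)
            return False, float("inf")
        total += length                              # objective (1)
    return True, total
```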
3 MDVSP Solution Strategy Based on TPPSO

3.1 Particle Swarm Optimization

Particle swarm optimization (PSO) [5-9] is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking and fish schooling. In PSO, every particle (individual) adjusts its "flying" according to its own flying experience and that of its companions. Every particle is treated as a point in a D-dimensional space. The number of particles is R, and the rth particle is represented as Xr = (xr1, xr2, ..., xrd, ..., xrD). The best previous position (the position giving the best fitness value) of the rth particle is recorded and represented as Pr = (pr1, pr2, ..., prd, ..., prD). The best position among all particles in the population is represented by Pg. The position change rate (velocity) of particle r is represented as Vr = (vr1, vr2, ..., vrd, ..., vrD). The particles are manipulated according to the following equations:

v_rd = w v_rd + c_1 b_1 (p_rd - x_rd) + c_2 b_2 (p_gd - x_rd),   (7)

x_rd = x_rd + v_rd,   (8)
In (7), c1 and c2 are two positive constants, b1 and b2 are two random numbers between 0 and 1, and w is the inertia weight. Equation (7) calculates the particle's new velocity from its previous velocity and the distances of its current position from its own best position and the group's best position. The particle then flies toward a new position according to (8). The dth (1 ≤ d ≤ D) dimension of X is bounded by [xd min, xd max] and of V by [vd min, vd max]; if xd or vd exceeds its bound during an iteration, it is set to the bound value. Particle positions and velocities are initialized randomly and updated according to (7) and (8) until the termination condition is met.

3.2 Two-Phase Particle Swarm Optimization

The MDVSP is optimized by TPPSO. The first phase of TPPSO assigns tasks (goods) to vehicles; the second phase optimizes each single vehicle route. The fix-and-optimize approach is shown in Fig. 2.

1) First-Phase Parameter Representation. In the first phase of TPPSO, goods are assigned to vehicles using the particle position vector shown in Fig. 3. Each column of the vector corresponds to an Oij, and each element is a random vehicle serial number; by comparing the vector elements we can find the goods that each vehicle is to carry. For example, in Fig. 3 vehicle nv carries goods Oi(j+3) and O(j+2)u.

2) Second-Phase Parameter Representation. In the second phase of TPPSO, a particle position matrix is set up. The number of columns equals the number of goods items carried by the vehicle, and each column corresponds to an Oij. The matrix has two rows: the first row corresponds to the start depot of Oij,
[Fig. 2 is a flowchart: the first phase assigns tasks (goods) to vehicles; the second phase optimizes each vehicle's route; if the termination condition is not met, the loop repeats, otherwise the procedure ends.]

Fig. 2. Two-phase method for MDVSP
X:
goods item:      Oij    Oi(j+3)   ...   O(i-1)(j-1)   O(j+2)u   Oju
vehicle number:  nv+2   nv        ...   nv-2          nv        nv-3
Fig. 3. First phase particle position vector expression
and the second row to the end depot of Oij. The matrix elements are random numbers between 0 and 1. The particle position matrix expression is presented in Fig. 4. To obtain a single vehicle route, the matrix elements are sorted in ascending order; since the elements correspond to depots, this sorting yields a single vehicle route.
X:
goods item:                  Oij   Oi(j+3)   ...   Oju
Oij start depot:             i     i         ...   j
Oij end depot:               j     j+3       ...   u
start depot random number:   0.2   0.5       ...   0.7
end depot random number:     0.1   0.7       ...   0.9
Fig. 4. Second phase particle position matrix expression
To avoid consigning Oij in the reverse direction, the sorting rules are: a) if the random numbers of several Oij are equal, a depot is selected at random so as to reach a global solution; b) the end depot of Oij cannot be visited before its start depot, so the random number of the start depot must be smaller than that of the end depot, otherwise the two random numbers are interchanged; if they are equal, the start depot comes first.

3.3 Optimization Process

The overall learning process can be described as follows:

Step 1: Initialize particles
1) Initialize the PSO parameters.
2) Generate random integers between 1 and Nv for the particle position matrix.
3) Generate random numbers between (1 - Nv) and (Nv - 1) for the particle velocity matrix.
4) Find the goods that each vehicle is to carry, and obtain each single vehicle route by arranging the matrix elements according to the sorting rules.
5) Calculate each particle's fitness value using the objective function.
6) Take the first particle position matrix as Pr and find Pg among the particles.

Step 2: Iterate the following steps until the termination condition is met
1) For every particle, calculate the particle velocity according to (7) and update the particle position according to (8); take the bound value if the particle's velocity or position exceeds its boundary.
2) Find the goods that each vehicle is to carry, and obtain each single vehicle route by arranging the matrix elements according to the sorting rules.
3) Calculate each particle's fitness value using the objective function.
4) If the fitness value is better than the best fitness value in history (Pr), set the current value as Pr and save the vehicle route.
5) Among all particles, choose the particle with the best fitness value as Pg and save its vehicle route; if there are multiple candidates for Pg, select one at random.
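A minimal sketch of the two encodings just described (the data, names, and sizes below are illustrative assumptions, not taken from the paper). Forcing the start-depot key to be smaller than the end-depot key implements sorting rule b) directly, so a decoded route never visits an item's end depot before its start depot.

```python
import random

goods = ["O_0_5", "O_6_5", "O_11_0"]            # goods items O_ij (assumed example)
start = {"O_0_5": 0, "O_6_5": 6, "O_11_0": 11}  # start depot of each O_ij
end   = {"O_0_5": 5, "O_6_5": 5, "O_11_0": 0}   # end depot of each O_ij
Nv = 2                                          # number of vehicles

# Phase 1: one position vector, one random vehicle serial number per goods item.
assign = [random.randint(1, Nv) for _ in goods]
by_vehicle = {v: [g for g, a in zip(goods, assign) if a == v] for v in range(1, Nv + 1)}

# Phase 2: per vehicle, two random keys per goods item (start row, end row);
# sorting the keys in ascending order yields the depot visiting sequence.
def decode_route(items):
    keys = []
    for g in items:
        ks, ke = sorted((random.random(), random.random()))   # start key < end key
        keys += [(ks, start[g]), (ke, end[g])]
    return [depot for _, depot in sorted(keys)]

for v, items in by_vehicle.items():
    print(v, decode_route(items))
```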
4 Computational Results
4.1 Example

In a given area and time period, the goods consignment information is shown in Fig. 5; the start and end depots of each goods item are linked by an arrow. The goods information is given in Table 1. We set Qv = 10 t and Lvmax = 100 km. The goods must be consigned along optimized vehicle routes without overloading; if overloading occurs, the objective function is penalized.

4.2 Optimized Solution

1) TPPSO for the MDVSP. The PSO parameters are set as c1 = c2 = 1.49445 and w = 0.729, the number of particles is set to seven times the number of columns, and the algorithm iterates 30 times. In the first phase of TPPSO, the particle position matrix has 17 columns according to Table 1, each column corresponding to an Oij. In the second phase, the particle position matrix columns are determined by the goods each vehicle is to carry. Running TPPSO 20 times, the optimized total route length is 190.24 km, obtained with two vehicles; the detailed information is given in Table 2. As Table 2 shows, depots 11, 13, 1, 8, 9, 0, 5, and 14 are visited more than once because of the vehicle load restriction, vehicles load or unload in arbitrary order, and the vehicle load fluctuates. These phenomena are related to the goods information and the depot positions.

2) ACO for the MDVSP. Ant colony optimization (ACO) is also applied to the MDVSP, with parameters set as follows: the relative influence of the pheromone trails α = 1, the relative influence of the distances β = 5, the number of ants equal to the number of vehicles, the trail retention coefficient ρ = 0.7, and the constant
Fig. 5. Goods consignment information

Table 1. Goods information

Oij        qij(t)   Oij         qij(t)   Oij         qij(t)   Oij         qij(t)   Oij        qij(t)
O(0)(5)    2        O(6)(5)     2        O(11)(0)    3.8      O(14)(3)    1.6      O(4)(8)    3.3
O(8)(5)    3        O(11)(16)   4.2      O(11)(13)   4.7      O(10)(14)   2.1
O(13)(5)   2.6      O(11)(7)    1.4      O(1)(9)     1.4      O(8)(14)    1.2
O(12)(5)   3.5      O(11)(2)    0.6      O(9)(1)     4.3      O(8)(15)    2
Q = 100, and 30 iterations. Running ACO 20 times, the optimized total route length is 204.87 km, obtained with three vehicles.

3) Comparison between ACO and TPPSO for the MDVSP. From the optimization results, TPPSO obtains an optimized total route length of 190.24 km with two vehicles taking part in the goods conveyance, whereas ACO obtains 204.87 km with three vehicles. Thus TPPSO outperforms ACO in optimizing the MDVSP.

Table 2. Load and unload information for the vehicle routes
vehicle   route length (km)   route (depot: operations; vehicle load in t)
1         98.33               11: load O(11)(16), O(11)(7), O(11)(2), O(11)(0); 10
                              16: unload O(11)(16); 5.8
                              12: load O(12)(5); 9.3
                              7: unload O(11)(7); 7.9
                              6: load O(6)(5); 9.9
                              2: unload O(11)(2); 9.3
                              5: unload O(6)(5), O(12)(5); 3.8
                              4: load O(4)(8); 7.1
                              1: load O(1)(9); 8.5
                              0: unload O(11)(0); 4.7
                              8: load O(8)(15), O(8)(14); unload O(4)(8); 4.6
                              9: unload O(1)(9); 3.2
                              10: load O(10)(14); 5.3
                              14: unload O(8)(14), O(10)(14); 2
                              15: unload O(8)(15); 0
2         91.91               11: load O(11)(13); 4.7
                              14: load O(14)(3); 6.3
                              13: unload O(11)(13); load O(13)(5); 4.2
                              9: load O(9)(1); 8.5
                              5: unload O(13)(5); 5.9
                              1: unload O(9)(1); 1.6
                              0: load O(0)(5); 3.6
                              8: load O(8)(5); 6.6
                              5: unload O(0)(5), O(8)(5); 1.6
                              3: unload O(14)(3); 0
5 Conclusions
There are two reasons why the global solution can be obtained. First, in the MDVSP model, goods with their start and end depots are denoted as Oij, so the goods consignment relations are explicit. Second, the vehicle routes over all goods are searched and optimized by TPPSO; the two optimization phases depend on each other, and the search tends to reach the global solution.
References

1. Lim, A., Wang, F.: Multi-depot Vehicle Routing Problem: A One-stage Approach. Automation Science and Engineering 397, 397–402 (2005)
2. Aristide, M.: The Multi-depot Periodic Vehicle Routing Problem. In: Zucker, J.-D., Saitta, L. (eds.) SARA 2005. LNCS (LNAI), vol. 3607, pp. 347–350. Springer, Heidelberg (2005)
3. Michael, P., Richard, F.H., Karl, D., et al.: A Variable Neighborhood Search for the Multi-depot Vehicle Routing Problem with Time Windows. Journal of Heuristics 10(6), 613–627 (2004). Springer Netherlands
4. Yang, Y.F., Cui, Z.M., Cheng, J.M.: An Improved Genetic Algorithm for Multiple-Depot Vehicle Routing Problem with Time Window. Journal of Soochow University (Engineering Science Edition) 26(2), 20–23 (2006)
5. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948 (1995)
6. Ben, N., Zhu, Y.L., He, X.X., Hai, S.: A Multi-swarm Optimizer Based Fuzzy Modeling Approach for Dynamic Systems Processing. Neurocomputing 71, 1436–1448 (2008)
7. Ben, N., Zhu, Y.L., He, X.X., Wu, H.: MCPSO: A Multi-Swarm Cooperative Particle Swarm Optimizer. Applied Mathematics and Computation 185(2), 1050–1062 (2007)
8. Niu, B., Li, L.: A Novel PSO-DE-based Hybrid Algorithm for Global Optimization. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 156–163. Springer, Heidelberg (2008)
9. Ben, N., Zhu, Y.L., He, X.X., Wu, H., Hai, S.: A Lifecycle Model for Simulating Bacterial Evolution. Neurocomputing 72, 142–148 (2008)
Image Segmentation to HSI Model Based on Improved Particle Swarm Optimization∗

Bo Zhao1, Yajun Chen2, Wenhua Mao1, and Xiaochao Zhang1

1
The Institute of Electrical and Mechanical Technology, Chinese Academy of Agricultural Mechanization Sciences, Beijing 100083, China 2 The Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
Abstract. According to the characteristics of particle swarm optimization, a method for image segmentation in the HSI model based on an improved particle swarm optimization is proposed in this paper. First, the basic principle of the algorithm is introduced. Second, its characteristics for image segmentation are analyzed. Finally, the image segmentation method based on the improved PSO is proposed, which effectively overcomes the slow convergence of the standard particle swarm optimization and the poor segmentation quality of other algorithms. Experimental results show that the improved algorithm is an effective image segmentation method in practical applications and can segment the object accurately.

Keywords: Image segmentation, HSI model, PSO.
1 Introduction

Image segmentation is a hot research issue in image processing and a key step in image analysis and image understanding. Traditional segmentation methods are effective for some images but limited for others arising in special fields with special characteristics. Therefore, intelligent algorithms such as neural networks, genetic algorithms, and swarm intelligence have been applied to image segmentation [1]. Swarm intelligence takes inspiration from the social behaviors of insects and other animals. In particular, birds have inspired a number of methods and techniques, among which the most studied and most successful is the general-purpose optimization technique known as particle swarm optimization (PSO). PSO is a powerful stochastic evolutionary algorithm with several advantages: it can solve a variety of difficult optimization problems and has
∗ Project supported by The National High Technology Research and Development Program of China (863 Program) (No. 2006AA10A305 and No. 2006AA10Z254), The National Natural Science Funds of China (No. 30771263) and The Key Technology R&D Program (No. 2007BAD89B04).
shown a faster convergence rate than other evolutionary algorithms on some problems. A great advantage of PSO is that it has very few parameters to adjust, which makes it particularly easy to implement. The algorithm therefore offers excellent performance and great development potential, and it has been successfully applied to function minimization, neural network training, data mining, fuzzy system control, and so on.
2 Particle Swarm Optimization

The PSO, first introduced by Kennedy and Eberhart [2], is a stochastic optimization technique that can be likened to the behavior of a flock of birds. A simple explanation of its operation is as follows. The population of the PSO is called a swarm, and each individual in the population is called a particle. Each particle represents a possible solution to the optimization task at hand. During each iteration, each particle accelerates in the direction of its own personal best solution found so far, as well as in the direction of the global best position discovered so far by any particle in the swarm. This means that if a particle discovers a promising new solution, all the other particles will move closer to it, exploring the region more thoroughly in the process [3],[4]. The swarm size of the PSO is denoted by s. Each particle has the following attributes: a current position xi in the search space, a current velocity vi, and a personal best position pi in the search space. During each iteration, each particle in the swarm is updated using (1) and (2).
v_{i+1} = ω v_i + c_1 r_1 (p_i − x_i) + c_2 r_2 (p_g − x_i).   (1)

The new position of a particle is calculated using

x_{i+1} = x_i + v_{i+1}.   (2)
The variable ω is the inertia weight; this value is typically set up to vary linearly from 0 to 1 during the course of a training run. The variables c1 and c2 are the acceleration coefficients, which control how far a particle will move in a single iteration. Typically, both are set to a value of 2.0, although assigning different values to c1 and c2 sometimes leads to improved performance. The variables r1 and r2 are two random numbers in the range (0, 1). The variable pg is the global best position found by all particles. The velocity vi of each particle can be clamped to the range [−vmax, vmax] to reduce the likelihood of particles leaving the search space. The value of vmax is usually chosen to be k × xmax, with 0.1 ≤ k ≤ 1.0 [3]. Note that this does not restrict the values of xi to the range [−vmax, vmax]; it only limits the maximum distance that a particle will move during one iteration.
3 Image Segmentation Based on the Improved PSO
The main purpose of using PSO here is to find an image segmentation method that can effectively process images in the HSI model, which is widely applied to traffic sign recognition, license plate recognition, and vision navigation of agricultural robots, as shown in Fig. 1; the processing effect of the HSI model is superior to that of other models such as the RGB and YUV models. Fig. 2 shows the H-component image obtained by applying (3) to the original color image of Fig. 1.
$$I = \tfrac{1}{3}(R+G+B), \quad S = 1 - \frac{3}{R+G+B}\min(R,G,B), \quad H = \arccos\left\{\frac{[(R-G)+(R-B)]/2}{[(R-G)^2+(R-B)(G-B)]^{1/2}}\right\}. \qquad (3)$$
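A direct sketch of the conversion in (3) for a single pixel (the small epsilon guarding division by zero is an implementation assumption; practical implementations often additionally map H to 2π − H when B > G, which is not part of (3)):

```python
import math

def rgb_to_hsi(r, g, b, eps=1e-12):
    """Convert one pixel (any common scale, e.g. [0, 1]) to (H, S, I) per eq. (3)."""
    i = (r + g + b) / 3.0
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    h = math.acos(max(-1.0, min(1.0, num / den)))
    return h, s, i

print(rgb_to_hsi(0.8, 0.2, 0.1))
```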
Fig. 1. The original color images: (a) an image for traffic sign recognition; (b) an image for vision navigation of an agricultural robot
Fig. 2. The H component of the HSI model: (a) the H component of Fig. 1(a); (b) the H component of Fig. 1(b)
3.1 Improved Algorithm

The results of the traditional PSO are accurate, but its convergence rate is slow, so the algorithm has to be improved [6]. To obtain both a high convergence rate and accurate image segmentation, we take the maximum, the minimum, and the average gray value as the starting positions of three particles, while the positions of the other particles are random numbers in the range (0, 255). We calculate the distance dij from each pixel to the initial position of every particle using (4), set up the personal best position of each particle and the global best position, run the PSO, and update the positions of all particles. Through this heuristic guidance, the improved PSO can find the optimal value quickly.
d_{ij} = (g_{ij} − x_k)².   (4)
The variable g_{ij} is the gray value of the image pixel, and x_k is the current position of each particle.
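The heuristic initialization and the distance-based fitness described above can be sketched as follows (the helper names are assumptions; in particular, aggregating d_ij by summing over all pixels is our reading, since the paper defines only the per-pixel distance in (4)):

```python
import numpy as np

def init_particles(gray_image, swarm_size=30):
    """Three seeds at min, max, mean gray level; the rest random in (0, 255)."""
    g = np.asarray(gray_image, dtype=float).ravel()
    seeds = [g.min(), g.max(), g.mean()]
    rest = np.random.uniform(0, 255, size=swarm_size - len(seeds))
    return np.concatenate([seeds, rest])

def fitness(threshold, gray_image):
    """Sum of per-pixel distances d_ij = (g_ij - x_k)^2 for a candidate threshold."""
    g = np.asarray(gray_image, dtype=float).ravel()
    return np.sum((g - threshold) ** 2)
```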
3.2 Algorithm Description
1) Initialize the parameters s, ω, c1, and c2, and set the number of iterations N.
2) Take the maximum, the minimum, and the average gray value as the positions of three particles, set the positions of the other particles randomly, and set the velocities of all particles randomly.
3) Calculate the distance dij from each pixel to the initial position of every particle using (4), and set up the personal best position of each particle and the global best position.
4) Start the loop: update the position and velocity of every particle using (1) and (2), clamp each particle's velocity and position, and evaluate the new fitness value using (4).
5) Compare the new fitness value of each particle.
6) If the global best position does not change, end the loop; otherwise, return to step 4 and continue.
7) Set the global best position as the segmentation threshold and segment the image.
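Step 7 applies the converged global best position as a binary threshold. A minimal sketch is given below; whether the object is the brighter or the darker class depends on the image, so the comparison direction here is an assumption.

```python
import numpy as np

def segment(gray_image, threshold):
    """Binary segmentation using the global best position as threshold."""
    g = np.asarray(gray_image, dtype=float)
    return (g > threshold).astype(np.uint8) * 255   # object mask vs. background
```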
4 Experimental Results and Discussion

4.1 Experimental Results
The improved PSO algorithm includes some tuning parameters that greatly influence its performance, often described as the exploration-exploitation trade-off. Exploration is the ability to test various regions of the problem space in order to locate a good optimum, hopefully the global one; exploitation is the ability to concentrate the search around a promising candidate solution in order to locate the optimum precisely [5]. These parameters therefore need to be adjusted. The key parameters of the improved PSO in this paper are the swarm size s, the maximum velocity vmax, the inertia weight ω, and the number of iterations N. If the swarm size s is too large, the running time increases and the segmentation results are poor; if it is too small, the segmentation results are also poor even though the running time is short. When the swarm size s is in the range 25 to 40, the segmentation results meet the requirements. As the number of iterations N increases, the segmentation time also increases rapidly. When the inertia weight ω is between 0.5 and 0.9, and when the maximum velocity vmax is between 0.4 and 0.9, the segmentation results meet the requirements. The acceleration coefficients c1 and c2 are typically set to 2. To meet practical requirements, the criterion for choosing the parameters is to shorten the time to converge and to improve the segmentation of different images. Based on the above experiments, the parameters of the improved algorithm were set to a swarm size s of 30, a maximum velocity vmax of 0.5, an inertia weight ω of 0.9, and a single iteration (N = 1). Twenty different images were tested with these parameters; the average cost time was 495 ms and the segmentation results were excellent, as shown in Fig. 3(a) and Fig. 4(a) (all images in this paper are 640 × 480).
Fig. 3. Image segmentation results for Fig. 2(a): (a) using the improved PSO; (b) using the PSO; (c) using the iterative threshold algorithm; (d) using K-means clustering
Fig. 4. Image segmentation results for Fig. 2(b): (a) using the improved PSO; (b) using the PSO; (c) using the iterative threshold algorithm; (d) using K-means clustering
Table 1. Analysis of the experimental results of the different algorithms

Algorithm                        Segmentation quality    Average cost time (ms)
Improved PSO                     excellent               495
PSO                              good                    19874
Iterative threshold algorithm    good                    31
K-means clustering               good                    83
To test segmentation quality, the PSO, the K-means clustering algorithm, and the iterative threshold algorithm were compared with the improved PSO. Fig. 3 and Fig. 4 show the segmentation results for the images in Fig. 2. Table 1 summarizes the experimental results for the twenty different images, including the segmentation quality and the average cost time. Segmentation quality was evaluated by subjective observation and by whether the subsequent processing could extract the object accurately.

4.2 Discussion
1) The improved PSO can be applied to HSI-model images, and its segmentation results are clearly better than those of the PSO, the iterative threshold algorithm, and K-means clustering, owing to its exact calculation and heuristic guidance.
2) The improved PSO obtains the segmentation threshold after only one loop, whereas the traditional PSO needs many loops and its running time is quite long.
3) Although the average running time of the improved PSO is longer than that of the iterative threshold algorithm and K-means clustering, it is acceptable.
5 Conclusion

According to the characteristics of the PSO, a method for image segmentation in the HSI model based on the improved PSO has been proposed in this paper. Experimental results prove that the improved PSO is an effective image segmentation method that can segment the object accurately.
References

1. Zheng, X.X., Yan, J.L.: A Survey of New Image Segmentation Methods. Computer and Digital Engineering 35(8), 103–106 (2007)
2. Eberhart, R.C., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proc. 6th Int. Symp. Micro Machine and Human Science, Nagoya, Japan, pp. 39–43 (1995)
3. Frans, V.D.B., Andries, P.: A Cooperative Approach to Particle Swarm Optimization. IEEE Transactions on Evolutionary Computation 8(3), 225–229 (2004)
4. Tony, H., Ananda, S.M.: A Hybrid Boundary Condition for Robust Particle Swarm Optimization. IEEE Antennas and Wireless Propagation Letters 4, 112–117 (2005)
5. Shi, Y., Eberhart, R.C.: Parameter Selection in Particle Swarm Optimization. In: Porto, V.W., Saravanan, N., Waagen, D., Eiben, A.E. (eds.) Evolutionary Programming VII, pp. 591–600. Springer, Berlin (1998)
6. Han, S., Zhang, Q., Ni, B., et al.: A Guidance Directrix Approach to Vision-based Vehicle Guidance Systems. Computers and Electronics in Agriculture 43, 179–195 (2004)
7. Shi, Y., Eberhart, R.C.: Empirical Study of Particle Swarm Optimization. In: Proc. Congr. Evolutionary Computation, Washington, DC, pp. 1945–1949 (1999)
8. Zhao, B., Qi, L.X., Mao, E.R., et al.: Image Segmentation Based on Swarm Intelligence and K-Means Clustering. The Journal of Information and Computational Science 4(3), 934–942 (2007)
9. Fang, C.Y., Chen, S.W., Fuh, C.S.: Road-Sign Detection and Tracking. IEEE Trans. on Vehicular Technology 52(5), 1329–1341 (2003)
10. Ng, H.F.: Automatic Thresholding for Defect Detection. In: IEEE Proc. Third Int. Conf. on Image and Graphics, pp. 532–535 (2004)
11. Niu, B., Zhu, Y.L., He, X.X., Shen, H., Wu, Q.H.: A Lifecycle Model for Simulating Bacterial Evolution. Neurocomputing 72, 142–148 (2008)
12. Niu, B., Li, L.: A Novel PSO-DE-based Hybrid Algorithm for Global Optimization. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 156–163. Springer, Heidelberg (2008)
13. Niu, B., Zhu, Y.L., He, X.X., Wu, Q.H.: MCPSO: A Multi-Swarm Cooperative Particle Swarm Optimizer. Applied Mathematics and Computation 185(2), 1050–1062 (2007)
Emotional Particle Swarm Optimization

Wei Wang, Zhiliang Wang, Xuejing Gu, and Siyi Zheng

School of Information Engineering, University of Science and Technology Beijing, 100083 Beijing, China
[email protected]

Abstract. It is known that most particle swarm optimization algorithms involve only information sharing; competition among particles, which is a useful feature for the search process, does not exist. To address this, and based on the idea of multi-agent systems with emotion, we introduce competition controlled by emotion to enhance the performance of PSO, after describing the similarity between a particle swarm and a multi-agent system. From an emotional point of view, the question of whether agents should compete is stated qualitatively, and a velocity threshold is deduced. Using these, the proposed method can improve both the local and the global search ability of particle swarm optimization. Simulation results show that the improvement is effective and that the resulting algorithm is efficient.

Keywords: Optimization algorithm, Particle swarm optimization, Emotion, Multi-agent, Velocity.
1 Introduction

Without centralized control or a global model, particle swarm optimization (PSO) provides a new way to find solutions to complex distributed problems. There are two key issues in PSO. First, achieving a faster convergence rate at the same search accuracy, or higher search precision at a comparable convergence speed. Second, effective methods for solving practical problems based on the algorithm need to be studied. Research on PSO has been carried out for more than a decade. The main contributions concern parameter setting [1, 2], maintaining swarm diversity [3, 4], changing the population structure [5], fusing multiple algorithms [6, 7], and theoretical analysis. The first kind of improvement usually adjusts the inertia weight ω and the acceleration constants c1, c2; it is convenient, but its effect is limited. The second uses various mechanisms to avoid premature convergence during the search. The third changes the topology structure of the particles to influence how information spreads. The last kind can exploit the advantages of many other algorithms, but it generally increases algorithm complexity. Clearly, improving the algorithm can enhance its performance. Unlike the methods above, a different idea is proposed here from another point of view. It is known that PSO involves only information sharing; competition among particles, which is a useful feature for the search process, does not exist. To make use
of this feature, PSO is first explained from the multi-agent perspective of artificial intelligence. Emotion is then brought in to reach the goal stated above, and thus an emotional PSO (EPSO) is proposed. The effect of the improvement is shown through experiments, and the algorithm is also applied to a clustering problem. This article is structured as follows. Section 2 describes the similarity between a particle swarm and a multi-agent system from three aspects: particle description, single-particle action, and the topology structure among particles, and explains why the multi-agent idea is adopted. Section 3 brings emotion into the swarm, analyzes its rationality, and deduces the upper threshold of particle speed. Section 4 tests the algorithm through experiments on benchmark functions and applies the improved algorithm to a clustering problem. Finally, the article is concluded in Section 5.
2 Similarity Explanation and Analysis

This section first introduces the standard PSO and then, based on it, explains and analyzes the similarity between particles in a swarm and agents in a multi-agent system.

2.1 Standard PSO

Assume that there are n particles in an M-dimensional search domain. X_i = (x_{i1}, x_{i2}, ..., x_{im}), i = 1, 2, ..., n, is the location vector of particle i, and V_i = (v_{i1}, v_{i2}, ..., v_{im}) is its speed vector; both have M dimensions. P_i = (p_{i1}, p_{i2}, ..., p_{im}) is the location with the best fitness found by particle i during the optimization, called the individual best location. Correspondingly, P_g = (p_{g1}, p_{g2}, ..., p_{gm}) is the global best location, representing the best solution found so far. When particle i evolves from generation t to generation t+1, its speed and location in the jth dimension are updated by the following equations:
v_{ij}(t+1) = ω(t) v_{ij}(t) + c_1 r_1 (p_{ij}(t) − x_{ij}(t)) + c_2 r_2 (p_{gj}(t) − x_{ij}(t))
x ij ( t + 1 ) = x ij ( t ) + v ij ( t + 1 )
where, i = 1, 2, L , n, j = 1, 2, L , M , constant, and
ω
(1)
,
(2)
,
is inertia weight,
c1 ,c 2 is acceleration
r1 , r2 is uniform distributed random numbers in domain of
[ 0 ,1 ]
.
2.2 Explanation of Standard PSO Based on Multi-agent Idea Agent is defined as a computing system which tries to reach a goal in complex dynamic surrounding by Maes. It could read environment by sensors, and interact with outside by actuators. And Wooldridge and Jennings gave the definition of agent. According to the definition above, particles swarm could be considered as a generalized multi-agent system. And every particle is regarded as an agent. The reason to think so is that particles locate in solution space spontaneously. Moreover, they search global solution by themselves. Soon after, similarity between particles swarm and
768
W. Wang et al.
multi-agent system will be introduced from three aspects: particle description, single particle action and topology structure among multi particles. Particle Description Definition 1: (Particle State). At a point of the discrete time, position of a particle in swarm is S i : S i = Xi = ( xi1,xi 2 ,… ,xim ), i = 1,2,…, n . Where m is the dimension of solution space, and the number of agent in multi-agent system is n . Definition 2: (Particle Environment). At a point of the discrete time, collection of limited particle states in swarm is
n
US
i
. Pg is the best state. And collection E constitutes all Pg at
i =1
every point of the discrete time which is called particle environment E = {Pg , Pg' , …} . Definition 3: (Particle Action). It could be deduced from evolution equation of particle swarm as follows: v ij ( t + 2 ) = ω ( t + 1) v ij ( t + 1) + c1 r1 ( p ij ( t + 1) − x ij ( t + 1)) + c 2 r2 ( p gj ( t + 1) − x ij ( t + 1)) , x ij ( t + 2 ) = x ij ( t + 1 ) + v ij ( t + 2 )
,
(3) (4)
x ij ( t + 2 ) = [1 + ω ( t + 1 ) − c 1 r1 − c 2 r 2 ] ⋅ x ( t + 1 ) − ω ( t + 1 ) ⋅ x ( t ) + c 1 r1 ⋅ p i ( t + 1 ) + c 2 r 2 ⋅ p g ( t + 1 ) = ϕ [ x ( t + 1 ), x ( t ), p i ( t + 1 ), p g ( t + 1 )].
(5)
In Eq. (5) obtained above, ϕ (⋅) denotes particle action. ϕ (⋅) = {ϕ (⋅), ϕ ' (⋅), L} . According to particle states of the first two steps, historical best state shored in particle and particle environment of multi-agent system, certain particle action is taken. Thus, a new particle state could be got. Single Particle Action Focusing on the agent with inner state, single particle action could be explained in Figure 1.
Fig. 1. This shows single particle activity. There exist n particles corresponding to n agents. Relation between agents and environment is demonstrated as well as the structure in agent.
As intelligent agent, there is inner data structure in it. For the ith particle, states of the first two steps S i (t − 1) , Si (t − 2) and its historical best state store in the data structure. Suppose that S ={Si (t −1),Si (t − 2),Pi (t −1)} is collection of all inner historical
Emotional Particle Swarm Optimization
769
state. Mapping relations of See, Action and Next is given in paper [8]. From environment to perception, See : E → Per . From inner state to action, Action : S → ϕ (⋅) . And from inner state and perception to inner state, Next : S × Per → S .To sum up, as an intelligent agent, particle action could be concluded as follows. Firstly, agent has an initial inner state s 0 ∈ S . Then, it observes particle environment Pg to produce perception See(Pg ) . Secondly, inner state of agent turns into Next( S0 , See(Pg )) using Next function. Finally, agent goes into a new loop with Action( Next(s0 ,See(Pg ))) . Topology Structure among Multi Particles A standard multi-agent system includes some agents which could interact with each other. They response to and influence on environment. Particles in swarm also have interactive topology structure Figure 2 shows five types defined by Mendes and Kennedy.
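A minimal sketch of this See/Next/Action control flow for one particle-agent is given below; all function bodies are placeholder assumptions, and only the loop structure follows the description above.

```python
def see(environment):            # See: E -> Per, observe the swarm best Pg
    return environment["Pg"]

def next_state(state, per):      # Next: S x Per -> S, fold perception into inner state
    return {**state, "Pg_seen": per}

def action(state):               # Action: S -> phi, produce the move for this step
    return ("update_velocity_and_position", state["Pg_seen"])

state = {"S_t": None, "S_t_1": None, "P_i": None}    # initial inner state s0
per = see({"Pg": [0.0, 0.0]})
state = next_state(state, per)
move = action(state)
```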
Fig. 2. This is topological structure of swarms. To simple optimization problem, particles connect to each other with All-Connection structure shown in Figure 2(a) usually. Particles are able to control particle environment Pg as agents could do. But situation is different on complex condition. Structures displayed in Figure 2(c)-(e) are adopted for agent could only control it partially.
2.3 Analysis Through the description of similarity between particles swarm and multi-agent system from three aspects above, consider PSO from another point of view is possible. According to the view of Nolfi and Floreano, evolution and study are two types with which organism could adapt to environmental change. They make agents in multi-agent system enhance the surviving possibility through competing with and studying from another agent. On the one hand, study is a strategy of agent itself to adapt to environment. Agents use some helpful information from outside to change its inner state. Sense changing, and adapt itself to it. This point exists in PSO already, such as sensing environment Pg . On the other hand, evolution is strategy of multiagent system. It does not reflect in PSO. So competition among particles is brought in to enhance solving efficiency of algorithm in the article.
3 Emotional Reflections in Competition In this section, emotional reflection is stated firstly. Based on this, how EPSO could capture competitive ability is proposed.
770
W. Wang et al.
3.1 Emotion PSO simulates scene that birds prey on food. Starvation exposes birds to death danger. To avoid this, they speed up to search food. But searching process reduces guard time which makes birds be preyed on. So when catching food, they slow down in order to get safe. Interpretation of contents above is given below in emotional view for emotion could influence actions [9]. According to Maslow’s hierarchy theory of needs, needs could be divided into seven levels. In addition, the first two levels are called low demands. P. V. Siminov proposed the relation between information and emotion. Based on this theory, emotional model could be built from the point of information as follows: E = − N (In − Ia) ,
(6)
where, E denotes emotion. N is need coefficient which influenced by physiological needs and security needs. N = ( N p ⊕ Ns ) . I n is needs information. And I a is received information [10]. When emotion is passive (E0), relevant action is promoted. To the agents that do not find the best solution, I n of them is the best solution. I a includes the solution they got. So E>0, prey action is promoted according to statement above. And to the agents that find the best solution, the situation is the opposite. 3.2 Competitive Action in EPSO To bring in competitive action to improve PSO algorithm, an energy function should be proposed firstly. Energy ( k , i ) = Starvation
max
− Starvation ( k , i ) ,
(7)
where, Energy(k ,i) is the ith agent energy value at the time of k. Starvationmax is maximum starvation value of agents. Correspondingly, Starvation(k ,i) is current one of the ith agent. As an agent, every particle strives to increase its energy. In Eq. (7), energy increase denotes that Starvation(k, i) must decrease. Starvation value depends on whether agent finds the best solution, such as birds prey on food. If getting the best solution, Starvation(k, i) will reduce. Thus, competition among them happens. Considering physiological needs and security needs at the same time, starvation value is given as x(t ) = x0 e −art [11]. Where, x0 is initial value, and t is diet time.
a, r are constants. Because PSO uses difference equation for searching, some changes on the formula above should be done. New starvation value is expressed as: [ Starvation ( k ,1) Starvation ( k , 2 ) ... Starvatio n ( k , n ) ] T = [ Starvation ( 0 ,1) Starvation ( 0 , 2 ) ... Starvatio n ( 0 , n ) ] ⎡ e − ar ( n1 k − n10 ) ⎢ 0 ×⎢ ⎢ M ⎢ 0 ⎢⎣
0 e − ar ( n 2 k − n 20 ) M
L L M
L
0
⎤ ⎥ ⎥, ⎥ ⎥ e − ar ( n nk − n n 0 ) ⎥⎦ 0 0 0
(8)
Emotional Particle Swarm Optimization Starvation ( k ) T = Starvation (0) T × e C
where,
771
(9)
0 0 L ⎡− ar ( n1k − n10 ) ⎤ ⎢ ⎥, 0 ( ) 0 − − ar n n L 2k 20 ⎥ C=⎢ ⎢ ⎥ 0 M M M ⎢ ⎥ 0 0 0 ( ) − − ar n n nk n 0 ⎣ ⎦
th nik is the number of getting the best solution for the i agent at the time of k. While the ith agent gets the best value, nik adds one. Otherwise, it subtracts one. ni 0 is an initial constant .For maximum speed of preying on food is proportional to starvation value, it could be deduced as follow. b is a constant.
(10)
Vmax ( k ) T = b × Starvation(k ) T .
4 Simulation and Results 4.1 Test with Benchmark Function With improved method, three groups of comparing experiments with four Benchmark functions have been done. Emotion could be added to existed PSO algorithms, so experiments are comparison between standard PSO (SPSO) and SPSO with emotion, comparison between chaos PSO (CPSO) and CPSO with it, and comparison between niche PSO (NPSO) and NPSO with it. Four Benchmark functions are shown as: 1) Levy No.5 Function 5
5
i =1
j =1
min F1 ( x, y) = ∑[i × cos((i − 1) × x + i )]× ∑[ j × cos(( j + 1) × y + j )] + ( x + 1.42513) 2 + ( y + 0.80032) 2 .
(11)
2) Shaffer’s F6 Function min
F2 ( x , y )
=
2
sin
2 − 0 .5 x 2 + y
( 1 + 0 . 001
( x 2 + y 2 )) 2
−
0 .5
.
(12)
3) Generalized Schwefel’s Problem 2.26 min
F3 (X ) = −
30
∑
( x i ⋅ sin(
x i )) .
i=1
(13)
4) Generalized Griewank Function min
F4 (X ) =
1 4000
30
∑
i =1
x i2 −
30
∏
i =1
cos(
xi ) + 1. i
(14)
Characters of every function are listed in Table 1. Because F3 and F4 have 30 dimensions, population N pop is set to 100. For F1 and F2, N pop = 50 . k = 1, r = 0.2, x0 = 80 , Vmax_ H = 160 , Vmax_ L = 40 , c1 = 1.8, c2 = 1.8 , ω(t ) = 1.0 - (0.5 ∗ t)/iter_max . t is current generation. iter_ max is the maximum generation. Suppose that algorithm will stop when generation surpasses 1000. That is to say iter_ max = 1000 . Niche is stetted to 5. Minimum scale of sub-population is 15.
772
W. Wang et al.
And the Maximum one is 40. Range of disturbance in chaos is δ = 0.5 . When accuracy is 0.00001, compare their iteration. Run every algorithm 20 times, and statistics are shown in Table 2. From Table 2, it is known that variance of all particles value is smaller, which denotes that distributions of particles are centered in. And average value got closer to optimizing value. It shows that all particles in swarm are around optimizing value. Moreover, every algorithm with strategy stated in section 3 is a little better than corresponding one without the strategy in aspects of variance and average value. Improvement is effective. Because functions F1 F2 only have 2 dimensions, their Best value reached optimizing value when algorithm finished. But functions F3 F4 have 30 dimensions; searching process is a little difficult.
、
、
Table 1. Function characteristic
Function F1 F2 F3 F4
Domain of definition Best state (1.3068,-1.4248) x ≥ −10, y ≤ 10 x ≤ 100 , y ≤ 100
xi ≤ 500 xi ≤ 600
40
F1
30
Best value
Features
(0,0) Xi=420.9687
–176.1375 -1.000 -12569.5
760 local extremums Infinitelocal extremums 30Dim with many extremums
Xi=0
0
30Dim
F2
20 n
10 0 -10
1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
31
33
35
37
39
41
43
45
47
49
Generation
Fig. 3. Comparison of looking for food. Along with increasing of generation, the number grows. Correspondingly, the value of other agents who do not get the best solution should decrease.
Based on an overall consideration of Best value and Worst value, effects of algorithm with strategy are better than the one without it. On condition that every algorithm is running 20 times, average generation of algorithm with strategy is smaller than the corresponding one without it. Furthermore, generation variance is also smaller which shows that improvement make algorithm converge more rapidly and stably. Convergent generations are in a range of smaller fluctuations. Therefore improvement is effective. For agent who gets the best solution, upper threshold of its speed gets lower. It could make algorithm search around optimizing solution. And agents without the best solution increase their threshold to speed up in order to search the global solution. Table 2 demonstrates that upper threshold of speed could enhance performance of algorithm. Now if there is evidence that competition among agents could lead to changes of upper threshold, it can be said that adjusting strategy proposed in section 3 is effective. For function F1 F2, the number of getting the best
、
Emotional Particle Swarm Optimization
773
Table 2. Statistics of function value
Average Variance
F1
F2
F3
F4
SPSO SPSO with CPSO CPSO NPSO NPSO SPSO SPSO with CPSO CPSO NPSO NPSO SPSO SPSO with CPSO CPSO NPSO NPSO SPSO SPSO with CPSO CPSO NPSO NPSO
Best
Worst Average Genera- Best Worst Con-
-148.909 -143.576
1.89E+03 1.57E+03
–176.13 –176.13
0.00000 0.00000
434.600 6.18E+02 430.700 6.05E+02
369 386
1000 1000
80% 85%
-134.629 -136.935
2.77E+03 2.29E+03
–176.13 –176.13
0.00000 -6.5019
381.500 1.70E+03 431.750 1.48E+03
327 358
1000 1000
90% 90%
-165.895 -167.036
6.05E+02 1.01E+02
–176.13 -19.7140 348.900 1.73E+04 –176.13 -138.1500 297.100 5.88E+03
169 170
1000 1000
85% 90%
-0.98937 -0.98950 -0.98914
1.31E-04 1.20E-04 1.18E-04
-1.00 -1.00 -1.00
-0.91917 507.0625 1.44E+04 -0.92051 470.7500 6.16E+03 -0.92178 462.1333 5.67E+03
353 328 346
1000 1000 1000
80% 80% 75%
-0.99010 -0.97302
9.70E-05 3.78E-03
-1.00 -1.00
-0.92180 427.6471 4.50E+03 -0.50397 474.2222 1.02E+04
310 381
1000 1000
85% 60%
-1.00
-0.63676 445.5000 5.55E+03
-0.98657
8.58E-04
-6393.880 -8801.460
7.90E-08 -7673.40 -5303.8 4.43E-08 -10713. 00 -6270.0
347
1000
60%
1000 1000
1000 1000
0% 0%
-8114.680 -9224.160
1.65E-04 -11048. 00 -6370.2 679.0000 6.27E+03 5.97E-08 -12154. 00 -8008.3 644.0000 5.62E+03
623 591
1000 1000
10% 10%
-5937.690 -7488.490
4.82E+05 2.65E+05
-7270.80 -9618.10
-676.52 -3032.1
-
-
1000 1000
1000 1000
0% 0%
1.00173 1.00103 1.00193
0 0 0
1.0015 1.0003 1.0009
1.00190 1.00150 1.00280
-
-
1000 1000 1000
1000 1000 1000
0% 0% 0%
1.00083 1.00237
0 0
1.0005 1.0001
1.00150 1.00019
-
-
1000 1000
1000 1000
0% 0%
1.00097
0
1.0001
1.00015
-
-
1000
1000
0%
-
-
solution is shown in Figure 3. Along with increasing of generation, the number grows. Correspondingly, the value of other agents who do not get the best solution should decrease. According to Eq. (10), Vmax (k)T is adjusting. 4.2 Clustering Problem Application As we know, if there are two data Objecti and Objectj, Euclid and Cosine distance is defined separately as follows: d (Objecti , Object j ) = Objecti , Object j =
m
m
∑ (Object k =1
d (Objecti , Objectj ) = 1 − sim(Objecti , Objectj ) = 1 − ∑ (Objectik , Objectjk ) k =1
ik
, Object jk ) 2 ,
m
∑ (Object k =1
ik
(15) m
) 2 ⋅ ∑ (Objectjk ) 2 , k =1
(16)
where, i, j=1,2,3,…,n is index, and • is Euclid distance, Objecti , Object j ∈ R m . Apply SPSO and SPSO with emotion presented in section 3 to this clustering problem. Feature of data set Iris is listed in Table 3[12].
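A direct sketch of the two distances in (15) and (16) for feature vectors of equal length (the helper names and the epsilon guard are assumptions):

```python
import math

def euclid_distance(a, b):
    """Eq. (15): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b, eps=1e-12):
    """Eq. (16): d = 1 - sim, where sim is the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb + eps)

print(euclid_distance([5.1, 3.5], [4.9, 3.0]), cosine_distance([5.1, 3.5], [4.9, 3.0]))
```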
774
W. Wang et al. Table 3. Feature of data set
Data set Iris
Instances Number
Attributes Number
Classes Number
150
4
3
According to the first two features of Iris, clustering result is shown in Figure 4.
Fig. 4. Clustering result. In this figure, *, 〇 and + denote the three correct classes, and □ is data classified falsely. There are some overlaps of 〇 and +, which illustrates that the first two features cannot separate the second and third classes.
,
,
In this application, the population is Npop = 100, c1 = c2 = 1.49, and iter_max = 300. The clustering centers are selected randomly, and the data are not preprocessed.
5 Conclusion

This article describes the similarity between a particle swarm and a multi-agent system from three aspects: particle description, single-particle action, and the topology structure among particles. Based on the multi-agent idea, emotion brings competition into PSO. The emotional element in the competition for prey is analyzed, and the upper threshold of speed is given by a formula. Simulations show that the improved PSO is effective.
Acknowledgments This work is supported by the National Natural Science Foundation of China (60573059), the High Technology Research and Development Program of China (2007AA01Z160).
References

1. Gallad, A.E., Hawary, M.E., Sallaam, A., et al.: Enhancing the Particle Swarm Optimizer via Proper Parameters Selection. In: Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, Winnipeg, Canada, pp. 792–797 (2002)
2. Shi, Y., Eberhart, R.: Fuzzy Adaptive Particle Swarm Optimization. In: Proceedings of Congress on Evolutionary Computation, Seoul, Korea, pp. 101–106 (2001)
3. Krink, T., Vesterstrom, J.S., Riget, J.: Particle Swarm Optimization with Spatial Particle Extension. In: Proceedings of the IEEE Congress on Evolutionary Computation, Honolulu, USA, pp. 1474–1479 (2002)
4. Lvbjerg, M., Rasmussen, T.K., Krink, T.: Hybrid Particle Swarm Optimizer with Breeding and Subpopulations. In: Proceedings of the Genetic and Evolutionary Computation Conference, San Francisco, USA, pp. 469–476 (2001)
5. Kennedy, J., Mendes, R.: Population Structure and Particle Swarm Performance. In: Proceedings of the IEEE Congress on Evolutionary Computation, Honolulu, USA, pp. 1671–1676 (2002)
6. Zhao, F.Q., Zhang, Q.Y., Yang, Y.H.: A Scheduling Holon Modeling Method with Petri-net and Its Optimization with a Novel PSO-GA Algorithm. In: Proceedings of 2006 10th International Conference on Computer Supported Cooperative Work in Design, Nanjing, China, pp. 1302–1307 (2006)
7. Peng, X.Y., Wu, H.X., Peng, Y.: Parameter Selection Method for SVM with PSO. Chinese Journal of Electronics 15(4), 638–642 (2006)
8. Shi, C.Y., Zhang, W., Xu, J.H.: An Introduction to Multi-Agent Systems. Publishing House of Electronics Industry, Beijing (2003)
9. Wang, Z.L.: Artificial Psychology: A Most Accessible Science Research to Human Brain. Journal of University of Science and Technology Beijing 22(5), 478–481 (2000)
10. Wang, Z.L.: Artificial Psychology. China Machine Press, Beijing (2007)
11. Lang, X.: Function Maximum Principles Used in Animal Behavior Modeling. Journal of Mathematics for Technology 16(3), 17–21 (2000)
12. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/MLSummary.html
Symbiotic Multi-swarm PSO for Portfolio Optimization

Ben Niu1, Bing Xue1, Li Li1, and Yujuan Chai2

1
College of Management, Shenzhen University, Shenzhen, Guangdong 518060, China 2 Faculty of Science, McMaster University, Hamilton, Ontario L8S4L8, Canada
[email protected] Abstract. This paper presents a novel symbiotic multi-swarm particle swarm optimization (SMPSO) based on our previously proposed multi-swarm cooperative particle swarm optimization. In SMPSO, the population is divided into several identical sub-swarms, and a center communication strategy is used to transfer information among all the sub-swarms. This information sharing helps the proposed algorithm avoid being trapped in local minima and improves its convergence rate. SMPSO is then applied to the portfolio optimization problem. To demonstrate the efficiency of the proposed SMPSO algorithm, an improved Markowitz portfolio optimization model including two of the most important practical limitations is adopted. Experimental results show that SMPSO is promising for this class of problems. Keywords: Symbiotic PSO, particle swarm, portfolio optimization.
1 Introduction

Portfolio optimization (PO), also known as mean-variance optimization (MVO), is a risk-management tool for constructing optimal portfolios that consider the trade-off between market risk and expected return. The PO problem is NP-hard and non-linear with many local optima. Mathematical programming methods have been applied to this problem for a long time [1, 2, 3]. Nowadays, a number of heuristic algorithms have been proposed for solving it, including genetic algorithms (GA) [4, 5], simulated annealing [6], neural networks [7], and others [8, 9, 10]. However, most of the PO models used in those pioneering works are rather basic, as they ignore many of the constraints, such as transaction fees, whether short sales are permitted, and the upper and lower bounds on the proportion of each asset in the portfolio. In this work, we use a modified PO model that considers transaction costs and forbids short sales. The main motivation of this study is to employ an improved multi-swarm cooperative PSO (MCPSO) for the modified PO model. MCPSO was first proposed by B. Niu in 2005 [11]; it is inspired by the phenomenon of symbiosis in natural ecosystems, where many species have developed cooperative interactions with other species to improve their survival. MCPSO has been successfully applied to many problems, including function optimization [11],
neural network training [12], and fuzzy modeling design [13]. In this paper we apply an improved MCPSO, i.e., symbiotic multi-swarm particle swarm optimization (SMPSO), to find efficient portfolios by solving the PO model. The rest of the paper is organized as follows. Section 2 gives a review of PSO and a description of the proposed SMPSO algorithm. Section 3 describes the portfolio optimization model. Section 4 gives the detailed experimental studies. Finally, conclusions are drawn in Section 5.
2 PSO and SMPSO

2.1 Particle Swarm Optimization (PSO)

The basic PSO is a population-based optimization tool, where the system is initialized with a population of random solutions and the algorithm searches for optima by updating generations. In PSO, the potential solutions, called particles, fly in a D-dimensional search space with a velocity that is dynamically adjusted according to their own experience and that of their neighbors. The position of the ith particle is represented as $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $x_{id} \in [l_d, u_d]$, $d \in [1, D]$, and $l_d$, $u_d$ are the lower and upper bounds for the dth dimension, respectively. The velocity of particle i is represented as $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ and is clamped to a maximum velocity vector $v_{max}$ specified by the user. The best previous position of the ith particle is recorded and represented as $P_i = (P_{i1}, P_{i2}, \ldots, P_{iD})$, which is also called pbest. The index of the best particle among all the particles in the population is represented by the symbol g, and $p_g$ is called gbest. At each iteration step, the particles are manipulated according to the following equations:
$v_{id} = w v_{id} + c_1 R_1 (P_{id} - x_{id}) + c_2 R_2 (p_{gd} - x_{id})$  (1)

$x_{id} = x_{id} + v_{id}$  (2)
where w is the inertia weight, c1 and c2 are acceleration constants, and R1, R2 are random vectors with components uniformly distributed in [0, 1]. In Eq. (1), the portion of the velocity adjustment influenced by the individual's own pbest position is the cognition component, and the portion influenced by gbest is the social component. After the velocity is updated, the new position of the ith particle in its dth dimension is recomputed. This process is repeated for each dimension of the ith particle and for all the particles in the swarm.

2.2 Symbiotic Multi-swarm Particle Swarm Optimizer (SMPSO)

In our previously proposed MCPSO algorithm, the population is divided into several sub-swarms, of which some are master swarms and the others are slave swarms. The master and slave swarms have different properties: the master swarms update particle information according to both the slave swarms and their own, while the slave swarms update particle information based only on their own information.
Fig. 1. Flow chart of SMPSO: divide the population into several sub-swarms; initialize the particles of each sub-swarm; evaluate the fitness of each initial sub-swarm; find the personal best and global best in each sub-swarm; calculate the averaged global best of the sub-swarms; update the particle velocity and position in each sub-swarm; repeat until the maximum number of iterations is reached.
It should be noted that there is no information exchange between slave swarms, which slows down the convergence rate. A detailed introduction to MCPSO can be found in [11]. To deal with this issue, the population in SMPSO consists of several sub-swarms with the same properties, i.e., all sub-swarms are identical. Each sub-swarm can supply new promising particles to the other sub-swarms as the evolution proceeds. Each sub-swarm updates its particle states based on the best positions discovered so far both by its own particles and by the particles of the other sub-swarms. The interactions between the sub-swarms influence the balance between exploration and exploitation and maintain a suitable diversity in the population, even when it is approaching the global solution, thus reducing the risk of converging to local sub-optima.
Table 1. Pseudocode for SMPSO algorithm
Algorithm SMPSO
Begin
  Randomize the positions and velocities of N × P particles in the search space
  Divide the whole population randomly into N species with P particles each
  Evaluate the fitness value of each particle
  Repeat
    Do in parallel: Swarm n, 1 ≤ n ≤ N
    End Do in parallel
    Barrier synchronization            // wait for all processes to finish
    Select the center particle and determine its position according to Eq. (4)
    Evolve each sub-swarm              // update velocity and position using Eqs. (3) and (2)
    Evaluate the fitness value of each particle
  Until a termination condition is met
End
The search information can be transferred among sub-swarms by a center communication mechanism that uses a center particle, whose position is the average over the sub-swarms, to guide the flight of particles in all the sub-swarms. During the flight, each particle of a sub-swarm adjusts its trajectory according to its own experience, the experience of its neighbors, and the experience of the particles in the other sub-swarms, making use of the best previous positions encountered by itself, its neighbors, and the center particle. In this way, the search information is transferred between sub-swarms, which accelerates the convergence rate. In SMPSO, we use a population of N × P individuals, or in symbiosis terminology, an ecosystem of N × P organisms. The whole population is divided into N species to model symbiosis in the context of evolving ecosystems (for convenience, each species has the same population size P). As in nature, the species are separate breeding populations that evolve in parallel, while interacting with one another within each generation in a symbiotic relationship. To realize this mechanism, we propose a modification of the original PSO velocity update equation. In each generation, particle i in species n evolves according to the following equation:

$v_i^n(t+1) = w v_i^n(t) + R_1 c_1 (p_i^n - x_i^n(t)) + R_2 c_2 (p_g^n - x_i^n(t)) + R_3 c_3 (p_c^n - x_i^n(t))$  (3)

where $p_i^n$ and $p_g^n$ are the best previous positions achieved so far by particle i and by species n, respectively, $R_3$ is a random value between 0 and 1, $c_3$ is an acceleration constant, and $p_c^n$ represents the center position of the global best particles of all the sub-swarms. After the N sub-swarms update their positions and the best-performing particle is found, the center particle is updated according to the following formula:
$P_c(t+1) = \frac{1}{N}\sum_{n=1}^{N} p_g^n(t), \qquad n = 1, 2, \ldots, N, \; i = 1, 2, \ldots, P.$  (4)
Unlike the other particles, the center particle has no velocity, but it is involved in all other operations in the same way as an ordinary particle, such as fitness evaluation and competition for the best particle, except for the velocity calculation. The flow chart of SMPSO is shown in Fig. 1, and the pseudocode of SMPSO is listed in Table 1.
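The following is a minimal Python sketch of one SMPSO iteration as read from Eqs. (2)-(4): each sub-swarm is pulled toward its personal bests, its own sub-swarm best, and the center particle averaged over the sub-swarm bests. It is an illustrative reading of the update rules, not the authors' implementation; the array layout and parameter values are assumptions.

```python
import numpy as np

def smpso_step(x, v, pbest, gbest, w=0.729, c1=1.367, c2=1.367, c3=2.0):
    """One SMPSO iteration over N sub-swarms (sketch of Eqs. (2)-(4)).

    x, v, pbest : arrays of shape (N, P, D)  -- positions, velocities, personal bests
    gbest       : array of shape (N, D)      -- best position of each sub-swarm
    """
    N, P, D = x.shape
    pc = gbest.mean(axis=0)                          # Eq. (4): center particle
    r1, r2, r3 = (np.random.rand(N, P, D) for _ in range(3))
    v = (w * v
         + c1 * r1 * (pbest - x)
         + c2 * r2 * (gbest[:, None, :] - x)
         + c3 * r3 * (pc - x))                       # Eq. (3)
    x = x + v                                        # Eq. (2)
    return x, v
```

After each such step the fitness of every particle would be re-evaluated and the personal, sub-swarm, and center bests refreshed, as in the pseudocode of Table 1.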
3 Portfolio Optimization Problem

The portfolio optimization problem is one of the most important issues in asset management; it deals with how to form a satisfactory portfolio. Modern portfolio analysis started from the pioneering research of Markowitz (1952) [14]. The original portfolio optimization model, first proposed by Markowitz, is usually called the mean-variance model. In this paper, we use an improved mean-variance model that considers transaction costs and forbids short sales. It assumes an investor allocates wealth among n assets. The following notation is introduced:
r_i: the yield of the ith asset, i = 1, ..., n;
R = (R_1, ..., R_n): R_i = E(r_i) denotes the expected yield;
σ_ij = cov(r_i, r_j): the covariance of r_i and r_j;
X = (x_1, ..., x_n): x_i is the proportion of the ith asset that the investor wants to invest;
k = (k_1, ..., k_n): k_i is the transaction fee of the ith asset;
λ: the risk factor, distributed in [0, 1]; a larger λ means the investor is more risk-loving.
Based on these definitions, the functions f(x) and g(x), denoting the revenue and the risk in the portfolio optimization problem, are:

$f(x) = \sum_{i=1}^{n} R_i x_i - \sum_{i=1}^{n} k_i x_i$,  (5)

$g(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij} x_i x_j$.  (6)
The improved portfolio optimization model can be formulated as:

$\min F(x) = \min \{\, \lambda g(x) - (1 - \lambda) f(x) \,\}$
subject to $\sum_{i=1}^{n} x_i = 1$ and $0 < x_i$,  (7)

where $0 < x_i$ means that short sales are not permitted.
When SMPSO is used to solve the model, the search space is n-dimensional, corresponding to the n assets, and the position of a particle X = (x_1, ..., x_n) represents the proportion of each asset. The position of the particle with the minimum fitness value is the best portfolio selection.
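As an illustration, a particle's fitness under the model of Eqs. (5)-(7) could be evaluated as in the sketch below. The paper does not state how the constraints of Eq. (7) are enforced, so the clipping-and-renormalization step used here is an assumption; the function and variable names are illustrative.

```python
import numpy as np

def portfolio_fitness(x, R, k, sigma, lam):
    """F(x) = lam * g(x) - (1 - lam) * f(x) from Eqs. (5)-(7).

    x     : candidate proportions (one particle position)
    R     : expected yields R_i;  k : transaction fees k_i
    sigma : covariance matrix sigma_ij;  lam : risk factor in [0, 1]
    Constraint handling (assumption): negative weights are clipped and the
    vector rescaled so that sum(x) = 1, since 0 < x_i and sum x_i = 1.
    """
    x = np.clip(x, 1e-12, None)
    x = x / x.sum()
    f = np.dot(R, x) - np.dot(k, x)          # revenue, Eq. (5)
    g = x @ sigma @ x                        # risk, Eq. (6)
    return lam * g - (1.0 - lam) * f         # objective, Eq. (7)
```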
4 Illustrative Examples

In order to test the effectiveness of SMPSO for portfolio optimization, we use data for five assets as the sample, which can be found in [15]. k_i is set to 0.075%. Different risk preferences are considered, using three values of the risk factor λ (0.2, 0.5, 0.8) to identify different kinds of investors. In applying PSO to the above model, w_max, w_min, c_1, c_2 are set to 0.9, 0.4, 2.0, and 2.0, respectively. For SMPSO, c_1 = c_2 = 1.367 and c_3 = 2 are used, and w_max, w_min are set the same as in PSO. The maximum number of iterations of the two methods is set to 200, and a total of 50 runs are performed. Numerical results with different λ obtained by the standard PSO and by SMPSO are shown in Tables 2 and 3. Figures 2-4 present the mean relative performance for different λ generated by PSO and SMPSO. The worst value, the best value, the standard deviation, and the mean value are summarized in Table 2. For almost all of the risk preferences, SMPSO has a smaller standard deviation and mean value, which demonstrates that it outperforms PSO in terms of robustness and solution quality.
Fig. 2. Mean relative performance using λ = 0.2 (fitness versus iterations for PSO and SMPSO)
Fig. 3. Mean relative performance using λ = 0.5 (fitness versus iterations for PSO and SMPSO)
Fig. 4. Mean relative performance using λ = 0.8 (fitness versus iterations for PSO and SMPSO)
From Figures 2-4, it is evident that SMPSO converges more quickly than PSO in all of these settings. Furthermore, its convergence process is much steadier than that of PSO.
Table 2. Numerical results with different λ
                 Worst            Best             Mean             Std
λ = 0.2  PSO     -4.07894e-002    -4.46873e-002    -4.36409e-002    1.07146e-003
         SMPSO   -4.47326e-002    -4.47331e-002    -4.47331e-002    7.68445e-008
λ = 0.5  PSO     -1.29264e-002    -1.36157e-002    -1.34150e-002    1.617577e-004
         SMPSO   -1.36159e-002    -1.36159e-002    -1.36159e-002    4.32891e-012
λ = 0.8  PSO     -1.01019e-003    -1.76112e-003    -1.52787e-003    2.17589e-004
         SMPSO   -1.76128e-003    -1.76129e-003    -1.76128e-003    3.46453e-010
Table 3. Numerical results (asset proportions) with different λ

                 x1            x2            x3            x4            x5
λ = 0.2  PSO     3.5878e-006   1.7489e-005   3.4526e-001   1.8246e-011   6.5472e-001
         SMPSO   2.2211e-008   1.2714e-007   2.9494e-001   6.1913e-008   7.0506e-001
λ = 0.5  PSO     5.4377e-013   1.3446e-001   7.9098e-001   4.4508e-002   3.0046e-002
         SMPSO   7.5996e-013   1.3851e-001   7.8621e-001   4.2837e-002   3.2448e-002
λ = 0.8  PSO     3.1966e-011   6.3375e-001   9.2079e-002   2.7417e-001   3.7942e-013
         SMPSO   1.3609e-010   6.3285e-001   9.1060e-002   2.7608e-001   1.5641e-009
All the results presented in the tables and figures show that SMPSO can be a more effective tool for investors solving portfolio optimization problems.
5 Conclusions

In this paper, we proposed a new variant of the original PSO, symbiotic multi-swarm PSO, inspired by the phenomenon of symbiosis in natural ecosystems. SMPSO is based on a multiple-swarm scheme, in which the whole population is divided into several sub-swarms. The particles in each sub-swarm are enhanced by the experience of their own sub-swarm and of the other sub-swarms. By introducing the center communication mechanism, the search information can be transferred among sub-swarms, which helps accelerate the convergence rate and keeps the particles from being trapped in local minima. We also use the improved Markowitz model with two real-world constraints to test the proposed algorithm. The preliminary experimental results suggest that SMPSO has superior features, both in the quality of the solutions and in the robustness of the results. Our proposed portfolio model and SMPSO are applicable and reliable in real markets with a large number of stocks.
Acknowledgment This work is supported by Shenzhen-Hong Kong Innovative Circle project (Grant no.SG200810220137A) and Project 801-000021 supported by SZU R/D Fund.
References 1. Young, M.R.: A Minimax Portfolio Selection Rule with Linear Programming Solution. Management Science 44, 673–683 (1998) 2. Arenas, M., Bilbao, A., Rodriguez Uria, M.V.: A Fuzzy Goal Programming Approach to Portfolio Selection. European Journal of Operational Research 133, 287–297 (2001) 3. Ballestero, E., Romero, C.: Portfolio Selection: A Compromise Programming Solution. Journal of the Operational Research Society 47, 1377–1386 (1996) 4. Oh, K.J., Kim, T.Y., Min, S.: Using Genetic Algorithm to Support Portfolio Optimization for Index Fund Management. Expert Systems with Applications 28, 371–379 (2005) 5. Yang, X.: Improving Portfolio Efficiency: A Genetic Algorithm Approach. Computational Economics 28, 1–14 (2006) 6. Crama, Y., Schyns, M.: Simulated Annealing for Complex Portfolio Selection Problems. European Journal of Operational Research 150, 546–571 (2003) 7. Fernandez, A., Gomez, S.: Portfolio Selection Using Neural Networks. Computers & Operations Research 34, 1177–1191 (2007) 8. Derigs, U., Nickel, N.H.: On a Local-search Heuristic for a Class of Tracking Error Minimization Problems in Portfolio Management. Annals of Operations Research 131, 45–77 (2004) 9. Derigs, U., Nickel, N.H.: Meta-heuristic Based Decision Support for Portfolio Optimization with a Case Study on Tracking Error Minimization in Passive Portfolio Management. OR Spectrum 25, 345–378 (2003) 10. Schlottmann, F., Seese, D.: A Hybrid Heuristic Approach to Discrete Multi-Objective Optimization of Credit Portfolios. Computational Statistics & Data Analysis 47, 373–399 (2004) 11. Niu, B., Zhu, Y.L., He, X.X., Wu, H.: MCPSO: A Multi-Swarm Cooperative Particle Swarm Optimizer. Applied Mathematics and Computation 185, 1050–1062 (2007) 12. Niu, B., Zhu, Y.-l., He, X.-X.: A Multi-Population Cooperative Particle Swarm Optimizer for Neural Network Training. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3971, pp. 570–576. Springer, Heidelberg (2006) 13. Niu, B., Zhu, Y.L., He, X.X., Shen, H.: A Multi-swarm Optimizer Based Fuzzy Modeling Approach for Dynamic Systems Processing. Neurocomputing 71, 1436–1448 (2008) 14. Markowitz, H.W.: Foundations of Portfolio Theory. Journal of Finance 46, 469–477 (1991) 15. Yang, K.Y., Wang, X.F.: Solving the Multi-solution Portfolio Selection Model Based on the GA (Chinese). Journal of ShanDong finance college 6, 60–63 (2003)
A Novel Particle Swarm Optimization with Non-linear Inertia Weight Based on Tangent Function Li Li1, Bing Xue1, Ben Niu1, Lijing Tan2, and Jixian Wang3 1
College of Management, Shenzhen University, Shenzhen 518060, China 2 Measurement Specialties Inc, Shenzhen 518107, China 3 School of Engineering , Anhui Agricultural University, Hefei 230036, China
[email protected]
Abstract. The inertia weight is one of the most important parameters of particle swarm optimization (PSO); it keeps the right balance between global search and local search. In this paper, a novel PSO with a non-linear inertia weight based on the tangent function is presented. The paper also presents a method for determining the control parameter of the proposed strategy, saving the user from a tedious trial-and-error approach for each specific problem. The performance of the proposed PSO model is demonstrated on four benchmark problems and compared with three other PSO algorithms. From the experimental results, it can be concluded that the non-linear dynamic inertia weight yields a faster convergence rate with higher precision. Keywords: Particle swarm optimization, inertia weight, tangent function.
1 Introduction

During the 1990s, researchers paid attention to group animals such as birds, ants, and fish, which are not very clever individually but can accomplish high-performance cooperative work together. Swarm intelligence algorithms were developed based on investigations of these group animals. Particle swarm optimization (PSO) is one of the effective swarm intelligence algorithms, first proposed by Eberhart and Kennedy [1, 2] in 1995. As a relatively new swarm intelligence algorithm, PSO has some important advantages: it is easy to implement, has few parameters to adjust, and converges quickly. It has been successfully applied in many areas [3, 4, 5, 6]. However, in the original PSO, every particle searches for the best solution based on the previous local best and the global best, so all the particles become more and more similar, which easily leads to the particles being trapped in a local minimum. To deal with this issue, many researchers have made different attempts, and adjusting the inertia weight is one of the most effective. The inertia weight can balance the global search and the local search to obtain better results: the global search is stronger when the inertia weight is larger, while the algorithm is better at local search with a smaller inertia weight [7]. Therefore, a proper inertia weight can improve performance within fewer generations. Many researchers have investigated this issue.
For example, Shi proposed a linearly decreasing inertia weight [8] and a dynamic inertia weight based on fuzzy reasoning [9], Van den Bergh offered a random inertia weight (RIW) [10], Han produced a PSO with an adaptive inertia weight [11], and Niu proposed a hybrid global optimization algorithm, PSODE, combining particle swarm optimization (PSO) with differential evolution (DE) [12]. In the present paper, we propose a novel non-linear inertia weight adjustment method based on the tangent function. To illustrate the effectiveness and performance of the new strategy, a set of four representative benchmark functions is employed, in comparison with three other strategies: a fixed inertia weight (FIW), a linearly decreasing inertia weight (LIW) [8], and a non-linearly decreasing inertia weight (NIW) [13]. The remainder of this paper is organized as follows. The next section introduces the standard PSO. A novel PSO with a non-linear dynamic inertia weight is presented in Section 3. In Section 4, we describe the benchmark functions and experimental settings, and compare the experimental results of the novel PSO with the three other inertia weight strategies. Finally, in Section 5 we end the paper with some conclusions and future work.
2 Standard Particle Swarm Optimization Algorithm

PSO is a population-based search algorithm that is initialized with a population of random solutions and searches for the optimum by updating generations. Particle i, denoting one potential solution, flies in the n-dimensional space; in the tth generation, it has a position $x_i(t) = (x_{i1}, x_{i2}, \ldots, x_{in})$ and velocity $v_i(t) = (v_{i1}, v_{i2}, \ldots, v_{in})$. A fitness value, evaluated with a predefined fitness function, appraises the particle's current position. At the same time, every particle remembers the best position it has ever visited. Therefore, in every generation there are two "bests": the local best value, Pbest, is the best position that the particle has achieved so far; the global best value, Gbest, is the overall best solution tracked by the particle swarm. Based on Pbest and Gbest, PSO searches for the best solution by updating every particle's position and velocity until the end conditions are met, using the following equations:
$V_{id}(t+1) = V_{id}(t) + c_1 \cdot rand() \cdot (p_{id} - x_{id}(t)) + c_2 \cdot rand() \cdot (p_{gd} - x_{id}(t))$,  (1)

$x_i(t+1) = x_i(t) + V_i(t+1)$.  (2)
where $c_1$ and $c_2$ in Eq. (1) are the learning factors, both often set to 2; $rand()$ is a random value uniformly distributed in [0, 1]; $p_{id}$ represents Pbest, while $p_{gd}$ represents Gbest. In order to make sure the position $x_i(t)$ stays in the feasible region, a maximum velocity $V_{max}$ is used to confine the velocity $v_i(t)$:

$V_{id}(t) = \begin{cases} V_{id}(t), & V_{id}(t) < V_{max} \\ V_{max}, & V_{id}(t) \ge V_{max} \end{cases}$  (3)
With the above equations, convergence is fast in the early period, but the local search is poor, the convergence becomes slower over the generations, and the precision of the solution often cannot reach a satisfactory level. For these problems, Shi [8] introduced the inertia weight w into the velocity equation:

$V_{id}(t+1) = w V_{id}(t) + c_1 \cdot rand() \cdot (p_{id} - x_{id}(t)) + c_2 \cdot rand() \cdot (p_{gd} - x_{id}(t))$.  (4)
Eq. (4) can be divided into three parts: the first part is the previous velocity of the particle, which supports the global search; the second and third parts are the social terms that change the particle's velocity, and they determine the local search of the algorithm. The inertia weight w thus controls the influence of the previous velocity on the new velocity and balances the global and local search: global search performance is good with a larger inertia weight, while a smaller inertia weight facilitates local search. With appropriate coordination, the two kinds of search bring better performance, so finding a proper w is one of the crucial ways to improve the capability of the algorithm.
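For reference, the following is a minimal, self-contained Python sketch of the standard PSO with an inertia weight (Eqs. (1), (2), and (4)), demonstrated on the Sphere function. The parameter values follow the fixed-weight setting mentioned later in the paper (w = 0.68, c1 = c2 = 2); the velocity clamp and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pso(fitness, dim=20, n_particles=40, iters=1500, w=0.68, c1=2.0, c2=2.0,
        bounds=(-100.0, 100.0)):
    """Minimal PSO with a (here fixed) inertia weight, following Eqs. (1), (2), and (4)."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    pbest, pbest_val = x.copy(), np.apply_along_axis(fitness, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    vmax = hi                                   # velocity clamp, as in Eq. (3)
    for _ in range(iters):
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(fitness, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, best_val = pso(lambda z: np.sum(z ** 2))   # Sphere function
```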
3 Novel Non-linear Inertia Weight PSO Algorithm Based on the Tangent Function

From many experiments, we found that the global search is strong when w is large, and the algorithm then converges quickly but hardly achieves high precision. On the other hand, when w is small, the local search is stronger and the result is more precise, but the convergence ability is poorer, as is the ability to escape a local optimum. So w should vary with the generations, and we try to find a proper variation to improve the performance of PSO. During the initial experimental study, we tried to introduce a monotone increasing or decreasing strategy to update w. In our proposed method, the tangent function y = tan(x) is used as the updating function; y increases with the independent variable x, and the speed of the increase also increases. When x = 7/8, y ≈ 1. Based on this observation, we propose a novel PSO with a non-linear inertia weight, with the following inertia weight updating equation:
$w(t) = (w_{start} - w_{end}) \cdot \tan\!\left(\frac{7}{8}\left(1 - \left(\frac{t}{t_{max}}\right)^{k}\right)\right) + w_{end}$.  (5)
where $w_{start}$ is the initial (largest) value of the inertia weight, normally set to $w_{start} = 0.9$; $w_{end}$ is the final (smallest) value, normally set to $w_{end} = 0.4$; and $t_{max}$ is the maximum number of iterations. According to this equation, w decreases non-linearly as the iterations proceed. In the initial iterations, the PSO with a larger w has stronger global search ability, so the particles can explore the whole search space quickly; the local search becomes stronger as w becomes smaller. The new strategy may enhance the capability of the algorithm to avoid premature convergence and to escape local optima. The coefficient 7/8 in Eq. (5) keeps w distributed in [0.4, 0.9]: when t = 1, w(t) = w_start = 0.9, and when t = t_max, w(t) = w_end = 0.4. k is the control variable, which controls the smoothness of the curve relating w and t. Figures 1-3 show the variation of the inertia weight with the generations for k = 0.2, k = 1, and k = 3, respectively. From these figures it can be seen that when k = 0.2 the function between w and t is convex; when k = 1 it is almost linear, leaning to concave; and when k = 3 it is concave.
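A small sketch of the schedule in Eq. (5), using the values adopted later in the paper (w_start = 0.9, w_end = 0.4, t_max = 1500, k = 0.6), is given below; it simply evaluates the formula and is only illustrative.

```python
import math

def tanw(t, t_max, w_start=0.9, w_end=0.4, k=0.6):
    """Non-linear inertia weight of Eq. (5); w decreases toward w_end as t approaches t_max."""
    return (w_start - w_end) * math.tan(7.0 / 8.0 * (1.0 - (t / t_max) ** k)) + w_end

# endpoints of the schedule (t = 1 and t = t_max), with t_max = 1500 as in the experiments
print(tanw(1, 1500), tanw(1500, 1500))
```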
Fig. 1. k = 0.2; Fig. 2. k = 1; Fig. 3. k = 2 (inertia weight w versus iteration t)
In order to choose the best value of k, the multimodal Griewank function is employed in the experiments, with k confined to [0.1, 2.0]. The experimental results (the mean and the standard deviation of the function values found in 20 runs) are listed in Table 1. Table 1 shows that when k is within [0.4, 0.6] and [1.4, 1.7], the mean and the standard deviation of the function values are both stable, so we can choose the value that gives a faster convergence rate. Figures 4 and 5 show the variation of the logarithm (base 10) of the mean values with the generations for k = 0.6 and k = 1.7, respectively. It is easy to see that the convergence rate is faster for k = 0.6 than for k = 1.7. In the following experiments, we choose k = 0.6.

Table 1. Experimental results on the Griewank function using different k
k     mean value   standard deviation      k     mean value   standard deviation
0.1   0.0294       0.0276                  1.1   0.0302       0.0230
0.2   0.0300       0.0223                  1.2   0.0355       0.0194
0.3   0.0329       0.0248                  1.3   0.0413       0.0291
0.4   0.0266       0.0191                  1.4   0.0230       0.0212
0.5   0.0258       0.0278                  1.5   0.0289       0.0273
0.6   0.0254       0.0207                  1.6   0.0263       0.0200
0.7   0.0315       0.0307                  1.7   0.0261       0.0198
0.8   0.0300       0.0343                  1.8   0.0334       0.0219
0.9   0.0301       0.0235                  1.9   0.0257       0.0253
1.0   0.0264       0.0267                  2.0   0.0280       0.0256
Fig. 4. Best fitness (log) versus iterations for k = 0.6. Fig. 5. Best fitness (log) versus iterations for k = 1.7.
4 Experimental Studies

4.1 Test Functions and Parameter Settings

In order to test the effectiveness and performance of the proposed method, four representative benchmark functions are used, in comparison with the three other inertia weight strategies (FIW, LIW, and NIW). The settings of these four functions are listed in Table 2.

Table 2. Test functions and parameter settings
Function name   Function model                                                                Dim   Search space   Vmax
Sphere          $f_1(x) = \sum_{i=1}^{n} x_i^2$                                               20    (-100, 100)    100
Rosenbrock      $f_2(x) = \sum_{i=1}^{n} \left(100(x_{i+1} - x_i^2)^2 + (x_i - 1)^2\right)$   20    (-30, 30)      30
Rastrigin       $f_3(x) = \sum_{i=1}^{n} \left(x_i^2 - 10\cos(2\pi x_i) + 10\right)$          20    (-10, 10)      10
Griewank        $f_4(x) = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n}\cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1$   20    (-600, 600)    600
The four test functions can be classified as unimodal (Sphere and Rosenbrock) and multimodal (Rastrigin and Griewank). In FIW, w = 0.68 is used. Eqs. (6) and (7) are used to determine w in LIW and NIW, respectively:

$w(t) = w_{start} - \frac{w_{start} - w_{end}}{t_{max}} \times t$,  (6)
$w(t) = w_{start} - (w_{start} - w_{end}) \times \exp\!\left(-k \cdot \left(\frac{t}{t_{max}}\right)^{2}\right)$.  (7)
In our experimental studies, the inertia weight w used in the three methods (TANW, LIW, and NIW) is set within [0.4, 0.9], that is, $w_{start} = 0.9$ and $w_{end} = 0.4$. The other parameters are $c_1 = c_2 = 2.0$ and $t_{max} = 1500$, the swarm size is 40, and the allowable error is $\sigma = 1e{-}80$. A total of 50 runs are conducted for each experimental setting.
4.2 Experimental Results

The results for the four functions are listed in Table 3, including the maximum value, the minimum value, the standard deviation, and the mean value. The graphs in Figs. 6-9 illustrate the evolution of the best fitness found by the algorithms, averaged over 50 runs, for the four functions.

Table 3. Results of all algorithms on the benchmark functions
Function     Strategy   Max value      Min value      Standard deviation   Mean value
Sphere       TANW       2.6552e-015    1.2535e-020    3.7727e-016          9.0940e-017
             NIW        6.3016e-012    1.1269e-015    1.3952e-012          7.4996e-013
             FIW        5.8000e-002    1.0000e-003    1.4700e-002          1.0200e-002
             LIW        9.7600e-009    4.8377e-012    1.6531e-009          6.8240e-010
Rosenbrock   TANW       2.6385e+002    2.1250e-001    5.5068e+001          4.1048e+001
             NIW        4.4738e+002    1.1850e+001    7.7280e+001          5.0810e+001
             FIW        8.0398e+002    1.0502e+001    1.4272e+002          1.2477e+002
             LIW        5.6734e+002    4.4772e+000    1.0744e+002          7.0154e+001
Rastrigin    TANW       3.5819e+001    6.9647e+000    5.8282e+000          1.6916e+002
             NIW        3.1839e+001    5.9698e+000    5.3219e+000          1.8548e+001
             FIW        4.8523e+001    1.1067e+001    9.0932e+000          2.9706e+001
             LIW        3.3859e+001    6.9649e+000    5.3284e+000          1.8067e+001
Griewank     TANW       7.8700e-002    0.0000e+000    2.0800e-002          2.4000e-002
             NIW        1.3510e-001    2.6645e-015    2.6700e-002          2.6400e-002
             FIW        7.4520e-001    6.2000e-003    1.9350e-001          2.1550e-001
             LIW        1.0520e-002    9.9886e-011    2.5600e-002          3.2800e-002
The data in Table 3 show that the new w strategy (TANW) obtains more precise results on all four functions than the three other methods. As seen from the figures, TANW has the fastest convergence rate and reaches the best solutions. Although on the Rastrigin function the standard deviation obtained by TANW is higher than those obtained by LIW and NIW, it still achieves a faster convergence rate and more promising end results. FIW, with its fixed w, cannot balance the global and local search; therefore, it converges quickly in the early period, but its final results are the worst.
Fig. 6. Sphere function. Fig. 7. Rosenbrock function. Fig. 8. Rastrigin function. Fig. 9. Griewank function (best fitness, log scale, versus iterations for TANW, FIW, LIW, and NIW).
Compared with LIW, NIW shows weaker robustness (a larger standard deviation) on the Griewank function and lower precision (a larger mean) on the Rastrigin function; apart from these two points, it performs better than LIW as a whole. The results also indicate that, most of the time, the fixed inertia weight (FIW) cannot give satisfactory results, and the non-linear inertia weights (NIW and TANW) perform better than the linear one (LIW).
5 Conclusion and Future Work

Based on an analysis of the effect of the inertia weight and the features of the tangent function, a novel PSO with a non-linear inertia weight is proposed in this paper. Its performance is evaluated on four benchmark functions and compared with three other methods. The experimental results illustrate that the proposed TANW not only has a faster convergence rate but also yields high-quality solutions and robust results. Further research should apply the proposed method to practical problems, such as the vehicle routing problem (VRP), portfolio optimization, and other engineering problems.
Acknowledgment This work is supported by Shenzhen-Hong Kong Innovative Circle project (Grant no.SG200810220137A) and Project 801-000021 supported by SZU R/D Fund.
References 1. Eberhart, R.C., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proceedings of the 6th International Symposium on Micromachine and Human Science, Nagoya, Japan, pp. 39–43 (1995)
2. Eberchart, R.C., Kennedy, J.: Particle Swarm Optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942–1948 (1995) 3. Mendes, R., Cortez, P., Rocha, M., Neves, J.: Particle Swarms for Feedforward Neural Network Training. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2002), pp. 1895–1899 (2002) 4. Venayagamoorthy, G.K., Doctor, S.: Navigation of Mobile Sensors Using PSO and Embedded PSO in a Fuzzy Logic Controller. In: Proceedings of the 39th IEEE IAS Annual Meeting on Industry Applications, Seattle, USA, pp. 1200–1206 (2004) 5. Parsopoulos, K.E., Papageorgiou, E.I., Groumpos, P.P.: A First Study of Fuzzy Cognitive Maps Learning Using Particle Swarm Optimization. In: Proceedings of IEEE Congress on Evolutionary Computation 2003 (CEC 2003), Canbella, Australia, pp. 1440–1447 (2003) 6. Abido, M.A.: Optimal Power Flow Using Particle Swarm Optimization. Int. J. Elect. Power Energy Syst. 24, 563–571 (2002) 7. Shi, Y., Eberhart, R.C.: Parameter Selection in Particle Swarm Optimizations. In: Porto, V.W., Waagen, D. (eds.) EP 1998. LNCS, vol. 1447, pp. 591–600. Springer, Heidelberg (1998) 8. Shi, Y., Eberhart, R.C.: A Modified Particle Swarm Optimizer. In: Proceedings of IEEE International Conference on Evolutionary Computation, Anchorage, AK, pp. 69–73 (1998) 9. Shi, Y., Eberhart, R.C.: Fuzzy Adaptive Particle Swarm Optimization. In: Proceedings of the Congress on Evolutionary Computation, Seoul, Korea, pp. 101–106 (2001) 10. Van den Bergh, F.: An Analysis of Particle Swarm Optimizer. Department of Computer Science. University of Pretoria, South Africa (2002) 11. Han, J.H., Li, Z.H.: An Adapting Particle Swarm Optimization and the Simulation Study. System simulation Journal (2006) 12. Niu, B., Li, L.: A novel PSO-DE-based hybrid algorithm for global optimization. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 156–163. Springer, Heidelberg (2008) 13. Tan, L.J.: Particle Swarm Optimization and Its Application on the Control System of Continuous Conveyer of Disc-tube Assemble (Chinese). Master thesis, Liaoning Technical University (2007)
Particle Swarm Optimizer Based on Dynamic Neighborhood Topology Yanmin Liu1,2, Qingzhen Zhao1, Zengzhen Shao1, Zhaoxia Shang1, and Changling Sui2 1
College of Management and Economics, Shandong Normal University 2 Department of math, Zunyi Normal College
[email protected] Abstract. In this paper, a novel particle swarm optimizer with a dynamic neighborhood topology based on a small-world network (SWLPSO) is introduced. The choice of each particle's learning exemplar is based upon the clustering coefficient and the average shortest distance. This strategy preserves the diversity of the swarm and discourages premature convergence. Experiments were conducted on a set of classical benchmark functions. The results demonstrate good performance on the multimodal problems used in this paper when compared with other PSO variants. Keywords: Particle swarm optimizer (PSO), dynamic neighborhood, topology, small world network.
1 Introduction

Optimization has been an active area of research for several decades. As many real-world optimization problems become increasingly complex, better optimization algorithms are always needed. The particle swarm optimization (PSO) algorithm is a relatively new member of the family of evolutionary algorithms (EAs). It was first proposed by Kennedy and Eberhart [1, 2], based on the metaphor of the social behavior of birds flocking, fish schooling, or the sociological behavior of a group of people. Each individual in the swarm, called a particle (a point), has a fitness value and a velocity that is dynamically adjusted in the search space according to its own flying experience and the best experience of the swarm. PSO is easy to implement for optimization problems, but when solving multimodal problems it may easily be trapped in a local minimum, and most real-world optimization problems are multimodal. Therefore, in order to overcome this defect and improve PSO's performance, some researchers have investigated the neighborhood topology of the swarm [3-14]. In this paper, we propose a novel neighborhood topology based on the small-world network [15] to improve PSO's performance on multimodal problems. Section 2 presents an overview of PSO, as well as a discussion of previous attempts to improve its performance. In Section 3, we discuss the improved PSO based on the dynamic neighborhood topology. Section 4 presents the test functions,
the experimental settings for the compared algorithms, and the results. Finally, some conclusions and directions for future research are discussed in Section 5.
2 Particle Swarm Optimization (PSO)

2.1 Original Particle Swarm Optimization (OPSO)

Each individual, as a possible solution, can be modeled as a particle that moves in an n-dimensional search space. The velocity of each particle is determined by the vector $v_i \in R^n$, and the velocity and position of the ith particle are updated as follows:

$v_i(t) = v_i(t-1) + \varphi_1 \cdot rand_1 \cdot (p_i - x_i(t-1)) + \varphi_2 \cdot rand_2 \cdot (p_g - x_i(t-1))$  (1)

$x_i(t) = x_i(t-1) + v_i(t)$  (2)

where $x_i(t) = (x_{i1}, x_{i2}, \ldots, x_{in})$ represents the position of the ith particle in the current iteration t; n is the dimension of the search space; t is the current iteration number; $v_i(t) = (v_i^1, v_i^2, \ldots, v_i^n)$ denotes the velocity of the ith particle; $\varphi_1$ and $\varphi_2$ are two positive numbers known as the cognitive and social acceleration coefficients; $rand_1$ and $rand_2$ are two random numbers uniformly distributed in [0, 1]; $p_i = (p_i^1, p_i^2, \ldots, p_i^n)$ is the best position found so far by the particle itself; and $p_g = (p_{g1}, p_{g2}, \ldots, p_{gn})$ is the best position found so far by the whole
swarm. To make particle fly in the search space, each dimension velocity of a particle is limited to Vmax , which is constant value defined by the user. 2.2 Improved PSOs Since the introduction of the PSO algorithm in 1995 by Kennedy and Eberhart, the PSO algorithm has attracted a great attention [17-20]. Many researchers have worked on improving its performance in various ways to derive several interesting variants. For example, the original PSO did not have an inertia weight, Shi and Eberhart [21] added it, as follows: ur ur uur ur uur ur (3) vi (t ) = ω ⋅ vi (t − 1) + ϕ1 ⋅ rand1 ( pi − xi (t − 1)) + ϕ 2 ⋅ rand 2 ( p g − xi (t − 1)) where ω that is called the inertia weight, is used to balance the global and local search abilities. A large inertia weight is more appropriate for global search, and a small inertia weight facilitates local search. Inertia weight research also attracted a high level of interests [22-24]. These researchers proposed various value methods of inertia weight. In [25-27], Clerc et al indicated that a constriction factor ( χ ) may help to ensure the convergence. Application of the constriction factor results in Eq.(4). ur ur uur ur uur ur vi (t ) = χ ⋅[ω ⋅ vi (t − 1) + ϕ1 ⋅ rand1 ( pi − xi (t − 1)) + ϕ2 ⋅ rand2 ( pg − xi (t − 1)) ]
(4)
796
Y. Liu et al.
Kennedy [3, 4] claimed that PSO with a small neighborhood might perform better on complex problems, while PSO with a large neighborhood would perform better on simple problems. In [7], constructing a unified particle swarm optimizer (UPSO) by combining the global version and local version was proposed. Mendes and Kennedy [8] introduced a fully informed PSO to update the particle velocity instead of using the original PSO (OPSO) methods, all the neighbors of the particle are used to update the velocity. In [9], Peram et al proposed the fitness-distance-ratio-based PSO (FDRPSO) with near neighbor interactions, when updating each dimension velocity dimension, the FDR-PSO algorithm selects one other particle, which has a higher fitness value and is nearer to the particle being updated. Liang et al [10, 11] proposed an improved PSO called CLPSO, which uses a novel learning strategy. Some researchers also combined the search techniques to improve particle swarm performance, for example, in [12-14], combining with evolutionary operators to improve PSO performance.
3 Particle Swarm Optimizer Based on Dynamic Neighborhood Topology 3.1 Small-World Network Small-world network is one network which displays both the small diameter of the random graph, and the heavy clustering coefficient of the organized nearest-neighbor graphs. The small-world model [15] is an explicit construction of such graphs. Firstly, it starts with a regular network, such as a nearest-neighbor graph, and then rewires a probability p of the edges by changing one end to a uniformly random destination. Let p to vary from 0 to 1, one edge can be added (Fig.1. (c)). Small-world Network is one very similar to the random graphs of Erd¨os and Renyi [29]. Watts has suggested that information communication through social networks is affected by several aspects of the networks, as follows: 1) The degree of connectivity: The degree of connectivity among nodes in the network. Each particle in swarm identifies the best position found by its k neighbors; k is the variable that distinguishes local PSO from global PSO topologies, and is likely to affect performance. 2) The clustering coefficient (CC): If a node’s neighbors are also neighbors to one another. The number of neighborhood in-common can be counted per node, and can be averaged over the graph. 3) The average shortest distance (L): The average shortest distance from one node to another was an important network characteristic for determining the spread of information through the network similar to choice of learning exemplar in the swarm. Mendes and Kennedy[4] analyzed the effects of various population topologies on the particle swarm algorithm, and get a conclusion: the standard particle swarm topology learning from “gbest” facilitates the most immediate communication, and all particles are directly connected to the best position in the population (Fig.1.(a)). On the other hand, the ring lattice known as “lbest” is the slowest, most indirect communication pattern (Fig.1. (b)). Where i is opposite on the lattice, a good solution found by particle i
3 Particle Swarm Optimizer Based on Dynamic Neighborhood Topology

3.1 Small-World Network

A small-world network is a network that displays both the small diameter of a random graph and the heavy clustering of organized nearest-neighbor graphs. The small-world model [15] is an explicit construction of such graphs: it starts with a regular network, such as a nearest-neighbor graph, and then rewires a proportion p of the edges by changing one end to a uniformly random destination; as p varies from 0 to 1, more edges are rewired (Fig. 1(c)). A small-world network is very similar to the random graphs of Erdös and Rényi [29]. Watts has suggested that information communication through social networks is affected by several aspects of the networks, as follows: 1) The degree of connectivity among nodes in the network: each particle in the swarm identifies the best position found by its k neighbors; k is the variable that distinguishes local PSO topologies from global ones and is likely to affect performance. 2) The clustering coefficient (CC): whether a node's neighbors are also neighbors of one another; the number of neighbors in common can be counted per node and averaged over the graph. 3) The average shortest distance (L): the average shortest distance from one node to another, an important network characteristic for determining the spread of information through the network, similar to the choice of the learning exemplar in the swarm. Mendes and Kennedy [4] analyzed the effects of various population topologies on the particle swarm algorithm and concluded that the standard particle swarm topology, learning from "gbest", provides the most immediate communication, since all particles are directly connected to the best position in the population (Fig. 1(a)). On the other hand, the ring lattice known as "lbest" is the slowest, most indirect communication pattern (Fig. 1(b)): where i is on the opposite side of the lattice, a good solution found by particle i
797
p
(a)
(b)
(c)
Fig. 1. The neighborhood topology1
has to pass through particle i’s immediate neighbor, that particle’s immediate neighbor, and so on. Thus a solution found by particle i, moves very slowly around the ring. 3.2 Dynamic Neighborhood Topology Based on Small-world Network In the PSO, each individual aims to produce the best solution by learning from other individuals, thereby the different neighborhood topology will effect each particle learning. The neighborhood topology is similar to small world network that affected the information communication in the swarm. In this paper, the proposed algorithm is constructed based on Eq.(3) and the small-world network. In this neighborhood topology, we use the following velocity updating equation: ur ur uuuur ur uur ur vi (t ) = ω ⋅ vi (t −1) + ϕ1 ⋅ rand1 ( pibest − xi (t −1)) + ϕ2 ⋅ rand2 ( pg − xi (t −1)) ibest = { i | CCi = max(CC), Li = min(TP1, TP2 )} , i ∈TP1 or i ∈TP2
where TP denotes total population. CCi
=
(5)
(3 K - 2)is the biggest clustering coefficient for 4( K −1)
the ith particle in TP, K is the degree of the ith [30]. ibest is the best particle position uur uur which may equal to p g or unequal to p g , this two different results don’t affect our algorithm. When updating the particle velocity in each iteration, which is shown in Fig.2, we firstly produce two small world networks as initial neighborhood topology (denoted as NT), and then choose the learning exemplar of the particle from other particle’s pbest as the following criteria: (1) If CC (1) = CC (2) & L1 ≠ L2 , NT1 wins. If L1 < L2 , we choose particle i in NT1 as the exemplar. (2) If CC (1) = CC (2) & L1 = L2 , NT1 or NT2 is chose at will. (3) If CC (1) ≠ CC (2) & L1 = L2 , NT1 wins. If CC(1) > CC(2) , we choose particle i in NT1 as the exemplar. (4) If CC (1) ≠ CC (2) & L1 < L2 , NT1 wins, and we choose particle i in NT1 as the exemplar. 1
Fig. 2. Learning exemplar of particle i in the small-world network
(5) If CC(1) > CC(2) or L1 < L2, NT1 wins, and we choose particle i in NT1 as the exemplar.
(6) If CC(1) > CC(2) or L1 < L2, NT1 wins, and particle i in NT1 is chosen as the exemplar.
(7) If CC(1) < CC(2) or L1 < L2 and the average degree <K> in NT2 is larger than that in NT1, NT2 wins, and we choose particle i in NT2 as the exemplar. Otherwise, population NT1 wins.
Fig. 3. The best particle index in the swarm over iterations (Rosenbrock and Rastrigin functions; original PSO versus SWLPSO)
Complying with the above seven criteria, $p_{ibest}$ can generate new positions in the search process using information derived from different particles' historical best positions. Fig. 3 shows that the SWLPSO has more diversity than the OPSO. The main differences between the SWLPSO and the original PSO are as follows. Instead of using a particle's own $p_i$ and $p_g$ as the exemplars, every particle can learn from the most excellent neighbor according to CC and L. This strategy not only avoids a blind choice of exemplar, but also saves a lot of computation time. The velocity updating strategy increases the diversity of the swarm in contrast with the OPSO. The pseudocode of the SWLPSO is given in Fig. 4.
(Footnote to Fig. 2: particles a, b, c, d, and e are connected with particle i with probability p; particle c is chosen as the exemplar based on CC and L.)
Begin
  For each particle
    1. Initialize the small-world network; compute CC and L (with predefined probability p)
    2. Initialize the particle's position and velocity
    3. Compute the fitness value
    4. If pbestval(i) > pbestval(j), i ≠ j ∈ popsize, then pbest = pbest(i), gbestval = pbestval(i)
  Endfor
  While (stopping criterion not met)
    For each particle
      5. Update the particle velocity and position according to Eqs. (5) and (2)
      6. Choose the learning exemplar according to CC and L
      7. Update the particle position
      8. Update the global gbest and the global fitness
    Endfor
  Endwhile
End
Fig. 4. The pseudocode of the SWLPSO algorithm
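To make the topology side of the algorithm concrete, the sketch below builds a small-world neighborhood in the Watts-Strogatz style described in Section 3.1 (a ring lattice with edges rewired with probability p) and computes a per-node clustering coefficient and the average shortest distance L used in the exemplar choice. It is an illustrative stand-in, not the authors' implementation; the parameter values are assumptions, and a library such as networkx could be used instead.

```python
import random
from collections import deque

def small_world(n=30, k=4, p=0.1, seed=1):
    """Ring lattice with k nearest neighbours, each forward edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n); adj[(i + j) % n].add(i)
    for i in range(n):
        for j in range(1, k // 2 + 1):
            if rng.random() < p:
                old, new = (i + j) % n, rng.randrange(n)
                if new != i and new not in adj[i]:
                    adj[i].discard(old); adj[old].discard(i)
                    adj[i].add(new); adj[new].add(i)
    return adj

def clustering_coefficient(adj, i):
    """Fraction of pairs of i's neighbours that are themselves connected (the CC of node i)."""
    nb = list(adj[i])
    if len(nb) < 2:
        return 0.0
    links = sum(1 for a in range(len(nb)) for b in range(a + 1, len(nb)) if nb[b] in adj[nb[a]])
    return 2.0 * links / (len(nb) * (len(nb) - 1))

def average_shortest_distance(adj):
    """Mean BFS distance over all reachable node pairs (the L used for exemplar choice)."""
    total, pairs = 0, 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(d for v, d in dist.items() if v != s)
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

nt = small_world()
print(max(clustering_coefficient(nt, i) for i in nt), average_shortest_distance(nt))
```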
4 Experiment and Results 4.1 Test Function Description
To evaluate the performance of the proposed approach, we test the SWLPSO on diverse test functions. Since our main objective is to improve PSO's performance on multimodal problems, we choose two unimodal and four multimodal benchmark functions, which can be considered "difficult" global optimization problems for an evolutionary algorithm. All test functions are tested in 30 dimensions. The parameters of these functions are presented in Table 1. More details about the test functions are collected in [11]. Table 1. The parameters of the test functions
Function name   n (Dimension)   Search space           Global optimum (x*)   f(x*)
Sphere          30              [-100, 100]^30         [0, 0, …, 0]          0
Rosenbrock      30              [-2.048, 2.048]^30     [1, 1, …, 1]          0
Ackley          30              [-32.768, 32.768]^30   [0, 0, …, 0]          0
Griewank        30              [-600, 600]^30         [0, 0, …, 0]          0
Rastrigin       30              [-5.12, 5.12]^30       [0, 0, …, 0]          0
Weierstrass     30              [-0.5, 0.5]^30         [0, 0, …, 0]          0
4.2 Parameter Settings for SWLPSO Algorithms
The experiments were conducted to compare five PSO algorithms on the six test problems with 30 dimensions. The five PSO algorithms are the PSO with constriction factor (PSO-CF) [27], the fully informed particle swarm (FIPS) [8], FDR-PSO [9], CLPSO [11], and the proposed algorithm (SWLPSO). In order to make these different
algorithms comparable, all parameters are set as follows: the population size is set to 30 and the maximum number of fitness evaluations (FEs) is set to 30,000. All experiments were run 30 times. The mean values of the results are presented in Table 2. Table 2. Results of the experiments using the various PSO variants
Function name
Optimal
Sphere Rosenbrock Ackley Griewank Rastrigin Weierstrass
0 0 0 0 0 0
PSO-CF
The Means of Different PSO Algorithm FIPS FDR-PSO CLPSO SWLPSO
1.532e-080
2.574e-028
2.237e-100
6.432e-033
2.256e+001
2.233e+000
9.968e-002
1.491e+000
4.983e-018 1.512e-002
1.155e+000
3.553e-015
2.254e-009
1.309e-014
1.001e-016
2.675e-002
2.753e-002
6.132e-002
6.382e-002
2.782e-009
8.955e+000
1.872e+000
1.992e+000
8.823e-014
5.553e-015
1.573e+001
6.962e-001
1.112e+000
2.703e-002
2.3381e-002
Table 3. Result of t-tests for the PSO algorithms
                      PSO-CF vs. SWLPSO   FIPS vs. SWLPSO   FDR-PSO vs. SWLPSO   CLPSO vs. SWLPSO
p-value (a = 0.05)    0.0034              0.00026           0.0045               0.0059
result                1                   1                 1                    1
4.3 Experiment Result
Table 2 presents the means over 30 runs of the five algorithms on the six test functions with 30 dimensions. The best results among the five algorithms are shown in bold. Fig. 5 presents the convergence characteristics in terms of the best fitness value of the median run of each algorithm on each test function. From the results, we observe that for the unimodal Sphere function the SWLPSO does not converge as fast as the other PSO algorithms. The reason is that when updating the velocity in each iteration, a particle can learn from the best exemplar among all its neighbors; this learning strategy leads to a wider search and slower convergence than the other PSO algorithms. Note that on the Rosenbrock function the SWLPSO algorithm achieves almost the same best result as FDR-PSO. However, the SWLPSO algorithm performs well on all the multimodal problems and outperforms all the other PSO algorithms, especially on the Griewank, Ackley, and Rastrigin functions. The PSO with constriction factor (PSO-CF) is the global version, where the whole population is the neighborhood; it performs well on unimodal problems. The CLPSO algorithm performs better than all the other PSO variants on the Rastrigin and Weierstrass functions, except for the SWLPSO algorithm. Comparing the convergence graphs, we observe that SWLPSO has a faster convergence speed than the other PSO algorithms on the Rosenbrock, Ackley, Griewank, and Rastrigin functions. The learning strategy of the SWLPSO is somewhat similar to an elite learning strategy, but not the same; it may avoid blind learning from another particle. In order to determine whether the results obtained by the SWLPSO are statistically different from those generated by the other algorithms, we performed t-tests on pairs of PSO variants with the SWLPSO.
Fig. 5. The convergence characteristics (best function value versus fitness evaluations for PSO-cf, FIPS, FDR-PSO, CLPSO, and SWLPSO): (a) Sphere, (b) Rosenbrock, (c) Ackley, (d) Griewank, (e) Rastrigin, (f) Weierstrass
These tests give a p-value, which is compared to a constant a to determine whether a difference is significant. The criterion is defined as follows:

$result = \begin{cases} 1 & \text{if } p < a \\ 0 & \text{if } p \ge a \end{cases}$  (6)

If p is less than a, the performance difference is statistically significant; if p is equal to or greater than a, it is not. The comparison results are shown in Table 3.
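A sketch of how such a comparison could be computed is shown below. The paper does not specify the exact t-test variant, so the independent two-sample test from SciPy is an assumption, and the per-run values used here are random placeholders.

```python
import numpy as np
from scipy import stats

def compare(runs_a, runs_b, a=0.05):
    """Two-sample t-test on per-run results; returns (p-value, result) as in Eq. (6)."""
    _, p = stats.ttest_ind(runs_a, runs_b)
    return p, int(p < a)

# e.g. 30 final fitness values per algorithm (random placeholders here)
rng = np.random.default_rng(0)
print(compare(rng.normal(1.0, 0.1, 30), rng.normal(0.8, 0.1, 30)))
```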
5 Conclusions and Future Work

We have presented a novel neighborhood topology based on the small-world network. To let a particle in the swarm learn from the best exemplar in its neighborhood, we apply characteristics of the small-world network in the SWLPSO algorithm, such as the clustering coefficient, which indicates the number of neighbors around the current particle i and reflects its exploration and exploitation capability. According to the experimental results, the SWLPSO does not perform well on the unimodal Sphere function. However, in terms of the "no free lunch" theorem [31], any elevated performance over one class of problems is offset by performance over another class. The SWLPSO does not perform well on the unimodal problems, but it outperforms the other algorithms on the multimodal test functions, and even where its advantage is not remarkable, its convergence characteristics are better than those of the other algorithms. Therefore, we should not expect the best performance on all test functions, as the purpose of the proposed algorithm is to improve PSO's performance for
solving multimodal problems in the real world. The SWLPSO achieves satisfactory results, which implies that it is more effective at solving the multimodal problems examined in this paper. In the future, we will focus on: (i) testing the effectiveness of the proposed algorithm on more multimodal test problems and on several composition functions that are more difficult to optimize; and (ii) applying the proposed algorithm to practical applications to verify its effectiveness. Acknowledgments. The first author acknowledges the support of the Ministry of Education in Guizhou Province, the Zunyi technology division Dean's Office, and the Department of Scientific Research of Zunyi Normal College through project numbers 070520, B07015, [2008]21, and 2007018, respectively.
References 1. Eberhart, R., Kennedy, J.: New Optimizer Using Particle Swarm Theory. In: Proc. 6th Int. Symp. Micro Machine Human Science, pp. 39–43 (1995) 2. Kennedy, J., Eberhart, R.: PSO Optimization. In: Proc. IEEE Int.Conf. Neural Networks, Perth, Australia, vol. 4, pp. 1941–1948 (1995) 3. Kennedy, J.: Small Worlds and Mega-minds: Effects of Neighborhood Topology on Particle Swarm Performance. In: Proc. Congr. Evol.Comput, pp. 1931–1938 (1999) 4. Kennedy, J., Mendes, R.: Population Structure and Particle Swarm Performance. In: Proc. IEEE Congr. Evol. Comput., Honolulu, HI, pp. 1671–1676 (2002) 5. Niu, B., Zhu, Y.L., He, X.X., Wu, H., Shen, H.: A Lifecycle Model for Simulating Bacterial Evolution. Neurocomputing 72, 142–148 (2008) 6. Niu, B., Li, L.: A novel PSO-DE-based hybrid algorithm for global optimization. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 156–163. Springer, Heidelberg (2008) 7. Parsopoulos, K.E., Vrahatis, M.N.: UPSO-A Unified Particle Swarm Optimization Scheme. Lecture Series on Computational Sciences, pp. 868–873 (2004) 8. Mendes, R., Kennedy, J., Neves, J.: The Fully Informed Particle Swarm: Simpler, Maybe Better. IEEE Trans. Evol. Comput, 204–210 (2004) 9. Peram, T., Veeramachaneni, K., Mohan, C.K.: Fitness-distance-ratio Based Particle Swarm Optimization. In: Proc. Swarm Intelligence Symp., pp. 174–181 (2003) 10. Liang, J.J., Qin, A.K., Suganthan, P.N., Baskar, S.: Evaluation of comprehensive learning particle swarm optimizer. In: Pal, N.R., Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds.) ICONIP 2004. LNCS, vol. 3316, pp. 230–235. Springer, Heidelberg (2004) 11. Liang, J.J., Qin, A.K., Suganthan, P.N., Baskar, S.: Comprehensive Learning Particle Swarm Optimizer for Global Optimization of Multimodal Functions. IEEE Transactions on Evolutionary Computation 10(3), 281–295 (2006) 12. Angeline, P.J.: Using Selection to Improve Particle Swarm Optimization. In: Proc. IEEE Congr. Evol. Comput., Anchorage, AK, pp. 84–89 (1998) 13. Lovbjerg, M., Rasmussen, T.K., Krink, T.: Hybrid Particle Swarm Optimizer with Breeding and Subpopulations. In: Proc. Genetic Evol.Comput. Conf., pp. 469–476 (2001) 14. Miranda, V., Fonseca, N.: New Evolutionary Particle Swarm Algorithm (EPSO) Applied to Voltage/VAR control. In: Proc. 14thPower Syst. Comput. Conf., Seville, Spain (2002) 15. Watts, D.J., Strogatz, S.H.: Collective Dynamics of Small-world Networks. Nature 393, 440–442 (1998)
16. Wilke, D.N.: Analysis of the Particle Swarm Optimization Algorithm. Master’s thesis, Dept. Mechanical and Aeronautical Eng., Univ. of Pretoria, Pretoria, South Africa (2005) 17. Schutte, J.F., Groenwold, A.A.: Sizing Design of Truss Structures Using Particle Swarms. Struct. Multidisc. Optim. 25(4), 261–269 (2003) 18. Coello, C.A.C.G., Pulido, T., Lechuga, M.S.: Handling Multiple Objectives with Particle Swarm Optimization. IEEE Trans. Evol.Comput. 8, 256–279 (2004) 19. Messerschmidt, L., Engelbrecht, A.P.: Learning to Play Games Using a PSO-based Competitive Learning Approach. IEEE Trans. Evol.Comput. 8, 280–288 (2004) 20. Wachowiak, M.P.: An Approach to Multimodal Biomedical Image Registration Utilizing Particle Swarm Optimization. IEEE Trans. Evol. Comput. 8, 289–301 (2004) 21. Shi, Y., Eberhart, R.C.: A Modified Particle Swarm Optimizer. In: Proc. IEEE Congr. Evol. Comput., pp. 69–73 (1998) 22. Shi, Y., Eberhart, R.C.: Particle Swarm Optimization with Fuzzy Adaptive Inertia Weight. In: Proc.Workshop Particle Swarm Optimization, Indianapolis, pp. 101–106 (2001) 23. Ratnaweera, A., Halgamuge, S., Watson, H.: Self-organizing Hierarchical Particle Swarm Optimizer with Time Varying Accelerating Coefficients. IEEE Trans. Evol. Comput. 8, 240–255 (2004) 24. Fan, H.Y., Shi, Y.: Study on Vmax of Particle Swarm Optimization. In: Proc.Workshop Particle Swarm Optimization, Indianapolis, IN (2001) 25. Clerc, M.: The Swarm and the Queen: Toward a Deterministic and Adaptive Particle Swarm Optimization. In: Proc. ICEC 1999, Washington, DC, pp. 1951–1957 (1999) 26. Corne, D., Dorigo, M., Glover, F.: New Ideas in Optimizaton, pp. 379–387. McGraw-Hill, New York (1999) 27. Clerc, M., Kennedy, J.: The Particle Swarm: Explosion, Stability, and Convergence in a Multi-dimensional Complex Space. IEEE Trans. Evol.Comput. 6, 58–73 (2002) 28. Newmama, M.E.J., Watts, K.J.: Renormalization Group Analysis of the Small-world Network Model. Phys.Lett.A. 263, 341–346 (1999) 29. Erdös, P., Renyi, A.: On Random Graphs. Publicationes Mathematicae 6, 290–297 (1959) 30. Barrat, A., Weigt, M.: On the Properties of Small World Networks. Europe Physicals 13, 547–560 (2003) 31. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Comput. 1, 67–82 (1997)
An Improved Two-Stage Camera Calibration Method Based on Particle Swarm Optimization Hongwei Gao1, Ben Niu2, Yang Yu1, and Liang Chen1 1
School of Information Science & Engineering, Shenyang Ligong University, Shenyang, 110168 China 2
College of Management, Shenzhen University, Shenzhen, 518060 China
[email protected],
[email protected] Abstract. For the calibration of binocular vision, an improved two-stage camera calibration method involving multiple distortion coefficients is introduced in this paper. In the first stage, the 3D points' coordinates are calculated by direct linear transformation (DLT) triangulation based on distortion compensation. In the second stage, particle swarm optimization (PSO) is used to determine the two cameras' parameters. In this way the parameters of the two cameras can be tuned simultaneously. To help assess the performance of the proposed method, a new cost function is designed. Simulations and experiments are carried out on the same calibration data sets, and the performance of PSO in tuning the parameters is compared with that of GA. The experimental results show that the strategy of taking the 3D reconstruction error as the objective function is feasible and that PSO is the best choice for camera parameter optimization. Keywords: Computer vision, Image analysis, 3D reconstruction, PSO.
1 Introduction The accuracy of calibration directly determines the performance of many vision tasks. Many classical two-stage camera calibration methods have been proposed, among which Tsai's method [1] with radial lens distortion is well known and frequently applied for its high accuracy. However, it takes only one radial distortion coefficient into consideration, which is adequate only for long-focal-length lenses with little distortion. Weng [2] introduced a camera model considering radial, decentering and thin prism distortions; the camera parameters are calculated by matrix decomposition, which makes it hard to achieve high accuracy. Some improved methods have been put forward recently. Zhang [3] proposed a two-stage camera calibration method based on viewing a planar pattern shown at a few different orientations, which is easy to use and flexible. Heikkilä [4] proposed a four-step calibration method considering distortion compensation, including tangential distortion. The cost functions in the above algorithms were defined as the distance between the measured image points and the estimated image points, so the refined solution was optimal only with respect to the measured 2D image points, not to the real 3D points. Taking these factors into consideration, Zhou et al. [5]
used a new cost function based on the sum of distances, in the world coordinate system, between ground-truth 3D points and calculated 3D points. A Perspective-3-Point (P3P) algorithm is used to perform the 3D point reconstruction, but the optimization of the two cameras is executed twice, that is, the two cameras' parameters are optimized separately. Traditional parameter optimization algorithms are gradient-based; their drawbacks are sensitivity to the initial values of the variables and a tendency to fall into local minima, and the time they consume to reach a good result is considerable. The advent of evolutionary algorithms (EA) has attracted considerable interest in optimization [6, 7], since EAs show excellent performance on global optimization; the most well known of them is the genetic algorithm (GA). Compared with traditional gradient-based methods, an EA represents complicated data structures by a simple coding technique and then guides the search by means of simple genetic operations and winner-take-all natural selection based on the coding table. This strategy can alleviate problems such as knowledge representation and combinatorial explosion. Recently, a new evolutionary computation technique, the particle swarm optimization (PSO) algorithm, was introduced by Kennedy and Eberhart [8] and has come to be widely used in many areas [9, 10]. Compared with GA, PSO has some attractive advantages: (1) it has a fast convergence rate; (2) it is easy to implement; (3) it has few parameters to adjust. Consequently, we introduce PSO into the optimization of the new cost function and make comparisons with other optimization algorithms. Experimental and simulation results show that the improved method has much higher calibration precision than the traditional ones, and that the parameters obtained by PSO are better than those obtained by GA, which shows that PSO can be applied to camera calibration.
2 Camera Model with Multi-distortion Coefficients

In the pin-hole model, the world coordinates and the image coordinates of a point p are related by the following formula:

\[
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} fc(1) & alpha\_c \cdot fc(1) & cc(1) \\ 0 & fc(2) & cc(2) \\ 0 & 0 & 1 \end{bmatrix}
[R\ T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= K [R\ T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix},
\tag{1}
\]
where M represents the camera projection matrix; K represents the internal parameter matrix; [R T] is defined by the pose of the world coordinate system and is called the external parameter matrix; fc(1) and fc(2) are the horizontal and vertical focal lengths in pixels; alpha_c is the horizontal scale factor; and cc(1) and cc(2) are the coordinates of the image center, also in pixels. The external parameter matrix is composed of the translation vector T and the rotation R. Thus, the projection matrix equals the product of the internal and external parameter matrices.
There exist many kinds of distortion in camera lenses, especially in short-focal-length lenses, and it is necessary to consider the impact of distortion in order to achieve a highly accurate calibration result. There are four main distortions: radial distortion, tangential distortion, decentering distortion and thin prism distortion. Generally speaking, radial distortion and tangential distortion are enough to describe the camera distortion model. These two kinds of distortion can be represented as follows:
\[
k_c = (k_c(1), k_c(2), k_c(3), k_c(4), k_c(5))^T .
\tag{2}
\]

Let P = [X_c; Y_c; Z_c]^T be the 3D coordinate of the point p in the camera frame; the imaging coordinate of this point on the CCD plane is related to the inner parameters. Let x_n be the physical coordinate of the imaging point under the pinhole model:

\[
x_n = \begin{bmatrix} X_c / Z_c \\ Y_c / Z_c \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} .
\tag{3}
\]

Setting r^2 = x^2 + y^2 and taking the lens distortion into consideration, the physical coordinate x_d of the imaging point can be represented by

\[
x_d = \begin{bmatrix} x_d(1) \\ x_d(2) \end{bmatrix}
= \left( 1 + k_c(1) r^2 + k_c(2) r^4 + k_c(5) r^6 \right) x_n + d_x ,
\tag{4}
\]

where the tangential distortion d_x can be represented as

\[
d_x = \begin{bmatrix} 2 k_c(3) x y + k_c(4) (r^2 + 2 x^2) \\ k_c(3) (r^2 + 2 y^2) + 2 k_c(4) x y \end{bmatrix} ,
\tag{5}
\]

and k_c is the vector containing the radial and tangential distortion coefficients. The ideal imaging coordinate of point p is then described by x in pixels, where

\[
x_u = fc(1) \left( x_d(1) + alpha\_c \cdot x_d(2) \right) + cc(1), \qquad
y_u = fc(2)\, x_d(2) + cc(2),
\tag{6}
\]

so that x in pixels can be written as

\[
\begin{bmatrix} x_u \\ y_u \\ 1 \end{bmatrix} = K \begin{bmatrix} x_d(1) \\ x_d(2) \\ 1 \end{bmatrix} .
\tag{7}
\]
From the above discussion we can now state the calibration procedure. First, the ideal image coordinates after distortion compensation are calculated. Second, the projective matrices of the two cameras are acquired by the direct linear transformation
(DLT) method with the pinhole model. Finally, the stereo triangulation is performed with the two projective matrices.
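As a hedged illustration of the distortion model of Eqs. (3)-(7), the following minimal numpy sketch projects a camera-frame point to pixel coordinates; the parameter values and the test point are hypothetical, not calibration results from the paper.

```python
import numpy as np

def project_point(P_c, fc, cc, alpha_c, kc):
    """Project a camera-frame 3D point to pixel coordinates using the
    radial/tangential distortion model of Eqs. (3)-(7).
    kc[0]..kc[4] correspond to kc(1)..kc(5) in the text."""
    Xc, Yc, Zc = P_c
    x, y = Xc / Zc, Yc / Zc                                   # Eq. (3): normalized coords
    r2 = x * x + y * y
    radial = 1 + kc[0] * r2 + kc[1] * r2**2 + kc[4] * r2**3   # radial part of Eq. (4)
    dx = np.array([2 * kc[2] * x * y + kc[3] * (r2 + 2 * x * x),   # Eq. (5): tangential
                   kc[2] * (r2 + 2 * y * y) + 2 * kc[3] * x * y])
    xd = radial * np.array([x, y]) + dx                       # Eq. (4): distorted point
    xu = fc[0] * (xd[0] + alpha_c * xd[1]) + cc[0]            # Eq. (6)
    yu = fc[1] * xd[1] + cc[1]
    return np.array([xu, yu])                                 # pixel coordinates, Eq. (7)

# Illustrative (hypothetical) parameters and a test point
fc, cc = (590.0, 565.0), (390.0, 290.0)
alpha_c, kc = 0.0, np.array([-0.05, 0.07, 0.002, 0.0003, 0.0])
print(project_point((0.2, -0.1, 1.5), fc, cc, alpha_c, kc))
```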
3 Calibration Method Based on 3D Reconstruction Error

3.1 Two-Stage Calibration Method

The traditional estimate of the camera parameters is obtained by minimizing the residual between the model and N observations (U_i, V_i), where i = 1, ..., N. In the case of Gaussian noise, the objective function is defined as the sum of squared residuals:

\[
F = \sum_{i=1}^{N} (U_i - u_i)^2 + \sum_{i=1}^{N} (V_i - v_i)^2 ,
\tag{8}
\]

where (u_i, v_i) are the correct image points and (U_i, V_i) are the observed values.
The least-squares estimation technique can be used to minimize Eq. (8). Due to the nonlinear nature of the camera model, simultaneous estimation of the parameters requires an iterative algorithm. However, without proper initial parameter values the optimization may get stuck in a local minimum and thereby cause the calibration to fail. This problem can be avoided by using the parameters from the DLT method as the initial values for the optimization. A global minimum of Eq. (8) is then usually achieved after a few iterations. The objective function proposed here is as follows:

\[
\sum_{i=1}^{n} \left\| P_i - P_{1i} \right\|
= \sum_{i=1}^{n} \sqrt{ (x_i - x_{1i})^2 + (y_i - y_{1i})^2 + (z_i - z_{1i})^2 } ,
\tag{9}
\]

where P_i(x_i, y_i, z_i) is the ground-truth 3D coordinate of a space point, P_{1i}(x_{1i}, y_{1i}, z_{1i}) is the observed (reconstructed) value, and n represents the number of calibration points. Eq. (9) is based on the 3D reconstruction error and differs from Eq. (8), which is based on the 2D reconstruction error. The two-stage calibration method based on the 3D reconstruction error proposed here is described as follows. First step: the inner parameters of the two cameras are determined by the Matlab Calibration Toolbox [4]. Second step: optimization of the camera parameters. In this paper, we calculate the projective matrix directly; in this way the inner and outer parameters are contained simultaneously. The traditional projective matrix composition method has 32 parameters that need to be tuned at the same time, including 10 inner parameters and 6 outer parameters for each camera, while our proposed method has only 20 parameters to be optimized explicitly, namely the 10 inner parameters of each camera; the outer parameters are optimized implicitly, embedded in the whole projective matrix. In this way the number of parameters to be tuned is reduced, the calibration speed is enhanced, and the stability of the solutions is also ensured, which is demonstrated by the following experiments.
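For illustration only (not from the paper), a minimal numpy sketch of the cost in Eq. (9); the ground-truth and reconstructed point arrays below are placeholders.

```python
import numpy as np

def reconstruction_error(P_true, P_est):
    """Eq. (9): sum of Euclidean distances between ground-truth 3D points
    and the 3D points reconstructed with the current camera parameters."""
    return np.sum(np.linalg.norm(P_true - P_est, axis=1))

P_true = np.array([[0.0, 0.0, 1.0], [20.0, 0.0, 1.0], [0.0, 20.0, 1.0]])
P_est = P_true + 0.3 * np.random.randn(*P_true.shape)  # placeholder reconstruction
print(reconstruction_error(P_true, P_est))
```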
3.2 Optimal Algorithm Based on 3D Reconstruction Error

The triangulation is constructed from the dual projective matrices and, at the same time, the 3D reconstruction error is constructed, which allows the 20 parameters of the two cameras to be optimized simultaneously. The flow of this algorithm is as follows:

1) Calculate the initial values of (fc(1), fc(2), cc(1), cc(2), alpha_c, kc) by means of the Matlab Calibration Toolbox; the initial value of kc = (kc(1), kc(2), kc(3), kc(4), kc(5))^T is set to 0.

2) Use the feature points' image coordinates (u_{1i}, v_{1i}), (u_{2i}, v_{2i}) and the two cameras' (fc(1), fc(2), cc(1), cc(2), alpha_c, kc) to calculate the ideal image coordinates (u'_{1i}, v'_{1i}) and (u'_{2i}, v'_{2i}) according to Eq. (5).

3) Use the ideal (u'_{1i}, v'_{1i}), (u'_{2i}, v'_{2i}) and the world coordinates P_i(x_i, y_i, z_i) to calculate the two cameras' projective matrices M_1 and M_2 by means of the DLT method.

4) Use the ideal (u'_{1i}, v'_{1i}), (u'_{2i}, v'_{2i}) and M_1, M_2 to calculate P_{1i}(x_{1i}, y_{1i}, z_{1i}); meanwhile, set the objective function (9) to be minimized, and the new values of (fc(1), fc(2), cc(1), cc(2), alpha_c, kc) can then be obtained.

5) If the prescribed number of iterations has been reached, exit the loop and return the optimal values of the two cameras' (fc(1), fc(2), cc(1), cc(2), alpha_c, kc); otherwise go back to step 2) to calculate again.
The goal of calibration is to achieve high triangulation accuracy. The optimized distortion coefficients and inner parameters acquired by our method possess physical meaning, which also ensures that the triangulation error is minimal.
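The stereo triangulation used in step 4 can be sketched with a standard linear (DLT-style) triangulation from two 3x4 projection matrices; the projection matrices and image points below are hypothetical and only illustrate the idea under simplified assumptions.

```python
import numpy as np

def triangulate(M1, M2, uv1, uv2):
    """Linear triangulation of one 3D point from two projection matrices
    and the ideal (distortion-compensated) image coordinates."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.vstack([u1 * M1[2] - M1[0],
                   v1 * M1[2] - M1[1],
                   u2 * M2[2] - M2[0],
                   v2 * M2[2] - M2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                       # homogeneous -> Euclidean 3D point

# Hypothetical projection matrices of a rectified stereo pair (baseline 100 mm)
K = np.array([[590.0, 0.0, 390.0], [0.0, 565.0, 290.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-100.0], [0.0], [0.0]])])

P = np.array([50.0, 30.0, 1500.0])            # a test point in front of the cameras
p1 = M1 @ np.append(P, 1.0)
p2 = M2 @ np.append(P, 1.0)
print(triangulate(M1, M2, p1[:2] / p1[2], p2[:2] / p2[2]))   # recovers approximately P
```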
4 Optimization Strategy Based on Particle Swarm Optimization

In order to test the performance of PSO, we also design a GA to optimize the same cost function; comparisons between PSO and GA are made in the experimental results below.

4.1 Optimization Scheme Based on PSO

The initial population of PSO is a swarm of random particles. During every iteration, each particle updates itself by tracking two best values. One is the best solution found by the particle itself, named the personal best point and represented by pbest. The other is the best solution found by the whole swarm, named the global best point and represented by gbest. After finding these two best values, a particle updates its velocity and position according to equations (10) and (11):
\[
v_{id}^{k+1} = v_{id}^{k} + c_1\, rand_1^{k} \left( pbest_{id}^{k} - x_{id}^{k} \right) + c_2\, rand_2^{k} \left( gbest_{d}^{k} - x_{id}^{k} \right),
\tag{10}
\]
\[
x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},
\tag{11}
\]

where i is the particle index;
v_{id}^{k} is the d-th velocity component of the i-th particle during the k-th iteration; c_1 and c_2 are acceleration coefficients (learning factors) that adjust the maximum step length toward the global best particle and the personal best particle, respectively; rand_1^{k} and rand_2^{k} are random numbers on the interval [0, 1] applied to the i-th particle; x_{id}^{k} is the d-th position component of the i-th particle during the k-th iteration; pbest_{id} is the d-th component of the personal best position of the i-th particle; and gbest_{d} is the d-th component of the global best position of the whole swarm. For the parameter optimization of calibration, an individual particle can be represented as follows:

\[
b = \{ fc11, fc12, cc11, cc12, alpha\_c1, kc1, fc21, fc22, cc21, cc22, alpha\_c2, kc2 \}.
\tag{12}
\]

The initial values of the two cameras' parameters are determined by the Matlab Calibration Toolbox. The computational burden can be reduced by confining the search space before PSO is run. The individuals of the swarm are generated randomly within a neighborhood of the initial values, and the cost function adopted is formula (9); the dimension of an individual is d = 20. The optimization procedure based on PSO is as follows:
1) Initialize the swarm randomly. The initial search positions and velocities are generated randomly within a neighborhood of the initial values; the number of particles is set to 20. Calculate pbest_{id} correspondingly; gbest_{d} is then the best of the pbest_{id}.

2) Evaluate every individual in the swarm. Calculate each particle's fitness value according to formula (9); if the value is better than the current pbest_{id}, then update pbest_{id}. If the best pbest_{id} among all particles is better than the current gbest_{d}, then update gbest_{d}.

3) Update the swarm according to formulas (10) and (11).

4) If a termination condition is met, the search procedure is terminated; otherwise return to step 2).
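The update loop described above can be sketched as follows; this is a minimal, generic PSO implementing Eqs. (10)-(11), with a toy quadratic cost standing in for the 3D reconstruction error, and the swarm size and iteration count are assumed placeholders.

```python
import numpy as np

def pso_minimize(cost, dim=20, n_particles=30, iters=200, c1=2.0, c2=2.0):
    """Minimal PSO loop following the velocity/position updates of Eqs. (10)-(11)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))           # positions
    v = np.zeros_like(x)                                      # velocities
    pbest, pbest_val = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (10)
        x = x + v                                              # Eq. (11)
        vals = np.array([cost(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

best, best_val = pso_minimize(lambda p: np.sum(p**2))          # toy quadratic cost
print(best_val)
```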
4.2 Optimization Scheme Based on GA

The optimization procedure based on GA is as follows.

Initialization: generate M individuals randomly, with the evolution iteration number of the swarm set to t = 0:

\[
G^{0} = \{ b_1^{0}, \ldots, b_j^{0}, \ldots, b_M^{0} \}, \qquad M = 100,
\tag{13}
\]

where b is a chromosome analogous to an individual in PSO; its superscript represents the evolution iteration number and its subscript the index of the individual.

1) Fitness value calculation: calculate the fitness value of every chromosome according to formula (9) and sort them in ascending order,

\[
G^{t} = \{ b_1^{t}, \ldots, b_j^{t}, \ldots, b_M^{t} \} \quad \text{with} \quad
\sum_{i=1}^{n} \left\| P_i - P_{1i}(b_j^{t}) \right\| \le \sum_{i=1}^{n} \left\| P_i - P_{1i}(b_{j+1}^{t}) \right\| .
\tag{14}
\]

2) Selection operation: select k individuals according to the elitist and random selection principle,

\[
G^{t+1} = \{ b_1^{t+1}, \ldots, b_k^{t+1} \}.
\tag{15}
\]

3) Mutation (aberrance) operation: select p individuals from the newly generated k individuals and apply mutation to part of these individuals' genes,

\[
G^{t+1} = \{ b_1^{t+1}, \ldots, b_k^{t+1}, b_{k+1}^{t+1}, \ldots, b_{k+p}^{t+1} \}.
\tag{16}
\]

4) Crossover operation: randomly select one gene to perform the crossover operation, and repeat this step M − k − p times,

\[
G^{t+1} = \{ b_1^{t+1}, \ldots, b_k^{t+1}, \ldots, b_{k+p}^{t+1}, \ldots, b_M^{t+1} \}.
\tag{17}
\]

5) Increase the evolution iteration number by one, i.e., t = t + 1, and select the optimal individual as the current solution,

\[
b_{best}^{t} = \Big\{ b_i^{t} \;\Big|\; \sum_{i=1}^{n} \left\| P_i - P_{1i}(b_i^{t}) \right\| = \min_{j=1,\ldots,M} \sum_{i=1}^{n} \left\| P_i - P_{1i}(b_j^{t}) \right\| \Big\} .
\tag{18}
\]

If the ending condition is reached, that is, the evolution iteration number is larger than a preset value or the fitness value satisfies \(\sum_{i=1}^{n} \left\| P_i - P_{1i}(b) \right\| < \xi\), then the search procedure is terminated; otherwise return to step 2).
5 Simulation and Experiment Results

5.1 Experiment Conditions

Images of calibration and test objects were obtained with two black-and-white cameras; the focal length of the lenses is 12.5 mm. The calibration and test points are created by a sheet containing 49 black squares on the top surface of a foam block 20 mm × 20 mm in size. The corners of the 169 squares are treated as control points, making a total of 169 points. The optical axes of the two cameras are parallel to each other. There are 676
points for calibration, extracted from the block placed 1–2 meters away in front of the cameras at four different positions. Another 4732 points on 28 other images are used to test the triangulation accuracy. The software development environment is MS VC++ 6.0, and the foam block is moved on a calibration table.

5.2 Experiment Results and Analysis

The experimental comparisons in terms of calculation speed and triangulation accuracy for the methods are shown in Table 1. The calibration error represents the fit between the calibration data and the camera model. We can see that the average error of the two methods is within 1 mm along the three directions (x, y and z), which illustrates that the calibration data fit the camera model well. In order to test the calibration result, a triangulation test is necessary. As seen from Table 1, the parameters obtained by PSO are better than those obtained by GA, and the average triangulation errors obtained by PSO are lower. As can be seen from Figs. 1–2, the maximum absolute triangulation error is 2.3 mm and 1.9 mm for GA and PSO, respectively. The convergence curves of the cost function are obtained after 1000 iterations. The final value of formula (9) is 22.1 and 16.7, respectively, and the optimization procedure converges after about 660 and 430 iterations, respectively. The following conclusions can be drawn from the above results. On one hand, PSO and GA share the random initialization of the swarm, the evaluation of candidates with a fitness value, and a fitness-guided random search of the solution space. On the other hand, there are some differences between PSO and GA: PSO searches for the optimal solution by tracking the optimal particle in the solution space, without the crossover and mutation operations used in GA. Compared with GA, the advantages of PSO lie in easy implementation, fast speed and few parameters to adjust.

Table 1. Error comparison for the different optimization strategies
Parameter                              GA            PSO
Left camera   k1                      -0.057642     -0.049015
              k2                       0.064481      0.074469
              k3                       0.001803     -0.027153
              k4                       0.000343     -0.003405
              k5                       0.000016      0.004772
Right camera  k1                      -0.058743     -0.073053
              k2                       0.092365      0.079556
              k3                       0.002133     -0.019137
              k4                      -0.000653      0.011664
              k5                       0.000026     -0.018213
fc11                                  587.309864    597.787054
fc12                                  565.162345    570.937578
alpha_c1                               -0.000102      0.006843
cc11                                  392.287535    395.434565
cc12                                  291.160867    296.664261
fc21                                  579.123356    564.632476
fc22                                  567.233876    564.135606
alpha_c2                                0.000133      0.016851
cc21                                  411.345345    406.753114
cc22                                  286.271876    298.203234
Calibration error    Average error X (mm)   0.114664    0.091065
                     Average error Y (mm)   0.096570    0.089456
                     Average error Z (mm)   0.611689    0.539543
Triangulation error  Average error X (mm)   0.590186    0.602146
                     Average error Y (mm)   0.673654    0.657331
                     Average error Z (mm)   1.996522    1.7833574
Fig. 1. Results of GA: calibration error, triangulation error, and cost function convergence curve

Fig. 2. Results of PSO: calibration error, triangulation error, and cost function convergence curve
6 Conclusions The goal of calibration is 3D measurement, and it is desirable to make the differences between practical results and true results as small as possible. With this aim, we take the sum of the distances in the world coordinate system between ground-truth 3D points and calculated 3D points as the cost function and calculate the two cameras' distortion coefficients and inner parameters at the same time. Furthermore, as highly accurate camera calibration is usually performed offline, the strategy proposed in this paper of optimizing the parameters with PSO is feasible and achieves higher calibration accuracy than traditional optimization methods. This attempt also extends the application field of PSO. Future work will focus on further improving the PSO by the proper selection of parameters and the design of different topologies. In addition, PSO applications in image registration and image rectification will be investigated.
Acknowledgement This work is supported by China Liaoning Province Educational Office fund (No.20080611), Shenzhen-Hong Kong Innovative Circle project (Grant No.SG200810220137A) and Project 801-000021 supported by SZU R/D Fund.
References 1. Tsai, R.Y.: A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 323–344 (1987)
2. Weng, J., Cohen, P., Herniou, M.: Camera calibration with distortion models and accuracy evaluation. IEEE Transaction on Pattern Analysis and Machine Intelligence, 965–980 (1992) 3. Zhang, Z.: A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1330–1334 (2000) 4. Heikkilä, J., Silvén, O.: A Four-step Camera Calibration Procedure with Implicit Image Correction. In: Proceedings of IEEE Computer Vision and Pattern Recognition, pp. 1106–1112. IEEE Press, Puerto Rico (1997) 5. Zhou, C., Tan, D.L., Zhu, F.: A high-precision Calibration Method for Distorted Camera. In: Proceedings of IEEE International Conference on Intelligent Robots and Systems, pp. 2618–2623. IEEE Press, Sendai (2004) 6. Wang, C.H., Hong, T.P., Tseng, S.S.: Integrating Fuzzy Knowledge by Genetic Algorithms. IEEE Transactions on Evolutionary Computation, 138–149 (1998) 7. Ishibuchi, H., Nakashima, T., Murata, T.: Performance Evaluation of Fuzzy Classifier Systems for Multi Dimensional Pattern Classification Problems. IEEE Transactions on Systems, Man, and Cybernetics, 601–618 (1999) 8. Eberhart, R.C., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human, pp. 39–43. IEEE Press, Nagoya (1995) 9. Niu, B., Li, L.: A novel PSO-DE-based hybrid algorithm for global optimization. In: Huang, D.-S., Wunsch II, D.C., Levine, D.S., Jo, K.-H. (eds.) ICIC 2008. LNCS (LNAI), vol. 5227, pp. 156–163. Springer, Heidelberg (2008) 10. Niu, B., Zhu, Y.L., He, X.X., Shen, H.: A Multi-swarm Optimizer Based Fuzzy Modeling Approach for Dynamic Systems Processing. Neurocomputing 71, 1436–1448 (2008)
EMD Based Power Spectral Pattern Analysis for Quasi-Brain-Death EEG Qi-Wei Shi1 , Ju-Hong Yang1 , Jian-Ting Cao1,3,4 , Toshihisa Tanaka2,3 , Tomasz M. Rutkowski3 , Ru-Bin Wang4 , and Hui-Li Zhu5 1
Saitama Institute of Technology 1690 Fusaiji, Fukaya-shi, Saitama 369-0293, Japan 2 Tokyo University of Agriculture and Technology 2-24-16, Nakacho, Koganei-shi, Tokyo 184-8588, Japan 3 Brain Science Institute, RIKEN 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan 4 East China University of Science and Technology Meilong Road 130, Shanghai 200237, China 5 Huadong Hospital Affiliated to Fudan University 221 Yanan West Rd, Shanghai 200040, China
[email protected],
[email protected] Abstract. Evaluating the significant differences between the group of comatose patients and the group of brain-death patients is important in the determination of brain death. This paper presents a power spectral pattern analysis for quasi-brain-death EEG based on Empirical Mode Decomposition (EMD). We first decompose single-channel recorded EEG data into a number of components with different frequencies. We then focus on the components which are related to brain activities. Since the power of spontaneous activities in the brain is usually higher than that of non-activity components, we can evaluate the power spectral patterns of comatose patients and quasi-brain-death patients. Our experimental results illustrate the effectiveness of the proposed method. Keywords: Electroencephalography (EEG), Quasi-Brain-Death, Empirical Mode Decomposition (EMD), Power Spectral Pattern.
1
Introduction
Brain death, briefly speaking, is defined as the absence and irreversibility of all brain and brain stem function [1]. Based on this definition, medical criteria have been established in most countries [2]. For example, the Japanese criterion includes the following major items for brain death determination: i. Deep coma: unresponsive to external visual, auditory and tactile stimuli and incapable of communication. ii. Pupil test: no pupil response to light and pupils dilated to 4 mm. iii. Brain stem reflex tests: e.g. gag reflex, cough reflex, corneal reflex and responses to painful stimuli are absent.
iv. Apnea test: loss of the patient's spontaneous respiration after disconnecting the ventilator. v. EEG confirmatory test: persistence of brain dysfunction, six hours with a confirmatory EEG, flat EEG at a level of 2 μV/mm. The standard process of brain death diagnosis involves risky and time-consuming items [3]. For example, in the apnea test, the respiratory machine is temporarily removed in order to determine the patient's spontaneous respiration. Moreover, in the EEG confirmatory test, the recordings should continue for at least 30 minutes, and the test should be repeated after 6 hours. To avoid these risks and develop a practical yet safe and reliable method for the diagnosis of brain death, an EEG-based preliminary examination has been proposed [4,5]. That is, after items i.–iii. have been verified, an EEG-based preliminary examination is applied at the bedside of the patient. Since EEG recordings might be corrupted by artifacts or various sources of interfering noise, it is critical in the EEG preliminary examination system to extract informative features from noisy EEG signals and evaluate their significance. To study statistically significant differences between the presence and absence of brain activities in our clinical EEG data, several complexity measures have been developed for quantitative EEG analysis [6]. To decompose brain activities with a specific frequency, a time-frequency EEG analysis technique based on empirical mode decomposition (EMD) has been proposed [7,8]. In this paper, we present an exploratory data analysis technique based on the empirical mode decomposition method. The EMD method is used to decompose single-channel recorded EEG data into a number of components with different frequencies. Since the power of spontaneous activities in the brain is usually higher than that of non-activity components, we apply the power spectrum analysis technique to obtain the average maximum power of one thousand continuous samples of EEG data. The experimental results illustrate that the proposed method effectively extracts the underlying data and performs well in evaluating the differences between comatose patients and quasi-brain-death patients.
2
Empirical Mode Decomposition
The EMD is an analysis method that adaptively decomposes any complicated data set into oscillatory components called intrinsic mode functions (IMFs). The IMFs, usually processed with the standard Hilbert transform, represent the oscillation modes embedded in the data. The local energy and the instantaneous frequency calculated from the IMF components give a full energy–frequency–time distribution of the data. Thus, EMD can be seen as a unique spectral analysis technique. For an observed time-domain signal x(t), we can always obtain its Hilbert transform f(t) as

\[
f(t) = \frac{1}{\pi} P \int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau}\, d\tau.
\tag{1}
\]
It is impossible to calculate the Hilbert transform as an ordinary improper integral because of the pole at τ = t; the P in front of the integral denotes the Cauchy principal value, which expands the class of functions for which the integral in Eq. (1) exists [9]. With this definition, x(t) and f(t) form a complex conjugate pair, so the complex signal Z(t) can be formulated as

\[
Z(t) = x(t) + j f(t) = a(t) e^{j\theta(t)},
\tag{2}
\]

in which j is the imaginary unit (j^2 = −1), and the instantaneous amplitude a(t) and instantaneous phase θ(t) are given by

\[
a(t) = \sqrt{x^2(t) + f^2(t)},
\tag{3}
\]

\[
\theta(t) = \tan^{-1} \frac{f(t)}{x(t)}.
\tag{4}
\]

The instantaneous frequency ω(t) of the signal x(t) can then be defined as

\[
\omega(t) = \frac{d\theta(t)}{dt}.
\tag{5}
\]
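As an illustrative sketch only (not the authors' code), Eqs. (2)-(5) can be evaluated numerically with SciPy's analytic-signal routine; the toy sine below stands in for an IMF, and the sampling rate mirrors the 1000 Hz used for the EEG recordings.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                   # sampling rate (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t)                 # toy narrow-band signal standing in for an IMF

z = hilbert(x)                                # analytic signal Z(t) = x(t) + j f(t), Eq. (2)
a = np.abs(z)                                 # instantaneous amplitude a(t), Eq. (3)
theta = np.unwrap(np.angle(z))                # instantaneous phase theta(t), Eq. (4)
omega = np.diff(theta) * fs / (2 * np.pi)     # instantaneous frequency in Hz, Eq. (5)
print(a.mean(), omega.mean())                 # roughly 1.0 amplitude and about 5 Hz
```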
In principle, the instantaneous frequency defined by Eq. (5) is meaningful only for a narrow-band signal. An IMF component, as a narrow-band signal, is a function that satisfies two conditions [9]: a) in the whole data set, the number of extrema and the number of zero crossings must either be equal or differ at most by one; b) at any point, the mean value of the upper envelope and the lower envelope is zero. Here the upper envelope is defined by the local maxima, and the lower envelope is defined by the local minima. The procedure to obtain the IMF components from an observed signal is called sifting, and it consists of the following steps:
1. Identification of the extrema of the observed signal.
2. Generation of the waveform envelopes by connecting the local maxima as the upper envelope and the local minima as the lower envelope.
3. Computation of the local mean by averaging the upper and lower envelopes.
4. Subtraction of the mean from the data to obtain a primitive value of the IMF component.
5. Repetition of the above steps until the first IMF component is obtained.
6. Subtraction of the first IMF component from the data, so that the residue component is obtained.
7. Repetition of the above steps on the residue; the residue component contains information about longer periods, which will be further resifted to find additional IMF components.
The sifting algorithm calculates the IMF components based on a criterion limiting the size of the standard deviation (SD) computed from two consecutive sifting results:

\[
SD = \sum_{t=0}^{T} \frac{\left( h_{k-1}(t) - h_k(t) \right)^2}{h_{k-1}^2(t)}.
\tag{6}
\]

Based on the sifting procedure applied to one channel of the real measured EEG data, we finally obtain

\[
x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).
\tag{7}
\]
In Eq. (7), ci (t)(i = 1, · · · , n) represents n IMF components, and rn represents a residual component. The residual component can be either a mean trend or a constant.
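A compact, illustrative sifting routine is sketched below under simplified assumptions (cubic-spline envelopes with naive end-point handling, a small epsilon guarding the division in Eq. (6)); it is not the authors' implementation, and the test signal is synthetic.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift(x, max_iter=50, sd_threshold=0.3):
    """One EMD sifting pass: repeatedly subtract the mean of the upper and
    lower spline envelopes until the SD criterion of Eq. (6) is met."""
    h = x.astype(float).copy()
    t = np.arange(len(x))
    for _ in range(max_iter):
        # steps 1-2: locate extrema and build the envelopes
        maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(maxima) < 4 or len(minima) < 4:
            break                                   # too few extrema to continue
        upper = CubicSpline(maxima, h[maxima])(t)
        lower = CubicSpline(minima, h[minima])(t)
        mean_env = (upper + lower) / 2.0            # step 3: local mean
        h_new = h - mean_env                        # step 4: candidate IMF
        sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))   # Eq. (6), epsilon-guarded
        h = h_new
        if sd < sd_threshold:
            break                                   # step 5: accept as IMF
    return h

fs = 1000.0
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 13 * t)
c1 = sift(x)                                        # first (highest-frequency) IMF
r1 = x - c1                                         # step 6: residue for further sifting
print(c1.shape, r1.shape)
```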
3
EEG Preliminary Examination with EMD
3.1
EEG Signals and Brain Activity
The EEG preliminary examination was carried out in the Shanghai Huashan Hospital affiliated to Fudan University (China). A portable EEG system (NEUROSCAN ESI) was used to record the patient’s brain activity. The EEG data was directly recorded at the bedside of the patients in the intensive care unit (ICU). In the examination, only nine electrodes are chosen to apply to patients. Among these electrodes, six exploring electrodes (Fp1 , Fp2 , F3 , F4 , F7 , F8 ) as well as GND were placed on the forehead, and two electrodes (A1 , A2 ) as the reference were placed on the earlobes (Fig. 1). The sampling rate of EEG was 1000 Hz and the resistances of the electrodes were set to less than 8 kΩ.
Fig. 1. The layout of six exploring electrodes, GND and two reference electrodes
A total of 35 coma and quasi-brain-death patients were examined using EEG from June 2004 to March 2006. The patients were classified into a deep-coma group (19 cases) and a quasi-brain-death group (17 cases) after medical diagnosis; one patient first presented in the coma state and later behaved as quasi-brain-death. The total recorded EEG data of these patients, with an average signal length of 300 seconds, were analyzed. Expecting the brain activities of a comatose patient to dominate in the lower frequency bands, in our experimental analysis we focus on the δ-wave (0.1–3 Hz), the θ-wave (4–7 Hz) and the α-wave (8–13 Hz). Moreover, considering that the power of activities in the brain is usually higher than that of non-activity components, we evaluate the power spectrum when classifying coma and quasi-brain-death patients. In the following, we present two representative cases of coma and quasi-brain-death patients. Then, the power distribution of brain activities is presented based on the summary of all patients' EEG analysis results. 3.2
Patient A with Brain Activity
We first demonstrate our result with an example of a 17-year-old female patient (Patient A). The patient suffered from encephalitis and was admitted to the hospital one month later. After two days of hospitalization, the clinical diagnosis showed that the patient was in a deep coma state, since a very weak visual response appeared only occasionally. The EEG examination was taken in March 2005 and lasted 314 seconds. As an example, a small time window of five seconds (78–83 sec.) of the EEG signal is shown in Fig. 2(a). From the EEG oscillations shown in Fig. 2(a), we focused on a randomly chosen channel; for instance, the sample of channel F4 starting from 78 sec. is selected as an example. Applying the data analysis method described in Section 2 to the chosen data, we obtained the result shown in Fig. 2(b). In the time domain, the raw signal of F4 was decomposed into five IMF components (C1–C5) and a residual component (r). Then, in the right column, the components are displayed in the frequency domain by applying the Fast Fourier Transform (FFT). Generally, a component with a frequency as high as that of C1 does not originate in the brain, and the residual component (r) is not related to brain activities. As shown in Fig. 2(b), the right column gives the maximum value of each useful IMF component's power spectrum in the frequency domain. Among the four components (C2–C5), one with a frequency of 13 Hz and one with a frequency of 3 Hz can be seen clearly. Compared to the others, their power values reach 2413.07 and 4199.57, respectively, which implies a high intensity of brain activity. We then applied the same method to one thousand continuous samples of channel F4 from 78 sec. (Fig. 3) and calculated each sample's maximum power among the useful components. The average power over these 1000 samples reaches 3205.61. This data analysis result is consistent with the clinical diagnosis that the patient was in a coma state.
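For illustration only, the sketch below shows one way the maximum spectral power of the useful IMF components, and its average over many one-second windows, could be computed; the IMFs here are synthetic stand-ins, not the clinical data.

```python
import numpy as np

fs = 1000                                   # sampling rate used for the EEG recordings
rng = np.random.default_rng(0)

def max_spectral_power(imf):
    """Maximum value of the power spectrum of one IMF component."""
    spectrum = np.abs(np.fft.rfft(imf)) ** 2 / len(imf)
    return spectrum[1:].max()               # ignore the DC bin

# Placeholder: 1000 one-second windows, each already decomposed into 4 "useful" IMFs.
n_samples = 1000
window_max_power = np.empty(n_samples)
for s in range(n_samples):
    imfs = [np.sin(2 * np.pi * f * np.arange(fs) / fs) + 0.1 * rng.standard_normal(fs)
            for f in (13, 8, 5, 3)]         # stand-ins for components C2..C5
    window_max_power[s] = max(max_spectral_power(c) for c in imfs)

print("average maximum power:", window_max_power.mean())
```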
Fig. 2. EMD results for Patient A, who has brain activity: (a) a view of Patient A's recorded EEG data (channels Fp1, Fp2, F3, F4, F7, F8; 78–83 sec.); (b) EMD result for channel F4 in the time and frequency domains
Fig. 3. 1000 sampling amounts of channel F4 from 78 sec
3.3
Patient B without Brain Activity
In the second example, Patient B was a 56-year-old man who had a primary cerebral disease and was admitted to the hospital in October 2005.
Fig. 4. EMD results for Patient B, who has no brain activity: (a) a view of Patient B's recorded EEG data (channels Fp1, Fp2, F3, F4, C3, C4; 551–556 sec.); (b) EMD result for channel F3 in the time and frequency domains

Fig. 5. 1000 sampling amounts of channel F3 from 551 sec
One day later, the patient fell into a deep coma state and his pupils dilated to 4 mm with no response to light. The EEG examination at the bedside of the patient was taken for 1148 seconds. As an example, a window of five seconds (551–556 sec.) of EEG signals is plotted in Fig. 4(a).
Table 1. The average value of power spectral patterns of comatose patients (C1–C19) and quasi-brain-death patients (D13–D35)

Coma cases (Case No. / EEG Channel / Average Value of Power Spectrum):
C1  Fp2  3857.95    C2  Fp2  2393.03    C3  F3   3012.26    C4  F4   1054.8
C5  F4   3527.5     C6  F4   7638.7     C7  Fp2  2026.98    C8  F4   3205.61
C9  Fp2  6545.59    C10 Fp1  2762.4     C11 F8   2287.33    C12 F8   4497.08
C13 F3   3415.46    C14 F3   3226.73    C15 F8   3399.63    C16 F3   3813.66
C17 F3   5773.31    C18 F8   4061.89    C19 F7   1885

Quasi-brain-death cases (Case No. / Average Value of Power Spectrum):
D13 177.04    D20 704.38    D21 735.82    D22 676.25    D23 474.9
D24 391.06    D25 645.02    D26 967.23    D27 469.21    D28 221.25
D29 520.44    D30 544.5     D31 338.81    D32 414.89    D33 528.67
D34 247.36    D35 228.87
Applying the EMD method to the randomly selected channel F3, for example, we obtained an EMD result of seven IMF components (C1–C7) and one residual component (r), shown in Fig. 4(b). The desired components (C2–C7) were transformed into the frequency domain by the Fourier transform. The y-coordinate of the right column in Fig. 4(b) has the same range as that of Fig. 2(b); however, the amplitudes of the IMF components were all in a low range. As for Patient A, the EMD method was applied to one thousand continuous samples of Patient B's EEG signal (Fig. 5), and the average maximum value of the IMFs in the power spectra is only 544.5. Compared with the result of Patient A, this average power implies that brain activity of Patient B can hardly be observed. Without loss of generality, we applied the same process to the other channels, but only similar results were obtained. The standard clinical criteria also concluded that the patient was brain-dead. 3.4
Evaluation of All Patients’ EEG Power Spectrum
The EMD method we proposed has been applied to the 35 coma and quasi-brain-death patients. Distinct from the previous method, which focused only on the maximum power of the EEG data at one point [10], in this paper we give the average power of the EEG signal over a period of time. The evaluation of all cases is shown in Table 1.
Fig. 6. The brain activities' power distribution of coma and brain-death patients (C: coma, D: quasi-brain-death)
Obviously, the average power spectral values of the coma cases (C1–C19) are relatively higher than those of the quasi-brain-death cases (D13–D35). As shown in Fig. 6, the experimental results provide a statistical evaluation of the average power of all patients' brain activities over a period of time. More specifically, the lowest power spectral value in the coma group (C4: 1054.8) is even larger than the highest value (D26: 967.23) in the quasi-brain-death group. The results obtained in our paper have been compared with those obtained by principal factor analysis (PFA) associated with ICA [4] and by complexity measures for quantitative EEG analysis [6]. In classifying the groups of coma and quasi-brain-death cases, we obtained the same results with our proposed method.
4
Conclusion
In this paper, we proposed a practical EEG-based preliminary examination for the process of clinical diagnosis of brain death. The EMD analysis method was applied to a randomly chosen channel of the raw EEG signal. A total of 1000 samples were analyzed, and the average maximum value of the power spectral pattern reflected the intensity of brain activities. The average power value demonstrated the high intensity of spontaneous brain activities among the 19 coma cases, as well as the absence of such activities, apart from some noise, among the 17 quasi-brain-death cases. Moreover, we found that the results of the EMD method correspond to the clinical diagnosis. Since the EMD method showed its effectiveness and reliability, we expect to provide more detailed statistical evidence for evaluating the power spectral patterns of comatose patients and quasi-brain-death patients. In the near future, we wish to apply this method to real-time measured data. In addition, the diagnosis of brain death is expected to become more precise.
Acknowledgments The authors would like to acknowledge Dr. GuoXian Zhu and Dr. Yue Zhang of Shanghai Huashan Hospital and Prof. Yang Cao and Prof. Fanji Gu of Fudan University for assistance in the EEG examinations and useful comments. This work was supported in part by KAKENHI (21360179).
References 1. Taylor, R.M.: Reexamining the Definition and Criteria of Death. Seminars in Neurology 17, 265–270 (1997) 2. Wijdicks, E.: Brain Death Worldwide: Accepted Fact but no Global Consensus in Diagnostic Criteria. Neurology 58, 20–25 (2002) 3. Marks, S., Zisfein, J.: Apneic Oxygenation in Apnea Tests for Brain Death: A Controlled Trial. Neurology 47, 300–303 (1990) 4. Cao, J.: Analysis of the quasi-brain-death EEG data based on a robust ICA approach. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds.) KES 2006. LNCS (LNAI), vol. 4253, pp. 1240–1247. Springer, Heidelberg (2006) 5. Cao, J., Chen, Z.: Advanced EEG Signal Processing in Brain Death Diagnosis. Signal Processing Techniques for Knowledge Extraction and Information Fusion, 275–298 (2008) 6. Chen, Z., Cao, J., Cao, Y., Zhang, Y., Gu, F., Zhu, G., Hong, Z., Wang, B., Cichocki, A.: An Empirical EEG Analysis in Brain Death Diagnosis for Adults. Cognitive Neurodynamics 2, 257–271 (2008) 7. Li, L., Saito, Y., Looney, D., Cao, J., Tanaka, T., Mandic, D.P.: Data Fusion via Fission for the Analysis of Brain Death. Evolving Intelligent Systems: Methodology and Applications, 279–320 (2008) 8. Saito, Y., Tanaka, T., Cao, J., Mandic, D.: Quasi-Brain-Death EEG Data Analysis by Empirical Mode Decomposition. Advances in Cognitive Neurodynamics, 837– 842 (2007) 9. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.-C., Tung, C.C., Liu, H.H.: The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis. In: Proceedings of the Royal Society of London, vol. A 454, pp. 903–995 (1998) 10. Shi, Q., Yang, J., Cao, J., Tanaka, T., Wang, R., Zhu, H.: EEG Data Analysis Based on EMD for Coma and Quasi-Brain-Death Patients. Journal of Experimental & Theoretical Artificial Intelligence (in print, 2009)
Proposal of Ride Comfort Evaluation Method Using the EEG Hironobu Fukai1 , Yohei Tomita1 , Yasue Mitsukura1 , Hirokazu Watai2 , Katsumi Tashiro2 , and Kazutomo Murakami2 1
Graduate School of Bio-Applications & Systems Engineering, Tokyo University of Agriculture and Technology, 2-24-16, Naka-cho, Koganei, Tokyo, 184-8588, Japan 2 Bridgestone Corporation, 3-1-1, Ogawahigashi, Kodaira, Tokyo, 187-8531, Japan
[email protected],
[email protected],
[email protected],
[email protected] Abstract. In this study, we propose a ride comfort evaluation method using electroencephalography (EEG). Subjective evaluation methods such as questionnaire surveys are commonly used to capture human sensibility; however, such methods are not well established because of the difficulty of obtaining human sensibility, and an objective evaluation method is desired because subjective evaluation suffers from criteria that are ambiguous and differ in sensitivity between individuals. Therefore, we propose an EEG-based evaluation method that makes objective evaluation possible. In this study, we investigate the ride comfort of car driving. We use an ordinary car and investigate the ride comfort according to differences in tires. The EEG is measured under driving conditions, and the subjective evaluation of ride comfort is surveyed by the semantic differential method (SD method). The features of the EEG during driving and the features of the subjective evaluation are extracted by factor analysis (FA). The results show that the EEG features and the subjective evaluation features are correlated; thus, the effectiveness of the proposed method as an objective evaluation method was shown. Keywords: electroencephalogram (EEG); factor analysis (FA); semantic differential method (SD method).
1
Introduction
Recently, many products and services are supplied to people around the world, and as a result people's awareness of quality of life has improved. Therefore, to satisfy people, innovative technology is required to clarify higher-order reactions of human sensibility and to incorporate them into products. Thus, the sensibility that people inherently have is being researched [1],[2]. Most of these studies are evaluated by questionnaires and other subjective means. However, an objective
evaluation method is desired because the subjective evaluation method has criteria that are ambiguous and differ in sensitivity among individuals. Therefore, we propose an EEG-based evaluation method that makes objective evaluation possible. For the experiment, ride comfort is evaluated using a car, a familiar vehicle. Ride comfort is a psychological reaction induced by the environment in the car and the overall sensation caused by driving the vehicle. When we consider ride comfort, many factors influence human sensibility, including vibration, noise, the seating feel of the chair, temperature, humidity, the feeling of quality, smell, lighting, atmospheric pressure variation, and the interior design. Therefore, at present, ride comfort evaluation is decided subjectively by professional test drivers. The test driver is an expert driver and is indispensable for the evaluation of driving performance. However, general drivers have their own preferences, and thus their evaluations vary with those preferences. Therefore, to improve the ride comfort of general drivers with various preferences, the evaluation of general drivers as well as professional drivers is required. However, it is difficult to evaluate ride comfort quantitatively, so ride comfort has mainly been researched physically. Evaluation methods that include the characteristics of human vibrational sensibility have been widely researched [3]. The general vibration-based evaluation criterion is defined in terms of acceleration, such as vertical vibration, in the specification of the International Organization for Standardization (ISO). Human vibration sensibility is known to be sensitive to vertical vibration in the 4–8 Hz band and horizontal vibration in the 1–2 Hz band. Takei and Ishiguro [4] and Inagaki et al. [5] proposed ride comfort evaluation equations based on sensory evaluation and measurements of regular and transitional vibrations during rough-road driving. Moreover, the road noise generated while driving has also been evaluated: Ishii et al. reported that sample sounds including road noise were evaluated using the EEG, and a regression toward sounds perceived as comfortable was obtained [6]. Thus, although research on the quantification of ride comfort has advanced, ride comfort arises not only from one specific sensibility but from several sensibilities. Therefore, the overall ride comfort is evaluated by actually measuring the subject's EEG while driving. In this study, we investigate the ride comfort according to differences in tires by using the EEG obtained from a single-channel EEG instrument.
2
The Procedure of the Proposed Method
The procedure of the proposed method is as follows. 1. EEG signal measurement in a moving car 2. Questionnaire survey
3. Feature extraction by factor analysis 4. Factor score plotting in two dimensions 5. Comparison of the EEG features and the questionnaire results. First, to evaluate the ride comfort, we perform experiments on seven kinds of roads with three kinds of tires and record the EEG acquired in each case. Second, we survey the psychological state by questionnaire, using the SD method. From the acquired EEG we use the 4–22 Hz range; it is generally said that sensibility appears in the 4–22 Hz band in conventional methods [7], so we use this frequency bandwidth. Next, we construct data matrices from the obtained EEG and the questionnaire results for factor analysis. For the EEG, the frequencies are taken as the variates and the average frequency spectrum of each experiment as the individuals. For extracting the psychological state, the adjectives are taken as the variates and the adjective ratings as the individuals. Then, we extract the features by factor analysis, where the principal factor method is used for the factor loadings, varimax rotation is adopted as the factor rotation, and least-squares estimation is applied for the factor score estimation. Furthermore, in order to make the EEG factor analysis result easy to interpret, the factor scores are plotted in two dimensions. Finally, in order to show the effectiveness of the proposed EEG feature extraction method, we compare the objective evaluation based on the EEG features with the subjective evaluation based on the SD method features. If these results are similar, it can be considered that the ride comfort is extracted by using the EEG. 2.1
The Simple Electroencephalograph
In this study, a simple band-type electroencephalograph is used. This electroencephalograph is made by the Brain Function Research & Development Center in Japan; Fig. 1 shows the device. Conventional electroencephalographs are expensive, large, and cannot be used for measurement in a natural environment, so the subjects become strained.
Fig. 1. Simple electroencephalogram
Table 1. Pairs of adjectives: Noiseless–Noisy, Pleasant–Unpleasant, Steady–Unsteady, Soft–Hard, Likable–Unlikable, Harsh–Gentle, Fast–Slow, Good–Bad
This electroencephalograph is compact, 120 mm (W) × 135 mm (D) × 35 mm (H), and can be used for measurement in practical environments; therefore, it is less burdensome on the subjects. The electrode is fixed to the headband, and the measurement position is Fp1 in the international 10-20 electrode arrangement. We obtain discrete-time data in which one second of EEG is frequency-analyzed up to 24 Hz at 1 Hz intervals. A bandpass filter from 4 Hz to 22 Hz is used in this simple electroencephalograph; therefore, we use the time series data of the frequency components between 4 and 22 Hz. 2.2
SD Method
The multidimensional scale is one method of multidimensional evaluation, and the SD method is often used as such a scale. The SD method is a type of rating scale that measures the connotative meaning of concepts, which is important in sensory evaluation. The SD method rates an impression on multiple rating scales constructed from pairs of various adjectives. In this study, we use the SD method as the psychological measurement. Subjects rate 8 pairs of adjectives on a 5-point scale; the pairs of adjectives are listed in Table 1. They were selected, through a questionnaire on which pairs are considered related to riding comfort, from the frequently used adjective pairs reported by Inoue [8]. In the proposed experiment, the subjects fill in their psychological state on the questionnaire sheet while the car is stopped between one road and the next. 2.3
Factor Analysis
Factor analysis is used for the feature extraction of both the EEG and the questionnaire survey. Factor analysis is a statistical method that explains the correlations among multivariate data by a small number of latent factors. The factor analysis model is given by equation (1):

$$X = AF + E \qquad (1)$$
X is the original signal, A is the factor loading matrix, F is the common factor matrix, and E is the error. The feature data are extracted by the following steps.
– Step 1: The data matrix is normalized.
– Step 2: The correlation matrix R is calculated from the data matrix.
– Step 3: The commonality h^2 is substituted for the diagonal components of the correlation matrix. The squared multiple correlation coefficient is used as an estimate of the commonality, calculated by the following equations:

$$R = \begin{bmatrix} h_{11}^{2} & r_{12} & \cdots & r_{1n} \\ r_{21} & h_{22}^{2} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & h_{nn}^{2} \end{bmatrix} \qquad (2)$$
The inverse matrix of R = (r_{ij}) is defined as R^{-1} = (r^{ij}). The squared multiple correlation coefficient is utilized in the estimation of the commonality; hence equation (3) is used:

$$h_{ii}^{2} = 1 - \frac{1}{r^{ii}} \qquad (3)$$
– Step 4: The obtained correlation matrix is eigendecomposed and the factors are extracted. The principal factor method is used for the factor extraction. The factor loading matrix is

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mm} \end{bmatrix} \qquad (4)$$

The factor loadings are calculated by computing the eigenvalues and eigenvectors that satisfy

$$RQ = Q\lambda \qquad (5)$$

which is an eigenvalue problem for the correlation matrix. The factor loadings of the principal factor method are then calculated as

$$A = Q\lambda^{1/2} \qquad (6)$$
– Step 5: Varimax rotation is performed.
– Step 6: The factor scores are calculated by least-squares estimation. Each factor score is then multiplied by itself.
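For illustration, the steps above can be outlined in Python with NumPy as follows. This is only a minimal sketch of the principal factor method with varimax rotation and least-squares score estimation under our own naming conventions; it is not the authors' implementation.

```python
import numpy as np

def principal_factor_analysis(X, n_factors, n_varimax_iter=50):
    """Sketch of Sect. 2.3: X has variates in rows and individuals in columns."""
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)   # Step 1
    R = np.corrcoef(Z)                                                        # Step 2
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # Step 3: squared multiple correlations, Eq. (3)
    R_red = R.copy()
    np.fill_diagonal(R_red, smc)                  # reduced correlation matrix, Eq. (2)
    eigval, eigvec = np.linalg.eigh(R_red)        # Step 4: eigenvalue problem, Eq. (5)
    order = np.argsort(eigval)[::-1][:n_factors]
    lam, Q = eigval[order], eigvec[:, order]
    A = Q * np.sqrt(np.clip(lam, 0, None))        # loadings, Eq. (6)
    # Step 5: varimax rotation (standard iterative form)
    T = np.eye(n_factors)
    p = A.shape[0]
    for _ in range(n_varimax_iter):
        L = A @ T
        u, _, vt = np.linalg.svd(A.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p))
        T = u @ vt
    A_rot = A @ T
    # Step 6: least-squares estimate of the factor scores
    F = np.linalg.lstsq(A_rot, Z, rcond=None)[0]
    return A_rot, F
```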
3
Ride Comfort Evaluation Experiment
In order to show the effectiveness of the proposed method, we performed experiments and analyzed the data. The experiments were performed three times over two days at the Bridgestone
Proving Ground. The EEG is measured in a real car, in the driver's seat and the passenger's seat, while driving. Two professional drivers and five passengers serve as the subjects, and three kinds of tires are used in the experiment. There are seven kinds of roads (smooth road, rough road, undulating road, bank 1, bank 2, track-groove road, uneven road), and it takes 12 minutes to drive around the Bridgestone Proving Ground. The same vehicle is used for all experiments. On the way, the psychological state is surveyed by the SD method while the car is stopped between one road and the next.
4
Experimental Results and Discussions
To illustrate the effectiveness of the proposed method, we show the experimental results. 4.1
Analysis of the EEG
The EEG data obtained from all the experiments are arranged as a data matrix according to the procedure described above. We analyze the EEG for each subject separately, because the EEG includes individual characteristics. Fig. 2 shows the result for each measurement day, where the x-axis indicates the first-order factor and the y-axis the second-order factor. First, we examine the influence of the measurement day. In Fig. 2(a), it is confirmed that the data distribution of September 2nd differs from that of the other days. The influence of physical condition is considered to appear more strongly in subject D than in subject P, because subject D has the task of driving; thus the separation of the September 2nd data is attributed to a change in physical condition. Moreover, on September 1st we interrupted the measurement because of rain and resumed it after the rain stopped, but the road condition was then different; the difference in the distribution of D2 is therefore considered an influence of the weather. In Fig. 2(b), the July data and the September data differ. The measurement conditions, including weather and temperature, were very different between July and September, and this difference is considered to affect the EEG. In other words, when the schedule differed, how the person felt also changed. From these results, although the tires could be distinguished by using the EEG, the data turned out not to be comparable across days, presumably because how a person feels changes daily with physical condition and other factors. Therefore, we compare the differences between the tires using the data of September 2nd. Fig. 3 shows the distribution of the factor scores for each tire on September 2nd, where A, B, and C denote the different kinds of tires and R0 denotes the static state. Fig. 3(a) shows the difference between the distributions of tire A and tire B, and Fig. 3(b) indicates the separated distribution of tire B. The difference of these distributions is considered to correspond to the difference in ride comfort.
Fig. 2. Distribution of factor scores for each measurement day: (a) subject D2; (b) subject P2
4.2
Analysis of the Subjective Evaluation
We interpret the factors of the adjective pairs for subjects D and P, respectively, paying attention to factor loadings whose absolute value exceeds 0.4. Table 2 shows the factor loadings. From Table 2, the first-order factor, which has the maximum eigenvalue, is interpreted as an emotional factor, and the second-order factor as a physical-value factor. Moreover, for subject D a sensitivity factor appears as the third-order factor, whereas for subject P a sensory factor and a sensitivity factor are obtained. According to these results, the emotional factor is extracted first because the content of the questionnaire is a survey of the mental response to the tire and road impressions, and the physical factor is extracted second.
Fig. 3. Distribution of factor scores for each tire (September 2): (a) subject D2; (b) subject P2
4.3
Feature Comparison
In the EEG analysis, the features differed for each tire, and in the subjective evaluation the meaning of each factor was interpreted. The subjective evaluation results for each tire are given in Table 3. For subject D2, the sensitivity factor and the emotional factor show a difference between tire A and tire B, and tires A and B appear in separated areas of the factor-score plot (Fig. 3(a)); this indicates a relationship between the EEG feature and the subjective evaluation feature. For subject P2, the sensitivity factor and the emotional factor show a difference between tire B and tire C, and in the EEG result only tire B appears in a separate area (Fig. 3(b)). Thus, although the correspondence between the EEG feature and the subjective evaluation feature was confirmed for tires B and C, it could not be confirmed for tire A. This is considered to be because the subjective evaluation factors do not correspond to the EEG factors one to one, but as a mixture of multiple components.
Table 2. Factor loadings

Subject D
Variate                1st-order factor   2nd-order factor   3rd-order factor
Noiseless—Noisy            -0.111             -0.092             -0.517
Pleasant—Unpleasant        -0.559             -0.299             -0.467
Steady—Unsteady            -0.669             -0.292             -0.094
Soft—Hard                   0.010             -0.533             -0.268
Likable—Unlikable          -0.724             -0.005             -0.194
Harsh—Gentle                0.216              0.598              0.003
Fast—Slow                  -0.240             -0.016             -0.002
Good—Bad                   -0.609             -0.201             -0.327

Subject P
Variate                1st-order factor   2nd-order factor   3rd-order factor
Noiseless—Noisy             0.317              0.606             -0.177
Pleasant—Unpleasant         0.608              0.369             -0.110
Steady—Unsteady             0.408              0.502             -0.438
Soft—Hard                   0.020              0.499             -0.112
Likable—Unlikable           0.798              0.018             -0.125
Harsh—Gentle               -0.068             -0.430              0.610
Fast—Slow                  -0.017             -0.068              0.523
Good—Bad                    0.727              0.131              0.050
Table 3. Subjective evaluation result for each tire

Subject D2
Tire   Emotion   Physical value   Sensibility
A      -0.555       -0.420           1.856
B       1.017       -1.010          -0.883
C      -1.228        0.670           0.832

Subject P2
Tire   Emotion   Physical value   Sensibility
A       0.675       -0.244          -0.167
B       0.397       -0.207           0.077
C       0.896       -0.429           0.474
5
Conclusions
In this study, we proposed a ride comfort evaluation method using the EEG. To verify the effectiveness of the proposed method, we analyzed real EEG data of subjects in the driver's seat and the passenger's seat of a real car while driving. Factor analysis was used to extract the features of both the EEG and the SD-method questionnaire. In addition, to make the analytical data easier to
interpret, the factor scores were visualized in two dimensions. Moreover, the effectiveness of the EEG measurement was verified by comparison with the subjective evaluation: the EEG features and the psychological state features indicated a correlation. The future works are as follows.
1. Rotation of the factor scores that takes the tension level into account
2. EEG pattern classification
3. Examination of the trend between the analysis result and other biological information
4. Semantic analysis of the EEG feature factors
References 1. Kitajima, M., Utsugi, A.: On the Measurement of Kansei-Two Approaches to Model Sense, Feeling, Emotion, or Sensitivity. The Journal of the IEICE 76(3), 242–245 (1993) 2. Inokuchi, S.: Kansei Information Processing. The Journal of the IEICE 80(10), 1007–1012 (1997) 3. Shiiba, T., Suda, Y.: Ride Comfort Evaluation of Automobile with Driving Simulator:An Approach by Driving Simulator with Multibody Vehicle Model. The Journal of the JSME 68(670), 119–124 (2002) 4. Takei, K., Ishiguro, M.: Evaluation of Ride Comfort on the Basis of Subjective Judgement. R&D Review of Toyota CRDL 30(3), 47–56 (1995) 5. Inagaki, H., Taguchi, T., Yasuda, E., Doi, S.: Evaluation of Seat Kansei Quality. R&D Review of Toyota CRDL 35(4), 9–14 (2000) 6. Ishii, Y., Yamashita, T., Araga, Y.: Sensory Evaluation of Road Noise with Organic Reaction using Brain Waves. Honda R&D Technical Review 14(2), 181–188 (2002) 7. Ito, S., Mitsukura, Y., Fukumi, M., Akamatsu, N.: Proposal of the EEG Analysis Method using the Individual Characteristic of the EEG. T.IEE Japan 124-C(64), 1259–1266 (2004) 8. Inoue, M., Kobayashi, T.: The Research Domain and Scale Construction of Adjective-Pairs in a Semantic Differential Method in Japan. Jap. J. of Educ. Psychol. 33(3), 253–260 (1985)
Image Reconstruction Using NMF with Sparse Constraints Based on Kurtosis Measurement Criterion Li Shang1,2, Jinfeng Zhang2, Wenjun Huai2, Jie Chen2, and Jixiang Du3,4,5 1 JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou 215104, Jiangsu, China 2 Department of Electronic Information Engineering, Suzhou Vocational University, Suzhou 215104, Jiangsu, China 3 Department of Computer Science and Technology, Huaqiao University, Quanzhou 362021, Fujian, China 4 Department of Automation, University of Science and Technology of China, Hefei 230026, Anhui, China 5 Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, Anhui, China {sl0930,zjifeng,hwj,cj}@jssvc.edu.cn,
[email protected]
Abstract. A novel image reconstruction method using non-negative matrix factorization (NMF) with sparse constraints based on the kurtosis measurement is proposed. This sparse-constrained NMF algorithm exploits the kurtosis as the sparseness measure criterion to be maximized for the feature coefficients. The experimental results show that the feature basis vectors of natural images can be successfully extracted by the proposed algorithm. Furthermore, compared with the standard NMF method, the simulation results show that our algorithm is indeed efficient and effective in the image reconstruction task.
Keywords: NMF; Sparse constraints; Kurtosis; Image reconstruction.
1 Introduction
The non-negative matrix factorization (NMF) algorithm is a very efficient, parameter-free method for decomposing multivariate data into strictly positive activations and basis vectors [1-4]. NMF usually produces a sparse linear representation of the data, but this representation does not always result in a parts-based one. Because sparse coding has also, on theoretical grounds, been shown to be a useful middle ground between completely distributed representations and unary representations, Hoyer [2] incorporated the notion of sparseness into NMF to improve the decompositions that are found. This sparse NMF method [2] can discover parts-based representations that are qualitatively better than those given by basic NMF [3-4], and Hoyer has shown that his algorithm is successful in extracting face image features. However, in Hoyer's algorithm the sparseness measure is defined by the relationship between the L1 norm and the L2 norm [2], and the desired sparseness is set by the user rather than determined by the data of the feature basis vectors or the feature coefficients. Thus, the sparseness measure is not self-adaptive to natural
images. To avoid the disadvantages mentioned above, in this paper we propose a modified NMF algorithm with sparse constraints, which exploits the kurtosis as the sparseness measure criterion to be maximized [5], so that the natural image structure captured by the kurtosis is not only sparse but also independent [6-8]. The experimental results show that, using our sparse NMF algorithm, the features of natural images can be extracted successfully and that, by applying these features, the original images can be reconstructed clearly.
2 The NMF with Kurtosis-Based Sparse Constraints
2.1 The Cost Functions
Referring to the classical NMF algorithm [3], and combining the minimum image reconstruction error with the kurtosis measure of the feature coefficients, we construct the following cost function for the minimization problem:

$$F(\mathbf{W},\mathbf{H}) = \frac{1}{2}\sum_{i}\Big[V_{i}(x,y) - \sum_{j=1}^{M} H_{ij}(x,y)\,W_{j}\Big]^{2} - \lambda\sum_{ij}\mathrm{kurt}(H_{ij}), \qquad (1)$$
where the matrices V, W, and H are all non-negative and the parameter λ is a positive constant. W is an N×M matrix containing the basis vectors W_j as its columns. V is an N×L matrix denoting the N-dimensional input image data, and H is accordingly an M×L matrix. V_i and W_j are the columns of V and W, respectively, and H_ij are the activities used for encoding the data. Here the index i in V_i refers to the i-th input, whereas the index j refers to the j-th basis vector W_j used for the image reconstruction. This means that each data vector V_i is approximated by a linear combination of the basis vectors W_j and their activities H_ij. In Eqn. (1), the first term is the image reconstruction error, which ensures a good representation of a given image, and the second term is the sparseness measure based on the absolute value of the kurtosis, also called the sparsity penalty term, which is defined as:
$$\mathrm{kurt}(\mathbf{h}_{i}) = E\{\mathbf{h}_{i}^{4}\} - 3\big(E\{\mathbf{h}_{i}^{2}\}\big)^{2}, \qquad (2)$$
and maximizing kurt(h_i) (i.e., minimizing −kurt(h_i)) is equivalent to maximizing the sparseness of the coefficient vectors. A fixed variance term can additionally penalize the case in which the coefficient variance of the i-th vector, h_i², deviates from its target value σ_t²; without such a term the variance may become so small that only the sparseness constraint is satisfied, and the image reconstruction error would become larger, which is not desirable either. To avoid scaling misbehavior, a normalization step for the basis vectors is usually incorporated when minimizing Eqn. (1), which can then be rewritten as follows:
$$F(\mathbf{W},\mathbf{H}) = \frac{1}{2}\sum_{i}\Big[V_{i}(x,y) - \sum_{j=1}^{M} H_{ij}(x,y)\,\frac{W_{j}}{\|W_{j}\|_{2}}\Big]^{2} - \lambda_{1}\sum_{ij}\mathrm{kurt}(H_{ij}). \qquad (3)$$
Now we are looking at a cost function that effectively depends on the variables F({W̄_j}, H), with W̄_j being the normalized basis vectors, W̄_j = W_j / ||W_j||_2.
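As an illustration of the sparseness measure in Eqns. (2) and (6), the following NumPy sketch computes the kurtosis of a coefficient vector and its gradient. It is only an illustrative sketch with our own function names; it is not the authors' code.

```python
import numpy as np

def kurt(h):
    """Kurtosis of a coefficient vector, Eq. (2): E{h^4} - 3 (E{h^2})^2."""
    return np.mean(h**4) - 3.0 * np.mean(h**2)**2

def kurt_grad(h):
    """Elementwise gradient used in Eq. (6): beta * (h^3 - 3 E{h^2} h)."""
    beta = np.sign(kurt(h))          # +1 for super-Gaussian data such as natural images
    return beta * (h**3 - 3.0 * np.mean(h**2) * h)
```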
2.2 Learning Rules
It should be noted that, before the basis functions and coefficient weights are updated, the observed data X have been centered and whitened. Similarly to the multiplicative quotients of standard NMF, we can reformulate the fixed-point conditions of the cost function (1) into multiplicative update rules [2]; the general steps are as follows:
Step 1. Calculate and store ∇_W ||W_j||_2.
Step 2. Normalize the basis vectors W_j: W_j ← W_j / ||W_j||_2.
Step 3. Calculate the reconstruction according to the following equation:

$$R_{i} = \sum_{j} H_{ij} W_{j}. \qquad (4)$$
Step 4. Update the activities according to

$$H_{ij} \leftarrow H_{ij}\,\frac{V_{i}^{T} W_{j}}{R_{i}^{T} W_{j} + \lambda\, g'(H_{ij})}, \qquad (5)$$
where g'(H_ij) is calculated as follows:

$$g'(H_{ij}) = \frac{\partial\,\mathrm{kurt}(H_{ij})}{\partial H_{ij}} = \beta\,\big[H_{ij}^{3} - 3\,E\{H_{ij}^{2}\}\,H_{ij}\big], \qquad (6)$$
where β = sign(kurt(h_i)); β = 1 for super-Gaussian signals and β = −1 for sub-Gaussian signals. Because natural image data are super-Gaussian, β is equal to 1.
Step 5. Calculate the reconstruction with the new activities according to

$$R_{i} = \sum_{j} H_{ij} W_{j}. \qquad (7)$$
Step 6. Update the non-normalized basis vectors according to

$$W_{j} \leftarrow W_{j}\,\frac{\sum_{i} H_{ij}\big[V_{i} + (R_{i}^{T} W_{j})\,\nabla_{W}\|W_{j}\|_{2}\big]}{\sum_{i} H_{ij}\big[R_{i} + (V_{i}^{T} W_{j})\,\nabla_{W}\|W_{j}\|_{2}\big]}. \qquad (8)$$

Step 7. Return to Step 1 until convergence.
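To make Steps 1-7 concrete, the sketch below outlines one possible version of the multiplicative updates in Python/NumPy. It is a simplified illustration under our own assumptions (the gradient ∇_W||W_j||_2 is taken as W_j/||W_j||_2, and λ, the iteration count, and the expectation in the kurtosis gradient are our choices); it is not the authors' implementation.

```python
import numpy as np

def sparse_nmf(V, n_basis, n_iter=200, lam=0.1, eps=1e-9):
    """Sketch of the kurtosis-constrained multiplicative updates (Steps 1-7).
    V: non-negative data matrix of shape (N, L); returns W (N, M) and H (M, L)."""
    N, L = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((N, n_basis)) + eps
    H = rng.random((n_basis, L)) + eps
    for _ in range(n_iter):
        norms = np.linalg.norm(W, axis=0) + eps
        grad_norm = W / norms                  # assumed form of grad ||W_j||_2 (Step 1)
        W = W / norms                          # Step 2: normalize basis vectors
        R = W @ H                              # Step 3: reconstruction, Eq. (4)
        # Step 4: activity update, Eq. (5), with kurtosis gradient of Eq. (6), beta = +1
        g = H**3 - 3.0 * np.mean(H**2, axis=1, keepdims=True) * H
        H = np.maximum(H * (W.T @ V) / (W.T @ R + lam * g + eps), eps)
        R = W @ H                              # Step 5: reconstruction with new H, Eq. (7)
        # Step 6: basis update, Eq. (8)
        num = V @ H.T + grad_norm * np.sum((R.T @ W) * H.T, axis=0)
        den = R @ H.T + grad_norm * np.sum((V.T @ W) * H.T, axis=0) + eps
        W = np.maximum(W * num / den, eps)     # Step 7: repeat until convergence
    return W, H
```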
Fig. 1. Basis vectors obtained by our sparse NMF algorithm to natural scenes. (a) the positive basis for ON-channel; (b) the negative basis for OFF-channel; (c) the basis for ON-channel minus the basis for OFF-channel.
Figure 1 shows the feature basis vectors estimated for the ten natural images using the above updating rules of the sparse NMF algorithm, where gray pixels denote zero weight, black pixels denote negative weights, and brighter pixels denote positive weights. From Fig. 1 it is clear that the feature bases are not only sparse, but also that most of the learned basis vectors exhibit locality and orientation in the spatial domain.
3 Experimental Results
3.1 Image Data Preprocessing
We select natural images as the test images in our experiments. All test images used are available at http://www.cns.nyu.edu/lcv/denoise. First, 10 noise-free natural images of 512×512 pixels were selected randomly. Then, 5000 patches of 8×8 pixels were sampled from each original image and every patch was converted into one column; thus each image was converted into a 64×5000 matrix, and the input data set X consisting of the 10 test images was of size 64×50000. Consequently, each image patch is represented by a 64-dimensional vector. Further, the data set X was centered and whitened by principal component analysis (PCA), and the preprocessed data set is denoted by X̂. Considering the non-negativity, we separate X̂ into an ON-channel and an OFF-channel, denoted respectively by Y and Z, so that the non-negative matrix V = (Y; Z) is obtained. Then, using the updating rules of W and H in turn, we minimized the objective function given in Eqn. (3). The extracted 64 feature basis vectors of natural scenes are shown in Fig. 1, as described in subsection 2.2. 3.2 Image Reconstruction
Using the extracted feature bases, we can carry out the task of restoring the original images. The test image is the classical Gasshopper image of 256×512 pixels shown in Fig. 2(a), which is widely used in the image processing field.
Fig. 2. The original Gasshopper image and the reconstruction results obtained by our sparse NMF algorithm: (a) the original image; (b)-(f) reconstructions from 3000, 5000, 10000, 20000, and 50000 image patches, respectively
The number of 8×8 image patches sampled randomly from the Gasshopper image was set to 3000, 5000, 10000, 20000, and 50000, respectively. The image reconstruction results obtained by our sparse NMF algorithm for these numbers of patches are shown in Fig. 2(b) to (f); at the same time, the reconstructions obtained by the standard NMF method are shown in Fig. 3. Note that, to place every image patch at its correct position, we must remember the positions of all randomly sampled patches. Because of the random sampling, the same pixel may appear in different image patches; therefore, for each such pixel, we averaged the values of all reconstructed pixels and used the averaged value as the approximation of the original pixel.
Table 1. Values of SNR obtained by the different algorithms for different numbers of image patches

Image patches     3000    5000    10000   20000   50000
Standard NMF      8.91    11.18   15.31   18.27   20.96
Our sparse NMF    8.97    11.24   15.43   18.85   21.36

Fig. 3. The reconstruction results obtained by the standard NMF algorithm for different numbers of image patches
Clearly, from Fig. 2 and Fig. 3, it can be seen that the larger the number of image patches is, the clearer the reconstructed image. When the number of image patches equals 50000, it is difficult to distinguish the reconstruction from the original image with the naked eye. Moreover, comparing Fig. 2 with Fig. 3, for each pair of reconstructions with the same number of patches, obtained respectively by our sparse NMF and by the standard NMF, it is also not easy to tell them apart. Thus, to verify that our proposed NMF algorithm can indeed be successfully applied to image feature extraction, the signal-to-noise ratio (SNR) values of the output images were also calculated and are listed in Table 1. According to
the experimental results, it is easy to see that our sparse NMF algorithm is indeed better than the standard NMF in feature extraction.
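The patch-averaging reconstruction and the SNR comparison described above can be summarized by the following sketch. It is an illustrative outline with our own variable names and an assumed SNR definition (10·log10 of signal power over error power); the paper does not state its exact formula.

```python
import numpy as np

def reconstruct_from_patches(patches, positions, image_shape, patch=8):
    """Average overlapping 8x8 reconstructed patches back into an image.
    patches: (n, patch*patch) array; positions: list of (row, col) top-left corners."""
    acc = np.zeros(image_shape)
    cnt = np.zeros(image_shape)
    for vec, (r, c) in zip(patches, positions):
        acc[r:r+patch, c:c+patch] += vec.reshape(patch, patch)
        cnt[r:r+patch, c:c+patch] += 1.0
    cnt[cnt == 0] = 1.0                      # leave unsampled pixels at zero
    return acc / cnt

def snr(original, reconstructed):
    """Assumed SNR definition: 10*log10(||x||^2 / ||x - x_hat||^2)."""
    err = original - reconstructed
    return 10.0 * np.log10(np.sum(original**2) / np.sum(err**2))
```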
4 Conclusions
In this paper, a novel natural image feature extraction method using a modified sparse NMF algorithm is proposed. This sparse NMF algorithm exploits the kurtosis as the sparseness measure criterion to be maximized, so that the natural image structure captured by the kurtosis is not only sparse but also independent. Using this algorithm, the features of natural images can be extracted successfully; these features are not only sparse, but most of the learned basis vectors also exhibit locality and orientation in the spatial domain, and by utilizing these features any natural image can be reconstructed. Compared with the standard NMF in performing image reconstruction, the experimental results show that our sparse NMF is indeed efficient in extracting the features of natural images.
Acknowledgement This research was supported by the Opening Project of JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (No. eisecSX200806). And it was also sponsored by Qing Lan Project, Startup Foundations Research for Young Teachers of Suzhou Vocational University (SZDQ09L05), the grants of the National Science Foundation of China (No.60805021), the China Postdoctoral Science Foundation (No.20060390180 and 200801231), as well as the grants of Natural Science Foundation of Fujian Province of China (No.A0740001 and A0810010).
References 1. Julian, E., Edgar, K.: Sparse coding and NMF. In: proceedings of 2004 IEEE International Joint Conference on Neural Networks, vol. 4, pp. 2529–2533 (2004) 2. Hoyer, P.: Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5, 1427–1469 (2004) 3. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999) 4. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996) 5. Li, S., Cao, F.W., Chen, J.: Denoising natural images using sparse coding algorithm based on the kurtosis measurement. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds.) ISNN 2008, Part II. LNCS, vol. 5264, pp. 351–358. Springer, Heidelberg (2008) 6. Bell, A., Sejnowski, T.J.: The Independent Components’ of Natural Scenes Are Edge Filters. Vision Research 37, 3327–3338 (1997) 7. Hyvärinen, A., Oja, E., Hoyer, P., Horri, J.: Image Feature Extraction by Sparse Coding and Independent Component Analysis. In: Proc. Int. Conf. on Pattern Recognition (ICPR 1998), Brisbane, Australia, pp. 1268–1273 (1998) 8. Hyvärinen, A.: Sparse coding shrinkage: denoising of nongaussian data by maximum likelihood estimation. Neural Computation 11, 1739–1768 (1997)
A Cyanobacteria Remote Monitoring System Zhiqiang Zhao1,2,3 and Yiming Wang3 1
JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise 2 Department of Electronics & Information Engineering, Suzhou Vocational University, Suzhou 215104,Jiangsu,China 3 Suzhou University, Suzhou 215006,Jiangsu,China
[email protected]
Abstract. This article analyzes the major factors and features of the Cyanophyta (cyanobacteria) problem and investigates some key attributes in order to build a monitoring system, analyzing all of its steps and utilizing current theory and methodology. In view of the characteristics of Cyanophyta monitoring techniques, the article focuses on the remote wireless monitoring of Cyanophyta in Tai Lake and realizes the software and hardware design of the end device, the data access point, and the remote monitoring platform. The application of wireless sensor networks (WSNs) improves the Cyanophyta monitoring technique.
Keywords: Cyanophyta monitoring; Sensor networks; SimpliciTI.
1 Introduction
Safe drinking water has become a key concern of society [1]. With the discharge of large amounts of industrial and domestic wastewater, eutrophication of the lake water makes cyanobacteria blooms break out easily, which seriously pollutes the drinking water [2]. Developing a real-time system that can monitor the changes of cyanobacteria in the lake and give early warning of a cyanobacteria outbreak has therefore become a research hotspot [3]. With the development of network technology, database technology, modern communication technology, and sensor network technology, environmental monitoring systems that combine computers and communication networks have become an important application direction. The change in cyanobacteria monitoring from the original manual monitoring to a remote automated monitoring network is an inevitable trend of development [4].
2 System Function and Overall Structure
2.1 System Functions
According to the requirements and characteristics of cyanobacteria monitoring in Tai Lake, the system needs the following functions [4]:
1) Monitoring the cyanobacteria situation of the water body (especially near the water intake of the drinking water plant);
2) Monitoring the water temperature changes in Tai Lake;
3) Realizing remote monitoring, early warning, remote management, and other functions.
2.2 Diagram of the System Structure
This system consists of field data acquisition nodes (End Devices, ED), data aggregation nodes (Access Points, AP), and a remote monitoring platform. The field data acquisition nodes collect information on the concentration of cyanobacteria chlorophyll and the water temperature, and then transport the data to the data aggregation nodes through the sensor network [5]. The data aggregation node is the aggregation point between the sensor network and the GPRS transmission network [6]: it collects all data from the field data acquisition nodes in the sensor network and, after processing, uploads the data to the remote monitoring platform through the GPRS network. The remote monitoring platform connects to the data aggregation nodes via the Internet; it can monitor in real time, query data, and give an early warning for areas in which the cyanobacteria concentration exceeds the permitted range. The structure of the wireless cyanobacteria remote monitoring system is shown in Figure 1.
Fig. 1. The structure of Wireless Cyanobacteria remote monitoring system
3 Hardware Design
The hardware of this system has two main parts: the field data acquisition nodes and the data aggregation nodes. Consisting of a power supply module, a temperature sensor, a chlorophyll sensor, a processor, and an RF module, the field data acquisition nodes are laid in the water in a loop along Tai Lake. Consisting of a processor, an RF module, a GPRS module, and a display module, the data aggregation nodes are arranged on the shore of Tai Lake. The remote monitoring platform receives the data via the Internet and is responsible for data processing, display, and storage, and provides a user-friendly interface so that researchers can monitor the data conveniently. The system's hardware structure is shown in Figure 2.
Fig. 2. System's hardware structure
3.1 ED Circuit
The ED hardware structure is shown in Figure 3. The ED consists of a processor, an RF module, and sensors; their main characteristics are as follows. The CC1110 [7][8] is a low-cost wireless SoC for low-power wireless applications introduced by TI. The chip includes a standard enhanced 8051 MCU and a wireless transceiver, both packaged in a single chip; the 8051 MCU has 32 KB of Flash and 4 KB of RAM. The main working frequency bands for wireless communication are the ISM and SRD bands; within the ISM band, 300-348 MHz, 391-464 MHz, and 728-928 MHz can be configured freely.
Fig. 3. ED hardware structure
The RF transceiver of the CC1110 integrates a highly configurable modem that supports different modulation formats at data rates of up to 500 kbps. The CC1110 offers a wide range of hardware support for packet processing, data buffering, burst data transmission, clear channel assessment, and link quality indication. In receive and transmit mode the current consumption is lower than 16.2 mA and 16 mA, respectively, at a rate of 2.4 kBaud. The ultra-short transition time of the CC1110 from sleep mode to active mode makes it especially fit for applications that require a very long battery life. The CC1110 uses a 6 mm × 6 mm QLP package with 36 pins in total. There are many wireless modules based on the CC1110 on the market; according to the actual needs, we chose the CC1110 wireless module designed by the WXL corporation, Chengdu, China. There are two types of sensors: a chlorophyll sensor and a temperature sensor. The system selects the Wetstar 1006P chlorophyll fluorescence sensor designed by WETLABS, a multi-functional fluorometer that is simple to use, highly precise, and usable underwater; it can measure the chlorophyll fluorescence of the water precisely. The DS18B20 [9] is chosen as the temperature sensor; it is widely used, small, simple to connect, uses a single-bus interface, and is fit for a remote monitoring system.
3.2 AP Circuit
The AP uses the same wireless processor as the ED, plus the GR64 GPRS module. The GR64 circuit is shown in Figure 4. The GR64 [10][11] is a GPRS module with an embedded TCP/IP protocol stack for GSM/GPRS designed by Sony Ericsson; it is compatible with the GR47.
GND
ON/OFF SIM_VCC SIM_DET SIM_RST SIM_DATA SIM_CLK J3 SIM_CLK 6 SIM_RST 5 SIM_VCC 4 C26 104
CcClk CcRst CcVcc
CcIO CcIn Gnd SimCard GND
1 2 3
SIM_DATA SIM_DET C27 104
M1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Fig. 4. GR64 circuit
VCC GR64 GND VCC GND VCC GND VCC GND VCC GND CHG_IN GND ADIN4/GPIO5 ON/OFF SIMVCC SIMDET SIMRST SIMDAT SIMCLK DAC GPIO1 GPIO2 GPIO3 GPIO4 VRTC ADIN1 ADIN2 ADIN3 SDA SCL
BUZZER DSR1/GPIO7 LED/GPIO6 VREF TX_ON RI/GPIO8 DTR1/GPIO10 DCD1/GPIO11 RTS1/GPIO9 CTS1/GPIO12 DTM1 DFM1 DTM3 DFM3 USBDP USBDN SSPDTM SSPDFM VUSB ALARM SSPFS SSPCLK MICIP MICIN EARP EARN AUXO SERVICE AUXI AREF
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
L3 GRI GDTR GRTS URX UTX
Its embedded ARM9 CPU can be opened to the user. The GR64 has rich storage resources: a 256 KB script space that can accommodate two scripts, which can be upgraded remotely over CSD, at least 50 KB of NVM data space, and 100 KB of RAM. The GR64 also provides rich interfaces: two serial ports with adaptive baud rate, USB 2.0, SPI, antenna and audio interfaces, 12 I/O lines (8 of them multiplexed), AD/DA, a buzzer, and so on.
4 Software Design
4.1 Sensor Network Architecture
The system uses the SimpliciTI [12] network protocol, a small low-power RF protocol. It simplifies the implementation on the microcontroller and minimizes the occupied resources. The network has a star topology and contains a data center (the AP), which is mainly responsible for network management: the AP provides data processing and forwarding as well as device permissions, connection permissions, and network security. The AP can also extend the terminal devices, supporting the expansion of the network topology. The network structure is shown in Figure 5.
Fig. 5. Network structure
4.2 Software Flow Chart
The ED is mainly responsible for collecting field data and, at the same time, transmitting the data to the AP in accordance with the network protocol. Its software consists of the main program, the data acquisition subroutine, and the network communication subroutine, developed in C and assembly language. The flow of an ED joining the network is shown in Figure 6. Most of the ED nodes in the network are in a dormant state and wait until they receive an activation order. At the same time, an ED node can also be used as an RE (range extender) node: when the destination address in the received information does not match its own address, the ED forwards the information.
Fig. 6. Flow chart of an ED joining the network
How to optimize the route from the ED to the AP node is the focus of further study. In accordance with the network protocol, the AP is mainly responsible for network management, data collection, and data processing, and at the same time transfers the data to the remote monitoring platform over the GPRS network. Its software consists of the main program, the network communication subroutine, and the GPRS communication subroutine. The flow of the AP setting up the network is shown in Figure 7.
Fig. 7. Flow chart of the AP setting up the network
5 Conclusions
The cyanobacteria remote wireless monitoring system based on sensor networks is an unattended, automated, intelligent monitoring system for cyanobacteria. Research on and application of this system will improve the study of preventing cyanobacteria outbreaks: it can monitor the development of cyanobacteria and promote the capability of preventing a cyanobacteria outbreak. Moreover, the system's network structure, data transmission, information collection mode, and data-processing methods provide a common, advanced, and effective model for other data acquisition and monitoring systems.
Acknowledgement. This research was supported by the Opening Project of the JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (No. eisecSX200806), and sponsored by the Project and Startup Foundation for Research of Young Teachers of Suzhou Vocational University (No. SZDQ09L03).
References 1. She, F.N.: Quantitative analysis On Chlorophyll Aconcentration In TaiHu Lake Using Thematic Mapper Date. Journal Of Lake Sciences 8(3), 201–207 (1996) 2. He, Y.L.: Cyanobacteria outbreak of incentives and governance approaches. Heilongjiang Science and Technology Information 114 (January 2008)
3. Kong, F.X.: Large-scale eutrophication of shallow lakes cyanobacterial algal consider the formation mechanism of: Acta Ecologica Sinica 25(3), 589–595 (2005) 4. Heng, Y.S.: Taihu cyanobacteria bloom early warning system of monitoring techniques. Chinese Environmental Monitoring 24(2), 63–65 (2008) 5. Sun, L.: Wireless Sensor Networks. Qing Hua University Press (2005) 6. Yan, L.H.: Remote data transmission system based on GPRS technology: Xi’an University of Science and Technology (2006) 7. CC1110 wireless single-chip microcomputer development system: Chengdu Wireless Communications Technology Co., Ltd (2007) 8. Chipcon Smart RF CC1100, Chipncon (2005) 9. DS18b20. Dallas Semicondoctor(2005) 10. GR64 Manual. Sony Ericsson(2006) 11. Ren, T.: TCP / IP protocol and network programming. Xi’an Electronic Technology University Press (2004) 12. Rudner, R.: Introduction to SimpliciT. Texas Insruments 28(2), 109–119 (2007)
Study on Fault Diagnosis of Rolling Mill Main Transmission System Based on EMD-AR Model and Correlation Dimension Guiping Dai1,2 and Manhua Wu3 1
JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise 2 Department of Electronic & Information Engineering , Suzhou Vocational University, Suzhou 215104, Jiangsu, China
[email protected] 3 Department of Foreign Languages and International Exchange, Suzhou Vocational University, Suzhou 215104, Jiangsu, China
[email protected]
Abstract. In order to improve the fault diagnosis accuracy of the rolling mill main transmission system, a fault feature extraction method based on EMD (Empirical Mode Decomposition)-AR models and the correlation dimension is proposed. In the proposed method, EMD is used to decompose the vibration signal of the complex machine into several intrinsic mode functions (IMFs); AR models are then constructed for the IMF components that contain the main fault information. Finally, the correlation dimensions of the autoregressive parameters of the AR models are calculated. Analysis of the experimental results shows that this method can not only reflect the state changes of the dynamic system profoundly and in detail, but can also realize the separation of state features; thus it can judge the fault conditions of the rolling mill main transmission system effectively.
Keywords: Empirical mode decomposition; AR model; Correlation dimension; Rolling mill main transmission system; Fault diagnosis.
1 Introduction
The rolling mill main transmission system is an important component of the rolling mill; it transmits the rotational mechanical energy that drives the rolls. During the mill's operation, the main transmission system frequently exhibits torsional vibration, which may cause fatigue failure of the spare parts of the main transmission system and, in serious cases, even sudden breakage [1]. However, the vibration signal measured by the sensor not only contains the fault information but is also mixed with a heavy background signal related to the rolling mill running status and with noise, whose frequency bands overlap; therefore it is very difficult to extract the fault information with traditional time-domain or frequency-domain methods [2].
The correlation dimension is an important parameter that reflects the behavior of a nonlinear system in state space, and the fractal dimension can be used to quantify the state of a nonlinear system [3]; thus the fault diagnosis of the rolling mill main transmission system can be realized effectively. However, a strict fractal is only an idealized model: most vibration signals mixed with noise have the fractal feature only within some range of scales, and random noise does not actually have a strict fractal feature [4]. It is therefore unreliable to calculate the correlation dimension directly from the primitive vibration signal in order to distinguish the states of the transmission system; moreover, when there is a great deal of primitive signal data, calculating the correlation dimension directly involves much work. Therefore, before the calculation of the correlation dimension, it is necessary to reduce the dimensionality of the primitive vibration signal and then calculate the correlation dimension from the processed signal or parameters. In view of the above questions, this paper first utilizes the wavelet threshold denoising algorithm and empirical mode decomposition (EMD) to denoise the primitive vibration signal and render it linear and stationary, in order to obtain a number of IMF components; then AR models of the IMF components that contain the main information are set up to reduce the dimensionality, and the autoregressive parameters of each AR model are calculated; finally, the phase space of each AR model's autoregressive parameters is reconstructed by the time-delay method and the correlation dimension is calculated, so that an accurate judgment of the faults of the rolling mill main transmission system can be made.
2 The AR Model Based on EMD
The AR model is a time-series analysis method whose parameters contain the essential information of the system states; an accurate AR model can express the objective characteristics of a dynamic system profoundly and in a concentrated way, and many studies indicate that the AR model's autoregressive parameters reflect state changes most sensitively. However, the AR model applies only to stationary signals, while the vibration signal of the rolling mill main transmission is non-stationary, so autoregressive parameters obtained by fitting an AR model directly to it will not reflect the characteristics of the original signal. Therefore, the original vibration signal needs a stationarizing pre-treatment before the AR model is set up, which can be accomplished by the EMD method. The vibration signal is also mixed with random noise, which not only affects the quality of the EMD decomposition but also makes the correlation dimension lose its physical meaning; accordingly, before the EMD decomposition, an orthogonal-wavelet-transform denoising method is applied to remove the noise [5], [6].
EMD is an adaptive signal decomposition method applicable to non-linear, non-stationary processes. Its purpose is to decompose the signal into a set of intrinsic mode functions (IMFs), each IMF representing an inherent characteristic vibration form of the signal. An IMF must meet two conditions: (i) the number of extrema and the number of zero crossings differ by at most one; (ii) at any point, the mean of the envelopes defined by the local maxima and the local minima is zero [7]. The essence of this method is to obtain the intrinsic vibration modes through their characteristic time scales, and then to use them to decompose the data sequence. The procedure is as follows:
(1) Initialization: r_0(t) = x(t), i = 1.
(2) Obtain the i-th IMF:
① Initialization: h_0(t) = r_{i-1}(t), j = 1;
② Find the local extreme points of h_{j-1}(t);
③ Interpolate the local maxima and the local minima of h_{j-1}(t) to form the upper and lower envelopes;
④ Calculate the mean of the upper and lower envelopes, m_{j-1}(t);
⑤ h_j(t) = h_{j-1}(t) − m_{j-1}(t);
⑥ If

$$SD = \sum_{t=0}^{T} \frac{\left| h_{j-1}(t) - h_{j}(t) \right|^{2}}{h_{j-1}^{2}(t)} \le 0.3,$$

then I_{imf i}(t) = h_j(t); otherwise set j = j + 1 and return to ②.
(3) r_i(t) = r_{i-1}(t) − I_{imf i}(t).
(4) If r_i(t) still has no fewer than two extreme points, set i = i + 1 and return to (2); otherwise the decomposition is completed and r_i(t) is the residual component.
Here x(t) is the signal to be decomposed, m_j(t) is the mean of the upper and lower envelopes of h_j(t), h_j(t) is the difference between h_{j-1}(t) and m_{j-1}(t), and SD is the criterion for ending the sifting process.
After the above procedure, the algorithm finally yields

$$x(t) = \sum_{i=1}^{n} I_{imf\,i}(t) + r_{n}(t). \qquad (1)$$
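For illustration, the sifting procedure above can be sketched in Python as follows. This is a simplified outline (cubic-spline envelopes via SciPy, the SD threshold fixed at 0.3, and our own stopping safeguards), not the authors' implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def emd(x, sd_limit=0.3, max_imfs=10):
    """Minimal EMD sketch: returns a list of IMFs and the residual."""
    t = np.arange(len(x))
    imfs, r = [], x.astype(float).copy()
    for _ in range(max_imfs):
        maxima = argrelextrema(r, np.greater)[0]
        minima = argrelextrema(r, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:      # step (4): residual reached
            break
        h_prev = r.copy()
        for _ in range(100):                        # sifting loop, steps ①-⑥
            up = CubicSpline(maxima, h_prev[maxima])(t)    # upper envelope
            low = CubicSpline(minima, h_prev[minima])(t)   # lower envelope
            m = 0.5 * (up + low)                    # envelope mean
            h = h_prev - m
            sd = np.sum((h_prev - h)**2 / (h_prev**2 + 1e-12))
            if sd <= sd_limit:
                break
            h_prev = h
            maxima = argrelextrema(h_prev, np.greater)[0]
            minima = argrelextrema(h_prev, np.less)[0]
            if len(maxima) < 2 or len(minima) < 2:
                break
        imfs.append(h)
        r = r - h                                   # step (3)
    return imfs, r
```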
At this point, the signal x(t) is decomposed into the sum of n IMF components and a residual component. For each IMF component I_{imf i}(t), the following autoregressive model AR(m) is set up:

$$I_{imf\,i}(t) + \sum_{k=1}^{m} \varphi_{ik}\, I_{imf\,i}(t-k) = e_{i}(t), \qquad (2)$$

where φ_ik and m are the parameters and the order of the AR(m) model of I_{imf i}(t), and e_i(t) is the model residual, a white-noise sequence with zero mean and variance σ_i². The FPE criterion is used to determine the model order m, and the least-squares algorithm is used to estimate the autoregressive parameters φ_ik.
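A least-squares AR fit with FPE order selection, as described above, can be sketched as follows. This is an illustrative outline with our own function names and one common form of Akaike's FPE; it is not the authors' code.

```python
import numpy as np

def fit_ar_least_squares(x, order):
    """Fit x(t) + sum_k phi_k x(t-k) = e(t) by least squares; returns (phi, residual variance)."""
    N = len(x)
    X = np.column_stack([x[order - k:N - k] for k in range(1, order + 1)])
    y = x[order:]
    phi = np.linalg.lstsq(X, -y, rcond=None)[0]      # note the sign convention of Eq. (2)
    e = y + X @ phi
    return phi, np.var(e)

def select_order_fpe(x, max_order=20):
    """Choose the AR order minimizing FPE = var(e) * (N + m) / (N - m)."""
    N = len(x)
    best_m, best_fpe = 1, np.inf
    for m in range(1, max_order + 1):
        _, var_e = fit_ar_least_squares(x, m)
        fpe = var_e * (N + m) / (N - m)
        if fpe < best_fpe:
            best_m, best_fpe = m, fpe
    return best_m
```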
3 The Basic Principle of the Correlation Dimension
The correlation dimension can be calculated directly from an observed one-dimensional time series, or by processing the data sequence with the phase-space reconstruction approach.
A mechanical system usually exhibits highly complex characteristics, may have many degrees of freedom, and is mixed with noise, while under normal circumstances the test result is a single time series. If such a single time series is used as it is to reflect the characteristics of the mechanical system, much information is inevitably lost, so the single time series should be reconstructed into a higher-dimensional phase space. In order to obtain the phase-space geometry of a dynamical system from a one-dimensional time series, Grassberger and Procaccia proposed in 1983 an algorithm for calculating the correlation dimension directly from a given time series according to the embedding theory and the idea of phase-space reconstruction, known as the GP algorithm [9].
Suppose that {x_k, k = 1, 2, ..., N} is the time series obtained by observing the system; here x_k is the autoregressive parameter φ_ik and N is the model order m of the AR model. The time series is reconstructed in phase space and embedded in the m-dimensional Euclidean space R^m to obtain a point (vector) set J(m) whose elements are

$$X_{n}(m,\tau) = (x_{n},\, x_{n+\tau},\, \ldots,\, x_{n+(m-1)\tau}), \qquad n = 1, 2, \ldots, N_{m}. \qquad (3)$$
In the above formula, τ = kΔt is called the time delay, where Δt is the time interval between two adjacent sampling points and k is an integer, and

$$N_{m} = N - (m-1)\tau. \qquad (4)$$
Selecting a reference point X_i from these N_m points, the distances from the remaining N_m − 1 points to X_i are calculated:

$$d_{ij} = d(X_{i}, X_{j}) = \|X_{i} - X_{j}\|. \qquad (5)$$
Repeating this process for all X_i (i = 1, 2, ..., N_m) gives the correlation integral function

$$C_{m}(r) = \frac{2}{N_{m}(N_{m}-1)} \sum_{i=1}^{N_{m}} \sum_{j=1}^{N_{m}} H(r - d_{ij}), \qquad (6)$$
where r > 0 and H is the Heaviside function

$$H(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad (7)$$
Define D_2(m, r) = d ln C_m(r) / d ln r, i.e., the slope of the curve of ln C_m(r) versus ln r. When r → 0, D_2(m) ≈ lim_{r→0} D_2(m, r). The value D_2 ≈ lim_{m→∞} D_2(m), which no longer changes as the phase-space dimension m increases, is the correlation dimension of the time series.
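The GP procedure of Eqns. (3)-(7) can be sketched in Python as follows. It is a simplified illustration (Euclidean distances and a user-chosen range of r for the log-log slope fit); variable names and defaults are ours, not the authors'.

```python
import numpy as np

def correlation_dimension(x, m, tau, r_values):
    """GP estimate: embed x with dimension m and delay tau, then fit the slope of ln C(r) vs ln r."""
    x = np.asarray(x, dtype=float)
    r_values = np.asarray(r_values, dtype=float)
    N_m = len(x) - (m - 1) * tau                                          # Eq. (4)
    X = np.array([x[n:n + (m - 1) * tau + 1:tau] for n in range(N_m)])    # Eq. (3)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)            # pairwise d_ij, Eq. (5)
    dists = d[np.triu_indices(N_m, k=1)]
    # correlation integral C_m(r), Eq. (6): fraction of point pairs closer than r
    C = np.array([np.mean(dists < r) for r in r_values])
    mask = C > 0
    # slope of ln C_m(r) versus ln r over the chosen scaling region
    slope, _ = np.polyfit(np.log(r_values[mask]), np.log(C[mask]), 1)
    return slope
```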
4 The Algorithm Description Based on the EMD-AR Model and Correlation Dimension
The EMD-AR model and the correlation dimension are introduced into the analysis of the fault vibration signal of the rolling mill main transmission, where the EMD-AR model is used to make the vibration signal linear and stationary and to reduce its dimensionality, and the correlation dimension is then used to represent the fault characteristics of the vibration signal accurately and quickly. The specific algorithm is as follows:
(1) Utilize the denoising algorithm based on the orthogonal wavelet transform to remove the random noise from the vibration signal of the rolling mill main transmission, and then, using the EMD decomposition algorithm, decompose the signal into a series of IMF components I_{imf i}(t).
(2) Reduce the dimensionality of the IMF components that contain the main information by setting up an AR model for each of them, obtaining the autoregressive parameters φ_ik of each AR model, where k = 1, 2, ..., m and m is the AR model order.
(3) Reconstruct the phase space of each set of autoregressive parameters φ_ik by the time-delay method, and then calculate its correlation dimension:
① Standardization and detrending: the parameters φ_ik are made zero-mean and normalized, and the least-squares method is used to remove the trend of the zero-drift signal.
② Phase-space reconstruction: the mutual information method is used to determine the optimum time delay τ, Cao's method is used to determine the optimum embedding dimension m, and the point set J(m) is constructed according to the time-delay method.
③ Calculation of the correlation integral function C_m(r): C_m(r) expresses the spatial correlation of the points; its essence is the probability that the distance between two arbitrary points is less than r, and it is the most critical and most time-consuming part of the correlation dimension calculation. In this paper, referring to the K-nearest-neighbor search theory of computer search techniques and using the distance

$$d_{ij} = d_{1}(X_{i}, X_{j}) = \sum_{l=0}^{m-1} \left| x_{i+l\tau} - x_{j+l\tau} \right|$$

in a rhombic neighborhood, whose topology is equivalent to the distance in a spherical neighborhood, the calculation is accelerated.
④ Calculation of the correlation dimension: the ln C_m(r) − ln r curve (the correlation integral curve) is plotted, a section of good linearity is selected for linear fitting, and the slope of the fitted straight line is the correlation dimension.
5 Experimental Results
In this experiment, vibration tests were performed on the main transmission system of a rolling mill, and the radial displacement vibration signals were measured in the normal, misalignment, collision, and oil-film-whirl fault states. In the experiment, the speed of the transmission shaft is 3000 r/min, the sampling frequency is 4096 Hz, and the number of sampling points is 1024. The denoising algorithm based on the orthogonal wavelet threshold is used to remove random noise.
Fig. 1. EMD decomposition results of the radial vibration signal in the misalignment fault state. The IMF components c1, c2, c3, and c4 each have relatively high frequency, while the IMF components c5 and c6 have lower frequency.
A series of IMF components is obtained by EMD decomposition of each of the above fault vibration signals; the EMD decomposition result for the misalignment fault state is shown in Fig. 1. From the figure we can see that each IMF component contains a different characteristic time scale: the components c1, c2, c3, and c4 each have relatively high frequency and contain the main fault information, while the components c5 and c6, with lower frequency, are the background signal and noise related to the rotating speed. Therefore, AR models are set up on the first four IMF components c1, c2, c3, and c4, the phase space of each AR model's autoregressive parameters is reconstructed by the time-delay method, and its correlation dimension is calculated, so that the fault characteristics can be extracted.
Fig. 2(a) displays the correlation dimensions of the AR model's autoregressive parameters of IMF component c1 for the vibration signals in the different fault states. Because different embedding dimensions m give different correlation dimensions, the figure gives the correlation dimension for each embedding dimension; it clearly shows that the correlation dimension in the oil-film-whirl fault state is obviously smaller than in the other three states. Fig. 2(b) displays the correlation dimensions of the AR model's autoregressive parameters of IMF component c2 and clearly indicates the distinction between the misalignment state and the other three states. Fig. 2(c) displays the corresponding result for IMF component c3; from this graph not only can the misalignment fault state be distinguished, but also the difference between the normal state and the other three states. Fig. 2(d) displays the corresponding result for IMF component c4, from which the collision fault is distinguished.
Fig. 2. Correlation dimension versus embedding dimension for the AR model's autoregressive parameters of IMF components c1-c4 in different fault states: (a) IMF component c1; (b) IMF component c2; (c) IMF component c3; (d) IMF component c4. *: normal state; □: oil film whirl state; ○: collision state; +: misalignment state.
Fig. 3. Correlation dimension versus embedding dimension: (a) correlation dimensions of the original vibration signal; (b) correlation dimensions of the AR model's autoregressive parameters of the original signal. *: normal state; □: oil film whirl state; ○: collision state; +: misalignment state.
Fig. 3(a) displays the correlation dimensions, for different embedding dimensions, of the original vibration signal; although the oil-film-whirl state can be distinguished when the embedding dimension is 10, the other three states cannot be distinguished. Fig. 3(b) displays the correlation dimensions of the autoregressive parameters of an AR model fitted directly to the original signal; this figure shows no clear distinction among the four states.
From the above analysis we can see that the direct use of the original vibration signal to calculate the correlation dimension makes it difficult to extract the fault information comprehensively and accurately, because rotating machinery fault signals are in general complex and include non-linear, non-stationary vibration components such as the fundamental frequency, frequency doubling, and ultra-low frequencies, while the AR model is suitable only for stationary signal analysis. Therefore, the autoregressive parameters obtained by fitting an AR model directly to the signal cannot adequately reflect the information about the state of the mechanical system. EMD decomposition, on the other hand, obtains the intrinsic vibration modes through their characteristic time scales and then uses these intrinsic vibration modes to decompose the data sequence, which achieves the separation of the system state features. Consequently, the correlation dimensions of the AR model's autoregressive parameters of the different IMF components can distinguish between the different fault types, which provides a reliable basis for the fault diagnosis of the rolling mill main transmission system.
6 Conclusion
Because of the influence of nonlinear stiffness, friction, unbalance, and external load, the fault vibration signal of the rolling mill main transmission system exhibits non-linear, non-stationary characteristics, and it is difficult to extract the fault information only by traditional time-frequency analysis methods. To address this problem, this paper combines EMD decomposition, the AR model, and fractal geometry theory and applies them to the fault diagnosis of the rolling mill main transmission system. Theoretical and experimental results indicate that this algorithm can not only express the objective law of the dynamic system's state change profoundly and comprehensively, but also realize the separation of the system state features. Therefore, the correlation dimensions of the AR model's autoregressive parameters of the different IMF components can distinguish between different fault types, which improves the fault diagnosis accuracy of the rolling mill main transmission system. With the correlation dimensions of the AR model's autoregressive parameters of each IMF component as features, it is feasible to combine the algorithm with pattern recognition techniques to develop a more effective fault diagnosis system.
Acknowledgments This paper was supported by the Opening Project of JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in
Enterprise (No.eisecSX200806), and sponsored by Qing Lan Project and The Science & technology Foundation of Fujian province (No.2008F5046), and the Science Technology Foundation of Suzhou vocational university (No.SZD09L28).
Analysis of Mixed Inflammable Gases Based on Single Sensor and RBF Neural Network Yu Zhang1,2, Meixing Qi3, and Caidong Gu1,4 1
JiangSu Province Support Software Engineering R&D Center for Modern InformationTechnology Application in Enterprise, Suzhou, China, 215104 2 Depart of Electronic information engineering, Suzhou Vocational University,Suzhou International Educational Park, Road 106, Zhineng Dadao, Suzhou 215104, Jiangsu, China
[email protected] 3 Department of Mechanical Engineering, Suzhou Industrial Park Institute of Vocational Technology, 68 Suqian Road, Suzhou Industrial Park, Suzhou 215021, Jiangsu, China
[email protected] 4 Depart of computer engineering, Suzhou Vocational University, Suzhou International Educational Park, Road 106, Zhineng Dadao, Suzhou 215104, Jiangsu, China
[email protected]

Abstract. The sensitivity of a catalytic sensor changes with the type of gas and with its operating temperature. To fully exploit this property, a single sensor can be controlled to work at different temperatures so as to produce different output signals for a given mixture of inflammable gases. A Radial Basis Function (RBF) neural network trained with a dynamic learning algorithm can then be used to analyze the gas mixture. A simulation experiment with a sample mixture of firedamp, carbon monoxide and hydrogen shows that the proposed method is indeed effective for analyzing mixtures of inflammable gases.

Keywords: RBF neural network, Gas analysis, Catalytic sensor, Dynamic training algorithm.
1 Introduction

Gas analysis is very important in many areas such as industrial manufacturing, environmental protection, security monitoring and scientific research. Currently, multiple gas sensors are usually used to analyse mixed inflammable gases; however, the cross-sensitivity of many gas sensors often undermines the result of the analysis. In recent years, sensor arrays combined with neural networks have also been used in gas analysis systems [1]. This approach relies on the selectivity of the individual sensors for particular gases, and since the sensors can be disturbed considerably by environmental gases, it is usually unable to provide reliable detection results. Thermocatalytic sensors are widely used in the detection of mine gases and other inflammable gases because they respond selectively to inflammable gases [2]. To fully exploit this property, a single thermocatalytic sensor can be controlled to work at different temperatures, at which it has different sensitivities to different gases; this makes the analysis of mixed inflammable gases possible.
2 A Multiple Thermostatic Detection System for the Catalytic Sensor

Since any change in the temperature of the catalytic sensor degrades the accuracy and sensitivity of the detection, the thermostatic (constant-temperature) detection method is used in this paper. As shown in Fig. 1, the resistors R1, R2, R3 and the catalytic sensor element r form a bridge, which together with the regulator A constitutes a closed-loop control circuit. The circuit works as follows: initially, the current in the sensor element r is set to its rated value I0, the bridge is balanced (R1 R3 = R2 r), and the output signal is U_O = I0 r. When inflammable gases are detected, they undergo a catalytic oxidation reaction at the surface of the sensing element, so both the temperature of the element r and its resistance increase. Under the control of the regulator A, the current through the element is then reduced to I_C, the temperature drops back to its original level, and the bridge returns to balance. The resulting change in the output voltage reflects the concentration of the gas being detected. With the addition of a programmable potentiometer B, the operating current of the sensing element, and hence its operating temperature, can be regulated, so that detection at multiple constant temperatures is realized.
Fig. 1. Detection circuit with thermostatic sensor
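A small numerical illustration of this constant-temperature detection principle follows. It is a simplified reading of the circuit description above: the bridge is balanced when R1·R3 = R2·r, and the reduction of the regulated sensor current from I0 to IC under gas exposure produces the output-voltage change that indicates concentration. All component values are invented for illustration and are not taken from the paper.

```python
def bridge_balanced(R1, R2, R3, r, tol=1e-9):
    """Bridge balance condition R1*R3 = R2*r."""
    return abs(R1 * R3 - R2 * r) < tol

def output_change(I0, IC, r):
    """Change in output voltage U_O = I*r when the regulator reduces
    the sensor current from I0 to IC under gas exposure."""
    return (I0 - IC) * r

# illustrative values: rated current 80 mA, regulated current 76 mA,
# sensor element resistance 10 ohm
print(bridge_balanced(R1=100.0, R2=200.0, R3=20.0, r=10.0))  # True
print(output_change(I0=0.080, IC=0.076, r=10.0))             # 0.04 V
```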
3 Gas Analysis with a Single Catalytic Sensor Based on an RBF Network

Assume that the mixture contains a combination of n different gases with concentrations c1, c2, ..., cn. Using the multiple constant-temperature detection system described above, the operating temperature of the single catalytic sensor is changed n times by regulating the operating current. The output voltage u_i under operating temperature T_i is written as follows:
u_i = f_i(c1, c2, ..., cn),   i = 1, 2, ..., n.   (1)
Equation (1) can be written in the vector form U = F(C), where U = [u1, u2, ..., un]^T and C = [c1, c2, ..., cn]^T. The output voltages u_i are the inputs of the neural network, and the outputs of the network are the estimated gas concentrations c_i'. The mapping between the inputs and outputs of the network is:
c_i' = g_i(u1, u2, ..., un),   i = 1, 2, ..., n.   (2)
Equation (2) can likewise be written in the vector form C' = G(U), where C' = [c1', c2', ..., cn']^T and G = [g1, g2, ..., gn]^T.
Fig. 2. Gas analysis model of the RBF neural network. The catalytic sensor, operated at temperatures T1, T2, ..., Tn, produces the outputs u1, u2, ..., un for the inflammable gas mixture; after normalization these are processed by the RBF network and denormalized to give the concentrations c1', c2', ..., cn'.
Utilizing the outstanding nonlinear mapping capability of the neural network, when the learned mapping satisfies G = F^(-1) the outputs give C' = C, i.e. the exact concentrations of the different gases in the mixture. Since an RBF neural network can approximate any nonlinear function to arbitrary accuracy and does not suffer from the problem of local minima, it is used here [3]. The network is constructed with a suitable structure and trained with a large number of samples; the connection weights between the hidden and output layers are determined under a given error constraint and realize the nonlinear mapping G. The concentrations of the gases in the mixture can therefore be detected accurately with a single catalytic sensor.
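The inverse mapping G realized by the network can be sketched as follows. This is a minimal illustration with Gaussian hidden units; the array shapes and variable names are assumptions for the sketch, not the paper's code.

```python
import numpy as np

def rbf_forward(u, centers, widths, weights):
    """Map normalized sensor voltages u (length N) to estimated
    concentrations c' (length M) through P Gaussian hidden units."""
    # h_i = exp(-||u - center_i||^2 / width_i^2)
    h = np.exp(-np.sum((u - centers) ** 2, axis=1) / widths ** 2)
    return h @ weights   # c'_j = sum_i h_i * w_ij

# expected shapes: centers (P, N), widths (P,), weights (P, M)
```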
4 The Structure and Learning Algorithm of the RBF Network

The RBF neural network [4-7] is composed of three layers of neurons: the input layer, the hidden layer and the output layer. The numbers of input and output neurons are set according to the number of gas species in the mixture, while the number of hidden neurons is determined dynamically by the online learning algorithm. The parameters of the RBF network consist of three parts: the centers α, the widths σ and the weights w; the centers and widths describe the spatial pattern of the samples and the relative positions of their centers. The RBF network is trained here with an online training algorithm.

4.1 Allocating Hidden Layer Units Online

Assume that the input dimension of the RBF network is N, the number of hidden units is P, and the output dimension is M. For a given sample pair [U, C], let C' be the output of the network and let E = C − C'. A new hidden unit is allocated if the following condition is satisfied:
‖E‖ > ε   and   ‖U − C_near‖ > δ(t).   (3)
where U = [U1, U2, ..., UN]^T (U_l = {u1, u2, ..., un}, l = 1, 2, ..., N) represents the input sample space; ‖·‖ denotes the norm, usually the 2-norm; C_near is the center closest to U among all centers in the current network; and ε and δ(t) are the error threshold and the distance threshold, respectively.

4.2 The Learning Algorithm

The method of Section 4.1 is used to allocate the hidden layer units of the RBF network. The three parameters of the network, namely the centers α, the widths σ and the weights w, are adjusted by gradient descent. The algorithm is described as follows. The quality measure function is defined as:
J = (1/2) Σ_{j=1}^{M} (c_j − c_j')².   (4)
where M is the dimension of the network output, c_j is the desired output for the input sample, and c_j' is the actual output of the network,

c_j' = Σ_{i=1}^{P} h_i w_{ij} = Σ_{i=1}^{P} exp{ −Σ_{k=1}^{N} (u_k − c_i^{(k)})² / σ_i² } w_{ij},   1 ≤ j ≤ M, 1 ≤ i ≤ P,

where c_i^{(k)} denotes the component of the i-th center associated with the k-th input u_k. Based on (4), the search directions for the weight w_{is}, the center α_i^{(r)} and the width σ_i in the gradient descent algorithm are, respectively:

Sw_{is} = −∂J/∂w_{is} = e_s h_i.   (5)

Sα_i^{(r)} = −∂J/∂α_i^{(r)} = (2 h_i (u_r − α_i^{(r)}) / σ_i²) Σ_{j=1}^{M} e_j w_{ij}.   (6)

Sσ_i = −∂J/∂σ_i = (2 h_i ‖U − α_i‖² / σ_i³) Σ_{j=1}^{M} e_j w_{ij}.   (7)
Therefore, the parameter adjustment algorithm for the RBF neural network is written as follows:
w_{is}(n+1) = w_{is}(n) + λ Sw_{is}.   (8)
α_i^{(r)}(n+1) = α_i^{(r)}(n) + λ Sα_i^{(r)}.   (9)

σ_i(n+1) = σ_i(n) + λ Sσ_i.   (10)
where, in Eqns. (8)-(10), 1 ≤ r ≤ N, 1 ≤ i ≤ P, 1 ≤ s ≤ M, and λ is the learning rate. To obtain better convergence, λ is adjusted online using the following iteration:
λ(k) = a λ(k−1)  if J(k) ≤ J(k−1);   λ(k) = b λ(k−1)  if J(k) > J(k−1),   (11)

where a ≥ 1.0 and 0 < b < 1.0.
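The online procedure of Sections 4.1 and 4.2, allocating a hidden unit when both thresholds in (3) are exceeded and otherwise updating w, α and σ along the search directions (5)-(7) with the adaptive learning rate (11), can be sketched as below. This is a simplified single-sample version; the thresholds, initial width, rate factors and the initialisation of new-unit weights with the current error (a common choice) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

class OnlineRBF:
    def __init__(self, n_in, n_out, eps=0.05, delta=0.5,
                 lam=0.1, a=1.05, b=0.7, sigma0=0.5):
        self.n_in, self.n_out = n_in, n_out
        self.eps, self.delta = eps, delta        # thresholds of Eq. (3)
        self.lam, self.a, self.b = lam, a, b     # learning rate and Eq. (11) factors
        self.sigma0 = sigma0
        self.centers = np.empty((0, n_in))
        self.sigmas = np.empty(0)
        self.weights = np.empty((0, n_out))
        self.prev_J = np.inf

    def _hidden(self, u):
        return np.exp(-np.sum((u - self.centers) ** 2, axis=1) / self.sigmas ** 2)

    def predict(self, u):
        if len(self.sigmas) == 0:
            return np.zeros(self.n_out)
        return self._hidden(u) @ self.weights

    def train_step(self, u, c):
        e = c - self.predict(u)                  # output error E = C - C'
        J = 0.5 * np.sum(e ** 2)                 # quality measure, Eq. (4)
        dist = (np.linalg.norm(u - self.centers, axis=1).min()
                if len(self.sigmas) else np.inf)
        if np.linalg.norm(e) > self.eps and dist > self.delta:
            # allocate a new hidden unit centred on the sample, Eq. (3)
            self.centers = np.vstack([self.centers, u])
            self.sigmas = np.append(self.sigmas, self.sigma0)
            self.weights = np.vstack([self.weights, e])
        else:
            h = self._hidden(u)                  # (P,)
            diff = u - self.centers              # (P, N)
            ew = self.weights @ e                # sum_j e_j * w_ij for each unit
            # gradient-descent updates along Eqs. (5)-(7), applied as in (8)-(10)
            self.weights += self.lam * np.outer(h, e)
            self.centers += self.lam * (2 * h * ew / self.sigmas ** 2)[:, None] * diff
            self.sigmas += self.lam * 2 * h * ew \
                           * np.sum(diff ** 2, axis=1) / self.sigmas ** 3
            # adaptive learning rate, Eq. (11)
            self.lam *= self.a if J <= self.prev_J else self.b
        self.prev_J = J
```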
5 Experimental Results

Samples of mixed gas with different concentrations of CH4, CO and H2 are prepared. To obtain clearly distinguishable analysis results, high concentrations are selected: the concentrations of CH4, CO and H2 in the mixed gas samples are limited to the ranges 0-1.0%, 0-0.1% and 0-0.15%, respectively. Altogether 115 groups of experimental samples are prepared; 110 groups are used for training the RBF neural network and the other 5 groups for testing it. The SH-3 catalytic sensor is used [8], with a rated voltage of 2.8 V and a rated current of 80 mA. For each mixed gas sample the operating currents are 40 mA, 65 mA and 85 mA, with corresponding operating temperatures of 150 °C, 350 °C and 560 °C. In total, 115 groups of output signals are recorded. Some of the sample data are listed in Table 1.
Table 1. Some of the sample data in the experiment

Components in the gas mixture /%      Detected outputs under three operating temperatures /mV
H2       CO       CH4                 ΔU_o1     ΔU_o2     ΔU_o3
0.040    0.005    0.00                1.220     1.282     1.313
0.021    0.101    0.00                0.605     1.807     1.719
0.010    0.050    0.05                0.290     0.824     2.790
0.039    0.020    0.10                1.150     1.362     4.911
0.082    0.000    0.40                2.314     2.335     17.63
…        …        …                   …         …         …
0.100    0.050    1.01                2.854     3.371     38.72
The RBF neural network has 3 input nodes and 3 output nodes. The maximum number of hidden-layer neurons ranges from 10 to 45. After normalization, the sample data described above are used to train the network, with a tolerated error of 0.001.
Fig. 3. Curve of the neural network training error. The horizontal axis denotes the number of training steps, while the vertical axis denotes the training error. The dotted line denotes the normal RBF network algorithm, while the continuous line denotes the training algorithm in this paper.
After multiple simulations using MATLAB, 32 hidden-layer neurons are selected. The error curve of the network training is shown in Fig. 3. It can be seen that the training algorithm proposed in this paper for the RBF network has a fast convergence rate and high accuracy. Finally, 5 groups of mixed inflammable gases are used to test the neural network. The test results are shown in Table 2; most analysis errors are below 8%. According to the component fractions of the mixed gases to be detected, the single catalytic sensor can be operated at different temperatures and its output signals processed by the previously trained RBF neural network. The experimental results show that the network can accurately calculate the actual volume fractions of the different gases in the inflammable gas mixture, which provides a new method for the analysis of mixed inflammable gases.

Table 2. Result of mixed gas analysis

Components in the gas mixture /%      Detection results in the experiment /%
CH4      CO       H2                  CH4       CO        H2
0.20     0.015    0.022               0.18      0.019     0.025
0.35     0.036    0.028               0.40      0.034     0.031
0.50     0.020    0.083               0.51      0.026     0.084
0.61     0.052    0.055               0.58      0.052     0.057
0.86     0.085    0.100               0.89      0.083     0.098
Acknowledgments This research was supported by the Opening Project of JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (No.SX200901).
References 1. Tong, M.M., Zhang, Y., Dai, X.L.: Analysis of Mixed Inflammable Gases with Catalytic. Journal of China University Mining & Technology 35(1), 35–37 (2006) 2. Pang, O., Yang, C.R., Zhang, Y.Q.: Research on Gas Analysis of Artificial Neural Network. Chinese Journal of Scientific Instrument 20(2), 121–124 (1999) 3. Qu, J.L., Wang, L., Gao, F.: Quantitative Analysis of Gas Mixture Using an Artificial Neural Network. Journal of Northwestern Polytechnical University 21(4), 401–403 (2007) 4. Li, Y.H., Qiang, S., Zhuang, X.Y.: Robust and Adaptive Backstepping Control for Nonlinear Systems using RBF Neural Networks. IEEE Trans. on Neural Networks 15(3), 693–701 (2004) 5. Yanf, F., Paindavoine, M.: Implementation of an RBF Neural Network on Embedded Systems. IEEE Trans. of Neural Networks 14(5), 1162–1175 (2003) 6. Peng, J.X., Li, K., Huang, D.S.: A Hybrid Forward Algorithm for RBF Neural Network Construction. IEEE Trans. on Neural Networks 17(6), 1439–1451 (2006) 7. Huyberechts, G., Szecowka, P., Roggen, J.: Quantification of Carbon Monoxide and Methane in Humid Air Using a Sensor Array and an Artificial Neural Network. Sensors and Actuators 45(5), 123–130 (1997) 8. Tong, M.M.: Dynamic Analysis on Thermostatic Methane Detection with Catalytic Sensor. Journal of China University Mining & Technology 29(3), 275–278 (2003)
Image Segmentation of Level Set Based on Maximization of Between-Class Variance and Distance Constraint Function

Changxiong Zhou1,2, Zhifeng Hu1,2, Shufen Liu1,2, Ming Cui1,2, and Rongqing Xu3

1
JiangSu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise, Suzhou, China 2 Department of Electronic Information Engineering, Suzhou Vocational University, Suzhou, 215104, China
[email protected] 3 College of Optoelectronic Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
Abstract. In most existing level set models for image segmentation, it is necessary to constantly re-initialize the level set function, or to acquire the gradient flow information of the image to restrict the evolution of the curve. A novel image segmentation model of level set is proposed in the paper, which is based on the maximization of the between-class variance and the distance-based constraint function. In this model, the distance-based constraint function is introduced as the internal energy to ensure that the level set function is always the signed distance function (SDF), so that the constant re-initialization of the level set function during the evolution process is avoided. Meanwhile, the external energy function (between-class variance function) is constructed based on the weighted sum of square of the difference between the average grey levels of the target region and the overall region, the background and the overall region respectively. This function is maximized to ensure that the curve represented by zero level set converges towards the target boundary stably. Experimental results show that the constant re-initialization in traditional models has been eliminated in the proposed model. Furthermore, since region information has been incorporated into the energy function, the model renders good performance in the segmentation of both weak edges images and those with Gaussian noise or impulse noise. Keywords: Image segmentation, Between-class variance, Level set.
1 Introduction

Image segmentation is the process of separating objects of interest from the background, and it is an essential preliminary step in image processing. Over the past decades a great many image segmentation techniques have emerged, including thresholding, active contour models and others. One of the most commonly used approaches is thresholding, such as the mean shift and normalized cut methods [1], as well as the
threshold method proposed by Otsu et al. [2]. Active contour models based on the level set function are also effective methods of image segmentation. They can adapt to topological changes, break and merge automatically [3, 4, 5]. As a result, they have been extensively applied to image processing. In active contour methods, the contour energy functional is defined and then minimized so that the contour curve could move from the initial contour curve to the boundary of the desired object, following the descent flow in the energy functional [6,7,8]. These methods can often produce more robust results since relevant information such as region-based information [9], a prior information about the image [10, 11] and the distance functional constraint [12] could be incorporated into the energy functional. Chan-Vese proposed a simplified method of MS model (C-V method) [8]. In their model, region-based information is incorporated into the energy functional as external energy, so that global segmentation is enabled, the requirements on the initial location of the curves represented by the zero level set are greatly reduced, and the robustness against noise is greatly enhanced. However, in their model, the internal energy functional only ensures that the curve represented by zero level set is smooth. As a result, it is necessary to re-initialize the level set functional to keep it close to the signed distance function in practical applications, which significantly increases the load of calculation. Although the re-initialization methods have been improved by many researchers, the results are still unsatisfactory. A level set image segmentation approach is proposed by Li et al. that does not involve re-initialization [10]. However, since the edge gradient information of the image is taken into consideration instead of the global information of the image, it is hard to obtain satisfactory results when the method is applied to the segmentation of images with weak edges. In this paper, the difference between the object and its background region is described by the external energy functional, which is constructed as a weighted sum of two terms. The first term is the square of the difference between the average gray values of the object and the whole image region. The second term is the square of the difference between the average gray values of the background region and the whole image region. An image segmentation method based on the maximization of the between-class differences under the distance function constraint is proposed in this paper, which effectively eliminates the need for re-initialization in traditional models. Also, since regional information is incorporated into the energy function, good segmentation results can be obtained when this method is applied to the segmentation of images with weak edges and those with noise.
2 Level Set with Li et al. Model Given a closed contour curve C on a given plane, if φ ( x, y ) is the shortest distance from the point ( x, y ) to curve C , φ ( x, y ) is called the signed distance function, where φ ( x, y ) < 0 if point ( x, y ) is inside the curve C , φ ( x, y ) > 0 if point ( x, y ) is outside the curve C , and φ ( x, y ) = 0 if point ( x, y ) is on the curve C . A level set is defined as a set of points with a fixed value of SDF. As a special case, the zero level set stands for the set of points that satisfy φ ( x, y ) = 0 , which describes a closed curve C on a plane. The level set function represents implicitly the contour curves of the active contour
Image Segmentation of Level Set Based on Maximization
867
model, while effectively eliminating the need to track the curve explicitly. Level set methods convert the movement of the contour curve into the solution of a numerical partial differential equation for {(x, y): φ(x, y, t) = 0}. To ensure the smoothness of the contour curve and its stable evolution towards the object boundary, the level set function must be re-initialized repeatedly during the iterations to keep it close to the signed distance function, which increases the computational load. The standard re-initialization method is to solve the following re-initialization equation:

∂φ/∂t = sgn(φ)(1 − |∇φ|).   (1)
where sgn(·) is the sign function and ∇ denotes the gradient operator. In the actual iterative process the level set often deviates from the signed distance function, in which case re-initialization of the level set function becomes impossible to implement; this phenomenon cannot be eliminated even if re-initialization is carried out at every iteration. To eliminate it completely, the iteration time step Δt must be made very small, which seriously slows down the evolution of the level set and thus fails to meet the requirements of practical applications. In image segmentation methods based on the active contour model, a contour energy functional is defined and then minimized so that the contour curve moves from its initial position to the boundary of the desired object, following the descent flow of the energy functional. The energy functional is defined by Li et al. as:

E(φ) = μ E_int(φ) + E_extr(φ).   (2)
where E_int(φ) denotes the internal energy term, E_extr(φ) denotes the external energy term, and μ > 0 is a parameter. Considering that the level set function should satisfy the signed-distance property, i.e. |∇φ| ≡ 1, the internal energy function E_int(φ) can be defined as

E_int(φ) = ∫∫_Ω (1/2)(|∇φ| − 1)² dxdy.   (3)
where Ω denotes the image region. Expression (3) is a metric that characterizes how close the function φ is to a signed distance function: by minimizing (3), |∇φ| is driven towards 1, so that the level set function stays close to the signed distance function. The greatest strength of this model is that, once an appropriate level set function is specified at initialization and kept close to the signed distance function by this internal energy (the distance constraint function), there is no need to re-initialize the level set. In the method of Li et al., the external energy function is defined as

E_extr(φ) = λ E_extr1(φ) + υ E_extr2(φ).   (4)
E_extr1(φ) = ∫∫_Ω g δ(φ) |∇φ| dxdy.   (5)

E_extr2(φ) = ∫∫_Ω H(−φ) dxdy.   (6)

where g is the edge indicator function of the image I, defined as g = 1 / (1 + |∇(G ∗ I(x, y))|²); G is the Gaussian operator, ∗ denotes convolution, δ is the Dirac function, and H is the Heaviside step function. In the model of Li et al., the edge gradient information of the image is taken into account instead of its global information. As a result, good results are obtained as long as the method is applied to the segmentation of images in which the desired object has distinct boundaries.
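For reference, the edge indicator g used in (5) can be computed as in the small sketch below, which uses SciPy's Gaussian smoothing; the smoothing scale sigma is an assumed parameter, not a value from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_indicator(image, sigma=1.5):
    """g = 1 / (1 + |grad(G * I)|^2): close to 0 on strong edges,
    close to 1 in smooth regions."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    return 1.0 / (1.0 + gx ** 2 + gy ** 2)
```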
3 Improved Energy Functional and Its Level Set Representation

The energy functional plays a key role in the level set contour model. Assume the image contains only piecewise smooth regions of the object and its background. Let A denote the total area of the image I(x, y) on the region Ω, which is divided by the curve C into two homogeneous regions: Ω1, the object (inside C), and Ω2, the background (outside C). Let A1 and A2 denote the areas of the two regions, let c1 and c2 denote their average gray values, and let c denote the average gray value of the whole image. The energy functional can be defined as follows:

E(φ) = λ1 E1(φ) + λ2 E2(φ).   (7)

E1(φ) = ∫∫_Ω (1/2)(|∇φ| − 1)² dxdy.   (8)

E2(φ) = −(A1/A)(c1 − c)² − (A2/A)(c2 − c)².   (9)

where c1 = (1/A1) ∫∫_Ω I(x, y)(1 − H(φ)) dxdy, c2 = (1/A2) ∫∫_Ω I(x, y) H(φ) dxdy, A = ∫∫_Ω dxdy, A1 = ∫∫_Ω (1 − H(φ)) dxdy, and A2 = ∫∫_Ω H(φ) dxdy. The parameters λ1 and λ2 in Eqn. (7)
are both greater than 0. The internal energy function in Eqn. (8) is the distance constraint function, which ensures that the level set function converges to the signed distance function, so that the need to re-initialize the level set function is eliminated, the load of computation is reduced and the speed of operation is increased. Moreover, since the level set function converges to the distance function, the curve represented by the zero level set becomes smooth. The external energy in Eqn. (9) is described by the variance between the object region and its background region, as represented by a weighted sum of the square of the difference between the average gray values of the
object and the whole image region and the square of the difference between the average gray values of the background region and the whole image region. As the curve evolves from the initial position to the actual object boundary, the between-class variance grows to its maximum. At this point, the curve C successfully completes the task of image segmentation, dividing the image into the object region that lies inside it and the background region that lies outside it. As the region information is incorporated into the external energy function in the model, image segmentation for images with weak boundaries is enabled and the robustness against noise is enhanced. In this paper, we denote by ∂E the Gateaux derivative (or first variation) of the ∂φ
functional E . From (8) and (9), we have: ∂ E 1 / ∂ φ = − [ Δ φ − div ( ∇ φ / ∇ φ )] .
∂E 2 I − c1 1 = − [( c1 − c ) 2 + A1 2 ( c1 − c ) ]( − δ (φ )) . ∂φ A A1 −
1 I − c1 [( c 2 − c ) 2 + A 2 2 ( c 2 − c ) ]δ (φ ) A A2
(10)
(11)
The formulation (11) can be simply written as ∂ E 2 / ∂ φ = (1 / A )[( c1 − c )( 2 I − c − c1 ) − ( c 2 − c )( 2 I − c − c 2 )]δ (φ ) .
(12)
The following evolution equation is the gradient flow that minimizes the functional E : ∂ φ / ∂ t = −∂ E (φ ) / ∂ φ .
(13)
The partial differential equation (PDE) for the level set corresponding to (7) is: 1 ∇φ ∂φ )] + λ 2 [ − ( c1 − c )( 2 I − c − c1 ) . = λ1 [ Δ φ − div ( A ∇φ ∂t
(14)
+ ( c 2 − c )( 2 I − c − c 2 )]δ (φ )
The improved level set evolution model in (14) is called level set image segmentation method based on the maximization of the between-class variances and the distance constraint function. According to curve evolution theory, the movement equation of the curve with respect to time t is represented as follows: ∂C / ∂t = F n .
(15)
Where F is the velocity, and n is the inner normal direction of the curve C . The contour C shrinks when F > 0 and expands when F < 0 . The movement of the curve can be represented by either (14) or (15), with (14) utilizing the level set function for the representation of the movement of the curve. Without loss of generality, assume that the average gray value of the object region c1 is smaller than the average gray value of the background region c 2 , so that c1 < c 2 and c1 ≤ c ≤ c 2 . Let F1 = − ( c1 − c )( 2 I − c − c1 ) and F2 = − ( c 2 − c )( 2 I − c − c 2 ) .
870
C. Zhou et al.
When I < c + c1 , F1 and F 2 take negative values. When I > c + c 2 , F1 and F 2 take 2
2
positive values. To understand the meaning of (14), let I ( x , y ) denote the gray value of the point ( x , y ) . When I > c + c 2 , F1 + F 2 > 0 and the contour curve C repre2
sented by the zero level set function shrinks. When I < c + c1 , F1 + F 2 < 0 and the 2
contour curve C represented by the zero level set function expands. When c + c1 c + c 2 , the contour curve C represented by the level set function con c + c 2 and 2
the contour curve represented by the zero level set shrinks. Since the region information, which includes the average gray values of the object region and the background region, is incorporated into the improved model, the correct image segmentation result of the Lateral Ventricle image with weak boundaries is obtained when the evolution exceeds 400 iterations. As the number of iterations continues to grow, the contour curve stops evolution and stabilizes at the segmentation result.
Fig. 2. Image segmentation of Lateral Ventricle based on the proposed model. (a) The initial contour. (b) 100 iterations. (c) 200 iterations. (d) 400 iterations.
The image segmentation result on the Lateral Ventricle image with Gaussian noise using the improved model that includes (7) as the energy functional and (14) as the level set is shown in Fig.3. For this image in Fig.3, we used the parameters λ 1 = 0 . 04 , and λ 2 = 1 . The 385 × 393 pixel image of the Lateral Ventricle with Gaussian noise is shown in Fig. 3(a), where the initial contour curve is provided by the white rectangle. The Gaussian noise has a mean value of 0 and a standard deviation of 0.0001. The curve evolution takes respectively 100,300,600 iterations in Fig. 3(b), 3(c), 3(d). Compared with the case shown in Fig.2, due to the addition of the Gaussian noise in the case shown in Fig.3, the difference between the average gray values of the region inside the contour curve C and the region outside C decreases, i.e., the speed corresponding to the external energy function becomes smaller. Therefore, the evolution of the contour curve becomes slower. Using the improved model, the correct image segmentation result of the Lateral Ventricle image with weak boundaries is obtained when the number of iterations reaches 600. Compared with the case shown in Fig.2, 200 more iterations are needed to obtain the correct result.
Fig. 3. Gaussian noise image segmentation of Lateral Ventricle based on the proposed model. (a) The initial contour. (b) 100 iterations. (c) 300 iterations. (d) 600 iterations.
The image segmentation result on the Lateral Ventricle image with impulse noise using the improved model that includes (7) as the energy functional and (14) as the level set is shown in Fig.4.
Fig. 4. Impulse noise image segmentation of Lateral Ventricle based on the proposed model. (a) The initial contour. (b) 100 iterations. (c) 300 iterations. (d) 400 iterations.
For this image, we used the parameters λ 1 = 0 . 04 , and λ 2 = 1 . The 385 × 393 pixel image of the Lateral Ventricle with impulse noise is shown in Fig. 4(a). The impulse noise has a density of 0.04. The initial contour curve is provided by the white curve, most of which is located inside the object region and is close to the object boundary. Let I ( x , y ) denote gray value of point ( x , y ) . When the contour curve is in the object region, c + c1 and the contour curve represented by the zero level set expands. When the 2 contour curve is in the background region, I > c + c 2 and the contour curve represented 2 I
(i = 1, 2, …, 20), if there exists a matrix K and a positive definite symmetric matrix P > 0 satisfying the linear matrix inequality

[ W     M1    M2    M3    M4    M5    M6    M7  ]
[ M1^T  −J1   0     0     0     0     0     0   ]
[ M2^T  0     −J2   0     0     0     0     0   ]
[ M3^T  0     0     −J3   0     0     0     0   ]
[ M4^T  0     0     0     −J4   0     0     0   ]
[ M5^T  0     0     0     0     −J5   0     0   ]
[ M6^T  0     0     0     0     0     −J6   0   ]
[ M7^T  0     0     0     0     0     0     −J7 ]  < 0,

(i) for any real number ε > 0, D F E + E^T F^T D^T ≤ ε D D^T + ε^(−1) E^T E;

(ii) for any matrix P > 0 and any real number ε > 0 satisfying εI − E P E^T > 0, (A + D F E) P (A + D F E)^T ≤ A P A^T + A P E^T (εI − E P E^T)^(−1) E P A^T + ε D D^T;

(iii) for any matrix P > 0 and any real number ε > 0 satisfying P − ε D D^T > 0, (A + D F E)^T P^(−1) (A + D F E) ≤ A^T (P − ε D D^T)^(−1) A + ε^(−1) E^T E.
Design of a Single-Phase Grid-Connected Photovoltaic Systems Based on Fuzzy-PID Controller

Fengwen Cao and Yiwang Wang

Department of Electronic & Information Engineering, Suzhou Vocational University, Suzhou 215104, Jiangsu, China
[email protected]

Abstract. The output power of a photovoltaic (PV) module varies with module temperature, solar insolation and load changes. In order to control the output power of a single-phase grid-connected PV system according to the output power of the PV arrays, a Fuzzy-PID controller is designed in this paper for a single-phase grid-connected PV system consisting of a DC/DC converter and a single-phase DC/AC inverter connected to the utility grid. The Fuzzy-PID control technique is used to realize the system control. MATLAB simulation results show that the proposed method performs well.

Keywords: photovoltaic system, Fuzzy-PID control, DC/AC inverter, grid-connected.
1 Introduction

To protect the environment from industrial pollution and the greenhouse effect, many research and development projects on renewable energy have been carried out. The conventional energy sources for electrical power include hydroelectric power, fossil fuels and nuclear energy. The wide use of fossil fuels has caused the worldwide problem of greenhouse gas emissions and serious damage to the earth's environment; moreover, fossil fuels will eventually be exhausted, and their cost has clearly increased. Photovoltaics is one of the important renewable energy sources: its cost is on a falling trend and is expected to fall further as demand and production increase [1-3]. PV energy is attracting increasing interest in electrical power applications, and it is crucial to operate PV energy conversion systems near the maximum power point (MPP) to increase the output efficiency of the PV arrays. There are many different control methods; inverter control, for example, can be divided into output-voltage feedback control and output-current feedback control, and single-phase grid-connected PV systems commonly use the inverter output-current control method. Among current control methods, PID control can respond to external disturbances without overshoot, but a conventional PID controller requires an accurate mathematical model and therefore cannot satisfy the flexibility requirements of the system. As a result, this paper presents a design based on a Fuzzy-PID controller to realize single-
phase grid-connected PV control, using fuzzy logic to overcome the shortcomings of conventional PID control; applied to the single-phase grid-connected PV system, it controls the inverter and enhances its disturbance-rejection capability. A complete simulation model of the grid-connected PV system based on the Fuzzy-PID controller is built in MATLAB 7.01/SIMULINK. The simulation results show that the proposed controller eliminates the oscillation around the maximum power point and improves system stability. The grid-connected PV system designed in this paper can feed energy into the existing AC grid, so the cost of batteries for energy storage can be reduced. The rest of the paper is organized as follows: the grid-connected PV system is described first, then the Fuzzy-PID controller and its design for the grid-connected PV system, and finally the simulation results are presented and discussed.
2 Single-Phase Grid-Connected Photovoltaic System

2.1 System Block Diagram

Figure 1 shows the block diagram of the grid-connected PV power system, which includes the PV arrays, the DC/DC converter, the DC/AC inverter and the Fuzzy-PID control system. The DC/DC converter is a boost-type power converter applied to track the maximum power point of the PV arrays' output. The output voltage of the DC/DC power converter is regulated by the grid-connected DC/AC full-bridge inverter.
Fig. 1. Block diagram of the proposed PV system (PV arrays, DC/DC converter, DC/AC inverter, control system and grid)
The schematic diagram is shown in Fig. 2. A boost DC/DC converter with a MOSFET switch is used between the PV arrays and the DC/AC inverter; its duty cycle D is adjusted so that the maximum output power of the PV arrays is extracted under all operating conditions.

2.2 The PV Cell Model

The equivalent circuit of the PV solar cell used in photovoltaic systems is shown in Fig. 3 [4]. A photovoltaic cell is based on the physical phenomenon called the "photovoltaic effect": its principle is to transform the photons emitted by the sun into electrical energy.
Fig. 2. Grid-connected PV system schematic diagram
Fig. 3. The equivalent circuit of PV cell model
The output current of a cell is a function of the insolation and the temperature, and is given by the equation below:
I_L = I_ph − I_D [exp((U_oc + I_L R_L)/(A k T)) − 1] − U_D/R_s,   i = 1, 2, ..., n.   (1)

where
I_D is the saturation current of the diode D, K is Boltzmann's constant, T is the cell temperature in Kelvin, e is the electron charge, and k is the ideality factor of the p-n junction. I_L and U_o are the output current and voltage of the PV cell model, respectively; I_ph is the short-circuit current of the PV cell model, which depends on the insolation and the temperature; R_s is the shunt resistance, which characterizes the leakage current of the junction; and R_S is the series resistance, which represents the various contact and connection resistances. The values of these constants are provided by the PV cell manufacturer. According to Eqn. (1), the electrical characteristic I(U) of the PV cell model is nonlinear. Typical I-U curves are shown in Fig. 4.
Fig. 4. Typical I-U curves of a PV cell under variable irradiance (output current in mA versus voltage in mV, for irradiances of 0.5, 0.75 and 1.25 kW/m²)
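Equation (1) is implicit in the cell current. The sketch below evaluates a single-diode model of this general form by simple fixed-point iteration; it uses the common variant in which the diode term depends on the terminal voltage and current, and all symbols, parameter values and the iteration scheme are illustrative assumptions rather than the paper's parameters.

```python
import numpy as np

def pv_current(V, Iph, I0, Rs, Rsh, A, T, q=1.602e-19, k=1.381e-23):
    """Cell current for terminal voltage V from a single-diode model,
    I = Iph - I0*(exp((V + I*Rs)/(A*Vt)) - 1) - (V + I*Rs)/Rsh,
    solved by damped fixed-point iteration."""
    Vt = k * T / q                      # thermal voltage
    I = Iph                             # initial guess
    for _ in range(200):
        I_new = Iph - I0 * (np.exp((V + I * Rs) / (A * Vt)) - 1.0) \
                    - (V + I * Rs) / Rsh
        if abs(I_new - I) < 1e-9:
            break
        I = 0.5 * (I + I_new)           # damped update for stability
    return I

# illustrative I-V curve at T = 298 K
# volts = np.linspace(0.0, 0.6, 50)
# amps = [pv_current(V, Iph=3.0, I0=1e-9, Rs=0.01, Rsh=100.0, A=1.3, T=298.0)
#         for V in volts]
```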
2.3 The Control System
Because the change operating voltage of the PV cell array will produce different output power, the ripple voltage of solar cell array will decrease the efficiency of the photovoltaic system. The Fuzzy-PID controller is used to produce a pulse-width modulation (PWM) control signal to turn on or off the power electronic switch of the DC/DC boost converter. The average output voltage across power electronic switch can be derived as:
U_DC = U_PV / (1 − t_on/T) = U_PV / (1 − D).   (2)
where U_PV is the input voltage (the output voltage of the PV arrays), U_DC is the output voltage, and D is the duty ratio of the power electronic switch. The DC/AC full-bridge inverter converts the DC voltage to alternating current under closed-loop current control, generating a sine-wave current that is fed into the existing utility grid. The control block diagram of the PV system is shown in Fig. 5. The function of the DC/DC boost converter is to supply a stable DC voltage to the grid-connected DC/AC inverter, for which feedback control of the output voltage and current is designed. As seen in Fig. 5, the output voltage of the DC/DC boost converter is sensed by a voltage sensor and compared with a preset voltage; the comparison result is sent to a Fuzzy-PID controller, whose output drives a PWM control circuit, and the PWM output is sent to a driver circuit that generates the drive signal for the power electronic device of the DC/DC boost converter. The DC/AC inverter uses SPWM control and outputs a sinusoidal alternating current to the grid. In this paper, MPPT is performed by the Fuzzy-PID controller shown in Fig. 5, which keeps the PV panel operating around the optimal voltage for maximum energy.
Fig. 5. Control system of the DC/DC converter based on Fuzzy-PID controller
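Following Eq. (2), the duty ratio needed to hold a desired DC-link voltage for a given PV voltage can be computed directly, and the regulator then trims this value from the measured voltage error, in the spirit of the loop of Fig. 5. The limits and the small proportional gain in this sketch are illustrative assumptions, not design values from the paper.

```python
def duty_for_target(U_pv, U_dc_ref, d_min=0.05, d_max=0.85):
    """Feed-forward duty ratio from Eq. (2): U_dc = U_pv / (1 - D)."""
    D = 1.0 - U_pv / U_dc_ref
    return min(max(D, d_min), d_max)

def trim_duty(D_ff, U_dc_ref, U_dc_meas, k_p=0.002, d_min=0.05, d_max=0.85):
    """Small proportional correction of the feed-forward duty ratio
    based on the measured DC-link voltage error."""
    D = D_ff + k_p * (U_dc_ref - U_dc_meas)
    return min(max(D, d_min), d_max)

# example: 35 V from the PV array, 200 V DC-link target
# D = duty_for_target(35.0, 200.0)   # about 0.825
```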
The inputs to the Fuzzy-PID controller are the error between the DC/DC boost converter output voltage and the preset voltage and its rate of change; the two inputs are processed by the fuzzy logic controller. During transient conditions the fuzzy logic controller therefore outputs a larger incremental reference current to speed up the transient response, but near the peak power region it outputs an almost zero incremental reference current to reduce oscillations about the MPP.
3 The Fuzzy PID Control for the Single-Phase Grid-Connected PV Generation System

A stable, efficient and high-precision control method is required for the PV generation system. However, the PV system is a complex multi-variable, non-linear, strongly coupled system for which it is difficult to establish an accurate mathematical model, while fuzzy control is a control method based on knowledge and experience; Fuzzy-PID control of the grid-connected system can therefore achieve better control results. A typical block diagram of a Fuzzy-PID controller is shown in Figure 6. In general, the Fuzzy-PID control method not only has the advantages of fuzzy control (no need for a precise mathematical model, good robustness, a good dynamic response curve and a short response time), but also offers good dynamic tracking characteristics and steady-state precision. The fuzzy rule base is used to realize the online adjustment of the three PID correction parameters ΔKp, ΔKi and ΔKd.
Fig. 6. Block diagram of the Fuzzy PID controller
3.1 Fuzzification and Variables
The inputs to the Fuzzy-PID controller are the error signal e and the change of the error Δe, and the three outputs are ΔKp, ΔKi and ΔKd, which are used to modify the PID parameters. Both input variables e and Δe are divided into seven fuzzy levels or subsets in this design: PB (Positive Big), PM (Positive Medium), PS (Positive Small), ZZ (Zero), NS (Negative Small), NM (Negative Medium) and NB (Negative Big). The fuzzy set PS assumes a membership value greater than zero beginning at the origin; in the present model, PS is offset from the origin in order to speed up the start-up process and at the same time prevent variation of the reference current at the MPP. The additional fuzzy sets PM and NM have been added to improve the control surface and allow a smooth transition from the transient to the steady state.
In the same way, the output variables ΔKp, ΔKi and ΔKd are also divided into seven fuzzy sets, the same as the input variables: PB (Positive Big), PM (Positive Medium), PS (Positive Small), ZZ (Zero), NS (Negative Small), NM (Negative Medium) and NB (Negative Big).

3.2 Fuzzy Rules for Control
A rule base is a set of IF-THEN rules which contains a fuzzy logic quantification of the expert's linguistic description of how to achieve good control [7]. The basic rule base of this type of controller is given by:

IF e is A and Δe is B THEN u_ΔKi is C.
where u represents the fuzzy output variable. Table 1 shows the control rules used for the fuzzy adjustment of Kp, Ki and Kd with the input and output membership functions defined above.

Table 1. Rule base for the fuzzy PID controller
(Three rule tables, one each for ΔKp, ΔKi and ΔKd, each indexed by the fuzzy sets of e and Δe.)
3.4 Defuzzification Method
The output of the fuzzy PID controller is a fuzzy set, but a crisp output value is required, so the output of the fuzzy controller must be defuzzified. The most popular method is the centre-of-gravity (centroid) method, which has good averaging properties; simulation results showed that it provided the best results.
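A compact sketch of the adaptation scheme of Section 3 is given below: the error and its change are fuzzified with triangular membership functions over the seven sets NB...PB, an illustrative rule table (the paper's actual rule entries are not reproduced here, so the table used is a placeholder) gives the consequents, and centre-of-gravity defuzzification yields the corrections that are added to the base PID gains. Scaling factors and the placeholder rules are assumptions for illustration only.

```python
import numpy as np

LEVELS = [-3, -2, -1, 0, 1, 2, 3]          # NB, NM, NS, ZZ, PS, PM, PB

def memberships(x):
    """Triangular membership grades of x (pre-scaled to [-3, 3]) in the seven sets."""
    return np.array([max(0.0, 1.0 - abs(x - c)) for c in LEVELS])

def fuzzy_delta(e, de, rule_table, scale=1.0):
    """Centre-of-gravity defuzzification of one output (e.g. dKp).
    rule_table[i][j] is the consequent level for e-set i and de-set j
    (illustrative placeholder, not the paper's table)."""
    mu_e, mu_de = memberships(e), memberships(de)
    num, den = 0.0, 0.0
    for i, me in enumerate(mu_e):
        for j, md in enumerate(mu_de):
            w = min(me, md)                 # rule firing strength
            num += w * rule_table[i][j]
            den += w
    return scale * (num / den if den > 0 else 0.0)

# placeholder rule table: consequent grows with -(e + de), clipped to [-3, 3]
RULES_KP = [[max(-3, min(3, -(i + j - 6))) for j in range(7)] for i in range(7)]

def adapt_gains(Kp0, Ki0, Kd0, e, de):
    dKp = fuzzy_delta(e, de, RULES_KP, scale=0.1)
    # separate rule tables would be used in the same way for dKi and dKd
    return Kp0 + dKp, Ki0, Kd0
```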
4 System Simulations

The PV generation system model is built in MATLAB 7.01/SIMULINK. The maximum power point tracking is controlled by the fuzzy logic controller, and the inverter current is controlled by predictive current control so that it follows a sinusoidal wave shape. The system parameters used in the simulation are: PV arrays, 100 W; AC grid, 220 V, 50 Hz.
Fig. 7. The waveform of inverter current and voltage
The grid-connected PV system is simulated using MATLAB/SIMULINK. Fig. 7 shows the output waveforms of the inverter, which demonstrate that the system can feed energy into the utility grid with low harmonics and a high power factor.
5 Conclusion

In this paper, a single-phase grid-connected PV system based on a Fuzzy-PID controller has been built by applying the proposed control strategy. The Fuzzy-PID controller has been designed in detail and the designed system has been simulated. The MATLAB simulation results show that the proposed method achieves the expected performance.
Acknowledgment This work was supported by the 2009 Senior Talent Culture Project of Suzhou Municipal Government and the Innovative Team’s Scientific Research Program of Suzhou Vocational University(200802).
References 1. Jou, H.L., Chiang, W.J., Wu, J.C.: Novel Maximum Power Point Tracking Method for the Photovoltaic System. In: 7th International Conference on Power Electronics and Drive Systems, pp. 620–623 (2007) 2. Kroposki, B., DeBlasio, R.: Technologies for the New Millennium: Photovoltaics as a Distributed Resource. In: Proc. IEEE Power Engineering Society Summer Meeting, vol. 3, pp. 1798–1801 (2000) 3. Wu, T.F., Nien, H.S., Shen, C.L., Chen, T.M.: A Single-Phase Inverter System for PV Power Injection and Active Power Filtering with Nonlinear Inductor Consideration. IEEE Trans. on Industry Applications 41(4), 1075–1083 (2005) 4. Veerachary, M., Senjyu, T., Uezato, K.: Voltage-Based Maximum Power Point Tracking Control of PV System. IEEE Trans. on Aerospace Electronic System 38(1), 262–270 (2002) 5. Mol, J.H., He, S.J., Zoul, Y.P.: Fuzzy PID Controller Design for PWM-Buck EBW Stabilized High-Voltage Source Using Saber-Matlab Co-simulation. In: 2007 IEEE International Conference on Control and Automation, Guangzhou, China, pp. 2003–2007 (2007)
6. Zhang: Fuzzy Modeling and Fuzzy Control, 3rd edn., pp. 116–229. Birkhauser Press, Basel (2006) 7. Passino, K.M., Yurkovich, S.: Fuzzy Control. Addison Wesley longnan, Inc, Reading (1998) 8. Reznik, L., Ghanayem, O., Bourmistrov, A.: PID Plus Fuzzy Controller Structures as A Design Base for Industrial Applications. Engineering appliaction of artificial Intelligence 13(4), 419–430 (2000) 9. Khaehintung, N., Pramotung, K., Tuvirat, B., Sirisuk, P.: RISC-Microcontroller Built-in Fuzzy Logic Controller of Maximum Power Point Tracking for Solar-Powered LightFlasher Applications. IEEE IECON 3, 2673–2678 (2004)
Ontology-Based Decision Support for Security Management in Heterogeneous Networks Michal Chora´s1,2, Rafal Kozik2 , Adam Flizikowski1,2, Rafal Renk1,3 , and Witold Holubowicz1,3 ITTI Ltd., Pozna´ n Institute of Telecommunications, UT&LS Bydgoszcz Adam Mickiewicz University, Pozna´ n
[email protected],
[email protected],
[email protected] Abstract. In this paper our original methodology of applying ontologybased logic into decision support system for security management in heterogeneous networks is presented. Such decision support approach is used by the off-network layer of security and resiliency mechanisms developed in the INTERSECTION Project. Decision support application uses knowledge about networks vulnerabilities to support off-network operator to manage and control in-networks components such as probes, intrusion detection systems, Complex Event Processor, Reaction and Remediation. Hereby, both IV O (Intersection Vulnerability Ontology) as well as P IV OT - decision support system based on the vulnerability ontology are presented.
1
Introduction
INTERSECTION (INfrastructure for heTErogeneous, Resilient, SEcure, Complex, Tightly Inter-Operating Networks) is a European co-funded project in the area of secure, dependable and trusted infrastructures. The main objective of INTERSECTION is to design and implement an innovative network security framework which comprises different tools and techniques for intrusion detection and tolerance. The INTERSECTION framework, as well as the developed system called IDTS (Intrusion Detection and Tolerance System), consists of two layers: an in-network layer and an off-network layer. Our decision support system is placed in the off-network layer of the INTERSECTION security-resiliency framework. The role of the off-network decision support system is to support network operators in controlling complex heterogeneous and interconnected networks and real-time security processes such as network monitoring, intrusion detection, reaction and remediation. Knowledge about vulnerabilities is needed to cope more effectively with threats and attacks, and to understand interdependencies and cascade effects within the networks. Therefore network vulnerabilities should be identified, described, classified, stored and analyzed. The framework operator should be able to control in-network processes and trigger/stop their reactions on the basis of
the vulnerability knowledge which is incorporated in our decision support system intelligence by means of the vulnerability ontology. In this paper we show, how the previously described concept of INTERSECTION Vulnerability Ontology in [1][2][3] is now used for decision support. The paper is structured as follows: ontology-based approach is introduced and motivated in Section 2. Our vulnerability ontology is presented in Section 3. Decision support application based on ontology is described in Section 4.
2
Vulnerability Knowledge for Decision Support-Ontology-Based Approach
In both computer science and information science, an ontology is a form of representing data model of a specific domain and it can be used to e.g.: reason about the objects in that domain and the relations between them. Since nowadays, we can observe the increasing complexity and heterogeneity of the communication networks and systems, there is a need to use high-level meta description of relations in such heterogeneous networks. This need and requirement is particularly apparent in the context of Future Internet and Next Generation Networks development. From operators point of view, two important issues concerning communications networks are: security and Quality of Service. In the past years critical infrastructures were physically and logically separate systems with little interdependence. As digital information gained more and more importance for the operation of such infrastructures especially on the communication part. Communication part of critical infrastructures are the one of the most important part that represents the information infrastructure on which critical infrastructures rely and depend. The communication part is typically related to telecom operators or separate department inside company that manages the network. The last decade has seen major change in telecommunication market in most of European countries. The are two main factors that cause those changes: – Market deregulation that enables new telecom providers to enter the market – New technologies and solutions that cause lower costs of services, introduction of the new services and increased telecom traffic. Unfortunately, the increasing complexity and heterogeneity of the communication networks and systems increase their level of vulnerability. Furthermore, the progressive disuse of dedicated communication infrastructures and proprietary networked components, together with the growing adoption of IP-based solutions, exposes critical information infrastructures to cyber attacks coming from the Internet and other IP based networks. To deal with those problems there is a need to create good information security management system that will allow the administrators to deal with a great amount of security information and make the decision process effective and efficient.
To support those tasks we propose to develop the security framework consisting of several modules as well as of the decision support system based on the applied ontology.
3
Ontology Design
One of the goals of the INTERSECTION project is to identify and classify heterogeneous network vulnerabilities. To match this goal we have proposed a vulnerability ontology. The major aim of our ontology is to describe vulnerabilities beyond single domain networks and to extend relations/restrictions onto heterogeneous networks. Our ontology is now called IV O - INTERSECTION Vulnerability Ontology. Networks vulnerabilities tend to be often mistaken with threats and attacks. Therefore we decided to clearly define vulnerability as asset-related network weakness. Obviously, then such weaknesses are exploited by threats and attacks. Such vulnerability definition is based on ISO/IEC 13335 standard and is shown in Figure 1 [4].
Fig. 1. Vulnerabilities identification and definition on the basis of networks assets
Network assets should also be defined and described. We decided to use the Shared Information/Data (SID) Model, in which network assets and the relations between them are defined; the SID Model provides Physical Resource Business Entity Definitions [5]. The SID asset description is specified in UML and visualized using UML diagrams. In our ontology approach, we consider the Resources and Vulnerabilities classes to be the most important components. The class Resources is based on the division proposed in SID (the Shared Information/Data Model).
It includes the following subclasses:
– Physical Resources,
– Logical Resources,
– Software,
– Service.
The class Vulnerabilities is connected with Resources (vulnerabilities are exposed by resources). The subclasses of the Vulnerability class are therefore:
– Physical Resources Vulnerabilities,
– Logical Resources Vulnerabilities,
– Software Vulnerabilities.
Every subclass inherits the properties and restrictions of its superclass, which is why we classified our ontology in this way. For example, the classes Wired and Wireless inherit Resources, Topology, Vulnerabilities, Network Structure, Risk and Safeguards from the superclass Network. Our vulnerability ontology is focused on network design vulnerabilities (e.g. protocol weaknesses); implementation vulnerabilities, in contrast, are already stored in the National Vulnerability Database (NVD) [6].
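A minimal plain-Python rendering of this class hierarchy may help picture the relations between Resources and Vulnerabilities. It is purely illustrative: the actual IVO is an OWL model edited in Protégé, and the common superclass and attribute names used below are assumptions, not part of the ontology.

```python
class OntologyEntity:                  # illustrative common superclass
    pass

class Resource(OntologyEntity): pass
class PhysicalResource(Resource): pass
class LogicalResource(Resource): pass
class Software(Resource): pass
class Service(Resource): pass

class Vulnerability(OntologyEntity):
    def __init__(self, exposed_by):
        self.exposed_by = exposed_by   # the Resource that exposes this weakness

class PhysicalResourceVulnerability(Vulnerability): pass
class LogicalResourceVulnerability(Vulnerability): pass
class SoftwareVulnerability(Vulnerability): pass

class Network(OntologyEntity):
    def __init__(self):
        self.resources = []            # Resources, Topology, Risk, Safeguards, ...
        self.vulnerabilities = []

class WiredNetwork(Network): pass      # inherit relations from Network
class WirelessNetwork(Network): pass

# a vulnerability is always tied to the asset that exposes it
v = SoftwareVulnerability(exposed_by=Software())
```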
4
Ontology Applied to Decision Support
The created ontology is applied in our decision support system intelligence, providing knowledge about vulnerabilities and how they influence specific interconnected scenarios. The decision support system applied to security management in heterogeneous networks has the following functionalities:
1. It provides information about the influence of heterogeneity on network security and resiliency issues.
2. It provides information to Intrusion Detection and Anomaly Detection Systems: the decision support tool provides information about security risks and threats in particular scenarios (what networks are interconnected, what technologies are used, etc.), and intrusion detection systems receive information on how to act in such scenarios (e.g. how often packets should be sniffed, what features should be extracted, etc.).
3. It supports decisions of Intrusion Tolerance Systems: the decision support system provides information about tolerance, expected False Positives, etc.
4. It provides useful information for the security architecture visualization module: additional information for end-users (network management operators).
5. It supports the Complex Event Processor module (a part of the IDS system): the decision support drives the decision engine while performing the correlation activity.
6. The decision support system cooperates with the relational vulnerabilities database (IVD) created in the FP7 INTERSECTION Project.
Fig. 2. IVO Visualized in Protege
PIVOT (Project INTERSECTION Vulnerability Ontology Tool) is an ontology-logic based decision support tool. Our goal was to apply the ontology in a real-life decision-support application. It is an end-user oriented application which allows the vulnerability ontology to be browsed and modified. One of its biggest advantages is its client-server architecture, which allows one ontology to be shared by multiple users (e.g., by network operators). The ontology interface built into PIVOT is user-friendly and intuitive. The application consists of a MySQL storage database, the Protege OWL API, a Tomcat WWW server and the OpenLaszlo framework. The backbone of the tool is the Protege API. It is an open-source Java library for the Web Ontology Language and RDF(S). The API provides classes and methods to load and save OWL files, to query and manipulate OWL data models, and to perform reasoning. Furthermore, the API is optimized for the implementation of graphical user interfaces.
Fig. 3. PIVOT architecture
A further advantage of the API is that Protege can work with an OWL model backed by a MySQL database, which brings dramatic performance improvements. The client-server architecture allows one ontology model to be shared among multiple users. Each connection to PIVOT is transactional, which provides better ontology database integrity. All database operations (the way the model is stored in the database) are transparent to PIVOT users, which means that the user does not have to worry about establishing connections, committing changes made on the model, or where and how a particular instance is stored. The current version of PIVOT allows two types of connection to be established - RMI and HTTP. RMI (the Java Remote Method Invocation API) is a Java application programming interface for performing remote procedure calls. This type of PIVOT interface was developed to be used with other components in a local network. It makes it possible to share the ontology among processes running on remote machines. The HTTP interface was developed to allow easy OWL model maintenance and management through a web browser. This functionality is provided by the Apache Tomcat server, developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and JavaServer Pages (JSP) specifications from Sun Microsystems and provides a pure Java HTTP web server environment. It is used as a PIVOT module which is started on PIVOT boot-up. JavaServer Pages (JSP) is a Java technology that allows software developers to dynamically generate HTML, XML or other types of documents in response to web client requests. The technology allows Java code and certain predefined actions to be embedded into static content. PIVOT benefits from easy XML document generation. This format allows custom elements to be defined and helps share structured information over the network, which makes PIVOT more universal.
Fig. 4. PIVOT in operation (screenshot)
To build the presentation layer, OpenLaszlo is used. It is an open-source platform for the development and delivery of rich Internet applications. The PIVOT architecture is presented in Figure 3. PIVOT is available at: http://193.142.112.119:8081/lps-4.1.1/pivot/. The PIVOT interface in operation is shown in Figure 4.
5 Conclusions
The major contribution of this paper is a new approach to vulnerability description and handling based on ontology logic. The INTERSECTION Vulnerability Ontology has been motivated and presented in detail. We also showed how to apply IVO in an innovative decision support system used in the INTERSECTION security-resiliency framework. Moreover, PIVOT, an ontology-logic based decision support application, has been developed and presented. Our decision support system can be used by end-users such as network operators and telecoms to manage heterogeneous and interconnected networks.
Acknowledgement The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 216585 (INTERSECTION Project).
References
1. FP7 INTERSECTION Deliverable D.2.2: Identification and classification of vulnerabilities of network infrastructures (2008)
2. Flizikowski, A., et al.: On Applying Ontologies to Security and QoS Management in Heterogeneous Networks. In: Information Systems Architecture and Technology - Information Systems and Computer Communications Network, pp. 189-200. ISBN 978-83-7493-416-9 (2008)
3. Michal, C., et al.: Ontology-based description of networks vulnerabilities. Polish Journal of Environmental Studies 5c (2008)
4. ISO/IEC 13335-1:2004, Information Technology - Security Techniques - Management of information and communications technology security - Part 1: Concepts and models for information and communications technology security management
5. Shared Information/Data Model, TeleManagement Forum (2002)
6. http://nvd.nist.gov/
7. FP7 INTERSECTION (INfrastructure for heTErogeneous, Resilient, Secure, Complex, Tightly Inter-Operating Networks) Project Description of Work
8. Ekelhart, A., et al.: Security Ontologies: Improving Quantitative Risk Analysis. In: Proc. of the 40th Hawaii International Conference on System Sciences (2007)
9. http://protege.stanford.edu/
10. OWL Web Ontology Language Semantics and Abstract Syntax (2006), http://www.w3.org/TR/owl-features/
11. SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, http://www.w3.org/Submission/SWRL/
12. Spector, A.Z.: Achieving application requirements. In: Distributed Systems, pp. 19-33, ISBN 0-201-41660-3 (1990)
13. Gomez, A., Corcho, O.: Ontology languages for the Semantic Web. IEEE Intelligent Systems 17(1), 54-60 (2002)
A Constrained Approximation Algorithm by Encoding Second-Order Derivative Information into Feedforward Neural Networks Qing-Hua Ling1 and Fei Han2 1
School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212003, China 2 School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu, 212013, China
[email protected],
[email protected] Abstract. In this paper, a constrained learning algorithm is proposed for function approximation. The algorithm incorporates constraints into single hidden layered feedforward neural networks from the a priori information of the approximated function. The activation functions of the hidden neurons are specific polynomial functions based on Taylor series expansions, and the connection weight constraints are obtained from the second-order derivative information of the approximated function. The new algorithm has been shown by experimental results to have better generalization performance than other traditional learning ones.
1 Introduction The popular method for training a feedforward neural network (FNN) is the backpropagation (BP) algorithm [1], which applies the gradient descent method to derive the update formulae for the weights. However, first, these learning algorithms are apt to be trapped in local minima rather than reaching global minima. Second, they do not consider the network structure features or the properties of the problem involved, so their generalization capabilities are limited [2-6]. A proper matching between the underlying problem complexity and the network structure complexity is crucial for improving the network generalization capability [4]. In the literature [5-6], a class of constrained learning algorithms (CLA) was proposed by coupling the a priori information from problems into the cost functions defined at the network output layer. As a result, solutions for the involved problems of finding the roots of polynomials can be obtained very easily. In the literature [7-8], two learning algorithms were proposed that are referred to as the Hybrid-I and Hybrid-II methods. In the Hybrid-I algorithm, cost terms for the additional functionality based on the first-order derivatives of neural activations at the hidden layers were designed to penalize the input-to-output mapping sensitivity in the course of training the weights. By forcing the hidden layer neurons to work in saturating mode, this algorithm can increase the problem complexity and improve the generalization
capability of the involved network. As for the Hybrid-II algorithm, it incorporates the second-order derivatives of the neural activations at the hidden layers and output layer into the sum-of-squares error cost function to penalize the high frequency components in the training data. In the literature [9], a modified hybrid algorithm (MHLA) was proposed based on the Hybrid-I and Hybrid-II algorithms to improve the generalization performance. Nevertheless, these learning algorithms do not consider the information of the approximated function when they are used to solve the function approximation problem. In the literature [10], a new learning algorithm referred to as FDCLA was proposed for function approximation. The first-order derivative information of the approximated function was incorporated in FDCLA. Similar to FDCLA, the new algorithm in this paper incorporates both architectural constraints and connection weight constraints. The former are realized by selecting the activation functions of the hidden neurons as a class of specific polynomial functions extracted from a Taylor series expansion. The latter are extracted from the second-order derivative of the approximated function. Finally, simulation results are given to verify the better convergence performance of the proposed algorithm.
2 The Proposed Approximation Algorithm

Assume that the sample points of the function are selected at identically spaced intervals. In addition, these points, i.e., $(x_i, y_i)$, $i = 1,2,\ldots,N$, are very close in space. According to the Mean-Value Theorem, the corresponding approximate estimated values of the first-order derivative of the function can be obtained as follows [10]:

$$ f'(x_i) \approx \frac{y_{i+1} - y_{i-1}}{x_{i+1} - x_{i-1}}, \quad i = 2, \ldots, N-1, \qquad (1) $$

$$ f'(x_1) \approx \frac{y_2 - y_1}{x_2 - x_1}, \qquad f'(x_N) \approx \frac{y_N - y_{N-1}}{x_N - x_{N-1}}. \qquad (2) $$
As for the second-order derivative of the approximated function, its approximate value can be obtained as follows:

$$ f'\!\big(x_{i+\frac{1}{2}}\big) \approx \frac{y_{i+1} - y_i}{x_{i+1} - x_i}, \quad i = 1,2,\ldots,N-1, \qquad (3) $$

$$ f''(x_i) = \frac{f'\!\big(x_{i+\frac{1}{2}}\big) - f'\!\big(x_{i-\frac{1}{2}}\big)}{x_{i+\frac{1}{2}} - x_{i-\frac{1}{2}}}, \quad i = 2,\ldots,N-1, \qquad (4) $$

$$ f''(x_1) = f''(x_2), \qquad f''(x_N) = f''(x_{N-1}). \qquad (5) $$
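As a quick illustration of Eqs. (1)-(5), the short sketch below estimates the first- and second-order derivatives of a sampled function with the same central/one-sided differences; the test function sin(x) and the grid are assumptions chosen only for this example.

```python
import numpy as np

def derivative_estimates(x, y):
    """Approximate f' and f'' on a sampled grid, following Eqs. (1)-(5)."""
    n = len(x)
    d1 = np.empty(n)
    # Eq. (1): central differences for interior points
    d1[1:-1] = (y[2:] - y[:-2]) / (x[2:] - x[:-2])
    # Eq. (2): one-sided differences at the boundaries
    d1[0] = (y[1] - y[0]) / (x[1] - x[0])
    d1[-1] = (y[-1] - y[-2]) / (x[-1] - x[-2])

    # Eq. (3): first derivative at the midpoints x_{i+1/2}
    mid = (x[:-1] + x[1:]) / 2.0
    d_mid = (y[1:] - y[:-1]) / (x[1:] - x[:-1])

    d2 = np.empty(n)
    # Eq. (4): difference of midpoint slopes for interior points
    d2[1:-1] = (d_mid[1:] - d_mid[:-1]) / (mid[1:] - mid[:-1])
    # Eq. (5): copy the neighbouring values at the boundaries
    d2[0], d2[-1] = d2[1], d2[-2]
    return d1, d2

x = np.linspace(0.0, np.pi, 21)   # assumed grid
y = np.sin(x)                     # assumed target function
d1, d2 = derivative_estimates(x, y)
print(np.max(np.abs(d1 - np.cos(x))), np.max(np.abs(d2 + np.sin(x))))
```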
When the function is approximated by the FNN output $\phi(x)$, the following constraints can easily be obtained:

$$ \phi''(x_i) - f''(x_i) = 0, \quad i = 1,2,\ldots,N. \qquad (6) $$
This constraint can be simplified as

$$ \Phi_2 \equiv \phi''(x) - f''(x) = 0, \qquad (7) $$

where $f''(x)$ is treated as a constant, although it varies with the value of the variable $x$. First, according to the literature [10], a single-hidden-layer FNN is adopted to approximate the function, and the transfer function of the $k$th hidden neuron is selected as $x^k / k!$, $k = 1,2,\ldots,n$. The activation function of the $(n+1)$th hidden neuron is fixed as $-1$. The weights from the input layer to the hidden layer are all fixed to one, i.e., $w^{(1)}_{k,1} = 1$, $k = 1,2,\ldots,n$. Second, the constraints containing the derivative information of the approximated function can be obtained, as shown in Eq. (7). Finally, following the literature [5-6,10], a constraint on the updated synaptic weights is imposed to avoid missing the global minimum:

$$ \sum_{i=1}^{n} \big(dw^{(2)}_{1,i}\big)^2 = (\delta P)^2, \qquad (8) $$

where $dw^{(2)}_{1,i}$ denotes the change of the synaptic weight $w^{(2)}_{1,i}$, and $\delta P$ is a predetermined positive value. A sum-of-squares error cost function is defined at the output of the FNN:

$$ E = \frac{1}{2N} \sum_{i=1}^{N} (t_i - y_i)^2, \qquad (9) $$
where $t_i$ denotes the target signal, $y_i$ is the actual output of the network, and $N$ is the number of training samples. Suppose that $d\Phi = (d\Phi_2)^T$ is set equal to a predetermined positive value $\delta Q$, designed to bring $d\Phi$ closer to its target (zero). For the $k$th training pattern, by introducing the function $\varepsilon_k$, $d\varepsilon_k$ is expanded as follows [5-6,10]:

$$ d\varepsilon_k = \sum_{i=1}^{n} J_i\, dw^{(2)}_{1,i} + \Big(\delta Q - \sum_{i=1}^{n} dw^{(2)}_{1,i}\, F_i\Big) v + \Big((\delta P)^2 - \sum_{i=1}^{n} \big(dw^{(2)}_{1,i}\big)^2\Big) u, \qquad (10) $$

$$ J_i = \frac{\partial E_k}{\partial w^{(2)}_{1,i}}, \quad \Big(E_k = \tfrac{1}{2}(t_k - y_k)^2\Big), \qquad F_i = \frac{\partial \Phi_2}{\partial w^{(2)}_{1,i}}, \quad (i = 1,2,\ldots,n+1). \qquad (11) $$
According to the above FNN, the following results can be obtained:

$$ J_i = -(t_k - y_k)\,\frac{x_k^{\,i}}{i!}, \quad i = 1,2,\ldots,n, \qquad J_{n+1} = (t_i - y_i), \qquad (12) $$

$$ F_i = \frac{\partial \Phi_2}{\partial w^{(2)}_{1,i}} = \frac{x_k^{\,i-2}}{(i-2)!}, \quad (i = 2,\ldots,n), \qquad F_1 = F_{n+1} = 0. \qquad (13) $$
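The quantities in Eqs. (12)-(13) follow directly from the special network architecture described above. The sketch below evaluates the network output, its second derivative, and the terms $J_i$ and $F_i$ for one training pattern; the function names, the weight-vector layout and the toy usage are assumptions made for illustration only.

```python
import numpy as np
from math import factorial

def network_output(x, w):
    """phi(x) for the single-hidden-layer FNN: hidden activations x^i / i! (i = 1..n)
    plus a last hidden neuron fixed at -1; w holds the n+1 hidden-to-output weights."""
    n = len(w) - 1
    h = np.array([x ** i / factorial(i) for i in range(1, n + 1)] + [-1.0])
    return float(np.dot(w, h))

def second_derivative(x, w):
    """phi''(x): only hidden neurons with i >= 2 contribute, matching Eq. (13)."""
    n = len(w) - 1
    return float(sum(w[i - 1] * x ** (i - 2) / factorial(i - 2) for i in range(2, n + 1)))

def gradients(x_k, t_k, w):
    """J_i and F_i of Eqs. (12)-(13) for one training pattern (x_k, t_k)."""
    n = len(w) - 1
    y_k = network_output(x_k, w)
    J = np.zeros(n + 1)
    F = np.zeros(n + 1)
    J[:n] = -(t_k - y_k) * np.array([x_k ** i / factorial(i) for i in range(1, n + 1)])
    J[n] = (t_k - y_k)                      # weight of the fixed "-1" neuron
    for i in range(2, n + 1):
        F[i - 1] = x_k ** (i - 2) / factorial(i - 2)
    return J, F

w = np.ones(6)                              # assumed weights (n = 5 polynomial neurons)
print(network_output(0.5, w), second_derivative(0.5, w))
```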
In order to maximize $d\varepsilon_k$ at each epoch, the following equations are obtained:
$$ d^2\varepsilon_k = \sum_{i=1}^{n} \big(J_i - F_i v - 2u\, dw^{(2)}_{1,i}\big)\, dw^{(2)}_{1,i} = 0, \qquad (14) $$

$$ d^3\varepsilon_k = -2u \sum_{i=1}^{n} \big(dw^{(2)}_{1,i}\big)^2. $$
$$ u_i^k = \begin{cases} u^+, & s_{ik} > 0, \\ u^-, & s_{ik} < 0. \end{cases} \qquad (13) $$
Here we restrict the difference and the magnitudes of the upper control force $u^+$ and the lower control force $u^-$ because of the sensitivity of the chaotic dynamic system. According to the necessary and sufficient condition for the existence of a sliding mode in [22], the following conditions are required:
$$ \lim_{s \to 0^+} \big\langle \rho_{ik},\; \psi_{ik}(\cdot) + g_{ik}(\cdot, u^+) + \Xi_i \big\rangle < 0, \qquad (14) $$

and

$$ \lim_{s \to 0^-} \big\langle \rho_{ik},\; \psi_{ik}(\cdot) + g_{ik}(\cdot, u^+) + \Xi_i \big\rangle > 0, \qquad (15) $$
where $\rho_{ik}$ is a normal vector of the manifold $s_{ik} = 0$ and $\langle \cdot,\cdot \rangle$ is the inner product operator. From (9) and (11), the attraction region towards the sliding region can be defined as

$$ \Omega_{ik} = \Omega_{ik}^+ \cup \Omega_{ik}^-, \qquad (16) $$

where
$$ \Omega_{ik}^+ = \big\{ x_i(t) \in R^n \mid l_k(x_i(t)) + \zeta_k(x_i(t), u^+(t)) + \Xi_{ik} < 0,\; x_{ik}(t) - x_{ik}^{eq} > 0 \big\}, $$
and
$$ \Omega_{ik}^- = \big\{ x_i(t) \in R^n \mid l_k(x_i(t)) + \zeta_k(x_i(t), u^+(t)) + \Xi_{ik} > 0,\; x_{ik}(t) - x_{ik}^{eq} < 0 \big\}. $$

Then the global attraction region is obtained as

$$ \Omega_i = \bigcup_k \Omega_{ik}. \qquad (17) $$
With multiple sliding mode manifolds, it can easily be ensured that the system state enters the intersection (8) once it falls into the global attraction region (17). Moreover, once part of the system state reaches its corresponding part of the unstable equilibrium, the remaining part of the state quickly reaches its corresponding part of the equilibrium state under suitable conditions [23]. Based on the above analysis, we therefore have the following result.

Theorem 1. Under the smart sliding mode controller (12) and (13), the system (2) achieves global asymptotic synchronization via the weighted small world complex network generated by Generation algorithm 1 when the system state enters the attraction region (17), if the following autonomous subsystem

$$ \begin{cases} \dot{x}_{i,p+1}(t) = l_{p+1}(x_i(t)) \\ \qquad \vdots \\ \dot{x}_{i,n}(t) = l_{n}(x_i(t)) \end{cases} \qquad (18) $$

is asymptotically stable on the manifolds $S_i$, $i = 1,2,\ldots,N$.
Remark 3. Employing the smart sliding mode control with a small magnitude, the attraction region (15) is restricted to a very narrow gorge for each ideal tracking of the corresponding sliding mode manifold. Fortunately, such narrow attraction regions always exist due to the ergodicity of chaotic dynamics. The trajectory will eventually be attracted to the intersection of the manifolds once the region Ω is entered.
5 Simulation Tests

In this section, a weighted small world network G with 50 nodes is first constructed. After that, the Liu chaotic and Lorenz chaotic systems are investigated, respectively. The $i$th Liu chaotic system is given as

$$ \begin{cases} \dot{x}_{i1} = a(x_{i2} - x_{i1}) \\ \dot{x}_{i2} = b x_{i1} - h x_{i1} x_{i3} \\ \dot{x}_{i3} = -c x_{i3} + k x_{i1}^2 \end{cases} \qquad (19) $$

When $a = 10$, $b = 40$, $c = 2.5$, $h = 1$, $k = 4$, system (19) exhibits chaotic behavior. The Lorenz chaotic system of the $i$th node is

$$ \begin{cases} \dot{x}_{i1} = \sigma (x_{i2} - x_{i1}) \\ \dot{x}_{i2} = r x_{i1} - x_{i2} - x_{i1} x_{i3} \\ \dot{x}_{i3} = x_{i1} x_{i2} - b x_{i3} \end{cases} \qquad (20) $$

where $\sigma$ is called the Prandtl number. Let $\sigma = 10$, $r = 28$, and $b = 8/3$.
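For readers who want to reproduce the node dynamics, the sketch below integrates a single Liu system (19) with the parameter values quoted above; the initial condition and time span are assumptions introduced only for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, c, h, k = 10.0, 40.0, 2.5, 1.0, 4.0   # parameters quoted for Eq. (19)

def liu(t, x):
    x1, x2, x3 = x
    return [a * (x2 - x1),
            b * x1 - h * x1 * x3,
            -c * x3 + k * x1 ** 2]

sol = solve_ivp(liu, (0.0, 5.0), [1.0, 1.0, 1.0],   # assumed initial state
                max_step=1e-3)
print(sol.y[:, -1])   # state after 5 s along the chaotic trajectory
```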
Fig. 1. Synchronization behavior of the Liu systems on the weighted small world network (node states x_i, i = 1, ..., 50, plotted against time in seconds)

Fig. 2. Synchronization behavior of the Lorenz systems on the weighted small world network (node states x_i, i = 1, ..., 50, plotted against time in seconds)
The objective is to control the synchronization behavior so that it approaches the desired state $x_d$. Let $c = 2$, and apply a pinning controller to the corresponding channel, i.e., $\dot{x}_{i2} = b x_{i1} - h x_{i1} x_{i3} u_{is}$ for (19) and $\dot{x}_{i2} = r x_{i1} - x_{i2} - x_{i1} x_{i3} u_{is}$ for (20), respectively. The smart sliding mode controller is designed as

$$ u_{is} = \begin{cases} 1.5, & \text{for } s_i > 0, \\ 0.5, & \text{for } s_i < 0, \end{cases} \qquad (21) $$
and the sliding manifolds are designed as $s_i = \{(x(1), x(2), x(3)) \mid x(2) - x_d(2) = 0\}$. For the inner-coupling matrix of the weighted small world network, it is assumed that only the second variable of the coupled node dynamics is inner-coupled, and $\omega = 10$. As shown in the simulations, Fig. 1 shows that the coupled Liu system states quickly approach the desired state, while Fig. 2 shows that the coupled Lorenz system states exhibit good synchronization behavior.
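The switching law (21) together with the chosen manifold can be written directly in code. The sketch below only evaluates the control input for a given node state and desired state (both assumed); it does not re-implement the full coupled-network simulation.

```python
def smart_sliding_control(x, x_d):
    """Pinning input u_is of Eq. (21); s_i = x(2) - x_d(2) for the manifold above."""
    s_i = x[1] - x_d[1]          # the second state variable is the pinned channel
    return 1.5 if s_i > 0 else 0.5

print(smart_sliding_control((0.0, 3.2, 1.0), (0.0, 2.0, 0.0)))   # -> 1.5
```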
6 Conclusion In this paper, a novel weighted small world complex network model was established. Furthermore, a smart sliding mode control strategy was proposed to realize synchronization. Compared with other synchronization analysis and control methods, our smart control scheme exhibits good performance without local linearization or other idealized assumptions.
Acknowledgement This work is partially supported by the China National Natural Science Foundation Project (60774017 and 60874045) and the Open Projects of Key Laboratory of Complex Systems and Intelligence Science of Chinese Academy of Sciences (20060101).
References 1. Watts, D.J., Strogatz, S.H.: Collective Dynamics of Small-world Networks. Nature 393, 440–442 (1998) 2. Strogatz, S.H.: Exploring Complex Networks. Nature 410, 268–276 (2001) 3. Barabasi, A.L., Albert, R.: Emergence of Scaling in Random Networks. Science 286, 509–512 (1999) 4. Barabasi, A.L., Albert, R., Jeong, H.: Mean-field Theory for Scale-free Random Networks. Physcia A 272, 173–187 (1999) 5. Holmgren, A.J.: Using Graph Models to Analyze the Vulnerability of Electric Power Networks. Risk Analysis 26(4), 955–969 (2006) 6. Wang, X.F., Chen, G.: Complex Networks: Small-world, Scale-free, and Beyond. IEEE Circuits Syst. Mag. 3(1), 6–20 (2003) 7. Abdallah, C.T., Tanner, H.R.: Complex Networked Control Systems: Introduction to the Special Section. IEEE Control Systems Magazine, 30–32 (2007) 8. Wang, X.F., Chen, G.: Pinning Control of Scale-free Dynamical Networks. Physica A 310, 521–531 (2002) 9. Chen, T., Liu, X., Lu, W.: Pinning Complex Network by a Single Controller. IEEE Trans Circuits and Systems-I: Regular Papers 54(6), 1317–1326 (2006) 10. Wang, X.F., Chen, G.: Synchronization in Scale-free Dynamical Networks: Robustness and Fragility. IEEE Trans Circuits and Systems-I: Fundamental Theory and Applications 49(1), 54–62 (2002) 11. Lü, J., Yu, X., Chen, G., et al.: Characterizing the Synchronizability of Small-world Dynamical Networks. IEEE Trans Circuits and Systems-I: Regular Papers 51(4), 787–796
12. Li, C., Chen, G.: Synchronization in General Complex Dynamical Networks with Coupling Delays. Physica A 343, 263–278 (2004) 13. Lü, J., Chen, G.: A Time-varying Complex Dynamical Network Model and Its Controlled Synchronization Criteria. IEEE Trans Automatic Control 50(6), 841–846 (2005) 14. Liu, B., Liu, X., Chen, G., et al.: Robust Impulsive Synchronization of Uncertain Dynamical Networks. IEEE Trans Circuits and Systems-I: Regular Papers 52(7), 1431–1441 (2005) 15. Yang, M., Wang, Y., Wang, H.O., et al.: Delay Independent Synchronization of Complex Network via Hybrid Control. In: Proceedings of 2008 American Control Conference, Seattle, Washington, USA, pp. 2206–2271 (2008) 16. Chen, M.: Chaos Synchronization in Complex Network. IEEE Trans Circuits and SystemsI: Regular Papers 55(5), 1335–1346 (2008) 17. Pecora, L.M., Carroll, T.L.: Master Stability Functions for Synchronized Coupled Systems. Physical Review Letters 80, 2109–2112 (1998) 18. Chavez, M., Hwang, D., Amann, A., et al.: Synchronizing is Enhanced in Weighted Complex Networks. Physical Review Letters 94(21), 8701 (2005) 19. Wu, C.W.: Synchronization and Convergence of Linear Dynamics in Random Directed Networks. IEEE Trans Automatic Control 51(7), 1207–1210 (2006)ss 20. Jalili, M., Ajdari, R.A., Hasler, M.: Enhancing Synchronizability of Dynamical Networks Using the Connection Graph Stability Method. Int. J. Ciurcuit Theory Appl. 35, 611–622 (2007) 21. Jalili, M., Ajdari, R.A., Hasler, M.: Enhancing Synchronizability of Weighted Dynamical Networks Using Betweenness Centrality. Physical Review E 78, 16105 (2008) 22. Utkin, V.I.: Sliding Modes in Control Optimization. Springer, Berlin (1992) 23. Yu, X.: Controlling Lorenz Chaos. International Journal of Systems Science 27(4), 355–359 (1996) 24. Tian, Y.P., Yu, X.: Adaptive Control of Chaotic Dynamical Systems Using Invariant Manifold Approach. IEEE Trans Circuits and Systems-I: Fundamental Theory and Applications 47(10), 1537–1542 (2000)
A Novel Method to Robust Tumor Classification Based on MACE Filter Shulin Wang1,2,* and Yihai Zhu1,3,* 1
Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Science, Heifei, 230031, China 2 School of Computer and Communication, Hunan University, Changsha, Hunan, 410082, China 3 Department of Automation, University of Science and Technology of China, Hefei, Anhui, 230026, China
[email protected] Abstract. Gene expression profiles consisting of thousands of genes can describe the characteristics of specific cancer subtype. By efficiently using the overall scheme of gene expression, accurate tumor diagnosis can be performed well in clinical medicine. However, faced many problems such as too much noise and the curse of dimensionality that the number of genes far exceeds the size of samples in tumor dataset, tumor classification by selecting a small set of gene subset from the thousands of genes becomes a challenging task. This paper proposed a novel high accuracy method, which utilized the global scheme of differentially expressed genes corresponding to each tumor subtype which is determined by tumor-related genes, to classify tumor samples by using Minimum Average Correlation Energy (MACE) filter method to computing the similarity degree between a test sample with unknown label in test set and the template constructed with training set. The experimental results obtained on two actual tumor datasets indicate that the proposed method is very effective and robust in classification performance. Keywords: Gene expression profiles, MACE filter, tumor classification, similarity degree, DNA microarray.
1 Introduction Gene expression profiles (GEP), which can provide systematic quantitative information on the simultaneous expression of thousands of genes within cells in any given state, is emerging as a powerful and cost-effective tool for the analysis of cell state, and thus provides the understanding and insights into biological processes and disease etiology. The advance of DNA microarray technology has made its clinical applications to tumor diagnosis possible, and further makes personal medicine realizable. However, due to many problems such as too much noise and the curse of dimensionality that the number of genes far exceeds the size of samples in tumor dataset, *
These authors contributed equally to this work.
dimensionality reduction by selecting a small set of gene subset or by extracting a small set of feature subset from thousands of genes in GEP becomes a challenging task. As stated by Dabney [1], the constructed classification model may be evaluated from three aspects: accuracy, interpretability, and practicality. It is possible that a complicated classification model may be rejected for a simpler alternative even if the simpler alternative does not perform as well because of the poorer interpretability of the complicated classification model. Therefore, many existing methods focused on constructing simpler classification model by finding the smallest gene subset. For example, Wang et al. [2] proposed a very effective method involving two steps: (1) choosing some important genes using a feature importance ranking scheme, (2) evaluating the classification capability of all simple combinations of those important genes by using a good classification model. Huang et al. [3] proposed an efficient evolutionary approach to gene selection from GEP, which can simultaneously optimize gene selection and tissue classification for microarray data analyses. However, although these methods classifying tumor subtypes by identifying the smallest gene subsets have many merits such as the selected genes can be used as the biomarker of tumor subtype and the constructed classification model is very simple, their computational costs are also very big and the optimal gene subset selected is not unique. The method to overcome these problems is that we just utilize the overall scheme of those differentially expressed genes to construct classification model. Common methods to select differentially expressed genes are t-test [4] and rank sum test [5], etc. If all differentially expressed genes are used as the input of classifier such as k-nearest neighbor (k-NN) [6] or support vector machines (SVM) [7], the classification model with good generalization performance is not always obtained. As we know, the tumor subtypes are determined by all differentially expressed genes in cell. Therefore, different tumor subtypes are determined by different gene expression patterns. In this paper, we proposed a novel high accuracy method to tumor classification by computing the similarity degree of the expression pattern of differentially expressed genes between a test sample with unknown label in test set and the template constructed with training set by using Minimum Average Correlation Energy (MACE) filter method [12].
2 Methods 2.1 Flowchart of Analysis Our method involves two key steps: the selection of differentially expressed genes (gene selection) and the computation of the similarity degree between a test sample and the known template constructed from the training set (correlation filter). The normalization step makes the dataset have zero mean and unit variance. The decision that a test sample belongs to a certain subtype is made simply according to the peak value obtained by the correlation filter. The analysis flowchart of our method is shown in Fig. 1. More details on how to select differentially expressed genes and how to compute the similarity degree between a sample and a template are given in Subsections 2.3 and 2.4, respectively.
Fig. 1. The analysis flowchart of our method (stages: dataset, gene selection, normalization, correlation filter, sample similarity peak, label decision, label)
2.2 Representation of GEP A DNA microarray is composed of thousands of individual DNA sequences printed in a high density array on a glass microscope slide. Samples are generated under multiple conditions, which may be a time series during a biological process or a collection of different tissue samples. Let $G = \{g_1, \ldots, g_n\}$ be a set of genes and $S = \{s_1, \ldots, s_m\}$ be a set of samples. The corresponding gene expression matrix can be represented as $X = \{x_{i,j} \mid 1 \le i \le m,\ 1 \le j \le n\}$. The matrix $X$ is composed of $m$ row vectors $s_i \in R^n$, $i = 1,2,\ldots,m$, where $m$ denotes the number of samples and $n$ denotes the number of genes measured.
2.3 Selection of Differentially Expressed Genes We focus on obtaining the overall scheme of differentially expressed genes, so one important task is to identify differentially expressed genes from the tumor dataset before computing the similarity degree with the MACE filter, because too many genes would increase the similarity between every two samples from different subtypes. Troyanskaya et al. [8] systematically assessed the performance of three methods, including the nonparametric t-test, the Wilcoxon rank sum test, and a heuristic method. Their experiments on simulated and real-life datasets under varying noise levels and p-value cutoffs suggest that the rank sum test appears most conservative. Usually, there are two kinds of rank sum test methods: the Wilcoxon rank sum test (WRST) [9,10], which is only suitable for binary sample sets, and the Kruskal-Wallis rank sum test (KWRST) [11], which is suitable for multi-class sample sets. KWRST is a nonparametric method for testing the equality of population medians among classes. 2.4 Minimum Average Correlation Energy Filter Composite correlation filters, also called advanced correlation filters or synthetic discriminant functions, were developed in the mid 1980s and early 1990s for classification and tracking [13]. Recently, correlation filters have been applied with notable success to biometric recognition, such as faces, irises, and palmprints [14]. The block diagram of the correlation process is shown in Fig. 2.
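Returning briefly to the gene selection step of Sect. 2.3, the sketch below ranks genes by their Kruskal-Wallis p-values across the class groups; the toy data shapes and the cut-off are assumptions made only for illustration.

```python
import numpy as np
from scipy.stats import kruskal

def rank_genes_kwrst(X, labels, top_k=300):
    """Return indices of the top_k genes with the smallest Kruskal-Wallis p-values.
    X: (m samples x n genes) expression matrix; labels: class label per sample."""
    classes = np.unique(labels)
    pvals = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        groups = [X[labels == c, j] for c in classes]
        pvals[j] = kruskal(*groups).pvalue
    return np.argsort(pvals)[:top_k]

# toy usage: 20 samples, 100 genes, 4 classes (all assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.repeat(np.arange(4), 5)
print(rank_genes_kwrst(X, y, top_k=10))
```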
Fig. 2. Block diagram of correlation process
One of the most famous composite correlation filters is the Minimum Average Correlation Energy (MACE) filter, proposed by Mahalanobis et al. in 1987 [12]. In 1980, Hester and Casasent suggested the synthetic discriminant function (SDF) approach. After their introduction, SDFs became the focus of much research, and many new techniques were added to this method, forming the minimum variance synthetic discriminant function (MVSDF) [15] and the minimum average correlation energy (MACE) filter. Because the original SDFs did not consider any input noise, they were not optimized for noise tolerance. Vijaya Kumar introduced the MVSDF in 1986, which maximized the noise tolerance of the SDFs. The original SDF method controlled only the cross-correlation value at the origin and thus could not guarantee that the correlation output plane had its peak (i.e., maximum value) at the origin, which might produce a large peak-to-sidelobe ratio (PSR). To solve this problem, Mahalanobis et al. introduced the MACE filter, capable of producing sharp correlation peaks at the origin.

MACE filters: If we have N training samples of a class, the i-th sample is described as a 1-D discrete sequence denoted by $x_i(n)$. Its discrete Fourier transform (DFT) is denoted by $X_i(k)$. In this section, we describe the discrete sample sequence as a column vector $x_i$ of dimensionality $d$ equal to the length of the sample $x_i(n)$, i.e.,

$$ x_i = [x_i(1), x_i(2), \ldots, x_i(d)]^T, \qquad (1) $$

where the superscript '$T$' denotes transpose. All DFTs are also of length $d$. We define a matrix

$$ X = [X_1, X_2, \ldots, X_N] \qquad (2) $$

with column vectors $X_i$, where $X_i$ denotes the discrete Fourier transform of $x_i(n)$. Upper-case symbols refer to frequency-plane terms, while lower-case symbols represent quantities in the space domain. A correlation filter is a class-specific template synthesized from a set of training samples with the purpose of recognizing their class; the template is often called a filter. So we use training samples to synthesize the class-dependent filter. The vector $h$ represents the filter $h(n)$ in the space domain and the vector $H$ its Fourier transform $H(k)$ in the frequency domain. We denote the correlation function of the i-th sample sequence $x_i(n)$ with the filter sequence $h(n)$ by $g_i(n)$, i.e.,

$$ g_i(n) = x_i(n) \otimes h(n). \qquad (3) $$
We denote the DFT of the correlation function by $G_i(k)$. The energy of the i-th correlation plane is

$$ E_i = \sum_{n=1}^{d} |g_i(n)|^2 = \frac{1}{d}\sum_{k=1}^{d} |G_i(k)|^2 = \frac{1}{d}\sum_{k=1}^{d} |H(k)\, X_i(k)|^2. \qquad (4) $$

Equation (4) is a direct realization of Parseval's theorem. Using the vector form of the sample sequence, we can also write Eq. (4) as

$$ E_i = H^+ D_i H, \qquad (5) $$

where the superscript $+$ denotes the conjugate transpose of a complex vector, and $D_i$ is a diagonal matrix of size $d \times d$ whose diagonal elements are the squared magnitudes of the corresponding elements of $X_i$, i.e.,

$$ D_i(k,k) = |X_i(k)|^2. \qquad (6) $$
Note that the diagonal elements of $D_i$ describe the power spectrum of $x_i(n)$. We wish to design a correlation filter that ensures sharp correlation peaks while allowing constraints on the correlation peak values. To achieve good detection, it is necessary to reduce the correlation function levels at all points except at the origin of the correlation plane, where the imposed constraints on the peak value must be met. Specifically, the value of the correlation function must take a user-specified value at the origin but is free to vary elsewhere. This is equivalent to minimizing the energy of the correlation function while satisfying intensity constraints at the origin. In vector notation, the correlation peak amplitude constraint is

$$ g_i(0) = X_i^+ H = u_i, \quad i = 1,2,\ldots,N, \qquad (7) $$

where $g_i(0)$ is the value of the output correlation at the origin (peak) and $u_i$ is the user-specified value of the i-th correlation function at the origin. The filter must also minimize the correlation plane energies $E_i$ $(i = 1,2,\ldots,N)$. Thus, in matrix-vector notation, the problem is to find the frequency-domain vector $H$ that minimizes $H^+ D_i H$ for all $i$ while satisfying the peak constraints in Eq. (7), which can also be written for all samples as

$$ X^+ H = u, \qquad (8) $$

where $u = [u_1, u_2, \ldots, u_N]$. An exact solution of this problem does not exist because the simultaneous constrained minimization of all $E_i$ $(i = 1,2,\ldots,N)$ is not possible. We therefore attempt to minimize the average value of $E_i$ in Eq. (5) while meeting the linear constraints in Eq. (8).
The average correlation plane energy is

$$ E_{av} = \frac{1}{N}\sum_{i=1}^{N} E_i = \frac{1}{N}\sum_{i=1}^{N} H^+ D_i H = \frac{1}{N} H^+ \Big(\sum_{i=1}^{N} D_i\Big) H, \qquad (9) $$

where we define $D$ as

$$ D = \frac{1}{N}\sum_{i=1}^{N} \alpha_i D_i, \qquad (10) $$

where the $\alpha_i$ are constants. If all $\alpha_i = 1$, we may rewrite Eq. (9) as

$$ E_{av} = H^+ D H, \qquad \alpha_i = 1,\ i = 1,2,\ldots,N. \qquad (11) $$

The solution to this problem may be found using the method of Lagrange multipliers. This solution method is possible since we solve for the filter in the frequency domain. The vector $H$ is given by

$$ H = D^{-1} X \big(X^+ D^{-1} X\big)^{-1} u. \qquad (12) $$
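A direct numerical transcription of Eq. (12) is straightforward with the FFT. The sketch below builds one class-specific MACE filter from a matrix of training samples; the function name, the toy data and the choice u = 1 for every training peak are assumptions made for illustration.

```python
import numpy as np

def mace_filter(train, u=None):
    """Build a class-specific MACE filter (Eq. 12).
    train: (N samples x d points) real array; returns the d-point frequency-domain filter H."""
    N, d = train.shape
    X = np.fft.fft(train, axis=1).T            # d x N matrix whose columns are the DFTs X_i (Eq. 2)
    D_diag = np.mean(np.abs(X) ** 2, axis=1)   # diagonal of D (Eqs. 6 and 10 with alpha_i = 1)
    if u is None:
        u = np.ones(N)                         # unit correlation peak for every training sample
    Dinv_X = X / D_diag[:, None]               # D^{-1} X (D is diagonal, so this is element-wise)
    A = X.conj().T @ Dinv_X                    # X^+ D^{-1} X, an N x N matrix
    return Dinv_X @ np.linalg.solve(A, u)      # H = D^{-1} X (X^+ D^{-1} X)^{-1} u

# toy check: the peak constraint X^+ H = u of Eq. (8) is satisfied
rng = np.random.default_rng(1)
train = rng.normal(size=(5, 64))               # assumed toy training set
H = mace_filter(train)
print(np.allclose(np.fft.fft(train, axis=1).conj() @ H, np.ones(5)))
```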
3 Experiments

3.1 Descriptions of Two Tumor Datasets

In our experiments we applied our approach to two publicly available tumor datasets: the Small Round Blue Cell Tumor (SRBCT) [17] and Acute Lymphoblastic Leukemia (ALL) [16] datasets, which are described in Tables 1 and 2. The SRBCT dataset contains 88 samples with 2,308 genes in every sample. According to the original literature, there are 63 training samples and 25 test samples, the latter containing five non tumor-related samples. The 63 training samples contain 23 Ewing family of tumors (EWS), 20 rhabdomyosarcoma (RMS), 12 neuroblastoma (NB), and eight Burkitt lymphoma (BL) samples. The test samples contain six EWSs, five RMSs, six NBs, three BLs, and five non tumor-related samples that were removed in our experiments. The ALL dataset contains 248 samples in total, of which 148 were used as the training set and 100 as the test set. For example, for the subclass BCR-ABL there are 15 samples; according to the order of the samples in the original dataset, the first nine were used for training and the last six for testing. The other subclasses were treated similarly.

Table 1. Descriptions of SRBCT dataset

Subclass    #Original dataset  #Training set  #Test set
EWS         29                 23             6
NB          18                 12             6
RMS         25                 20             5
BL          11                 8              3
Non-SRBCT   5                  0              5
Total       88                 63             25
Table 2. The partition of training set and test set for ALL dataset

No.  Subclass      #Training set  #Test set
1    BCR-ABL       9              6
2    E2A-PBX1      16             11
3    Hyperdip>50   39             25
4    MLL           12             8
5    T-ALL         25             18
6    TEL-AML1      47             32
     Total         148            100
3.2 Experimental Method The correlation filters trained for each class are MACE filters built without using any false-class samples, which means that each class has its own filter. In the verification process, a test sample is correlated with all filter templates, yielding correlation output planes (COPs). We then search each COP for its maximum value. According to those values, we obtain the maximal value and its class label. For each filter, if the test sample is a true-class sample, there will theoretically be a sharp peak in the COP; on the contrary, a false-class sample will not yield a sharp peak. Fig. 3 shows an illustrative diagram of this process.
Fig. 3. The demonstration of correlation filter on actual tumor samples
For one test sample there are N similarity peak values, because there are N correlation filters for the N classes. From these N similarity peak values we obtain the maximum peak value and its filter-dependent class label, which is the calculated label. Since the true label of the test sample is already known, we can compare the two labels - the calculated label and the known label - to validate whether they are consistent with each other. Obviously, if the calculated label agrees with the known one, we consider the test sample to be classified correctly; otherwise, we consider it misclassified. Following the above process, we can determine the correctness of the classification of every test sample and obtain the corresponding prediction accuracy.
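Combined with the mace_filter sketch above, the classification rule just described amounts to a few lines: correlate the test sample with each class filter and take the class with the largest output peak. The function names and the particular frequency-domain correlation convention used here are assumptions made for illustration.

```python
import numpy as np

def correlation_peak(sample, H):
    """Maximum of the correlation output plane between one sample and a filter H
    (one common convention: circular correlation via the inverse FFT)."""
    S = np.fft.fft(sample)
    cop = np.fft.ifft(S * np.conj(H))
    return np.max(np.abs(cop))

def classify(sample, filters):
    """filters: dict mapping class label -> frequency-domain MACE filter."""
    peaks = {label: correlation_peak(sample, H) for label, H in filters.items()}
    return max(peaks, key=peaks.get)
```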
3.3 Experimental Results On the SRBCT dataset, the average accuracy is 87.12% ± 3.75, computed over numbers of selected top-ranked genes ranging from 300 to 600. Meanwhile, we obtained an excellent result on the ALL dataset, with an average accuracy of 97.19% ± 0.66, also computed over numbers of selected top-ranked genes from 300 to 600. The classification accuracy on the two datasets for different numbers of top-ranked genes is shown in Fig. 4.
Fig. 4. The classification accuracy on SRBCT and ALL datasets with different numbers of top-ranked genes
4 Comparisons with Other Methods

Tumor classification based on GEP has been extensively studied. For example, the nearest shrunken centroid method proposed by Tibshirani et al. can identify discriminant genes by a shrinkage parameter [19]. Dabney [1] proposed a classification to nearest centroids (ClaNC) method which performs class-specific gene selection.

Table 3. Comparisons with other related methods

No.  Methods                                                        Classifier                  Datasets     Accuracy %                         Ref.
1    KWRST + MACE                                                   --                          SRBCT, ALL   95.0 (unbiased), 99.0 (unbiased)   Ours
2    The significance for classification                            Artificial Neural Network   SRBCT        100.0 (biased)                     [17]
3    Genetic Evolution of Subsets of Expressed Sequences (GESSES)   k-Nearest Neighbor          SRBCT        100.0 (biased)                     [18]
4    Class Separability                                             Fuzzy Neural Network        SRBCT        95.0 (unbiased)                    [2]
5    Nearest Shrunken Centroid                                      --                          SRBCT, ALL   95.0 (unbiased), 86.0 (unbiased)   [19]
6    ClaNC                                                          --                          SRBCT, ALL   95.0 (unbiased), 99.0 (unbiased)   [1,20]
953
Usually, ClaNC method outperforms the nearest shrunken centroid method in classification performance. In fact, some methods are biased due to the gene selection on whole dataset, such as the methods in [17] and [18]. Comparing the best results obtained by our unbiased method with the best ones obtained by other related methods are shown in Table 3, from which we can see that our method is equivalent to the best one in classification accuracy.
5 Conclusions Most of the traditional methods to tumor classification focused on the gene selection and feature extraction of GEP, and the selected genes or extracted features are commonly used as the input of a classifier. The merit of gene selection is that the important tumor-related genes may be selected out, but its drawbacks are that finding a small set of gene subset from GEP will cost more CPU time and the classifier constructed by the selected gene subset possibly lacks generalization performance. Although the methods of feature extraction are time-saving, there exists information loss in the extracted features. To avoid these problems, we proposed a novel high accuracy method which combines Kruskal-Wallis rank sum test with MACE filter to compute the similarity degree of the expression pattern of differentially expressed genes between a test sample and the template constructed with training sample for tumor classification. The merits of our method are that it not only is time-saving but also can obtain very high accuracy by sufficiently utilizing the overall scheme of differentially expressed genes. 95% of the best prediction accuracy on SRBCT dataset and 99% on ALL dataset indicate that compared with other methods the proposed method is more effective and robust, which is appropriate for the classification of large numbers of tumor samples in future.
Acknowledgments This work was supported by the grants of the National Science Foundation of China, No. 30700161, the grant of the Guide Project of Innovative Base of Chinese Academy of Sciences (CAS), No.KSCX1-YW-R-30, and the grant of Oversea Outstanding Scholars Fund of CAS, No.2005-1-18.
References 1. Dabney, A.R.: Classification of microarrays to nearest centroids. Bioinformatics 21(22), 4148–4154 (2005) 2. Wang, L.P., Chu, F., Xie, W.: Accurate cancer classification using expressions of very few genes. IEEE/ACM Transactions on computational biology and bioinformatics 4(1), 40–53 (2007) 3. Huang, H.L., Lee, C.C., Ho, S.Y.: Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers. BioSystems 90(1), 78–86 (2007)
4. Sreekumar, J., Jose, K.K.: Statistical tests for identification of differentially expressed genes in cDNA microarray experiments. Indian Journal of Biotechnology 7(4), 423–436 (2008) 5. Deng, L., Ma, J.W., Pei, J.: Rank sum method for related gene selection and its application to tumor diagnosis. Chinese Science Bulletin 49(15), 1652–1657 (2004) 6. Li, L.P., Darden, T.A., Weinberg, C.R., Levine, A.J., Pedersen, L.G.: Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method. Combinatorial Chemistry & High Throughput Screening 4(8), 727–739 (2001) 7. Zhou, X., Tuck, D.P.: MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data. Bioinformatics 23(9), 1106–1114 (2007) 8. Troyanskaya, O.G., Garber, M.E., Brown, P.O., Botstein, D., Altman, R.B.: Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics 18(11), 1454–1461 (2002) 9. Lehmann, E.L.: Non-parametrics: Statistical methods based on ranks, Holden-Day, San Francisco (1975) 10. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945) 11. Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association 47(260), 583–621 (1952) 12. Mahalanobis, A., Kumar, B.V.K., Casasent, D.: Minimum average correlation energy filters. Appl. Opt. 26, 3633–3640 (1987) 13. Kumar, B.V.: Tutorial survey of composite filter designs for optical correlators. Appl. Opt. 31, 4773–4801 (1992) 14. Kumar, B.V., Savvides, V.M.K., Xie, C., Thornton, J., Mahalanobis, A.: Biometric verification using advanced correlation filters. Appl. Opt. 43, 391–402 (1992) 15. Kumar, B.V.: Minimum variance synthetic discriminant functions. Opt. Soc. Am. A 3, 1579–1584 (1986) 16. Yeoh, E.J., Ross, M.E., Shurtleff, S.A., Williams, W.K., Patel, D., Mahfouz, R., Behm, F.G., Raimondi, S.C., Relling, M.V., Patel, A., Cheng, C., Campana, D., Wilkins, D., Zhou, X., Li, J., Liu, H., Pui, C.H., Evans, W.E., Naeve, C., Wong, L., Downing, J.R.: Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1(2), 133–143 (2002) 17. Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6), 673–679 (2001) 18. Deutsch, J.M.: Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics 19(1), 45–52 (2003) 19. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America 99(10), 6567–6572 (2002) 20. Dabney, A.R., Storey, J.D.: Optimality driven nearest centroid classification from genomic data. PLoS ONE, 2(10), e1002. doi:10.1371/journal.pone.0001002 (2007)
Ensemble Classifiers Based on Kernel PCA for Cancer Data Classification Jin Zhou1, Yuqi Pan1, Yuehui Chen1, and Yang Liu2 1
School of Information Science and Engineering, University of Jinan, Jinan 250022, P.R. China
[email protected] 2 Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Abstract. Now the classification of different tumor types is of great importance in cancer diagnosis and drug discovery. It is more desirable to create an optimal ensemble for data analysis that deals with few samples and large features. In this paper, a new ensemble method for cancer data classification is proposed. The gene expression data is firstly preprocessed for normalization. Kernel Principal Component Analysis (KPCA) is then applied to extract features. Secondly, an intelligent approach is brought forward, which uses Support Vector Machine (SVM) as the base classifier and applied with Binary Particle Swarm Optimization (BPSO) for constructing ensemble classifiers. The leukemia and colon datasets are used for conducting all the experiments. Results show that the proposed method produces a good recognition rate comparing with some other advanced artificial techniques. Keywords: Cancer data classification, Kernel principal component analysis, Support vector machine, Ensemble classifier, Binary particle swarm optimization.
1 Introduction The recent advent of the DNA microarray technique has made simultaneous monitoring of thousands of gene expressions possible [1]. With this abundance of gene expression data, researchers have started to explore the possibilities of cancer data classification. Quite a number of methods have been proposed in recent years with promising results. Usually, the classification of gene expression data requires two steps: feature selection and data classification. As microarray data consist of a few hundred samples and thousands or even tens of thousands of genes, it is extremely difficult to work in such a high-dimensional space using traditional classification methods directly. So feature selection methods, which include principal components analysis (PCA), Fisher ratio, t-test, correlation analysis, etc., have been proposed and developed to reduce the dimensionality [2]. Along with the feature selection methods, intelligent methods have been applied for microarray data classification, such as artificial neural network (ANN) [3], K nearest neighbor (KNN) [4], decision tree [5] and flexible
neural tree (FNT) [6]. In recent years, ensemble approaches [7] have been put forward. They combine multiple classifiers together as a committee to make more appropriate decisions for classifying microarray data instances. Much research has shown that this can offer improved accuracy and reliability. In this paper, the gene expression data is first preprocessed for normalization, in which four steps are taken. Kernel principal component analysis (KPCA) is then applied to extract features. Secondly, an intelligent approach is brought forward, which uses Support Vector Machine (SVM) as the base classifier combined with Binary Particle Swarm Optimization (BPSO) for constructing ensemble classifiers. The leukemia and colon datasets, which were obtained from the Internet, are used for conducting all the experiments. The paper is organized as follows: Section 2 introduces the normalization of gene expression data. The feature selection method based on KPCA is described in Section 3. The optimal design method for constructing ensemble classifiers is discussed in Sections 4 and 5. Section 6 gives the experimental results, and in Section 7 we present our conclusions.
2 Gene Expression Data Normalization Due to the noisy nature of the dataset provided by a microarray experiment, preprocessing is an important step in the analysis of microarray data. The raw intensities have a wide dynamic range. Both datasets have to be normalized to decrease the variation before submitting them to the evolutionary algorithm. In this paper, four steps are taken:
1) If a value is greater than the ceiling 16000 or smaller than the floor 100, the value is replaced by the ceiling/floor.
2) Genes with (max - min) ≤ 500 are left out; here max and min refer to the maximum and minimum of the expression values of a gene, respectively.
3) A logarithmic transformation with base 2 is applied to all the expression values.
4) For each gene i, the mean measurement of the gene μ_i is subtracted and the result is divided by the standard deviation σ_i. After this transformation, the mean of each gene will be zero and the standard deviation will be one.
A sketch of these four steps is given below.
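The following is a minimal numpy transcription of the four normalization steps; the threshold values come from the text, while the array layout (samples x genes) and the function name are assumptions.

```python
import numpy as np

def normalize_expression(X, floor=100.0, ceiling=16000.0, min_range=500.0):
    """Four-step normalization of a (samples x genes) expression matrix."""
    X = np.clip(X, floor, ceiling)                       # step 1: clamp to [floor, ceiling]
    keep = (X.max(axis=0) - X.min(axis=0)) > min_range   # step 2: drop low-variation genes
    X = X[:, keep]
    X = np.log2(X)                                       # step 3: log2 transform
    X = (X - X.mean(axis=0)) / X.std(axis=0)             # step 4: per-gene standardization
    return X, keep
```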
3 Feature Selection Based on Kernel PCA The traditional Principal Component Analysis (PCA) [8] is based exclusively on second-order statistics and a smooth Gaussian distribution. It is difficult for it to describe data with a non-Gaussian distribution, so the kernel-based algorithm (the KPCA algorithm [9,10]) was proposed as a nonlinear PCA. KPCA uses kernel functions to capture arbitrary high-order correlations between input variables and finds the required principal components through inner products between input data. First of all, a nonlinear mapping Φ is used to map the input data space Rn into the feature space F:
$$ \Phi: R^N \rightarrow F, \quad x \mapsto \Phi(x). \qquad (1) $$
Correspondingly, a pattern in the original input space Rn is mapped into a potentially much higher dimensional feature vector in the feature space F. An initial motivation of KPCA is to perform PCA in the feature space F. Let us construct the covariance matrix in the feature space F: C =
1 M
M
∑
( Φ ( x j ) − Φ )( Φ ( x j ) − Φ ) T
,
j =1
(2)
where 1 M
Φ =
M
∑ Φ (x ) .
(3)
j
j =1
However, it is not easy to centralize data directly in the feature space F. To avoid this difficulty, we make the assumption again that M
∑ Φ (x
j
)= 0
.
(4)
j =1
So let us consider the following noncentralized covariance matrix: ~ 1 C = M
M
∑ Φ (x
j
)Φ ( x j ) T
.
j =1
(5)
Now we have to solve the Eigenvalue equation: ~
λV = CV
∈
,
(6)
for Eigenvalues λ≥0 and Eigenvectors V F \{0}. ~ It is very computationally intensive or even impossible to calculate C ’s eigenvectors in a high-dimensional (even infinite-dimensional) feature space. KPCA can be viewed as utilizing two key techniques to solve this problem artfully. One is the SVD technique[11] adopted in Eigenfaces, and the other is the so-called kernel-tricks[9]. SVD technique can be used to transform the eigenvector calculation problem of a large-size matrix to the eigenvector calculation problem of a small-size matrix and, kernel-tricks can be used to avoid the computation of dot products in the feature space by virtue of the following formula: K ( x i , x j ) = ( Φ ( x i ) ⋅ Φ ( x j )) .
(7)
~ Specifically, let Q = [Φ( x1 ),..., Φ( x M )] ; then C can also be expressed by ~ 1 C = QQ M
T
.
(8)
~ Let us form the matrix R = Q T Q : By virtue of kernel-tricks, we can determine the ~ elements of the M × M matrix R by ~ R ij = Φ( xi )T ⋅ Φ( x j ) = (Φ( xi ) ⋅ Φ( x j )) = K ( xi , x j ) . (9)
958
J. Zhou et al.
~ Let us calculate the orthonormal eigenvectors u1, u2, …, um of R corresponding to m largest eigenvalues λ1 ≥ λ2 ≥ … ≥ λm. Then, by SVD technique, the orthonormal eigen~ vectors w1, w2, …, wm of C corresponding to m largest eigenvalues λ1, λ2, …, λm are 1
wj =
λj
Qu
j
, j = 1, …, m.
(10)
After the projection of the mapped sample Φ (x) onto the eigenvector wj, we can obtain the j-th feature y
j
= w
T j
Φ (x) =
1
λ
u
T j
Q TΦ (x)
.
(11)
j
The resulting features y1, y2, …, ym form a KPCA-transformed feature vector Y=(y1, y2, …, ym)T for sample x.
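As a practical note, the projection defined by Eqs. (7)-(11) is what standard kernel-PCA implementations compute. The sketch below shows the equivalent feature extraction step with scikit-learn; the use of that library, the RBF kernel width and the placeholder data are assumptions rather than the authors' own code.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# X_norm: (samples x genes) matrix after the normalization of Sect. 2 (placeholder data assumed)
X_norm = np.random.default_rng(0).normal(size=(72, 2000))

kpca = KernelPCA(n_components=60, kernel="rbf", gamma=1e-3)  # 60 features, assumed gamma
Y = kpca.fit_transform(X_norm)                               # rows are the KPCA feature vectors
print(Y.shape)                                               # (72, 60)
```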
4 Data Classification Using Support Vector Machine There are many kinds of methods for microarray data classification. Since Support Vector Machine (SVM) is suitable for data analysis that deals with few samples and large features, in recent years, most researchers applied Support Vector Machine (SVM) as the base classifier to learn microarray datasets and obtained very good results. 4.1 Support Vector Machine
Support Vector Machine (SVM), which was originally introduced by Vapnik and coworkers [12], is now used in many classification problems. SVM builds up a hyperplane as the decision surface in such a way to maximize the margin of separation between positive and negative examples. SVM achieves this by the structural risk minimization principle. The error rate of a learning machine on the test data is bounded by the sum of the training-error rate and the capacity of this machine depends on the Vapnik Chervonenkis (VC) dimension. Given a labeled set of training samples (Xi, Yi), i=1, …, M, where X i ∈ R N and Yi ∈ {−1, 1} , the discriminant hyperplane is defined by: M
f (X ) =
∑
Y iα i K ( X , X i ) + b
,
(12)
i=1
where K(X, Xi) is a kernel function and the sign of f(X) determines the membership of X. The selection of an appropriate kernel function is very important for SVM. At present, the selection of kernel function in microarray data classification is mostly artificial and unitary. An improvement scheme can combine several kinds of kernel functions to gain a higher performance. The polynomial kernel function has good global quality and strong extrapolation ability. As a result of a low polynomial exponent, a higher computation speed can be obtained. To the opposite, the Gauss radial basic function is the locally strong kernel
Ensemble Classifiers Based on Kernel PCA for Cancer Data Classification
959
function. Its interpolation ability will be weakened along with the parameter σ ’s growth. Therefore, to get a kernel function that has high learning capability, strong generalization, both good extrapolation and interpolation abilities, we need to design a mixed kernel function that combine several kinds of kernel functions together. In this paper, K mix is adopted as the kernel function in SVM. K mix = λK poly + (1 − λ ) K rbf , λ ∈ (0,1) .
(13)
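The mixed kernel of Eq. (13) can be supplied to an SVM as a custom kernel. In the sketch below the polynomial degree, the RBF width and the value of λ are assumptions, and scikit-learn's SVC is used only as one possible implementation.

```python
import numpy as np
from sklearn.svm import SVC

def k_mix(A, B, lam=0.6, degree=2, gamma=0.01, coef0=1.0):
    """K_mix = lam * K_poly + (1 - lam) * K_rbf between row-sample matrices A and B."""
    k_poly = (gamma * (A @ B.T) + coef0) ** degree
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    k_rbf = np.exp(-gamma * sq)
    return lam * k_poly + (1 - lam) * k_rbf

clf = SVC(kernel=k_mix)   # SVC accepts a callable kernel K(X, Y) returning the Gram matrix
```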
4.2 Parameter Optimization with PSO
Particle Swarm Optimization (PSO) [13] is one of the evolutionary optimization methods inspired by nature. Since PSO was first introduced by Kennedy and Eberhart(1995), it has been successfully applied to optimize various continuous nonlinear functions. In this paper, PSO is used to optimize parameters of the SVM. A population of particles is randomly generated initially. Each particle represents a potential solution and has a position represented by a position vector xi. A swarm of particles moves through the problem space with the moving velocity of each particle represented by a velocity vector vi. At each iteration step t, each particle keeps track of the best position among all the particles pg(t) and its own best position pi(t), a new velocity for particle i is updated by vi(t+1) = w * vi(t) + c1 * rand1 * (pi(t) - xi(t)) + c2 * rand2 * (pg(t) - xi(t)),
(14)
where c1 and c2 are positive constant and rand1 and rand2 are uniformly distributed random number in [0, 1]. The term vi is limited to the range of ±vmax. If the velocity violates this limit, it is set to its proper limit. Changing velocity in this way can enable the particle i to search around its individual best position, pi, and global best position, pg. Based on the updated velocities, each particle changes its position according to the following equation xi(t+1) = xi(t) + vi(t+1).
(15)
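Equations (14)-(15) translate directly into a velocity/position update loop. The sketch below runs a plain PSO on a toy quadratic objective; the objective, bounds and hyper-parameter values are assumptions, and in the paper the fitness would instead be the (cross-validated) error of the SVM being tuned.

```python
import numpy as np

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, vmax=0.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (n_particles, dim))      # positions
    v = np.zeros((n_particles, dim))                # velocities
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (14)
        v = np.clip(v, -vmax, vmax)
        x = x + v                                                   # Eq. (15)
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, pbest_val.min()

best, val = pso(lambda p: np.sum((p - 0.3) ** 2), dim=2)
print(best, val)
```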
5 An Ensemble Classifiers Design with Binary PSO Selecting several classifiers to construct the committee is better than any one [14]. So we should select appropriate classifiers to form the classification committee. In this paper, we introduce the selection method classifiers ensemble using Binary of Particle Swarm Optimization (BPSO) [15]. 5.1 Particle Representation
Suppose N base classifiers are generated after training on the feature subsets; they are denoted C_1, C_2, C_3, ..., C_N. In this ensemble approach for cancer data classification, X_i^k is the i-th particle of the swarm at iteration k. It is represented by an N-dimensional vector that encodes the N base classifiers, X_i^k = [x_{i1}^k, x_{i2}^k, ..., x_{iN}^k], where x_{ij}^k is the position of the i-th particle with respect to the j-th dimension. A binary value of 1 for the j-th dimension indicates that C_j is selected in the solution, and 0 otherwise.

5.2 Initial Population
pop^k is the set of Popsize particles in the swarm at iteration k, i.e., pop^k = [X_1^k, X_2^k, ..., X_{Popsize}^k]. For each dimension of a particle, a binary value of 0 or 1 is assigned according to a probability e; in particular,

x_{ij}^0 = \begin{cases} 1, & U(0,1) > e, \\ 0, & \text{otherwise}, \end{cases}    (16)
V_i^k is the velocity of particle i at iteration k. It can be written as V_i^k = [v_{i1}^k, v_{i2}^k, ..., v_{iN}^k], where v_{ij}^k is the velocity of particle i with respect to the j-th dimension. Velocity values are restricted to minimum and maximum values, v_{ij}^k \in [V_{min}, V_{max}] with V_{min} = -V_{max}. The velocity of particle i in the j-th dimension is initialized by

v_{ij}^0 = V_{min} + U(0,1) \cdot (V_{max} - V_{min}).    (17)
This limit enhances the local exploration of the problem space.

5.3 Fitness Function
To evaluate individuals, a fitness function must be defined. We first generate a validation set V and then calculate the error E_{vi}^k of each individual on V at iteration k. The fitness f(X_i^k) of the i-th particle at iteration k is

f(X_i^k) = \frac{1}{E_{vi}^k},    (18)

E_{vi}^k = \sum_{j=1}^{N} x_{ij}^k \cdot classifier_j,    (19)
where N is the total number of base classifiers, x_{ij}^k is the position of the i-th particle with respect to the j-th dimension at iteration k, and classifier_j is the error of the j-th base classifier on V.

5.4 Finding New Solutions
Since the BPSO algorithm is employed in this study, we need to use two useful functions for generating new solutions, namely a limitative function H to force the real values between 0 and 1, and a piecewise linear function G to force velocity values to be inside the maximum and minimum allowable values.
G(v_{ij}^k) = \begin{cases} V_{max}, & \text{if } v_{ij}^k > V_{max}, \\ v_{ij}^k, & \text{if } V_{min} \le v_{ij}^k \le V_{max}, \\ V_{min}, & \text{if } v_{ij}^k < V_{min}. \end{cases}    (20)
After applying the piecewise linear function, the following limitative function is used to scale the velocities into [0, 1], which is then used to convert them into binary values:

H(v_{ij}^k) = \frac{1}{1 + \left( \dfrac{V_{max} - v_{ij}^k}{v_{ij}^k - V_{min}} \right)^2}.    (21)
New solutions are then found by updating the velocity and the dimensions in turn. First, we compute the change in velocity as

\Delta v_{ij}^{k-1} = w \cdot v_{ij}^{k-1} + c_1 \cdot rand_1 \cdot (pb_{ij}^{k-1} - x_{ij}^{k-1}) + c_2 \cdot rand_2 \cdot (gb_j^{k-1} - x_{ij}^{k-1}),    (22)

where PB_i^k is the best value of particle i obtained up to iteration k. The best position associated with the best fitness value of particle i obtained so far is called the particle best and defined as PB_i^k = [pb_{i1}^k, pb_{i2}^k, ..., pb_{iN}^k]. GB^k is the best position among all particles in the swarm achieved so far and can be written as GB^k = [gb_1^k, gb_2^k, ..., gb_N^k]. c_1 and c_2 are the social and cognitive parameters, and rand_1 and rand_2 are uniform random numbers between 0 and 1. The velocity v_{ij}^k is then updated using the piecewise linear function:

v_{ij}^k = G(v_{ij}^{k-1} + \Delta v_{ij}^{k-1}).    (23)
Finally, we update dimension j of particle i:

x_{ij}^k = \begin{cases} 1, & \text{if } U(0,1) < H(v_{ij}^k), \\ 0, & \text{otherwise}. \end{cases}    (24)
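A sketch of one BPSO iteration following Eqs. (22)–(24): the velocity change is limited by the piecewise function G of Eq. (20), mapped into [0, 1], and used to sample the new binary classifier-selection vector. The sigmoid used here is a common stand-in for the limitative function H of Eq. (21), which is an assumption of this sketch.

```python
import numpy as np

def bpso_step(x, v, pbest, gbest, w=1.0, c1=2.0, c2=2.0, v_max=1.0, v_min=-1.0):
    """One BPSO iteration over binary classifier-selection vectors (Eqs. 22-24)."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    # Eq. (22): velocity change driven by personal best (pbest) and global best (gbest).
    dv = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Eq. (23) with the piecewise limiter G of Eq. (20): keep velocities in range.
    v = np.clip(v + dv, v_min, v_max)
    # Limitative function mapping velocities into [0, 1]; a sigmoid is used here
    # as a stand-in for Eq. (21).
    h = 1.0 / (1.0 + np.exp(-v))
    # Eq. (24): set a bit to 1 (classifier selected) with probability H(v).
    x = (np.random.rand(*x.shape) < h).astype(int)
    return x, v
```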
6 Experiments

We performed extensive experiments on two benchmark cancer datasets obtained from the Internet, the Leukemia and Colon databases. The Leukemia dataset consists of 72 samples taken from leukemia patients: 25 samples of AML and 47 samples of ALL [16]. A total of 38 of the 72 samples were used as training data and the remaining samples as test data; each sample contains 7129 gene expression levels. The Colon dataset consists of 62 samples of colon epithelial cells taken from colon-cancer patients [16]; each sample contains 2000 gene expression levels. A total of 31 of the 62 samples were used as training data and the remaining samples as test data. In this experiment, a normalization procedure consisting of four steps is first applied to preprocess the raw data. Then KPCA is employed, 60 informative features are extracted from each sample, and 9 training datasets are chosen for training the 9 base classifiers. SVM is used as the base classifier and PSO is used to
Table 1. Parameters used in this paper
Parameters for KPCA
  K(x_i, x_j): kernel function                    RBF
Parameters for SVM
  λ: kernel function proportion coefficient       0.95
  K(X, X_i): kernel function                      K_mix
Parameters for PSO
  L: population size                              30
  w: weight                                       1.0
  c1, c2: learning factors                        2.0
  X_up: upper boundary of x                       3.0
  X_down: lower boundary of x                     -3.0
  V_max: maximum velocity                         1.8
  rand1, rand2: uniform random numbers            (0, 1)
Parameters for BPSO
  L: population size                              30
  w: weight                                       1.0
  c1, c2: learning factors                        2.0
  V_max: maximum velocity                         1
  rand1, rand2: uniform random numbers            (0, 1)
Table 2. Comparison of different approaches on Leukemia dataset
Author                 Classification Rate (%)
This paper              97.1~100
Furey et al. [17]       94.1
Li et al. [18]          84.6
Ben-Dor et al. [19]     91.6~95.8
Nguyen et al. [20]      94.2~96.4
Zhao et al. [21]        95.8~97.2
Table 3. Comparison of different approaches on Colon dataset
Author                 Classification Rate (%)
This paper              93.7~99.7
Furey et al. [17]       90.3
Li et al. [18]          94.1
Ben-Dor et al. [19]     72.6~80.6
Nguyen et al. [20]      87.1~93.5
Zhao et al. [21]        85.5~93.3
optimize the parameters of each SVM. BPSO is then applied to select appropriate base classifiers to construct the classification committee. The parameter values used in our experiments are shown in Table 1. A comparison of different feature extraction and classification methods is given in Table 2 for the Leukemia dataset and in Table 3 for the Colon dataset.
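An end-to-end sketch of the pipeline just described (KPCA feature extraction, SVM base classifiers, committee prediction by majority vote). The scikit-learn calls, random feature subsets, and fixed kernel settings are illustrative assumptions; in the paper, PSO tunes each SVM and BPSO selects the committee members instead of the fixed choices used here. Labels are assumed to be in {-1, +1}.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

def build_committee(X_train, y_train, n_components=60, n_classifiers=9,
                    subset_size=40, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: KPCA extracts informative features from the normalized expression data.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    Z = kpca.fit_transform(X_train)
    committee = []
    for _ in range(n_classifiers):
        # Step 2: each base SVM is trained on a random feature subset (assumption).
        idx = rng.choice(n_components, size=subset_size, replace=False)
        clf = SVC(kernel="rbf").fit(Z[:, idx], y_train)
        committee.append((idx, clf))
    return kpca, committee

def committee_predict(kpca, committee, X_test):
    Z = kpca.transform(X_test)
    votes = np.stack([clf.predict(Z[:, idx]) for idx, clf in committee])
    # Majority vote over the selected base classifiers (labels in {-1, +1}).
    return np.sign(votes.sum(axis=0))
```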
7 Conclusions

In this paper, a new ensemble of classifiers is proposed for cancer data classification. The Leukemia and Colon datasets are used for all experiments. The raw data is first preprocessed for normalization. Gene features are then extracted with KPCA, which greatly reduces dimensionality while retaining the informative features. Finally, SVMs are used to construct the classifier committee, whose members are selected by BPSO. The experimental results show that the proposed framework achieves a competitive recognition rate compared with other advanced artificial-intelligence techniques.
Acknowledgments This research was partially supported by the Natural Science Foundation of China under contract number 60573065, the Key Subject Research Foundation of Shandong Province and the Natural Science Foundation of Shandong Province under contract number Y2007G33.
References 1. Sarkar, I., Planet, P., Bael, T., Stanley, S., Siddall, M., DeSalle, R.: Characteristic Attributes in Cancer Microarrays. Computers and Biomedical Research 35(2), 111–122 (2002) 2. Koller, D., Sahami, M.: Towards optimal feature selection. In: Machine Learning, Proceeding of 13th Int. Conf. (1996) 3. Azuaje, F.: A Computational Neural approach to Support the Discovery of Gene Function and Classes of Cancer. IEEE Transactions on Biomedical Engineering 48(3), 332–339 (2001) 4. Li, L., Weinberg, C., Darden, T., Pedersen, L.: Gene Selection for Sample Classification Based on Gene Expression Data: Study of Sensitivity to Choice of Parameters of the GA/KNN Method. Bioinformatics 17(12), 1131–1142 (2001) 5. Camp, N., Slattery, M.: Classification Tree Analysis: A Statistical Tool to Investigate Risk Factor Interactions with an Example for Colon Cancer. Cancer Causes Contr. 13(9), 813–823 (2002) 6. Chen, Y., Peng, L., Abraham, A.: Gene expression profiling using flexible neural trees. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds.) IDEAL 2006. LNCS, vol. 4224, pp. 1121–1128. Springer, Heidelberg (2006) 7. Tan, A., Gilbert, D.: Ensemble Machine Learning on Gene Expression Data for Cancer Classification. Applied Bioinformatics 2(3), 75–83 (2003) 8. Sergios, T., Konstantinos, K.: Pattern Recognition. China Machine Press (2002)
9. Scholkopf, B., Smola, A., Muller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998) 10. Yang, J., Yang, J.Y., Frangi, A.F.: Combined Fisherfaces framework. Image and Vision Computing 21, 1037–1044 (2003) 11. Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press, Baltimore (1996) 12. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1999) 13. Kennedy, J., Eberhard, R.C.: Particle Swarm Optimization. In: Proceeding of IEEE International Conf. on Neural Networks, Piscataway, NJ, USA, pp. 1942–1948 (1995) 14. Zhou, Z.H., Wu, J., Tang, W.: Ensembling Neural Networks: Many Could Be Better Than All. Artificial Intelligence 137(1-2), 239–263 (2002) 15. Kennedy, J., Eberhart, R.C.: A Discrete Binary Version of the Particle Swarm Optimization. In: Proceeding Of the conference on Systems, Man, and Cybernetics SMC 1997, pp. 4104–4109 (1997) 16. Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., Yakhini, N.: Tissue classification with gene expression profiles. Computational Biology 7, 559–584 (2000) 17. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., GaasenBeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Blomfield, C.D., Lander, E.S.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286(12), 531–537 (1999) 18. Eisen, M.B., Brown, B.O.: DNA Arrays for Analysis of Gene Expression. Methods in Enzymology 303, 179–205 (1999) 19. Cho, S.B.: Exploring Features and Classifiers to Classify Gene Expression Profiles Of acute Leukemia. Artifical Intellegence 16(7), 1–13 (2002) 20. Harrington, C.A., Rosenow, C., Retief, J.: Monitoring Gene Expression Using DNA Microarrays. Curr. Opin. Microbiol. 3, 285–291 (2000) 21. Zhao, Y., Chen, Y., Zhang, X.: A novel ensemble approach for cancer data classification. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.) ISNN 2007. LNCS, vol. 4492, pp. 1211–1220. Springer, Heidelberg (2007)
A Method for Multiple Sequence Alignment Based on Particle Swarm Optimization

Fasheng Xu¹ and Yuehui Chen²
¹ School of Science, University of Jinan, Jinan, 250022, P.R. China
² School of Information Science and Engineering, University of Jinan, Jinan, 250022, P.R. China
Abstract. Sequence alignment is a basic information-processing method in bioinformatics, but the multiple sequence alignment problem (MSA) is difficult to solve. In this paper, an improved particle swarm optimization is designed for MSA. In the algorithm, each particle represents an alignment and flies towards the particle with the best solution according to certain rules. Moreover, to increase the diversity of the algorithm and improve the chance of finding the optimal solution, three operators are designed: gap deletion, gap insertion, and a local search operator. Simulation results show that for MSA the proposed algorithm is superior to Clustal X. Keywords: Multiple Sequence Alignment; Bioinformatics; Particle Swarm Optimization.
1 Introduction
Multiple alignments of protein sequences are important in many applications, including phylogenetic tree estimation, secondary structure prediction and critical residue identification. Sequence alignment is by far the most common task in bioinformatics. Procedures relying on sequence comparison are diverse and range from database searches to secondary structure prediction. Sequences can be compared two by two to scour databases for homologues, or they can be multiply aligned to visualize the effect of evolution across a whole protein family. However, it is difficult to align multiple sequences. A common heuristic is to seek a multiple alignment that maximizes the SP score (the summed alignment score of each sequence pair), which is NP-complete [1]. Therefore the design of algorithms for multiple sequence alignment has been a very active research area, and many efforts have been made on the optimization of sequence alignment. Needleman and Wunsch [2] presented an algorithm for sequence comparison based on dynamic programming (DP), by which the optimal alignment between two sequences is obtained. The generalization of this algorithm to multiple sequence alignment is not applicable to a practical alignment of dozens or hundreds of sequences, since it requires CPU time proportional to N^K, where K is the number of sequences, each with
length N. Stochastic methods such as Gibbs sampling can be used to search for a maximum objective score [3], but have not been widely adopted. A more popular strategy is the progressive method [4][5], which first estimates a phylogenetic tree. A profile (a multiple alignment treated as a sequence by regarding each column as a symbol) is then constructed for each node in the binary tree. If the node is a leaf, the profile is the corresponding sequence; otherwise its profile is produced by a pair-wise alignment of the profiles of its child nodes. Current progressive algorithms are typically practical for up to a few hundred sequences on desktop computers, the best-known of which is CLUSTALW [6]. A variant of the progressive approach is used by T-Coffee [7], which builds a library of both local and global alignments of every pair of sequences and uses a library-based score for aligning two profiles. On the BAliBASE benchmark [8][9], T-Coffee achieves the best results, but has a high time and space complexity that limits the number of sequences it can align to around one hundred. There are also non-deterministic approaches using genetic algorithms, such as SAGA [10], which was reported to find optimal alignments even in search spaces of considerable size (more than 10 sequences). One approach is to use a progressive alignment as the initial state of a stochastic search for a maximum objective score (stochastic refinement). Alternatively, pairs of profiles can be extracted from the progressive alignment and re-aligned, keeping the results only when an objective score is improved (horizontal refinement) [11]. In this paper, we develop a method for multiple sequence alignment based on particle swarm optimization (PSO). The organization of this paper is as follows. In Section 2, the sequence alignment problem is introduced. The improved particle swarm optimization for the multiple sequence alignment problem is designed in Section 3. To verify the feasibility and efficiency of the proposed approach, an empirical study is presented in Section 4. Some concluding remarks are given in Section 5.
2 The Sequence Alignment Problem
In bioinformatics, the most important data sets are biological sequences, including DNA sequences and protein sequences. A DNA sequence can be viewed as a string over the four-symbol alphabet A, C, G, T, and a protein sequence as a string over an alphabet of 20 amino-acid symbols. During evolution, elements of the sequences may be inserted, deleted or mutated. Thus, in order to highlight the similarities of the sequences it is often convenient to insert gaps in them, leading to a higher number of symbol matches. The similarity of aligned sequences is measured using a scoring function, which is based on a matrix that assigns a score to every pair of symbols (based on a mutation probability). For proteins, the most commonly used matrices are PAM (Percent Accepted Mutation) and BLOSUM (Blocks Substitution Matrix) [12]. Additionally, a penalization of gap insertion is required in order to avoid inserting an excessive number of gaps. The process of finding an optimum (or at least good) match between the sequences is called sequence alignment.
3 Improved Particle Swarm Optimization for MSA

3.1 Particle Swarm Optimization
Particle Swarm Optimization (PSO) was first introduced by Kennedy and Eberhart [13][14] in 1995 and was partly inspired by the behavior of large animal swarms such as schooling fish or flocking birds. PSO conducts its search using a population of random solutions, each corresponding to an individual. In addition, each potential solution, called a particle, is assigned a randomized velocity. Each particle in PSO flies through the hyperspace with a velocity that is dynamically adjusted according to the flying experience of itself and its colleagues. Each particle adjusts its position according to its own and its neighboring particles' experience, moving toward two points: the best position found so far by itself, called Pbest, and by its neighbors, called Gbest, at every iteration. Thus, at each time step, the velocity of each particle is changed toward its Pbest and Gbest. Suppose that the search space is D-dimensional; then the i-th particle of the swarm can be represented by a D-dimensional vector X_i = (x_{i1}, x_{i2}, ..., x_{iD}). The particle velocity can be represented by another D-dimensional vector V_i = (v_{i1}, v_{i2}, ..., v_{iD}). The best previously visited position of the i-th particle is denoted P_i = (p_{i1}, p_{i2}, ..., p_{iD}). Defining g as the index of the best particle in the swarm, and letting superscripts denote the iteration number, the velocity and position of a particle are updated by

v_{id}^{k+1} = w v_{id}^k + c_1 r_1^k (p_{id}^k - x_{id}^k) + c_2 r_2^k (p_{gd}^k - x_{id}^k),    (1)

x_{id}^{k+1} = x_{id}^k + v_{id}^{k+1},    (2)
where d = 1, 2, ..., D, i = 1, 2, ..., N, and N is the size of the swarm; w is called the inertia weight; c_1, c_2 are two positive constants, called the cognitive and social parameters respectively; r_1, r_2 are random numbers uniformly distributed in [0, 1]; and k = 1, 2, ... is the iteration number.

3.2 Improved Particle Swarm Optimization
The sequence alignment problem can be considered as an optimization problem in which the objective is to maximize a scoring function. Thus, the PSO algorithm is adapted to work with biological sequences: a particle represents a sequence alignment, and since the main mechanism of PSO is the movement of particles towards the leader, suitable operators implementing this mechanism are proposed. The general algorithm is as follows [15]:

PSOMSA()
1. Generate a set of initial particles
2. Determine the leader particle gbest
3. Repeat until the termination criterion is met
   a. Measure the distance between gbest and every particle
   b. Move every particle towards gbest
   c. Determine the leader particle

The termination criterion can be a maximum number of iterations, or a number of iterations after which the best score does not improve. The implicit idea of the PSO algorithm is that a set of particles randomly spread over a search space will progressively move to regions that provide better solutions to the problem, until the swarm finds a solution that it cannot improve any more. Next, some implementation details are discussed, such as the particle representation, the scoring function and the implementation of the particle movement mechanism.

Data Representation. In general, a swarm is made up of a set of particles, and one particle of the swarm is designated as the leader (gbest). Additionally, each particle preserves a memory of its best historical location (pbest). As mentioned above, in the adapted PSO algorithm a particle corresponds to a sequence alignment. An alignment is represented as a set of vectors, where each vector specifies the positions of the gaps for one of the sequences to be aligned. Thus, a coordinate of the particle corresponds to a sequence to be aligned, and is represented by a vector of size s, where s is the maximum allowed number of gaps, which may be different for each sequence. Therefore, a set of n sequences to be aligned corresponds to an n-dimensional search space.

Initialization. The size of the swarm (i.e., the number of particles) is determined by the user. Additionally, the length of the alignment has a minimum value given by the length of the largest sequence, and a maximum length given, for instance, as twice the length of the largest sequence. The initial set of particles is generated by adding gaps into the sequences at random positions, so that all sequences have the same length L (a typical value of L is 1.4 times the length of the longest sequence).

Scoring Function. The global alignment score is based on the score of the alignment of each pair of sequences; thus each sequence is aligned with every other sequence. In general, the score assigned to each particle (alignment) is the sum of the scores (SP) of the alignments of each pair of sequences. The score of each pair of sequences is the sum of the scores assigned to each pair of matched symbols, which is given by the substitution matrix. This matrix includes all possible symbols, including the gap and the related penalization. The score of a multiple alignment is then

SP\text{-}Score(A) = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} s(A_i, A_j),    (3)
where s(A_i, A_j) is the alignment score between two aligned sequences A_i and A_j.
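A minimal sketch of the SP score of Eq. (3); the per-pair score uses a substitution-matrix lookup with a flat gap penalty, which is an assumption standing in for the PAM/BLOSUM scoring described above.

```python
def pair_score(a, b, subs, gap_penalty=-10.0):
    """Score two aligned sequences of equal length, column by column."""
    total = 0.0
    for ca, cb in zip(a, b):
        if ca == '-' or cb == '-':
            total += gap_penalty
        else:
            total += subs[(ca, cb)]     # substitution-matrix lookup (assumed dict)
    return total

def sp_score(alignment, subs, gap_penalty=-10.0):
    """Eq. (3): sum of pairwise scores over all sequence pairs."""
    k = len(alignment)
    return sum(pair_score(alignment[i], alignment[j], subs, gap_penalty)
               for i in range(k - 1) for j in range(i + 1, k))
```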
Speed Update. In the PSO algorithm each particle moves towards the leader at a speed proportional to the distance between the particle and the leader. In this paper, the speed is defined as

v_{id}^{k+1} = c_1 r_1^k (p_{id}^k - x_{id}^k) + c_2 r_2^k (p_{gd}^k - x_{id}^k),    (4)

where c_1 and c_2 are the weights. If the value of v_{id}^{k+1} is not an integer, it is rounded.
Position Update. After the speed update, each particle updates its coordinate x according to

x_{id}^{k+1} = x_{id}^k + v_{id}^{k+1}.    (5)

When the position is updated, the sequence may become illegal (Fig. 1 shows an example).
Fig. 1. An example of an illegal update. The locations of residues C and N are changed after the update, making the sequence illegal.
An adjustment must be made to eliminate such illegal sequences. The adjustment is

if x_{id} \le x_{i(d-1)}, then x_{id} = x_{i(d-1)} + 1.    (6)
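A sketch of the position update of Eq. (5) followed by the left-to-right repair of Eq. (6); a particle coordinate is assumed to be the ordered list of gap positions for one sequence, as described above.

```python
def update_gap_positions(gaps, velocity):
    """Eq. (5): move each gap position by its (rounded) velocity component."""
    new_gaps = [int(round(g + v)) for g, v in zip(gaps, velocity)]
    # Eq. (6): if a gap position does not stay strictly increasing,
    # push it just past its predecessor so the sequence remains legal.
    for d in range(1, len(new_gaps)):
        if new_gaps[d] <= new_gaps[d - 1]:
            new_gaps[d] = new_gaps[d - 1] + 1
    return new_gaps
```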
An example is depicted in Fig. 2.

Operators. In addition to the traditional PSO updates, three types of operators are used in PSOMSA: gap deletion, gap insertion and local search.

1) Gap Deletion. After a number of iterations, some columns may contain only gaps. They are not only useless but also extend the length of the alignment; therefore, we remove them from the alignment, as shown in Fig. 3.

2) Gap Insertion.
Fig. 2. An example of the adjustment of an illegal sequence. The location of residue C is updated according to Eq. (5) and residue N is adjusted according to Eq. (6).
Fig. 3. An alignment with gap columns
The gap insertion operators are added to prevent the PSO from converging to a local optimum. We insert gap columns into the current best alignment with a probability m. For example, to insert L columns into the alignment, the insertion locations are chosen in one of two ways: insert L gap columns at one random location in the alignment, or insert one gap column at each of L random locations. One of these two operators is selected at random by the program at run time. Because gap columns do not change the score of an alignment, the gap insertion operators do not directly improve the alignment; however, they affect the other particles, whose numbers of gaps all increase, thereby
increasing the diversity of the algorithm and preventing it from converging to a local optimum.

3) Local Search Operator. We use a local search to enhance the search performance. Let L be the number of gaps in the sequence with the fewest gaps; we remove between 0 and L gaps from every sequence and then compute the score of the alignment. If the score is higher than before, the new alignment is kept; otherwise, the previous alignment is restored.
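A sketch of the gap-deletion operator described above: columns consisting only of gaps are removed from every sequence (the gap symbol '-' is an assumption).

```python
def delete_gap_columns(alignment):
    """Remove columns that contain a gap in every sequence of the alignment."""
    length = len(alignment[0])
    keep = [c for c in range(length)
            if any(seq[c] != '-' for seq in alignment)]
    return [''.join(seq[c] for c in keep) for seq in alignment]
```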
4 Simulation Experiments and Discussion
The proposed algorithm, called PSOMSA, was implemented in order to test its performance. The number of particles is determined taking into account the length of the sequences and the number of sequences to align. In order to test the algorithm, eight protein families of different lengths and different percentages of sequence identity were selected from the alignment database BAliBASE, available at [16]. One protein family was selected from each length category (short, medium and large) and from each range of identity percentage (less than 25%, between 25% and 40%, and greater than 35%). Table 1 presents the protein families used in the experiments.

Table 1. Protein families used in the experiments

Reference   Name                      Length   Identity%
1r69        repressor                 Short    < 25
1tgxA       cardiotoxin               Short    20-40
1fmb        hiv-1 protease            Short    > 35
1mrj        alpha tricosanthin        Medium   20-40
1ezm        elastase                  Medium   > 35
1cpt        cytochrome p450           Large    < 25
1ac5        b-galactoxidase           Large    20-40
1ad3        aldehyde dehydrogenase    Large    > 35

These protein families were previously
aligned using the well-known algorithm Clustal X 2.0. The alignment obtained with Clustal was evaluated using a PAM250 matrix, with a penalty of 10 for each opened gap and a penalty of 0.3 for each extended gap. The algorithm stopped after 10 gap-insertion operations without improvement in the quality of the solution. Table 2 shows the results of the experiments. The results show that the PSOMSA algorithm outperforms Clustal X, especially when the data contain fewer and shorter sequences; for longer sequences the results are similar. There are still many enhancements that must be made to PSOMSA in order to achieve satisfying results. Also, new fitness functions based on different scoring methods are possible straightforward developments.
Table 2. Experimental results

Reference   Number of Sequences   Length    Clustal X Score   PSOMSA Score
1r69        4                     63-78     2.7               287.9
1tgxA       4                     57-64     447.9             890.7
1fmb        4                     98-104    1513.4            1578
1mrj        5                     247-266   2361.1            2367.9
1ezm        4                     297-380   8223.8            8628.8
1cpt        5                     378-434   1267.2            1795
1ac5        4                     421-485   2814.5            3105.6
1ad3        4                     424-447   5710.6            5726.8
5 Conclusions
In this work an algorithm based on PSO was proposed to address the multiple sequence alignment problem with the SP score. The proposed approach was tested on several protein families and compared with the alignments generated by the Clustal X 2.0 algorithm. The simulation results show that the proposed PSOMSA algorithm outperforms Clustal X. In future work, additional experimentation should be performed, including experiments with nucleic acid sequences, the use of other algorithms to find initial sequence alignments, the use of other scoring schemes based on PAM or BLOSUM matrices, and improving the speed of PSOMSA.

Acknowledgments. This research was supported by the NSFC (60573065), the Natural Science Foundation of Shandong Province (Y2007G33), and the Key Subject Research Foundation of Shandong Province.
References 1. Wang, L., Jiang, T.: On the Complexity of Multiple Sequence Alignment. J. Comput. Biol. 1(4), 337–348 (1994) 2. Needleman, S.B., Wunsch, C.D.: A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 48, 443–453 (1970) 3. Lawrence, C.E., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment. Science 262, 208–214 (1993) 4. Hogeweg, P., Hesper, B.: The Alignment of Sets of Sequences and the Construction of Phyletic Trees: an Integrated Method. J. Mol. E 20, 175–186 (1984) 5. Feng, D.F., Doolittle, R.F.: Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. J. Mol. E 25(4), 351–360 (1987)
6. Thompson, J.D., Higgins, D.G., Gibson, T.J.: Clustal W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Positionspecific Gap Penalties and Weight Matrix Choice. Nucleic Acids Res. 22(22), 4673–4680 (1994) 7. Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment. J. Mol. Biol. 302(1), 205–217 (2000) 8. Bahr, A., Thompson, J.D., Thierry, J.C., Poch, O.: Balibase (Benchmark Alignment Database): Enhancements for Repeats, Transmembrane Sequences and Circular Permutations. Nucl. Acids Res. 29(1), 323–326 (2001) 9. Thompson, J.D., Plewniak, F., Poch, O.: Balibase: A Benchmark Alignment Database for the Evaluation of Multiple Alignment Programs. Bioinformatics 15(1), 87–88 (1999) 10. Notredame, C., Higgins, D.G.: Saga: Sequence Alignment by Genetic Algorithm. Nucleic Acids Res. 24(8), 1515–1524 (1996) 11. Hirosawa, M., Totoki, Y., Hoshida, M., Ishikawa, M.: Comprehensive Study on Iterative Algorithms of Multiple Sequence Alignment. Comput. Appl. Biosci. 11(1), 13–18 (1995) 12. Henikoff, S., Henikoff, J.G.: Amino Acid Substitution Matrices from Protein Blocks. Proc. Natl. Acad. Sci. USA 89(22), 10915–10919 (1992) 13. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Press, Piscataway (1995) 14. Eberhart, R., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proc. 6th Int Sympossum on Micro Machine and Human Science, pp. 39–43. IEEE Press, Piscataway (1995) 15. Rodriguez, P.F., Nino, L.F., Alonso, O.M.: Multiple Sequence Alignment using Swarm Intelligence. International Journal of Computational Intelligence Research 3(2), 123–130 (2007) 16. National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov 17. Clustal W Download Page, http://www.clustal.org/download/current/
Inference of Differential Equation Models by Multi Expression Programming for Gene Regulatory Networks Bin Yang, Yuehui Chen, and Qingfang Meng Computational Intelligence Lab. School of Information Science and Engineering University of Jinan, Jinan 250022, P.R. China
[email protected] Abstract. This paper presents an evolutionary method for identifying the gene regulatory network from the observed time series data of gene expression using a system of ordinary differential equations (ODEs) as a model of network. The structure of ODE is inferred by the Multi Expression Programming (MEP) and the ODE’s parameters are optimized by using particle swarm optimization (PSO). The proposed method can acquire the best structure of the ODE only by a small population, and also by partitioning the search space of system of ODEs can be reduced significantly. The effectiveness and accuracy of the proposed method are demonstrated by using synthesis data from the artificial genetic networks. Keywords: Evolutionary method, multi expression programming, ordinary differential equations, particle swarm optimization, artificial genetic networks.
1 Introduction
Gene expression programs, which produce the living cell and involve regulated transcription of thousands of genes, depend on the recognition of specific promoter sequences by transcriptional regulatory proteins. The question is how a collection of regulatory proteins associated with genes can be described as a transcriptional regulatory network, and the most important step is to identify the interactions among genes by modeling gene regulatory networks. Many models have been proposed to describe such networks, including the Boolean network [2][18], the dynamic Bayesian network [3], systems of differential equations [4], and so on. A recent review of inferring genetic regulatory networks based on data integration and dynamical models can be found in ref. [19]. A system of differential equations is a powerful and flexible model for describing complex relations among components [6], so many methods have been proposed for inferring a system of differential equations for gene regulatory networks during the last few years. However, it is hard to determine a suitable form for the equations that describe the network. In previous studies, the form of the differential equations was fixed. The
only goal was to optimize parameters and coefficients. For example, Tominaga used genetic algorithms (GA) to optimize the parameters of a fixed form of a system of differential equations [5]. In recent years some researchers have studied the learning of gene regulatory networks by inferring both the structure and the parameters of a system of ODEs. Sakamoto proposed an ODE identification method using least mean squares (LMS) along with ordinary genetic programming (GP) to identify the gene regulatory network [6]. Cho proposed a new representation named S-tree based GP to identify the structure of a gene regulatory network and to estimate the corresponding parameter values at the same time [7]. Qian applied GP to identify the structure of the model and Kalman filtering to estimate the parameters in each iteration; both standard and robust Kalman filtering were considered [1]. However, these inference algorithms can only be applied to small-scale networks. In this paper, we propose a new method in which Multi Expression Programming (MEP) and particle swarm optimization (PSO) are employed to overcome the structure-form and parameter identification problems in discovering a system of differential equations. We infer the structure of the right-hand sides of the ODEs by MEP and optimize the parameters of the ODEs by PSO. Compared with the GP used in previous studies, a MEP chromosome encodes several genes and each gene can represent an ODE, so the best structure of an ODE can be found with only a small population. To reduce the complexity of the genetic network inference problem, partitioning [8] is used in the identification of the structure of the system: each ODE can be inferred separately, and the search space is reduced rapidly. Thus, for large-scale networks, our method performs better.
2 Method

2.1 Structure Optimization of Models Using MEP
Encoding. MEP [9] is a technique in evolutionary algorithms first introduced in 2002 by Oltean. Traditional GP [10] encodes a single expression; by contrast, a MEP chromosome contains several genes. Each gene encodes a tree whose nodes contain terminal or function symbols selected from a terminal set T or a function set F; the two sets are pre-defined for a given problem. A gene that encodes a function includes pointers towards the function's arguments. The best of the encoded solutions is chosen to represent the chromosome [11]. We use MEP to identify the form of the system of differential equations. For this purpose, we encode the right-hand side of each ODE into a MEP chromosome. For example, an ODE model of the form

\dot{X}_1 = aX_1 + bX_2, \qquad \dot{X}_2 = cX_1, \qquad \dot{X}_3 = dX_2 + e    (1)
Fig. 1. Example of a system of ODEs encoded as three MEP chromosomes
can be represented as three MEP chromosomes {E3, E6, E3}, as illustrated in Fig. 1, where the coefficients a, b, c, d, e are derived by PSO (as described later in this paper). We infer the system of ODEs with partitioning. With partitioning, the equations describing each variable of the system can be inferred separately, which significantly reduces the search space: a candidate equation for a single variable is integrated by substituting references to other variables with data from the observed time series. This allows us to infer the structure of systems comprising more variables and a higher degree of coupling than could be inferred by other methods [8].

Genetic Operators. The genetic operators used within the MEP algorithm are crossover and mutation. (1) Crossover: we use one-point crossover. Two parents are selected according to the predefined crossover probability P_c, one crossover point is chosen at random, and the parents exchange the sequences after this point. (2) Mutation: one parent is selected according to the predefined mutation probability P_m and one mutation point is chosen at random. If the mutation point encodes a function symbol, it may be changed into a terminal symbol or into another function with randomly generated arguments and parameters; function arguments and parameters may likewise be mutated into randomly produced ones.
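A sketch of decoding one MEP gene into the expression it represents: terminal genes hold a variable, and function genes point back to earlier genes as arguments, in the spirit of the chromosomes of Fig. 1. The tuple layout of the chromosome is an illustrative assumption.

```python
def decode_gene(chromosome, index):
    """Recursively decode gene `index` of an MEP chromosome into an expression string."""
    gene = chromosome[index]
    if gene[0] == 'var':                   # terminal gene, e.g. ('var', 'X1')
        return gene[1]
    op, left, right = gene                 # function gene, e.g. ('+', 0, 1)
    return '(' + decode_gene(chromosome, left) + op + decode_gene(chromosome, right) + ')'

# Example: genes 0 and 1 are terminals, gene 2 adds them, gene 3 multiplies gene 0 by gene 2.
chrom = [('var', 'X1'), ('var', 'X2'), ('+', 0, 1), ('*', 0, 2)]
print(decode_gene(chrom, 3))   # (X1*(X1+X2))
```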
2.2 Parameter Optimization of Models Using PSO
At the beginning of this process, we check all the constants contained in each equation, i.e., count their number n_i and record their positions. The distribution of parameters in a chromosome is illustrated in Fig. 2. According to n_i, the particles are generated randomly at initialization. Each particle x_i represents a potential solution. A swarm of particles moves through the space,
Fig. 2. Distribution of parameters in each chromosome
with the moving velocity of each particle represented by a velocity vector v_i. At each step, each particle is evaluated and keeps track of its own best position, associated with the best fitness it has achieved so far, in a vector Pbest_i; the best position among all particles is kept as Gbest [12]. A new velocity for particle i is computed by

v_i(t+1) = v_i(t) + c_1 r_1 (Pbest_i - x_i(t)) + c_2 r_2 (Gbest(t) - x_i(t)),    (2)

where c_1 and c_2 are positive constants and r_1 and r_2 are uniformly distributed random numbers in [0, 1]. Based on the updated velocities, each particle changes its position according to

x_i(t+1) = x_i(t) + v_i(t+1).    (3)

2.3 Fitness Definition
For inferring a system of ODEs, the fitness of each variable is defined as the sum of the squared errors plus a penalty for the degree of the equation:

fitness(i) = \sum_{k=0}^{T-1} \left( x_i(t_0 + k\Delta t) - \hat{x}_i(t_0 + k\Delta t) \right)^2 + a,    (4)
where t_0 is the starting time, Δt is the step size, T is the number of data points, x_i(t_0 + kΔt) is the observed value of the i-th variable, and \hat{x}_i(t_0 + kΔt) is the corresponding output of the ODE model. All model outputs are calculated using the fourth-order Runge–Kutta method, and a is the penalty for the degree of the equation.
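A sketch of the fitness of Eq. (4): the candidate ODE for variable i is integrated with a classical fourth-order Runge–Kutta step and compared with the observed series. The interfaces (rhs_i taking the current value plus the observed values of the other variables, reflecting partitioning) and the penalty term are assumptions of this sketch.

```python
def rk4_step(f, x, t, dt):
    """One classical fourth-order Runge-Kutta step for x' = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + dt / 2.0, x + dt * k1 / 2.0)
    k3 = f(t + dt / 2.0, x + dt * k2 / 2.0)
    k4 = f(t + dt, x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def fitness(rhs_i, observed_i, observed_all, t0, dt, penalty=0.0):
    """Eq. (4): squared error between predicted and observed x_i, plus a penalty a.
    With partitioning, the other variables are taken from the observed data."""
    x = observed_i[0]
    error = 0.0
    for k in range(len(observed_i) - 1):
        f = lambda t, xi, k=k: rhs_i(xi, observed_all[k])   # substitute observed values
        x = rk4_step(f, x, t0 + k * dt, dt)
        error += (x - observed_i[k + 1]) ** 2
    return error + penalty
```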
2.4 Summary of Algorithm
The proposed method for the optimal design of the system of ODEs can be described as follows.
(1) Create an initial population randomly (structures and their corresponding parameters);
(2) Optimize structures by MEP as described in subsection 2.1;
(3) At some interval of generations, select the better structures and optimize their parameters by PSO as described in subsection 2.2; in this process the structure is fixed;
(4) If a satisfactory solution is found, stop; otherwise go to step (2).
3 Experimental Results
We have prepared two tasks to test the effectiveness of our method. Experimental parameters are summarized in Table 1. The function and terminal sets F and T are

F = \{+, -, *, x^a\}, \qquad T = \{X_1, ..., X_n, 1\}.    (5)

3.1 Example 1: The Small Artificial Gene Regulatory Network
Fig. 3 shows an example of a gene regulatory network. This type of network can be modeled as a so-called S-system model [14]. This model is based on approximating kinetic laws with multivariate power-law functions. A model consists of n non-linear ODEs, and the generic form of equation i is

\dot{X}_i(t) = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}}(t) - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}(t),    (6)
where X is a vector of dependent variables, α and β are vectors of non-negative rate constants, and g and h are matrices of kinetic orders. The parameters of the genetic network are given in Table 2, and the initial conditions are {0.7, 0.12, 0.14, 0.16, 0.18} for X_1, X_2, X_3, X_4, X_5 [13]. The experimental parameters for this task are shown in Table 1. The search region of the parameters was [0.0, 15.0]. Five runs were carried out, and in each run the proposed method produced one candidate solution. Every 30 generations, the 10 better structures are selected and their parameters optimized by PSO; a run ends when the maximum number of generations is reached or the best model is found.

Table 1. Parameters for experiments
Parameter        Exp1    Exp2
Population size  1000    1000
Generation       500     2000
Crossover rate   0.7     0.7
Mutation rate    0.3     0.3
Time series      1       1
Stepsize         0.01    0.01
Data points      15      20
Gene size        5       15
Fig. 3. The targeted gene regulatory network
Table 2. Parameters of the genetic network system

i   α_i    non-zero g_{i,j}                β_i    non-zero h_{i,j}
1   5.0    g_{1,3} = 1.0, g_{1,5} = -1.0   10.0   h_{1,1} = 2.0
2   10.0   g_{2,1} = 2.0                   10.0   h_{2,2} = 2.0
3   10.0   g_{3,2} = -1.0                  10.0   h_{3,2} = -1.0, h_{3,3} = 2.0
4   8.0    g_{4,3} = 2.0, g_{4,5} = -1.0   10.0   h_{4,4} = 2.0
5   10.0   g_{5,4} = 2.0                   10.0   h_{5,5} = 2.0
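A sketch of evaluating the S-system right-hand side of Eq. (6) for all genes, given rate constants and kinetic-order matrices such as those in Table 2 (NumPy is assumed; the interface is illustrative).

```python
import numpy as np

def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij (Eq. 6)."""
    x = np.asarray(x, dtype=float)
    production = alpha * np.prod(x ** g, axis=1)    # row i: prod over j of x_j^g_ij
    degradation = beta * np.prod(x ** h, axis=1)
    return production - degradation
```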
To handle the powers of the component variables, we used the following terminal set:

T = \{X_1, X_1^{-1}, X_2, X_2^{-1}, X_3, X_3^{-1}, X_4, X_4^{-1}, X_5, X_5^{-1}\}.    (7)
The experiments were performed on a Windows XP system with an Intel Pentium Dual 2.00 GHz processor and 1 GB of memory. Our method produced the following system of ODEs, and the simulations further confirm that the identified system is quite close to the original system (Fig. 4):

\dot{X}_1 = 4.999994 X_3 X_5^{-1} - 9.999994 X_1^2
\dot{X}_2 = 10.000023 X_1^2 - 10.000014 X_2^2
\dot{X}_3 = 10.000016 X_2^{-1} - 10.000015 X_3^2 X_2^{-1}    (8)
\dot{X}_4 = 8.000003 X_3^2 X_5^{-1} - 10.000001 X_4^2
\dot{X}_5 = 9.999994 X_4^2 - 10.000019 X_5^2
980
B. Yang, Y. Chen, and Q. Meng
1.4
1.2
Concentrations
1
0.8
X1 Pred X1 X2 Pred X2 X3 Pred X3 X4 Pred X4 X5 Pred X5
0.6
0.4
0.2
0
0
2
4
6
8 10 Time Points
12
14
16
Fig. 4. Time series of the acquired model
Table 3. Obtained parameters of the ODEs by our proposed method and by S-tree (GP); each cell lists the three reported values, which the caption distinguishes as the parameters of our proposed method, of S-tree (GP), and the true parameters

i    α_i                         β_i
1    8.5854 / 4.9999 / 5.0       13.7959 / 9.9999 / 10.0
2    9.7709 / 10.0000 / 10.0     10.0117 / 10.0000 / 10.0
3    13.7629 / 10.0000 / 10.0    13.9742 / 10.0000 / 10.0
4    8.3954 / 8.0000 / 5.0       13.7959 / 10.0000 / 10.0
5    9.4643 / 9.9999 / 5.0       13.7959 / 10.0000 / 10.0
number of iterations are far smaller (Table 1). The execution time for each experiment was ∼400 s. We obtained the true structure in every experiment. Table 3 shows the best parameters obtained among all experiments; the parameters obtained by our method are clearly very close to those of the targeted model.
3.2 Experiment 2: The Large-Scale Artificial Gene Regulatory Network with Noisy Environment
This test system, of the same type as in Experiment 1, is a reduced version of the 30-gene test system introduced in [15]. Fig. 5 shows the gene regulatory network; the system represents a genetic network with 15 variables, and Table 4 shows the parameters of its S-system formalism. Since a relatively large number of parameters and the structure of the system of ODEs have to be estimated from a small data set, there can be many different possible network structures that all produce only small differences in fitting the given data. These false candidates can be reduced by shrinking the structural search space using an available constraint: all diagonal elements of the matrix h are non-zero (h_ii for i = 1, ..., n) [7], i.e., the i-th equation must contain X_i. This is because the higher the concentration X_i, the more actively X_i can participate in the reaction (i.e., it disappears faster) [7].
Fig. 5. The large-scale artificial gene regulatory network

Table 4. S-system parameters of the large-scale target model

α_i = 1.0; β_i = 1.0
g_{i,j}: g_{1,14} = -0.1, g_{3,12} = -0.2, g_{5,1} = 1.0, g_{6,1} = 1.0, g_{7,2} = 0.5, g_{7,3} = 0.4, g_{8,4} = 0.2, g_{9,5} = 1.0, g_{9,6} = -0.1, g_{10,7} = 0.3, g_{11,4} = 0.4, g_{11,7} = -0.2, g_{12,13} = 0.5, g_{13,8} = 0.6, g_{14,9} = 1.0, g_{14,15} = -0.2, g_{15,10} = 0.2, all other g_{i,j} = 0.0
h_{i,j} = 1.0 if i = j, 0.0 otherwise
The time-series data set started from randomly generated initial values and was obtained by solving the system of differential equations of the targeted model. In previous studies of this large-scale artificial gene regulatory network, the form of the differential equations was fixed and the only goal was to optimize parameters and coefficients [16]. Here we apply Multi Expression Programming to evolve the right-hand sides of the equations. The experimental parameters for this task are shown in Table 1. Ten runs were carried out, the search region of the parameters was [-1.0, 1.0], and the execution time for each experiment was ∼1.8 h. In the experiments we obtained the same structure and parameters as the targeted model (Fig. 5 and Table 4) except for the 11th gene, for which we obtained \dot{X}_{11} = X_7^{-0.199999} X_{11}^{0.018851} - X_{11}; thus only the interaction X_4 → X_11 could not be identified. To test the performance of our method in a real-world setting, we added 1, 2, 5, 10 and 15% Gaussian noise to the time-series data in order to simulate the measurement noise that often corrupts data obtained from actual measurements of gene expression. Except that the population size was fixed at 10,000, the other settings were the same as in the previous experiment. Within the same execution time per run, the data with 1, 2, 5 and 10% Gaussian noise yielded the same structure as the noise-free data. When the noise ratio rises to 15%, the interactions X_3 → X_7 and X_4 → X_11 are not identified. Hence, we conclude that the proposed algorithm is robust to up to 10% random noise, and we expect the method to perform well on real-world networks.
4 Conclusion
In this paper, a new approach for evolving ODEs from observed time series is proposed, using MEP along with the PSO algorithm. By applying the proposed method to the identification of artificial genetic networks from generated time-course data, we have succeeded in creating systems of ODEs that are very close to the targeted systems. Simulation results show that the method generates the correct genetic networks even when Gaussian noise is added to the time-series data. The key problem in inferring a genetic network is how to reduce the computational complexity for a real genetic network. This problem can be addressed by the proposed method, enhancing its applicability to large-scale genetic regulatory networks. The method has two advantages: (1) a MEP chromosome encodes several expressions, so the best structure of an ODE can be found with only a small population; (2) with partitioning, the best system can be found quickly, since each node of the genetic regulatory network can be inferred separately and the search space is reduced significantly. Thus the proposed method is suitable for inferring large-scale genetic regulatory networks. Finally, as for any system identification method, the possibility of generating several time-series data sets displaying a variety of dynamical behaviors of the system will be critical for applying the method to larger systems. In the future, we will apply our approach to real biochemical network problems, especially real large-scale biochemical networks.
Acknowledgment. This research was supported by the NSFC (60573065), the Natural Science Foundation of Shandong Province (Y2007G33), and the Key Subject Research Foundation of Shandong Province.
References 1. Qian, L.: Inference of Noisy Nonlinear Differential Equation Models for Gene Regulatory Networks using Genetic Programming and Kalman Filtering. IEEE Transactions on Signal Processing 56(7), 3327–3339 (2008) 2. Akutsu, T., Miyano, S., Kuhara, S.: Identification of Genetic Networks from a Small Number of Gene Expression Patterns under the Boolean Network Model. In: Proc. of Pacific Symposium on Biocomputing, pp. 17–28 (1999) 3. Murphy, K., Mian, S.: Modeling Gene Expression Data using Dynamic Bayesian Network. Computer Science Division, University of California Berkeley (1999) 4. Chen, T., He, H.L., Church, G.M.: Modeling Gene Expression with Differential Equations. In: Proc. of Pacific Symposium on Biocomputing, pp. 29–40 (1999) 5. Tominaga, D., Koga, N., Okamoto, M.: Efficient Numerical Optimization Algorithm Based on Genetic Algorithm for Inverse Problem. In: Proc. of Genetic and Evolutionary Computation Conference, pp. 251–258 (2000)
6. Sakamoto., E., Iba, H.: Inferring a System of Differential Equations for a Gene Regulatory Network by using Genetic Programming. In: Proc. Congress on Evolutionary Computation, pp. 720–726 (2001) 7. Cho, D.Y., Cho, K.H., Zhang, B.T.: Identification of Biochemical Networks by S-tree Based Genetic Programming. Bioinformatics 22, 1631–1640 (2006) 8. Bongard, J., Lipson, H.: Automated Reverse Engineering of Nonlinear Dynamical Systems. Proceedings of the National Academy of Science 104, 9943–9948 (2007) 9. Gro¸san, C., Abraham, A., Han, S.-Y.: MEPIDS: Multi-expression programming for ´ intrusion detection system. In: Mira, J., Alvarez, J.R. (eds.) IWINAC 2005. LNCS, vol. 3562, pp. 163–172. Springer, Heidelberg (2005) 10. Andrew, H.W., et al.: System Identification using Genetic Programming. In: Proc. of 2nd Int. Conference on Adaptive Computing in Engineering Design and Control (1996) 11. Oltean, M., Grosan, C.: Evolving Digital Circuits using Multi Expression Programming. In: Zebulum, R., et al. (eds.) NASA/DoD Conference on Evolvable Hardware, Washington, pp. 24–26 (2004) 12. Chen, Y.H., Yang, B., Abraham, A.: Ajith Abraham. Flexible Neural Trees Ensemble for Stock Index Modeling. Neurocomputing. 70, 697–703 (2007) 13. Gennemark, P., Wedelin, D.: Efficient Algorithms for Ordinary Differential Equation Model Identification of Biological Systems. IET Syst Biol 1, 120–129 (2007) 14. Savageau, M.A.: Biochemical Systems Analysis: a Study of Function and Design in Molecular Biology. Addison-Wesley Pub. Co., Advanced Book Program, Reading (1976) 15. Maki, Y., Tominaga, D., Okamoto, M., Watanabe, S., Eguchi, Y.: Development of a System for the Inference of Large Scale Genetic Networks. In: Pac. Symp. Biocomput.., pp. 446–458 (2001) 16. Kimura, S., Ide, K., Kashihara, A.: Inference of S-system Models of Genetic Networks using a Cooperative Coevolutionary Algorithm. Bioinformatics 21, 1154–1163 (2005) 17. Kikuchi, S., et al.: Dynamic Modeling of Genetic Networks using Genetic Algorithm and S-system. Bioinformatics 19, 643–650 (2003) 18. Bornholdt, S.: Boolean Network Models of Cellular Regulation: Prospects and Limitations. J. R. Soc. Interf. 5, 85–94 (2008) 19. Hecker, M., Lambeck, S., Toepfer, S., van Someren, E., Guthke, R.: Gene Regulatory Network Inference: Data Integration in Dynamic Models A Review. Biosystems 96, 86–103 (2009)
Function Sequence Genetic Programming Shixian Wang, Yuehui Chen, and Peng Wu Computational Intelligence Lab. School of Information Science and Engineering University of Jinan Jiwei road 106, Jinan 250022, Shandong, P.R. China {ise wangsx,yhchen,ise wup}@ujn.edu.cn
Abstract. Genetic Programming (GP) can obtain a program structure to solve complex problems. This paper presents a new form of Genetic Programming, Function Sequence Genetic Programming (FSGP). We adopt a function set as in Genetic Programming, and define a data set corresponding to its terminal set. Besides input data and constants, the data set includes medium variables, which are used not only as arguments of functions but also as temporary variables that store function return values. A program individual is given as a function sequence instead of a tree or graph. All functions run in order, and the result of the executed program is the return value of the last function in the sequence. This representation is closer to a real hand-written program. Moreover, it has the advantage that genetic operations are easy to implement since the function sequence is linear. We apply FSGP to the factorial problem and to stock index prediction. The initial simulation results indicate that FSGP is more powerful than conventional genetic programming in both execution time and solution accuracy. Keywords: Genetic Programming, Function Sequence Genetic Programming, factorial problem, stock index prediction.
1 Introduction
Genetic Programming (GP) [4][7] can evolve a program structure to solve complex problems. It uses a parse tree to represent a program: the tree depicts an expression, the internal nodes of the tree are functions, and the external leaves are terminals (input data or constants) that serve as function arguments. Evolving program trees was popularized by the work of Koza [4][7]. To mimic true program structure, many variants of Genetic Programming have been presented, each with a different representation of program structure. Linear Genetic Programming (LGP) [9] directly uses binary machine-code strings to represent programs; such a representation is a real program that can be executed directly during fitness calculation, but it has poor portability because machine code depends on the specific machine. Markus introduced an interpreting variant of linear genetic programming [1]: an individual program is represented as a variable-length string of simple C instructions, and each C instruction is encoded in 4 bytes holding the operation identifier,
indexes of the participating variables and a constant value. With this representation, programs of an imperative language (like C) are evolved, instead of the tree-based GP expressions of a functional programming language (like LISP). Line-Tree Genetic Programming [2] combines Markus's approach [1] with tree-based Genetic Programming [4][7]: the program tree has two kinds of nodes, nodes containing linear C instruction strings and branch nodes. Depending on conditions, a branch node selects a linear string node to execute, and the whole program runs from the root of the tree to a leaf. Line-Graph Genetic Programming [3] is a natural expansion, since several branch nodes may select the same following nodes and a graph is more general than a tree. In order to evolve more complex programs for difficult or specific problems, other program evolution schemes based on graphs have also been presented. Parallel Algorithm Discovery and Orchestration (PADO) [8] is one of the graph-based GPs: PADO, with action and branch-decision nodes, uses stack memory and indexed memory, and its execution is carried out from the start node to the end node of the network. Poli proposed an approach using a graph with function and terminal nodes located over a grid [10]. In Graph Structured Programming Evolution (GRAPE) [11], a program is depicted as an arbitrary directed graph of nodes and a data set, and the genotype of GRAPE takes the form of a linear string of integers. This paper proposes a novel method, Function Sequence Genetic Programming (FSGP), to evolve programs. The details of FSGP are described in Section 2. In Section 3, we apply this approach to two problems, the factorial problem and stock index prediction. In Section 4, conclusions are drawn.
2 FSGP

2.1 Representation
FSGP is inspired by the actual procedure of programming. We assume that the operators of a programming language have been implemented as functions. When we write a program, we may define some variables and use several constants. One group of variables, denoted data variables, stores the input data; the other group, denoted medium variables here, changes frequently and stores function return values temporarily. All defined variables and constants can serve as arguments of the implemented functions. Of course, some variables may never be used, or their use may have no effect on the result of the whole program. After defining variables and constants, we write a specific sequence of functions aimed at a specific problem; the argument variables of each function in the sequence are given explicitly, as are the medium variables that store the function return values. All functions run in order according to their position in the sequence. The individual program in our approach is a fixed-length function sequence instead of a tree or graph, and a data set D, different from those of other evolving paradigms, is adopted. Each function in the sequence comes from a function set F = {f_1, f_2, f_3, ..., f_m}, which has the same definition as in Genetic Programming [4].
[Figure 1 content. Function set: Add (f0), Sub (f1), Mult (f2), Div (f3), GT_J (f4), LT_J (f5), Eql_J (f6), Null (f7). Data set: v0 (d0), v1 (d1), v2 (d2), v3 (d3), v4 (d4), constant 1 (d5). Example entries of a function sequence: f0,d0,d1,d1; f3,d3,d1,d3 (i.e., v3 = Div(v3, v1)); f6,d1,d4,0,3; f3,d5,d3,d2.]
Fig. 1. An individual program in FSGP for the factorial problem (see Section 3.1). The function sequence holds the messages of 4 functions. All functions in the sequence are executed in order after an integer is input to the data variable (v4) and the medium variables (v0, v1, v2, v3) are initialized with 1. The second position in the sequence represents the function Eql_J(v1, v4, 0, 3): if v1 equals v4, the function returns 0, otherwise 3, and the individual program jumps there and continues to run. The output of the last function, stored in v2, is the result of this individual.
Programming [4]. The data set D, as the counterpart of the terminal set T in Genetic Programming, is extended in this representation: it includes data variables storing input data, constant(s), and medium variables. All members of the data set D can serve as arguments of functions, and beyond their use as arguments all medium variables can store function return values. Each position of the function sequence holds the message necessary for the execution of a function; generally the indexes of the function, its arguments, and the medium variable(s) are included. Some functions carry special messages: GT_J, LT_J, and Eql_J (as defined in Section 3.1) have 4 arguments and need a function index, two argument indexes, and two additional integers indicating positions in the function sequence. Figure 1 illustrates an FSGP model. The sequence of functions is generated randomly and then optimized by an evolutionary algorithm. The number of functions is the length of the individual program. All functions in the sequence run in order, and the final result of an executed program is the return value of the last function. The function sequence in FSGP is linear, like Brameier's approach [1], but the differences are also obvious: (1) besides implementing the instructions of a programming language, domain-specific functions in FSGP can usually take more than 2 arguments; (2) modules or motifs can also be initialized as function sequences in FSGP.
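To make the representation concrete, the following sketch (ours, not the authors' code; all names and the exact encoding are illustrative) stores an FSGP individual as a fixed-length list of entries, each holding a function name, two argument indexes into the data set D, the index of the medium variable that receives the return value, and, for the jump functions, two extra position integers.

import random

FUNCTIONS = ["Add", "Sub", "Mult", "Div", "GT_J", "LT_J", "Eql_J", "Null"]
JUMP_FUNCS = {"GT_J", "LT_J", "Eql_J"}
DATA_SET_SIZE = 6          # v0..v4 plus the constant 1 (indexes d0..d5), as in Figure 1
MEDIUM_VARS = 4            # v0..v3 may store function return values

def random_entry(seq_len):
    # One sequence position: (function name, two argument indexes, result slot, jump targets or None)
    name = random.choice(FUNCTIONS)
    a, b = random.randrange(DATA_SET_SIZE), random.randrange(DATA_SET_SIZE)
    out = random.randrange(MEDIUM_VARS)
    jumps = (random.randrange(seq_len), random.randrange(seq_len)) if name in JUMP_FUNCS else None
    return (name, a, b, out, jumps)

def random_individual(seq_len=15):
    # A fixed-length function sequence, e.g. 15 entries for the factorial experiment (Section 3.1)
    return [random_entry(seq_len) for _ in range(seq_len)]

Evolution only ever manipulates such fixed-length lists; no tree or graph structure is needed.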
Fig. 2. Crossover operator of FSGP
Fig. 3. Mutation operator of FSGP
2.2 Evolutionary Process
In genetic programming, evolutionary algorithms are used to search for the optimal program structure according to an objective function. Various evolutionary algorithms have been developed, such as Genetic Algorithms (GA), Evolution Strategies (ES), Evolutionary Programming (EP), and their variants. For finding an optimal function sequence in FSGP, the operators used in Genetic Algorithms are employed because of their simplicity and effectiveness. The key steps of the evolutionary process are described as follows. Crossover. The crossover used here is the one-cut-point method: two parental program individuals and a position in their function sequences are selected randomly according to the crossover probability Pc, and the right parts of the two parent function sequences are exchanged; the information of the exchanged functions is kept in this course. Figure 2 illustrates the recombination operation. Mutation. A position in the sequence is selected randomly according to the mutation probability Pm; a new function message is then generated randomly and placed into the selected position. Figure 3 illustrates the mutation procedure. Reproduction. Reproduction simply chooses an individual in the current population and copies it without change into the new population. In this phase we adopt roulette selection along with the elitist strategy: the best individual in the current population is chosen and copied into the new population.
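The three operators are straightforward on fixed-length sequences. The sketch below is our illustration, not the authors' implementation; the new_entry argument of mutate stands for a generator of random function messages such as random_entry from the previous sketch, and the default probabilities are only examples.

import random

def crossover(parent1, parent2, pc=0.9):
    # One-cut-point crossover: exchange the right parts of the two sequences
    if random.random() < pc and len(parent1) > 1:
        cut = random.randrange(1, len(parent1))
        return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]
    return parent1[:], parent2[:]

def mutate(individual, new_entry, pm=0.5):
    # Replace the entry at a randomly chosen position with a freshly generated function message
    child = list(individual)
    if random.random() < pm:
        pos = random.randrange(len(child))
        child[pos] = new_entry(len(child))
    return child

def roulette_with_elitism(population, fitnesses):
    # Keep the best individual unchanged and fill the rest by fitness-proportional selection
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    total = sum(fitnesses) or 1.0
    def pick():
        r, acc = random.uniform(0, total), 0.0
        for ind, fit in zip(population, fitnesses):
            acc += fit
            if acc >= r:
                return ind
        return population[-1]
    return [population[best]] + [pick() for _ in range(len(population) - 1)]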
It is clear that this evolutionary process is simpler than that of other GP variants based on tree and graph structures.
3 Experimental Results
In order to verify the efficiency of FSGP, we applied the approach to two problems, the factorial problem and stock index prediction. With the first we aimed to test the ability of FSGP to construct complex programs, and with the second its ability to construct prediction models. The parameters of FSGP for both experiments are given in Table 1.

Table 1. The parameters of the FSGP algorithm
  Parameter                               Value
  Generations (factorial problem)         200,000
  Generations (stock index prediction)    2,000
  Population size                         100
  Crossover rate Pc                       0.9
  Mutation rate Pm                        0.5

3.1 Factorial Problem
The objective is to evolve a program that calculates the factorial of an input integer. We used the same data as in GRAPE [11]. The training data are the input/output pairs (a, b): (0, 1), (1, 1), (2, 2), (3, 6), (4, 24), (5, 120); the integers from 6 to 12 were used as the test set. We defined the function set F = {Add, Sub, Mult, Div, GT_J, LT_J, Eql_J, Null}. The first four functions implement the operators +, -, *, and /, respectively. The functions GT_J, LT_J, and Eql_J have four arguments, x0, x1, x2, x3, and implement the relational operators >, <, and ==, respectively. The operator is applied to x0 and x1, while x2 and x3 are two positions in the function sequence: if the comparison is true, the function returns x2, otherwise x3, and the individual program jumps to the returned position and continues to run. The Null function, used to reduce the effective size of a program, does nothing. We constructed a function sequence containing 15 functions and the data set D = {v0, v1, v2, v3, v4, 1}; the first four variables are used as medium variables, and v4 as the data variable. We used the "number of hits" as the fitness value. The fitness function used in this experiment is

fitness = r / n    (1)

where r is the number of training cases computed correctly and n is the total number of training cases. In order not to be trapped in infinite execution, a program exits and its fitness is set to 0 if it executes more than 200 functions. A higher fitness value indicates better performance. A program individual whose fitness equals 1 is considered competent and is then verified on the test data.
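The execution and scoring rules above can be summarized in a short interpreter. This is our sketch, not the paper's code: it consumes the tuple encoding from the earlier sketch, treats division by zero as returning 1 (an assumption, since the paper does not specify it), uses a small tolerance for the floating-point hit test (also an assumption), and reads the program output from v2 as in Figure 1.

def run_sequence(seq, n, max_steps=200):
    # Execute a function sequence on input integer n; return the content of v2, or None on overflow
    v = [1.0, 1.0, 1.0, 1.0, float(n), 1.0]       # v0..v3 medium vars, v4 input, constant 1 (d5)
    pos, steps = 0, 0
    while pos < len(seq):
        steps += 1
        if steps > max_steps:                     # abort runaway programs (the 200-function limit)
            return None
        name, a, b, out, jmp = seq[pos]
        x, y = v[a], v[b]
        if name in ("GT_J", "LT_J", "Eql_J"):     # relational functions return a position to jump to
            cond = {"GT_J": x > y, "LT_J": x < y, "Eql_J": x == y}[name]
            pos = jmp[0] if cond else jmp[1]
            continue
        if name == "Add":    v[out] = x + y
        elif name == "Sub":  v[out] = x - y
        elif name == "Mult": v[out] = x * y
        elif name == "Div":  v[out] = x / y if y != 0 else 1.0   # protected division (our assumption)
        # "Null" does nothing
        pos += 1
    return v[2]

def fitness(seq, cases, tol=1e-9):
    # "Number of hits" / total; a program exceeding the step budget gets fitness 0
    hits = 0
    for n, target in cases:
        out = run_sequence(seq, n)
        if out is None:
            return 0.0
        if abs(out - target) <= tol:
            hits += 1
    return hits / len(cases)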
We obtained fifteen appropriate program structures in twenty independent runs. This result shows that FSGP is more efficient on the factorial problem than GRAPE [11], whose best success rate on the test set was 59% with 2,500,000 evaluations. One of the structures evolved by FSGP corresponds to the following pseudo code.

1. double v0, v1, v2, v3, v4;
2. const double v5 = 1;
3. initialize v0, v1, v2, v3 with 1;
4. input an integer to v4;
5. v1 = Add(v0, v1);
6. v3 = Div(v3, v1);
7. if (!(v1 == v4)) go to 5;
8. v2 = Div(v5, v3);
9. v2 is the result of the program.

This program implements a novel idea for the factorial problem. When computing the factorial of an integer n, it uses division to obtain v3 = 1/(2*3*...*n) through n iterations, and then sets v2 = 1/v3 as the result of the program, which equals n!.
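The evolved structure can be transcribed directly to check the idea; the transcription below is ours and follows the pseudo code line by line. Note that with a literal reading, the inputs 0 and 1 would loop until the 200-function budget is exhausted, so the loop here is bounded the same way.

def evolved_factorial(n, max_steps=200):
    v0 = v1 = v2 = v3 = 1.0
    v5, v4 = 1.0, float(n)
    for _ in range(max_steps):        # FSGP aborts after 200 executed functions
        v1 = v0 + v1                  # 5. v1 = Add(v0, v1)
        v3 = v3 / v1                  # 6. v3 = Div(v3, v1)
        if v1 == v4:                  # 7. loop back to 5 while v1 != v4
            return v5 / v3            # 8. v2 = Div(v5, v3) is the program output
    return None                       # no result within the step budget

# e.g. [round(evolved_factorial(n)) for n in range(2, 6)] gives [2, 6, 24, 120]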
3.2 Stock Index Prediction
The other problem is stock market prediction. In this work we analyzed the Nasdaq-100 index from 11 January 1995 to 11 January 2002 [12] and the NIFTY index from 01 January 1998 to 03 December 2001 [13]. For both indices we divided the entire data into two almost equal parts. No special rules were used to select the training set other than ensuring a reasonable representation of the parameter space of the problem domain [14]. Our target was to evolve a program that could predict the index value of the following trading day based on the opening, closing, and maximum values on a given day. Prediction performance was assessed on an independent data set, and the root mean squared error (RMSE) was used to study the performance of the evolved program on the test data. The RMSE is defined as

RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 }    (2)

where y_i is the actual index value on day i, \hat{y}_i is the forecast value of the index on that day, and N is the number of samples. For the Nasdaq-100 index, the data set was D = {v0, v1, v2, ..., v14, 1}: the variables v0 to v11 were used as medium variables, v12, v13, and v14 were employed as data variables, and the length of the individual program was set to 60. For the NIFTY index, the data set was D = {v0, v1, ..., v18, 1}: the first 14 variables were used as medium variables, the variables v14 to v18 corresponded to the input data, and the length of the program was set to 80.
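In code, Eq. (2) is a one-line computation; the function below is a direct transcription (our sketch, with actual and forecast given as equal-length sequences).

import math

def rmse(actual, forecast):
    # Root mean squared error over N day pairs, as in Eq. (2)
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)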
Fig. 4. The result of prediction for the Nasdaq-100 index (desired output vs. model output)
Fig. 5. The result of prediction for the NIFTY index (desired output vs. model output)
The test RMSE obtained using this approach and other paradigms [5] for the two stock indices is shown in Table 2.
Table 2. Experimental results (test RMSE) for the two indices
  Index        ANN [5]   NF [5]    MEP [5]   LGP [5]   FSGP
  Nasdaq-100   0.0284    0.0183    0.021     0.021     0.0160
  NIFTY        0.0122    0.0127    0.0163    0.0124    0.0131
From Table 2 it is clear that FSGP outperforms the other models on the Nasdaq-100 index, and for both indices FSGP gives better predictions than GEP. The prediction results for the Nasdaq-100 index are shown in Figure 4 and those for NIFTY in Figure 5.
4 Conclusion
This paper proposes a novel method for program evolution, Function Sequence Genetic Programming (FSGP). The approach adopts a fixed-length function sequence to represent the program individual, a representation that is closer to hand-written programs. We applied FSGP to the factorial problem and to stock index prediction; the results illustrate that this approach is an applicable and effective model.
Acknowledgment. This research was supported by the NSFC (60573065), the Natural Science Foundation of Shandong Province (Y2007G33), and the Key Subject Research Foundation of Shandong Province.
References
1. Brameier, M., Banzhaf, W.: A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining. IEEE Transactions on Evolutionary Computation 5(1), 7–26 (2001)
2. Kantschik, W., Banzhaf, W.: Linear-tree GP and its comparison with other GP structures. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tettamanzi, A.G.B., Langdon, W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 302–312. Springer, Heidelberg (2001)
3. Kantschik, W., Banzhaf, W.: Linear-graph GP - A new GP structure. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C., Tettamanzi, A.G.B. (eds.) EuroGP 2002. LNCS, vol. 2278, pp. 83–92. Springer, Heidelberg (2002)
4. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
5. Grosan, C., Abraham, A.: Stock Market Modeling Using Genetic Programming Ensembles. In: Genetic Systems Programming: Theory and Experiences, vol. 13, pp. 131–146. Springer, Heidelberg (2006)
6. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Proceedings of the European Conference on Genetic Programming, pp. 121–132. Springer, London (2000)
7. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge (1994)
8. Teller, A., Veloso, M.: Program Evolution for Data Mining. The International Journal of Expert Systems 8(1), 216–236 (1995)
9. Nordin, P.: A Compiling Genetic Programming System that Directly Manipulates the Machine-Code. In: Advances in Genetic Programming. MIT Press, Cambridge (1994)
10. Poli, R.: Evolution of Graph-like Programs with Parallel Distributed Genetic Programming. In: Genetic Algorithms: Proceedings of the Seventh International Conference, pp. 346–353. Morgan Kaufmann, MI, USA (1997)
11. Shirakawa, S., Ogino, S., Nagao, T.: Graph Structured Program Evolution. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 1686–1693. ACM, New York (2007)
12. Nasdaq Stock Market, http://www.nasdaq.com
13. National Stock Exchange of India Limited, http://www.nseindia.com
14. Abraham, A., Philip, N.S., Saratchandran, P.: Modeling Chaotic Behavior of Stock Indices Using Intelligent Paradigms. Neural, Parallel & Scientific Computations 11(1&2), 143–160 (2003)
Speech Emotion Recognition Research Based on Wavelet Neural Network for Robot Pet
Yongming Huang, Guobao Zhang, and Xiaoli Xu
School of Automation, Southeast University, Nanjing, Jiangsu 210096, China
[email protected]
Abstract. In this paper, we present an emotion recognition system using a wavelet neural network and a BP neural network to identify human affective states in the speech signal. 750 short emotional sentences with different contents from 5 speakers were collected as experimental material. Features related to energy, speech rate, pitch, and formants are extracted from the speech signals. Neural networks are used as the classifiers for 5 emotions: anger, calmness, happiness, sadness, and boredom. Compared with the traditional BP network, the experimental results show that the wavelet neural network has faster convergence and a higher recognition rate.
Keywords: Wavelet neural network; BP neural network; Emotional speech; Recognition of emotion.
1 Introduction
Emotion recognition by computer is a topic that has been researched in recent years and is becoming more important with the development of artificial intelligence. An effective human emotion recognition system helps to make the interaction between human and computer more natural and friendly, and it has many potential applications in areas such as education [1], entertainment, and customer service. Considering this and its potential uses, we design a context-independent speech-based emotion recognition system in this paper. For a robot pet to become smarter and more humanoid, it is important that the machine possess the ability to recognize emotion; the ultimate goal of this research is to build a Personal Robot Pet that can recognize its host's emotion. The remainder of this paper is organized as follows. Section 2 describes the design and implementation of our system. Section 3 presents the results of the two different neural networks. Conclusions and discussions are given in Section 4.
2 System Design
2.1 Processing Flow
Fig. 1 illustrates the processing flow of our system, which is divided into two main parts: speech processing and emotion recognition. First, some pre-processing should
be done to obtain the effective speech period using Cool Edit Pro 1.2a, including filtering and clipping the speech period (determining the beginning and end points, see Fig. 4). Second, the features of each utterance are extracted and compiled into a feature vector. Last, PCA (Principal Component Analysis) is used to reduce the dimensions of the feature vector, forming the final input vectors [2]. In the training stage, the input vector is used to train the neural network; in the recognition stage it is applied to the fully trained network and the output is a recognized emotion; the system can also learn based on the output. These steps are explained further in the following sections.
[Figure 1 blocks: Speech Input, Pre-processing, Feature Extraction, Dimension Reduction by PCA, Neural Network Recognition, Output, with Training of the Neural Network and Study as feedback stages.]
Fig. 1. Processing flow of Emotion Recognition System
2.2 Emotions and Features
Emotions. How to classify emotions is an interesting and difficult issue. Emotion dimensions have been investigated widely in the past, but researchers have not yet established a standard, and different researchers in emotion recognition differ on the number and kinds of categories [3, 4]. Our motivation for this study is to build a Personal Robot Pet that can recognize its host's emotion, so it is sufficient to select common basic emotions in everyday life: anger, calmness, happiness, sadness, and boredom. Features. The speech signal is short-term stationary, so we choose short-term acoustic features in this emotion recognition system. Several features have proved to be useful for emotion recognition in many papers [5]. Pre-processing is applied to the effective speech period with Cool Edit Pro 1.2a, including filtering and clipping the speech period (determining the beginning and end points, see Fig. 4).
Examples of feature sets used in earlier work include: speech power, pitch, 12 LPC (linear prediction coefficient) parameters, and delta LPC parameters [6]; energy, median of F1 (the first formant frequency), variance of the duration of energy plateaus, minimum of F1, median of F0 (the fundamental frequency), mean F0, maximum/mean duration of energy plateaus, and variance of F0 [7]; signal energy, sub-band energy, spectral flux, zero-crossing rate, fundamental frequency, MFCC, and FFBE [8]; pitch average, pitch variance, intensity average, intensity variance, jitter (pitch tremor), shimmer (intensity tremor), and speech rate [9]. After examining these examples, we select the following features in this study: speech rate, max-energy, mean-energy, number of poles, max-pitch, mean-pitch (fundamental frequency), and max-formant.
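As an illustration of how such short-term features feed the classifier, the sketch below assembles the seven-dimensional feature vector from per-frame analyses. It is our sketch, not the authors' code: the pitch track, formant track, pole count, and syllable count are assumed to be computed elsewhere (e.g., by the LPC analysis), and only the short-term energy is computed here.

import numpy as np

def short_term_energy(signal, frame_len=256, hop=128):
    # Frame-wise short-term energy of a mono speech signal (rectangular window)
    frames = [signal[i:i + frame_len] for i in range(0, max(len(signal) - frame_len, 1), hop)]
    return np.array([float(np.sum(np.asarray(f, dtype=np.float64) ** 2)) for f in frames])

def feature_vector(signal, pitch_track, f1_track, n_poles, n_syllables, duration_s):
    # The seven features chosen in this study, in a fixed order
    energy = short_term_energy(signal)
    return np.array([
        n_syllables / duration_s,   # speech rate
        energy.max(),               # max-energy
        energy.mean(),              # mean-energy
        float(n_poles),             # number of poles (from the LPC analysis, not computed here)
        np.max(pitch_track),        # max-pitch
        np.mean(pitch_track),       # mean-pitch (fundamental frequency)
        np.max(f1_track),           # max-formant
    ])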
2.3 Neural Network Architecture
The network is composed of five sub-neural networks, one for each of the five emotions examined. For each sentence, the recognition processing flow is diagrammed in Fig. 2. The feature vector is input to each of the five sub-networks; each sub-network then gives an output (v1, v2, ..., v5) that represents the likelihood of the sub-network's emotion. Finally, the Logic Decision selects the "best" emotion based on these outputs.
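The decision stage can be summarized compactly; in the sketch below (ours, with illustrative names) each trained sub-network is a scorer returning a likelihood for its own emotion, and the Logic Decision reduces to picking the highest score.

EMOTIONS = ["anger", "calmness", "happiness", "sadness", "boredom"]

def recognize(feature_vec, sub_networks):
    # sub_networks maps an emotion name to a trained scorer returning a value in (0, 1)
    scores = {e: sub_networks[e](feature_vec) for e in EMOTIONS}
    return max(scores, key=scores.get)   # the "Logic Decision": choose the best-scoring emotion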
[Figure 2 blocks: the speech feature parameters feed the five per-emotion sub-networks, whose outputs v1, ..., v5 are combined by the Logic Decision block into the final output.]
Fig. 2. The steps of recognition
Fig. 3. Sub-network configuration
A Wavelet Neural Network (WNN) is a feed-forward neural network whose activation functions are wavelets [10]. A WNN is not essentially different from a BP neural network in structure or form. For the connection between the wavelets and the neural network we adopt the compact structure. To stay close to engineering applications, a three-layer feed-forward network with only one intermediate layer is selected, and we take the wavelet network as the sub-network. Fig. 3 illustrates the configuration of each of the five sub-networks. Each network contains three layers: one input layer with six nodes, one intermediate layer with twelve nodes, and
one output layer with one node, whose target is an analog value of 0.99 or 0.01 for a training utterance: if the emotion of the input utterance is the same as the sub-network's emotion, the target value is 0.99, otherwise 0.01. Adopting a separate sub-network for each of the five emotions allows each network to be adjusted separately, and we can easily change each network without redesigning the entire system, which is good for the networks' learning and allows updates in the future. To demonstrate the advantage of the wavelet neural network for speech emotion recognition, we also take the BP network as the sub-network. Section 3 presents the results of the two different neural networks.
2.4 Learning Algorithm and Wavelet Function
In this paper the WNN structure is a three-layer feed-forward network. Suppose the input layer has L neurons, the hidden layer has M neurons, and the output layer has S neurons; v_{ij} is the connection weight between input-layer cell i and hidden-layer cell j, w_{jk} is the connection weight between hidden-layer cell j and output-layer cell k, the total number of samples is N, the input of the nth sample is x_i^n (i = 1, 2, ..., L; n = 1, 2, ..., N), the output of the network is o_k^n (k = 1, 2, ..., S; n = 1, 2, ..., N), and the corresponding goal output is D_k^n. The excitation of the input layer is a linear transform (output = input), the excitation of the hidden layer is a wavelet function, and the excitation of the output layer is the sigmoid function. With these network parameters, the output of the network is

o_k^n = f\left( \sum_{j=1}^{M} \psi\left( \frac{\sum_{i=1}^{L} v_{ij} x_i^n - b_j}{a_j} \right) w_{jk} \right)    (1)

where k = 1, 2, ..., S; n = 1, 2, ..., N.
Using gradient descent with a momentum term, the learning updates are

\Delta w_{jk}(t) = -\eta \frac{\partial E}{\partial w_{jk}} + \alpha \Delta w_{jk}(t-1)    (2)

\Delta v_{ij}(t) = -\eta \frac{\partial E}{\partial v_{ij}} + \alpha \Delta v_{ij}(t-1)    (3)

\Delta a_j(t) = -\eta \frac{\partial E}{\partial a_j} + \alpha \Delta a_j(t-1)    (4)

\Delta b_j(t) = -\eta \frac{\partial E}{\partial b_j} + \alpha \Delta b_j(t-1)    (5)
where \alpha is the momentum parameter, usually \alpha \in (0, 1). The momentum term reflects the experience accumulated before and damps the adjustment at time t. So far there is no unified theory for selecting the wavelet function. The Morlet wavelet \psi(x) = \cos(1.75x) e^{-0.5x^2}, a symmetric, cosine-modulated Gaussian wave with effectively limited support, has been widely employed in various fields; based on this, the Morlet wavelet is selected as the excitation of the hidden layer in this study.
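For reference, Eq. (1) with the Morlet hidden-layer excitation and the momentum updates of Eqs. (2)-(5) can be written compactly as follows. This is our numpy sketch under the notation above (V is L-by-M, W is M-by-S, a and b are the dilation and translation vectors); the learning rate and momentum defaults are illustrative only.

import numpy as np

def morlet(x):
    # Morlet wavelet used as the hidden-layer excitation: cos(1.75 x) * exp(-0.5 x^2)
    return np.cos(1.75 * x) * np.exp(-0.5 * x ** 2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, V, W, a, b):
    # Eq. (1): hidden unit j computes psi((sum_i v_ij x_i - b_j) / a_j); output k = f(sum_j w_jk h_j)
    h = morlet((V.T @ x - b) / a)
    return sigmoid(W.T @ h)

def momentum_step(param, grad, velocity, eta=0.1, alpha=0.9):
    # Eqs. (2)-(5): delta(t) = -eta * dE/dparam + alpha * delta(t-1)
    velocity = -eta * grad + alpha * velocity
    return param + velocity, velocity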
3 Experiments and Results
3.1 Database
A collection of emotional speech data is undoubtedly useful for researchers interested in emotional speech analysis. In this study, emotional speech recordings were made in a recording studio, using a professional-quality microphone and the digital audio recording software Cool Edit Pro 1.2a (Fig. 4), at a sampling rate of 11025 Hz with single-channel 16-bit digitization.
Fig. 4. Cool Edit Pro1.2a
We invited five classmates (three male, two female) who are good at acting to serve as subjects. Each subject uttered a list of 30 Chinese sentences five times, once for each of the five emotions. When finished, the speech database was examined by a listening test; disqualified sentences were deleted and recorded again. Finally, some pre-processing was done in Cool Edit Pro to obtain the effective speech period, including filtering and determining the beginning and end points (Fig. 4).
3.3 Results
3.3.1 Training Speed
The data in Table 1 are the training epochs obtained from the speech features of the same person by training the wavelet neural network and the BP neural network separately until both networks converge with the same error accuracy.
Table 1. Training speed of WNN and BPNN (training epochs)
                Training separately        Training by increasing samples on the previously trained net
  Samples       50      100     150        50      100     150
  BPNN          82      165     263        82      166     39
  WNN           1200    1600    1650       1200    400     150
In the table, "training separately" means that the first 50 samples train one network and the next 100 samples train another network from scratch, while "training by increasing samples on the previously trained net" means that the first 50 samples train a network and the later samples are added to continue training that known network. From the table, under the same training conditions, the BPNN trains faster than the WNN the first time. However, the WNN only converges slowly at first; once it has converged and "remembered" the nonlinear functions, its training speed improves markedly, so the convergence of the WNN is faster than that of the BPNN when the training samples are increased. The WNN therefore has more engineering significance.
3.3.2 Generalization Ability
The central problem in employing a neural network effectively in the robot pet is its generalization ability, that is, whether the trained network reacts precisely to test samples or working samples.
Fig. 5. Generalization Ability between WNN and BPNN
The results of the experiments (Fig. 5) show that the BPNN has good stability, in that the outputs for each sample are similar across different experiments, but its simulation results after new samples are entered deviate considerably from those before. As for the WNN, although there is some volatility, it only fluctuates within a small region around the original values. As a result, the WNN has better generalization ability.
3.3.3 Recognition Comparison
For all cases, the mean recognition rates of the five emotions from five speakers (F1, F2, M1, M2, M3), three male and two female, were obtained with the WNN and the BPNN.
Table 2. Mean recognition rate of WNN and BPNN
  Speaker   F1     F2     M1     M2     M3
  WNN       94%    84%    86%    90%    86%
  BPNN      84%    78%    82%    78%    82%
From the table, under the same training conditions, the recognition rate of the WNN is higher than that of the BPNN. The results show that the WNN is effective for the speech emotion recognition system oriented toward the Personal Robot Pet.
4 Conclusions and Discussions
Aiming at the personal intelligent robot pet, we studied speech emotion recognition and artificial neural networks in depth and investigated the speech emotion recognition rates obtained with a wavelet neural network classifier. Compared to the BPNN, the neural network built with the wavelet algorithm has faster convergence, better generalization ability, and better recognition rates; from these results, the wavelet neural network has more engineering significance. In order to improve the learning ability of the intelligent robot pet, the feedback idea from control engineering was brought into the learning algorithm to make emotional communication easier, so the pet can be more understanding. There is still more work to be done in the field of emotion recognition in speech. The speech features used in this study need to be reduced because of their large number. If we want to design a speaker-independent system, we should add other features, such as the first formant and the variance of the first formants; but from an engineering point of view, extracting more features from the recordings of the host means more computing time for the robot pet, the real-time character would be drastically reduced, and as a result the robot pet would be less suitable for market promotion. In addition, further trials with different neural network topologies may also help improve performance. The combination of audio and visual data will convey more information about the human emotional state than speech alone; therefore, a robot pet that can both hear and see can obtain a higher recognition rate when communicating with its host, and the communication can be smoother.
References 1. Fragopanagos, N., Taylor, G.: Emotion Recognition in Human-Computer Interaction. Neural Networks 18, 389–405 (2005) 2. Cheriyadat, A.: Limitations of principal component analysis for dimensionality-reduction for classification of hyperspectral data, pp. 31–56. Mississippi State University, America (2003) 3. Bhatti, M.W., Wang, Y., Guan, L.: A Neural Network Approach for Human Emotion Recognition in Speech. In: ISCAS 2004, pp. 181–184 (2004) 4. Murry, I.R., Arnott, J.L.: Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech. Computer Speech and Lauguage 22, 107–129 (2008) 5. Ververidis, D., Kotrropoulos, K.: Emotional speech recognition: Resource, features, and methods. Speech Communication 48, 1162–1181 (2006) 6. Nicholson, J., Takahashi, K., Nakatsu, R.: Emotion Recognition in Speech Using Neural Networks. Neural Computing & Applications 9, 290–296 (2000) 7. Zhongzhe, X., Dellandrea, E., Weibei Deal, E.: Features Extraction and Selection for Emotional Speech Classification. IEEE, 411–416 (2005) 8. Temko, A., Nadeu, C.: Classification of acoustic events using SVM-based clustering schemes. Patttern Recognition 39, 682–694 (2006) 9. Amir, N.: Classifying emotions in speech: a comparison of methods. In: EUROSPEECH 2001 Sandinavia 7th European Conference on Speech Communication and Technology 2th INTERSPEECH Even, pp. 127–130 (2001) 10. Bhatti, M.W., Wang, Y., Guan, L.: A Neural Network Approach for Human Emotion Recognition in Speech. In: ISCAS 2004, pp. 181–184 (2004)
Device Integration Approach to OPC UA-Based Process Automation Systems with FDT/DTM and EDDL
Vu Van Tan, Dae-Seung Yoo, and Myeong-Jae Yi
School of Computer Engineering and Information Technology, University of Ulsan, San-29, Moogu-2 dong, Namgu, Ulsan 680-749, Korea
{vvtan,ooseyds,ymj}@mail.ulsan.ac.kr,
[email protected] Abstract. Manufacturers and distributors alike are seeking better ways to more efficiently manage the assets of their operators. Advances in communication technologies and standards are now making them easier and more cost justifiable to deliver information from measurement instrumentation as well as manage these instrumentation assets more efficiently. Today’s technologies such as Electronic Device Description Language (EDDL) and Field Device Tool (FDT) are available for device integration. This paper introduces a flexible device integration approach to achieving the advantages of both such technologies for the device management to new OPC Unified Architecture (UA)-based process automation systems. This approach is suitable not only for the process automation industry, but also for the factory automation industry. Visibility of and access to field device information through the OPC standards, FDT, and EDDL contribute to efforts to improve the life cycle management of process plants and associated distribution operations. Keywords: Automation system, Device integration, EDDL, FDT/DTM, OPC, Process automation, Unified architecture.
1 Introduction
Many field devices from different vendors are now being connected to process automation systems through fieldbus communication. These field devices have intelligence, and so many settings and adjustments can be made through fieldbus communication [9,11,13,14]. Process automation systems, i.e., Distributed Control Systems (DCSs) or Programmable Logic Controllers (PLCs), are the traditional way to acquire information from measurement devices and effect control of motors and valves. Many plant environments have more than one control system platform installed. Even those with control systems from one vendor may have a variety of field device interface implementations to communicate with field devices. The challenge can only be faced with open standard and standardized device integration. In addition, another challenge is to employ a permanent
cost effective information bridge from selected devices to the maintenance people without interfering with the existing control systems. Two technologies available today for device integration such as Electronic Device Description Language (EDDL) [5] and Field Device Tool (FDT) [6] come from process automation society [7]. The EDDL technology defines own language that device manufacturers use to write textual descriptions, i.e., Electronic Device Description (EDD), for their devices. EDDs are processed by an EDD interpreter within a device independent engineering system. Unlike, the FDT technology uses programmed software components, i.e., Device Type Managers (DTMs), for device integration. These DTMs run inside the FDT frame application which manages and coordinates the DTMs. More recently, the OPC Foundation has been proposed new specifications to design and develop automation control softwares running on independent platforms based on XML, Service Oriented Architecture (SOA), and web services, i.e., OPC Unified Architecture (UA) specifications [16]. The solution is to share information in more complex data structure formats with enterprise level Manufacturing Execution System (MES) and Enterprise Resource Planning (ERP) systems. The most important feature of the OPC UA standard is the possibility to define powerful information models. These models can define the organization of data in the address space of an OPC UA server in terms of structure and semantic [16]. However, this standard is making a challenge to propose a new device integration concept to researchers and hardware and software vendors [8]. The study presented in this paper aims at proposing a flexible device integration approach to OPC UA-based process automation systems. This approach is proposed based on the OPC UA client-server model with using FDT/DTM and EDDL technologies. It achieves the advantages of both technologies to field device integration in unified architecture for OPC UA-based systems. This paper is organized as follows: The next section presents some background on the FDT/DTM, EDDL, and the OPC UA specifications and reviews several related approaches. The approach to device integration for OPC UA-based process automation systems complying with the client-server model is introduced in Section 3. The system analysis according to the proposed approach in enabling efficient collaboration between device-level SOA on the one hand and on the other hand services and applications is presented in Section 4. Finally, some conclusions are marked in Section 5.
2 Background and Related Work
2.1 FDT/DTM
Field devices have gained intelligence along with the spread of digital communication. As the number of field devices increases, the more complicated settings and adjustments that use the advanced functions in such devices have become a challenge [10,11,3,1,9]. As a consequence, some field device vendors now provide dedicated software on PCs to fully supplement the functional limitations of dedicated terminals.
The FDT technology [6,15] uses programmed software components, called Device Type Managers (DTMs), for field device integration. The DTMs run on the FDT Frame Application, which manages and coordinates them. Since a DTM complies with the interface specification, the device manufacturer is not restricted in the design of functionality, the implementation of the DTM components, or the programming languages. There are two types of DTM: (i) device DTMs for field devices and (ii) communication DTMs for field communication control. Furthermore, a gateway DTM is also available to connect HART (http://www.hartcomm2.org/) devices via the HART multiplexer or PROFIBUS, and it supports communication between devices. Device DTMs can be classified into two kinds: (i) dedicated device DTMs that support a specific field device, and (ii) universal device DTMs that apply to many kinds of field devices.
2.2 Electronic Device Description Language
The open and standardized EDDL technology defines own language that the device manufacturers use to write textual descriptions, i.e., Electronic Device Description (EDD), for their devices [5]. EDDs are processed by an EDD interpreter within a device independent engineering system. The amount of language elements is limited and specific for device description. It allows for easy and efficient development of the EDD as common functionality for device integration. An EDD is independent of operating systems and hardware platforms, and provides a uniform mechanism for device operations, the interpretation to yield high robustness. All software tools equipped with an interpreter to accept the EDDL grammar can extract the device data out of the EDDL file. There are several differences in the working mode between FDT/DTM and EDDL [12], such as (i) passive storage of data in case of EDDL and active storage of data in case of FDT/DTM; (ii) EDDLs do not communicate with their devices, but DTMs do; (iii) applications relying on EDDL concept need an EDDL interpreter, FDTs do not need further plug-ins or software; (iv) applications relying on EDDL concept communicate with device through appropriate interfaces e.g., fieldbus, while a DTM is the interface to the device; (v) EDDL programming needs less effort to learn and handle than the development of DTMs does; (vi) the set of EDDL describable devices is a subset of DTM manageable devices. 2.3
2.3 OPC Unified Architecture
The OPC UA standard (a twelve-part specification has now been released) [16] is the next-generation technology for secure, reliable, and interoperable transport of raw data and preprocessed information from the plant floor (or shop floor) to production planning or the ERP system. It is based on XML, web services, and SOA to share information in more complex data structure formats with enterprise-level MES and ERP systems in a way that they can understand. It
embodies all functionalities of the existing OPC standards and expands on top of them. It enables all three types of data (current data, historical data, and alarms and events) to be accessed by a single OPC server, because these three kinds of data with different semantics are now required together, for example, to capture the current value of a temperature sensor, an event resulting from a temperature threshold violation, and the historic mean temperature. The role of the OPC technology and the OPC UA standard is illustrated in Fig. 1.
Fig. 1. The role of the OPC technology and the OPC UA standard in process and factory automation
The OPC UA standard defines a vendor- and protocol-independent server-client model with utilization of standard web technologies. However, it poses new challenges to researchers, such as (i) a new device integration concept, (ii) new scenarios like Asset Management and Manufacturing Execution System (MES), and (iii) the design and implementation of this new standard for automation system applications.
2.4 Problem Statements
Today most of the human-machine interface (HMI), supervisory control and data acquisition (SCADA), and distributed control system (DCS) manufacturers offer interfaces that conform the OPC specifications. The OPC UA standard now is defined with the utilization of XML, web services, and SOA [16]. This makes a significant asset to business network applications. However, it seems that the OPC UA standard is geared for the future, but not for now [16,17]. This standard is making a challenge to develop field device integration concept to researchers and software and manufacturer vendors. The field device configuration software packages offering a graphical interface and easy operation by adopting FDT/DTM are developed as an open framework for field device tools [13,18,12,14]. As a result, the FDT/DTM technology can be used not only to build an open system independent of a specific vendor, but also to implement various communication protocols and functions based on
advanced technologies. It is highly popular with factory automation vendors as well as process automation vendors. However, the device integration for OPCbased automation systems has not covered by these tools. Grossmann et al. [7] presented an approach to field device integration by integrating both the FDT and EDDL technologies into a unified architecture. The architecture of this approach consists of clients and server complying with the OPC UA client-server model. The server loads the device description containing Device Information Model (DIM) and Device Operation Model (DOM). The DIM is directly processed within the server to establish the information model while the DOMs are only handled in terms of server storage. The server information model allows OPC UA clients to access the device data and functions which are part of the DIM. The key issues in establishing FDT as an open core technology are the technical cooperation between the OPC Foundation and the Field Device Tool-Joint Interest Group (FDT-JIG). They provide the end-users both within process and factory automation a totally seamless, truly open and easy enterprise wide communication. The reasons for the cooperation are (i) OPC is not suitable for cases where devices need to be addressed through several different protocols because of designing purely for communication, and (ii) the FDT technology adds a nested communication or stacked communication interface to the OPC technology for supporting communication to devices with several different protocols. This research presents a unified and flexible approach to field device integration for OPC UA-based process automation systems with the utilization of both FDT/DTM and EDDL technologies. This approach has modified the device integration concept developed by Grossmann et al. [7]. It ensures the flexibility and powerful information models of OPC UA systems when deploying to industrial environments in terms of Internet of Things (IoT) [3], aiming at developing an ultimate solution for both process and factory automation today.
3.1
Device Integration Concept for OPC UA-Based Process Automation Systems The OPC UA-Based Process Automation System Architecture
Internet-based manufacturing, leveraging the latest technologies to achieve distributed information systems, provides new possibilities not only for static, data centric integration of the plant floor into an overall enterprise architecture, but also for full process control integration of control and field devices by means of SOA, XML, and web services. The OPC UA standard was proposed to create a standardized data exchange model for automation systems. It defines a vendor and protocol independent of client-server architecture based on web technologies to ensure interoperability. Another potential feature is the ability to define powerful information models. These information models can define the organization of data within the UA server’s address space in terms of structure and semantic. The OPC UA architecture is perfectly suited as the central interface
1006
V.V. Tan, D.-S. Yoo, and M.-J. Yi
Fig. 2. The architecture of an OPC UA-based process automation system
for the vertical integration of automation systems. The challenge now is to make a flexible mechanism for device integration. The architecture of an OPC UA-based process automation system is proposed as shown in Fig. 2. It indicates that all the functionalities of the existing OPC standards such as data access, historical data access, and alarms and events. In addition, the SOA is on the move and is foreseeable that this architectural paradigm will be dominant in the future. The integration of devices into the business IT-landscape through SOA is a promising approach to digitalize physical objects and to make them available to IT-systems [3]. This can be achieved by running instances of web services on these devices, creating an Internet of Services to collaborate and empower the future service-based factory. Considering these issues, the device integration concept developed by Grossmann et al. [7] has not completely solved the requirements of integration of devices for the architecture of OPC UA-based process automation systems as aforementioned. 3.2
Device Integration Approach
The address space of an OPC UA server is created by adding device type definitions. These definitions are generated using the device information provided by the FDT/DTM and EDDL. The device type definitions are composed of the parameters and their default values as well as the communication channels, the parameters, and default values. The end-users can create derivative definitions from the device types for defining the operation mode-specific device types. For example, the information from FDT/DTM can be used to create the device type information for a pressure transmitter. The relations between the OPC UA
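The idea of registering device type definitions once and instantiating them, possibly via derived, operation-mode-specific types, can be illustrated without tying it to any particular OPC UA SDK. The sketch below is ours; the class and attribute names, and the example node identifier, are purely illustrative.

from dataclasses import dataclass, field

@dataclass
class DeviceType:
    # A device type definition generated from FDT/DTM or EDDL information
    name: str
    parameters: dict = field(default_factory=dict)   # parameter name -> default value
    channels: list = field(default_factory=list)     # communication channel names

    def derive(self, name, overrides):
        # Operation-mode-specific type derived from this definition
        params = dict(self.parameters)
        params.update(overrides)
        return DeviceType(name, params, list(self.channels))

class AddressSpace:
    # Minimal stand-in for the server address space: types are registered, instances created from them
    def __init__(self):
        self.types, self.instances = {}, {}

    def register_type(self, dev_type):
        self.types[dev_type.name] = dev_type

    def instantiate(self, type_name, node_id):
        t = self.types[type_name]
        self.instances[node_id] = {"type": type_name, "values": dict(t.parameters)}
        return self.instances[node_id]

# Example: a pressure transmitter type built from DTM information, then instantiated
# space = AddressSpace()
# pt = DeviceType("PressureTransmitterType", {"Unit": "bar", "Range": (0, 10)}, ["HART"])
# space.register_type(pt)
# space.register_type(pt.derive("PT_HighRange", {"Range": (0, 100)}))
# space.instantiate("PT_HighRange", node_id="ns=2;i=1001")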
Device Integration Approach
Fig. 3. The relations between OPC UA information model and both FDT/DTM and EDDL technologies
information model and both FDT/DTM and EDDL are shown in Fig. 3. The type definitions in the server’s address space are then used to generate device instances. These instances can be created either directly from the default type definitions by the FDT/DTM information or from the derived definitions. In order to realize the architecture of an OPC UA-based process automation system and to enable efficient collaboration between device-level SOA on the one hand and on the other hand services and applications, the approach developed by Grossmann et al. can be modified as shown in Fig. 4. The device-specific DIM is established as a well-defined information model on the server side. It concentrates on the representation of the device regarding structure and behavior. EDDL is developed to describe device specific properties and is perfectly suited for describing the DIM running by an interpreter. The Server: The device specific DIM described in EDDL is the base for information model in the OPC UA server. The EDDL-interpreter of the server will process the EDD of a device and then establishes the appropriate information model to the server’s address space. As Fig. 4 shows, each device is represented by a Device object. The device object includes variables that hold the data items of the data model. This covers meta information such as unit, range, values, parameters, etc. When an OPC UA client accesses data from the server side, the server will read the corresponding device data from the plant floor. To offer the DIM access to the device data, the server will communicate with the devices over the fieldbus protocols such as FOUNDATION fieldbus, PROFIBUS, CANopen, DeviceNet, and HART through HART gateway. Having the services on devices will not be much of use if they cannot be dynamically discovered by the server. As a result automatic
Fig. 4. The device integration approach based on both FDT/DTM and EDDL
service discovery will allow us to access them in a dynamic way without having explicit task knowledge and the need of a binding. The discovered services will be used by the clients in terms of web services enabled devices [3,19]. Therefore, the discovery services should be implemented to find devices. The Client: In the standard case there are three different types of clients: (i) The universal Device Engineering Framework (DEF), (ii) application specific clients, and (iii) web browser-based clients. The application specific clients directly access device data via the DIM in the OPC UA server and process those in a specific mechanism. The universal DEF concentrates on the device operation via the device specific user interface. This offers a dynamic plug-in mechanism for DOMs, which are composed of the device specific user interface. The elements provided by EDDL are sufficient for simple graphical user interfaces (GUIs) and the DOM is completely described in EDDL. The DEF has to include the OPC UA client’s services that provide functions for the DOMs to access device data and functions via the DIM in the server. To support web browser-based clients, the client components embedded in the server side should render the GUIs that are compatible with a web browser showing an HTML document as webpage. 3.3
General Prototype
The OPC UA standard tends to offer the possibility of powerful information model, a critically important feature. It suits perfectly the role of central interface for vertical integration of automation systems. How field level sensors and devices are set up and how they communicate with the higher control levels over the network are the potentially ideal application for XML-based web services. In addition, the XML-based unified device integration is a promising approach to both existing field device management solution and new device integration concept. Due to the high number of different field device types and fieldbus types within a single control system, it is still making a challenge to software and
Fig. 5. A general prototype for the OPC UA-based process automation systems using the proposed device integration approach
hardware vendors since the new OPC UA standard has been proposed. A general prototype for the OPC UA-based process automation systems with different fieldbus protocols is illustrated in Fig. 5. Networks of automation systems are normally heterogenous and use different protocols with a hierarchical structure. The transitions between networks are managed by gateway devices. The general prototype has addressed a nested communication through heterogenous networks with integrating PROFIBUSHART gateway as well as PROFIBUS-I/O Link gateway, etc. The services of the OPC UA client should be embedded in the server side in order to support web browser-based clients. In the case using an ERP system like SAP is guaranteed, it acts as an OPC UA client communicating to the DIMs of the devices. As Fig. 5 shows, the usage of Profinet, CANopen, and I/O-Link devices within the prototype indicates that the device integration approach introduced in this research is suited not only for process automation devices, but also for factory automation devices as well. This prototype will be investigated carefully for the further deployment of OPC UA-based systems.
4 System Analysis
To evaluate a system for satisfying the requirements of industrial system applications, some issues such as fault, configuration, administration, performance, and security should be addressed and ensured. Such issues will be given as the first attempt in order to evaluate the field device integration approach for the OPC UA-based process automation systems:
– Fault issue: When the OPC UA-based process automation systems are part of the IT industry system, both faults of particular parts of manufacturing processes and faults need to be managed. By complying with the advantages of both FDT/DTM and EDDL technologies, the approach of device integration will enable the end-users to identify faults in their production process not only in high system level but in lower system level. – Configuration issue: Mapping the field devices into the OPC UA server’s address space is a major challenge because of needing a new device integration concept for the new OPC UA standard. With the utilization of the FTD/DTM and EDDL technologies, this issue will be guaranteed well. – Administration issue: By integrating both the FDT/DTM and EDDL technologies for device integration, it is possible to integrate devices to the OPC UA server’s address space in an efficient way to achieving the advantage of both such technologies. However, the field of research on device integration needs to be developed and supported by both hardware and software vendors. – Performance issue: It is still an open question for the cases of integration of more devices due to the originality of the new OPC UA standard. The proposed system based on the new OPC UA specifications is implementing. This issue will be investigated carefully in the future work because only several preliminary results from experiment have been achieved. – Security issue: This issue is part of the OPC UA-based process automation systems and fieldbus protocols. In the point of view of system, it can meet the security objective requirements in each level by the security supports of each automation system level [4]. Both devices as well as middleware services have to be authorized when they want to communicate. In the point of view of the enterprise, it can be ensured by using the WS-security and web technologies for the integrity and confidentiality of all messages exchanged between the client and server. Security of remote invocations including the authentication of clients, encryption of messages, and access control can be guaranteed by both XML signature and XML encryption. The XML signature can meet the security objectives such as integrity, non-repudiation, authentication, accountability, and confidentiality. Besides, the XML encryption complies with the security objective confidentiality [2]. The XML security is consequently suitable for process and factory automation in terms of IoT.
5 Concluding Remarks
This paper has introduced an approach to field device integration for OPC UA-based process automation systems, i.e., distributed control systems or programmable logic control systems. The architecture of an OPC UA-based process automation system was also presented to help readers understand the field device integration concept. A general prototype related to the device integration issue and the automation system was illustrated and analyzed. This indicates the future of process automation systems with the new OPC UA
standard. In addition, the system analysis demonstrated the ability of the proposed device integration solution to identify the faults, to maintain the system, to configure the system, etc. Platform independence and robustness via the interpreter allow a long-term independence that is an important issue in industry today. Since the migration strategy assures that all existing DTMs and EDDs can be used for the future without changes, the protection of investments is achieved. The potential of the new OPC UA standard is making new scenarios like Asset Management or Manufacturing Execution System (MES) and leads to a robust and platform independent solution. By introducing a flexible field device integration issue, it is important to note that the use of both FDT/DTM and EDDL technologies can be reused throughout the life cycle. Both such technologies now are being expected to play an important role in both process and factory automation. Although client-server architectures are playing an important role in the field of business software systems, the SOA is on the move and is foreseeable that it will be dominant in the future. The integration of field devices into the business IT system through SOA is a promising approach to digitalize physical objects and to make them available to IT-systems. This can be achieved by running instance of web services on these devices. It is a challenge to researchers and developers for the case of field devices on the plant floor. Acknowledgements. The authors would like to thank the Korean Ministry of Knowledge Economy and Ulsan Metropolitan City which partly supported this research through the Network-based Automation Research Center (NARC) at the University of Ulsan. The authors also would like to thank three anonymous referees for their carefully reading and commenting this paper.
References 1. Bohn, H., Bobek, A., Golatowski, F.: SIRENA – Service Infrastructure for Realtime Embedded Networked Devices: A Service Oriented Framework for Different Domains. In: Proceedings of the Int. Conf. on Systems and Int. Conf. on Mobile Communications and Learning Technologies, p. 43. IEEE CS Press, Los Alamitos (2006) 2. Braune, A., Hennig, S., Hegler, S.: Evaluation of OPC UA Secure Communication in Web browser Applications. In: Proceedings of the IEEE Int. Conf. on Industrial Informatics, pp. 1660–1665 (2008) 3. de Souza, L.M.S., Spiess, P., Guinard, D., K¨ ohler, M., Karnouskos, S., Savio, D.: SOCRADES: A web service based shop floor integration infrastructure. In: Floerkemeier, C., Langheinrich, M., Fleisch, E., Mattern, F., Sarma, S.E. (eds.) IOT 2008. LNCS, vol. 4952, pp. 50–67. Springer, Heidelberg (2008) 4. Dzung, D., Naedele, M., Hoff, T.P.V., Crevatin, M.: Security for Industrial Communication Systems. Proceedings of IEEE 93(6), 1152–1177 (2005) 5. EDDL - Electronic Device Description Language, http://www.eddl.org/ 6. FDT-JIG Working Group: FDT Interface Specification, version 1.2.1. FDT Joint Interest Group (2005), http://www.fdtgroup.org/
7. Grossmann, D., Bender, K., Danzer, B.: OPC UA based Field Device Integration. In: Proceedings of the SICE Annual Conference, pp. 933–938 (2008) 8. Hadlich, T.: Providing Device Integration with OPC UA. In: Proceedings of the 2006 IEEE Int. Conf. on Industrial Informatics, pp. 263–268. IEEE Press, Los Alamitos (2006) 9. Ivantysynova, L., Ziekow, H.: RFID in Manufacturing – From Shop Floor to Top Floor. In: G¨ unther, O.P., Kletti, W., Kubach, U. (eds.) RFID in Manufacturing, pp. 1–24. Springer, Heidelberg (2008) 10. Jammes, F., Mensch, A., Smit, H.: Service-Oriented Device Communications using the Devices Profile for Web Services. In: Proceedings of the 3rd Int. Workshop on Middleware for Pervasive and Ad-Hoc Computing, pp. 1–8. ACM Press, New York (2005) 11. Karnouskos, S., Baecker, O., de Souza, L.M.S., Spiess, P.: Integration of SOA-ready Networked Embedded Devices in Enterprise Systems via a Cross-Layered Web Service Infrastructure. In: Proceedings of the 12th IEEE Int. Conf. on Emerging Technologies and Factory Automation, pp. 1–8. IEEE Press, Los Alamitos (2007) 12. Kastner, W., Kastner-Masilko, F.: EDDL inside FDT/DTM. In: Proceedings of the 2004 IEEE Int. Workshop on Factory Communication Systems, pp. 365–368. IEEE Press, Los Alamitos (2004) 13. Neumann, P., Simon, R., Diedrich, C., Riedl, M.: Field Device Integration. In: Proceedings of the 8th IEEE Int. Conf. on Emerging Technologies and Factory Automation, vol. 2, pp. 63–68. IEEE Press, Los Alamitos (2001) 14. Simon, R., Riedl, M., Diedrich, C.: Integration of Field Devices using Field Device Tool (FDT) on the basis of Electronic Device Descriptions (EDD). In: Proceedings of the 2003 IEEE Int. Symp. on Industrial Electronics, vol. 1, pp. 189–194. IEEE Press, Los Alamitos (2003) 15. Tetsuo, T.: FDT/DTM Framework for new Field Device Tools. Yokogawa Technical Report, no. 44, pp. 13–16 (2007), http://www.yokogawa.com/rd/pdf/TR/rd-tr-r00044-004.pdf 16. The OPC Foundation (2008) The OPC Unified Architecture Specification: Parts 1-11. Version 1.xx (2008), http://www.opcfoundation.org/Downloads.aspx 17. Tom, H., Mikko, S., Seppo, K.: Roadmap to Adopting OPC UA. In: Proceedings of the IEEE Int. Conf. on Industrial Informatics, pp. 756–761 (2008) 18. Yamamoto, M., Sakamoto, H.: FDT/DTM Framework for Field Device Integration. In: Proceedings of the SICE Annual Conference, pp. 925–928 (2008) 19. Zeeb, E., Bobek, A., Bohn, H., Golatowski, F.: Service-Oriented Architectures for Embedded Systems Using Devices Profile for Web Services. In: Proceedings of the 21st Int. Conf. on Advanced Information Networking and Applications Workshops, pp. 956–963. IEEE Press, Los Alamitos (2007)
A SOA-Based Framework for Building Monitoring and Control Software Systems

Vu Van Tan, Dae-Seung Yoo, and Myeong-Jae Yi

School of Computer Engineering and Information Technology, University of Ulsan, San-29, Moogu-2 dong, Namgu, Ulsan 680-749, Korea
{vvtan,ooseyds,ymj}@mail.ulsan.ac.kr, [email protected]

Abstract. This paper proposes a SOA-based framework for building the complex monitoring and control software systems used in modern process and factory automation, where production processes span many types of systems. The framework is built on the OPC Unified Architecture (UA) specifications and Object-Oriented Design (OOD). It provides generic components upon which sophisticated production processes can be modeled. Security solutions for remote invocations are implemented to make the framework capable and reliable. Preliminary experimental results are provided, and a comparison with existing approaches and a discussion are also presented. They demonstrate that the proposed framework is feasible for web service-based monitoring and control applications.

Keywords: Framework, Monitoring and control, OPC, Object-oriented design, SOA, Software system, Unified architecture, XML, Web service.
1 Introduction
Web services enable distributed applications to be independent of any operating system, so that users can deploy one application on Windows and another on UNIX or Linux and have the two systems communicate seamlessly. Web services and XML technology have been proposed for use in industrial system applications [19]. Programmers have several choices for the design and implementation of their applications, such as Microsoft's .NET technology with the C# programming language, or Java. More recently, the OPC Foundation has defined new specifications as the next generation for process control and monitoring on various platforms, the OPC Unified Architecture (UA) specifications [23]. The OPC UA standard is based on XML, web services, and the Service-Oriented Architecture (SOA). It is a significant asset to business network applications and has the potential to open the business world to industrial connectivity. However, the OPC UA standard now poses a great challenge to researchers: proposing a new device
integration concept [24]. It also opens new challenging scenarios such as Asset Management and Manufacturing Execution Systems (MES).

A monitoring and control system can be characterized as a distributed and integrated monitoring, control, and coordination system with partially cyclic and event-based operations. Its control functions can be divided into continuous, sequential, and batch control; the role of continuous control distinguishes process control systems from others such as discrete manufacturing systems. In addition to control functions, such a system provides other functions including performance monitoring, condition monitoring, abnormal situation handling, and reporting.

In the last few years, a small number of approaches based on the OPC XML-DA specification [20] have been proposed and developed for applying OPC-based web services to monitoring and control systems. However, the performance of these approaches is limited because they use a purely textual XML data representation. Furthermore, the design solutions implemented within existing approaches are still difficult to identify, evaluate, and reuse in terms of framework components and software engineering. Despite recent advances in software engineering, e.g., object-oriented programming, design patterns, and components, it is still difficult to build industrial software systems [9]. An object-oriented framework has been defined as a prefabricated extensible set of classes or components with predefined collaboration between them and extension interfaces; a framework therefore forms a predefined architecture that models the interaction between its components and potential extensions.

This paper proposes a scientific framework for developing monitoring and control software systems for process and factory automation. A binary data representation is supported to guarantee high performance. To ensure the security of remote invocations in heterogeneous environments, security solutions are implemented in the proposed framework for application developers and programmers. The framework is designed and developed according to the suggested design criteria and Object-Oriented Design (OOD), which make it relatively easy to reuse, maintain, and upgrade. It also complies with the requirements of the control and monitoring applications used in industry today.
2 Background and Related Work

2.1 Related Work
In recent decades there have been significant developments in both hardware and software, such as high-speed networks, web services, and grid computing [6]. It is important to recognize the benefits of these developments and to design software that makes use of them. Web services are the latest tools from major computer hardware and software vendors such as Microsoft, IBM, Sun Microsystems, and others. Based on the OPC XML-DA specification, several approaches focused on XML web services have recently been proposed and developed for industrial systems.
Chilingaryan and Eppler [4] developed a data exchange protocol using the OPC XML-DA specification. Its development was motivated by achieving consistency with high-level standards, multi-platform compatibility, and high performance. However, the design solutions implemented within the protocol are not clearly described, and the performance against the high-performance requirements has not been evaluated because only some parts were developed. Jia and Li [14] presented a design of an embedded web server for process control systems based on the OPC XML-DA specification; a layer of XML-DA services was added between the web service interface and the embedded operating system by applying OPC, Java, and XML, in order to link to the Internet through TCP/IP. Unfortunately, no experimental results for this design were provided. Usami et al. [25] developed a prototype embedded XML-DA server on a controller that can send or receive information on behalf of the field devices, and clients can communicate with the controllers. Performance figures were provided to show that the system performs acceptably; however, the design solution of this prototype is not easy to identify and evaluate for reuse in terms of framework components and software engineering. Khalgui et al. [16] proposed two refinement mechanisms with four policies for selecting data according to the expectations of clients, in order to reduce the network traffic between the OPC XML-DA server and its clients. The first refinement reduces the size of a message by applying a selection method to the returned value; the second reduces the amount of communication by increasing the acquisition period. Eppler et al. [8] suggested some control system aspects for building control applications that transfer data in an efficient way. These aspects are based on the SOAP protocol with an attached binary data representation, by which the size of a SOAP message can be reduced by a factor of about six or more; as a result, XML parsing and application performance can be improved through binary encodings. On the other hand, commercial systems based on the XML-DA specification have been successfully developed by software companies such as Advosol, Softing AG, and Technosoftware, and applied to the control of relatively slow processes in process and factory automation at the enterprise application level. Unfortunately, the technical documents describing the design solutions of these systems are not available because they are proprietary.
2.2 State-of-the-Art OPC Unified Architecture Standard
The OPC UA standard shares information in more complex data structure formats with enterprise-level MES and ERP systems in a way that they can understand. It embodies all functionality of the existing OPC servers and expands on top of it. One of the key problems with the new standard is that implementing it can be quite challenging. The OPC Foundation has taken many steps to ensure that implementing the standard is a relatively straightforward and easy process.
To facilitate the adoption of the new standard and to reduce the barrier to entry, an OPC UA Software Development Kit (SDK) is being developed. The SDK is the entry point to jump-start existing applications and make them OPC UA-enabled; it consists of a series of application programming interfaces and sample code implementations. The UA specifications are written to be platform-agnostic and, for that reason, the SDK comes in different flavors to facilitate adoption on different platforms; .NET, ANSI C, and Java implementation samples will be provided. The OPC UA standard intends to enable enterprise interoperability and is expected to solve enterprise integration challenges. This is a very ambitious undertaking, and it has been difficult to determine which elements of enterprise interoperability can actually be standardized. It is clear that this standard does not provide everything needed for interoperability from the enterprise-IT perspective, but its impact is expected to be considerable.
2.3 Research Contribution
Frameworks for developing monitoring and control system applications are widely accepted and support easy and fast development by application developers. However, most existing approaches are generic, with a focus on infrastructure issues, and are only rarely dedicated to a specific domain. This research describes a scientific application framework for building monitoring and control software systems in the application domains of process and factory automation, covering all relevant software development issues: (i) infrastructure issues and technical details, (ii) framework architecture, (iii) components, (iv) a device integration concept, and (v) security solutions. The framework provides generic components upon which production processes can be modeled. It enables real-world devices to participate seamlessly in business processes that span several systems, from the back-end down to the field devices on the plant floor.
3 Framework Design Criteria
An object-oriented framework is defined as a prefabricated extensible set of classes or components with predefined collaboration between them and extension interfaces [9]; a framework therefore forms a predefined architecture that models the interactions between its components and potential extensions. Moreover, a software framework has to meet the requirements that arise from the software engineering point of view and from changes in technology. A two-step requirement analysis should be performed to construct the framework design criteria, consisting of domain analysis and domain design [17]. The objective of the domain analysis is the development of a domain model that captures the knowledge of a domain in a systematic way. The key of the domain design is the transformation of the results of the domain analysis into a reusable form, i.e., implementation-oriented objects and the design relations between them. The six design criteria are suggested as follows:
1. Generic requirements. The focus of the suggested framework is application development in the monitoring and control system domain. A universal architecture is therefore necessary that can treat different organizational types of monitoring and control systems.
2. Methodology. The framework concept introduces a new level of abstraction that provides an easier conceptualization of the problem domain. This enables the implementation of more complex control and monitoring strategies where process control is distributed and large amounts of distributed data are collected.
3. Flexibility. This feature indicates that a system should be able to adapt to various circumstances. Flexibility in the proposed framework is provided in the context of control and monitoring for process and factory automation.
4. Reusability. Software reusability is one of the important goals of developing frameworks. In the suggested framework it is ensured by designing and developing a generic structure, architecture, and modules.
5. Openness. By complying with the OPC technology, the proposed framework is flexible and open for the development of specific applications.
6. Compatibility with existing standards. Because it uses the OPC UA specifications, the suggested framework embodies all functionality of the existing OPC specifications [19,20,21,22]. Backward compatibility with previous standards should ensure quick adoption, although this criterion is quite challenging for researchers and developers.
4 The Framework Architecture

4.1 The Infrastructure Issues and Technical Details
Despite the role of client-server architectures in the field of business software systems, the Service-Oriented Architecture (SOA) is on the move, and it is foreseeable that this architectural paradigm will be dominant in the future. Enabling efficient collaboration between device-level SOA on the one hand and the services and applications that constitute the enterprise back-end on the other is a challenging task. By integrating web services on the plant floor, devices gain the possibility of interacting seamlessly with the back-end systems [12,13]. However, providing mechanisms for discovering other service-enabled devices, or methods for maintaining a catalogue of discovered devices, is a difficult task for current products [6,1]. There are still big differences between device-level SOA and the SOA used in back-end systems. These differences can be overcome by using middleware between the back-end applications and the services offered by devices, service mediators, and gateways. The fundamental architecture for integrating device-level services with enterprise software systems running at the application level using web service technology is shown in Fig. 1. It shows that there are at least two different ways to couple networked embedded devices with enterprise services.
Fig. 1. The fundamental architecture of a system integration
One way is to integrate device-level services directly into business processes; the other is to expose device functionality to the application layer via a middleware layer. Today, most manufacturers of human-machine interfaces, supervisory control and data acquisition (SCADA) systems, distributed control systems (DCS), and software programmable logic controllers (PLC) offer interfaces that conform to the OPC specifications, because these support industry-standard interoperability, enhanced productivity, and better collaboration across systems.
4.2 Architectural Overview
The integration of devices into the business IT landscape through SOA is a promising approach to digitalize physical objects and make them available to IT systems. It can be tackled by running instances of web services on the devices themselves, which eases the integration with back-end applications such as Enterprise Resource Planning (ERP) systems and creates an Internet of Services to empower the future service-based factory. One key advantage of using services is that functionality provided by different devices can be composed to allow for more sophisticated application behavior. The use of services is also desirable because business software today is built more and more in a service-oriented way based on web services [12,13]. Since 1996, the OPC Foundation has provided specifications for communication among devices used in process automation, publishing protocols that quickly became the standard for connecting software and hardware. Such devices include sensors, instruments, programmable logic controllers, human-machine interfaces, and alarm subsystems used in manufacturing and in the oil and gas industry. Based on the OPC UA specifications [23], the architecture for integrating devices from the plant floor up to the enterprise applications is proposed as shown in Fig. 2.
Fig. 2. The proposed architecture for monitoring and controlling field devices
Fig. 2 shows that there are two methods for mapping devices into the address space of the enterprise applications. One method is to integrate embedded devices and their functionality directly into the OPC UA server; the other is to map the embedded devices through existing OPC DA servers. The framework architecture is based on the OPC client-server model and provides a means for aggregating one or more OPC servers into its address space. Although a client does not need access to multiple servers in the normal case, this feature is provided to make the architecture flexible enough to serve multiple servers for different clients in special cases. The UML (Unified Modeling Language) sequence diagram for establishing the connection between the client and the server is shown in Fig. 3. The Discovery services first find the endpoints, which provide the security requirements and services for clients; in the current implementation it is assumed that the clients already know which endpoint to address. Once the endpoint is known, a secure channel is established by the Secure Channel services, and the Session services allow a session to be created between the client and the server [23]. When the connection has been completed, the client adds a new node to the appropriate object in the address space. Once the client is connected to the server, the objects to be monitored are registered.
Fig. 3. Interactions for a connection between the client and server
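To make the sequence of Fig. 3 concrete, the sketch below walks through the same four steps in Python. Every name in it (find_endpoints, open_secure_channel, create_session, and the stub return values) is hypothetical and introduced only for illustration; it is not the OPC Foundation SDK API, and the stubs merely stand in for the real Discovery, Secure Channel, and Session services.

```python
# Hypothetical sketch of the client connection sequence of Fig. 3.
# None of these names belong to a real OPC UA SDK; the stubs only mirror the steps.

def find_endpoints(discovery_url):
    # Discovery services: return the endpoints and their security requirements.
    return [{"url": discovery_url, "policy": "Basic256", "mode": "SignAndEncrypt"}]

def open_secure_channel(endpoint):
    # Secure Channel services: negotiate keys used to sign/encrypt all messages.
    return {"endpoint": endpoint, "channel_id": 1}

def create_session(channel, user, password):
    # Session services: create and activate a session on top of the secure channel.
    return {"channel": channel, "session_id": 42, "user": user}

def connect_and_subscribe(discovery_url, user, password, node_ids):
    endpoints = find_endpoints(discovery_url)          # 1. discover endpoints
    channel = open_secure_channel(endpoints[0])        # 2. open a secure channel
    session = create_session(channel, user, password)  # 3. create a session
    # 4. once connected, register the objects to be monitored
    return {"session": session, "monitored_items": list(node_ids)}

if __name__ == "__main__":
    sub = connect_and_subscribe("opc.tcp://server:4840", "operator", "secret",
                                ["Device1.Temperature", "Device1.Pressure"])
    print(sub["monitored_items"])
```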
4.3 Components
The proposed architecture is a bridging technology that enables the use of existing software systems and OPC server-enabled devices. Although direct access from an ERP system to devices is possible, the proposed architecture simplifies the management of field devices and the acquisition of data from them. The components of the proposed architecture are briefly described as follows:
(1) Discovery Component. This component defines the services used to discover the endpoints implemented by a server and to read the security configuration for these endpoints. The Endpoint Discovery that is part of this component is provided by the server for clients to access without establishing a session; in general, this endpoint may or may not be the same one that clients later use to establish a secure channel. Once a client has retrieved the endpoints, it can save this information and use it to connect to the server again. Servers may register themselves with a well-known Server Discovery using the RegisterServer service. The Server Discovery, which is also part of the Discovery Component, is used by clients to discover any registered servers by calling the FindServers service. If the server's configuration has changed, the client needs to go through the discovery process again.
(2) Session Manager. This component consists of the services that manage sessions for clients. It is responsible for validating incoming requests and for cleaning up when a session expires. Before calling these services, the client must create a secure channel to ensure the integrity of all messages exchanged during a session. The secure channel is created together with a session by means of the security solutions; a communication channel opened with a secure channel ensures the confidentiality and integrity of all messages exchanged between the client and the server.
(3) Subscription Manager. After a connection with a secure channel has been established between the client and the server, users on the client side can read or subscribe to data or events from the field devices on the plant floor. The Subscription Manager component provides the services to manage these subscriptions.
(4) Device Node Manager. This component is composed of the services that manage the server's address space. It is an abstraction that collects data and events from different sources, i.e., the field devices on the plant floor, and exposes them via a standard interface. The services provided by this component are used to add and delete address space nodes and the references between them.
(5) Monitoring Manager. Clients define monitored items to subscribe to data and events. Each monitored item identifies the item to be monitored and the subscription used to send notifications. It is also assigned a sampling interval that is either inherited from the publishing interval of the subscription or defined specifically to override that rate. The focus is on the attributes and variables for value or status changes of a node, including the caching of values and the monitoring of nodes for events.
(6) COM Wrapper. This component is used to map an OPC server's address space onto the UA information model. It contains a Device Node Manager that encapsulates all accesses to the OPC server (or simply COM server). Its Session Manager is responsible for mapping the user identity associated with a session. This component also allows different sessions to use the same COM server instance.
(7) COM Proxies. This component was implemented to allow COM clients to access the OPC UA server. The proxies map the UA server's address space onto the COM address space; the mappings hide any part of the UA server's address space that cannot be represented in the COM address space.

4.4 Data Representation
The fundamental approach to solving today's bandwidth problems is to use a binary data representation integrated into XML, such as BXML [3] or BXSA [5]. Recently, some solutions have become available that satisfy these conditions, such as SOAP with attachments or HTTP messages using XLink [8]. To provide a fast and reliable solution for process monitoring and control applications based on XML and web services, binary data are incorporated into the SOAP message: the SOAP header remains XML in order to pass firewalls, while the body of the SOAP message is encoded as binary data to reduce the message size.
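As a rough, self-contained illustration of this encoding choice (not the framework's actual wire format), the sketch below packs a block of samples as binary data, base64-encodes it, and places it in the body of an otherwise textual SOAP envelope; the element names are invented for the example, and only the envelope namespace is the standard SOAP one.

```python
# Illustrative only: an XML SOAP header with a binary-encoded body.
import base64
import struct
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def binary_message(samples):
    # Pack the samples as IEEE-754 doubles instead of XML text.
    payload = struct.pack(f"<{len(samples)}d", *samples)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, "MessageId").text = "read-response-001"
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    data = ET.SubElement(body, "BinaryData")
    data.set("encoding", "base64")
    data.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(env)

def text_message(samples):
    # The same samples as plain XML text, for a size comparison.
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    for s in samples:
        ET.SubElement(body, "Value").text = repr(s)
    return ET.tostring(env)

if __name__ == "__main__":
    samples = [20.5 + 0.001 * i for i in range(1000)]
    print(len(binary_message(samples)), "bytes (binary body) vs",
          len(text_message(samples)), "bytes (textual body)")
```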
4.5 Field Device Integration
As the number of field devices increases, the complicated settings and adjustments needed to use the advanced functions of such devices have become a challenge [1,6,13]. Today, technologies such as the Electronic Device Description Language (EDDL) [7] and the Field Device Tool (FDT) [10] are available for device integration [26,11,15,18]. The FDT technology uses programmed software components called Device Type Managers (DTM). Distributed control system (DCS) and automation system vendors would like to achieve robustness while assuring a high level of technology and platform independence, and device manufacturers want to support only one technology instead of two in order to reduce effort. The presented device integration strategy therefore combines (i) the advantages of EDDL (platform independence, ease of use, and robustness) and (ii) the advantages of FDT (unlimited functionality, extensibility, and market differentiation). To realize Internet-based monitoring and control and to enable efficient collaboration between device-level SOA on the one hand and services and applications on the other, the solution developed by Grossmann et al. [11] was adapted to the proposed framework; it comprises a Device Information Model (DIM) and Device Operation Models (DOM). The DIM presents the device data and functions exposed by the device firmware and is unique for each device. It encapsulates the communication with a device and ensures that applications can access data and functions independently of connection-specific properties. All DOMs access the device's DIM to request device data and functions. The server loads the device description containing the DIM and the DOMs; while the DIM is processed directly within the server to establish the information model in the address space, the DOMs are only handled in terms of server storage. The server information model allows OPC UA clients to access the device data and functions that are part of the DIM. The architectural model of device integration using DIM and DOM is shown in Fig. 4. Each device is represented by a Device object, which includes sub-objects that hold the data items of the data model.
Fig. 4. The device integration concept based on both FDT/DTM and EDDL
To offer the DIM access to the device data, the server communicates with the devices over fieldbus protocols such as FOUNDATION Fieldbus, PROFIBUS, CANopen, DeviceNet, and HART (through a HART gateway). However, having services on devices is of little use if they cannot be dynamically discovered by the server; discovery services should therefore be implemented in order to find the devices installed on the plant floor.
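A minimal data-structure sketch of the DIM/DOM split is given below. The field names and the description format are assumptions made for illustration; the actual device descriptions are defined by EDDL/FDT and the OPC UA information model, not by this code.

```python
# Illustrative data structures for the device integration concept (DIM / DOM).
from dataclasses import dataclass, field

@dataclass
class DeviceInformationModel:
    """DIM: device data and functions exposed by the device firmware."""
    parameters: dict   # e.g. {"PV": 21.3, "SP": 50.0}
    functions: list    # e.g. ["calibrate", "reset"]

@dataclass
class DeviceOperationModel:
    """DOM: an operation model; only stored by the server, not mapped."""
    name: str
    content: bytes

@dataclass
class DeviceObject:
    device_id: str
    dim: DeviceInformationModel
    doms: list = field(default_factory=list)

def load_device_description(description: dict) -> DeviceObject:
    """The server loads a device description containing the DIM and the DOMs;
    the DIM becomes address-space nodes, the DOMs are merely kept in storage."""
    dim = DeviceInformationModel(parameters=description["dim"]["parameters"],
                                 functions=description["dim"]["functions"])
    doms = [DeviceOperationModel(d["name"], d["content"])
            for d in description.get("doms", [])]
    return DeviceObject(description["id"], dim, doms)
```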
5 The Framework Security Solutions
Security of remote invocations in heterogeneous environments is very important for Internet-based monitoring and control, covering the authentication of clients, the encryption of messages, and access control. The security objectives for these systems are integrity, confidentiality, availability, authentication, non-repudiation, accountability, and reliability [2,23]. XML security, consisting of XML Signature and XML Encryption, is the key concept for meeting the required security objectives. XML Encryption addresses the generic security objective of confidentiality. XML Signature meets security objectives such as integrity, non-repudiation, and authentication; through authentication it also implicitly meets the objectives of accountability and confidentiality. Digital signatures are therefore suitable for OPC UA-based monitoring and control applications.
XML Signature is a scheme for encoding digital signatures in an XML document. It uses the same standards and cryptographic techniques as ordinary digital signatures. The basis of digital signatures is asymmetric cryptography: in contrast to symmetric cryptography, an entity creates a key pair of which one key, the public key, can safely be published. In this way data can be encrypted such that only the private key can decipher them. The private key can also be used to generate digital signatures, which are verified with the public key. The steps to sign a message and to verify the signed message at the communication partner are as follows:
(1) Calculate the hash code of the data to be signed.
(2) Encrypt the hash code using the private key. This ensures the security objective of non-repudiation.
(3) Attach the hash code and the encrypted hash code to the message before transmitting it.
(4) The receiver of the signed message decrypts the encrypted hash code with the public key of the communication partner. If the decrypted hash code and the plain hash code of the message are identical, the communication partner is the expected one; otherwise the message has to be discarded or rejected because of a potential attack.
(5) The receiver calculates the hash code of the same data and compares the two hash codes. If they are identical, the integrity of the message is guaranteed; otherwise the data have been modified and must be discarded.
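The five steps can be sketched as follows. The sketch uses RSA via the Python cryptography package purely as a stand-in for the XML Signature machinery described above, so the message layout is illustrative rather than the framework's actual format.

```python
# Minimal sign/verify sketch of steps (1)-(5); RSA stands in for XML Signature.
import hashlib
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.exceptions import InvalidSignature

def sign_message(private_key, data: bytes):
    digest = hashlib.sha256(data).digest()                    # (1) hash the data
    signature = private_key.sign(data, padding.PKCS1v15(),    # (2) the library hashes
                                 hashes.SHA256())             #     and signs with the private key
    return {"data": data, "hash": digest, "signature": signature}  # (3) attach both

def verify_message(public_key, message) -> bool:
    try:                                                      # (4) check the signature
        public_key.verify(message["signature"], message["data"],
                          padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False                                          # potential attack: reject
    recomputed = hashlib.sha256(message["data"]).digest()     # (5) recompute the hash
    return recomputed == message["hash"]                      # integrity check

if __name__ == "__main__":
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    msg = sign_message(key, b"temperature=42.0")
    print(verify_message(key.public_key(), msg))              # True for an intact message
```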
6 Experiment Results and Performance Evaluation

6.1 Experiment Setup
A number of performance tests were performed by varying the number of OPC items read on the client side. The setup was composed of an OPC UA server and client running on Windows XP Service Pack 2, as follows:
(1) Intel Pentium IV CPU with a clock frequency of 2.66 GHz.
(2) 1 GB of PC3200 DDR RAM.
(3) At least 5 GB of free hard disk space.
(4) The OPC UA client and server running on the same computer.
(5) Microsoft .NET Framework 3.0 Service Pack 1.

6.2 Preliminary Results and Performance Evaluation
The time taken by the Read operation to fetch various numbers of items under various conditions was measured. The comparison of the time taken by the Read operation under fixed conditions (binary encoding, signature and encryption, 256-bit Basic) when using either HTTP or TCP is shown in Fig. 5(a); it indicates that using binary encoding improves performance considerably. The comparison of the time taken by the Read operation under fixed conditions (binary encoding, HTTP, and 256-bit Basic) when using signature and encryption, signature only, and no security, respectively, is shown in Fig. 5(b). It indicates that the overall performance of the proposed system when using both signature and encryption is approximately two times slower than when using neither, and that using only a signature is noticeably faster than using both encryption and signature. The preliminary results show that the proposed system has sufficiently good performance and should be acceptable for many real monitoring and control system applications.
Fig. 5. (a) The time taken for operation Read using either HTTP or TCP protocol. (b) The time taken for operation Read with different security modes: Signature and Encryption, Signature, and None.
7 Comparison and Discussion

7.1 Comparison with Existing Approaches
A direct comparison of the proposed architecture with the existing approaches [4,14,25,16,8] is difficult because of their conceptual nature, their architectures, and the wide range of production environments; however, a qualitative comparison can be made. The proposed system has the following advantages:
1. The proposed framework provides a single, interoperable way to access process data from the plant floor in enterprise applications such as Computer Maintenance Management Systems (CMMS), Enterprise Resource Planning (ERP) systems, Enterprise Asset Management (EAM) systems, and Manufacturing Execution Systems (MES) through an easy mechanism.
2. The framework fully addresses the issue of sharing information in complex data structure formats with the enterprise systems in a way that they can understand, by providing the means to handle complex data structures and to transport them in a secure, reliable, SOA-based way.
3. All data, such as current data, historical data, and alarms and events, are provided in one unified address space so that they can be related to each other, whereas existing approaches only collect each type of data into its own address space without such relations.
4. The proposed framework opens a flexible gateway for enterprise applications to communicate with the plant floor via OPC DA or other interfaces. This means that existing products can continue to operate as they are by using wrappers.
5. By complying with the six framework design criteria, the structure of the proposed framework is designed and developed in a systematic way. The framework provides generic components upon which sophisticated processes can be modeled.
7.2 Discussion
Since the work presented in this paper is part of ongoing research in the area of automation systems with OPC technologies, it is useful to have such a framework, in particular to assess future work. Potential application areas of the proposed framework include the following:
– Process monitoring and control applications used in process and factory automation, where control is distributed and large amounts of distributed data are collected.
– Business applications such as CMMS, ERP, and EAM systems. These applications require data updates at a frequency of seconds or minutes, and typically update every shift or less often.
– Process analysis applications such as enterprise process historians, analysis reporting, trending, and others. For these applications the frequency of data capture is not nearly as important as the fact that data do not get lost.
– Remote monitoring applications such as web-based process visualization.
8 Concluding Remarks and Future Works
A SOA-based framework for designing and developing monitoring and control system applications was introduced. The proposed framework fulfills the six suggested design criteria, making it flexible, open, and compatible with existing OPC functionality. Security solutions were incorporated into the framework so that developers and programmers can build their applications in a systematic way. The proposed framework shows good performance and is expected to be deployed in modern process and factory automation systems.
Industrial systems now need to be independent of particular operating systems and platforms. The OPC UA standard is therefore a good choice for the development of web-enabled industrial automation and manufacturing software systems. The technology will mature in time, the OPC UA standard is expected to prevail, and it will be included in future products made by many leading companies in industry. SOA will be dominant in the future, so the integration of field devices into business-IT systems through SOA is a promising approach for researchers. Using the proposed framework for the implementation of a control and monitoring approach for flexible systems is the major task ahead; application case studies will also be investigated and deployed in future work.
Acknowledgements. This work was supported in part by the Korean Ministry of Knowledge Economy and Ulsan Metropolitan City through the Network-based Automation Research Center (NARC) at the University of Ulsan and by the Korea Research Foundation Grant funded by the Korean Government (KRF-2009-0076248).
References

1. Bohn, H., Bobek, A., Golatowski, F.: SIRENA – Service Infrastructure for Realtime Embedded Networked Devices: A Service Oriented Framework for Different Domains. In: Proceedings of the Int. Conf. on Systems and Int. Conf. on Mobile Communications and Learning Technologies, p. 43 (2006)
2. Braune, A., Hennig, S., Hegler, S.: Evaluation of OPC UA Secure Communication in Web browser Applications. In: Proceedings of the IEEE Int. Conf. on Industrial Informatics, pp. 1660–1665 (2008)
3. Bruce, C.S.: Cubewerx Position Paper for Binary XML Encoding, http://www.cubewerx.com/main/HTML/Binary_XML_Encoding.html
4. Chilingaryan, S., Eppler, W.: High Speed Data Exchange Protocol for Modern Distributed Data Acquisition Systems based on OPC XML-DA. In: Proceedings of the 14th IEEE-NPSS Real-time Conference, pp. 352–356 (2005)
5. Chiu, K., Devadithya, T., Lu, W., Slominski, A.: A Binary XML for Scientific. In: Proceedings of the 1st Int. Conf. on e-Science and Grid Computing (2005)
6. de Souza, L.M.S., Spiess, P., Guinard, D., Köhler, M., Karnouskos, S., Savio, D.: SOCRADES: A web service based shop floor integration infrastructure. In: Floerkemeier, C., Langheinrich, M., Fleisch, E., Mattern, F., Sarma, S.E. (eds.) IOT 2008. LNCS, vol. 4952, pp. 50–67. Springer, Heidelberg (2008)
7. EDDL - Electronic Device Description Language, http://www.eddl.org/
8. Eppler, W., Beglarian, A., Chilingarian, S., Kelly, S., Hartmann, V., Gemmeke, H.: New Control System Aspects for Physical Experiments. IEEE Transactions on Nuclear Science 51(3), 482–488 (2004)
9. Fayad, M.E., Schmidt, D.C., Johnson, R.E.: Building Application Frameworks: Object-Oriented Foundation of Framework Design. Wiley, Chichester (1999)
10. FDT-JIG Working Group: FDT Interface Specification, Version 1.2.1. FDT Joint Interest Group (2005), http://www.fdtgroup.org/
11. Grossmann, D., Bender, K., Danzer, B.: OPC UA based Field Device Integration. In: Proceedings of the SICE Annual Conference, pp. 933–938 (2008)
12. Jammes, F., Smit, H.: Service-Oriented Paradigms in Industrial Automation. IEEE Transactions on Industrial Informatics 1(1), 62–70 (2005)
13. Jammes, F., Mensch, A., Smit, H.: Service-Oriented Device Communications using the Devices Profile for Web Services. In: Proceedings of the 3rd Int. Workshop on Middleware for Pervasive and Ad-Hoc Computing, pp. 1–8 (2005)
14. Jia, Z., Li, X.: OPC-based architecture of embedded web server. In: Wu, Z., Chen, C., Guo, M., Bu, J. (eds.) ICESS 2004. LNCS, vol. 3605, pp. 362–367. Springer, Heidelberg (2005)
15. Kastner, W., Kastner-Masilko, F.: EDDL inside FDT/DTM. In: Proceedings of the 2004 IEEE Int. Workshop on Factory Comm. Systems, pp. 365–368 (2004)
16. Khalgui, M., Rebeuf, X., Zampognaro, F.: Adaptable OPC-XML Contracts Taking into Account Network Traffic. In: Proceedings of the 10th IEEE Conf. on Emerging Technologies and Factory Automation, pp. 31–38 (2005)
17. Mönch, L., Stehli, M.: Manufag: A Multi-agent-System Framework for Production Control of Complex Manufacturing Systems. Information Systems and e-Business Management 4(2), 159–185 (2006)
18. Simon, R., Riedl, M., Diedrich, C.: Integration of Field Devices using Field Device Tool (FDT) on the basis of Electronic Device Descriptions (EDD). In: Proceedings of the 2003 IEEE Int. Symp. on Indus. Electronics, pp. 189–194 (2003)
19. The OPC Foundation: The OPC Data Access Specification. Version 3.0 (2004), http://www.opcfoundation.org/Downloads.aspx
20. The OPC Foundation: The OPC XML-Data Access Specification. Version 1.01 (2004), http://www.opcfoundation.org/Downloads.aspx
21. The OPC Foundation: The OPC Historical Data Access Specification. Version 1.0 (2003), http://www.opcfoundation.org/Downloads.aspx
22. The OPC Foundation: The OPC Alarms and Events Specification. Version 1.0 (2002), http://www.opcfoundation.org/Downloads.aspx
23. The OPC Foundation: The OPC Unified Architecture Specifications: Parts 1-11. Version 1.xx (2008), http://www.opcfoundation.org/Downloads.aspx
24. Tom, H., Mikko, S., Seppo, K.: Roadmap to Adopting OPC UA. In: Proceedings of the IEEE Int. Conf. on Industrial Informatics, pp. 756–761 (2008)
25. Usami, K., Sunaga, S.-I., Wada, H.: A Prototype Embedded XML-DA Server and its Evaluations. In: Proceedings of the SICE-ICASE Int. Joint Conference, pp. 4331–4336 (2006)
26. Yamamoto, M., Sakamoto, H.: FDT/DTM Framework for Field Device Integration. In: Proceedings of the SICE Annual Conference, pp. 925–928 (2008)
Data Fusion Algorithm Based on Event-Driven and Minimum Delay Aggregation Path in Wireless Sensor Network∗

Tianwei Xu1, Lingyun Yuan1, and Ben Niu2

1 College of Computer Science and Information Technology, Yunnan Normal University, Yunnan, China
2 College of Management, Shenzhen University, Shenzhen, China
[email protected], [email protected], [email protected]

Abstract. Emergent event detection is one of the most important applications of event-driven wireless sensor networks. In order to save energy, a dynamic clustering routing algorithm based on event severity degree is presented in this paper, rather than periodic monitoring; the lifetime and the scale of the clusters depend on the severity degree of the emergent event. On the other hand, data must be collected, fused, and transmitted in real time during emergencies, whereas most existing data aggregation methods in wireless sensor networks suffer from long network delay. A minimum delay aggregation path mechanism is therefore put forward to reduce the network delay: the waiting time of a fusing node is determined by its hop-count to the cluster head and its degree, and the minimum delay aggregation path is obtained from the waiting times of the fusing nodes on each path. The simulation results show that our algorithm saves energy efficiently and reduces the network delay remarkably, achieving a better trade-off between energy and delay. Moreover, it provides a good method for applying wireless sensor networks to the monitoring of emergencies.

Keywords: Wireless Sensor Network, Data Fusion, Event Driven, Minimum Delay Aggregation Path.
1 Introduction

Emergent event detection is a crucial application of wireless sensor networks, for events such as fires and earthquakes. For emergent event detection, an event-driven wireless sensor network, characterized by urgency, abundant data, and high redundancy, is the most appropriate model. The sensor nodes have limited energy and cannot easily be replaced.
∗ This work is supported by the general project of social development in Yunnan Province (Grant no. 2008CD113), the foundation project of the education office in Yunnan Province (Grant no. 08Y0136), and the Shenzhen-Hong Kong Innovative Circle project (Grant no. SG200810220137A).
Extending the network lifetime is therefore one of the most important goals in emergent event detection. On the other hand, emergent events always need to be reported in time: if an event occurs, the amount of data increases abruptly in a short time, and the event messages must be transmitted to the sink node accurately and in real time. Network delay is thus another factor that must be considered. How to fuse the event data with low network delay and a long network lifetime is a very important problem for event-driven wireless sensor networks.
At present, data fusion methods in wireless sensor networks are mostly based on routing algorithms such as LEACH [1], TEEN [2], and Steiner trees [3]. In SRDA [4], the nodes transmit only the variation rather than the original data, where the variation is obtained by comparing the original data with reference data. In SIA [5], the nodes transmit only the fused data to the base station, using fusing functions such as max, min, average, and sum. A new data aggregation model, Q-Digest [6], has been presented, with which a wireless sensor network can respond to multiple requests. In the distributed KMDBA [7], the member nodes and cluster heads operate independently, and the network is divided into multiple clusters to avoid redundant data being transmitted directly from the member nodes to the cluster head. The protocols in [13] mainly focus on constructing a shortest path tree to achieve data fusion. The methods in [4-7] study data fusion algorithms based on periodic and center-based queries. The authors of [8] presented a detection criterion for when an event occurs, with a data fusion method based on the spatio-temporal correlation of the data. Tang et al. [9] put forward an event correlation method based on finite state machines, but it is only a theoretical model. None of the above algorithms considers the characteristics of the emergent event, such as its severity degree, its location, and the similarity of the data from the nodes in the event area. In this paper, a dynamic clustering method based on the properties of the emergent event is presented as the first phase of the data fusion algorithm, which reduces the energy consumption of the network nodes.
Reducing the network delay is another important goal. In LEACH [1] and SCT [10], all fusing nodes have the same waiting time before the fusing operations are performed; the waiting time is long enough that the fusing nodes can receive all data to be fused before the timeout. This method increases the network delay while saving the nodes' energy. In [11,12], the authors propose a fusing time mechanism: a fusing tree is built between the source nodes and the sink node as the data are transmitted, and the fusing nodes set their waiting time according to their distance to the sink node. Yuan et al. [13] put forward a controllable multi-level fusing time scheme, in which the fusing nodes set their waiting time according to their distance to the sink node and the sink node tunes the maximum delay. In the timing control protocol of [14], the waiting time can be adjusted according to the fusion quality. Wang [15] presented a group-aware real-time data aggregation tree algorithm, in which paths can be selected that satisfy the latency constraint with lower energy consumption.
An aggregate-contribution-based delay-time allocation algorithm is proposed in [16], in which the impact of different positions in the routing tree on aggregation efficacy is quantified and the aggregation time of every node is allocated at the sink. Although all of the above fusion timing methods try to reduce the network delay, real-time data delivery is essential for an event-driven network. Taking this factor into consideration, the minimum delay aggregation path is put forward in this paper,
in which the waiting time is determined by the hop-count from a member node to its cluster head and by the degree of the member node. From the waiting times of all nodes on the candidate paths, the minimum delay aggregation path can be obtained. On this basis, a data fusion algorithm based on event-driven and minimum delay aggregation path (EDMDP) is presented for emergent event detection.
The paper is organized as follows. The network model and radio model are introduced in Section 2. In Section 3, the data fusion algorithm based on event-driven and minimum delay aggregation path is described in detail. The experimental results and analysis are given in Section 4. Finally, conclusions are drawn in Section 5.
2 Network Model

2.1 Network Model

This research is based on an event-driven wireless sensor network. We assume that N nodes are deployed randomly in the monitoring area A, with the following characteristics:
(a) The sensor nodes are static; they never move once deployed.
(b) There is only one base station. It can be deployed at the center of the network or somewhere outside A; the experimental results indicate that it is more efficient when the base station is deployed at a fixed location outside A. The base station has unlimited energy and can send information to all nodes.
(c) All nodes have the same initial energy, which cannot be replenished, and all nodes have similar processing and communication capabilities.
(d) The energy consumption of the sensor nodes differs from round to round. The base station can calculate the residual energy after each round ends.
(e) The sensor nodes can obtain their location information from the base station.

2.2 Radio Energy Model

Low-power wireless communication has been studied widely in recent years. We use the same first-order radio model as LEACH and PEGASIS. In this model, a radio dissipates Eelec = 50 nJ/bit to run the transmitter or receiver circuitry and εamp = 100 pJ/bit/m² for the transmitter amplifier. The radios have power control and can expend the minimum energy required to reach the intended recipients, and they can be turned off to avoid receiving unintended transmissions. The model defines a distance threshold d0, a constant whose value depends on the application. When the distance between the transmitting node and the receiving node is below d0, the energy consumption of the transmitting node grows with d²; otherwise it grows with d⁴ (the free-space model and the multi-path fading model, respectively). According to the distance to the receiving node, the transmitting node uses the corresponding model to calculate the energy needed to transmit data. The equations used to calculate the transmitting and receiving costs for a k-bit message over a distance d are shown below:
Transmitting:

ETx(k, d) = ETx-elec(k) + εTx-amp(k, d)   (1)

ETx(k, d) = Eelec · k + εamp · k · d²,  if d < d0   (2)

ETx(k, d) = Eelec · k + εamp · k · d⁴,  if d ≥ d0   (3)

Receiving:

ERx(k) = ERx-elec(k)   (4)

ERx(k) = Eelec · k   (5)
On the other hand, the energy consumed by data fusion cannot be ignored; 5 nJ/bit/message is used to calculate the energy cost of data fusion. In this paper it is assumed that the radio channel is symmetric, so that the energy required to transmit a message from node i to node j is the same as the energy required to transmit a message from node j to node i for a given signal-to-noise ratio. For comparative evaluation purposes, we assume that there are no packet losses in the network. It is not difficult to model errors and losses in terms of an increased energy cost per transmission: with known channel error characteristics and error coding, this cost can be modeled by suitably adjusting the constants in the above equations.
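The radio model of Eqs. (1)-(5), together with the fusion cost mentioned above, can be written directly as a small cost function. The constants Eelec, εamp, and the 5 nJ/bit fusion cost come from the text; the value of d0 is only a placeholder, since the paper states merely that d0 is an application-dependent constant.

```python
# A minimal sketch of the first-order radio model described above.
E_ELEC = 50e-9      # J/bit, transmitter/receiver electronics (Eelec = 50 nJ/bit)
EPS_AMP = 100e-12   # J/bit/m^2, amplifier constant (eps_amp = 100 pJ/bit/m^2)
E_FUSE = 5e-9       # J/bit, data fusion cost (5 nJ/bit per message in the text)
D0 = 87.0           # m, distance threshold d0 -- placeholder, application dependent

def tx_energy(k_bits: int, d: float) -> float:
    """Energy to transmit a k-bit message over distance d (Eqs. 1-3)."""
    if d < D0:
        return E_ELEC * k_bits + EPS_AMP * k_bits * d ** 2   # free-space model
    return E_ELEC * k_bits + EPS_AMP * k_bits * d ** 4       # multi-path fading model

def rx_energy(k_bits: int) -> float:
    """Energy to receive a k-bit message (Eqs. 4-5)."""
    return E_ELEC * k_bits

def fusion_energy(k_bits: int) -> float:
    """Energy to fuse a k-bit message."""
    return E_FUSE * k_bits

if __name__ == "__main__":
    k = 512 * 8   # one 512-byte data packet, as in Table 1
    print(tx_energy(k, 30.0), rx_energy(k), fusion_energy(k))
```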
3 Data Fusion Algorithm Based on Event-Driven and Minimum Delay Aggregation Path

3.1 Dynamic Clustering

In order to save energy, after the nodes are deployed in the monitoring environment all nodes are set to the "restraining" state rather than being clustered; they are activated only when some emergent event occurs, and only then are the nodes clustered. The scale and lifetime of the clusters depend on the event severity degree, for which several thresholds are defined.

3.1.1 Threshold Definition

Def. 1. Basic Hard Threshold (BHT): used to estimate the severity degree of the emergent event. Its value does not vary with Mdata, where Mdata is the measurement data from a restraining node.
Def. 2. Standard Hard Threshold (NHT): the restraining nodes use the NHT to compete to become a cluster head. If Mdata > BHT && |Mdata - NHT| >= ST, a restraining node changes from the "restraining" state to the "working" state, and the NHT value is replaced with Mdata. Likewise, if Mdata > BHT && |Mdata - NHT| >= ST, then NHT = Mdata and the member nodes transmit the data Mdata to their upper-layer nodes.
Def. 3. Soft Threshold (ST): the minimal change of the measurement data that makes a node become excited or transmit Mdata.
Def. 4. Relative Exciting Threshold (RETT): used to determine the live time of the cluster head. The value of RETT varies with the event severity degree, RETT = α · |Mdata - BHT|, where α is the live-time factor.
Def. 5. Absolute Exciting Threshold (AETT): the cluster head broadcasts the CH message with the AETT, from which the member nodes can estimate the live time of their cluster. AETT = t + RETT, where t is the current time. If t > AETT, a node changes from the "working" state back to the "restraining" state.

3.1.2 Dynamic Clustering Based on Event Severity Degree

A cluster-tree structure is used to save energy, with multi-hop rather than single-hop communication from the member nodes to the cluster head. The scale and lifetime of the clusters depend on the severity degree of the emergent event, as defined in Section 3.1.1. The dynamic clustering steps are described as follows:
Step 1: The nodes are initialized after deployment. The base station first sends its location information to all nodes; each node obtains its own location, its neighbors' locations, and its distance to the base station.
Step 2: After initialization, the nodes are not clustered beforehand; they simply remain in the "restraining" state.
Step 3: Cluster head node. When an emergent event occurs in some area, if the measurement data of a restraining node i (i ∈ N) satisfy Mdata_i > BHT && |Mdata_i - NHT| >= ST, then node i changes from the "restraining" state to the "working" state and becomes a cluster head. The severity degree |Mdata - BHT| determines the lifetime and the scale of the cluster. After becoming a cluster head, the node calculates the hop count K of the CH message broadcast, K = u · |Mdata - BHT|, and the values of RETT and AETT: RETT = α · |Mdata - BHT|, AETT = t + RETT. The value of NHT is updated to the new Mdata. The cluster head then broadcasts the CH message; the broadcast packet consists of the CH flag, the cluster head ID, the transmitting hop-count K, the new NHT, the AETT, and the residual energy.
Step 4: Member nodes. Restraining nodes join the cluster and change to the "working" state when they receive the CH broadcast message. If K - 1 > 0 (the value of K comes from the CH broadcast packet), a node decrements the hop-count K and forwards the message; if the ID and AETT are the same as in a previously received message, the message is dropped. If a node that has already changed to the working state receives a different CH broadcast message, it decides which cluster to join according to the severity degree, the hop count, and the residual energy of the cluster head.
Step 5: When an emergent event occurs, the first cluster is formed in the event area at once, with the first activated node as its cluster head. In order to avoid redundant clusters, nodes that have been cluster heads cannot enter the exciting state again, and a member node of the first cluster cannot become another cluster head when activated by the same event.
Step 6: If t > AETT, the exciting time of the cluster head is over and the cluster is released; the cluster head and the member nodes then get back
to the restraining state. If a new emergent event occurs, the nodes compete to become cluster heads again.
An example of dynamic clustering based on event severity degree at a given moment is shown in Fig. 1; the sink node is denoted in green, the cluster head in blue, and the others are member nodes. A code sketch of this clustering logic is given after Fig. 1.
Fig. 1. An example for dynamic clustering based on event severity degree
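The following sketch condenses the threshold definitions and Steps 1-6 into simplified node logic. The numeric values of BHT, ST, α, and u are illustrative assumptions, since the paper does not specify them, and message routing, location exchange, and energy bookkeeping are omitted.

```python
# A simplified sketch of the event-driven clustering rules (Defs. 1-5, Steps 1-6).
ALPHA = 2.0   # live-time factor alpha in RETT = alpha * |Mdata - BHT|  (assumed value)
U = 0.5       # coefficient u in K = u * |Mdata - BHT|                  (assumed value)
BHT = 40.0    # basic hard threshold                                    (assumed value)
ST = 2.0      # soft threshold                                          (assumed value)

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.state = "restraining"
        self.nht = BHT            # standard hard threshold, updated with Mdata
        self.is_cluster_head = False

    def on_measurement(self, mdata, now):
        """Step 3: a restraining node competes to become a cluster head."""
        if self.state == "restraining" and mdata > BHT and abs(mdata - self.nht) >= ST:
            severity = abs(mdata - BHT)
            self.state = "working"
            self.is_cluster_head = True
            self.nht = mdata
            rett = ALPHA * severity            # relative exciting threshold (live time)
            aett = now + rett                  # absolute exciting threshold
            k = max(1, int(U * severity))      # broadcast hop count
            return {"type": "CH", "id": self.node_id, "hops": k,
                    "nht": mdata, "aett": aett}
        return None

    def on_ch_broadcast(self, msg, now):
        """Step 4: a restraining node joins the cluster and forwards the broadcast."""
        if self.state == "restraining" and now <= msg["aett"]:
            self.state = "working"
            self.nht = msg["nht"]
            if msg["hops"] - 1 > 0:            # forward with decremented hop count
                fwd = dict(msg)
                fwd["hops"] -= 1
                return fwd
        return None

    def on_timer(self, now, aett):
        """Step 6: release the cluster when the exciting time is over."""
        if now > aett:
            self.state = "restraining"
            self.is_cluster_head = False
```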
3.2 Minimum Delay Aggregation Path

After the clusters have been formed, a member node collects data and transmits them to its upper-layer node along some communication path if Mdata > BHT && |Mdata - NHT| >= ST. In a multi-hop cluster, the member nodes not only collect data but also fuse and forward them, and there are multiple paths from the source nodes to the cluster head. In this paper, we present the minimum delay aggregation path to minimize the network delay. The most important factor in the data fusion algorithm is the fusion timing mechanism. Here, we regard all member nodes as fusing nodes. The waiting time is used to determine when data fusion begins and ends at a node. The waiting time of a fusing node i is defined in Equation (6):
wt(i) = α · d_i / k_i ,   i ∈ M   (6)

where k_i is the hop-count from node i to its cluster head, d_i is the degree of node i, and α is the waiting-time coefficient, which is set to a constant according to the actual demand; M is the set of member nodes in the cluster. Equation (6) shows that the waiting time increases with the degree of a node: a larger d_i indicates a larger amount of data arriving at node i, so a longer waiting time should be given to ensure the integrity of the data. At the same time, a larger k_i shows that node i is far from the cluster head, so its waiting time is set shorter so that the data can be transmitted more quickly; this is a trade-off between data integrity and delay. The total waiting time of a fusing path equals the sum of the waiting times wt(i) of all nodes on that path. With Equation (7), the delay of all candidate paths can be calculated and the minimal delay path obtained:
D = min_l Σ_{i=1..k} wt(i)   (7)

where l ranges over the candidate paths and k is the number of fusing nodes on path l.
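Equations (6) and (7) translate directly into a small path-selection routine. The node degrees, hop-counts, candidate paths, and the value of α below are illustrative assumptions; in the algorithm they would come from the clustering phase.

```python
# A sketch of the minimum delay aggregation path selection (Eqs. 6-7).
ALPHA = 1.0  # waiting-time coefficient alpha (assumed value)

def waiting_time(degree: int, hops: int) -> float:
    """Eq. (6): wt(i) = alpha * d_i / k_i."""
    return ALPHA * degree / hops

def path_delay(path, degree, hops) -> float:
    """Total waiting time of one path = sum of wt(i) over its fusing nodes."""
    return sum(waiting_time(degree[n], hops[n]) for n in path)

def min_delay_path(paths, degree, hops):
    """Eq. (7): pick the candidate path with the smallest total waiting time."""
    return min(paths, key=lambda p: path_delay(p, degree, hops))

# Example: two candidate paths from source node 5 toward the cluster head.
degree = {5: 2, 3: 4, 2: 3, 4: 1}      # node degrees d_i (illustrative)
hops   = {5: 3, 3: 2, 2: 1, 4: 2}      # hop-counts k_i to the cluster head
paths  = [[5, 3, 2], [5, 4, 2]]        # fusing nodes along each candidate path
print(min_delay_path(paths, degree, hops))
```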
4 Simulation and Analysis

4.1 Simulation Setup

To evaluate the performance of the proposed algorithm, we simulated it in NS-2.27 and compared it with LEACH and TEEN. The simulation parameters are listed in Table 1.

Table 1. Experiment parameters

Parameter                      Value
Monitoring area                100 m × 100 m
Number of nodes                100
Initial energy                 2 J
Proportion of cluster heads    5%
Location of the base station   (50, 250)
Data packet length             512 bytes
Number of emergent events      20
Ethreshold                     0.01 J
Eelec                          50 nJ
Eamp                           100 pJ/bit/m²
4.2 Energy Consumption
One of the most important goals of in-network data fusion is to save energy by reducing the number of transmitted data packets. The average energy consumption of all nodes in LEACH, TEEN and EDMDP is shown in Fig. 2. For the same emergent event, the average energy consumption in EDMDP is remarkably lower than in LEACH and TEEN. In LEACH and TEEN the clustering is not correlated with the emergent event; in particular, the cluster head is selected with a random probability, so the data cannot be fused in the event area in good time, and the periodic re-clustering consumes additional energy. In EDMDP, clusters are formed only when an emergent event occurs, and only the nodes in the event area join the clustering; the cluster scale and lifetime are not periodic but determined by the event severity degree, and the clusters are released after the event ends.
Fig. 2. The average energy consumption in LEACH, TEEN and EDMDP
The number of live nodes is shown in Fig. 3. From the figure, we find that TEEN and EDMDP have a similar node death rate before 300 seconds. As more emergent events occur, EDMDP keeps a higher ratio of live nodes, because the event-driven dynamic clustering is triggered only by emergent events: the nodes do not need to monitor the area and transmit data periodically, which decreases the energy consumption. Thus, after 350 seconds, EDMDP is better than TEEN in terms of node death rate. LEACH is better than EDMDP before 550 seconds, but shows a sharp drop after 550 seconds. EDMDP has the longest lifetime; its last nodes do not die until 900 seconds.
4.3 Network Delay
When a node wants to fuse data, it has to wait to receive the data from other nodes, and the more nodes use it as a routing node, the longer it has to wait. The minimum-delay fusing path is therefore used to fuse and transmit with the minimum delay. Compared with other data fusion methods aimed at reducing network delay, EDMDP can choose the path dynamically, following the actual minimum delay, and the cluster structure of the network is not destroyed. The network delay as a function of the simulation time is shown in Fig. 4. In LEACH, all member nodes send their sensing data to the cluster head in a single hop, so the delay mainly comes from the cluster head, and the cluster fuses data with a fixed period. In our proposed algorithm, all member nodes can fuse the data with a waiting time, so the total delay is larger than in LEACH. Compared with GART [15], EDMDP has a lower delay, which does not vary noticeably with an increasing number of nodes in the emergent event area, because the minimum-delay path is always selected according to the total waiting time.
Fig. 3. The number of live nodes in LEACH, TEEN and EDMDP
Fig. 4. The network delay in EDMDP
5 Conclusions
Event-driven wireless sensor networks are well suited to emergent event detection. Taking energy consumption and network delay into consideration, we have presented a new data fusion algorithm based on event-driven clustering and a minimum delay aggregation path
for monitoring emergencies, in which a dynamic clustering method based on the emergent event severity degree is proposed, and the minimum delay aggregation path is obtained by calculating the total waiting time of all nodes on each path, taking the hop count and the degree of the member nodes into account. The experimental results show that our algorithm performs better than other data fusion methods. The larger residual energy indicates that the dynamic clustering method can remarkably reduce the energy consumption and prolong the network lifetime; moreover, our algorithm achieves a lower network delay. In the future, we will do more work on optimizing the waiting time, taking more factors into consideration, such as data veracity and time synchronization precision.
References 1. Heinzelman, W.R., Chandrakasan, A., Balakrishnan, H.: Energy-efficient Communication Protocol for Wireless Micro-sensor Network. In: Proc. of the 33rd Intl Conf on System Science, pp. 1–10 (2000) 2. Manjeshwar, A., Agrawal, D.P.: TEEN: A Routing Protocol for Enhanced Efficiency in Wireless Sensor Network. In Proc of the first Intl Workshop on Parallel and Distributed Computing Issues in Wireless Networks and Mobile Computing (2001) 3. Krishnamachari, B., Estrin, D., Wicker, S.: Modeling Data-Centric Routing in Wireless Sensor Networks. In: Proc. of IEEE Infocom (2002) 4. Sanli, H.O., Suat, O., Hasan, C., et al.: SRDA: Secure Reference-Based Data Aggregation Protocol for Wireless Sensor Networks. In: Proc. of IEEE VTC Fall 2004 Conference, pp. 4650–4654. IEEE Society Press, New York (2004) 5. Haowen, C., Adrian, P., Bartosz, P., et al.: SIA: Secure Information Aggregation in Sensor Networks. In: Proc. of the first ACM Conference on Embedded Networked System, pp. 255–265. ACM Press, New York (2003) 6. Nisheeth, S., Chiranjeeb, B., Divyakant, A.: Medians and Beyond: New Aggregation Techniques for Sensor Networks. In: Proc. of the second ACM Conf on Embedded Networked Sensor Systems, Baltimore, pp. 239–249 (2004) 7. Shipa, D.: Energy Aware Topology Control Protocols for Wireless Sensor Networks. University of Calcutta, India (2005) 8. Melmet, C.V., Ozgur, B.A., Ian, F.A.: Spatiotemporal Correlation: Theory and Applications for Wireless Snsor Networks. Computer Networks 45, 245–259 (2004) 9. Tang, Y., Zhou, T.M.: Study on the Event Correlativity in Large Scale Wireless Sensor Network. Computer Science 34(9), 356–359 (2007) 10. Zhu, Y.J., Vedantham, R., Park, S.J.: A Scalable Correlation Aware Aggregation Strategy for Wireless Sensor Networks. In: Proc. of IEEE International Conference on Wireless Internet (WICON), Budapest (2005) 11. Ignacio, S., Katia, O.: The Impact of Timing in Data Aggregation for Sensor Networks. Proc. of IEEE International Conference on Communications, 3640–3645 (2004) 12. Ignacio, S., Katia, O.: In-network Aggregation Trade-offs for Data Collection in Wireless Sensor Networks. International Journal of Sensor Networks 1(3/4), 200–212 (2006) 13. Yuan, W., Srikanth, V.K., Satish, K.T.: Synchronization of Multiple Levels of Data Fusion in Wireless Sensor Networks. In: Proc. of Global Telecommunications conference, pp. 223–225 (2003)
14. Hu, F., Cao, X.J., May, C.: Optimized Scheduling for Data Aggregation in Wireless Sensor Networks. In: Proc. of the 9th International Symposium on Computers and Communications, pp. 200–225 (2004) 15. Wang, J.T., Xu, J.D., Yu, H.X.: Group aware real-time data aggregation tree algorithm in wireless sensor networks. Chinese Journal of Sensors and Actuators 21(10), 1760–1764 (2008) 16. Duan, B., Ke, X., Huang, F.W.: An Aggregate Contribution Based Delay-Time Allocation Algorithm for Wireless Sensor Networks. Journal of Computer Research and Development 45(1), 34–40 (2008)
Handling Multi-channel Hidden Terminals Using a Single Interface in Cognitive Radio Networks Liang Shan and Myung Kyun Kim∗ Computer Network Lab, University of Ulsan, Mugoo-Dong, Nam-Ku, Ulsan, Republic of Korea
[email protected],
[email protected] Abstract. Cognitive networks enable efficient sharing of the radio spectrum. A multi-hop cognitive network is a cooperative network in which cognitive users take the help of their neighbors to forward data to the destination. Control signals are exchanged through a common control channel (CCC) to enable cooperative communication, but using a common control channel introduces a new issue, channel saturation, which degrades the overall performance of the network. Moreover, the multi-channel hidden terminal problem is another important challenge in cognitive radio networks: multi-channel hidden terminals can decrease the throughput, cause considerable overhead, and sometimes even invalidate the whole network. In this paper, a novel MAC protocol is proposed that resolves the multi-channel hidden terminal problem using a single interface and avoids using a CCC. Keywords: Cognitive Radio Networks, Multichannel Hidden Terminal Problem, Single Interface, Synchronization.
1 Introduction A cognitive network is an opportunistic network. A spectrum opportunity is the use of an available (free) channel, i.e., a part of the spectrum that is not currently used by primary users [1]. The licensed owner of a frequency band is called a primary user, and the one who utilizes spectrum opportunities for communication is called a secondary user. When the receiver is not in the transmission range of the sender, data is forwarded through several hops, forming a Multi-Hop Cognitive Radio Network (MHCRN). But unlike a normal multi-hop network, in which all users operate in the same channel, users in a MHCRN use different frequencies depending on spectrum availability. As a result, users are connected depending on whether they have a common frequency band for operation. A MHCRN is, in many ways, similar to a multi-channel network. In both networks, each user has a set of channels available for communication. When two users want to communicate, they negotiate via a common control channel (CCC)
to select a communicating channel. Two major differences between these two network environments are: a) the number of channels available at each node is fixed in a multi-channel network, whereas it is variable in a MHCRN — it is even possible that a user has no available channel at all because primary users occupy the whole spectrum; b) in general, the channels in a multi-channel environment have equal transmission ranges and bandwidths, unlike in a MHCRN, where the environment is heterogeneous. Thus, a MHCRN is a combination of a multi-hop and a multi-channel network. The protocols used in multi-channel networks cannot be applied to a MHCRN because of these differences, but the issues and challenges related to those networks apply to a MHCRN as well. For example, the CCC and multi-channel hidden terminal problems [2], [5], [6], which arise in multi-channel networks, are common to a MHCRN.
In this paper, a novel MAC protocol for MHCRNs is proposed which avoids the need for a dedicated CCC and solves the multi-channel hidden terminal problem using only a single interface. The main techniques are a novel channel structure that fully exploits the multiple available channels in a scalable manner, channel switching, and periodic resynchronization. Each node randomly chooses its own default channel, and each node learns its two-hop neighbors' information by gathering control signals (i.e., ATIM messages, as employed in IEEE 802.11 PSM) from its neighbors. Each superframe is comprised of two consecutive parts: the Control signal Transfer Period (CTP) and the Data Transfer Period (DTP). At the beginning of each CTP, all nodes on the corresponding channel listen to that channel in order to exchange control signals; thus, all nodes in the network are synchronized. A node may switch to another channel for transmission outside the CTP for requirements such as avoiding multi-channel hidden terminals, QoS, or load balancing of the network. After transmitting, the interface is switched back to the default channel for periodic resynchronization and to update the channel information. To the best of our knowledge, this is the first paper that focuses on solving the multi-channel hidden terminal problem using a single radio in cognitive radio networks.
The rest of the paper is organized as follows: Section 2 identifies some major issues in a MHCRN, such as CCC saturation and the multi-channel hidden terminal problem. Section 3 reviews existing protocols and discusses their drawbacks. Section 4 describes the proposed MAC layer protocol. Section 5 analyzes the protocol and shows the simulation results. Section 6 concludes the paper.
2 Issues in Multi-Hop CR Network In this section, we describe the common control channel (CCC) problem and briefly explain the multi-channel hidden terminal problem [2], [7], [8] in the context of a MHCRN.
2.1 Common Control Channel Problem
As discussed earlier, all users in a MHCRN are connected if they have a common channel for communication. Each user may have a choice of more than one channel; in that case, the sender and the receiver need to agree upon a common communicating channel which is available to both. The initial handshake signals used to negotiate the choice of a common channel are called control signals. Such negotiations, however, require communication over a common signaling channel, and in Cognitive Radio (CR) networks it is difficult to find a channel that is available to all nodes. This is called the common control channel problem.
2.2 Multi-channel Hidden Terminal Problem in MHCRNs
The multi-channel hidden terminal problem was identified in [2], [9] for multi-channel networks, and the same problem is extended to a cognitive network environment in [4]. To avoid the disadvantages mentioned in these references, this paper proposes a MAC protocol which does not need a pre-allotted control channel and which solves the multi-channel hidden terminal problem.
3 Related Work
In [3], a synchronized MAC protocol for multi-hop cognitive radio networks is proposed which avoids the need for a dedicated CCC and solves the multi-channel hidden terminal problem. In that work every node is equipped with two radios, one used for listening and the other for transmitting. With this setup the multi-channel hidden terminal problem can be solved ideally, but it also has some disadvantages, the most evident one being the cost: the benefits brought by multi-channel operation in CR networks, such as higher aggregate throughput and better robustness, should not come at the expense of increased cost and complexity. In [2], a MAC protocol for ad hoc networks requires only a single transceiver per node and solves the multi-channel hidden terminal problem using temporal synchronization. The simulation results show that it can not only solve the multi-channel hidden terminal problem but also improve the network throughput significantly; however, an ad hoc network is different from a CR network, so that scheme does not suit a CR network, which requires handling of dynamic channel availability.
4 The Proposed Protocol
4.1 Assumption
In this section, the proposed scheme is presented. The assumption is summarized as follows: each node is equipped with only a single half-duplex radio. In other words, at any given time a node is capable of either transmitting or receiving, but not both. In addition, a terminal can only send or receive on one channel at a time, so that while the terminal is receiving on one channel, it cannot perform carrier sensing on another.
4.2 Default Channel (DC) and Backup Channel (BC)
In the network initialization state, each node randomly chooses one of its available channels as the default channel (DC). The DC is used to manage the entire network; through the DC the following features are accomplished:
(1) Synchronization: nodes that do not send beacons within the DC periodically visit the DC to get resynchronized. This is used to adjust the CTP start times of all channels so as to make them non-overlapping.
(2) Neighborhood discovery: it is used to obtain information about network connectivity by exchanging ATIM or ATIM-ACK messages among neighbors.
(3) Communication detection: every node can detect the communications of neighbors up to two hops away by receiving the ATIM messages from its one-hop neighbors. In this way the multi-channel hidden terminal problem can be addressed.
The concept of the BC is employed to make the network connectivity extremely robust: when the DC becomes invalid, the BC replaces it, or it assists the DC in out-of-band measurement. An ATIM message carries several essential control signals:
− NDC (Name of the Default Channel of the neighbor): if NDC is marked i, the neighbor's default channel is channel i;
− CPB (Channel Name of Primary user comes Back): if a primary user comes back on channel j, the value of CPB is j;
− NAC (Name of the Available Channels of the neighbor), which contains the names of all available channels;
− NPN (Name of Previous Node), i.e., the two-hop neighbor;
− NTC (Name of Transmitting Channel) used by the two-hop neighbor in the previous superframe;
− NFN (Name of the two-hop neighbor's Forward Node) in the current superframe, the so-called transmission schedule;
− NPC (Name of Possible Channel) between a two-hop neighbor and its downstream node.
The usage of these control signals is illustrated in the example in Section 4. The functions of the DC and the BC are exactly the same once the radio resides on them. Moreover, even after nodes are associated with a DC, they periodically scan the other available channels. These so-called out-of-band measurements are used to detect the presence of primary users, identify other overlapping DCs, and determine a suitable BC.
4.3 Multi-channel Structure
While there is plenty of research on channel structures for single-channel MAC protocols, to the best of our knowledge no existing work has addressed this issue from a multi-channel perspective.
Algorithm for determining the DC and BC
1) Add a node Ni to the graph G for each user in the MHCRN.
2) Set NDC, NPN, NTC, NFN and NPC to NULL.
3) Choose an available channel Ci randomly for Ni as the default channel and set NDC to i; Ni then sends an ATIM message in the CTP and also listens for ATIM and ATIM-ACK messages from nodes Na, Nb, …, which are the neighbors of node Ni on channel Ci.
4) If no data has to be transmitted on the default channel, the node continues to scan Cq, Ck, …, the other available channels, so that it can quickly figure out information about the network connectivity.
5) If it receives no ATIM or ATIM-ACK on the default channel during the first CTP, the node chooses an available channel randomly again and changes NDC from i to j if i ≠ j, then sends an ATIM on Cj (here j may equal i) during the CTP of channel Cj. It repeats this until it receives an ATIM or ATIM-ACK; Cj is then a BC and, in this case, it can replace the DC for communication. Then go to step 4.
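As a rough illustration only (the helper names and the retry bound are assumptions, not part of the protocol specification), the selection loop of steps 3–5 can be sketched in Python as follows.

import random

def choose_default_channel(available, heard_neighbor_on, max_superframes=100):
    """Keep drawing a random available channel and announcing an ATIM in its CTP
    until an ATIM/ATIM-ACK from a neighbour is heard on that channel; the
    channel on which contact is first made is used as the DC (or as a BC)."""
    for _ in range(max_superframes):
        ch = random.choice(available)          # step 3 / step 5: random choice
        if heard_neighbor_on(ch):              # ATIM or ATIM-ACK received in the CTP
            return ch
        # step 4: no answer -> scan the other channels out of the CTP and retry
    return None

# Toy usage: neighbours happen to be reachable only on channel 2.
print(choose_default_channel([1, 2, 4], lambda ch: ch == 2))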
To fully exploit the multiple available channels in a scalable manner, the channel structure illustrated in Figure 1 is used. Each channel has its own superframe structure, and out of all the channels in use one is uniquely identified as the DC (channel 2 in Figure 1 – see also Section 4.2). This contrasts with a channel structure in which a superframe is used only on the common channel, which requires all network nodes to switch back to the common channel at the start of every new superframe and thus limits scalability. Another benefit of this channel structure is that it avoids the bandwidth wastage of a common control channel, as beacons can be transmitted on any channel and not only on the common channel; this leads to better load balancing and allows more time for actual data transfer. Each superframe is comprised of two consecutive parts: the Control signal Transfer Period (CTP) and the Data Transfer Period (DTP). A distinctive feature of this multi-channel structure is that the CTPs of the different channels are non-overlapping (see Figure 1). This allows a node to quickly gather information from other channels in an optimized fashion, by simply switching channels in repeating order of CTP start time and listening for ATIM messages during each CTP. All a node needs is information about which node is located on which channel, which can be obtained efficiently from the ATIM messages received on the DC itself. Note that we use the ATIM message only as the control message of the Control signal Transfer Period, unlike the ATIM window employed in IEEE 802.11 PSM; moreover, a node never goes into the doze state, since it should continuously sense the other channels when it has no data to transmit on the default channel.
4.4 Exchange of Control Signals and Data
When any of the information events listed below occurs, a node should exchange control signals. To exchange control signals it uses the DC for communication between itself and its neighbor. If no data has to be transmitted or received on the default channel during the current superframe, it chooses a BC and
Fig. 1. Multi-channel structure for node A, whose available channels are 1, 2 and 3
continues trying to send the control signal to its neighbors on the same channel. The five kinds of information events are:
1) a new node enters the network and has to notify its neighboring nodes of its arrival;
2) the available channel list at a node changes due to primary user traffic;
3) a node changes its channel for communication;
4) a node wants to communicate with its neighbor;
5) a node receives an ATIM from its previous neighbor and has to forward it to its other neighbors in the next ATIM window.
4.5 Channel Selection
Contrary to a number of existing multi-channel MAC protocols that include complex channel selection and negotiation schemes, no such scheme is needed in this protocol. Once on a channel, all data communication of a node happens on that channel. If a node wants to communicate on some other channel which is not the DC, it first has to visit the DC: nodes announce their impending channel change via an ATIM message transmitted over the DC. A receiver that will switch to another channel for communication also has to announce its communication channel to all its neighbors. Once the communicating pair of nodes have switched to the specified channel, they start transmitting data only when the DTP begins on that channel, rather than immediately.
4.6 Two-Hop Neighbors Information
In our protocol, nodes are required to transmit an ATIM message during the CTP of a superframe whenever any of the five kinds of information events occurs. A node rebroadcasts the information that it received from its neighbors in the previous superframe. Thus, nodes have information about their neighbors' neighbors, such as occupied channels and communication schedules. With this mechanism, it is possible to overcome the multi-channel hidden terminal problem.
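The control fields of Section 4.2 and the two-hop avoidance rule can be summarised in a few lines of Python; the field types and the helper below are our own reading of the text, not the authors' implementation.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ATIM:
    NDC: int                      # default channel of the sending neighbour
    CPB: Optional[int]            # channel on which a primary user came back
    NAC: List[int]                # channels currently available at the sender
    NPN: Optional[str] = None     # two-hop neighbour the report refers to
    NTC: Optional[int] = None     # channel that neighbour used last superframe
    NFN: Optional[str] = None     # its forward node in the coming superframe
    NPC: Optional[int] = None     # channel it may use with its downstream node

def channels_to_avoid(heard: List[ATIM]) -> set:
    """A node skips any channel already scheduled by a one- or two-hop
    neighbour, which is how multi-channel hidden terminals are avoided."""
    return {m.NPC for m in heard if m.NPC is not None}

# Node M hears from node C that node D will use channel 4 with C.
print(channels_to_avoid([ATIM(NDC=4, CPB=None, NAC=[1, 2, 4], NPN="D", NTC=1, NPC=4)]))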
The complete process of communication is illustrated with an example. Consider a network of 15 nodes as shown in Fig.2. There are a total of 4 possible channels among them. The array of blocks above each node represents the available channels at that node.
Fig. 2. Fifteen cognitive nodes with a set of free channels at each node
Fig. 3. The default channel between each pair of cognitive nodes
Now suppose that node M enters the network. It randomly chooses an available channel according to its own available channel list, i.e., 1, 2 and 4; we assume the default channel is channel 4. If it receives no ATIM or ATIM-ACK during the CTP, it chooses an available channel randomly again outside the CTP of channel 4 and keeps searching for its neighbors; in this way, after some time, it finds all of its neighbors. In like manner, all nodes in the network find their neighbors, and a possible graph of the network connectivity is shown in Fig. 3. In the CTP of the i-th superframe, node C sends an ATIM message containing the communication information it received in the previous, (i−1)-th, superframe, and thus transmits this information to node M (information event 5);
the information carried in the ATIM message from node C to M is shown in Fig. 4. It informs node M that no primary user has currently come back, that in the previous, (i−1)-th, superframe node D communicated on channel 1, and that node D will communicate with node C on channel 4 in the i-th superframe — the so-called communication schedule. According to this message, node M should avoid using channel 4 for communication with its neighbors. Multi-channel hidden terminal problems can be ideally resolved in this manner.
Fig. 4. Addressing the hidden nodes
5 Performance Evaluation
For the simulations we used the QualNet 4.0 simulator. Firstly, to understand the maximum throughput the proposed protocol can provide, we consider a single transmitter–receiver pair per channel and assume that the transmitter has an infinite number of frames to send to the receiver until the end of the simulation. We fix the Control signal Transfer Period length at 20 ms and the superframe size at 100 ms. Figures similar to 802.11a are adopted for this analysis, so we further assume that the physical data rate per channel is 54 Mbps (as in 802.11a). Figure 5 depicts the maximum MAC throughput achievable for different frame sizes when 1, 2, 3, 4 and 5 channels are available. As expected, larger frame sizes yield better throughputs because of the smaller relative overhead. More importantly, we can see that the maximum throughput increases significantly as more channels are added to the network. Secondly, to compare our protocol with 802.11 and the CCC-MAC, 100 nodes are randomly placed in a 500 m × 500 m area; 40 nodes are chosen to be sources and 40 nodes to be destinations. Because the maximum physical data rate of 802.11 is 2 Mbps, the physical data rate per channel is also set to 2 Mbps. The transmission range of each node is 250 m, the superframe length is set to 100 ms, and the length of the Control signal Transfer Period is 20 ms. We also assume that the packet size is fixed at 1024 bytes. A set of 5 channels is chosen, each
of which is available at each node with a probability of p = 70%. Each source node generates and transmits constant bit rate (CBR) traffic. Each data point in the result graphs is an average of 10 runs.
Fig. 5. Maximum MAC throughput VS different frame size
Fig. 6. Aggregate throughput for different protocols
Now we look at the results of our multi-hop CR network simulations, in which a node can be a source for multiple destinations, or a destination for multiple sources. Figure 6 shows the aggregate throughput of the different protocols as the network load increases; five channels are used. In Fig. 6, our protocol performs better than the CCC-MAC protocol, because in CCC-MAC, if a node has flows to two different destinations, each destination may choose a different channel and one flow may have to wait for an entire beacon interval to negotiate the channel again. Also, if a node is the destination of two flows from different sources, these two flows must be transmitted on the same channel, reducing the benefit of having multiple channels. As the network load becomes very high, the throughput of the CCC-MAC protocol drops faster than that of our protocol. This is because a single control channel is shared by every node in the CCC-MAC protocol; when the network load is very high, the collision rate of control packets increases, degrading the throughput. We call this control channel
saturation; moreover, in a cognitive radio network it is difficult to guarantee that all nodes share the same channel to use as a control channel.
6 Conclusion
In this paper we have presented a MAC protocol for MHCRNs which avoids the need for a network-wide common control channel. The multi-channel hidden terminal problem is solved using only one transceiver, by introducing synchronization and two-hop neighbors' information into the protocol.
Reference 1. Zhao, Q., Sadler, B.M.: A Survey of Dynamic Spectrum Access: Signal Processing, Networking, and Regulatory Policy. IEEE Signal Processing Magazine (2007) 2. So, J., Vaidya, N.: Multi-Channel MAC for Ad Hoc Networks: Handling Multi-Channel Hidden Terminals Using A Single Transceiver. In: Proc. ACM MobiHoc (2004) 3. Kondareddy, Y.R., Agrawal, P.: Synchronized MAC Protocol For Multi-Hop Cognitive Radio Networks Communications, 2008. In: ICC 2008. IEEE International Conference (2008) 4. Nan, H., Yoo, S., Hyon, T.: Distributed Coordinated Spectrum Sharing MAC Protocol for Cognitive Radio. In: IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks, pp. 240–249 (2007) 5. Adya, A., Bahl, P., Padhye, J., Wolman, A., Zhou, L.: A multiradio unification protocol for IEEE 802.11 wireless networks. In: Proc. of Broadnets (2004) 6. Alichery, M., Bhatia, R., Li. L.: Joint Channel Assignment and Routing for throughput optimization in Multi-Radio wireless mesh networks. In: ACM MobiCom (2005) 7. Vaidya, N., Kyasanur, P.: Routing and Interface Assignment in Multi-channel Multiinterface Wireless Networks. In: WCNC (2005) 8. Raniwala, A., Chiueh, T.: Architechture and Algorithms for an IEEE 802.11-based Multichannel Wireless Mesh Network. In: IEEE Infocom (2005) 9. Raniwala, A., Gopalan, K., Chiueh. T.: Centralized Channel Assignment and Routing Algorithms for Multi-Channel Wireless Mesh Networks. ACM SIGMOBILE Mobile Computing and Communications Review 8(2), 50–65 (2004)
Network Construction Using IEC 61400-25 Protocol in Wind Power Plants Tae O Kim, Jung Woo Kim, and Hong Hee Lee∗ School of Electrical Engineering, University of Ulsan, Nam-Gu, 680749 Ulsan, South Korea {gskim94,jwkim,hhlee}@mail.ulsan.ac.kr
Abstract. In recent years, wind power plants have been widely developed as an alternative energy source. In order to provide a uniform communication basis for the monitoring and control of wind power plants, IEC 61400-25 has been developed. This paper describes a Web-service-based network construction using the communication protocol stack included in IEC 61400-25-4. Such a system is necessary to implement remote control systems for wind power plants. Keywords: IEC 61400-25, Wind Power Plants, Web Service, MMS.
1 Introduction Wind power plant (WPP) technology has proved to be one of the most promising generation technologies among the renewable energy sources, and its use is increasing globally due to its profitability. Wind generators now supply a considerable part of the power demand. Wind generation technology has been developed consistently in Europe, and related products have been introduced in large numbers by many manufacturers. Generally, the control center is far away from the WPPs in a wind farm; furthermore, the individual wind power generators are installed far apart within the same wind farm. A communication infrastructure is therefore very important to control the system and to monitor the condition of the WPPs. In order to keep compatibility between devices of different manufacturers, a single communication standard is needed so that products of various manufacturers can communicate with each other. In 2006, the communication standard IEC 61400-25 for the supervisory control of WPPs was announced by the IEC technical committee TC 88, based on the international standard IEC 61850 for substation automation systems. Currently, parts 1, 2, 3, 4 and 5 of the IEC 61400-25 standard have been published; part 6 is in its final stage of consideration and will be published soon [1]. New technologies such as communication architectures, Web browsers and communication protocols have been developed in order to implement the new information models in automation and network systems for power systems, and important research on constructing information infrastructures for power systems is actively in progress.
After analyzing power system disturbances, the work of Xie et al. emphasized the importance of information systems in both regulated and competitive environments and proposed an information architecture for power systems [2]. However, the work in [2] focuses only on constructing the physical communication architecture, without addressing the information integration required by the proposed architecture. Qiu et al. discussed the communication issues of the SPID system [3], where an XML-based client/server architecture for information exchange was proposed. A new communication architecture called GridStat was proposed in [4], but this study is limited to control-related applications in power systems. Khatib et al. proposed an Internet-based Power System Information Network architecture (PSIN) in which power system information is posted on Web pages [5]. In this paper, we survey the various studies and efforts on information infrastructures and propose an implementation of a supervisory control information system using Web service technologies suitable for wind power generation systems. The new information integration requirements for the operation and maintenance of wind power generation systems are considered. We also propose a prototype framework and a Web-service-based system to integrate information for the Web service infrastructure of a wind power generation system. The proposed Web service system is an open, flexible and scalable framework with high cooperation and integration capability. We perform information exchange between the host node and the wind towers to construct the network system of the wind power facility using the MMS communication service, and we construct an environment to monitor the important information through a Web browser at the remote control center.
2 Network Modeling for Control and Monitoring in WPPs
2.1 Information Model in WPPs
In the basic device model of the WPPs, each component of the system (wind turbine, control center, meteorological system, etc.) is modeled as a data structure called a logical node. A logical node is a data holder that can hold different types of information related to the respective component. In part 2 of the standard, the logical device class, the logical node class, the data class and the common data classes are defined abstractly, based on the logical nodes defined in IEC 61850-7-2; IEC 61400-25 itself does not guarantee compatibility between devices. A wind farm has several wind towers, and each wind tower contains a brake system, gear box, generator, controller and protection unit. Figure 1 shows the logical nodes of a wind tower. The functions and resources of the real wind power devices are modeled as an information model so that control and monitoring become possible in the communication system of the wind farm, as defined in part 2 [6]. The logical nodes defined in the logical device of Figure 1 are well-known functions, modeled as virtual counterparts of the real devices: LN WROT models the rotor of the wind tower, WGEN the generator, WCNV the converter, and WAPC the active power controller.
Fig. 1. Wind Tower Logical Nodes
Table 1 shows the logical nodes of WPPs. The number of logical nodes defined in IEC 61400-25 is 16, covering both the wind power plant as a whole and the individual wind turbine.

Table 1. Logical Node List for WPPs

LN class   Description
WTUR       Wind turbine general information
WROT       Wind turbine rotor information
WTRM       Wind turbine transmission information
WGEN       Wind turbine generator information
WCNV       Wind turbine converter information
WTRF       Wind turbine transform information
WNAC       Wind turbine nacelle information
WYAW       Wind turbine yawing information
WTOW       Wind turbine tower information
WMET       Wind power plant meteorological information
WALM       Wind power plant alarm information
WSLG       Wind turbine state log information
WALG       Wind turbine analogue log information
WREP       Wind turbine report information
WAPC       Wind power plant active power control information
WRPC       Wind power plant reactive power control information
The service subset of the ACSI (Abstract Communication Service Interface) needed to carry out all information exchange within the WPPs is defined in part 3; it describes client certification, transmission of control commands, and the protocol and mechanism of data exchange.
2.2 Mapping of Communication Profile in WPPs
The mapping of the communication profile is described in part 4. According to Figure 2, the SCSM (Specific Communication Service Mapping) maps the abstract communication
services, objects and parameters to the specific application layers. The wind power plant information models and the information exchange models need to be mapped to appropriate protocols. TCP/IP is the basic lower-layer protocol provided by all mappings; specific data link and physical layers are beyond the scope of this standard. Depending on the technology of the communication network, these mappings may have different complexities, and some ACSI services may not be supported directly in all mappings, but equivalent services are provided [7].
Fig. 2. Mapping and communication profiles
In order to decide on a solution suitable for monitoring and control applications, a mapping that satisfies both supplier and client has to be selected first; moreover, the selected mapping should at least conform to the international standard. Recently, the IEC 61400-25-4 standard has been published, which includes the following five mapping methods:
- SOAP-based Web Services
- OPC/XML-DA
- IEC 61850-8-1 MMS
- IEC 60870-5-104
- DNP3 (Distributed Network Protocol 3)
3 Web Service Based Wind Power System
3.1 Network Based Modeling of WPPs
A wind farm has several wind towers, and a brake system, gear box, generator, controller and protection unit are located inside each wind tower. The methods to construct a network system composed of servers and clients can be classified into three types: centralized topology, distributed topology and mixed topology. In this
work, we propose an Ethernet-based centralized topology to construct the network system between the wind towers and the local system. Each wind tower works as an independent server; the control center collects the information from each wind tower and operates as a server for the remote control center.
3.2 Mapping of Information Model in WPPs
The information model of WPPs is defined by the logical node classes, data classes, data attribute types and DA-Component types. Figure 3 shows an information model of a WPP expressed as an XML schema. The converter logical node (LN) of the WPP is described, and its attribute data, the instantaneous value of the generator frequency, is given as 'mag' = 50.1. Such a schema (here, for the converter) can be used in all services through the instance values of the WPP information model.
Fig. 3. XML Schema-Information Model of WPP
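Read side by side with Fig. 3, the instance can be pictured as the nested mapping below; the element names (GenFreq, mag, q, t) and the values other than 50.1 are illustrative shorthands of ours, not necessarily the normative identifiers of the standard.

# Hypothetical in-memory rendering of the Fig. 3 instance (converter logical node).
wpp_instance = {
    "LD": "WindTower1",                        # logical device
    "LN": {
        "WCNV": {                              # wind turbine converter information
            "GenFreq": {                       # generator frequency data object
                "mag": 50.1,                   # instantaneous magnitude value
                "q": "good",                   # quality attribute
                "t": "2009-07-01T12:00:00Z",   # timestamp
            }
        }
    },
}
print(wpp_instance["LN"]["WCNV"]["GenFreq"]["mag"])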
3.3 Implementation of MMS Service
Figure 4 shows a flow chart of the MMS service implementation. In the select mode, it is determined whether the MMS service works as an MMS server or as an MMS client before the variables are declared. After initializing the memory of each section, the logging parameters of the WPP are defined. The lower four layers of the OSI seven-layer model are set up for the MMS service of the WPP, and the MMS objects are added. The MMS server remains in the ready state until a request arrives from the MMS client. The MMS service is implemented using SISCO's MMS-EASE Lite tool, which is normally used to develop communication services in IEC 61850 based substation automation systems; however, the MMS service used in IEC 61400-25 is the same as in IEC 61850 with respect to the basic communication method and protocol. Therefore, this work develops the MMS service with SISCO's tool using a modified information model and functions, in order to apply it to the IEC 61400-25 service. The abstract model defined in IEC 61850-7 is mapped onto the application layer, and the ACSI objects of IEC 61850-7-2 are mapped onto IEC 61400-25 MMS objects. The VMD, Domain, Named Variable, Named Variable List, Journal and File management included in the information model, as well as the various control blocks, are also mapped onto the corresponding MMS services. Figure 5 shows the operation window of the MMS communication service in IEC 61400-25. This work describes the transmission service of various information (frequency,
Fig. 4. MMS Service Flow Chart
speed, temperature, etc.) between the MMS server and client using the logical nodes for the rotor, generator and converter defined in the wind tower.
3.4 Implementation of Web Service
The Web service is accessed through a SOAP interface, and the interface of the Web service is described in WSDL; the Web service is based on SOAP, an XML message protocol. In this work, the SOAP server is implemented in Java, and Apache Tomcat is used for the Web service system. Java's SOAP support is implemented through AXIS (Apache eXtensible Interactive System), and this work develops the Web service using AXIS under the Tomcat environment. The important elements for implementing the Web service are as follows:
- Web Service: software offered on the Internet that operates identically on different devices.
- XML: the standard for data exchange, which makes competing technologies communicate and cooperate.
- SOAP: describes how Web services communicate with each other over the Internet.
- WSDL: describes what a Web service is and how to access it.
Java builds the SOAP server and creates the WSDL using AXIS; the Web service server is composed with this WSDL, and the functions offered by the Web service are described in it. Figure 6 shows the Web service user interface, which performs the control and monitoring service by connecting the local control center through a Web browser to the client at the remote
Fig. 5. MMS Communication between Server and Client
Fig. 6. Web Service using Web Browser
control center. The WPP data transmitted through the MMS communication service are mapped onto XML variables of the Web server system. The real-time data are exchanged between the local control center and the remote control center, and the data can be viewed in a Web browser over the Internet.
4 Experiments and Discussion
Figure 7 shows the network construction of the control and monitoring system in the WPPs. Each wind tower operates as a server and the local control center as a client within the WPP. The local
Fig. 7. Network based WPPs System
Fig. 8. Analysis of MMS using MMS-Ethereal Tool
control center receives information from the wind tower servers and operates as an MMS server and Web server for the remote control center. The remote control center operates as an MMS client and can monitor the WPP information by connecting to the Web server of the local control center through a Web browser. Therefore, data exchange between the MMS service and the Web service is required in the local control center. In order to confirm the performance of the MMS service and the data transmission, the MMS messages were analyzed with the MMS-Ethereal network analyzer, and the data exchange between MMS server and client was investigated. In Figure 8, data are exchanged according to the request and response messages while the MMS messages are being monitored. It is verified that the variable list and the variable values of the transmitted data match the MMS service request. As shown in Fig. 8, the time from data request to response is lower than 1 ms, which is sufficient for control and monitoring in the control center. The result of the MMS message analysis using the MMS network analyzer shows that the proposed Web service works well in the PC-based IEC 61400-25 communication system.
5 Conclusions
This work develops a test bed system using the IEC 61400-25 communication protocol, an extension of IEC 61850, for the control and monitoring system of WPPs. MMS communication and an XML-based Web service are implemented, and a Web service to monitor the wind tower data in a Web browser is realized. To verify the proposed system, we have analyzed the MMS messages with the MMS-Ethereal network analyzer and the Web browser, and the stable operation of the PC-based network system has been demonstrated.
Acknowledgment The authors would like to thank Ministry of Knowledge Economy and Ulsan Metropolitan City which partly supported this research through the Network-based Automation Research Center (NARC) at University of Ulsan.
References
1. International Standard IEC 61400-25, Wind Turbine Generator Systems Part 25: Communications for Monitoring and Control of Wind Power Plants, WD edn. (2001)
2. Xie, Z., Manimaran, G., Vittal, V.: An Information Architecture for Future Power Systems and Its Reliability Analysis. IEEE Trans. on Power Systems, 857–863 (2002)
3. Qiu, B., Liu, Y., Phadke, A.G.: Communication Infrastructure Design for Strategic Power Infrastructure Defence (SPID) System. In: IEEE Power Engineering Society Winter Meeting, pp. 672–677 (2002)
4. Tomsovic, K., Bakken, D.E., Bose, A.: Designing the Next Generation of Real-Time Control, Communication, and Computations for Large Power Systems. Proceedings of the IEEE 93(5), 965–979 (2005)
5. Khatib, A., Dong, X., Qiu, B., Liu, Y.: Thoughts on Future Internet Based Power System Information Network Architecture. In: Power Engineering Society Summer Meeting, 2000 IEEE, vol. 1, pp. 155–160 (2000) 6. International standard.: WD IEC 61400-25 Wind Turbine Generator Systems Part 25: Communications for Monitoring and Control of Wind Power Plants, First edn. (2003-05) 7. International standard. IEC 61850-7-1: Basic Communication Structure for Substation and Feeder Equipment – Principles and Models, First edn. (2003-05)
Stability and Stabilization of Nonuniform Sampling Systems Using a Matrix Bound of a Matrix Exponential Young Soo Suh Dept. of Electrical Eng., University of Ulsan, Mugeo, Nam-gu, Ulsan, 680-749, Korea
[email protected] Abstract. This paper is concerned with stability and stabilization of networked control systems, where sampling intervals are time-varying. A nonuniform sampling system is modeled as a time-varying discrete time system. With the assumption that the sampling time variation bounds are known, the stability condition is derived in the form of linear matrix inequalities. Compared with previous results, a less conservative stability condition is derived using a matrix bound of a matrix exponential. Keywords: matrix exponential, sampled-data control, networked control systems, linear matrix inequalities.
1 Introduction
In networked control systems, control elements (sensors and actuators) are connected through a network. Since only one node can transmit data over a network at a time, a node must wait until the network is available (i.e., no higher-priority data are being transferred). From the viewpoint of a control system, the system can be modeled as a time delay system; time delay systems have been studied extensively in [1]. On the other hand, some networked control systems can be modeled as nonuniform sampling systems. For example, suppose a node tries to transmit sensor data y(t) at time t1 and the network is not available until time t2 (t2 > t1). At time t2, instead of transmitting y(t1) (old data), the current sensor data y(t2) (new data) can be transmitted. This method was called "try-once-and-discard" in [2]. If the sensor node tries to transmit sensor data periodically with this method, the system can be modeled as a nonuniform sampling system, where the sampling intervals are time-varying. Another example is an event-based sampling system [3, 4]: instead of using periodic sampling, if the sampling is based on the value of the output y(t), the sampling interval is time-varying, and the system can again be modeled as a nonuniform sampling system. There are many results on the stability and stabilization of nonuniform sampling systems; they can be classified into three approaches. The first approach is to model the system as a continuous system with a time-varying input
delay; this input delay method is proposed in [5, 6]. The second approach is to model the system as a hybrid system. The third approach [7], which is most relevant to this paper, is to model the time-varying sampling interval as a parameter variation in a discrete-time system. The key technical idea in this approach is to bound the matrix exponential integral by a scalar function as follows:

‖∫_0^τ exp(Ar) dr‖ ≤ s(τ),      (1)

where s(τ) ∈ R. However, a scalar bound does not exploit the structure of the A matrix, and a stability condition based on (1) could be conservative. In this paper, a stability condition is derived using the following matrix bound:

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M(τ),      (2)

where M(τ) is a matrix of compatible dimension.
2 Problem Formulation
Consider the following continuous-time linear system:

ẋ(t) = Ax(t) + Bu(t),      (3)

where x ∈ R^n is the state and u ∈ R^m is the control input. We assume that the state x(t) is sampled at the discrete time instants 0 = t_0 < t_1 < ⋯ < t_k < ⋯ and the control input u(t) is piecewise constant between the discrete time instants:

u(t) = Kx(t_k),  ∀t ∈ [t_k, t_{k+1}).      (4)

The sampling interval T_k is defined as T_k ≜ t_{k+1} − t_k. It is assumed that T_k is time-varying and its lower and upper bounds are known:

0 < T_min ≤ T_k ≤ T_max,  ∀k.      (5)

The system (3) and (4) can be written as a time-varying discrete-time system x(t_{k+1}) = G(T_k)x(t_k), where

G(T_k) ≜ exp(A T_k) + (∫_0^{T_k} exp(Ar) B dr) K.      (6)
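As a quick numerical companion to (6) (our illustration, with made-up system matrices), G(T) can be evaluated with the standard augmented-matrix trick and its spectral radius inspected over a range of sampling intervals.

import numpy as np
from scipy.linalg import expm

def G(A, B, K, T):
    """G(T) = exp(A T) + (integral_0^T exp(A r) B dr) K, computed from
    expm([[A, B], [0, 0]] * T) = [[exp(A T), integral_0^T exp(A r) B dr], [0, I]]."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n], aug[:n, n:] = A, B
    E = expm(aug * T)
    return E[:n, :n] + E[:n, n:] @ K

A = np.array([[0.0, 1.0], [0.0, -0.1]])
B = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -0.8]])
for T in (0.05, 0.2, 0.4):
    print(T, max(abs(np.linalg.eigvals(G(A, B, K, T)))))

A spectral radius below one for every fixed T is only an indication; under a time-varying T_k, stability has to be certified by a common Lyapunov matrix, which is exactly what the conditions derived below provide.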
Let a nominal point T_nom be chosen so that

T_min ≤ T_nom ≤ T_max.      (7)

Note that G(T) can be written as

G(T) = G(T_nom) + Δ(τ)Q(T_nom),      (8)

where

τ(T, T_nom) ≜ T − T_nom,
Δ(τ) ≜ ∫_0^τ exp(Ar) dr,
Q(T_nom) ≜ A exp(A T_nom) + A (∫_0^{T_nom} exp(Ar) B dr) K + BK.

From the assumption (5), note that τ satisfies

T_min − T_nom ≤ τ(T, T_nom) ≤ T_max − T_nom.      (9)
We will treat Δ as an uncertainty matrix whose matrix bound is given. In Lemma 1, a matrix bound of Δ(τ)′Δ(τ) is derived for τ1 ≤ τ ≤ τ2, where τ1 and τ2 have the same sign. In Lemma 2, a matrix bound of Δ(τ)′Δ(τ) is derived for −τ_l ≤ τ ≤ τ_u, where τ_l > 0 and τ_u > 0.

Lemma 1. Let β̄ be a constant satisfying

‖∫_0^τ exp(At) dt‖_2 ≤ β̄,   −(τ2 − τ1)/2 ≤ τ ≤ (τ2 − τ1)/2.      (10)

Let R1(τ1, τ2) and R3(τ1, τ2) be defined by

R1(τ1, τ2) ≜ ∫_0^{(τ2+τ1)/2} exp(Ar) dr,   R3(τ1, τ2) ≜ exp(A(τ2 + τ1)/2).

If there exist M = M′ ∈ R^{n×n} > 0 and ε ∈ R > 0 such that

L(M, τ1, τ2, β̄, ε) ≜ [ −M + (1 + β̄²/ε)R1′R1 + β̄²R3′R3 , R3′ ; R3 , −(1/ε)I ] < 0,      (11)

then

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M,   τ1 ≤ τ ≤ τ2,      (12)

where τ1τ2 ≥ 0 (i.e., τ1 and τ2 do not have different signs).

Proof. Note that

∫_0^τ exp(Ar) dr = ∫_0^{(τ2+τ1)/2} exp(Ar) dr + ∫_{(τ2+τ1)/2}^τ exp(Ar) dr = R1 + R2R3,      (13)
where R2(τ1, τ2, τ) is defined by

R2(τ1, τ2, τ) ≜ ∫_0^{τ − (τ1+τ2)/2} exp(Ar) dr.

Using R1, R2, and R3, we have the following:

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) = (R1 + R2R3)′(R1 + R2R3)
    = R1′R1 + R3′R2′R2R3 + R3′R2′R1 + R1′R2R3.      (14)

If τ1 ≤ τ ≤ τ2, we have −(τ2 − τ1)/2 ≤ τ − (τ1 + τ2)/2 ≤ (τ2 − τ1)/2, and from (10) we have

R2′R2 ≤ β̄²I.      (15)

Inserting (15) into (14), we obtain

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ R1′R1 + β̄²R3′R3 + R3′R2′R1 + R1′R2R3.      (16)

Using the inequality R3′R2′R1 + R1′R2R3 ≤ εR3′R3 + (β̄²/ε)R1′R1, which holds for any ε > 0, we have

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ R1′R1 + β̄²R3′R3 + εR3′R3 + (β̄²/ε)R1′R1 < M      (17)

for any ε > 0. Applying the Schur complement, we obtain (11).
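For experimentation, the following CVXPY sketch (our construction, based on the form of (11) reconstructed above, not code from the paper) searches for a matrix bound M for a given A, τ1, τ2; substituting δ = 1/ε keeps the inequality affine in the decision variables.

import numpy as np
import cvxpy as cp
from scipy.linalg import expm

def int_expm(A, tau):
    """integral_0^tau exp(A r) dr via an augmented matrix exponential."""
    n = A.shape[0]
    aug = np.zeros((2 * n, 2 * n))
    aug[:n, :n], aug[:n, n:] = A, np.eye(n)
    return expm(aug * tau)[:n, n:]

A = np.array([[0.0, 1.0], [-2.0, -0.5]])           # illustrative system matrix
t1, t2 = 0.0, 0.2                                   # tau-interval (same sign)
mid, half = (t1 + t2) / 2.0, (t2 - t1) / 2.0
beta = max(np.linalg.norm(int_expm(A, t), 2)        # bound (10) by gridding
           for t in np.linspace(-half, half, 201))
R1, R3 = int_expm(A, mid), expm(A * mid)

n = A.shape[0]
M = cp.Variable((n, n), symmetric=True)
delta = cp.Variable(nonneg=True)                    # delta plays the role of 1/eps
top_left = -M + (1 + beta**2 * delta) * (R1.T @ R1) + beta**2 * (R3.T @ R3)
L = cp.bmat([[top_left, R3.T], [R3, -delta * np.eye(n)]])
Ls = (L + L.T) / 2                                  # symmetrise for the PSD constraint
prob = cp.Problem(cp.Minimize(cp.trace(M)),
                  [Ls << -1e-8 * np.eye(2 * n), M >> 0])
prob.solve()
print("matrix bound M =\n", M.value)

Gridding β̄ and minimizing trace(M) are pragmatic choices for this sketch; the lemma itself only requires that some valid β̄ and some feasible (M, ε) exist.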
To derive a less conservative matrix bound, the interval [−τ_l, τ_u] is divided into several partitions (see Fig. 1):

τ_{l,i} ≜ −τ_l + i·(τ_l/N_l),   τ_{u,i} ≜ i·(τ_u/N_u).

Based on this partition, a less conservative matrix bound is derived in Lemma 2.

Lemma 2. Assume that τ_l > 0 and τ_u > 0. Let β_l and β_u be constants satisfying

‖∫_0^τ exp(Ar) dr‖_2 ≤ β_l,  |τ| ≤ τ_l/(2N_l),
‖∫_0^τ exp(Ar) dr‖_2 ≤ β_u,  |τ| ≤ τ_u/(2N_u).      (18)

If there exist M = M′ ∈ R^{n×n} > 0, ε_{l,i} ∈ R > 0 (i = 1, …, N_l), and ε_{u,i} ∈ R > 0 (i = 1, …, N_u) such that

L(M, τ_{l,i−1}, τ_{l,i}, β_l, ε_{l,i}) < 0,  i = 1, …, N_l,
L(M, τ_{u,i−1}, τ_{u,i}, β_u, ε_{u,i}) < 0,  i = 1, …, N_u,      (19)
Fig. 1. Partitioning of the interval
then

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M,   −τ_l ≤ τ ≤ τ_u.      (20)

Proof. From Lemma 1, L(M, τ_{l,i−1}, τ_{l,i}, β_l, ε_{l,i}) < 0 implies that

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M,   τ_{l,i−1} ≤ τ ≤ τ_{l,i}.

Thus the N_l inequalities in the first row of (19) imply

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M,   −τ_l ≤ τ ≤ 0.      (21)

Similarly, the N_u inequalities in the second row of (19) imply

(∫_0^τ exp(Ar) dr)′ (∫_0^τ exp(Ar) dr) ≤ M,   0 ≤ τ ≤ τ_u.      (22)

Combining (21) and (22), we obtain (20).

3 Stability and Stabilization
In the next theorem, the stability condition of (3) is derived. Theorem 1. Let M be derived from (19) with τl = Tnom − Tmin , τu = Tmax − Tnom .
If there exist P = P′ ∈ R^{n×n} > 0 and ε ∈ R > 0 satisfying

[ −P           ⋆          ⋆
  G(T_nom)P    −P + εI    ⋆
  QP           0          −εM⁻¹ ] < 0,      (23)
then the system (3) with the feedback control (4) is stable for any sampling interval satisfying (5). In (23), ⋆ denotes the symmetric elements of the matrix.

Proof. The system is stable if there exists P = P′ > 0 satisfying

[ −P , P G(T)′ ; G(T)P , −P ] < 0      (24)

for every admissible T. Using (8), G(T)P = G(T_nom)P + Δ(τ)QP, so (24) can be written as

[ −P , P G(T_nom)′ ; G(T_nom)P , −P ] + RΔ(τ)S + S′Δ(τ)′R′ < 0,      (25)

where R ≜ [0  I]′ and S ≜ [QP  0]. Since Δ(τ)′Δ(τ) ≤ M by Lemma 2,

RΔ(τ)S + S′Δ(τ)′R′ ≤ εRR′ + (1/ε)S′Δ(τ)′Δ(τ)S ≤ εRR′ + (1/ε)S′MS      (26)

for any ε > 0. Invoking (26) in (25), we have for any ε > 0

[ −P , P G(T_nom)′ ; G(T_nom)P , −P ] + εRR′ + (1/ε)S′MS < 0.      (27)

Applying the Schur complement, we obtain (23). Thus (23) guarantees stability for any τ satisfying (9).

Now a stabilization problem (i.e., finding K) is given in the next theorem.

Theorem 2. If there exist P = P′ > 0 ∈ R^{n×n}, ε ∈ R > 0, and Z ∈ R^{m×n} satisfying

R(x1, x2) = 1 − |x1 − x2|/k,   if |x1 − x2| ≤ k,      (1)
where k must be carefully selected case by case to obtain a proper compatibility measure. An extension of R to X^n is also possible, through the relation:

R(x1, …, xn) = min_{i,j=1,…,n} R(xi, xj).      (2)
In this paper, we have used for data fusion the Ordered Weighted Average (OWA) operator and the compatibility function R defined in equation (1). Given a set A = {a1, a2, …, an} and a fusion function F, an OWA operator is a weighting vector W = [w1, …, wn] such that:
− wi ∈ [0,1];
− Σi wi = 1;
− F(a1, a2, …, an) = Σj wj bj, in which bj is the j-th largest element of A.
By adjusting the weighting vector we can represent different drivers' attitudes: when W favours the smaller-valued arguments in the aggregation process it reflects an aggressive driver, otherwise it reflects a cautious driver. O'Hagan [4] suggested a method to calculate the weights wi (i = 1, …, n) through the following simple mathematical programming problem:

Maximize  − Σ_{i=1}^{n} wi ln wi
subject to  Σ_{i=1}^{n} wi hn(i) = β
            Σi wi = 1                         (3)
            wi ≥ 0,  ∀i

where hn(i) = (n − i)/(n − 1), and β ∈ [0,1] is a coefficient representing, in our case, drivers' cautiousness. Note that, if fusion involves only two sets, then h2(1) = 1 and h2(2) = 0. Thus, from the constraints of the previous program (Eq. 3):
w1 = β,      (4)
w2 = 1 − β.      (5)
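A compact sketch of the two-source fusion implied by (4)–(5) follows; the travel-time numbers are invented purely for illustration.

def owa_weights_two(beta):
    """For n = 2 the maximum-entropy program (3) reduces to w1 = beta, w2 = 1 - beta."""
    return [beta, 1.0 - beta]

def owa_fuse(values, weights):
    """OWA aggregation: weights are applied to the values sorted in descending
    order (b_j is the j-th largest element of the argument set)."""
    ordered = sorted(values, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))

experience, information = 22.0, 30.0     # perceived travel times in minutes
for beta in (0.2, 0.5, 0.8):             # beta read as compliance with information
    print(beta, owa_fuse([experience, information], owa_weights_two(beta)))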
The basic hypothesis we have made in this work to set the value of β is that drivers' cautiousness is a function of the uncertainty related to the perceived information. Let us explain this concept through an example. Assume that the shorter of two alternative paths is temporarily closed by barriers. In this case, the information that the path is closed is not uncertain, that is U(I) = 0, and drivers must choose the longer path. This means that the OWA operator should favour the largest value, that is w1 = 1, and consequently β = 1 from equation (4). Conversely, if instead of barriers there is an informative system giving very vague information about the condition of the path, the uncertainty U(I) is very large, and drivers should prefer to rely on their own experience. In this case, the OWA operator favours the smallest value, that is w2 approaches 1 and thus, from equation (5), β approaches 0. From this example it appears that the parameter β can also be interpreted as drivers' compliance with information: β = 1 means that the driver is totally compliant with the information, and β = 0 means the opposite. Experimental studies have been carried out in recent years by several researchers to estimate the value of drivers' compliance; different values, ranging from 0.2 to 0.7, have been found, mainly because β is affected by the level of uncertainty embedded in the information. In this study we have assumed that:
− drivers' compliance with information decreases as the uncertainty increases. This means that the relative elasticity of compliance with respect to uncertainty is negative. In analytical terms: