Communications in Computer and Information Science 35
Yong Shi Shouyang Wang Yi Peng Jianping Li Yong Zeng (Eds.)
Cutting-Edge Research Topics on Multiple Criteria Decision Making 20th International Conference, MCDM 2009 Chengdu/Jiuzhaigou, China, June 21-26, 2009 Proceedings
Volume Editors

Yong Shi
Graduate University of Chinese Academy of Sciences, Beijing, China
E-mail: [email protected]
and University of Nebraska, Omaha, NE, USA
E-mail: [email protected]

Shouyang Wang
Chinese Academy of Sciences, Beijing, China
E-mail: [email protected]

Yi Peng
University of Electronic Science and Technology of China, Chengdu, China
E-mail: [email protected]

Jianping Li
Chinese Academy of Sciences, Beijing, China
E-mail: [email protected]

Yong Zeng
University of Electronic Science and Technology of China, Chengdu, China
E-mail: [email protected]

Library of Congress Control Number: Applied for
CR Subject Classification (1998): D.2, F.4.3, F.4.2, C.2, K.6
ISSN: 1865-0929
ISBN-10: 3-642-02297-9 Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-02297-5 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12689613 06/3180 543210
Preface
MCDM 2009, the 20th International Conference on Multiple-Criteria Decision Making, emerged as a global forum dedicated to the sharing of original research results and practical development experiences among researchers and application developers from different multiple-criteria decision making-related areas such as multiple-criteria decision aiding, multiple criteria classification, ranking, and sorting, multiple objective continuous and combinatorial optimization, multiple objective metaheuristics, multiple-criteria decision making and preference modeling, and fuzzy multiple-criteria decision making.

The theme for MCDM 2009 was “New State of MCDM in the 21st Century.” The conference seeks solutions to challenging problems facing the development of multiple-criteria decision making, and shapes future directions of research by promoting high-quality, novel and daring research findings. With the MCDM conference, these new challenges and tools can easily be shared with the multiple-criteria decision making community.

The workshop program included nine workshops which focused on different topics in new research challenges and initiatives of MCDM. We received more than 350 submissions for all the workshops, out of which 121 were accepted. This includes 72 regular papers and 49 short papers.

We would like to thank all workshop organizers and the Program Committee for the excellent work in maintaining the conference’s standing for high-quality papers. We also express our gratitude to the staff and graduates of the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences and University of Electronic Science and Technology of China, for their hard work in support of MCDM 2009. We would like to thank the Local Organizing Committee for their persistent and enthusiastic work toward the success of MCDM 2009. We owe special thanks to our sponsors, the University of Science and Technology of China, Sun Yat-Sen University, the Chinese University of Hong Kong, Korea Advanced Institute of Science and Technology, Graduate University of Chinese Academy of Sciences, Southwest Jiaotong University, National Natural Science Foundation of China, Chinese Society of Management Modernization, the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Nebraska at Omaha, University of Electronic Science and Technology of China, and Springer.

MCDM 2009 was jointly organized by the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and the University of Nebraska at Omaha. It was hosted by the University of Electronic Science and Technology of China.

June 2009
Yong Shi Shouyang Wang Yi Peng Jianping Li Yong Zeng
Organization
MCDM 2009 was jointly organized by the Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and University of Nebraska at Omaha. It was hosted by the University of Electronic Science and Technology of China.
Committee and Chairs

Honorary Chairs
• Siwei Cheng, Director of Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, China
• Po Lung Yu, Institute of Information Management, National Chiao Tung University, Taiwan, and School of Business, University of Kansas, Kansas, USA
• Jifa Gu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
• Weixuan Xu, Institute of Policy and Management, Chinese Academy of Sciences, China
Organizing Committee

Conference Chairs
• Yong Shi, Graduate University of Chinese Academy of Sciences, China/University of Nebraska at Omaha, USA
• Shouyang Wang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Members
• Hesham Ali, University of Nebraska at Omaha, USA
• Valerie Belton, University of Strathclyde, UK
• Xiaojun Chen, Polytechnic University of Hong Kong, Hong Kong, China
• Zhangxin Chen, University of Nebraska at Omaha, USA
• Martin Josef Geiger, University of Southern Denmark, Denmark
• Chongfu Huang, Beijing Normal University, China
• Zhimin Huang, Adelphi University, USA
• Jianming Jia, Southwest Jiaotong University, China
• Deepak Khazanchi, University of Nebraska at Omaha, USA
• Heeseok Andrew Lee, Korea Advanced Institute of Science and Technology, Korea
• Duan Li, Chinese University of Hong Kong, Hong Kong, China
• Liang Liang, Chinese University of Science and Technology, China
• Zengliang Liu, National Defense University, China
• Hirotaka Nakayama, Konan University, Japan
• David Olson, University of Nebraska at Lincoln, USA
• Xiaowo Tang, Education Bureau of Sichuan Province, China
• Yingjie Tian, Graduate University of Chinese Academy of Sciences, China
• Gwo-Hshiung Tzeng, National Chiao Tung University, Taiwan
• Fan Wang, Sun Yat-Sen University, China
• Hsiao-Fan Wang, National Tsing Hua University, Taiwan
• Jiuping Xu, Sichuan University, China
• Yang Xu, Sichuan Information Industry Department, China
• Yamamoto Yoshitsugu, University of Tsukuba, Japan
• Wuyi Yue, Konan University, Japan
• Yong Zeng, University of Electronic Science and Technology of China, Chengdu, China
• Guangquan Zhang, University of Technology, Sydney, Australia
• Lingling Zhang, Graduate University of Chinese Academy of Sciences, China
• Yanchun Zhang, Victoria University, Australia
Local Organizing Committee

Chairs
• Yong Zeng, University of Electronic Science and Technology of China, Chengdu, China
• Runtian Jing, University of Electronic Science and Technology of China, Chengdu, China
Members
• Gang Kou, University of Electronic Science and Technology of China, Chengdu, China
• Yi Peng, University of Electronic Science and Technology of China, Chengdu, China
• Jing He, Chinese Academy of Sciences, China
• Jianping Li, Chinese Academy of Sciences, China
• Gushan Shi, Chinese Academy of Sciences, China
• Zhongfang Zhou, University of Electronic Science and Technology of China, Chengdu, China
Program Committee Chairs
• Gang Kou, University of Electronic Science and Technology of China, Chengdu, China
• Heeseok Andrew Lee, Korea Advanced Institute of Science and Technology, Korea
Tutorials Chairs
• Jianping Li, Chinese Academy of Sciences, China
• Milan Zeleny, Fordham University, USA
Workshops Chairs
• Yi Peng, University of Electronic Science and Technology of China, Chengdu, China
• Duan Li, Chinese University of Hong Kong, Hong Kong, China
Publicity Chairs
• Liang Liang, Chinese University of Science and Technology, China
• Zhongfang Zhou, University of Electronic Science and Technology of China, Chengdu, China
Sponsorship Chairs
• Fan Wang, Sun Yat-Sen University, China
• Jing He, Chinese Academy of Sciences, China
Finance Chair
• Gushan Shi, Chinese Academy of Sciences, China
MCDM International Society Executive Committee

President
Jyrki Wallenius, Helsinki School of Economics, Finland

Members
Jim Corner, University of Waikato, New Zealand
Kalyanmoy Deb, IIT Kanpur/Helsinki School of Economics, India/Finland
Matthias Ehrgott, Past Meeting Ex-Officio, University of Auckland, New Zealand
Xavier Gandibleux, University of Nantes, France
Martin Geiger, Newsletter Editor, University of Hohenheim, Germany
Salvatore Greco, Universita di Catania, Italy
Birsen Karpak, Vice-President of Finance, Youngstown State University, USA
Kathrin Klamroth, University of Erlangen Nuremberg, Germany
Murat M. Köksalan, Chairman of the Awards Committee, Middle East Technical University, Turkey
Kaisa Miettinen, President-Elect, Secretary, University of Jyväskylä, Finland
Gilberto Montibeller, London School of Economics, UK
Yong Shi, Future Meeting Ex-Officio, Chinese Academy of Sciences/University of Nebraska at Omaha, China/USA
Theodor J. Stewart, Immediate Past-President, University of Cape Town, South Africa
Daniel Vanderpooten, l'Université Paris Dauphine, France
Luis Vargas, University of Pittsburgh, USA
Shouyang Wang, Future Meeting Ex-Officio, Chinese Academy of Sciences, China
Sponsoring Institutions

University of Science and Technology of China
Sun Yat-Sen University
Chinese University of Hong Kong
Korea Advanced Institute of Science and Technology
Graduate University of Chinese Academy of Sciences
University of Nebraska at Omaha
Southwest Jiaotong University
National Natural Science Foundation of China
Chinese Society of Management Modernization
Springer
Table of Contents
Workshop on Evolutionary Methods for Multi-Objective Optimization and Decision Making An Evolutionary Algorithm for the Multi-objective Multiple Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Banu Soylu and Murat K¨ oksalan
1
Adaptive Differential Evolution for Multi-objective Optimization . . . . . . . Zai Wang, Zhenyu Yang, Ke Tang, and Xin Yao
9
An Evolutionary Approach for Bilevel Multi-objective Problems . . . . . . . Kalyanmoy Deb and Ankur Sinha
17
Multiple Criteria Decision Making: Efficient Outcome Assessments with Evolutionary Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ignacy Kaliszewski and Janusz Miroforidis
25
Workshop on Mining Text, Semi-structured, Web, or Multimedia Data Automatic Detection of Subjective Sentences Based on Chinese Subjective Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ziqiong Zhang, Qiang Ye, Rob Law, and Yijun Li
29
Case Study on Project Risk Management Planning Based on Soft System Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xie Lifang and Li Jun
37
Experiments with Bicriteria Sequence Alignment . . . . . . . . . . . . . . . . . . . . . Lu´ıs Paquete and Jo˜ ao P.O. Almeida Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus . . . . . . . . . . . . . . . . . . . . . Pavan K. Attaluri, Zhengxin Chen, Aruna M. Weerakoon, and Guoqing Lu Fuzzy Double Linear Regression of Financial Assets Yield . . . . . . . . . . . . . Taiji Wang, Weiyi Liu, and Zhuyu Li Detection of Outliers from the Lognormal Distribution in Financial Economics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yunfei Li, Zongfang Zhou, and Hong Chen
45
52
59
63
A Bibliography Analysis of Multi-Criteria Decision Making in Computer Science (1989-2009) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gang Kou and Yi Peng
68
Workshop on Knowledge Management and Business Intelligence A Six Sigma Methodology Using Data Mining: A Case Study on Six Sigma Project for Heat Efficiency Improvement of a Hot Stove System in a Korean Steel Manufacturing Company . . . . . . . . . . . . . . . . . . . . . . . . . . Gil-Sang Jang and Jong-Hag Jeon Internal and External Beliefs as the Determinants of Use Continuance for an Internet Search Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Soongeun Hong, Young Sik Kang, Heeseok Lee, and Jongwon Lee
72
81
Development of Knowledge Intensive Applications for Hospital . . . . . . . . . Jongho Kim, Han-kuk Hong, Gil-sang Jang, Joung Yeon Kim, and Taehun Kim
90
Organizational Politics, Social Network, and Knowledge Management . . . Hyun Jung Lee, Sora Kang, and Jongwon Lee
98
Implementation of Case-Based Reasoning System for Knowledge Management of Power Plant Construction Projects in a Korean Company . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gil-Sang Jang
107
Workshop on Data Mining Based Extension Theory Mining Conductive Knowledge Based on Transformation of Same Characteristic Information Element in Practical Applications . . . . . . . . . . Li Xiao-Mei Research on Customer Value Based on Extension Data Mining . . . . . . . . Yang Chun-yan and Li Wei-hua
117 125
The Intelligent System of Cardiovascular Disease Diagnosis Based on Extension Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Baiqing Sun, Yange Li, and Lin Zhang
133
From Satisfaction to Win-Win: A Novel Direction for MCDM Based on Extenics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xingsen Li, Liuying Zhang, and Aihua Li
141
The Methods to Construct Multivariate and Multidimensional Basic-Element and the Corresponding Extension Set . . . . . . . . . . . . . . . . . Li Qiao-Xing and Yang Jian-Mei
150
A New Process Modeling Method Based on Extension Theory and Its Application in Purified Terephthalic Acid Solvent System . . . . . . . . . . . . . Xu Yuan and Zhu Qunxiong
154
Research on Liquidity Risk Evaluation of Chinese A-Shares Market Based on Extension Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sun Bai-qing, Liu Peng-xiang, Zhang Lin, and Li Yan-ge
158
Contradictory Problems and Space-Element Model . . . . . . . . . . . . . . . . . . . Wang Tao and Zou Guang-tian
162
Workshop on Intelligent Knowledge Management Knowledge Intelligence: A New Field in Business Intelligence . . . . . . . . . . Guangli Nie, Xiuting Li, Lingling Zhang, Yuejin Zhang, and Yong Shi Mining Knowledge from Multiple Criteria Linear Programming Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peng Zhang, Xingquan Zhu, Aihua Li, Lingling Zhang, and Yong Shi
166
170
Research on Domain-Driven Actionable Knowledge Discovery . . . . . . . . . . Zhengxiang Zhu, Jifa Gu, Lingling Zhang, Wuqi Song, and Rui Gao
176
Data Mining Integrated with Domain Knowledge . . . . . . . . . . . . . . . . . . . . Anqiang Huang, Lingling Zhang, Zhengxiang Zhu, and Yong Shi
184
A Simulation Model of Technological Adoption with an Intelligent Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tieju Ma, Chunjie Chi, Jun Chen, and Yong Shi
188
Research on Ratchet Effects in Enterprises’ Knowledge Sharing Based on Game Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ying Wang, Lingling Zhang, Xiuyu Zheng, and Yong Shi
194
Application of Information Visualization Technologies in Masters’ Experience Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Song Wuqi and Gu Jifa
198
Study on an Intelligent Knowledge Push Method for Knowledge Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lingling Zhang, Qingxi Wang, and Guangli Nie
202
Extension of the Framework of Knowledge Process Analysis: A Case Study of Design Research Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Georgi V. Georgiev, Kozo Sugiyama, and Yukari Nagai
209
The Ninth International Workshop on Meta-Synthesis and Complex Systems On Heterogeneity of Complex Networks in the Real World . . . . . . . . . . . . Ruiqiu Ou, Jianmei Yang, Jing Chang, and Weicong Xie Some Common Properties of Affiliation Bipartite CooperationCompetition Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Da-Ren He Cases of HWMSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xijin Tang
213
220 228
Group Argumentation Info-visualization Model in the Hall for Workshop of Meta-synthetic Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wang Ming-li and Dai Chao-fan
236
Study on Improving the Fitness Value of Multi-objective Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yong Gang Wu and Wei Gu
243
Simulation for Collaborative Competition Based on Multi-Agent . . . . . . . Zhiyuan Ge and Jiamei Liu
251
Fuzzy Optimal Decision for Network Bandwidth Allocation with Demand Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lean Yu, Wuyi Yue, and Shouyang Wang
258
A Comparison of SVD, SVR, ADE and IRR for Latent Semantic Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wen Zhang, Xijin Tang, and Taketoshi Yoshida
266
The Bilevel Programming Model of Earthwork Allocation System . . . . . . Wang Xianjia, Huang Yuan, and Zhang Wuyue
275
Knowledge Diffusion on Networks through the Game Strategy . . . . . . . . . Shu Sun, Jiangning Wu, and Zhaoguo Xuan
282
The Analysis of Complex Structure for China Education Network . . . . . . Zhu-jun Deng and Ning Zhang
290
Priority-Pointing Procedure and Its Application to an Intercultural Trust Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rong Du, Shizhong Ai, and Cathal M. Brugha
296
Exploring Refinability of Multi-Criteria Decisions . . . . . . . . . . . . . . . . . . . . Cathal M. Brugha
304
Methodology for Knowledge Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshiteru Nakamori
311
Study on Public Opinion Based on Social Physics . . . . . . . . . . . . . . . . . . . . Yijun Liu, Wenyuan Niu, and Jifa Gu
318
Context-Based Decision Making Method for Physiological Signal Analysis in a Pervasive Sensing Environment . . . . . . . . . . . . . . . . . . . . . . . . Ahyoung Choi and Woontack Woo
325
A Framework of Task-Oriented Decision Support System in Disaster Emergency Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jun Tian, Qin Zou, Shaochuan Cheng, and Kanliang Wang
333
Study on the Developing Mechanism of Financial Network . . . . . . . . . . . . Xiaohui Wang, Yaowen Xue, Pengzhu Zhang, and Siguo Wang
337
Solving Sudoku with Constraint Programming . . . . . . . . . . . . . . . . . . . . . . . Broderick Crawford, Carlos Castro, and Eric Monfroy
345
A Study of Crude Oil Price Behavior Based on Fictitious Economy Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaoming He, Siwei Cheng, and Shouyang Wang Study on the Method of Determining Objective Weight of Decision-Maker (OWDM) in Multiple Attribute Group Decision-Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Donghua Pan and Yong Zhang Machining Parameter Optimal Selection for Blades of Aviation Engine Based on CBR and Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yan Cao, Yu Bai, Hua Chen, and Lina Yang A Multi-regional CGE Model for China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Na Li, Minjun Shi, and Fei Wang
349
357
361
370
Workshop on Risk Correlation Analysis and Risk Measurement The Method Research of Membership Degree Transformation in Multi-indexes Fuzzy Decision-Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kaidi Liu, Jin Wang, Yanjun Pang, and Ji-mei Hao Study on Information Fusion Based Check Recognition System . . . . . . . . Dong Wang Crisis Early-Warning Model Based on Exponential Smoothing Forecasting and Pattern Recognition and Its Application to Beijing 2008 Olympic Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Baojun Tang and Wanhua Qiu
374
384
392
Measuring Interdependency among Industrial Chains with Financial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingchun Sun, Ye Fang, and Jing Luo
399
Multi-objective Economic Early Warning and Economic Risk Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guihuan Zheng and Jue Wang
407
An Analysis on Financial Crisis Prediction of Listed Companies in China’s Manufacturing Industries Based on Logistic Regression and Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenhua Yu, Hao Gong, Yuanfu Li, and Yan Yue
414
Comparative Analysis of VaR Estimation of Double Long-Memory GARCH Models: Empirical Analysis of China’s Stock Market . . . . . . . . . Guangxi Cao, Jianping Guo, and Lin Xu
420
Estimation of Value-at-Risk for Energy Commodities via CAViaR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhao Xiliang and Zhu Xi
429
An Empirical Analysis of the Default Rate of Informal Lending— Evidence from Yiwu, China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wei Lu, Xiaobo Yu, Juan Du, and Feng Ji
438
Empirical Study of Relations between Stock Returns and Exchange Rate Fluctuations in China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jian-bao Chen, Deng-ling Wang, and Ting-ting Cheng
447
Cost Risk Tolerance Area of Material Supply in Biomass Power Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sun Jingchun, Chen Jianhua, Fang Ye, and Hou Junhu
455
The Effect of Subjective Risk Attitudes and Overconfidence on Risk Taking Behaviors: A Experimental Study Based on Traders of the Chinese Stock Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qi-an Chen, Yinghong Xiao, Hui Chen, and Liang Chen
461
Application of the Maximum Entropy Method to Risk Analysis of Mergers and Acquisitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jigang Xie and Wenyun Song
473
Internal Control, CPA Recognition and Performance Consequence: Evidence from Chinese Real Estate Enterprises . . . . . . . . . . . . . . . . . . . . . . Chuan Zhang, Lili Zhang, and Yi Geng
477
The Influence of IPO to the Operational Risk of Chinese Commercial Banks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lijun Gao and Jianping Li
486
The Measurement and Analysis Risk Factors Dependence Correlation in Software Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ding JianJie, Hou Hong, Hao KeGang, and Guo XiaoQun
493
Assessment of Disaster Emergency Management Ability Based on the Interval-Valued fuzzy TOPSIS Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jing Kun-peng and Song Zhi-jie
501
Dynamic Project Risk Analysis and Management Based on Influence Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaohua Liu and Chaoyuan Yue
507
Risk Prediction and Measurement for Software Based on Service Oriented Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jun Liu, Jianzhong Qiao, and Shukuan Lin
515
Risk Measurement and Control of Water Inrush into Qiyue Mountain Tunnel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ge Yan-hui, Ye Zhi-hua, Li Shu-cai, Lu Wei, and Zhang Qing-song
523
Operational Risk Measurement of Chinese Commercial Banks Based on Extreme Value Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiashan Song, Yong Li, Feng Ji, and Cheng Peng
531
A Multi-criteria Risk Optimization Model for Trustworthy Software Process Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianping Li, Minglu Li, Hao Song, and Dengsheng Wu
535
Country Risk Volatility Spillovers of Emerging Oil Economies: An Application to Russia and Kazakhstan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaolei Sun, Wan He, and Jianping Li
540
Modeling the Key Risk Factors to Project Success: A SEM Correlation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Song, Jianping Li, and Dengsheng Wu
544
Research on R&D Project Risk Management Model . . . . . . . . . . . . . . . . . . Xiaoyan Gu, Chen Cai, Hao Song, and Juan Song
552
Software Risks Correlation Analysis Using Meta-analysis . . . . . . . . . . . . . . Hao Song, Chen Cai, Minglu Li, and Dengsheng Wu
559
A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingli Liu, Jianping Li, Weixuan Xu, and Yong Shi
566
Credit Risk Evaluation Using a C-Variable Least Squares Support Vector Classification Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lean Yu, Shouyang Wang, and K.K. Lai
573
Ecological Risk Assessment with MCDM of Some Invasive Alien Plants in China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guowen Xie, Weiguang Chen, Meizhen Lin, Yanling Zheng, Peiguo Guo, and Yisheng Zheng Empirically-Based Crop Insurance for China: A Pilot Study in the Down-middle Yangtze River Area of China . . . . . . . . . . . . . . . . . . . . . . . . . . Erda Wang, Yang Yu, Bertis B. Little, Zhongxin Chen, and Jianqiang Ren A Response Analysis of Economic Growth to Environmental Risk: A Case Study of Qingdao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yunpeng Qin, Jianyue Ji, and Xiaoli Yu
580
588
595
Workshop on Optimization-Based Data Mining Method and Applications A Multiple Criteria and Multiple Constraints Mathematical Programming Model for Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peng Zhang, Yingjie Tian, Dongling Zhang, Xingquan Zhu, and Yong Shi New Unsupervised Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . Kun Zhao, Ying-jie Tian, and Nai-yang Deng Data Mining for Customer Segmentation in Personal Financial Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guoxun Wang, Fang Li, Peng Zhang, Yingjie Tian, and Yong Shi Nonlinear Knowledge in Kernel-Based Multiple Criteria Programming Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dongling Zhang, Yingjie Tian, and Yong Shi
600
606
614
622
A Note on the 1-9 Scale and Index Scale In AHP . . . . . . . . . . . . . . . . . . . . Zhiyong Zhang, Xinbao Liu, and Shanlin Yang
630
Linear Multi-class Classification Support Vector Machine . . . . . . . . . . . . . Yan Xu, Yuanhai Shao, Yingjie Tian, and Naiyang Deng
635
A Novel MCQP Approach for Predicting the Distance Range between Interface Residues in Antibody-Antigen Complex . . . . . . . . . . . . . . . . . . . . Yong Shi, Ruoying Chen, Jia Wan, and Xinyang Zhang
643
Robust Unsupervised Lagrangian Support Vector Machines for Supply Chain Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kun Zhao, Yong-sheng Liu, and Nai-yang Deng
649
A Dynamic Constraint Programming Approach . . . . . . . . . . . . . . . . . . . . . . Eric Monfroy, Carlos Castro, and Broderick Crawford The Evaluation of the Universities’ Science and Technology Comprehensive Strength Based on Management Efficiency . . . . . . . . . . . . Baiqing Sun, Yange Li, and Lin Zhang
653
657
Topics in Risk Analysis with Multiple Criteria Decision Making MCDM and SSM in Public Crisis Management: From the Systemic Point of View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yinyin Kuang and Dongping Fan The Diagnosis of Blocking Risks in Emergency Network . . . . . . . . . . . . . . . Xianglu Li, Wei Sun, and Haibo Wang
661 669
How Retailer Power Influence Its Opportunism Governance Mechanisms in Marketing Channel?–An Empirical Investigation in China . . . . . . . . . . . Yu Tian and Xuefang Liao
676
Applications in Oil-Spill Risk in Harbors and Coastal Areas Using Fuzzy Integrated Evaluation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chaofeng Shao, Yufen Zhang, Meiting Ju, and Shengguang Zhang
681
Coexistence Possibility of Biomass Industries . . . . . . . . . . . . . . . . . . . . . . . . Sun Jingchun and Hou Junhu
689
How Power Mechanism Influence Channel Bilateral Opportunism . . . . . . Yu Tian and Shaodan Chen
692
Workshop on Applications of Decision Theory and Method to Financial Decision Making Compromise Approach-Based Genetic Algorithm for Constrained Multiobjective Portfolio Selection Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jun Li Financial Time Series Analysis in a Fuzzy View . . . . . . . . . . . . . . . . . . . . . . Zhuyu Li, Taiji Wang, and Cheng Zhang
697 705
Asset Allocation and Optimal Contract for Delegated Portfolio Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingjun Liu and Jianfeng Liang
713
The Heterogeneous Investment Horizon and Dynamic Strategies for Asset Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Heping Xiong, Yiheng Xu, and Yi Xiao
721
Tracking Models for Optioned Portfolio Selection . . . . . . . . . . . . . . . . . . . . Jianfeng Liang Size, Book-to-Market Ratio and Relativity of Accounting Information Value: Empirical Research on the Chinese Listed Company . . . . . . . . . . . . Jing Yu, Siwei Cheng, and Bin Xu
729
737
New Frontiers of Hybrid MCDM Techniques for Problems-Solving Fuzzy MCDM Technique for Planning the Environment Watershed . . . . . Yi-Chun Chen, Hui-Pang Lien, Gwo-Hshiung Tzeng, Lung-Shih Yang, and Leon Yen
744
Nonlinear Deterministic Frontier Model Using Genetic Programming . . . Chin-Yi Chen, Jih-Jeng Huang, and Gwo-Hshiung Tzeng
753
A Revised VIKOR Model for Multiple Criteria Decision Making - The Perspective of Regret Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jih-Jeng Huang, Gwo-Hshiung Tzeng, and Hsiang-Hsi Liu
761
A Novel Evaluation Model for the Vehicle Navigation Device Market Using Hybrid MCDM Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chia-Li Lin, Meng-Shu Hsieh, and Gwo-Hshiung Tzeng
769
A VIKOR Technique with Applications Based on DEMATEL and ANP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yu-Ping Ou Yang, How-Ming Shieh, and Gwo-Hshiung Tzeng
780
Identification of a Threshold Value for the DEMATEL Method: Using the Maximum Mean De-Entropy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . Li Chung-Wei and Tzeng Gwo-Hshiung
789
High Technology Service Value Maximization through an MCDM-Based Innovative e-Business Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chi-Yo Huang, Gwo-Hshiung Tzeng, Wen-Rong Ho, Hsiu-Tyan Chuang, and Yeou-Feng Lue Airline Maintenance Manpower Optimization from the De Novo Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . James J.H. Liou and Gwo-Hshiung Tzeng A Novel Hybrid MADM Based Competence Set Expansions of a SOC Design Service Firm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chi-Yo Huang, Gwo-Hshiung Tzeng, Yeou-Feng Lue, and Hsiu-Tyan Chuang
797
806
815
A Genetic Local Search Algorithm for the Multiple Optimisation of the Balanced Academic Curriculum Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . Carlos Castro, Broderick Crawford, and Eric Monfroy
824
Using Consistent Fuzzy Preference Relations to Risk Factors Priority of Metropolitan Underground Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shih-Tong Lu, Cheng-Wei Lin, and Gwo-Hshiung Tzeng
833
Using MCDM Methods to Adopt and Assess Knowledge Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ying-Hsun Hung, Seng-Cho T. Chou, and Gwo-Hshiung Tzeng
840
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
849
An Evolutionary Algorithm for the Multi-objective Multiple Knapsack Problem

Banu Soylu¹ and Murat Köksalan²

¹ Department of Industrial Engineering, Erciyes University, 38039 Kayseri, Turkey
[email protected]
² Department of Industrial Engineering, Middle East Technical University, 06531 Ankara, Turkey
[email protected]

Abstract. In this study, we consider the multi-objective multiple knapsack problem (MMKP) and we adapt our favorable weight based evolutionary algorithm (FWEA) to approximate the efficient frontier of MMKP. The algorithm assigns fitness to solutions based on their relative strengths as well as their nondominated frontiers. The relative strength is measured based on a weighted Tchebycheff distance from the ideal point, where each solution chooses its own weights that minimize its distance from the ideal point. We carry out experiments on test data for MMKP given in the literature and compare the performance of the algorithm with several leading algorithms.

Keywords: Evolutionary algorithms; multiple knapsack problem.
1 Introduction

Evolutionary algorithms (EAs) have been successfully applied to multi-objective combinatorial optimization (MOCO) problems in the last decades. EAs maintain a population of solutions and thus they can obtain multiple efficient solutions in a single run. In this paper, we address the well-known MOCO problem: the Multi-objective Multiple Knapsack Problem (MMKP). Given a set of J items and a set of m knapsacks with capacities C_k, k=1,2,…,m, Zitzler and Thiele [1] formulate the MMKP as follows:

"Maximize"   z_k(x) = Σ_{j=1}^{J} p_{k,j} x_j ,   k = 1,...,m

Subject to   Σ_{j=1}^{J} w_{k,j} x_j ≤ C_k   ∀k

             x_j ∈ {0,1}   ∀j
where p_{k,j} is the profit obtained by placing item j in knapsack k, w_{k,j} is the capacity item j uses up in knapsack k, and x_j takes on the value 1 if the jth item is selected and 0 otherwise. We use the quotation marks since the maximization of a vector is not a well-defined mathematical operation.
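To make the notation concrete, the following minimal Python sketch (our illustration, not code from the paper; all names are ours) evaluates the m objective values of a 0/1 selection vector and checks the knapsack capacity constraints.

```python
from typing import List, Sequence

def evaluate_mmkp(x: Sequence[int],
                  profit: Sequence[Sequence[float]],    # profit[k][j] = p_{k,j}
                  weight: Sequence[Sequence[float]],    # weight[k][j] = w_{k,j}
                  capacity: Sequence[float]) -> List[float]:
    """Return the m objective values z_k(x) of a 0/1 selection vector x."""
    return [sum(p * xj for p, xj in zip(profit[k], x)) for k in range(len(capacity))]

def is_feasible(x: Sequence[int],
                weight: Sequence[Sequence[float]],
                capacity: Sequence[float]) -> bool:
    """Check that every knapsack capacity constraint is respected."""
    return all(sum(w * xj for w, xj in zip(weight[k], x)) <= capacity[k]
               for k in range(len(capacity)))
```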
The bi-objective and single constraint cases of this problem have been well studied in the literature ([2]-[6]). For the MMKP, several meta-heuristic algorithms have been proposed and comparison studies have been conducted using some test instances (available at http://www.tik.ee.ethz.ch/sop/download/). SPEA [1] and SPEA2 [7] are among the first elitist Multi-objective Evolutionary Algorithms (MOEAs) that have been applied to MMKP and yield promising results. NSGA [8] and NSGAII [9] also perform well using the non-dominated sorting concept of Goldberg [10]. Köksalan and Phelps [11] introduce an evolutionary metaheuristic, called EMAPS, for approximating the preference-non-dominated solutions. They apply EMAPS to the multi-objective spanning tree problem and the MMKP (single constraint). Alves and Almeida [12] use Tchebycheff programming in their EA and implement it for the MMKP.

There are successful applications of genetic local search to MMKP. Knowles and Corne [13] perform one of the first studies. They present M-PAES, which incorporates both local search and evolutionary algorithms in a multiple criteria framework. Jaskiewicz [14] proposes a genetic local search based evolutionary algorithm (MOGLS) and carries out experiments on MMKP. Guo et al. [15] present another such algorithm that performs the local search operation combined with simulated annealing. Vianna and Arroyo [16] propose a greedy randomized search procedure (GRASP) that first constructs a feasible solution using a greedy algorithm and then improves it with local search.

In this study, we adapt FWEA [17] for MMKP, which we call FWEA_KP. The fitness function of FWEA_KP is designed to both approximate the Pareto frontier and have a good distribution over the Pareto frontier. For this purpose, each member chooses its own favorable weight according to a Tchebycheff distance function. Seed solutions in the initial population and a crowding mechanism are also utilized.

In Section 2, we give the details of the evolutionary algorithm for MMKP. We present the computational results in Section 3 and discuss the conclusions and directions for further research in Section 4.
2 The FWEA_KP Algorithm

A solution x^i ∈ X is said to be efficient if there exists no other solution x^r ∈ X such that z_k(x^r) ≥ z_k(x^i) for all k and z_k(x^r) > z_k(x^i) for at least one k. We will denote z_k(x^i) as z_k^i for simplicity. The set of efficient solutions is called the efficient or the Pareto frontier. Efficient solutions are important since the most preferred solution has to be one of the efficient solutions. When a positive linear combination of the objective functions is maximized, the resulting solution is a special efficient solution called a supported efficient solution. The ideal point z^* is composed of the best value for each objective, which is obtained by maximizing each objective separately.

2.1 Development of FWEA_KP

The main concern while designing EAs is to assign each member a suitable fitness value that will indicate the capability of this member to survive. Therefore, each member needs to attain a high fitness score. In our algorithm, this is achieved by allowing each member to select its own favorite direction according to the Tchebycheff distance function. EMAPS also assigns favorable weights, but these are based on a linear distance function. It is likely
to experience difficulties in finding the unsupported efficient solutions using linear distance functions. The Tchebycheff distance function alleviates those difficulties. The main aspects of the algorithm are given below.

Representation and Genetic Operators
In our study, we use a binary chromosome representation of length J, a one-point crossover operator with probability pc and bit-wise mutation with probability pmut, as in [1]. Each gene of the chromosome represents the value of one decision variable x_j. Since such a coding may lead to infeasible solutions, the repair algorithm presented in [1] is applied to the infeasible solutions.

Generating the Initial Population
Several seed solutions are introduced into the initial population of our algorithm. These are obtained from the LP relaxations of the single objective problems that are constructed by linearly aggregating the objectives using different weight vectors. In this study, we solve the LP relaxation of each objective separately and additionally solve one equally weighted sum of the objectives. Then we apply a round-and-repair procedure where we first include all positive-valued variables in the knapsack. If the resulting solution violates any of the knapsack constraints, it is repaired by applying the procedure presented in [1]. The remaining members of the population are generated randomly. Since we solve LP relaxations of the problems, the computational effort is reasonable. A similar seeding approach is also used successfully in MOTGA [12]. The initial population of MOTGA is generated by solving LP relaxations of the weighted Tchebycheff program with some dispersed weight vectors and applying a round-and-repair procedure for infeasible members. In later iterations of MOTGA, the weighted Tchebycheff program is solved for the newly born weight vectors each time.

Fitness Function
The fitness function of FWEA_KP is similar to that of FWEA [17]. We measure the strength Δ(z^i, z^r) of a member z^i ∈ Z relative to some z^r ∈ Z, i ≠ r, by using a weighted Tchebycheff distance function as follows:

Δ(z^i, z^r) = φ(w^i, z^r) − φ(w^i, z^i) ,        (1)

where φ(w^i, z^i) = max_{k=1,...,m} { w_k^i (z_k^* − z_k^i) } and w^i is the favorable weight vector of z^i. We find the favorable weights as those that minimize the weighted Tchebycheff distance of z^i to the ideal point. These weights can easily be determined in a closed form formula as follows (Steuer [18], p. 425):

w_k^i = [1/(z_k^* − z_k^i)] · [ Σ_{l=1}^{m} 1/(z_l^* − z_l^i) ]^{−1}   if z_l^i < z_l^* for all l
w_k^i = 1                                                              if z_k^i = z_k^*
w_k^i = 0                                                              if z_k^i < z_k^* but ∃ l ≠ k with z_l^i = z_l^*        (2)
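For illustration, the favorable weights of Eq. (2) and the weighted Tchebycheff distance φ can be computed as in the following Python sketch (our own reading of the formulas, assuming maximization; the function and variable names are not from the paper):

```python
def favorable_weights(z_i, z_star):
    """Favorable weight vector of member z_i relative to the ideal point z_star (Eq. 2)."""
    m = len(z_i)
    if all(z_i[l] < z_star[l] for l in range(m)):
        inv_sum = sum(1.0 / (z_star[l] - z_i[l]) for l in range(m))
        return [1.0 / ((z_star[k] - z_i[k]) * inv_sum) for k in range(m)]
    # Degenerate cases of Eq. (2): weight 1 where the ideal value is attained, 0 elsewhere.
    return [1.0 if z_i[k] == z_star[k] else 0.0 for k in range(m)]

def tchebycheff(w, z, z_star):
    """Weighted Tchebycheff distance phi(w, z) of point z from the ideal point z_star."""
    return max(w_k * (zs_k - z_k) for w_k, z_k, zs_k in zip(w, z, z_star))
```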
Then, the raw fitness value is calculated as:

rawfitness(x^i) = α Δ̄ + (1 − α) Δ_min ,

where Δ̄ = (1/|Z|) Σ_{z^r ∈ Z\z^i} Δ(z^i, z^r) is the average relative strength, Δ_min = min_{z^r ∈ Z\z^i} {Δ(z^i, z^r)} is the worst case measure, and α controls the balance between the average and the minimum strengths and is assigned a value between 0 and 1.
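Continuing the sketch above, the strength of Eq. (1) and the raw fitness could be aggregated as follows (again only an illustration; favorable_weights and tchebycheff are the hypothetical helpers defined earlier):

```python
def raw_fitness(i, population_z, z_star, alpha=0.1):
    """Raw fitness of member i: alpha * average strength + (1 - alpha) * minimum strength."""
    z_i = population_z[i]
    w_i = favorable_weights(z_i, z_star)
    phi_i = tchebycheff(w_i, z_i, z_star)
    # Delta(z^i, z^r) of Eq. (1), for every other member (assumes at least two members).
    strengths = [tchebycheff(w_i, z_r, z_star) - phi_i
                 for r, z_r in enumerate(population_z) if r != i]
    avg_strength = sum(strengths) / len(population_z)   # divided by |Z| as in the text
    return alpha * avg_strength + (1.0 - alpha) * min(strengths)
```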
Table 1. Steps of the FWEA_KP

Input Parameters:
Pop.Size1: size of the initial population
Pop.Size2: upper limit on the size of the population
# eval.: maximum number of function evaluations
α: controls the balance between average and minimum strength in raw fitness function, (0.1)

Step 0. Initialization
  Generate an initial population and seed it with several members by using the approaches proposed in Section 2.1.
Step 1. Evaluation of the initial population
  1.1 Compute the objective values.
  1.2 Estimate the ideal point.
  1.3 Compute the favorable weights.
  1.4 Determine the frontiers. Compute the fitness scores by adjusting raw fitnesses according to frontiers.
Step 2. Selection
  Select two parents according to the binary tournament selection with replacement operator.
Step 3. Crossover
  Apply one-point crossover with probability pc to generate two offspring.
Step 4. Mutation
  Apply the bit-wise mutation operator with probability pmut.
Step 5. Duplication check
  If any offspring is a duplicate of any existing population member (in decision variable space), then discard that offspring. If both of them are duplicates, then discard both and goto Step 10.
Step 6. Evaluation of the offspring
  Repeat Steps 1.1-1.3.
Step 7. Stillborn check
  If any offspring is dominated by any member of the worst frontier of the current population (the offspring is stillborn), then discard that offspring. If both of them are stillborn, then discard both and goto Step 10.
Step 8. Insertion and Replacement
  8.1 Increment the population cardinality and insert the offspring one by one into the population.
  8.2 If the offspring z_off does not outperform any population member z^r with its own weights w^r, remove the member having the lowest crowding measure if the population cardinality exceeds a preset upper limit.
  8.3 If the offspring outperforms any existing population members at their own favorable weights:
    8.3.1 Remove the weakest such member unless that member is on the first frontier.
    8.3.2 Remove the member with the lowest crowding measure, if the weakest such member is on the first frontier.
Step 9. Update ranks and raw fitnesses. Adjust raw fitness scores.
Step 10. Termination
  If a stopping condition is not reached, goto Step 2.
In order to further differentiate the non-dominated solutions, we make sure that the fitness score of a dominated solution is worse than that of the solution in a better frontier (see [17] for details), utilizing the non-dominated sorting idea of Goldberg [10].

The Crowding Measure
Each member (in the last frontier) determines its own nearest neighbor with its own favorable weights. We compute the crowding measure between each member and its nearest neighbor using rectilinear distance. For the member having the smallest distance to its nearest neighbor, we remove either the neighbor or the member itself, whichever has the smaller fitness value. The insertion and replacement rules, as well as the fitness update strategies, are similar to those used in FWEA. The steps of the algorithm are given in Table 1.
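One possible reading of the crowding measure described above is sketched below in Python; we assume the nearest neighbor is found with a rectilinear distance weighted by the member's own favorable weights, a detail the text does not spell out, so treat it as an assumption (favorable_weights is the hypothetical helper from the earlier sketch):

```python
def crowding_measure(i, frontier_z, z_star):
    """Rectilinear distance from member i to its nearest neighbor in the same frontier,
    where the neighbor is chosen with member i's own favorable weights."""
    z_i = frontier_z[i]
    w_i = favorable_weights(z_i, z_star)
    best = float("inf")
    for r, z_r in enumerate(frontier_z):
        if r == i:
            continue
        dist = sum(w * abs(a - b) for w, a, b in zip(w_i, z_i, z_r))
        best = min(best, dist)
    return best
```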
3 Computational Results

In this section, we compare the performance of our algorithm with EMAPS, SPEA2 and NSGAII. We present the computational results in terms of the proximity indicator [19] and the hypervolume indicator [20]. Before computing these indicator values we scale each point using the range of the Pareto frontier. The proximity indicator is normally computed relative to the Pareto frontier. However, in MMKP, there is difficulty in finding the exact Pareto frontier for more than two objectives due to computational complexity. In order to overcome this problem, we construct a Non-Dominated Union Set (NDUS) that includes the non-dominated members of all algorithms, and we compute the proximity indicator relative to NDUS. We also compute the percent contribution of each algorithm to NDUS. Since the proximity indicator measures the distance from the efficient solutions (or their proxies), its smaller values are desirable. On the other hand, the hypervolume indicator tries to capture the hypervolume dominated by the solutions of an algorithm, and hence its larger values are more desirable.

We consider m=2, 3 and 4 knapsacks with J=750 items and perform 10 independent runs for each problem in our experiments. The parameters p_{k,j} and w_{k,j} are from a discrete uniform distribution in the interval [10,100]. The values of these parameters as well as the Pareto optimal set for m=2 and the run results of NSGAII and SPEA2 are taken from the site http://www.tik.ee.ethz.ch/~zitzler/testdata.html/ by Zitzler. The knapsack capacities are set to half of the total weight of all the items considered for the kth knapsack.

Table 2 shows the parameter settings of FWEA_KP and EMAPS for MMKP. The crossover and mutation probabilities are chosen as 0.80 and 0.01, respectively, for FWEA_KP. All these parameter values are the same as those used by Zitzler and Thiele [1]. We choose α=0.1 in FWEA_KP based on our preliminary experiments [21]. For EMAPS we use the genetic operators and parameter settings stated in [11]. We provide the proximity indicator (P.I.) and the hypervolume indicator (H.V.) results of different algorithms in Table 3.
Table 2. Parameter settings of FWEA_KP and EMAPS for MMKP

              POPSIZE 1   POPSIZE 2   # of function evaluations
m=2, J=750    200         250         125000
m=3, J=750    240         300         150000
m=4, J=750    280         350         175000
Table 3. The proximity indicator (P.I.) and the hypervolume indicator (H.V.) results of FWEA_KP, EMAPS, SPEA2 and NSGAII

                               m=2, J=750               m=3, J=750             m=4, J=750
Algorithm                      P.I. to Pareto*  H.V.    P.I. to NDUS   H.V.    P.I. to NDUS   H.V.
SPEA2                Avg.      0.1127           0.6008  0.1727         0.3319  0.2042         0.0344
                     Best      0.1001           0.6069  0.1326         0.3411  0.1925         0.0389
                     Worst     0.1210           0.5870  0.1888         0.3219  0.2462         0.0312
NSGAII               Avg.      0.1455           0.5765  0.1834         0.3175  0.2189         0.0194
                     Best      0.1249           0.5879  0.1508         0.3211  0.1984         0.0200
                     Worst     0.1589           0.5622  0.1998         0.3126  0.2428         0.0146
FWEA_KP (m+1-seed)   Avg.      0.0568           0.6919  0.0197         0.4091  0.0102         0.0592
                     Best      0.0536           0.6942  0.0139         0.4146  0.0009         0.0604
                     Worst     0.0612           0.6844  0.0246         0.4042  0.0505         0.0560
EMAPS (m+1-seed)     Avg.      0.1055           0.5937  -              -       -              -
                     Best      0.0963           0.6169  -              -       -              -
                     Worst     0.1245           0.5550  -              -       -              -
FWEA_KP (11-seed)    Avg.      0.0210           0.7517  -              -       -              -
                     Best      0.0185           0.7550  -              -       -              -
                     Worst     0.0218           0.7483  -              -       -              -
EMAPS                Avg.      0.1195           0.5404  -              -       -              -
                     Best      0.1080           0.5446  -              -       -              -
                     Worst     0.1463           0.5366  -              -       -              -

* The hypervolume of the efficient frontier is 0.7832 for the m=2, J=750 problem.
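For reference, the Non-Dominated Union Set used for the proximity values in Table 3 could be assembled as in the sketch below (our illustration under a maximization assumption; ties between algorithms are counted naively, and the names are ours):

```python
def dominates(a, b):
    """True if objective vector a dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def build_ndus(results_by_algorithm):
    """results_by_algorithm: dict mapping an algorithm name to its list of objective vectors.
    Returns the non-dominated union set and each algorithm's percent contribution to it."""
    union = [(name, tuple(z)) for name, zs in results_by_algorithm.items() for z in zs]
    ndus = [(name, z) for name, z in union
            if not any(dominates(z_other, z) for _, z_other in union)]
    contribution = {name: 100.0 * sum(1 for n, _ in ndus if n == name) / len(ndus)
                    for name in results_by_algorithm}
    return [z for _, z in ndus], contribution
```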
The results indicate that FWEA_KP is closer to the Pareto frontier than all other algorithms tested. The performances of SPEA2 and EMAPS are similar and both seem to outperform NSGAII. As the original EMAPS was designed for single-constraint knapsack problems, it seems to need more calibration in order to adapt to the multiple-constraint case. We therefore continue the experiments with FWEA_KP, SPEA2 and NSGAII.

We also experimented with FWEA_KP on the m=2, J=750 problem using more seed solutions. For this purpose, we use 11 equally spaced seed vectors obtained with the weights [0.0, 1.0], [0.1, 0.9], …, [1.0, 0.0]. The results indicate that additional seed solutions further improve the performance of FWEA_KP. Regarding the hypervolume indicator presented in Table 3, our inferences are similar to those of the proximity indicator, in favor of FWEA_KP.

The results regarding the percent contribution of the algorithms to NDUS are given in Table 4. According to this table, FWEA_KP makes the highest contribution to NDUS by far, followed by SPEA2. Moreover, as the number of objectives increases, the contribution of FWEA_KP increases while the contributions of the other algorithms decrease.

To show the performance of the algorithms with a larger number of function evaluations, we perform further experiments with 480,000 function evaluations. We present the results in Table 5 for the m=2, J=750 problem. As the number of function evaluations increases, the performances of all algorithms improve. However, the performance of
FWEA_KP for 125,000 function evaluations is still better than those of SPEA2 and NSGAII for 480,000 function evaluations.

Table 4. Percent contribution of FWEA_KP, SPEA2 and NSGAII to NDUS

                               % contribution to NDUS
Algorithm                      m=3, J=750     m=4, J=750
SPEA2                Avg.      24.6           6.7
                     Best      32.9           12.6
                     Worst     11.9           1.1
NSGAII               Avg.      10.9           2.42
                     Best      16.0           10.7
                     Worst     5.2            0.0
FWEA_KP (m+1-seed)   Avg.      64.6           90.9
                     Best      72.4           98.6
                     Worst     54.4           76.9
Table 5. The results of FWEA_KP, SPEA2 and NSGAII for 480,000 function evaluations

                               m=2, J=750                     m=2, J=750
Algorithm                      Proximity to Pareto frontier   Hypervolume indicator
SPEA2                Avg.      0.08739                        0.65479
                     Best      0.07738                        0.66478
                     Worst     0.09487                        0.64953
NSGAII               Avg.      0.11326                        0.62999
                     Best      0.09943                        0.63930
                     Worst     0.11929                        0.61636
FWEA_KP (m+1-seed)   Avg.      0.04755                        0.70729
                     Best      0.04423                        0.71236
                     Worst     0.05506                        0.65479
4 Conclusions

We adapted our FWEA algorithm to approximate the Pareto frontier of MMKP. According to our experimental results, FWEA_KP outperforms EMAPS, SPEA2 and NSGAII. A future research direction may be to employ seeding mechanisms on other EAs as well. A well-dispersed set of seed solutions may improve the performance of the algorithms. Another future research direction is to improve the mechanisms with which we measure the performances of the algorithms. In this paper we had difficulty due to the unknown efficient frontiers of the test problems. It would be helpful to find ways of better approximating the efficient frontiers or generating them exactly for larger size problems.
References

[1] Zitzler, E., Thiele, L.: Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE T Evolut. Comput. 3(4), 257–271 (1999)
[2] Visée, M., Teghem, J., Pirlot, M., Ulungu, E.L.: Two-phase method and branch and bound procedures to solve the bi-objective knapsack problem. J. Global Optim. 12, 139–155 (1998)
[3] Captivo, M.E., Climaco, J., Figueira, J., Martins, E., Santos, J.L.: Solving bicriteria 0-1 knapsack problems using a labelling algorithm. Comput. Oper. Res. 30, 1865–1886 (2003)
[4] Silva, C.G., Climaco, J., Figueira, J.: A scatter search method for bi-criteria {0,1}-knapsack problems. Eur. J. Oper. Res. 169, 373–391 (2006)
[5] Silva, C.G., Figueira, J., Climaco, J.: Integrating partial optimization with scatter search for solving bi-criteria {0,1}-knapsack problems. Eur. J. Oper. Res. 177, 1656–1677 (2007)
[6] Bazgan, C., Hugot, H., Vanderpooten, D.: Solving efficiently the 0-1 multi-objective knapsack problem. Comput. Oper. Res. 36, 260–279 (2009)
[7] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm. TIK-Report, No. 103, Swiss Federal Institute of Technology, Switzerland (2002)
[8] Srinivas, N., Deb, K.: Multiobjective Function Optimization Using Non-dominated Sorting Genetic Algorithm. Evol. Comput. 2(3), 221–248 (1994)
[9] Deb, K., Amrit, P., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multi-Objective Genetic Algorithm-NSGA-II. IEEE T Evolut. Comput. 6(2), 182–197 (2002)
[10] Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1989)
[11] Köksalan, M., Phelps, S.: An evolutionary metaheuristic for approximating preference-nondominated solutions. INFORMS J. Comput. 19(2), 291–301 (2007)
[12] Alves, M.J., Almeida, M.: MOTGA: A multiobjective Tchebycheff based genetic algorithm for the multidimensional knapsack problem. Comput. Oper. Res. 34, 3458–3470 (2007)
[13] Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the pareto archived evolution strategy. Evol. Comput. 8(2), 149–172 (2000)
[14] Jaskiewicz, A.: On the computational efficiency of multiple objective metaheuristics. The knapsack problem case study. Eur. J. Oper. Res. 158, 418–433 (2004)
[15] Guo, X., Yang, G., Wu, Z.: A hybrid fine-tuned multi-objective memetic algorithm. IEICE Trans. Fundamentals E89-A(3), 790–797 (2006)
[16] Vianna, D.S., Arroyo, J.E.C.: A GRASP algorithm for the multiobjective knapsack problem. In: Proc. XXIV Int. Conf. Chilean Comp. Sci. Soc. (2004)
[17] Soylu, B., Köksalan, M.: A favorable weight based evolutionary algorithm for multiple criteria problems. IEEE T Evolut. Comput. (2009) (forthcoming)
[18] Steuer, R.E.: Multiple criteria optimization: theory, computation and application. John Wiley & Sons, Inc., New York (1986)
[19] Bosman, P.A.N., Thierens, D.: The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE T Evolut. Comput. 7(2), 174–188 (2003)
[20] Zitzler, E., Laumanns, M., Thiele, L., Fonseca, C.M., Fonseca, V.G.: Performance assessment of multiobjective optimizers: An analysis and review. IEEE T Evolut. Comput. 7(2), 117–132 (2003)
[21] Soylu, B.: An Evolutionary Algorithm for Multiple Criteria Problems. Ph.D. dissertation, Middle East Technical University, Industrial Engineering Department, Ankara-Turkey (2007)
Adaptive Differential Evolution for Multi-objective Optimization

Zai Wang, Zhenyu Yang, Ke Tang, and Xin Yao

Nature Inspired Computation and Applications Laboratory (NICAL), Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
{wangzai,zhyuyang}@mail.ustc.edu.cn, [email protected], [email protected]
http://nical.ustc.edu.cn
Abstract. No existing multi-objective evolutionary algorithms (MOEAs) have ever been applied to problems with more than 1000 real-valued decision variables. Yet the real world is full of large and complex multiobjective problems. Motivated by the recent success of SaNSDE [1], an adaptive differential evolution algorithm that is capable of dealing with more than 1000 real-valued decision variables effectively and efficiently, this paper extends the ideas behind SaNSDE to develop a novel MOEA named MOSaNSDE. Our preliminary experimental studies have shown that MOSaNSDE outperforms state-of-the-art MOEAs significantly on most problems we have tested, in terms of both convergence and diversity metrics. Such encouraging results call for a more in-depth study of MOSaNSDE in the future, especially about its scalability.
1 Introduction
Multi-objective Optimization Problems (MOPs) often involve several incommensurable and competing objectives which need to be considered simultaneously. In the past decade, using evolutionary techniques to tackle MOPs has attracted increasing interest, and a number of effective multi-objective evolutionary algorithms (MOEAs) have been proposed [3,4]. For MOEAs, how to generate new individuals (i.e., which reproduction operator to use) is one of the most important issues. One general approach to devising effective reproduction operators for MOEAs is to adapt advanced single-objective optimization algorithms to MOPs, and there exist several successful attempts in this direction [3,11].
This work is partially supported by the National Natural Science Foundation of China (Grant No. 60428202), The Fund for Foreign Scholars in University Research and Teaching Programs (Grant No. B07033) and an EPSRC Grant (EP/D052785/1) on “SEBASE: Software Engineering By Automated SEarch”. Corresponding author. Xin Yao is also with CERCIA, the School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K.
Differential evolution (DE) is a simple yet effective algorithm for single-objective global optimization problems [5]. It conventionally involves several candidate mutation schemes and control parameters, e.g., population size NP, scale factor F and crossover rate CR. These control parameters, as well as the mutation schemes, are usually problem dependent and highly sensitive, which often makes DE difficult to utilize in practice. To overcome such disadvantages, we proposed a DE variant, namely self-adaptive differential evolution with neighborhood search (SaNSDE), in [1]. Three adaptation mechanisms are utilized in SaNSDE: adaptation of the selection of mutation schemes, and adaptation of the scale factor F and the crossover rate CR. As a result, no parameter fine-tuning is needed in the algorithm. Empirical studies showed that SaNSDE not only significantly outperformed the original DE on standard benchmark problems [1], but also obtained promising performance on large-scale problems with 1000 dimensions [2]. Due to SaNSDE's outstanding performance in single-objective optimization [1], it is natural to ask whether it can benefit MOPs as well. For this purpose, we extend SaNSDE in this paper by introducing the Pareto dominance concept into its fitness evaluation. An external archive is also adopted in the proposed algorithm, namely MOSaNSDE, in order to boost its performance. The effectiveness of MOSaNSDE was evaluated by comparing it to three well-known MOEAs on nine benchmark problems. The rest of this paper is organized as follows. Section 2 summarizes multi-objective optimization problems and the SaNSDE algorithm. Section 3 describes the new MOSaNSDE algorithm. Section 4 presents the simulation results of MOSaNSDE and the comparison with three other competitive MOEAs. Section 5 concludes this paper briefly.
2 Preliminaries
2.1 Multi-objective Optimization Problem
A general multi-objective optimization problem with m conflicting objectives can be described as follows:
\[
\begin{aligned}
\max/\min \quad & y = f(x) = (f_1(x), f_2(x), \ldots, f_m(x)) \\
\text{subject to} \quad & x = (x_1, x_2, \ldots, x_n) \in X, \\
& y = (y_1, y_2, \ldots, y_m) \in Y
\end{aligned}
\tag{1}
\]
where x is the decision vector, X is the decision space, y is the objective vector, and Y is the objective space. As the objectives of MOPs are conflicting, there might not exist a unique solution which is optimal with respect to all objectives. Instead, there is usually a set of Pareto-optimal solutions that are nondominated with respect to one another. The Pareto-optimal solutions together make up the so-called Pareto front. In the context of MOPs, we aim at finding a set of nondominated solutions that combine good convergence to the Pareto front with a good distribution along it.
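As a minimal illustration of the dominance concept used throughout this paper, the following Python sketch (ours, not part of the original paper) checks whether one objective vector Pareto-dominates another; the function name and the minimization convention are our own assumptions.

def dominates(y1, y2):
    """Return True if objective vector y1 Pareto-dominates y2 (minimization).
    y1 dominates y2 when it is no worse in every objective and strictly
    better in at least one."""
    no_worse = all(a <= b for a, b in zip(y1, y2))
    strictly_better = any(a < b for a, b in zip(y1, y2))
    return no_worse and strictly_better

# Example: (1.0, 2.0) dominates (1.5, 2.0) but not (0.5, 3.0).
print(dominates((1.0, 2.0), (1.5, 2.0)))  # True
print(dominates((1.0, 2.0), (0.5, 3.0)))  # False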
2.2 Self-adaptive Neighborhood Differential Evolution (SaNSDE)
Differential evolution (DE), proposed by Storn and Price in [5], is a population-based algorithm which employs a random initialization and three reproduction operators (i.e., mutation, crossover and selection) to evolve its population until a stopping criterion is met. Individuals in DE are represented as D-dimensional vectors x_i, ∀i ∈ {1, 2, ..., NP}, where D is the number of decision variables and NP is the population size. The classical DE is summarized as follows:

– Mutation:
\[ v_i = x_{i_1} + F \cdot (x_{i_2} - x_{i_3}) \tag{2} \]
where i_1, i_2, i_3 are distinct integers randomly selected from [1, NP], all different from the vector index i, and F is a positive scaling factor.

– Crossover:
\[ u_i(j) = \begin{cases} v_i(j), & \text{if } U_j(0,1) < CR \\ x_i(j), & \text{otherwise,} \end{cases} \tag{3} \]
where u_i(j) is the value of the j-th dimension of the offspring vector u_i, U_j(0,1) is a uniform random number between 0 and 1, and CR ∈ (0,1) is the crossover rate.

– Selection:
\[ x_i' = \begin{cases} u_i, & \text{if } f(u_i) < f(x_i) \\ x_i, & \text{otherwise,} \end{cases} \tag{4} \]
where x_i' is the individual kept for the next generation.

Although the original DE performs well on a large variety of problems, it lacks a Neighborhood Search (NS) operator. Thus, Yang et al. borrowed the idea of neighborhood search from another major branch of evolutionary algorithms, evolutionary programming, and proposed SaNSDE. SaNSDE is similar to the original DE except that Eq. (2) is replaced by the following Eq. (5):
\[ v_i = x_{i_1} + \begin{cases} d_i \cdot |N(0.5, 0.3)|, & \text{if } U(0,1) < SC \\ d_i \cdot |\delta|, & \text{otherwise,} \end{cases} \tag{5} \]
where d_i = (x_{i_2} - x_{i_3}) is the differential variation, N(0.5, 0.3) is a Gaussian random number with mean 0.5 and standard deviation 0.3, and δ denotes a Cauchy random variable with scale parameter t = 1. SC is the selection criterion that decides which random number (Gaussian or Cauchy) should be used; thus the main parameters of SaNSDE are SC and CR, instead of F and CR of the original DE. The idea behind SaNSDE is to adapt SC and CR throughout the optimization process via a learning scheme. Concretely, SaNSDE divides the optimization process into several learning periods, each of which consists of a predefined number of generations. Assume the k-th learning period has finished and we need to update SC and CR for the next learning period. Let the number of offspring generated with the Gaussian distribution and the Cauchy distribution that successfully replaced their parents during the k-th learning period be nsg and
nsc, respectively. Let the number of offspring generated with the Gaussian distribution and the Cauchy distribution that failed to replace their parents during the k-th learning period be nfg and nfc. Then SC is updated as in Eq. (6):
\[ SC = \frac{nsg \cdot (nsc + nfc)}{nsc \cdot (nsg + nfg) + nsg \cdot (nsc + nfc)} \tag{6} \]
In SaNSDE, the value of CR is randomly drawn from a Gaussian distribution with mean CRm (initialized as 0.5) and standard deviation 0.1. Within each learning period, the value of CR for each individual is changed every five generations. At the end of the learning period, CRm is updated to the average of the CR values of the offspring that have successfully survived to the next generation. Then the next learning period begins. In [1], the length of each learning period was set to 20 generations; we adopt the same setting in this paper.
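The reproduction step summarised above can be sketched in Python as follows. This is only our illustrative reading of Eqs. (2)–(5), not the authors' code: population handling, bounds, the forced crossover dimension and the learning-period bookkeeping for SC and CRm are omitted, and the function and parameter names are our own.

import math
import random

def sansde_trial(pop, i, SC=0.5, CRm=0.5):
    """Build one trial vector for individual i (illustrative sketch of Eqs. (2)-(5)).
    pop is a list of lists (NP individuals, D dimensions); requires NP >= 4."""
    NP, D = len(pop), len(pop[0])
    i1, i2, i3 = random.sample([j for j in range(NP) if j != i], 3)
    x1, x2, x3 = pop[i1], pop[i2], pop[i3]
    # Neighborhood-search mutation (Eq. (5)): Gaussian or Cauchy scale factor.
    if random.random() < SC:
        F = abs(random.gauss(0.5, 0.3))
    else:
        F = abs(math.tan(math.pi * (random.random() - 0.5)))  # |Cauchy(0, 1)| sample
    v = [x1[d] + F * (x2[d] - x3[d]) for d in range(D)]
    # Binomial crossover (Eq. (3)) with CR drawn around CRm and clamped to [0, 1].
    CR = min(1.0, max(0.0, random.gauss(CRm, 0.1)))
    u = [v[d] if random.random() < CR else pop[i][d] for d in range(D)]
    return u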
3 New Algorithm: MOSaNSDE
Algorithm 1. Pseudo-code of MOSaNSDE
1: Set the parent population P = ∅, the external archive A = ∅, and the generation counter t = 0.
2: Initialize the population P with NP individuals P = {p1, p2, ..., pNP} and set A = P.
3: while t < tmax (i.e., the terminal generation number) do
4:   Update the parameters SC and CR after each learning period.
5:   for i = 1 : NP do
6:     Use SaNSDE to generate an offspring individual ci based on pi.
7:     If pi dominates ci, ci is rejected.
8:     If pi is dominated by ci, pi is replaced by ci and the archive A is updated.
9:     If pi and ci are nondominated by each other, the one less crowded with respect to A is selected as the new pi.
10:  end for
11:  Update the archive A.
12:  Set t = t + 1.
13: end while
14: The nondominated population in A are the solutions.
From the experimental studies in [1], it can be observed that SaNSDE outperforms not only the original DE, but also several state-of-the-art DE variants on a set of test problems. The advantages of SaNSDE have also been explained in detail in [1]. Hence, motivated by the success of similar approaches in [3,11], we extend SaNSDE to multi-objective optimization problems. Algorithm 1 presents the pseudo-code of MOSaNSDE. Next, we briefly summarize the major steps of the algorithm. First of all, an initial population is randomly generated according to a uniform distribution, and an external archive A is established to store the nondominated solutions found so far. SaNSDE serves as the reproduction operator to generate new solutions. Both the population and the external archive evolve throughout the optimization process. In each generation, an offspring individual replaces its parent if the former dominates the latter. Otherwise, the parent individual is preserved. In case the two individuals are nondominated by each other, the crowding distance [4] between the two solutions and those in the external archive is calculated, and the one with the larger crowding distance survives. The external archive is updated following several rules. If a new
solution is nondominated by any solution in the archive, it is inserted into the archive. At the same time, those solutions (if any) in the archive that are dominated by the new solution are removed. When the size of the archive exceeds a predefined value, truncation is required: we first calculate the crowding distance of each individual in the archive, then sort the individuals in descending order of crowding distance, and discard those with the smallest crowding distances.
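A minimal Python sketch of the archive maintenance just described is given below; it assumes a minimization setting and a standard crowding-distance computation in objective space, and the function names and truncation details are our own simplification rather than the authors' implementation.

def dominates(a, b):
    """True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def crowding_distance(objs):
    """Crowding distances of a list of objective vectors (larger = less crowded)."""
    n = len(objs)
    dist = [0.0] * n
    if n == 0:
        return dist
    m = len(objs[0])
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = objs[order[-1]][k] - objs[order[0]][k] or 1.0
        for j in range(1, n - 1):
            dist[order[j]] += (objs[order[j + 1]][k] - objs[order[j - 1]][k]) / span
    return dist

def update_archive(archive, new_obj, max_size):
    """Insert new_obj if nondominated, drop dominated members, truncate by crowding."""
    if any(dominates(a, new_obj) for a in archive):
        return archive
    archive = [a for a in archive if not dominates(new_obj, a)] + [new_obj]
    if len(archive) > max_size:
        dist = crowding_distance(archive)
        order = sorted(range(len(archive)), key=lambda i: dist[i], reverse=True)
        archive = [archive[i] for i in order[:max_size]]
    return archive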
4 Simulation Results
4.1 Experimental Settings
We evaluated the performance of the new algorithm on nine widely used test problems (seven bi-objective problems and two 3-objective problems) from the MOEA literature. Three of the bi-objective problems, i.e., SCH, FON and KUR, were first proposed by Schaffer, Fonseca and Kursawe in [6,7,8], respectively. The other four bi-objective problems (ZDT1–3, ZDT6) were proposed by Zitzler et al. in [9]. The two 3-objective problems, DTLZ1 and DTLZ2, were proposed by Deb et al. in [10]. Due to the space constraint, we do not list their detailed characteristics in this paper; readers can find their explicit formulations in the original publications. In the experiments, we compared our MOSaNSDE with three other well-known MOEAs, namely the Nondominated Sorting Genetic Algorithm II (NSGA-II) [4], the Multi-Objective Particle Swarm Optimization (MOPSO) [11], and the Pareto Archived Evolution Strategy (PAES) [12]. These algorithms have been widely used in the MOP literature and provide a good basis for our comparative study. For each compared algorithm, 250 generations are simulated per run on all of the test problems. The parameters of MOSaNSDE were set as follows: population size NP = 50 for bi-objective problems and NP = 150 for 3-objective problems, archive size Nmax = 100 for bi-objective problems and Nmax = 300 for 3-objective problems, and the "learning period" for SC and CR is set to 20 generations for all test problems. NSGA-II uses the real-coded format with a population size of 100 for bi-objective problems and 300 for 3-objective problems, with crossover rate 0.9 and mutation rate 1/n (n is the number of decision variables). We also set the distribution parameters to ηc = 20 and ηm = 20, which are the same as the settings in [4]. For MOPSO, the number of particles was set to 50 for bi-objective problems and 150 for 3-objective problems, the size of the repository is 100 for bi-objective problems and 300 for 3-objective problems, and the number of divisions was set to 30. PAES adopted the (1 + λ) scheme with an archive size of 100 for bi-objective problems and 300 for 3-objective problems, while the grid depth was set to 4 for all the test problems. The goals of solving MOPs are twofold: 1) the obtained solutions should converge as close to the true Pareto-optimal set as possible; 2) the solutions should maintain a certain degree of diversity. Based on these two goals, two metrics have been proposed to measure MOEAs' performance [4]:
– Convergence Metric (γ). This metric calculates the average distance between the obtained nondominated solutions and the actual Pareto-optimal set. It can be calculated as follows:
\[ \gamma = \frac{\sum_{i=1}^{N} d_i}{N} \]
where d_i is the Euclidean distance between the i-th of the N obtained solutions and its nearest neighbor on the actual Pareto-optimal front. A smaller value of γ indicates better convergence performance.
– Spread Metric (Δ). This metric was proposed by Deb et al. in [4] and measures how well the obtained nondominated solutions are distributed:
\[ \Delta = \frac{\sum_{m=1}^{M} d_m^{e} + \sum_{i=1}^{N-1} |d_i - \bar{d}|}{\sum_{m=1}^{M} d_m^{e} + (N-1)\bar{d}} \]
where d_m^{e} is the Euclidean distance between the extreme solutions of the obtained set and the boundary solutions of the actual Pareto set, d_i is the Euclidean distance between two neighboring solutions, and \bar{d} is the mean of all d_i. As with γ, a smaller value of Δ indicates better performance.
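Both metrics can be computed directly from the obtained objective vectors; the short Python sketch below is our own illustration (assuming Euclidean distances, a densely sampled reference front, and, for Δ, a bi-objective front sorted along its first objective), not code from the paper.

import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def convergence_metric(solutions, reference_front):
    """gamma: average distance from each obtained solution to its nearest reference point."""
    return sum(min(euclid(s, r) for r in reference_front) for s in solutions) / len(solutions)

def spread_metric(solutions, extreme_refs):
    """Delta for a bi-objective front; extreme_refs are the two boundary points of the
    true Pareto set. Assumes at least two obtained solutions."""
    sols = sorted(solutions)
    d = [euclid(sols[i], sols[i + 1]) for i in range(len(sols) - 1)]
    d_mean = sum(d) / len(d)
    d_ext = euclid(extreme_refs[0], sols[0]) + euclid(extreme_refs[1], sols[-1])
    return (d_ext + sum(abs(di - d_mean) for di in d)) / (d_ext + len(d) * d_mean)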
4.2 Results
We ran all the MOEAs 30 times independently, and then calculated the means and variances of the two metrics (i.e., the convergence and diversity metrics). The results are presented in Tables 1 and 2. The best results among the four algorithms are shown in bold. Furthermore, in order to observe the statistical differences between the results obtained by MOSaNSDE and those obtained by the other three algorithms, we employed nonparametric Wilcoxon rank-sum tests. For each test problem, the Wilcoxon test was carried out between MOSaNSDE and the best of the three compared algorithms. The h values presented in the last rows of Table 1 and Table 2 are the results of the Wilcoxon tests, where "1" indicates that the performances of the two algorithms are statistically different with 95% certainty, while h = 0 means they are not statistically different. From Tables 1 and 2, we can see that MOSaNSDE converged the same as or slightly better than the other three representative algorithms on the two simple test functions (SCH, FON) and a 3-objective problem (DTLZ1). MOSaNSDE achieved the same best results as MOPSO and PAES on ZDT2 with respect to the convergence metric, and all three are much better than NSGA-II. On the other five test functions (KUR, ZDT1, ZDT3, ZDT6 and DTLZ2), MOSaNSDE significantly outperformed the other three algorithms with respect to the convergence metric. Concerning the diversity metric, it can be observed that MOSaNSDE spread significantly better than the other three algorithms on all test functions except SCH and DTLZ1, on which the performances of the four algorithms are comparable.
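For readers who wish to reproduce this kind of significance test, it can be run, for instance, with SciPy's rank-sum implementation; the snippet below is only an illustration on two hypothetical samples of metric values, not the authors' experimental script.

from scipy.stats import ranksums

# Hypothetical convergence-metric samples from independent runs of two algorithms.
gamma_a = [0.0011, 0.0012, 0.0010, 0.0013, 0.0011]
gamma_b = [0.0520, 0.0490, 0.0515, 0.0530, 0.0502]

stat, p_value = ranksums(gamma_a, gamma_b)
h = 1 if p_value < 0.05 else 0  # h = 1: difference significant at the 95% level
print(stat, p_value, h)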
Adaptive Differential Evolution for Multi-objective Optimization
15
Table 1. MOEAs compared based on Convergence Metric (γ) (mean in the first rows and variance in the second rows; the results which are significantly better than those of the other three algorithms are emphasized in boldface)

Algorithm   SCH       FON       KUR       ZDT1      ZDT2      ZDT3      ZDT6      DTLZ1     DTLZ2
MOSaNSDE    0.006923  0.002248  0.023275  0.001161  0.001432  0.003181  0.006405  0.075387  0.023694
            2.31E-07  6.47E-08  5.08E-06  2.92E-08  4.54E-08  2.95E-08  6.78E-07  3.67E-03  4.85E-07
NSGA-II     0.008068  0.003165  0.061022  0.072084  0.052184  0.310079  0.415028  0.065297  0.042863
            4.32E-07  6.29E-08  5.75E-05  9.20E-04  3.67E-04  7.34E-03  6.38E-02  6.57E-02  1.36E-05
MOPSO       0.007322  0.002454  0.030052  0.018577  0.0017045 0.130576  0.330672  0.378510  0.186092
            4.28E-07  5.37E-08  2.73E-05  7.23E-05  5.92E-04  5.54E-05  7.73E-01  4.18E-02  7.35E-06
PAES        0.008004  0.002221  0.984901  0.004046  0.001612  0.021562  1.450573  0.096420  0.05796
            5.93E-07  4.67E-08  2.84E-01  7.10E-05  5.39E-07  7.22E-05  3.02E-01  3.26E-03  5.62E-06
h           0         0         1         1         0         1         1         0         1
Table 2. MOEAs compared based on Diversity Metric (Δ) (mean in the first rows and variance in the second rows; the results which are significantly better than those of the other three algorithms are emphasized in boldface)

Algorithm   SCH       FON       KUR       ZDT1      ZDT2      ZDT3      ZDT6      DTLZ1     DTLZ2
MOSaNSDE    0.344681  0.230678  0.382288  0.246235  0.261846  0.497681  0.325468  1.169830  0.553207
            1.33E-03  2.63E-04  1.17E-04  2.83E-03  7.46E-04  2.86E-04  6.07E-02  3.72E-02  9.20E-04
NSGA-II     0.423661  0.397886  0.632615  0.675597  0.957422  0.791458  1.064076  1.569836  0.953721
            4.65E-03  2.12E-03  6.67E-03  1.73E-03  7.82E-02  1.54E-03  3.32E-02  3.92E-02  5.14E-02
MOPSO       0.557639  0.568493  0.586673  0.580741  0.650889  0.543900  0.963582  0.852471  1.352095
            6.10E-04  6.74E-03  2.57E-03  3.65E-03  7.97E-02  1.88E-03  5.22E-04  4.7E-03   6.24E-03
PAES        0.802243  0.571838  0.675707  0.821802  0.839597  0.750043  0.873567  1.069328  0.772964
            2.45E-03  7.05E-03  9.58E-03  6.06E-02  3.39E-02  3.95E-03  9.25E-02  8.17E-03  7.28E-02
h           0         1         1         1         1         1         1         0         1
5 Conclusions
In this paper, we extended our previous single-objective algorithm, SaNSDE, to the multi-objective optimization field and proposed a new MOEA, namely MOSaNSDE. The self-adaptation utilized in SaNSDE makes it possible to control the sensitive parameters of DE via statistical learning experience gathered during evolution. Consequently, MOSaNSDE is also capable of adapting its control parameters effectively. Experimental studies on nine benchmark problems showed that MOSaNSDE performed comparably to or significantly better than three well-known MOEAs, in terms of both convergence and diversity metrics. Recently, scaling up MOEAs to large-size problems has emerged as the most challenging research topic in the field of evolutionary multi-objective optimization [13,14]. Given SaNSDE's superior performance on high-dimensional single-objective optimization problems, MOSaNSDE might also be a potential tool for MOPs with many decision variables. This issue, as well as scaling up MOSaNSDE to MOPs with many objectives, will be the major foci of our future investigation.
References
1. Yang, Z., Tang, K., Yao, X.: Self-adaptive Differential Evolution with Neighborhood Search. In: Proceedings of the 2008 Congress on Evolutionary Computation, pp. 1110–1116. IEEE Press, Hong Kong (2008)
2. Yang, Z., Tang, K., Yao, X.: Large Scale Evolutionary Optimization Using Cooperative Coevolution. Information Sciences 178, 2985–2999 (2008)
3. Knowles, J.D., Corne, D.W.: Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation 8, 149–172 (2000)
4. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 182–197 (2002)
5. Storn, R., Price, K.: Differential Evolution – A Simple and Efficient Heuristic Strategy for Global Optimization over Continuous Spaces. Journal of Global Optimization 11, 341–359 (1997)
6. Schaffer, J.D.: Multiple Objective Optimization with Vector Evaluated Genetic Algorithms. In: Proceedings of the First International Conference on Genetic Algorithms, pp. 93–100 (1987)
7. Fonseca, C.M., Fleming, P.J.: Multiobjective Optimization and Multiple Constraint Handling with Evolutionary Algorithms – Part II: Application Examples. IEEE Transactions on Systems, Man and Cybernetics, Part A 28, 8–47 (1998)
8. Kursawe, F.: A Variant of Evolution Strategies for Vector Optimization. In: Schwefel, H.-P., Männer, R. (eds.) PPSN 1990. LNCS, vol. 496, pp. 193–197. Springer, Heidelberg (1991)
9. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evolutionary Computation 8, 173–195 (2000)
10. Deb, K., Thiele, L., Laumanns, M., Zitzler, E.: Scalable Multiobjective Optimization Test Problems. In: Proceedings of the Congress on Evolutionary Computation, pp. 825–830 (2002)
11. Coello, C.A.C., Pulido, G.T., Lechuga, M.S.: Handling Multiple Objectives with Particle Swarm Optimization. IEEE Transactions on Evolutionary Computation 8, 256–279 (2004)
12. Knowles, J.D., Corne, D.W.: The Pareto Archived Evolution Strategy: A New Baseline Algorithm for Pareto Multiobjective Optimization. In: Proceedings of the Congress on Evolutionary Computation, pp. 98–105 (1999)
13. Praditwong, K., Yao, X.: How Well Do Multi-objective Evolutionary Algorithms Scale to Large Problems. In: Proceedings of the 2007 IEEE Congress on Evolutionary Computation (CEC 2007), Singapore, pp. 3959–3966 (2007)
14. Khare, V., Yao, X., Deb, K.: Performance Scaling of Multi-objective Evolutionary Algorithms. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Deb, K., Thiele, L. (eds.) EMO 2003. LNCS, vol. 2632, pp. 376–390. Springer, Heidelberg (2003)
An Evolutionary Approach for Bilevel Multi-objective Problems

Kalyanmoy Deb and Ankur Sinha

Department of Business Technology, Helsinki School of Economics, Finland
{Kalyanmoy.Deb,Ankur.Sinha}@hse.fi

Abstract. Evolutionary multi-objective optimization (EMO) algorithms have been extensively applied to find multiple near Pareto-optimal solutions over the past 15 years or so. However, EMO algorithms for solving bilevel multi-objective optimization problems have not received adequate attention yet. These problems appear in many practical applications and involve two levels, each comprising multiple conflicting objectives. They require every feasible upper-level solution to satisfy optimality of a lower-level optimization problem, thereby making them difficult to solve. In this paper, we discuss a recently proposed bilevel EMO procedure and show its working principle on a couple of test problems and on a business decision-making problem. This paper should motivate other EMO researchers to engage more in this important optimization task of practical importance.
1 Introduction
Over the past 15 years or so, the field of evolutionary multi-objective optimization (EMO) has received a lot of attention in developing efficient algorithms, in applying EMO procedures to difficult test and real-world problems, and in using EMO procedures to solve other types of optimization problems [3]. Despite this widespread applicability, EMO approaches still hold promise for many other problem-solving tasks. Bilevel multi-objective optimization problems are one such kind of problem, which has received only lukewarm interest so far. In bilevel multi-objective optimization problems, there are two levels of optimization problems. An upper level solution is feasible only if it is one of the optima of a lower level optimization problem. Such problems are found abundantly in practice, particularly in optimal control, process optimization, transportation problems, game playing strategies, reliability based design optimization, and others. In such problems, the lower level optimization task ensures a certain quality or certain physical properties which make a solution acceptable. Often, such requirements come up as equilibrium conditions, stability conditions, or mass/energy balance conditions, which are mandatory for any solution to be feasible. These essential tasks are posed as lower level optimization tasks in a bilevel optimization framework. The upper level optimization then must search among such reliable, equilibrium or stable solutions to find an optimal solution corresponding to one or more different (higher level) objectives. Despite the
Deva Raj Chair Professor, Department of Mechanical Engineering, Indian Institute of Technology Kanpur. PIN 208016, India
[email protected]
importance of such problems in practice, the difficulty of searching for and defining optimal solutions to bilevel optimization problems remains [9]. Despite the lack of theoretical results, there exists a plethora of studies related to bilevel single-objective optimization problems [1,2,11,12], in which both the upper and the lower level optimization tasks involve exactly one objective each. In the context of bilevel multi-objective optimization studies, however, there do not exist many studies using classical methods [10]. Mostly, the suggested algorithms use an exhaustive search for the upper level optimization task, thereby making the approaches difficult to extend to a large number of variables. Another study [13] uses the weighted-sum approach to reduce multi-objective linear problems at both levels to single-objective problems and hence is not applicable to non-convex problems. Recently, the authors of this paper have suggested a bilevel EMO approach [8] and a number of scalable test problems [7]. In this paper, we suggest a modified version of the recently suggested BLEMO procedure [8] and show its working on a test problem borrowed from the classical literature, one test problem from our suggested test suite, and one business decision-making problem. Results are encouraging, particularly since the problems are treated as multi-objective problems, and should motivate practitioners and EMO researchers to pay more attention to bilevel multi-objective problem-solving tasks in the coming years.
2 Description of Bilevel Multi-objective Optimization Problem
A bilevel multi-objective optimization problem has two levels of multi-objective optimization problems such that the optimal solutions of the lower level problem determine the feasible space of the upper level optimization problem. In general, the lower level problem is associated with a variable vector x_l and a fixed vector x_u. However, the upper level problem usually involves all variables x = (x_u, x_l); we refer here to x_u exclusively as the upper level variable vector. A general bilevel multi-objective optimization problem can be described as follows [9]:
\[
\begin{aligned}
\min_{(x_u, x_l)} \ \ & F(x) = (F_1(x), \ldots, F_M(x)), \\
\text{subject to} \ \ & x_l \in \operatorname*{argmin}_{(x_l)} \left\{ f(x) = (f_1(x), \ldots, f_m(x)) \;\middle|\; g(x) \ge 0,\ h(x) = 0 \right\}, \\
& G(x) \ge 0, \quad H(x) = 0, \quad x_i^{(L)} \le x_i \le x_i^{(U)}, \quad i = 1, \ldots, n.
\end{aligned}
\tag{1}
\]
In the above formulation, F1 (x), . . . , FM (x) are the upper level objective functions, and G(x) and H(x) are upper level inequality and equality constraints, respectively. The corresponding functions in lower case letters represent the lower level problem. It should be noted that the argmin operator defines Pareto-optimality in the presence of multiple objectives. The lower level optimization problem is optimized only with respect to the variables xl and the variable vector xu is kept fixed. The Pareto-optimal solutions of a lower level optimization problem become feasible solutions to the upper level problem. The Pareto-optimal solutions of the upper level problem are determined by objectives F and constraints G and H, and restricting the search among the lower level Pareto-optimal solutions. In all problems of this paper, we have only considered inequality constraints.
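To make the role of the lower-level argmin operator concrete, the Python sketch below (our illustration, not the authors' code) checks, for a fixed upper-level vector x_u, whether a candidate x_l is nondominated among a finite sample of lower-level trials; only such candidates would be treated as upper-level feasible. The toy objective function and sample are hypothetical.

def dominates(a, b):
    """True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def is_lower_level_nondominated(x_u, x_l, lower_objs, candidate_xls):
    """Approximate check of the lower-level argmin condition: x_l must not be
    dominated (w.r.t. the lower-level objectives f, evaluated at fixed x_u)
    by any other sampled lower-level vector."""
    f_xl = lower_objs(x_u, x_l)
    return not any(dominates(lower_objs(x_u, z), f_xl) for z in candidate_xls)

# Toy lower-level objectives (hypothetical): f1 = x_l[0]**2, f2 = (x_l[0] - x_u[0])**2.
lower = lambda xu, xl: (xl[0] ** 2, (xl[0] - xu[0]) ** 2)
sample = [(t / 10.0,) for t in range(0, 11)]
print(is_lower_level_nondominated((1.0,), (0.5,), lower, sample))  # True: 0.5 lies between 0 and 1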
3 Proposed Procedure (BLEMO)
The algorithm uses the elitist non-dominated sorting GA or NSGA-II [6]. The upper level population (of size Nu) uses NSGA-II operations for Tu generations. However,
the evaluation of a population member calls a lower level NSGA-II simulation with a population size of Nl for Tl generations. The upper level population has ns = Nu/Nl subpopulations of size Nl each. Each subpopulation has the same xu variable vector. This structure of the populations is maintained by the EMO operators. In the following, we describe one iteration of the proposed BLEMO procedure. Every population member in the upper level has the following quantities computed from the previous iteration: (i) a non-dominated rank NDu corresponding to F, G and H, (ii) a crowding distance value CDu corresponding to F, (iii) a non-dominated rank NDl corresponding to f, g and h, and (iv) a crowding distance value CDl using f. In addition to these, for the members stored in the archive, we have (v) a crowding distance value CDa corresponding to F and (vi) a non-dominated rank NDa corresponding to F, G and H. For every subpopulation in the upper level population, members having the best non-domination rank (NDu) are saved as an 'elite set' which will be used in the recombination operator in the lower level optimization task of the same subpopulation.

Step 1: Apply a pair of binary tournament selections on members (x = (xu, xl)) of Pt using NDu and CDu lexicographically. Also, apply a pair of binary tournament selections, using NDa and CDa lexicographically, on members randomly chosen from the archive At. Randomly choose one of the two winners from At and one of the two winners from Pt. The member from At participates as one of the parents with a probability of |At|/(|At| + |Pt|); otherwise the member from Pt becomes the first parent for crossover. Perform a similar operation on the rest of the two parents to decide the second parent for crossover. The upper level variable vectors xu of the two selected parents are then recombined using the SBX operator ([5]) to obtain two new vectors, of which one is chosen at random. The chosen solution is mutated by the polynomial mutation operator ([4]) to obtain a child vector (say, xu(1)). We then create Nl new lower level variable vectors xl(i) by applying selection-recombination-mutation operations on the entire Pt and At.

Step 2: For each subpopulation of size Nl, we now perform an NSGA-II procedure using the lower level objectives (f) and constraints (g) for Tl generations. It is interesting to note that in each lower level NSGA-II, the upper level variable vector xu is not changed. For every mating, one solution is chosen as usual by binary tournament selection using NDl and CDl lexicographically, but the second solution is always chosen randomly from the 'elite set'. The mutation is performed as usual. All Nl members from each subpopulation are then combined together into one population (the child population, Qt).

Step 3: Each member of Qt is now evaluated with F, G and H. Populations Pt and Qt are combined together to form Rt. The combined population Rt is then ranked (NDu) according to non-domination, and members within an identical non-dominated rank are assigned a crowding distance CDu.

Step 4: From the combined population Rt of size 2Nu, half of its members are chosen in this step. Starting from NDu = 1 members, other rank members are chosen one at a time. From each rank, solutions having NDl = 1 are noted one by one in the order of decreasing CDu; for each such solution the entire Nl subpopulation from its source population (either Pt or Qt) is copied into an intermediate population St. If a
subpopulation is already copied in St and a future solution from the same subpopulation is found to have NDu = NDl = 1, the subpopulation is not copied again.

Step 5: Each subpopulation of St which is not among the immediate offspring of the current generation is modified using the lower level NSGA-II procedure applied with f and g for Tl generations. This step helps progress each lower level population towards its individual Pareto-optimal frontier.

Step 6: Finally, all subpopulations obtained after the lower level NSGA-II simulations are combined together to form the next generation population Pt+1.

The good solutions (described below) of every generation are saved in the archive (At). Initially, the archive A0 is an empty set. Thereafter, at the end of every upper level generation, solutions from Pt which have undergone and survived r (a non-negative integer) lower level generations and have both NDu = 1 and NDl = 1 are saved in the archive At. The nondominated solutions (with respect to F and G) of the archive are kept in At and the remaining members are deleted from the archive. This method differs from our previous algorithm [8] in two ways. Firstly, the offspring produced in the upper level generations were evaluated twice in the previous algorithm, which led to a significant increase in function evaluations without much benefit. In the modification, the double evaluations have been avoided, making the algorithm more economical. Secondly, in the previous version, the parents were chosen for crossover only from the parent population. In the new version the archive members are also allowed to participate in crossover, which leads to better results.
4 Results
We use the following parameter settings: Nu = 400, Tu = 100, Nl = 20, and Tl = 40 for all problems. Since the lower level search interacts with the upper level search, we run the lower level optimization algorithm for fewer generations and run the upper level simulations longer. We have used r = 2. The other NSGA-II parameters are set as follows: for the SBX crossover, pc = 0.9 and ηc = 15 [5], and for the polynomial mutation operator, pm = 0.1 and ηm = 20 [4]. For brevity, we show a single simulation run, but multiple runs have produced similar results.
4.1 Problem 1
Problem 1 has a total of three variables, with x1, x2 belonging to xl and y belonging to xu, and is taken from [10]:
\[
\begin{aligned}
\min \ \ & F(x) = \begin{pmatrix} x_1 - y \\ x_2 \end{pmatrix}, \quad
\text{s.t.} \ \ (x_1, x_2) \in \operatorname*{argmin}_{(x_1, x_2)} \left\{ f(x) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \;\middle|\; g_1(x) = y^2 - x_1^2 - x_2^2 \ge 0 \right\}, \\
& G_1(x) = 1 + x_1 + x_2 \ge 0, \quad -1 \le x_1, x_2 \le 1, \quad 0 \le y \le 1.
\end{aligned}
\tag{2}
\]
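For reference, problem (2) can be coded directly; the following Python definitions are simply our transcription of the formulation above, and the function names are ours.

def upper_objectives(x1, x2, y):
    """F(x) of problem (2)."""
    return (x1 - y, x2)

def lower_objectives(x1, x2):
    """f(x) of problem (2), optimized over (x1, x2) for a fixed y."""
    return (x1, x2)

def lower_constraint(x1, x2, y):
    """g1(x) >= 0."""
    return y ** 2 - x1 ** 2 - x2 ** 2

def upper_constraint(x1, x2):
    """G1(x) >= 0."""
    return 1 + x1 + x2

# Example: x1 = x2 = -0.5 with y = sqrt(0.5) is lower-level feasible and lies
# on the upper-level constraint boundary x1 + x2 = -1.
print(upper_objectives(-0.5, -0.5, 0.5 ** 0.5), lower_constraint(-0.5, -0.5, 0.5 ** 0.5), upper_constraint(-0.5, -0.5))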
Figure 1 shows the solutions obtained using the proposed BLEMO. It is clear that the obtained solutions are very close to the theoretical Pareto-optimal solutions, as shown in the figure. The lower boundary of the objective space is also shown to indicate that, although solutions could have been found lying between the theoretical front and the boundary, dominating the Pareto-optimal points, BLEMO is able to avoid such solutions and find solutions very close to the Pareto-optimal solutions.
Fig. 1. BLEMO results for problem 1
Fig. 2. Variable values of obtained solutions for problem 1. BLEMO solutions are close to theoretical results.
Also, BLEMO is able to find a good spread of solutions over the entire range of the true Pareto-optimal front. Figure 2 shows the variation of x for these solutions. It is clear that all solutions are close to being on the upper level constraint G(x) boundary (x1 + x2 = −1).
4.2 Problem 2
Problem 2 has K real-valued variables each for x and y and is taken from [7]:
\[
\begin{aligned}
\text{minimize} \ \ & F(x,y) = \begin{pmatrix}
(1 + r - \cos(\alpha\pi y_1)) + \sum_{j=2}^{K}\left(y_j - \frac{j-1}{2}\right)^2 + \tau\sum_{i=2}^{K}(x_i - y_i)^2 - \rho\cos\!\left(\frac{\pi}{2}\,\frac{x_1}{y_1}\right) \\
(1 + r - \sin(\alpha\pi y_1)) + \sum_{j=2}^{K}\left(y_j - \frac{j-1}{2}\right)^2 + \tau\sum_{i=2}^{K}(x_i - y_i)^2 - \rho\sin\!\left(\frac{\pi}{2}\,\frac{x_1}{y_1}\right)
\end{pmatrix}, \\
\text{subject to} \ \ & (x) \in \operatorname*{argmin}_{(x)}\left\{ f(x) = \begin{pmatrix}
x_1^2 + \sum_{i=2}^{K}(x_i - y_i)^2 + \sum_{i=2}^{K} 10\bigl(1 - \cos(4\pi(x_i - y_i))\bigr) \\
\sum_{i=1}^{K}(x_i - y_i)^2 + \sum_{i=2}^{K} 10\,\bigl|\sin(4\pi(x_i - y_i))\bigr|
\end{pmatrix} \right\}, \\
& -K \le x_i \le K,\ i = 1, \ldots, K, \qquad 1 \le y_1 \le 4, \qquad -K \le y_j \le K,\ \forall j = 2, \ldots, K.
\end{aligned}
\tag{3}
\]
The lower level Pareto-optimal front for a given y vector corresponds to x_i = y_i for i = 2, ..., K and x_1 ∈ [0, y_1]. The objectives are related as follows: f_2* = (f_1* − y_1)^2. Here, τ = 1. The upper level Pareto-optimal front corresponds to y_j = (j − 1)/2 for j = 2, ..., K. The parametric functional relationship is u_1 = 1 + ρ − cos(απ y_1) and u_2 = 1 + ρ − sin(απ y_1). This is a circle of radius one with center at ((1 + ρ), (1 + ρ)) in the F1–F2 space. Thus, the non-dominated portion is the third quadrant of this circle, and this happens for y_1 ∈ (2p + [0, 0.5])/α, where p is any integer including zero. For α = 1 and 1 ≤ y_1 ≤ 4, this happens for y_1 ∈ [2, 2.5] and y_1 = 4. Accumulating the non-dominated portions of all circles of radius ρ at every optimal y_1, we have the overall upper level Pareto-optimal front defined as a circle of radius (1 + ρ) and center at ((1 + ρ), (1 + ρ)). We have used ρ = 0.1. This test problem is solved for K = 3 (6 variables) and K = 4 (8 variables). Figures 3 and 4 show the obtained Pareto-optimal fronts for the 6- and 8-variable cases, respectively.
Fig. 3. Obtained Pareto-optimal front for problem 2 (6-variable)
Fig. 4. Obtained Pareto-optimal front for problem 2 (8-variable)
The proposed algorithm seems to solve the smaller-sized problem well, but its performance deteriorates with an increase in the number of variables, thereby emphasizing the need for improved EMO algorithms for handling difficult bilevel multi-objective optimization problems.
4.3 Problem 3
In a company scenario, the CEO's goal is usually to maximize net profits and quality of products, whereas a branch head's goal is to maximize its own profit and worker satisfaction. The problem involves uncertainty and is bilevel in nature, as the CEO's decision must take into account the optimal decisions of the branch heads. We present a deterministic version of the case study from [13] in equation (4). Figure 5 shows the obtained frontier of the upper level problem using BLEMO. A weighted sum of objectives at both levels with weights (0.5, 0.5)^T yielded a single solution: x = (0, 67.9318, 0)^T and y = (146.2955, 28.9394)^T [13]. This solution is marked in the figure and is found to correspond to the maximum-F2 solution. The fact that this solution lies on one of the extremes of our obtained front gives us confidence in the near-optimality of our obtained front. Figure 6 shows the left-side constraint values of all five constraints for all obtained solutions. The fact that constraints G1, g2 and g3 are all active for all Pareto-optimal solutions provides confidence in our proposed approach.
\[
\begin{aligned}
\text{maximize} \ \ & F(x,y) = \bigl( (1,9)(y_1,y_2)^T + (10,1,3)(x_1,x_2,x_3)^T,\ \ (9,2)(y_1,y_2)^T + (2,7,4)(x_1,x_2,x_3)^T \bigr), \\
\text{subject to} \ \ & (x) \in \operatorname*{argmin}_{(x)} \Bigl\{ f(x) = \bigl( (4,6)(y_1,y_2)^T + (7,4,8)(x_1,x_2,x_3)^T,\ \ (6,4)(y_1,y_2)^T + (8,7,4)(x_1,x_2,x_3)^T \bigr) : \\
& \qquad g_1 = (3,-9)(y_1,y_2)^T + (-9,-4,0)(x_1,x_2,x_3)^T \le 61, \\
& \qquad g_2 = (5,9)(y_1,y_2)^T + (10,-1,-2)(x_1,x_2,x_3)^T \le 924, \\
& \qquad g_3 = (3,-3)(y_1,y_2)^T + (0,1,5)(x_1,x_2,x_3)^T \le 420 \Bigr\}, \\
& G_1 = (3,9)(y_1,y_2)^T + (9,5,3)(x_1,x_2,x_3)^T \le 1039, \\
& G_2 = (-4,-1)(y_1,y_2)^T + (3,-3,2)(x_1,x_2,x_3)^T \le 94, \qquad x_1, x_2, x_3, y_1, y_2 \ge 0.
\end{aligned}
\tag{4}
\]
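Assuming the grouping of coefficient vectors reconstructed in Eq. (4) above (which should be checked against [13]), the linear objectives and constraints can be evaluated as plain dot products; the sketch below is ours, and constraint values are returned in "expression minus bound" form so that a value of zero indicates an active constraint.

def dot(c, v):
    return sum(ci * vi for ci, vi in zip(c, v))

def problem3(y, x):
    """Evaluate the (reconstructed) objectives and constraints of Eq. (4).
    y = (y1, y2) are upper-level variables, x = (x1, x2, x3) lower-level ones."""
    F = (dot((1, 9), y) + dot((10, 1, 3), x), dot((9, 2), y) + dot((2, 7, 4), x))
    f = (dot((4, 6), y) + dot((7, 4, 8), x), dot((6, 4), y) + dot((8, 7, 4), x))
    G = (dot((3, 9), y) + dot((9, 5, 3), x) - 1039,
         dot((-4, -1), y) + dot((3, -3, 2), x) - 94)
    g = (dot((3, -9), y) + dot((-9, -4, 0), x) - 61,
         dot((5, 9), y) + dot((10, -1, -2), x) - 924,
         dot((3, -3), y) + dot((0, 1, 5), x) - 420)
    return F, f, G, g

# At the weighted-sum solution reported in the text, G1, g2 and g3 come out (near) zero,
# i.e. active, consistent with the discussion of Figure 6.
print(problem3((146.2955, 28.9394), (0.0, 67.9318, 0.0)))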
Fig. 5. Obtained front for problem 3
Fig. 6. Constraints for obtained solutions
5 Conclusions
Here, we have proposed and simulated a bilevel evolutionary multi-objective optimization (BLEMO) algorithm based on NSGA-II. The large computational demand of the nested optimization task is handled by an approximate solution of the lower level problem within a single upper level iteration, and the accuracy of the procedure is achieved through multiple iterations of the upper level optimization task. Simulation studies on a number of problems, including a business decision-making problem, have shown that the proposed interactive upper and lower level population processing strategy is able to steer the search close to the correct Pareto-optimal set of the overall problem. Many other ideas are certainly possible and many challenges (e.g., performance metrics, local-search-based hybrids, etc.) still need to be addressed. We sincerely hope that this study will spur an interest among EMO researchers and practitioners in the coming years.

Acknowledgements. The authors wish to thank the Academy of Finland and the Foundation of Helsinki School of Economics for their support of this study.
References
1. Calamai, P.H., Vicente, L.N.: Generating quadratic bilevel programming test problems. ACM Trans. Math. Software 20(1), 103–119 (1994)
2. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Annals of Operations Research 153, 235–256 (2007)
3. Deb, K.: Genetic algorithms in multi-modal function optimization. Master's thesis, University of Alabama, Tuscaloosa (1989)
4. Deb, K.: Multi-objective optimization using evolutionary algorithms. Wiley, Chichester (2001)
5. Deb, K., Agrawal, R.B.: Simulated binary crossover for continuous search space. Complex Systems 9(2), 115–148 (1995)
6. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
7. Deb, K., Sinha, A.: Constructing test problems for bilevel evolutionary multi-objective optimization. Technical Report, KanGAL Report No. 2008010, Indian Institute of Technology Kanpur, India (2008)
8. Deb, K., Sinha, A.: Solving bilevel multi-objective optimization problems using evolutionary algorithms. In: Proceedings of Evolutionary Multi-Criterion Optimization (EMO 2009). Springer, Heidelberg (2009) (in press)
9. Dempe, S., Dutta, J., Lohse, S.: Optimality conditions for bilevel programming problems. Optimization 55(5–6), 505–524 (2006)
10. Eichfelder, G.: Solving nonlinear multiobjective bilevel optimization problems with coupled upper level constraints. Technical Report, Preprint No. 320, Preprint-Series of the Institute of Applied Mathematics, Univ. Erlangen-Nürnberg, Germany (2007)
11. Oduguwa, V., Roy, R.: Bi-level optimisation using genetic algorithm. In: Proceedings of the 2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS 2002), pp. 322–327 (2002)
12. Yin, Y.: Genetic algorithm based approach for bilevel programming models. Journal of Transportation Engineering 126(2), 115–120 (2000)
13. Zhang, G., Liu, J., Dillon, T.: Decentralized multi-objective bilevel decision making with fuzzy demands. Knowledge-Based Systems 20, 495–507 (2007)
Multiple Criteria Decision Making: Efficient Outcome Assessments with Evolutionary Optimization

Ignacy Kaliszewski and Janusz Miroforidis

Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warszawa, Poland
[email protected]
Abstract. We propose to derive assessments of outcomes to MCDM problems instead of just outcomes, and to carry out decision making processes with the former. In contrast to earlier works in this direction, which calculate assessments using subsets of the efficient set (shells), here we provide formulas for the calculation of assessments based on upper and lower approximations (upper and lower shells) of the efficient set, derived by evolutionary optimization. Hence, by replacing shells, which in general have to be derived by optimization, with pairs of upper and lower shells, exact optimization methods can be eliminated from MCDM.
1 Introduction
For a class of "complex" Multiple Criteria Decision Making (MCDM) decision problems, where because of scale, bulk of data, and/or intricate framing a formal model is required, efficient variants, and among them the most preferred variant (the decision), can be derived with the help of exact optimization methods. This in turn requires that the model be tied to an exact optimization package, which certainly precludes popular, lay and widespread use of MCDM methods. In a quest for simpler MCDM tools than those available at present, it was proposed in (Kaliszewski 2004, 2006) that the decision maker (DM), instead of evaluating exact outcomes (i.e. vectors of variant criteria values), would evaluate assessments of outcomes, provided with sufficient (and controlled) accuracy. Once the most preferred outcome assessment is derived, the closest (in a sense) variant is determined. However, for efficient outcome (i.e. the outcome of an efficient variant) assessment calculations a subset of efficient variants (a shell) has to be known. As a shell can be derived (by exact optimization methods) prior to starting the decision process, replacing outcomes by their assessments relieves MCDM from a direct dependence on exact optimization methods and packages. In (Miroforidis 2009) it has recently been proposed to replace shells by somewhat weaker constructs, namely lower shells and upper shells, and formulas for assessments of weakly efficient outcomes (i.e. outcomes of weakly efficient variants) have been derived. As lower and upper shells can be derived by evolutionary optimization, replacing shells by pairs of lower and upper shells leads to replacement of exact optimization
methods (required to derive shells) by their evolutionary (bona fide heuristic) counterparts. This, in consequence, completely eliminates from MCDM the need for exact optimization methods and packages. In this short note we present the basic concept of lower and upper shells, and we report on the derivation of formulas for assessments of properly efficient outcomes (i.e. outcomes of properly efficient variants), based on the concept of lower and upper shells. These formulas subsume as a special case the formulas derived in (Miroforidis 2009).
2 Definitions and Notation
Let x denote a (decision) variant, X a variant space, and X_0 a set of feasible variants, X_0 ⊆ X. Then the underlying model for MCDM is formulated as:
\[ \text{"max"} \ f(x), \quad x \in X_0 \subseteq X, \tag{1} \]
where f : X → R^k, f = (f_1, ..., f_k), f_i : X → R are objective (criteria) functions, i = 1, ..., k, k ≥ 2, and "max" denotes the operator of deriving all efficient variants in X_0. Below we denote y = f(x) (if x ∈ X_0, y is an outcome) and we refer to standard definitions of outcome and variant efficiency (weak, proper). By N we denote the set of efficient variants of X_0. We define on X the dominance relation ≻ in the usual way: x' ≻ x ⇔ f(x') ⪰ f(x), where ⪰ denotes f_i(x') ≥ f_i(x), i = 1, ..., k, with f_i(x') > f_i(x) for at least one i. If x' ≻ x, then we say that x is dominated by x' and x' is dominating x. The following definitions of lower and upper shells come from (Miroforidis 2009).
A lower shell is a finite nonempty set S_L ⊆ X_0, elements of which satisfy
\[ \forall_{x \in S_L} \ \neg\exists_{x' \in S_L} \ x' \succ x. \tag{2} \]
By condition (2) all elements of a lower shell S_L are efficient in S_L. For a given lower shell S_L we define the nadir point y^{nad}(S_L) by y_i^{nad}(S_L) = \min_{x \in S_L} f_i(x), i = 1, ..., k. An upper shell is a finite nonempty set S_U ⊆ X \ X_0, elements of which satisfy
\[ \forall_{x \in S_U} \ \neg\exists_{x' \in S_U} \ x' \succ x, \tag{3} \]
\[ \forall_{x \in S_U} \ \neg\exists_{x' \in N} \ x' \succ x, \tag{4} \]
\[ \forall_{x \in S_U} \ f_i(x) > y_i^{nad}(S_L), \quad i = 1, ..., k. \tag{5} \]
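A small Python sketch (ours, with maximization-style dominance as in model (1)) of how conditions (2)–(5) can be verified for candidate sets is given below; since N is generally unknown, condition (4) is checked here against a surrogate reference set, in the spirit of the remark in Section 4.

def dominates(fa, fb):
    """True if objective vector fa dominates fb (maximization, as in model (1))."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def is_lower_shell(SL, f):
    """Condition (2): no element of SL is dominated by another element of SL."""
    return all(not dominates(f(SL[j]), f(SL[i]))
               for i in range(len(SL)) for j in range(len(SL)) if i != j)

def nadir(SL, f, k):
    return [min(f(x)[i] for x in SL) for i in range(k)]

def is_upper_shell(SU, SL, reference, f, k):
    """Conditions (3)-(5); `reference` stands in for the unknown efficient set N."""
    y_nad = nadir(SL, f, k)
    cond3 = all(not dominates(f(SU[j]), f(SU[i]))
                for i in range(len(SU)) for j in range(len(SU)) if i != j)
    cond4 = all(not dominates(f(xp), f(x)) for x in SU for xp in reference)
    cond5 = all(all(f(x)[i] > y_nad[i] for i in range(k)) for x in SU)
    return cond3 and cond4 and cond5

# Toy check with f(x) = x itself on two-objective points.
f = lambda x: x
print(is_lower_shell([(1, 2), (2, 1)], f))  # True: mutually nondominated
print(is_lower_shell([(1, 2), (2, 3)], f))  # False: (2, 3) dominates (1, 2)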
Below we make use of a selected element of the outcome space R^k, denoted y*, defined as y*_i = ŷ_i + ε, i = 1, ..., k, where ε is any positive number and ŷ is the utopian element of R^k, calculated as
\[ \hat{y}_i = \max_{y \in f(X_0) \cup f(S_U)} y_i, \quad i = 1, ..., k, \]
and we assume that all these maxima exist. We assume that either all efficient outcomes of problem (1) are ρ-properly efficient, i.e. they can be derived by solving the optimization problem
\[ \min_{y \in f(X_0)} \max_i \ \lambda_i \bigl( (y^*_i - y_i) + \rho\, e^k (y^* - y) \bigr), \tag{6} \]
where λ_i > 0, i = 1, ..., k, and ρ > 0, or only ρ-properly efficient outcomes are of the DM's interest (cf. e.g. Kaliszewski 2006). By condition (3) all elements of an upper shell S_U are efficient in S_U. We also assume that they are all ρ-properly efficient in S_U, i.e. they can be derived by solving the optimization problem
\[ \min_{y \in f(S_U)} \max_i \ \lambda_i \bigl( (y^*_i - y_i) + \rho\, e^k (y^* - y) \bigr), \tag{7} \]
where λ_i > 0, i = 1, ..., k, and ρ > 0 has the same value as for the ρ-properly efficient outcomes defined above.
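The scalarizing function minimized in problems (6) and (7) can be written out directly; the Python sketch below is ours, under the assumption that e^k denotes the all-ones vector, so that e^k(y* − y) is the sum of the componentwise differences.

def chebyshev_value(y, y_star, lam, rho):
    """max_i lambda_i * ((y*_i - y_i) + rho * sum_j (y*_j - y_j))."""
    aug = sum(ys - yi for ys, yi in zip(y_star, y))
    return max(l * ((ys - yi) + rho * aug) for l, ys, yi in zip(lam, y_star, y))

# For a finite sample of outcomes, solving (6) or (7) reduces to a simple loop:
def best_outcome(outcomes, y_star, lam, rho):
    return min(outcomes, key=lambda y: chebyshev_value(y, y_star, lam, rho))

print(best_outcome([(3, 1), (2, 2), (1, 3)], (4, 4), (0.5, 0.5), 0.01))  # -> (2, 2)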
3 Parametric Bounds on Outcomes
An outcome which is not derived explicitly but is only designated by selecting a vector λ for the purpose of solving the optimization problem (6) is called an implicit outcome. We use lower and upper shells of N (recall: the set of efficient variants of X_0) to calculate parametric bounds on implicit outcomes, with the weights λ as parameters. We are aiming at the following. Suppose a vector of weights λ is given. Let y(λ) denote an implicit properly efficient outcome of f(X_0), which would be derived if optimization problem (6) were solved with that λ. Let L(y(λ)) and U(y(λ)) be vectors of lower and upper bounds on the components of y(λ), respectively. These bounds form an assessment [y(λ)] of y(λ), [y(λ)] = {L(y(λ)), U(y(λ))}. To calculate such an assessment we make use of a pair of lower and upper shells. Because of the limited size of this note the formulas for bound calculation are not shown here; they can be found, together with proofs, in (Kaliszewski 2008). Once a pair of lower and upper shells is given, the computational cost of evaluating these formulas is negligible, for they consist of no more than simple arithmetic operations and calculating maxima or minima over finite sets of numbers.
4 Concluding Remarks and Directions for Further Research
The obvious advantage of replacing shells, which have to be derived by solving optimization problems, with their lower and upper counterparts S_L and S_U, which can be derived, as in (Miroforidis 2009), by evolutionary computations, would be the complete elimination of optimization from MCDM.
The open question is the quality (tightness) of assessments when S_L ⊂ N, S_U ⊂ N. This question imposes itself on the same question with respect to assessments derived with S_L = S_U ⊂ N, addressed in (Kaliszewski 2004, 2006). However, if S_L and S_U derived by evolutionary computations are "close" to N, there should be no significant deterioration in the quality of assessments. Indeed, preliminary experiments with some test problems reported in (Miroforidis 2009) confirm such expectations. To make condition (4) of the definition of upper shells operational one has to replace N by S_L, for obviously N is not known (for details cf. Miroforidis 2009), but with such a replacement the assessment formulas remain valid (though in principle they become weaker).
References
1. Kaliszewski, I.: Out of the mist – towards decision-maker-friendly multiple criteria decision making support. Eur. J. Oper. Res. 158(2), 293–307 (2004)
2. Kaliszewski, I.: Soft Computing for Complex Multiple Criteria Decision Making. Springer, Heidelberg (2006)
3. Kaliszewski, I.: Multiple Criteria Decision Making: Outcome Assessments with Lower and Upper Shells. Systems Research Institute Research Report, RB/9/2008 (2008)
4. Miroforidis, J.: Decision Making Aid for Operational Management of Department Stores with Multiple Criteria Optimization and Soft Computing. Ph.D. Thesis, Systems Research Institute, Warsaw (2009)
Automatic Detection of Subjective Sentences Based on Chinese Subjective Patterns

Ziqiong Zhang, Qiang Ye, Rob Law, and Yijun Li

School of Management, Harbin Institute of Technology, China
School of Hotel & Tourism Management, Hong Kong Polytechnic University, Hong Kong
Abstract. Subjectivity analysis requires lists of subjective terms and corpus resources. Little work to date has attempted to automatically recognize subjective sentences and create corpus resources for Chinese subjectivity analysis. In this paper, we present a bootstrapping process that can use subjective phrases to automatically create a training set from unannotated data, which is then fed to a subjective phrase extraction algorithm. The learned phrases are then used to identify more subjective sentences. The bootstrapping process can learn many subjective sentences and phrases. We show that the recall for subjective sentences is increased with a slight drop in reliability.
1 Introduction
Recent years have seen a sudden increase in Web documents that contain opinions, including consumer reviews of products, complaints about services, and so on. Automatic analysis of such subjective content has become quite an active research domain. Besides, subjectivity analysis can also benefit many other natural language processing applications. Some resources and tools for English subjectivity detection have been explored in previous work, including approaches that have automatically identified words and phrases that are statistically associated with subjective language [1, 2], and approaches that can automatically separate subjective from objective text [3-5]. But the existing resources for English text may not apply to Chinese texts directly because of the differences between the two languages. Recently, work on Chinese subjectivity analysis has been growing [6, 7], and one of the obstacles is a lack of labeled data, especially at the sentence level. To train a document-level classifier, one can easily find collections of subjective texts, such as editorials and reviews. It is much harder to obtain collections of individual sentences that can be easily identified as subjective or objective. Manually producing annotations is time consuming, and the amount of available annotated sentence data is relatively small. As subjective language contains a large variety of words and phrases, and many subjective terms occur infrequently, subjectivity learning systems must be trained on extremely large text collections before they will acquire a collection of subjective expressions that is broad and comprehensive in scope. Motivated by this, we are aiming
at automatically recognizing subjective sentences and building large corpus and lexical resources for Chinese. In this study, we explore the use of bootstrapping methods to allow classifiers to learn from a collection of unannotated texts. Based on a set of Chinese subjective patterns, we propose a learning algorithm to determine subjective phrases from a training set of labeled data. The learned phrases can be used to automatically identify more subjective sentences, which grow the training set. This process allows us to generate a large set of labeled sentences automatically.
2 Literature Review
Previous work has investigated the detection of English features, such as subjective adjectives, nouns and phrases, and has applied these features to distinguish between factual and subjective sentences. Wiebe [1] used a seed set of subjective adjectives and a thesaurus generation method to find more subjective adjectives. Grefenstette et al. [2] presented a Web mining method for identifying subjective adjectives. Riloff et al. [3] presented bootstrapping methods that learn extraction patterns for subjective expressions. Wilson et al. [4] and Kim et al. [8] further presented methods for classifying the strength of the opinion expressed in individual clauses (sentences). For Chinese subjectivity detection, Ye et al. [6] determined a set of Chinese patterns that are strongly associated with subjective language. Yao et al. [7] proposed several features that are suitable for the recognition of subjective text and applied multiple classification algorithms integrated in the Weka tool to perform subjective classification on documents. Zhang et al. [9] conducted subjectivity classification on Chinese documents (movie reviews and movie plots) based on supervised machine learning algorithms. The results show that the performance is comparable to that of existing English subjectivity classification studies. In these Chinese-related studies, the training sets were manually annotated and relatively small. The goal of our research is to use high-precision classifiers to automatically identify subjective and objective sentences in unannotated text corpora.
3 Subjectivity Detection with Chinese Subjective Patterns
An overview of the proposed method is shown in Figure 1. The process begins with only a handful of seed labeled sentences. Based on a set of Chinese subjective patterns, we developed two classifiers. One classifier searches the unlabeled corpus for sentences that can be labeled as subjective with high confidence, and the other classifier searches for sentences that can be labeled as objective with high confidence. All other sentences in the corpus are left unlabeled. The labeled sentences are then fed to a subjective phrase learner, which produces a set of subjective phrases that are statistically correlated with the subjective sentences. These phrases are used to identify more subjective sentences within the unlabeled documents. The subjective phrase learner can then retrain using the larger training set and the process repeats. The dashed line in Figure 1 represents the part of the process that is bootstrapped.
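The loop just described can be outlined schematically as follows. This Python skeleton is written by us purely for illustration: the high-confidence labeling rules, the thresholds and the simple bigram-based "phrase learner" are hypothetical stand-ins, not the authors' implementation.

def bootstrap(unlabeled, seed_subjective_phrases, rounds=3, threshold=2):
    """Schematic bootstrapping: label sentences with high confidence, learn new
    subjective phrases from them, and repeat. Sentences are lists of words here."""
    subjective_phrases = set(seed_subjective_phrases)
    labeled_subjective, labeled_objective = [], []
    for _ in range(rounds):
        still_unlabeled = []
        for sent in unlabeled:
            hits = sum(1 for i in range(len(sent) - 1)
                       if (sent[i], sent[i + 1]) in subjective_phrases)
            if hits >= threshold:                  # high-confidence subjective
                labeled_subjective.append(sent)
            elif hits == 0 and len(sent) <= 6:     # crude high-confidence objective rule (illustrative)
                labeled_objective.append(sent)
            else:
                still_unlabeled.append(sent)
        unlabeled = still_unlabeled
        # "Phrase learner": keep two-word phrases frequent in subjective sentences only.
        counts = {}
        for sent in labeled_subjective:
            for i in range(len(sent) - 1):
                counts[(sent[i], sent[i + 1])] = counts.get((sent[i], sent[i + 1]), 0) + 1
        objective_bigrams = {(s[i], s[i + 1]) for s in labeled_objective for i in range(len(s) - 1)}
        subjective_phrases |= {p for p, c in counts.items() if c >= 2 and p not in objective_bigrams}
    return labeled_subjective, labeled_objective, subjective_phrases

subj, obj, phrases = bootstrap(
    unlabeled=[["this", "movie", "is", "really", "great"], ["the", "film", "runs", "90", "minutes"]],
    seed_subjective_phrases={("really", "great"), ("is", "really")})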
Fig. 1. The bootstrapping process (components: unlabeled texts, Chinese subjective patterns, the subjective and objective sentence learners, labeled sentences, the subjective phrase learner, and the resulting subjective phrases)
In this section, we will describe the sentence classifiers, the subjective phrase learner, and the details of the bootstrapping process.
3.1 Chinese Subjective Patterns
We employed the subjective patterns proposed by Ye et al. [6] for Chinese subjective phrase extraction. These patterns are represented as two consecutive part-of-speech tags and have proved well suited for separating subjective and objective sentences. Table 1 shows the Chinese patterns used in our experiment for extracting two-word subjective phrases. The a tags indicate adjectives, the d tags adverbs, the n tags nouns, the q tags quantifiers, the r tags pronouns, the u tags auxiliaries, the v tags verbs, and the y tags Chinese modal particles.
Table 1. Chinese subjective patterns for extracting two-word phrases
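As an illustration of how such patterns can be applied, the following minimal sketch extracts candidate two-word phrases whose part-of-speech bigram matches a pattern. The pattern set used here (e.g., adverb + adjective) is only an assumed example, not the actual content of Table 1.

```python
# Sketch of two-word subjective phrase extraction from a POS-tagged sentence.
# PATTERNS is a hypothetical stand-in for the tag pairs listed in Table 1.
PATTERNS = {("d", "a"), ("v", "a"), ("a", "u"), ("a", "y")}   # assumed examples only

def extract_phrases(tagged_sentence, patterns=PATTERNS):
    """tagged_sentence: list of (word, pos_tag) pairs from a Chinese POS tagger."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:]):
        if (t1, t2) in patterns:
            phrases.append(w1 + w2)   # candidate two-word subjective phrase
    return phrases

# A sentence can then be scored by how many of its bigrams match the patterns,
# which is the basis of the high-confidence subjective/objective classifiers.
```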
a, X2 > b, X3 < c and the newly discovered knowledge is X1 > a, X2 > b, X3 < e, then the updating task should be performed: the limit value of X3 should be changed from c to e.
The modifying task of intelligent knowledge refers to a change in the structure of the existing knowledge. If the structure of the knowledge changes, the BI system should change the structure of the stored knowledge according to the newly discovered knowledge. For example, if the existing classification rule is X1 > a, X2 > b, X3 < c and the newly discovered knowledge is X1 > a, X2 > b, X3 > e, then the modifying task should be performed. In this example the structure of the knowledge has changed, and the existing knowledge can be modified by the new knowledge. Deleting knowledge from BI refers to dropping knowledge that is no longer useful. It is a tough task to decide which knowledge is no longer useful. The most easily understood case is that a newly generated model replaces a previous one: once the new model is generated, the old model becomes useless. For example, if the function y = α + β1x1 + β2x2 + β3x3 + β4x4 detects churn customers with high accuracy, then the former classification function y = α + γ1x1 + γ3x3 + γ4x4 + γ5x5 becomes useless. The useless classification function can be deleted from BI.
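The updating and modifying decisions can be illustrated with a small sketch. The rule representation (a mapping from variables to operator-threshold pairs) and the function name are illustrative assumptions rather than part of the BI system described here.

```python
# Sketch: decide whether newly discovered knowledge updates an existing rule
# (same structure, new thresholds) or modifies it (different structure).
def reconcile(existing, discovered):
    """Rules are dicts such as {"X1": (">", "a"), "X3": ("<", "c")}."""
    same_structure = (existing.keys() == discovered.keys() and
                      all(existing[v][0] == discovered[v][0] for v in existing))
    if same_structure:
        return "update", discovered      # e.g. the limit of X3 changes from c to e
    return "modify", discovered          # structure changed: replace the old rule

old_rule = {"X1": (">", "a"), "X2": (">", "b"), "X3": ("<", "c")}
new_rule = {"X1": (">", "a"), "X2": (">", "b"), "X3": ("<", "e")}
action, rule = reconcile(old_rule, new_rule)   # -> ("update", new_rule)
```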
3 Conclusions
We study the difference between traditional knowledge and intelligent knowledge, which is discovered from large amounts of data. Because intelligent knowledge is more structured than traditional knowledge in a business intelligence system, the management of intelligent knowledge can be classified into adding new knowledge, updating existing knowledge, modifying existing knowledge, and deleting useless knowledge.
Acknowledgments. This research has been partially supported by a grant from National Natural Science Foundation of China (#70621001, #70531040, #70501030, #70472074), Beijing Natural Science Foundation (#9073020).
Mining Knowledge from Multiple Criteria Linear Programming Models* Peng Zhang1, Xingquan Zhu2, Aihua Li3, Lingling Zhang1, and Yong Shi1,4 1 FEDS Research Center, Chinese Academy of Sciences, Beijing 100190 Dep. of Computer Sci. & Eng., Florida Atlantic University, Boca Raton, FL 33431 3 Dep. of Management Sci. & Eng., Central Univ. of Finance & Economics, Beijing 4 College of Inform. Science & Technology, Univ. of Nebraska at Omaha, Nebraska
[email protected],
[email protected], {aihua,yshi}@gucas.ac.cn 2
Abstract. As a promising data mining tool, Multiple Criteria Linear Programming (MCLP) has been widely used in business intelligence. However, a possible limitation of MCLP is that it generates unexplainable black-box models which can only tell us results without reasons. To overcome this shortcoming, in this paper we propose a knowledge mining strategy which mines black-box MCLP models to obtain explainable and understandable knowledge. Different from the traditional data mining strategy, which focuses on mining knowledge from data, this knowledge mining strategy provides a new vision of mining knowledge from black-box models, which can be taken as a special topic of “Intelligent Knowledge Management”. Keywords: Data Mining, Knowledge Mining, MCLP, Intelligent Knowledge Management.
1 Introduction
In 2001, Shi et al. [1] proposed a Multiple Criteria Linear Programming (MCLP) model for classification and reported a promising future for its application in the business world. Since then, a series of applications of MCLP in business intelligence, such as credit scoring, customer classification, and medical fraud detection, have been observed. Although many empirical studies have shown that MCLP is an effective model for classification, a major drawback holds it back from further applications: like many neural network methods, MCLP generates unexplainable black-box models. For example, in credit scoring, MCLP models can tell us whether a customer is a high-risk one, and those who have a low credit score will be denied any loan. However, to the customers who are rejected, MCLP models cannot tell why they are assigned low credit scores. Another example concerns customer relationship management: MCLP models can tell us which customers are going to give up the services, but cannot tell us why those customers are considering giving up the services and how to keep them. *
This research has been partially supported by a grant from National Natural Science Foundation of China (#70501030, #70621001, #90718042, #60674109), Beijing Natural Science Foundation (#9073020).
To address this shortcoming, in this paper we present a knowledge mining strategy that mines knowledge from black-box MCLP models. More specifically, we perform knowledge mining on MCLP models to extract useful and explainable knowledge. By using this knowledge mining strategy, we can open up the black box to extract usable, readable rules. These rules can be easily understood by humans. The basic idea behind knowledge mining on an MCLP model is the following: once we classify all the instances with an MCLP model, a clustering algorithm is used to determine the prototype vectors (the clustering centers) for each class. Then we draw hypercubes using each prototype as the central point. This procedure is executed iteratively until no new hypercube is added. Finally, the regions enclosed by the hypercubes can be translated into if-then rules. The rest of this paper is organized as follows: in the next section, we give a short introduction to the MCLP model; in the third section, we present a clustering-based knowledge mining algorithm to mine knowledge from MCLP; in the fourth section we give some experimental results on a synthetic dataset to evaluate the performance of our new method; in the last section, we conclude the paper with a discussion of future work.
2 Multiple Criteria Linear Programming (MCLP) Model
Considering a two-group classification problem, assume we have a training set A = {A1, A2, ..., An} with n instances, where each instance has r attributes. We define a boundary b to distinguish the first group G1 from the second group G2. Then we can establish the following linear inequalities:
Ai x < b, ∀Ai ∈ G1;   Ai x ≥ b, ∀Ai ∈ G2.   (1)
To formulate the criteria functions and complete the constraints for data separation, some further variables need to be introduced. We define the external measurement αi as the overlapping distance between the boundary and a training instance Ai: when a record Ai ∈ G1 is wrongly classified into group G2, or a record Ai ∈ G2 is wrongly classified into group G1, αi equals |Ai x − b|. We also define the internal measurement βi as the distance of a correctly classified record Ai from its adjusted boundary b*, where b* = b + αi or b* = b − αi; in this case βi equals |Ai x − b*|. To separate the two groups as far as possible, we design two objective functions that minimize the overlapping distances and maximize the distances between the classes. Let ||α||_p^p denote the aggregation of all overlapping distances αi, and ||β||_q^q the aggregation of all distances βi. The final classification accuracy depends on simultaneously minimizing ||α||_p^p and maximizing ||β||_q^q.
By choosing p = q = 1, we obtain the linear combination of these two objective functions as follows:

(MCLP)   Minimize   wα Σ_{i=1}^{n} αi − wβ Σ_{i=1}^{n} βi   (2)

Subject to:
Ai x − αi + βi − b = 0, ∀Ai ∈ G1,
Ai x + αi − βi − b = 0, ∀Ai ∈ G2,

where Ai is given, x and b are unrestricted, and αi, βi ≥ 0. This is the MCLP model; the procedure for training an MCLP model is shown in Algorithm 1.

Algorithm 1. Building the MCLP model
Input: the data set X = {x1, x2, ..., xn}, training percentage p
Output: MCLP model (w, b)
Begin
Step 1. Randomly select p·|X| instances as the training set TR; the remaining instances form the testing set TS;
Step 2. Choose appropriate parameters (wα, wβ, b);
Step 3. Apply the MCLP model (2) to compute the optimal solution W* = (w1, w2, ..., wn) as the direction of the classification boundary;
Step 4. Output y = sgn(wx − b) as the model.
End
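A minimal sketch of how formulation (2) could be solved as a single linear program with SciPy is given below. Fixing the boundary at b = 1 and imposing box bounds on the variables keep the LP bounded; these are practical assumptions added for the sketch, not part of the model above.

```python
# Sketch: solve the MCLP model as one LP.  Decision vector z = [x (r), alpha (n), beta (n)].
import numpy as np
from scipy.optimize import linprog

def train_mclp(A, y, w_alpha=1.0, w_beta=0.5, box=10.0):
    """A: (n, r) data matrix; y: group labels in {1, 2}."""
    n, r = A.shape
    b = 1.0                                       # fixed boundary (assumption)
    # objective: minimize w_alpha * sum(alpha) - w_beta * sum(beta)
    c = np.concatenate([np.zeros(r), w_alpha * np.ones(n), -w_beta * np.ones(n)])
    # constraints: A_i x - alpha_i + beta_i = b for G1, A_i x + alpha_i - beta_i = b for G2
    sign = np.where(y == 1, -1.0, 1.0)
    A_eq = np.hstack([A, np.diag(sign), np.diag(-sign)])
    b_eq = np.full(n, b)
    bounds = [(-box, box)] * r + [(0, box)] * 2 * n
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[:r], b                           # direction w and boundary b

# usage: w, b = train_mclp(A, y); labels = np.where(A @ w >= b, 2, 1)
```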
3 Algorithm of Knowledge Mining
In the last few years, many methods have been proposed to extract explainable rules from black-box models. These approaches can be categorized into two groups: decompositional methods and pedagogical methods [2]. A decompositional method is closely intertwined with the internal structure of the model. For example, in 2002 Nunez et al. [3] proposed clustering-based rule extraction for SVM models by creating rule-defining regions based on prototypes and support vectors; the extracted rules are represented as equation rules and interval rules. In 2005, Fung et al. [4] proposed non-overlapping rule extraction by constructing hypercubes with axis-parallel surfaces. A pedagogical rule extraction method, on the other hand, extracts rules directly by using other machine learning algorithms: for example, after building a black-box model, we can use a rule-extraction algorithm such as C4.5 to extract rules from the model. In this paper, we present a clustering-based knowledge mining method for mining knowledge (in the form of decision rules) from MCLP models. The procedure of knowledge extraction can be described as follows: first, an MCLP model is built and all the instances are classified into their own classes. Then, in each class, a clustering method (here we use k-means) is carried out to find the prototype instances (the centers
of the clusters). After that, we generate hypercubes with edges parallel to the axes and one vertex on the classification boundary. If not all the instances are covered by the hypercubes, a new prototype is generated from the uncovered instances by the clustering method and a new hypercube is drawn, until all the instances in the sample are covered by the generated hypercubes. Algorithm 2 describes this procedure in detail.

Algorithm 2. Mining Knowledge from MCLP Models
Input: the data set X = {x1, x2, ..., xn}, MCLP model m
Output: knowledge set {w}
Begin
Step 1. Classify all the instances in X using model m;
Step 2. Define the covered set C = Φ and the uncovered set U = X;
Step 3. While (U != Φ) Do
  Step 3.1 For each group Gi, calculate the clustering center Pi = Kmeans(Gi); end for
  Step 3.2 Calculate the distance d = Distance(m, Gi);
  Step 3.3 Draw a new hypercube H = DrawHC(d, Gi);
  Step 3.4 For all instances xi ∈ U, if xi is covered by H, then U = U \ xi, C = C ∪ xi; end if; end for
end While
Step 4. Translate each hypercube H into knowledge W;
Step 5. Return the knowledge set {w}
End
Fig. 1. An illustration of Algorithm 2. Based on the MCLP decision boundary (the straight line), Algorithm 2 uses hypercubes to cover the sample space in each group. The hypercubes can be easily translated into knowledge (here denoted as rules) that is explainable and understandable to humans.
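The following simplified sketch conveys the idea of Algorithm 2. Instead of placing one vertex of each hypercube exactly on the boundary, it inscribes an axis-parallel cube around each prototype so that the whole cube stays on that group's side of the linear boundary w·x = b; this shortcut, the stopping rule, and all names are illustrative assumptions.

```python
# Sketch: cover each predicted group with axis-parallel cubes and read them off as rules.
import numpy as np

def mine_rules(X, w, b, max_iter=50):
    pred = np.sign(X @ w - b)                     # model m: which side of the boundary
    rules = []
    for g in (-1.0, 1.0):                         # the two groups
        uncovered = X[pred == g]
        for _ in range(max_iter):
            if len(uncovered) == 0:
                break
            c = uncovered.mean(axis=0)            # prototype (centroid of uncovered points)
            dist = abs(c @ w - b) / np.linalg.norm(w)
            h = dist / np.sqrt(X.shape[1])        # half-edge: cube corners stay in the half-space
            lo, hi = c - h, c + h
            rules.append((lo, hi, g))             # "if lo <= x <= hi then group g"
            inside = np.all((uncovered >= lo) & (uncovered <= hi), axis=1)
            if not inside.any():                  # cube too small to cover anything: stop
                break
            uncovered = uncovered[~inside]
    return rules
```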
4 Experiments
To investigate whether our new knowledge mining method works, we design a synthetic dataset for numerical testing. Algorithm 2 is implemented in C++ in a Unix/Linux environment.
Fig. 2. (a) The synthetic dataset; (b) experimental results. The straight line is the MCLP classification boundary generated by Algorithm 1, and the squares are the hypercubes generated by Algorithm 2. All the instances are covered by the squares. It is not difficult to translate the squares into explainable rules.
Synthetic Dataset: As shown in Figure 2(a), we generate a 2-dimensional, 2-class dataset containing 60 instances, 30 for each class. In each class, we use 50% of the instances to train the MCLP model; that is, 30 training instances in total are used. All the instances follow a Gaussian distribution x ~ N(μ, Σ), where μ is the mean vector and Σ is the covariance matrix. The first group is generated with mean vector μ1 = [1, 1] and covariance matrix Σ1 = [0.1, 0; 0, 0.1]; the second group is generated with mean vector μ2 = [2, 2] and covariance matrix Σ2 = Σ1. Here we only discuss the two-group classification problem; it is not difficult to extend the approach to the multiple-group case. We expect to extract knowledge from the MCLP model in the form: if (a ≤ x1 ≤ b, c ≤ x2 ≤ d) then Definition 1, else Definition 2.   (2)
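The synthetic dataset described above can be reproduced with a few lines of NumPy; the random seed is an arbitrary assumption.

```python
# Sketch: two Gaussian groups, 30 instances each, as specified in the text.
import numpy as np

rng = np.random.default_rng(0)
cov = 0.1 * np.eye(2)
G1 = rng.multivariate_normal([1.0, 1.0], cov, size=30)   # group 1
G2 = rng.multivariate_normal([2.0, 2.0], cov, size=30)   # group 2
X = np.vstack([G1, G2])
y = np.array([1] * 30 + [2] * 30)                        # group labels
```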
Experimental Results: As shown in Figure 2(b), the decision boundary of MCLP is denoted by the straight line. The algorithm generates nine rules: 4 rules for group 1 and 5 rules for group 2. The extracted rules can be translated into knowledge by means of (2) as follows:
K1: if 0.6 ≤ x1 ≤ 0.8 and 2 ≤ x2 ≤ 2.8 , then x ∈ G1 ; K2: if 1.1 ≤ x1 ≤ 1.3 and 1.8 ≤ x2 ≤ 2.1 , then x ∈ G1 ; K3: if 0.4 ≤ x1 ≤ 1.5 and −1 ≤ x2 ≤ 1.6 , then x ∈ G1 ;
K4: if 0.9 ≤ x1 ≤ 2.2 and −0.8 ≤ x2 ≤ 0 , then x ∈ G1 ; K5: if 1.2 ≤ x1 ≤ 1.6 and 2.2 ≤ x2 ≤ 3.2 , then x ∈ G2 ; K6: if 1.4 ≤ x1 ≤ 1.6 and 1.8 ≤ x2 ≤ 2.0 , then x ∈ G2 ; K7: if 1.7 ≤ x1 ≤ 2.8 and 1.0 ≤ x2 ≤ 4.0 , then x ∈ G2 ; K8: if 1.9 ≤ x1 ≤ 2.0 and 0.7 ≤ x2 ≤ 0.8 , then x ∈ G2 ; K9: if 2.1 ≤ x1 ≤ 2.4 and 0.1 ≤ x2 ≤ 0.5 , then x ∈ G2 ; where Ki (i = 1, ..., 9) denotes the ith rule.
5 Conclusions
Data mining is a multi-disciplinary area that refers to mining knowledge from large-scale data. In this paper, we step further from “Data Mining” to “Knowledge Mining” in order to mine knowledge from black-box MCLP models. MCLP has been widely used in the business world, for example in credit scoring, medical fraud detection, and customer relationship management. However, like many neural network models, MCLP is a black-box model that can only tell us results without reasons, and this inherent drawback holds MCLP back from further applications. In this paper, we present a second-stage mining strategy to extract explainable and understandable knowledge from MCLP models. This strategy aims at extracting hypercubes which can be translated into rules that are understandable by humans. Experimental results on a synthetic dataset show its effectiveness. In the future, we will extend this approach to several other models based on multiple criteria programming, such as the MCQP and RMCLP models.
References
1. Shi, Y., Wise, W., Lou, M., et al.: Multiple Criteria Decision Making in Credit Card Portfolio Management. In: Multiple Criteria Decision Making in the New Millennium, pp. 427–436 (2001)
2. Martens, D., Baesens, B., Van Gestel, T., Vanthienen, J.: Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research 183(3), 1466–1476 (2007)
3. Nunez, H., Angulo, C., Catala, A.: Rule based learning systems from SVMs. In: European Symposium on Artificial Neural Networks Proceedings, pp. 107–112 (2002)
4. Fung, G., Sandilya, S., Bharat Rao, R.: Rule extraction from linear support vector machines. In: Proceedings of KDD 2005, pp. 32–40 (2005)
Research on Domain-Driven Actionable Knowledge Discovery Zhengxiang Zhu1, Jifa Gu2, Lingling Zhang3, Wuqi Song1, and Rui Gao4 1
Institute of Systems Engineering, Dalian University of Technology, Dalian, China 2 Institute of Systems Science, Chinese Academy of Sciences, Beijing, China 3 School of Management, Graduate University of Chinese Academy of Sciences, Beijing, China 4 Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, China
[email protected]
Abstract. Traditional data mining is a data-driven trial-and-error process that stops once a general pattern has been discovered. However, in many cases the knowledge mined by this process cannot meet real-world business needs. In real-world business, knowledge must be actionable, that is, one must be able to act on it to profit. Actionable knowledge discovery is a complex task because it strongly depends on domain knowledge, such as background knowledge, expert experience, user interest, environment context and business logic, and even on law, regulation, habit, culture, etc. The main challenge is moving from data-driven to domain-driven data mining (DDDM), whose goal is to discover actionable knowledge rather than general patterns. The main ideas of DDDM, a new-generation data mining approach, are introduced. Two types of process models show the difference between loosely coupled and tightly coupled frameworks. The main characteristics, such as constraint-based context, human-machine cooperation, closed-loop iterative refinement and meta-synthesis-based process management, are also proposed, and the system architecture and a paradigm are introduced.
1 Introduction
Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from a huge amount of data stored in databases [1]. Unfortunately, much of the research in the area of KDD has focused on the development of more efficient and effective data mining algorithms. These new techniques can yield important information about patterns hidden in databases [2], but they stop at pattern discovery. Traditional data mining is a typically data-centric, trial-and-error process. As a result, a discovery system can generate plenty of patterns, most of which are of no interest to the user and cannot meet real-world business requirements. In the real world, knowledge must be actionable, that is, the user can act on it to bring direct benefits (increase in profits, reduction in cost, improvement in efficiency, etc.) to the organization's advantage [3], or it can be used in the decision-making process of a business activity. The focus of next-generation knowledge discovery will move from general
patterns to actionable knowledge. The main challenge is overcoming the gap between academia and business. In the business community, actionability strongly depends on domain knowledge, such as background knowledge, expert experience/interestingness, business logic and environment context, and even law, regulation, habit, etc. An approach that involves domain knowledge in an in-depth data mining process for actionable knowledge is named domain-driven data mining (DDDM) [4,5]. How to involve domain knowledge in DDDM is the essential issue. Researchers have done much meaningful work in the past several years. Most of the proposed methods can be classified into three types: post-processing [6-9], constraint-based [4,10], and in-depth mining [5,11]. A brief introduction to each method follows:
(1) Post-processing: a great deal of manual expert effort is required to re-mine actionable knowledge after a large number of patterns have been mined.
(2) Constraint-based: domain knowledge is involved in the system in the form of constraints, such as knowledge constraints, data constraints, rule constraints and interestingness constraints.
(3) In-depth mining: the data mining process is viewed as a human-machine cooperative, closed-loop, iteratively refined mining process that continues until satisfactory and actionable knowledge is obtained.
Most former work focused on only part of DDDM; in this paper, a high-level and systematic view is introduced, and some preliminary work is mentioned. The remaining sections of this paper are organized as follows. Section 2 defines and explains several basic concepts. Section 3 proposes the two main process models of DDDM for mining actionable knowledge. Section 4 presents the main framework of DDDM, and a paradigm is introduced in Section 5. Finally, Section 6 concludes the work.
2 Several Basic Concepts of DDDM
In order to better understand DDDM, a few basic concepts are introduced first.
Definition 1. Interesting rule: a mined rule that is unexpected/desired by the expert and useful or meaningful; we then call it interesting. It has been recognized that a knowledge discovery system can generate a glut of patterns/rules, most of which are of no interest to the expert [11]. As a result, the focus of data mining has moved from discovering rules to discovering interesting rules. However, it is a complex task to measure interestingness, because a rule may be interesting to some users but not to others; that is, interestingness is strongly dependent on the application domain, expert knowledge and experience. In business, an actionable rule is often more important than an interesting rule, because it can be used in a decision-making process, and one actionable rule can increase profit or decrease cost. Mining actionable knowledge is the essential objective.
Definition 2. Actionable rule: a mined rule that suggests concrete and profitable actions to the decision maker; that is, the user can do something that brings direct benefits to the organization's advantage.
Actionability is an important measure of interestingness, because experts are mostly interested in knowledge that permits them to do their jobs better by taking specific actions in response to the newly discovered knowledge [10]. It follows that an actionable rule must be an interesting one, because only when a rule is interesting will the expert focus on it and judge whether it is actionable or not. On the other hand, an interesting rule is not necessarily actionable. As mentioned previously, interestingness/actionability is strongly dependent on the application domain; that is, a rule may be very interesting and actionable in one domain but not in another. Moreover, basing rules on the application domain increases their actionability in the specific environment context, which also improves their reusability. Actionable rule algorithms examine the data in an objective way and represent the discovered information in short and clear statements. The discovered rules can serve as choices to help a decision maker produce better decisions. The rules presented to a decision maker should consist only of simple, understandable and complete strategies that allow a reasonably easy identification of preferable rules [12].
Definition 3. Domain knowledge: knowledge that is valid and directly used for a pre-selected domain of human endeavor or an autonomous computer activity. Specialists and experts use and develop their own domain knowledge. When the concepts domain knowledge or domain expert are used, we emphasize a specific domain which is the object of the discourse/interest/problem.
3 Two Main Process Models
Two process models for mining actionable knowledge are usually implicitly adopted by existing research methods [3]: (1) the loosely coupled process model, and (2) the tightly coupled process model. In the data mining process, discovering patterns from columns of data is usually not a goal in itself, but rather a means to an end: the actual goal is to determine concrete and profitable actions for the decision maker. Usually, a data mining algorithm is executed first and then, on the basis of its results, the profitable actions are determined. Hence, in the loosely coupled framework, extraction of actionable knowledge is preceded by some particular data mining task, i.e., they are two loosely coupled processes, as shown in Fig. 1.
Fig. 1. The procedure for mining actionable knowledge in the loosely coupled framework
In the tightly coupled process, the decision-making task is seamlessly integrated into the data mining task, which leads to the formulation of a new data mining or optimization problem, as shown in Fig. 2. In contrast to the loosely coupled process, this determines the optimal mined patterns and the optimal actions using the same criterion. Hence, the tightly coupled process is better than the loosely coupled process at finding actionable knowledge that maximizes profits. However, it suffers from the following disadvantages: (1) it is strongly dependent on the application domain; (2) the newly formulated problem is usually very complex; (3) defining and solving the new data mining problem is also a non-trivial task.
Fig. 2. The procedure for mining actionable knowledge in the tightly coupled framework
4 DDDM Framework
In contrast to traditional data mining, which focuses on data and algorithms, aims to find general rules and stops once rules are discovered, DDDM is a complex, human-machine-cooperative, feedback-based, closed-loop iterative refinement process that aims to discover interesting or actionable rules according to real-world needs. The main characteristics of DDDM include:
Domain knowledge involvement: In order to involve domain knowledge in the mining process, it is essential to encode the knowledge. Generally there are many formats for encoding knowledge; the essential ones are symbolic formats such as formulas, equations, rules and theorems, which are easy for people to understand and use.
Constraint-based context: There are four types of constraints: domain knowledge constraints, data constraints, interestingness constraints and actionability constraints.
Human-machine cooperation [5]: Expert knowledge plays a significant role in the whole data mining process, including business and data understanding, feature selection, hypothesis proposal, model selection and learning, and the evaluation and refinement of algorithms and resulting outcomes; at each step the expert interacts closely with the machine.
Feedback-based closed-loop iterative refinement: Practice has shown that the process is virtually a loop which encloses iterative refinement and feedback on hypotheses, features, models, evaluation and contradictions, which means that rule discovery is not the end but a sub-process. The feedback from the real world plays an important and meaningful role in the mining process: from the feedback results,
the expert can re-evaluate the domain knowledge, the constraint context and the tasks, and then begin a new mining process until the results are satisfactory.
Meta-synthesis-based process management: Traditional process management methodologies such as CRISP-DM [13] cannot handle such a complex process, so a new systems approach must be adopted. A systemic methodology called meta-synthesis is suited to DDDM [14]. The meta-synthesis system approach (MSA) is a system approach for solving complex system problems proposed by Qian and his colleagues around the late 1980s [15]. The meta-synthesis approach aims to use data, information, models, knowledge, experience and wisdom and to integrate all of them; MSA also aims to use the capabilities of computers and other advanced information technologies [16, 17]. Although data mining is a complex process, we can briefly divide it into three phases: pre-DM, DM and post-DM [18]. The pre-DM phase performs data preparation tasks, such as locating and accessing the relevant data set(s), transforming the data format, cleaning the data if there are noise and missing values, and reducing the data to a reasonable and sufficient size with only relevant attributes. The DM phase performs mining tasks including classification, prediction, clustering and association. The post-DM phase involves evaluation of the mining results based on corresponding measurement metrics. DM is an iterative process in that some parameters can be adjusted and the whole process restarted to produce a better result. The post-DM phase is composed of a knowledge evaluator, a knowledge reducer and a knowledge integrator; these three components perform the major functionality aimed at feasible knowledge deployment, which is important for applications. As mentioned previously, in order to obtain actionable knowledge, domain knowledge can be involved in the data mining procedure, especially in the post-DM phase, where in-depth mining according to real-world actionable rules is needed; in the next section, a paradigm is proposed.
5 Case Studies
Traditional Chinese Medicine (TCM) has played an important role in the healthcare of Chinese people for several thousand years and provides unique contributions to the development of life science and medicine. In September 2007, the Ministry of Science and Technology of China started a project on methods for mining the academic thoughts and diagnostic experiences of famous senior TCM doctors, as one project in the “Eleventh Five-Year Plan” of nationally supported science and technology. This project has collected the academic thoughts and experiences of 100 Chinese masters of TCM, based on information technology and a database of medical cases. There is a lot of useful knowledge about TCM in the data. Two types of clustering knowledge are extracted from the data through data mining: one is the relationship between diagnoses, the other is the relationship between doctors. This knowledge may be interesting, but it is not actionable, because in TCM how to diagnose on the basis of similar symptoms depends on the doctor's individual experience; that is to say, it is important to obtain the knowledge linking diagnosis knowledge and doctors, since only in this way can we find
the doctor through the diagnosis knowledge, as well as find the diagnosis knowledge through the doctor. So it is necessary to mine in depth. In order to fulfill this task, correspondence analysis is applied to analyze the relationship between diagnosis knowledge and doctors in depth. Correspondence analysis is a descriptive technique designed to analyze simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. Correspondence analysis provides a way out: reducing the dimensionality of the table via factors helps to concentrate on the most important variables. We build a two-way table consisting of diagnosis knowledge and experts, and the result is shown in Table 1.
Table 1. The original data matrix of diagnosis knowledge and doctors
5.1 Matrix of Profiles on Row and Column
In a typical correspondence analysis, a cross-tabulation table of frequencies is first standardized, so that the relative frequencies across all cells sum to 1.0. The standardized matrix is Z = (z_ij)_{n×m}, where
z_ij = (x_ij − x_i. x_.j / T) / sqrt(x_i. x_.j),   (i = 1, 2, …, n; j = 1, 2, …, m),
with row totals x_i. = Σ_{j=1}^{m} x_ij, column totals x_.j = Σ_{i=1}^{n} x_ij, and grand total T = Σ_{i=1}^{n} Σ_{j=1}^{m} x_ij.
5.2 R Factorial Axes
Calculate the covariance matrix R = Z'Z, compute its eigenvalues, take the first two singular values, and calculate the corresponding eigenvectors U1 and U2.
5.3 Q Factorial Axes
Calculate the correspondence matrix B = ZZ'. Its eigenvectors follow from the eigenvectors of R as V1 = ZU1, V2 = ZU2, …, Vp = ZUp.
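The three steps above can be carried out with a single SVD of the standardized matrix Z, since the eigenvectors of R = Z'Z and B = ZZ' are the right and left singular vectors of Z. The sketch below assumes a generic frequency table (e.g., doctors by diagnosis categories) as input; variable names are illustrative.

```python
# Sketch: correspondence-analysis coordinates on the first k factorial axes via SVD.
import numpy as np

def correspondence_coords(counts, k=2):
    X = np.asarray(counts, dtype=float)
    T = X.sum()
    r = X.sum(axis=1, keepdims=True)              # row totals x_i.
    c = X.sum(axis=0, keepdims=True)              # column totals x_.j
    Z = (X - r @ c / T) / np.sqrt(r @ c)          # standardized matrix of step 5.1
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    row_coords = U[:, :k] * s[:k]                 # coordinates of the rows
    col_coords = Vt[:k].T * s[:k]                 # coordinates of the columns
    return row_coords, col_coords                 # plot both for the joint map of 5.4
```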
5.4 The Factorial Maps
It is customary to summarize the row and column coordinates in a single plot. The joint display of row and column points shows the relation between a point from one set and all points of the other set, not between individual points across the two sets. As indicated in Fig. 3, there are four categories relating diagnoses and experts.
Fig. 3. Relationship between diagnoses and experts
Based on this multi-way clustering, we obtain the knowledge of who owns which types of diagnosis knowledge, which enables the user to find diagnosis knowledge through doctor(s), as well as to find doctor(s) through diagnosis knowledge.
6 Conclusions
Interesting and actionable knowledge discovery is significant and also very challenging. It is a prominent research topic for next-generation data mining. Research on this issue may change the existing situation in which a great number of rules are mined while few of them are interesting or actionable for real-world business. The method is to move from data-driven data mining to domain-driven data mining. This paper gave a systematic overview of the issues in discovering interesting and actionable knowledge. Four types of constraints were discussed, showing how to involve domain knowledge, expert experience, business logic and environment context in the mining system. A few methodologies were also proposed, such as human-machine cooperation, feedback-based closed-loop iterative refinement and meta-synthesis-based process management. Finally, the framework of DDDM was developed; it includes six phases: (I) problem understanding and definition, (II) constraint analysis, (III) data preprocessing, (IV) modeling, (V) finding interesting and actionable rules, and (VI) feedback from the real world. Research on DDDM is still at an early stage, and many issues need to be addressed in the future.
Acknowledgement This paper is supported by Ministry of Science and Technology of China, State Administration of Traditional Chinese Medicine (#2007BAI10B06), grants from National Natural Science Foundation of China (#70871111) and Innovation Group 70621001.
References
1. Frawley, W., Piatetsky-Shapiro, G., Matheus, C.: Knowledge Discovery in Databases: An Overview. AI Magazine 13, 213–228 (1992)
2. Terrance, G.: From Data To Actionable Knowledge: Applying Data Mining to the Problem of Intrusion Detection. In: The 2000 International Conference on Artificial Intelligence (2000)
3. He, Z., Xu, X., Deng, S.: Data mining for actionable knowledge: A survey. Technical report, Harbin Institute of Technology, China (2005)
4. Cao, L.B., Zhang, C.Q.: Domain-driven actionable knowledge discovery in the real world. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 821–830. Springer, Heidelberg (2006)
5. Cao, L., et al.: Domain-driven in-depth pattern discovery: a practical methodology. In: Proceedings of AusDM, pp. 101–114 (2005)
6. Lin, T.Y., Yao, Y.Y., Louie, E.: Mining value added association rules. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS, vol. 2336, pp. 328–334. Springer, Heidelberg (2002)
7. Ras, Z.W., Wieczorkowska, A.: Action-rules: How to increase profit of a company. In: Proc. of ICDM 2002 (2002)
8. Yang, Q., Yin, J., Lin, C.X., Chen, T.: Postprocessing decision trees to extract actionable knowledge. In: Proc. of ICDM 2003 (2003)
9. Yang, Q., Yin, J., Ling, C., Pan, R.: Extracting Actionable Knowledge from Decision Trees. IEEE Transactions on Knowledge and Data Engineering 19(1), 43–56 (2007)
10. Han, J.W., Lakshmanan, L.V.S., Ng, R.T.: Constraint-Based, Multidimensional Data Mining. Computer 32(8), 46–50 (1999)
11. Kovalerchuk, B.: Advanced data mining, link discovery and visual correlation for data and image analysis. In: International Conference on Intelligence Analysis (IA 2005), McLean, VA, May 2 (2005)
12. Tsay, L.-S., Raś, Z.W.: Discovering the concise set of actionable patterns. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds.) Foundations of Intelligent Systems. LNCS, vol. 4994, pp. 169–178. Springer, Heidelberg (2008)
13. http://www.crisp-dm.org
14. Zhu, Z.X., Gu, J.F.: Research on Domain Driven Depth Knowledge Discovery Based on Meta-synthesis. In: The 15th Annual Conference of the Systems Engineering Society of China, pp. 121–127 (2008)
15. Tang, X.J., Nie, K., Liu, Y.J.: Meta-synthesis approach to exploring constructing comprehensive transportation system in China. Journal of Systems Science and Systems Engineering 14(4), 476–494 (2005)
16. Qian, X.S., Yuan, J.Y., Dai, R.W.: A new discipline of science – the study of open complex giant systems and its methodology. Journal of Systems Engineering and Electronics 4(2), 2–12 (1993)
17. Gu, J.F., Wang, H.C., Tang, X.J.: Meta-Synthesis Method System and Systematology Research. Science Press (2006) (in Chinese)
18. Kerdprasop, N., Kerdprasop, K.: Moving Data Mining Tools toward a Business Intelligence System. In: Proceedings of World Academy of Science, Engineering and Technology, vol. 21 (January 2007)
Data Mining Integrated with Domain Knowledge Anqiang Huang1,2, Lingling Zhang1,2,*, Zhengxiang Zhu3, and Yong Shi2 1
School of Management, Graduate University of Chinese Academy of Sciences, Beijing 100190, China 2 Research Center on Fictitious Economy and Data Science, CAS, Beijing 100190, China 3 Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
[email protected]
Abstract. Traditional data mining is a data-driven trial-and-error process [1] which aims at discovering patterns/rules. People either view data mining as an autonomous process, or analyze the issues only in an isolated, case-by-case manner. Because it overlooks valuable information such as existing knowledge, expert experience, context and real constraints, the results cannot be applied directly to support decisions in business. This paper proposes a new methodology called Data Mining Integrated With Domain Knowledge, aiming to discover more interesting and more actionable knowledge. Keywords: Domain knowledge, data mining, interestingness, actionable, ontology.
1 Introduction
We are now living in a data-boosting era, but there seems to be a tremendous gap between data and useful knowledge, and over the past 10 years data mining has been considered an effective method for bridging that gap. However, the discovered knowledge cannot meet the needs of the real world and has two big problems. The first problem is rule overload, which can be divided into two kinds. One is Rule Overload In Depth (ROID); for example, data mining may generate such a long relation rule that people must make many judgments in order to use it. Rule overload in depth can depress rule actionability. The other is Rule Overload In Quantity (ROIQ), which means that algorithms produce so many rules that people are too confused to choose the suitable ones. Both ROID and ROIQ prevent traditional data mining from generating satisfactory results. The second problem is that the results diverge considerably from the facts, so decision makers do not believe that the rules meet real-world business requirements [1], and consequently users' interest is low. To deal with the problems mentioned above, data mining integrated with domain knowledge imports helpful outside information (domain knowledge) into the data mining process to form a new human-machine-cooperative, closed-loop iterative refinement process, and improves the data mining results.
Corresponding author.
2 Acquisition and Representation of Domain Knowledge in Data Mining
The definition of domain knowledge that most closely concerns data mining is “additional knowledge … often used to guide and constrain the search for interesting knowledge. We refer to this form of information as domain knowledge and background knowledge” [2]. Domain knowledge is involved in the whole knowledge discovery process, and its content and representation depend on the domain expert, the user, the business logic, the environment context, etc. [1].
2.1 Acquisition of Domain Knowledge
Domain knowledge must be acquired and coded into a form that computers can understand before it is utilized in the data mining process. According to the degree of automation of knowledge acquisition, there are three kinds of methods: manual, semi-automatic and fully automatic. The third kind requires knowledge discovery tools, including ontology learning, knowledge acquisition based on ontology, and the Semantic Web.
2.2 Representation of Domain Knowledge
Whether acquiring domain knowledge automatically or utilizing it in data mining, we face the problem of representing domain knowledge in a form that computers understand. Concept hierarchies and ontologies are two methods often employed by researchers. The concept hierarchy method is simple but highly functional, allowing knowledge to be discovered on multiple stratified concept layers. A concept hierarchy defines a mapping from bottom-level concepts to top-level concepts. When some property has a large number of values that are too detailed for discovering interesting knowledge, people can use the concept hierarchy to replace low-level concepts by high-level concepts and then mine the data on the higher concept layer (a minimal sketch of this replacement is given at the end of this section). Besides concept hierarchies, ontology is now a very popular method for representing domain knowledge. An ontology includes the concepts and the relationships between them in some domain. Given that domain experts are not always available, an ontology can provide an alternative knowledge source. Domain knowledge can be applied to every step of data mining. If we divide the whole data mining process into three phases (data preparation, data mining and evaluation), we should correspondingly focus on three jobs: data preparation integrated with domain knowledge, data mining integrated with domain knowledge, and evaluation integrated with domain knowledge. According to this idea, we advance the following architecture of DMIWDK.
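As announced above, here is a minimal sketch of concept-hierarchy generalization; the city-to-region hierarchy and the attribute names are purely illustrative assumptions.

```python
# Sketch: replace low-level values of one attribute by their high-level concept
# before mining, so that patterns can be discovered on the higher concept layer.
city_to_region = {"Chengdu": "Southwest China", "Beijing": "North China",
                  "Dalian": "Northeast China"}                 # hypothetical hierarchy

def generalize(records, attribute, hierarchy):
    return [{**rec, attribute: hierarchy.get(rec[attribute], rec[attribute])}
            for rec in records]

records = [{"customer": 1, "city": "Chengdu"}, {"customer": 2, "city": "Dalian"}]
records = generalize(records, "city", city_to_region)          # mine on the region level
```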
3 Architecture of DMIWDK
The main architecture of DMIWDK is shown in Figure 1. DMIWDK is a process which is integrated with domain knowledge, expert experience, and real-world
constraints and aims to generate actionable knowledge. It typically involves the following phases:
Step 1: understanding the data mining task. Before data mining, the experts and the user must consider the answers to the following questions: who will use the results, why, what will be used, where and how (W4H) [1]. They also have to understand the constraints they face and the existing domain knowledge.
Step 2: preprocessing the data. Based on expert experience, domain knowledge and real-world constraints, this step identifies the data characteristics, removes non-interesting data, creates new derived attributes, etc. If the data are not sufficient to generate useful regularities, domain experts can create “artificial data” and add it to the training data [1]. An efficient mechanism for constructing such “artificial data”, which strongly depends on domain expert knowledge and experience, is one of the new approaches to data preprocessing [3].
Step 3: selecting the algorithm and tuning parameters. On the basis of the understanding of the task and the data characteristics, some method must be employed to select a proper algorithm and tune the parameter values, which may strongly influence the final results. Considering the complexity of this work, it may involve human-machine cooperation and a multi-agent system which chooses the algorithm automatically; what the expert needs to do is to decide repeatedly, according to the feedback, whether to accept the parameter values, until he is satisfied.
Step 4: finding interesting and actionable knowledge. An algorithm may generate so much knowledge that users are simply unable to judge which parts are really useful and profitable. In order to fulfill the final aim of data mining, which is to discover interesting and actionable knowledge to support decisions, it is necessary to evaluate the results produced by data mining. It should be emphasized that the evaluation methods applied should not be limited to objective, technical ones; subjective, business-oriented ones should also be included.
Fig. 1. Framework of DMIWDK (domain knowledge feeds each phase: understand the mining task, preprocess the data, select the algorithm and tune parameters, find interesting and actionable knowledge, and apply the knowledge in the real world, with a check of whether the results fit the real world)
Step 5: applying and enriching domain knowledge. All the final knowledge obtained from data mining must be applied to and validated in the real world. The knowledge that survives the real-world tests is added to the existing domain knowledge and can be used in the next mining process. When the algorithm does not generate satisfactory results, this five-step procedure can be looped until ideal results come out.
4 Conclusion
Data mining now faces an embarrassing situation: while lots of rules are mined, few of them are interesting and actionable in the real world. It is significant and challenging to change this situation. One method is to integrate existing domain knowledge, expert experience and real-world constraints with the data mining process. This paper first discussed the concept, acquisition and representation of domain knowledge, then broke the whole data mining process into five steps and gave a framework for finding interesting and actionable knowledge.
Acknowledgement This research has been partially supported by a grant from National Natural Science Foundation of China (#70501030, #70621001, 90718042), Beijing Natural Science Foundation (#9073020).
References
1. Zhu, Z.X., Gu, J.F.: Toward Domain-Driven Data Mining. In: DDDM 2007 (2007)
2. Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge Discovery in Databases: An Overview. In: Piatetsky-Shapiro, G., Frawley, W.J. (eds.) Knowledge Discovery in Databases. AAAI Press/The MIT Press, California (1999)
3. Silberschatz, A., Tuzhilin, A.: What Makes Patterns Interesting in Knowledge Discovery Systems. IEEE Trans. Knowledge and Data Engineering, 970–974 (1996)
A Simulation Model of Technological Adoption with an Intelligent Agent Tieju Ma1, Chunjie Chi1, Jun Chen1, and Yong Shi2 1
School of Business, East China University of Science and Technology Meilong Road 130, Shanghai 200237, China
[email protected] 2 Research Center on Fictitious Economy & Data Science, Chinese Academy of Science, No. 80 Zhongguancun East Road, Beijing 100080, China
Abstract. Operational optimization models of technology adoption commonly assume the existence of a social planner who knows the long-term future. Such a planner does not exist in reality. This paper introduces a simulation model in which an intelligent agent forms its expectations of the future by continuously learning from its previous experience and continuously adjusts its decisions on technology development. Simulations with the model show that, with the intelligent agent, an advanced but currently expensive technology will still be adopted, but at a much slower pace than in optimization models.
1 Introduction
Traditional optimization models of technological change commonly assume a global social planner who knows the long-term future trends in demand, resources, and new technologies' decreasing costs [1, 2]. But in reality, future demand, resources, and technologies are quite uncertain [3]. There is no global social planner who can know everything about the future; people need to revise their vision of the future based on the new knowledge they acquire. This paper introduces a stylized model in which the decision maker (an intelligent agent) does not know the demand, resources, and technologies exactly over the long term; instead, it continually gathers and analyzes what happened in the past, and based on this knowledge it forms its vision for a short-term future and makes decisions on the adoption of new technologies; that is to say, the agent keeps learning. With this stylized model, we explore whether the technology adoption pattern differs from that of a global social planner model. The research introduced in this paper is still in progress; in the future, we will explore how different knowledge generated by the intelligent agent influences technological adoption. The rest of the paper is organized as follows. Section 2 describes the simulation model. Section 3 introduces the initialization and analyzes the results of the model. Section 4 gives concluding remarks and future work.
2 The Simulation Model of Technology Adoption with an Intelligent Agent
Our model assumes that the economy in a techno-economic system demands one kind of homogeneous good (e.g., electricity). There are three technologies, namely an existing one (T1), an incremental one (T2), and a revolutionary one (T3), that can be used to produce the good from resources. The existing technology (e.g., coal power plants) has low efficiency and a low initial investment cost, the incremental technology (e.g., gas turbines) has higher efficiency with a higher initial investment cost, and the revolutionary technology (e.g., photovoltaic cells) has much higher efficiency with a much higher initial investment cost. The incremental and revolutionary technologies have learning potential, which means their initial investment costs can decrease in the future. With the above techno-economic system, the story of the traditional optimization model is: there is a global social planner who makes a long-term strategic (e.g., 100-year) plan for the system such that the discounted total cost of satisfying the given demand is minimized. Examples of such models can be found in [1-3]. The story of our simulation model is that the decision maker (the agent) is not a global social planner (which actually does not exist in reality). It does not know the long-term future demand, extraction cost, or the investment costs of the incremental and revolutionary technologies; what it can do is make a short-term plan based on its knowledge about the past and its expectations about the future. The agent is adaptive and keeps learning. It adjusts its decisions according to the patterns of resource depletion and demand dynamics created by its previous decisions. Each year, the agent calculates the average annual growth rate of the extraction cost over the last three years, and then uses this growth rate to forecast the extraction cost for the next year. The agent uses each technology's current cost to evaluate which technology is cheapest for the next year. The agent's expectation of demand is also based on the last three years' data: it calculates the average annual growth rate of demand over the last three years, and then uses this growth rate to forecast the demand for the next year. If the agent's expected demand for the following year is higher than the available capacity, it builds new capacity of the cheapest technology to fill the gap; otherwise it does not build any new capacity. The mathematical expression of the model follows. Let x_i^t (i = 1, 2, 3) denote the production of technology i at time t, and let η_i denote
the efficiency of technology i; then the extraction R t is the sum of resources consumed by each technology at time t, as shown in Equation (1) 3
Rt = ∑
1
i =1 η i
xit
(1)
Let y_i^t (i = 1, 2, 3) denote the new installed capacity of technology i at time t; then the total installed capacity of technology i at time t, denoted by C_i^t (i = 1, 2, 3), can be calculated according to Eq. (2):
C_i^t = Σ_{j=t−τ_i}^{t} y_i^j   (2)
where τ_i denotes the plant life of technology i. The investment cost of a technology (except the existing technology) decreases as its cumulative installed capacity increases, as given by Eq. (3):
c_Fi^t = c_Fi^0 (CC_i^t)^(−b_i)   (3)

where 1 − 2^(−b_i) is technology i's learning rate, i.e., the percentage reduction in future investment cost for every doubling of cumulative capacity; c_Fi^0 is the initial investment cost of technology i; and CC_i^t is the cumulative installed capacity of technology i by time t, so that

CC_i^t = Σ_{j=−∞}^{t} y_i^j = Σ_{j=1}^{t} y_i^j + CC_i^0   (4)
where CC_i^0 denotes the initial cumulative installed capacity of technology i, i.e., the cumulative experience with technology i before t = 1. The extraction cost of the resource is assumed to increase over time as a linear function of resource depletion, as shown in Equation (5):
c_E^t = c_E^0 + k_E CR^t   (5)

where c_E^0 denotes the initial extraction cost, k_E is a coefficient denoting the sensitivity of the extraction cost to cumulative extraction, and CR^t is the cumulative extraction by time t:

CR^t = Σ_{j=1}^{t} R^j   (6)
The demand increases over time with an exogenous annual growth rate and is also influenced by the price of satisfying it, which is determined by the weighted average cost of the technologies, as described in Eq. (7):
d^{t+1} = (1 + α) d^t · [(1 − e_p) p^{t+1} + (1 + e_p) p^t] / [(1 + e_p) p^{t+1} + (1 − e_p) p^t]   (7)

where t is the time period (year), α is the exogenous annual growth rate of demand, d^t and d^{t+1} denote the demand at times t and t+1, respectively, e_p is the price elasticity of demand, and p^t and p^{t+1} are the prices of the good at times t and t+1, which are determined by the weighted average cost of the technologies at the corresponding time step, as given in Eq. (8):
p^t = Σ_{i=1}^{3} w_i C̃_i^t   (8)
where w_i denotes the share of technology i, and C̃_i^t denotes the levelized cost of producing one unit of the good with technology i at time step t, which can be obtained according to Eq. (9):
(9)
where δ denotes the discount rate of the economy, and c_OMi denotes technology i's operation and management cost.
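To make the agent's yearly decision concrete, the sketch below combines the levelized-cost calculation of Eqs. (3) and (9) with the capacity-expansion rule described at the beginning of this section; the data structures, function names and default parameter values are illustrative assumptions.

```python
# Sketch: one decision step of the adaptive agent.
import numpy as np

def levelized_cost(cF0, CC, b, eta, c_OM, c_E, delta=0.05, tau=30):
    cF = cF0 * CC ** (-b)                                   # learning curve, Eq. (3)
    crf = delta * (1 + delta) ** tau / ((1 + delta) ** tau - 1)
    return cF * crf + c_OM + c_E / eta                      # Eq. (9)

def decide_new_capacity(demand_hist, capacity, tech_costs):
    """demand_hist: last four yearly demands; capacity / tech_costs: dicts keyed by technology."""
    growth = np.mean(np.diff(demand_hist[-4:]) / np.array(demand_hist[-4:-1]))
    expected = demand_hist[-1] * (1 + growth)               # next-year demand forecast
    gap = expected - sum(capacity.values())
    if gap <= 0:
        return None, 0.0                                    # enough capacity: build nothing
    cheapest = min(tech_costs, key=tech_costs.get)          # cheapest by current levelized cost
    return cheapest, gap                                    # build that much of it
```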
3 Initialization and Results of the Model
We assume that the exogenous annual growth rate of demand is 2.6% and the price elasticity is 0.5. The initial extraction cost is 200 US$/kW, and the extraction cost increases with cumulative extraction with a coefficient of 0.01. The existing technology is assumed to be entirely mature; its initial investment cost is 1000 US$/kW and its efficiency is 30%, and these do not change over time. The incremental technology's efficiency is 40%, its initial investment cost is 2000 US$/kW, and its learning rate is 10%. The efficiency of the revolutionary technology is 90%, its initial investment cost is 30,000 US$/kW, and its learning rate is 30%. The plant life of all three technologies is assumed to be 30 years. The initial total installed capacity and the initial cumulative installed capacity of the existing technology are assumed to be 100 kW and 1000 kW, respectively.1 For the incremental and revolutionary technologies, since they are new, there is no initial total installed capacity, but their initial cumulative installed capacities are assumed to be 1, which can be understood as human knowledge of them (e.g., in laboratories) before they have actually been used. The O&M costs of the three technologies are assumed to be 30, 50, and 50 US$/kW-year, respectively. Fig. 1 shows the result of the simulation, which starts in 1990. We can see that the incremental technology dominates the system from 2020 to 2120 and is then replaced by the revolutionary technology. Figure 2 shows the result of an optimization model (see [4]) with exactly the same initialization. Comparing Fig. 1 with Fig. 2, we can see that the simulation model results in slower adoption of the revolutionary technology. Although this research is still in progress and we need to explore the model's behavior further with sensitivity analysis, this first-cut result already suggests that, because no decision maker can know everything about the future, the adoption of new technologies may be slower than the optimal path.
In fact, it does not matter how large the initial cumulative installed capacity is, since the existing technology has no learning potential.
192
T. Ma et al.
T1 – existing technology, T2 – incremental technology, T3 – revolutionary technology Fig. 1. Result of the simulation model
T1 – existing technology, T2 – incremental technology, T3 – revolutionary technology Fig. 2. Result of an optimization model
4 Concluding Remarks The adoption patterns of the incremental and revolutionary technologies could be different if we make different assumptions regarding the agent’s decision behavior. For example, the planning period of the agent could be 5 years, instead of one year. The agents’ expectation models of future demand, future extraction cost of resource, and future investment costs could be different from the current ones introduced in this paper. In our future work, we will explore how the agent’s different learning and decision
A Simulation Model of Technological Adoption with an Intelligent Agent
193
behaviors influence the adoption of new technologies as well as do sensitivity analysis to the model. Furthermore, we will explore technological adoptions when there are multiple agents make decisions simultaneously and interact with each other. Acknowledgments. This research was sponsored ECUST Foundation, Shanghai Shuguang Project Foundation and National Natural Science Foundation of China (No. 70621001).
References [1] Messner, S.: Endogenised technological learning in an energy systems model. Journal of Evolutionary Economics 7, 291–313 (1997) [2] Kypreos, S., Barreto, L., Capros, P., Messner, S.: ERIS: A model prototype with endogenous technological change. International Journal of Global Energy 14, 374–397 (2000) [3] Gritsevskyi, A., Nakicenovic, N.: Modeling uncertainty of induced technological change. Energy Policy 28, 907–921 (2000) [4] Ma, T.: An agent-based model of endogenous technological change – an extension to the Grübler-Gritsvskyi model, report no. IR-06-044, International Institute for Applied Systems Analysis, Laxenburg, Austria (2006)
Research on Ratchet Effects in Enterprises’ Knowledge Sharing Based on Game Models Ying Wang, Lingling Zhang1, Xiuyu Zheng, and Yong Shi 1
Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, School of Management, Graduate University of the Chinese Academy of Sciences Beijing 100190, China
[email protected] Abstract. Knowledge sharing in enterprises is propitious to the employees’ knowledge richness and growth. The purpose of this paper is to apply game models to analyze knowledge sharing. First, we conclude that knowledge sharing is often trapped in “prisoner’s dilemma”. Then we find that “Ratchet Effects” exists and weaken the validity of incentive mechanisms. At last, we conclude that a more objective evaluation standard and long-term incentive contract would help to eliminate “Ratchet Effects”.
1 Introduction The abilities to create and apply knowledge are very important for organizations to gain sustainable competitive advantages [1]. Knowledge has the characteristics of public property, the high cost production, the use of non-exclusive, but low dissemination cost [2]. Knowledge develops in communication and value-added in use [3]. Knowledge sharing is propitious to knowledge richness and growth. However, knowledge sharing in enterprises always trapped by the following factors: differences in organizations, cultural distance, the conservative mentality, the implication of information technology and so on [4]. Many more researchers analyze knowledge sharing from the perspective of management and economy [5]. In this paper, we will discuss knowledge sharing with game models. The rest of this paper is organized as follows: section 2 analyze “prisoner’s dilemma” in knowledge sharing. Section 3 discusses “Ratchet Effects” and its main cause. Section 4 concludes the paper with some remarks and further research direction.
2 “Prisoner’s Dilemma” in Knowledge Sharing Assume k : Amount of knowledge shared by employees. w : The incentive costs that enterprise pay to employees under encouragement. c ( k ) : The costs of employees for knowledge-sharing. π ( k ) : The output of knowledge-sharing for enterprise. Enterprise can decide whether to take incentive methods or not. The game matrix is shown in table 1, and (not encourage, not sharing) is a Nash Equilibrium. Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 194–197, 2009. © Springer-Verlag Berlin Heidelberg 2009
Research on Ratchet Effects in Enterprises’ Knowledge Sharing
195
Table 1. The game matrix between employee and enterprise Enterprise
Employee
Encourage
Sharing
Not sharing
(π (k ) − w, w − c(k ))
(− w, w)
(π (k ), −c(k ))
(0, 0)
Not encourage
3 “Ratchet Effects” in Knowledge Sharing “Ratchet effects” initially comes from the study of a Soviet-type economy [6] [7]. In knowledge sharing, employees’ past performance contains useful information and can be used as a standard. The dilemma is that: the more knowledge employees share, the higher the coming standard will be. Consider a model with two stage ( t = 1,2). At every stage, the function of knowledge sharing is: kt = atθ + ut , t = 1, 2
(1)
k : Amount of knowledge shared by employees. It can be measured by knowledge management system. at : Employee’s will to share knowledge. It is private information.
θ : The actual level of employees’ knowledge. It obeys normal distribution. Eθ = θ , Dθ = σ θ . ut : Exogenous random variables. It obeys normal distribution. Eut = 0 , 2
Dut = σ u . u1 and u 2 are independent. 2
At the end of every stage, enterprises can observe kt and will adjust θ according to kt . Assume employees are risk-neutral and the discount rate for each stage is 0. Then the employees’ utility function is: U = w1 − c ( a1) + w2 − c ( a 2 )
(2)
w : The incentive costs. c ( at ) : The costs of employees. It is a strictly increasing convex function, c '( at ) 0, c ''( at ) 0. Enterprises decide w2 after the observation of
>
>
k 1 . They will subtract α t standing for the expectation of kt : wt = kt − α t , α 1 = E ( k 1) = a 1θ , α 2 = E ( k 2 / k 1) = E ( a 2θ / k 1) = a 2 E (θ / k 1) a 1 is the expectation of the employees’ will. Enterprises can forecast a 1 and θ by judging from other information, such as their education, experience and so on. k1 u1 Assume enterprises have rational expectations. Enterprises know = θ + , but a1 a1 they can not tell θ and u1 apart. They will infer θ by k 1 . Assume
τ =
var(θ ) var(θ ) + var(
σθ
2
u1 a1
= )
σθ + 2
1
a1
2 σu
, τ ∈ [0,1] . According to the rational expectations
196
Y. Wang et al.
formula, we can have: E (θ / k 1) = (1 − τ ) E (θ ) + τ cording to the new information. The bigger ment they will have. α t can be rewritten as:
σ θ2
k1
.Amendment will be made ac-
a1
is, the bigger τ is, the more amend⎡
⎤
⎢⎣
a 1 ⎥⎦
α 1 = E ( k 1) = a 1θ , α 2 = E ( k 2 / k 1) = a 2 E (θ / k 1) = a 2 ⎢ (1−τ ) E (θ )+τ k 1 ⎥ If there is only one stage, U = w1 − c ( a1) = k 1 − α 1 − c ( a1) = a1θ − a 1θ − c ( a1) If there are two stages: U = w1 − c ( a1) + w2 − c ( a 2) = k 1 − α 1 − c ( a1) + k 2 − α 2 − c ( a 2 )
⎡ ⎣
= a1θ − a 1θ − c ( a1) + a 2θ − a 2 ⎢(1 − τ ) E (θ ) + τ
⎡ ⎢⎣
= a1θ − a 1θ − c ( a1) + a 2θ − a 2 (1 − τ )θ + τ
k1 ⎤
a 1 ⎥⎦
− c ( a 2)
a1θ + ut ⎤ a1
⎥⎦ − c ( a 2)
The first-order conditions for optimization are: θ a2 c '( a1) = θ − a 2τ = (1 − τ )θ ≤ θ , c '( a 2) = θ a1 a1 At the first stage, α 1 is less than the choice in Pareto Optimal. Employees need to consider both direct and indirect results: when they increase α 1 by a unit,
⎡ ⎣
α 2 ( α 2 = a 2 ⎢(1 − τ ) E (θ ) + τ come is (1 − (
a2 a1
a2 a1
k1 ⎤
< θ ). The bigger τ
τ )θ (
a2
) will be increased by τ . The marginal net ina 1 ⎥⎦ a1 is , the bigger lose of incentive mechanisms
τθ ) will be.
4 Conclusion Knowledge sharing is beneficial to both employees and enterprise but trapped in prisoner’s dilemma. “Ratchet effect” exists and weaken the incentive mechanisms. Measures should be taken to overcome the asymmetry of information to establish a more objective cultivation standard. Further research includes considering the influence of different risk preferences of enterprises and employees.
Acknowledgement This research has been partially supported by a grant from National Natural Science Foundation of China (#70501030, #70621001) and Beijing Natural Science Foundation (#9073020).
Research on Ratchet Effects in Enterprises’ Knowledge Sharing
197
References 1. Huber Organizational learning. The contributing processes and the literatures. Organizations Science 2, 88–115 (1992) 2. Romer Paul, M.: Increasing returns and long run growth. Journal of Political Economy 94, 1002–1037 (1986) 3. Grant, R.M.: Toward a knowledge-based theory of the film. Strategic Management Journal 17(special issues), 109–122 (1996) 4. Simonin, B.L.: Ambiguity and the Process of Knowledge Transfer in Strategic Alliances. Strategic Management Journal 9, 595–623 (1999) 5. Fiorina, C., et al.: Moving mountains. Harvard Business Review 81, 41–47 (2003) 6. Zhang, W.Y.: Game Theory and Information Economy. Shanghai Sanlian Press, Shanghai (2004) (in Chinese) 7. Christine, A., Caldwell, A.E.M.: Experimental models for testing hypotheses about cumulative cultural evolution. Evolution and Human Behavior 29, 165–171 (2007)
Application of Information Visualization Technologies in Masters’ Experience Mining Song Wuqi1 and Gu Jifa2 1 School of Management, Dalian University of Technology, Dalian 116023 School of Economics and Management, Northeast Dianli University, Jilin 132012
[email protected] 2 Academy of Math. & Systems Science, CAS, Beijing 100190
[email protected] Abstract. Experiences which belong to a kind of tacit knowledge were gradually summarized by the experts during their long working procedures. To analyze and inherit those experiences are worthwhile to the social construction and improvement. We build a platform composed of some visualization methods and analysis methods to present and analyze the data (from database, paper, web and etc.). So that students can intuitively understand the academic thinking of masters better than before. The platform has been applied in investigating the masters' experiences of Traditional Chinese Medicine (TCM) and the positive results were also introduced.
1 Preface The persons who had a lot of knowledge and fulfill of working experiences in a specialized field could be called “experts”. During the process of solving the problems the masters usually have a set of specific modes of thinking to solve a kind of difficult puzzles or to provide some constructive suggestions to the customers. Furthermore, “elder and famous master” could always summarize and upgrade their working experiences to a certain theoretical level. Therefore they own their unique methodology, theoretic point of view, methodology and apply those into the practical works. These masters are the social treasures and their experiences possess character of uniqueness. So inheriting and carrying forward their experiences and theoretic point of view could benefit learners to gain progress in their idea formation and practice. Otherwise those unique experiences and thinking will be lost which would be a big loss to the society. Our study is being focused on the group of masters. Just like fig.1 showed below. [1,2] There is a Chinese saying called “books are always not good enough to tell and the speeches are always not clear enough to express”. That implied that word format is only a part of the knowledge. Some of the knowledge could not be exactly expressed but only thought and some could be spoken only but could not be written. That old saying highly sums up the relationship between tacit and explicit knowledge. The mined objects in expert system and data mining are always a part of the “book” and certainly could not express properly about the experienced knowledge fields of the experts. So it is necessary to use new idea to mined and analyze the thinking of experts. Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 198–201, 2009. © Springer-Verlag Berlin Heidelberg 2009
Application of Information Visualization Technologies
Fig. 1. Level of experts
199
Fig. 2. Level of knowledge
2 Visualization Technologies in Master Mining 1) Social Network Analysis (SNA) is a qualitative and quantitative combination analysis method based on Graph Theory. By analyzing the directed network or undirected network formed by nodes and edges to analyze the relations of these nodes based on graphic topological structure. It is very useful for assessing relationships. SNA provide some qualitative analyzing indicators such as centrality, weekly connected structure, sub-population Structure and connectivity etc. Although we could not say that those merits quantitatively analyze the social network, they have already made progress on quantifying the Social Network. 2) Sammon’s Nonlinear Mapping (NLM) algorithm has been found to be highly effective in the analysis of multivariate data. The analysis problem is to detect and identify "structure" which may be present in a list of N L-dimensional vectors. Here the word structure refers to geometric relationships among subsets of the data vectors in the L-space.[3] 3) Correspondence Analysis is an exploratory data analytic technique designed to analyze simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. As opposed to traditional hypothesis testing designed to verify a priori hypotheses about relations between variables, exploratory data analysis is used to identify systematic relations between variables when there are not (or rather incomplete) a priori expectations as to the nature of those relations. 4) Semantic Network is a network which represents semantic relations between the concepts. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges.[4]
3 Meta-synthesis Intelligence Mining Platform According to master mining method, we constructed software named Meta-synthesis Master Knowledge Mining Platform (MIMP). The methodology of this platform is Meta-Synthesis Methodology [5,6]. The framework of MIMP showed in Fig.3. And the platform working base on Fig.4. In MIMP, we can analyze the keywords that separated from database and documents through basic statistics, Social Network Analysis (SNA), Correspondence Analysis (CA) and Nonlinear Mapping for Data Structure Analysis (MDS) (See Fig.4.).
200
S. Wuqi and G. Jifa
3.1 Framework of MIMP There are 3 layers in the MIMP. 1) Database layer: include domain knowledge base constructed by domain problem process rules. Database of academic thinking documents of experts. And database of experts’ problem solving process data. 2) Analysis layer: Through Social Network Analysis and nonlinear mapping method to analyze the structure and relationship of data. 3) Presentation layer: Provide knowledge visualization tools to present the process of solving a problem by expert.
Fig. 3. Meta-Synthesis Intelligence Mining Framework
3.2 Workflow of MIMP We can divide the works of MIMP into 9 steps. First, we collect various types of documents we need, including structured (data), semi-structured (web) or unstructured (text). Then, we construct relevant database or knowledge base according to the demand of research. Then we cut important keywords (automated, semi-automated or manual) from these documents and send them to structured database. Before we analyze these keywords data, we must normalize them, this step is very important and we need spend a lot of times on it. Next, we can do some analysis with these processed text keywords or numeric data. Here, we use basic data mining method and some other methods we have just mentioned before. Same time, we can construct domain knowledge base based on rule. So, when we analyze, we can compare the case data with domain rules, then we can understand the expert’s thinking. We can also find out individual knowledge or experience of the expert. 3.3 Main Functions of MIMP The analysis functions of MIMP could be classified into 2 categories: 1) Keywords fragmentation functions 2) Visualization functions(SNA, NLM, Correspondence Analysis)
Application of Information Visualization Technologies
201
4 Summary Visualization methods used in MIMP to support inherit masters’ experience were discussed in this paper with parts of the primary results about its application in TCM expert academic idea mining. The expert mining approach is different from data mining and text mining. The study of the expert’s idea and experience are not based on the great capacity of data but mining on small amount of samples. It is different from the artificial intelligence based expert system on emphasizing man relied “mancomputer integration”. The intelligence of man and groups are the major priorities. The master mining system integrated the theories of science of thinking and science of knowledge and fully applied modern computer techniques and form a developing theory and technology.
References [1] Jifa, G., Wuqi, S., Zhengxiang, Z., Rui, G., Yijun, L.: Expert mining and TCM knowledge. In: KSS 2008, Guangzhou (2008) [2] Jifa, G., Wuqi, S., Zhengxiang, Z.: Meta-synthesis and Expert mining, Systems. In: IEEE International Conference on Systems, Man, and Cybernetics (October 2008) [3] Sammon Jr., J.W.: A Nonlinear Mapping for Data Structure Analysis. IEEE Transactions on Computers C-18(5), 401–409 (1969) [4] Sowa, J.F.: Semantic Networks. In: Shapiro, S.C. (ed.) Encyclopedia of Artificial Intelligence (1987) (retrieved, 29-04, 2008) [5] Gu, J.F., Wang, H.C., Tang, X.J.: Meta-Synthesis Method System and Systematology Research. Science Press, Beijing (2007) (in Chinese) [6] Gu, J.F.: On synthesizing opinions——how can we reach consensus. Journal of Systems Engineering (5), 340–348 (2001) (in Chinese)
Study on an Intelligent Knowledge Push Method for Knowledge Management System Lingling Zhang1,2,*, Qingxi Wang1, and Guangli Nie1,2 1
Graduate University of Chinese Academy of Sciences, Beijing 100190, China Tel.: +86 10 82680676; Fax: +86 10 82680698
[email protected] 2 Research Center on Fictitious Economy and Data Science, CAS, Beijing 100190, China
Abstract. In this paper, we design a mechanism which can measure the affinity between knowledge and user, affinity among users to achieve the intelligent management of knowledge. Based on the affinity, we can implement knowledge push to provide the right knowledge to the right person automatically. Several matrixes are designed to calculate the affinity.
1 Introduction Knowledge management has attracted a lot of attentions. Traditionally, a lot of knowledge management tasks are carried out by knowledge engineers or knowledge providers. As a result, much human effort is required and the management consistency cannot be guaranteed (Hou, Sun and Chuo, 2004). In last years, there have been significant developments in information technology, which offers possibilities to promote knowledge management (Carneiro, A., 2001). As the development of information technology, knowledge management system has been an affective way to implement knowledge management (Ferna´ndez-Breis, J.T. and R. Martı´nez-Be´jar, 2000). The design of knowledge management system is a trade-off of the function and the application. There has not been a widely accepted framework of knowledge management system. 26 different kinds of knowledge management frameworks were summarized ( Rubenstein-Montano, B., et al., 2001). Function design is a main work of the research of knowledge management. The current researches focus little on the intelligent management of the system to push the right knowledge at the right time to the right person automatically and actively. To effectively acquire and reuse the knowledge from other users of the system, a knowledge management system that can intelligently and automatically manage the huge amounts of documents and knowledge is required (Jenei, S., 2001). In this paper, we introduced the matrixes used to record information in section 2. The mechanism of the calculating of the affinity is described in section 3. Conclusion and future work are made in section 4. *
Corresponding author.
Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 202–208, 2009. © Springer-Verlag Berlin Heidelberg 2009
Study on an Intelligent Knowledge Push Method
203
2 Matrix Designed Process is the key of the profit creating. Processes represent a lot of knowledge information. During the traditional knowledge management process, the knowledge retrieval is finished based on key words (Hou, Sun and Chuo, 2004). If we retrieve the knowledge based on keywords, this will result into knowledge overload. The system will provide unrelated knowledge to the knowledge seeker (Davies, J., et al., 2005). In our study, the first matrix we designed is the process matrixes which record the relationship between the process and knowledge.
pro1
pro2
k1 ⎡ bpro11 bpro12 ⎢ k2 ⎢bpro 21 bpro 22 k3 ⎢bpro 31 bpro32 ⎢ k4 ⎣⎢bpro 41 bpro 42
pro3
pro4
bpro13 bpro14 ⎤ bpro 23 bpro 24 ⎥ ⎥ bpro33 bpro34 ⎥ ⎥ bpro 43 bpro 44 ⎦⎥
(1)
bpro denotes the process that the knowledge attached to. If knowledge i is from process
j then the value of bpro ij would be 1, otherwise, the value of bpro ij would be 0.
This matrix reflects the information of process to avoid knowledge overload during knowledge retrieval and knowledge push. We calculate the affinity between knowledge based on the formula of text mining. In order to calculate the similarity from the perspective of text, we design matrix to record the information about the text of the knowledge.
wd1
wd 2
k1 ⎡ frek11 k2 ⎢⎢ frek 21 k3 ⎢ frek 31 ⎢ k4 ⎣ frek 41
frek 12 frek 22 frek 32 frek 42
wd3 frek13 frek 23 frek 33 frek 43
wd 4 frek14 ⎤ ⎥ frek 24 KK⎥ ⎥ frek 34 ⎥ frek 44 ⎦
(2)
The performances of knowledge reflect the status the knowledge is used. We classify the performance into three classes which are contribution, revision and read. The matrix record how many times the knowledge ( k1 , k 2 , k3 K ) respectively contributed, revised and used by users of the system (
p1 , p2 , p3 K ).
It is quite important to record the person who put the knowledge into the system. If the knowledge is contributed by one person, the affinity of the knowledge would be small which means the two pieces of knowledge are close. This factor would be taken into account when we classify knowledge. When we push knowledge, this factor would be ignored because it is unnecessary to push the knowledge to the person who contributes the knowledge. The example matrix related knowledge contribution is shown as following.
204
L. Zhang, Q. Wang, and G. Nie
p1
p2
k1 ⎡ bc11 bc12 k2 ⎢⎢bc 21 bc 22 k3 ⎢bc 31 bc 32 ⎢ k4 ⎣bc 41 bc 42
p3 p4 bc13 bc 23 bc 33 bc 43
bc14 ⎤ bc 24 ⎥⎥ bc 34 ⎥ ⎥ bc 44 ⎦
(3)
bc is a boolean number and denote who contribute the knowledge. If knowledge ki is contributed by p j then the number bc ij is 1, otherwise the number would be 0. The knowledge recorded in the knowledge management system should be allowed to revise by other users. This can improve the quality of the knowledge and make the knowledge to be more useful. So, we also record the frequency that the knowledge was revised by a certain person. This will also be helpful to measure the affinity of the knowledge. If the knowledge is revised by the same person, the affinity of the knowledge is small which means the relationship between the knowledge is close. Similarly, the relationship also is shown in the form of matrix as follows.
p1 k1 ⎡ frerev11 k2 ⎢⎢ frerev 21 k3 ⎢ frerev 31 ⎢ k4 ⎣ frerev 41
p2
p3
frerev12
frerev13
frerev 22
frerev 23
frerev 32 frerev 42
frerev 33 frerev 43
p4 frerev14 ⎤ frerev 24 ⎥⎥ frerev 34 ⎥ ⎥ frerev 44 ⎦
(4)
frerev is a integer and denotes the times the knowledge ki revised by person p j . frerevij means the knowledge ki is revised by p j frerevij times. The final aim of knowledge management is to reuse the knowledge. The times that the knowledge was used should be recorded. If two pieces of knowledge usually was used by certain person, the affinity should be small which means the relationship between the knowledge is close. In order to record the application of knowledge, we design the matrix as follows.
p1 k1 ⎡ freuse11 k2 ⎢⎢ freuse 23 k3 ⎢ freuse31 ⎢ k4 ⎣ freuse 41 (if
p2
p3
freuse12 freuse 23
freuse13 freuse 23
freuse32 freuse 42
freuse33 freuse 43
p4 freuse14 ⎤ freuse 24 ⎥ ⎥ freuse34 ⎥ ⎥ freuse 44 ⎦
(5)
ki is contributed by p j then freuse ij =0)
freuse is a integer and denotes the times the knowledge ki used by person p j . freuseij means the knowledge ki is revised by p j frerevij times.
Study on an Intelligent Knowledge Push Method
205
3 Affinity Calculation Definition of affinity: Affinity is a close measurement of some objects ( M, S., K. G, and K. V. 2000). The affinity of object O1 and O2 is small, if O1 and O2 is close from some perspective. From the definition of affinity we can get that ⎧ Aff (O1 , O3 ) < Aff (O1 , O2 ) ⎪ifsO sissmoresclosestosO sthansO 1 3 2 ⎪⎪ Aff O O > Aff O O ( , ) ( , ) ⎨ 1 3 1 2 ⎪ifsO sisslesssclosestosO sthansO 1 2 3 ⎪ ⎪⎩ Aff (O1 , O3 ) = Aff (O1 , O2 ) otherwise
(6)
The affinity is calculated based on the text similarity information and the performance (the behaviors that the users used the knowledge). We will discuss the calculation affinity between knowledge and user, between user and user. 3.1 The Affinity between Knowledge and Users We calculate the affinity based on the affinity of knowledge person P contributed and the existing. The reason why we do not take the performance between P and knowledge into account is because the purpose of the affinity is to implement knowledge push. The contribution relationship should also be ignored. If we take this factor into account, the affinity will mislead the system. For example, knowledge ki is usually used by P, the frequency would be large, but the knowledge contributed by P is still 0. It does not reflect the fact. We transfer the calculation between knowledge and people to the calculation between a specific piece of knowledge and a group of knowledge contributed or used by a certain people as figure1. Performance of the users
knowledge Contributed
Close
Other Knowledge
e dg le w h no s K Pu
Co Kn ntri o bu w led ted ge
Fig. 1. The affinity between knowledge and users
206
L. Zhang, Q. Wang, and G. Nie
Aff text ( k & p ) (ka , pb ) denotes the affinity between a specific piece of knowledge
ka and a certain person pb from the perspective of the text of knowledge. kt is a group of knowledge contributed by person pb . Because the purpose of this calculation is knowledge push, the knowledge k a should have not been read by pb before. kt can also be a group of top 10 (or 10%) knowledge visited by person pb . Aff text ( k & p ) (ka , pb ) =
∑ Aff
text
( k a , kt )
t
nkb
(7)
(kt is the knowledge contributed by pb )
nkb denotes the number of knowledge contributed by people pb . ⎧1 if ka is from pb Aff pro ( k & p ) (ka , pb ) = ⎨ ⎩0 otherwise Aff rev ( k & p ) (ka , pb ) =
∑ Aff
rev
(ka , kt )
t
(8)
(9)
nkb
(kt is the knowledge contributed by pb )
Aff rev ( k & p ) (ka , pb ) denotes the affinity between a specific piece of knowledge
ka and a certain person pb from the perspective of the revision of knowledge. nkb denotes the number of knowledge contributed by people pb . Affuse ( k & p ) (ka , pb ) =
∑ Aff
use
(ka , kt ) (10)
t
nkb
(kt is the knowledge contributed by pb )
Aff use ( k & p ) (ka , pb ) denotes the affinity between a specific piece of knowledge
ka and a certain person pb from the perspective of the use of knowledge. Aff perfor ( k & p ) (ka , pb ) =
γ 1 Aff rev (ka , pb ) + γ 2 Affuse (ka , pb )
∑γ
i
(11)
=1
i
Aff perfor ( k & p ) (ka , pb ) denotes the affinity between knowledge ka and a certain person pb from the perspective of the performance of knowledge.
Study on an Intelligent Knowledge Push Method
207
Aff k & p (ka , pb ) = δ1 Aff status ( k & p ) (ka , pb ) +
δ 2 Aff pro ( k & p ) (ka , pb ) + δ 3 Aff perfor ( k & p ) (ka , pb )
∑δ
i
(12)
=1
i
Aff k & p (ka , pb ) denotes the final affinity between knowledge ka and a certain person pb . 3.2 The Affinity among Users
The affinity among users is quite an important criteria for implicit knowledge management. We transfer the calculation of affinity between users into the calculation of affinity of the knowledge contributed or used by the certain people as shown in figure 2. Performance of other users
knowledge of Pc
Knowledge of Pd
Close
Affinity
e us
Pc
use
Pd
Knowledge of others
Fig. 2. The affinity between users and users
We can get the Aff text ( ki , k j ) based on text mining algorithm which will not be discussed here.
Aff k ( p & p ) ( pc , pd ) denotes the affinity between person pc and person pd from the perspective of knowledge. nc is the number of knowledge of person pc .
Aff k ( p & p ) ( pc , pd ) =
∑ Aff
k& p
(kl , pd )
l
nc
( kl ∈ the set of pc ' s knowledge)
(13)
208
L. Zhang, Q. Wang, and G. Nie
The final affinity between user and user can be gotten from formula 16.
Aff k ( p & p ) ( pc , pd ) =
∑ Aff
k&p
( kl , pd )
l
(14)
nc
( kl ∈ the set of pc ' s knowledge) The time the knowledge contributed into the system should also be a factor to consider. After the whole process of calculation, we can precisely get the affinity between knowledge and knowledge, user and knowledge, user and user(M, M.T., 1997).
4 Conclusion Because of the development of information technology, we can record the performance of the users of the system. The history use behaviors of the users reflect the preference of the users. In this paper, we design an affinity calculation mechanism. By transferring the relationship between user and user into the relationship between knowledge, we can measure the affinity between knowledge and user, user and user. In this paper, we design an innovative way to calculate the affinity between user and knowledge to the affinity between knowledge the user contributed and other knowledge. The affinity between users is reflected by the affinity between the knowledge contributed by the users. The future work of this paper includes how to set the parameter of the mechanism and the calculation case between knowledge and user, user and user. Acknowledgments. This research has been partially supported by a grant from National Natural Science Foundation of China (#70501030, #70621001, #90718042), Beijing Natural Science Foundation (#9073020).
References Hou, J.-L., Sun, M.-T., Chuo, H.-C.: An Intelligent Knowledge Management Model for Construction and Reuse of Automobile Manufacturing Intellectual Properties. In: Advanced Manufacturing Technology (2004) Carneiro, A.: The role of intelligent resources in knowledge management. Journal of Knowledge Management 5(4), 358–367 (2001) Fernańdez-Breis, J.T., Martıńez-Bej́ar, R.: A cooperative tool for facilitating knowledge management. Expert Systems with Applications 18, 315–330 (2000) Rubenstein-Montano, B., et al.: A systems thinking framework for knowledge management. Decision Support Systems 31 (2001) Jenei, S.: Incremental Operator Assistance Knowledge System an intelligent aid for a general class of problems. Journal of the Operational Research Society 52, 1078–1090 (2001) Davies, J., et al.: Next generation knowledge management. BT Technology Journal 23(3), 175– 190 (2005) Steinboch, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD, Boston, MA, USA (2000) Mitchell, T.M., Mitchell, T.M.: Machine Learning. The McGraw-Hill Companies Inc., New York (1997)
Extension of the Framework of Knowledge Process Analysis: A Case Study of Design Research Process Georgi V. Georgiev, Kozo Sugiyama, and Yukari Nagai Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Nomi, Ishikawa 923-1292, Japan
[email protected] Abstract. This study undergoes the approach of Knowledge process analysis in an academic research project. It investigates the knowledge creation primitives of KPA used in previous studies and tests other possible primitives from the domain of design studies. This is a step improving KPA with design research experience.
1 Introduction The Knowledge process analysis (KPA) is a framework under development. It aims to provide a tool for studying the knowledge work in scientific research projects [1]. This is achieved by elaboration of various theories from field of knowledge science and developing of knowledge creation model specific to research project. The main clues for development of KPA are found in analysis the process of organizational knowledge creation in research projects. This analysis framework elaborates various knowledge creation theories in an exploratory approach to academic research, aiming to improve future research projects and education. The employed theories in the work guide for KPA [1] are: Theory of tacit thought; Equivalent transformation theory; Knowledge management theory; Non-explicit knowledge process support with model of knowledge categorization; KJ method; Concept synthesis in creativity; and accidental discovery with Serendipity (for further reference see [1]). These knowledge creation theories are utilized in order to explain certain aspects of research activities, however, not the whole process. The applied primitives concern knowledge creation generally. Sugiyama and Meyer [1] synthesize the concepts from above theories in factors influencing the knowledge creation through tacit knowledge process. Here we undergo the KPA in a project, aiming to investigate the existing primitives, to test possible candidates for primitives from the design studies domain, improving the KPA [1].
2 Elaboration of Primitives from Design Studies Focusing on the primitives’ extraction, such candidate theories from design studies are: Iterative linear design process by Goldschmidt [2]; Function – Behavior – Structure framework by Gero [3]; General model of creative process strategies by Cross [4]; and Design insight model by Taura and Nagai [5] (in Table 1). Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 209–212, 2009. © Springer-Verlag Berlin Heidelberg 2009
210
G.V. Georgiev, K. Sugiyama, and Y. Nagai1
Iterative linear design process: Iterative model [2] is a sequence of analysis, synthesis, and evaluation as linear design process. This sequence builds a spiral of design decisions from abstract to concrete levels. This model represents one of the models of design, focused on its iterative nature. Function – behavior – structure framework describes the fundamental processes involved in designing [3] as a set of processes linking together. Function (F) describe what the object is for; Behavior (B) describe the attributes that are expected from the object - structure (S); and S describe the components and relationships of the object. General model of creative process strategies: The importance of the strategic knowledge of the outstanding designers’ creative work is illustrated as a leveled General model of creative process strategies in design [4]. The levels are: 1) taking a broad ‘systems approach’ to the problem; 2) ‘framing’ the problem in a distinctive way; and 3) designing from ‘first principles’. Design insight theory [5] expresses a driving force of design through knowledge operation is from inner criteria of the designers. A power of inner criteria was expressed as “push” a design process (concept generation perspective) and pull (problem solving perspective) towards the goal. Table 1. Selection of primitives based on Sugiyama [1] and new primitives from design theories Knowledge Creation Theory Tacit dimension (Polanyi)
Primitives Tacit foreknowing
Equivalent transformation the- Equivalent finding; Equivalent transformation; ory (Ichikawa) Analog route; Digital route KC theories in Knowledge creation theory SECI model; Ba (shared context); Knowledge KPA (Nonaka et al) leadership; Knowledge assets (for further Non-explicit knowledge process Social network; Knowledge categorization; reference see support (Meyer et al) Knowledge exchange; Knowledge inventory [1]) Creative cognition (Finke et al) Concept synthesis KJ method (Kawakita)
Ideas exhaustion; Knowledge structure mapping
Serendipity (Roberts)
Accidental discovery
Iterative linear process Theories from (Goldschmidt) design studies FBS framework (Gero et al) (Basic primitives “formu- General model of creative lated” from the process strategies (Cross) models) Design insight (Taura, Nagai)
Analysis, synthesis, and evaluation iterations Function – Behavior – Structure linking Problem solution leveling Two side process
3 Case Study Project This case study is adopting the methodology of KPA [1] – knowledge process modeling, primitives’ synthesis and reflective verification. It is focusing on analyzing own research process through reflective verification of events and knowledge exchange between participants.
Extension of the Framework of Knowledge Process Analysis
211
Fig. 1. Social network and course of the project with phases and SECI modes
The small research project’s theme is “Method of Design Focusing on the Network Structure of Meanings” [6], aiming at methodology supporting meanings in design. The initial investigation towards evaluation of meanings showed possibilities for analysis. This led to an original method, evolving concept analysis with similarity measures from a concept database. This approach further is applied in design method in conceptual design. The project is analyzed with KPA (Table 1). The discussed ongoing project was collaboration with individuals’ roles in time shown in Figure 1. This clarifies the social network among the involved persons as basis for further analysis of knowledge processes. The course of the project [6] with phases and SECI modes is shown in Figure 1. The first participant A is a PhD student, having been supervised by B - leader of the project. The expertise of B in design creativity; design research are in the base of the research theme. Member C’s knowledge is base for the ideas, course of the project and realizations. Member D’s experience is enriching the analysis approach and methodology.
Fig. 2. Thinking flow map on analog and digital routes
212
G.V. Georgiev, K. Sugiyama, and Y. Nagai1
The next stage of analysis describes the environment of the research project according Nonaka’s theory as the most essential factor for a project [1]. The main context of A is as follows: Social Context: Support of the process of designing are essential; Stimulation: The work background of A is stimulating the ideas for the project; Intuition and skills: Graduated engineering design major; design skills. The representation of total thinking flow map with identified primitives (as in Table 1) is shown in Figure 2 for analog and digital routes [1]. The newly added design primitives are clearly expressed in different phases: e.g. FBS, IDP and DI.
4 Discussion and Conclusion The framework of KPA [1] was applied in an academic research project, along with primitives from the domain of design theory. This application is exploratory and contributes to the accumulation of cases towards integrated knowledge creation theory in academic environment. The results show good elaboration with previous applications [1]. The majority of differences are connected with the specifics of the presented academic project. However, the discussed project accents on the different knowledge creation primitives, thus showing the importance of all elaborated theories. The newly explored design theory primitives show potential to contribute to the KPA framework in form of Iterative design process, FBS framework and Design insight theory. In our case the model of CPS is not fully applicable.
References [1] Sugiyama, K., Meyer, B.: Knowledge process analysis: Framework and experience. Journal of Systems Science and Systems Engineering 17(1), 86–108 (2008) [2] Goldschmidt, G.: Design. Encyclopedia of Creativity 1, 525–535 (1999) [3] Gero, J., Kannengiesser, U.: The situated function–behaviour–structure framework. Design Studies 25(4), 373–391 (2004) [4] Cross, N.: Creative cognition in design: Processes of exceptional designers. In: 4th CC conference, NY, USA (2002) [5] Taura, T., Nagai, Y.: Design insight - A key for studying design creativity. In: Taura, T., Nagai, Y. (eds.) NSF 2008 conference (2008) [6] Georgiev, G.V., Taura, T., Chakrabarti, A., Nagai, Y.: Method of design through structuring of meanings. In: ASME IDETC/CIE 2008 conference, New York, USA (2008)
On Heterogeneity of Complex Networks in the Real World* Ruiqiu Ou, Jianmei Yang, Jing Chang, and Weicong Xie Business School of South China University of Technology, Guangzhou, P.R. China 510640
Abstract. Although recent studies have made great progress in the research on heterogeneity of complex networks with power-law degree distribution in the real world, they seem to ignore that there may be different types of heterogeneities. Therefore, this paper, from the perspective of power-law degree distribution, suggests a comprehensive analysis taking several coefficients into account to reveal heterogeneity of complex networks in the real world more accurately. We show that there are at least two types of heterogeneities. Keywords: Complex network; Heterogeneity; Power-law degree distribution.
1 Introduction Empirical studies have revealed that complex networks with power-law degree distribution in the real world are usually heterogeneous networks, in the sense of having a few hub nodes with relatively high degree, a great deal of nodes with relatively low degree and very few nodes with medium degree [1, 2, 3]. This heterogeneity leads to some crucial properties of complex networks in the real world, such as the robustyet-fragile property [4, 5] and small-world effect [2, 6]. Consequently, many literatures investigate how to quantify heterogeneity and thus several coefficients are proposed, including standard deviation of the degree distribution [7, 8], entropy of the remaining degree distribution [9], entropy of the degree distribution [10], and Gini coefficient [11]. Empirical studies also show that the heterogeneity of complex networks in the real world is attributed to their power-law degree distribution [1, 2, 3, 5]. Therefore, some literatures investigate the structural properties of complex networks in the real world, especially heterogeneity, from the perspective of power-law degree distribution. For example, Ref.[6] points out that a smaller value of power-law exponent implies that the network has more hub nodes. Further, Ref.[7] demonstrates that only when the power-law exponent ranges from 2 to 3, does an infinite network with power-law degree distribution possess a few hub nodes. In addition, Ref.[11] shows that the heterogeneity index, i.e. Gini coefficient, of an infinite network with powerlaw degree distribution is between 1 and 0.5 when the power-law exponent ranges from 2 to 2.5. *
This work is supported by National Natural Science Foundation of China (70773041).
Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 213–219, 2009. © Springer-Verlag Berlin Heidelberg 2009
214
R. Ou et al.
As is well known, a network is said to be a homogeneous network if and only if each node has approximately the same degree. Contrarily, if the nodes of a network have quite different degrees, it is said to be a heterogeneous network. However, heterogeneous networks could be at least classified into two types; one type is characterized by small difference between the fraction of high-degree nodes and that of low-degree nodes, and the other type is characterized by a few high-degree nodes and a mass of low-degree nodes. Note that the heterogeneity of complex networks usually mentioned in literatures is of the latter one. Nevertheless, current studies seem to ignore the difference between them and mix them when measuring the heterogeneity of complex networks in the real world. Therefore, this paper suggests a comprehensive analysis taking several coefficients into account, which will help us to distinguish the different types of heterogeneities. Following the empirical studies, our analysis will concentrate on finite networks with power-law degree distribution. It should be noted that the conclusions of this paper may have implications for other power-law phenomena.
2 Theoretical Analysis 2.1 Degree Distribution of Complex Networks in the Real World Since the fundamental research of Barabási and Albert in 1999 [1], empirical studies have demonstrated that many complex networks in the real world, including social networks, information networks, technological networks and biological networks, as well as economic networks [2, 3, 5, 6, 12], surprisingly tend to present a power-law degree distribution. In this paper, we consider complex networks with N + 1 nodes possess continuous power-law degree distribution functions as follows ⎧ γ − 1 −γ ⎪⎪1 − N 1−γ x , f ( x) = ⎨ ⎪ 1 x −γ , ⎪⎩ ln N
γ ≠1
(1)
γ =1
Consequently, their cumulative degree distributions are ⎧ N γ −1 x1−γ − 1 , ⎪⎪ γ −1 F ( x) = ⎨ N − 1 ⎪1 − ln x , ⎪⎩ ln N
γ ≠1
(2)
γ =1
2.2 Concentration Trend of the Degree Distribution
Although arithmetical mean is most popular to measure the concentration trend of a set of data, median is more reasonable when the data contains some extreme values. Empirical studies have revealed that complex networks in the real world are characterized by a few hub nodes and a mass of low-degree nodes [1, 2, 3]. Therefore, we choose median, rather than arithmetical mean, to measure the concentration trend of degree distribution.
On Heterogeneity of Complex Networks in the Real World
215
Solving equation F ( x ) = 0.5 yields median m 1 ⎧ 1−γ 1−γ ⎪⎪⎛ N + 1 ⎞ , m( N , γ ) = ⎨⎜⎝ 2 ⎟⎠ ⎪ ⎪⎩ N ,
γ ≠1
(3)
γ =1
As is shown in Fig.1 explicitly, when N is large but not very huge, m is very large in the case of 0 ≤ γ ≤ 1 2 implying the small difference between the fraction of highdegree nodes and that of low-degree nodes, while m is very small in the case of γ ≥ 3 2 , which means that most nodes in the network have low degree. In addition, m drops dramatically from very large to very small as γ increases from 1 2 to 3 2 .
Fig. 1. Relation between m and γ
2.3 Absolute Dispersion Degree of the Degree Distribution
The standard deviation is to measure the absolute dispersion degree of the degree distribution. We obtain the standard deviation of the degree distribution, σ , from degree distribution function (1) as follows. ⎡N ⎤ σ ( N , γ ) = ∫ x f ( x)dx − ⎢ ∫ xf ( x)dx ⎥ 1 ⎣1 ⎦ N
2
2
⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ =⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩
(γ − 1)(1 − N 3−γ ) (γ − 1)2 (1 − N 2−γ ) 2 − , (γ − 3)(1 − N 1−γ ) (γ − 2)2 (1 − N 1−γ ) 2
(4)
2
N −1 ⎛ N −1 ⎞ −⎜ ⎟ , 2 ln N ⎝ ln N ⎠ 2
N−
γ ≠ 1, 2,3
N 2 ( ln N ) ( N − 1)2
γ =1
2
,
2 N 2 ln N 4N 2 − , N 2 −1 ( N + 1)2
γ =2 γ =3
216
R. Ou et al.
As Fig.2 reveals clearly, in the event that N is large but not very huge, σ is very large in the case of 0 ≤ γ ≤ 1 and very small in the case of γ ≥ 3 , which means that the networks are heterogeneous and homogeneous, respectively. On the other hand, σ decreases intensely from very large to very small as γ increases from 1 to 3, which indicates that the network changes from a heterogeneous one to a homogeneous one.
Fig. 2. Relation between σ and γ
2.4 Relative Dispersion Degree of the Degree Distribution
The coefficient of variation is to measure the relative dispersion degree of the degree distribution. We calculate the coefficient of variation v of the degree distribution according to the degree distribution function (1) and the standard deviation formula (4) as follows. v( N , γ ) =
σ N
∫ xf ( x)dx 1
⎧ 2 1−γ 3 −γ ⎪ (γ − 2) (1 − N )(1 − N ) − 1, 2 −γ 2 ⎪ (γ − 1)(γ − 3)(1 − N ) ⎪ ⎪ ( N + 1) ln N =⎨ − 1, ⎪ 2( N − 1) ⎪ 2 ⎪ ( N − 1) − 1, ⎪ N ( ln N )2 ⎩
γ ≠ 1, 2,3
(5)
γ = 1,3 γ =2
As Fig.4 shows, in the event that N is large but not very huge, v increases with γ in the case of 0 ≤ γ < 2 , decreases with the increasing of γ in the case of γ ≥ 2 , and achieves its maximum value equals to about N ln N at γ = 2 . In addition, in the cases of γ < 1 and γ > 3 , v is very small. Consequently, networks with γ ∈ [1,3] are
probably heterogeneous networks in relative sense.
On Heterogeneity of Complex Networks in the Real World
217
Fig. 3. Relation between v and γ
2.5 Inequality Degree
Gini coefficient, which is derived from economics, is an index for quantifying the inequality degree of a set of data. In order to calculate the Gini coefficient of degree distribution, we construct the Lorenz function of degree distribution: 2 −γ ⎧ 1−γ 1−γ ⎡ ⎤ − − − 1 1 1 N x ⎪ ⎣ ⎦ , ⎪ 1 − N 2 −γ ⎪ x ⎪1 − N L ( x) = H ( G ( x) ) = ⎨ , ⎪ 1− N ⎪ − ln 1 − (1 − N −1 ) x ⎪ , ln N ⎪ ⎩
(
)
(
)
γ ≠ 1, 2
(6) γ =1 γ =2
where ⎧ 1 − x 2−γ , ⎪⎪ 2 −γ tf (t ) H ( x) = ∫ dt = ⎨1 − N μ 1 ⎪ ln x , ⎪⎩ ln N x
γ ≠2
(7)
γ =2
and G ( x) is the inverse function of 1 − F ( x) . The meaning of L ( x ) is intuitive, i.e., L( x) is the percentage that the lowestdegree x ×100% nodes account for total degree of the network. We figure out Gini coefficient g using Lorenz function (6) as follows. 1
g ( N , γ ) = 1 − 2 ∫ L( x)dx 0
⎧ (1 − γ ) N 3−2γ − 1 2 ⎛ ⎪1 − ⎜1 + − γ 2 ⎪ 1 − N ⎜ (3 − 2γ ) 1 − N 1−γ ⎝ ⎪ ⎪⎪ 2 2 , = ⎨1 − − 1 ln N N − ⎪ ⎪ N ln N − 2 N − 1 ⎪1 − , 2 ⎪ 1 N − ⎪⎩
(
(
(
)
(
)
) ⎞⎟ , ) ⎟⎠
3 2
γ ≠ 1, ,2 γ = 1, 2 γ=
3 2
(8)
218
R. Ou et al.
As Fig.4 shows, in the event that N is large but not very huge, g increases with γ in the case of 0 ≤ γ < 3 2 , decreases with the increasing of γ in the case of γ ≥ 3 2 , and achieves its maximum value equal to about 1 − ln N − 2 at γ = 3 2 . In particular, N g is greater than 0.5 for most values of γ in the interval ( 3 2,5 2 ) . As is well known
in economics, the situation that Gini coefficient exceeds 0.5 is regarded as extremely unequal. Thus, we can draw the conclusion that the degree distribution of a power-law network is highly unequal in the case of 3 2 < γ < 5 2 .
Fig. 4. Relation between g and
γ
3 Discussion and Conclusion Heterogeneity is an important structural property of complex networks in the real world. As is mentioned above, if the nodes of a network have quite different degrees, it is said to be a heterogeneous network. Furthermore, heterogeneous networks could be at least divided into two types, which we named type 1 and type 2 here, respectively. Networks of type 1 are characterized by small difference between the fraction of high-degree nodes and that of low-degree nodes. Networks of type 2 are characterized by a few high-degree nodes and a mass of low-degree nodes. Heterogeneous complex networks in the real world usually belong to type 2. The degree distribution of heterogeneous networks of type 1 should possess large standard deviation and median, but not necessarily large coefficient of variation and Gini coefficient. On the other hand, the degree distribution of heterogeneous networks of type 2 should possess large coefficient of variation and Gini coefficient, and small median, but not necessarily large standard deviation. In contrast, the degree distribution of homogeneous networks should have small standard deviation, coefficient of variation and Gini coefficient, but not necessarily small median. Therefore, there may be no suitable unified index to quantify heterogeneity of complex networks in the real world. Contrarily, perhaps a comprehensive consideration taking median, standard deviation, coefficient of variation and Gini coefficient into account is a better approach to analyze heterogeneity. Although our analysis is based on the networks with perfect power-law degree distribution, this method can be applied to empirical cases and complex networks with other kinds of degree distributions.
On Heterogeneity of Complex Networks in the Real World
219
According to the analytical results in section 2, we could come to conclusions as follows: (1) networks with power-law degree distribution are heterogeneous networks of type 1 in the case of 0 ≤ γ ≤ 1 2 , heterogeneous networks of type 2 in the case of 3 2 ≤ γ ≤ 5 2 and homogeneous networks in the case of γ ≥ 3 ; (2) with γ increasing from 1 2 to 3 2 , networks with power-law degree distribution change from heterogeneous networks of type 1 to type 2; (3) with γ rising from 5 2 to 3, networks with power-law degree distribution change from heterogeneous networks of type 2 to homogeneous networks; (4) there is no specific threshold value of γ for distinguishing heterogeneous networks of type 1 and type 2, neither do the heterogeneous networks of type 2 and homogeneous networks. Empirical studies indicate that most complex networks in the real world possess the heterogeneity of type 2. Consequently, the power-law exponent of the degree distribution of these networks usually ranges from 1 to 3. In addition, the conclusions above may also have implication for other power-law phenomena.
References [1] Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999) [2] Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003) [3] Jackson, M.O., Rogers, B.W.: Meeting Strangers and Friends of Friends: How Random are Socially Generated Networks? American Economic Review 97(3), 890–915 (2007) [4] Albert, R., Jeong, H., Barabási, A.-L.: Error and attack tolerance of complex networks. Nature 406, 378–382 (2000) [5] Yang, J., Lua, L., Xie, W., Chen, G., Zhuang, D.: On competitive relationship networks: A new method for industrial competition analysis. Physica A, 704–714 (August 2007) [6] Inaoka, H., Takayasu, H., Shimizu, T., Ninomiya, T., Taniguchi, K.: Self-similarity of banking network. Physica A 339(3-4), 621–634 (2004) [7] Nishikawa, T., Motter, A.E., Lai, Y.C., Hoppensteadt, F.C.: Heterogeneity in Oscillator Networks: Are Smaller Worlds Easier to Synchronize? Phys. Rev. Lett. 91, 014101 (2003) [8] Lin, W., Guanzhong, D.: On Degree distribution of Complex Network. Journal of North eastern Polytechnic University 24(4), 405–409 (2006) (in Chinese) [9] Solé, R.V., valverde, S.V.: Information Theory of Complex Networks. Lect. Notes. Phys. 650, 189 (2004) [10] Wang, B., Tang, H.W., Guo, C.H., Xiu, Z.L.: Entropy Optimization of Scale-Free Networks Robustness to Random Failures. Physica A 363, 591 (2005) [11] Hu, H.-B., Wang, X.-F.: Unified index to quantifying heterogeneity of complex networks. Physica A 387, 3769–3780 (2008) [12] Yang, J., Zhuang, D., Xu, X.: The complex network analysis on service channels of a bank and its management utility Dynamics of Continuous. Discrete and Impulsive Systems Series B: Applications & Algorithms 15, 179–193 (2008)
Some Common Properties of Affiliation Bipartite Cooperation-Competition Networks Da-Ren He College of Physics Science and Technology, Yangzhou University, Yangzhou 225002, China
Abstract. This article presents a brief review about some common properties of cooperation-competition networks described by affiliation bipartite graphs. Firstly, the distributions of three statistical quantities, the two bipartite graph degrees and a projected unipartite graph degree, which describes the network cooperation-competition configuration, are introduced. The common function forms of the distributions are deduced by the analytic and numerical analyses of a network evolution model, and then verified by the empirical investigations on 23 real world cooperation-competition systems. Secondly, for a description on the competition results, a node weight is proposed which represents a kind of its competition gain. A common node weight distribution function is empirically observed in the 23 real world systems. Thirdly, the relationships between the properties describing the cooperation-competition configuration and the competition properties are discussed. The only example reported in this article is the correlation between the node weight and a bipartite graph degree. These studies may be helpful for the development of complex system theory and the understanding of some important real world systems.
1 Introduction Complex network studies appeared as a frontier of sciences in 1998 [1]. Among the studies, the collaboration networks attracted attentions [1-8]. Actually in complex systems complete competition or complete cooperation appears only in a few extreme cases. In most cases competition and cooperation coexist, therefore a rising interest is shown on this topic [9-13]. The interest of our group has been concentrated on the cooperation-competition networks described by affiliation bipartite graphs [7,8,11-13]. An affiliation bipartite graph contains two types of nodes, one is called “acts” denoting some events, organizations, or activities (e.g., the sell markets of some products) and the other is called “actors” denoting some participants in the acts (e.g., the producers). Edges only exist between different types of nodes (e.g., a producer sells one type of its products in a sell market). To describe the cooperation-competition relation between the actors, a projected single-mode (unipartite) network is often used. In the unipartite network, all the actors, which take part in the same act, are connected by equivalent unweighted links. Therefore an act can be expressed by an “act complete subgraph” in the unipartite graph. The topological structure of the bipartite and the projected unipartite graphs can only describe the cooperation-competition configuration, which let you Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 220–227, 2009. © Springer-Verlag Berlin Heidelberg 2009
know how the actors are taking part in acts. To describe the competition from a different point of view, Fu et al. proposed introducing a "node weight", $w_i$, describing the competition gain of a node $i$ [11,12]. The rest of the paper is organized as follows: in Section 2 we introduce distributions of three statistical quantities for the description of the cooperation-competition configuration. Two kinds of distribution functions are observed empirically and also explained by a network evolution model. Section 3 presents empirical investigation results on a common node weight distribution function; the implication of the function parameter values is also discussed. In Section 4 we report an empirical investigation of the relationship between the node weight and one of the bipartite graph degrees. The last section, Section 5, summarizes the paper.
2 Network Cooperation-Competition Configuration Descriptions
In the projected unipartite graph, the degree $k_i$ of a node $i$ is defined as the number of its adjacent edges and can be expressed as $k_i = \sum_j a_{ij}$, where $a_{ij}$ denotes the element of the projected unipartite graph adjacency matrix, defined as $a_{ij} = 1$ if nodes $i$ and $j$ are linked in the projected unipartite graph and $a_{ij} = 0$ otherwise. In collaboration-competition networks, we believe that the quantities which describe how the actors are taking part in acts are more important. One such quantity is a bipartite graph degree, the number of actors in an act, addressed as the "act size" and denoted by $T$; it can be expressed as $T_j = \sum_i b_{ij}$, where $b_{ij}$ is the element of the bipartite graph adjacency matrix, defined as $b_{ij} = 1$ if actor $i$ and act $j$ are linked in the bipartite graph, and $b_{ij} = 0$ otherwise. Another bipartite graph degree is the number of acts in which an actor takes part, addressed as the "act degree" of the actor nodes and denoted by $h_i = \sum_j b_{ij}$.
In this article we use the standard definition of the degree distribution $P(k)$ [14] (the fraction of nodes in the network with degree $k$) to represent the probability that a node has degree $k$. The distributions of the other quantities, such as $P(h)$ and $P(T)$, are defined similarly. What are the common function forms of $P(h)$ and $P(k)$ when $P(T)$ takes a rather homogeneous form (an approximate normal distribution) or a heterogeneous form (an approximate power law)? Zhang, Chang, Liu and their collaborators proposed a cooperation-competition affiliation bipartite network evolution model [7,8,13] as an answer to this question. In the model there are $m_0$ actors at $t_0$, which are connected and form some act complete subgraphs in the projected unipartite graph. In each time step a new node is added and is connected to $T-1$ old nodes to form a new act complete subgraph of $T$ nodes. The rule for selecting the $T-1$ old nodes is as follows: old nodes are selected at random with probability $p$, and by a linear preference rule with probability $1-p$. With the linear preference rule an old node $i$ is selected with probability $\Pi \propto h_i / \sum_j h_j$, where $h_i$ denotes its act degree and $j$ runs over the old nodes. A minimal simulation sketch of this model is given below.
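The following Python sketch illustrates the evolution model just described, under the simplifying assumptions that the act size T is fixed and that each of the m0 initial actors starts with act degree 1 (the initial configuration is not fully specified above); the function name and parameters are illustrative only.

```python
import random
from collections import Counter

def evolve_act_degrees(steps, T=3, p=0.5, m0=5):
    # h[i] is the act degree of actor i; the m0 initial actors are assumed
    # to start with act degree 1 (an assumption made for this sketch)
    h = [1] * m0
    for _ in range(steps):
        chosen = set()
        while len(chosen) < min(T - 1, len(h)):
            if random.random() < p:
                i = random.randrange(len(h))          # random selection
            else:
                r, acc, i = random.uniform(0, sum(h)), 0.0, 0
                for j, hj in enumerate(h):            # linear preference: Pi ~ h_j / sum_j h_j
                    acc += hj
                    if acc >= r:
                        i = j
                        break
            chosen.add(i)
        for i in chosen:
            h[i] += 1                                 # old actors join the new act
        h.append(1)                                   # the newly added actor
    return Counter(h)                                 # empirical (unnormalized) P(h)

if __name__ == "__main__":
    print(evolve_act_degrees(5000, T=3, p=0.2).most_common(10))
```

Plotting the resulting counts on a ln P(h) versus ln(h + alpha) scale reproduces the shifted-power-law behaviour analyzed next.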
When T shows a unimodal distribution one can assume that T is a constant integer and expect that the simplification does not influence the qualitative conclusion. A differential equation for h evolution can be set up as
$$\frac{\partial h_i}{\partial t} = p(T-1)\frac{1}{t+m_0} + (1-p)\frac{(T-1)h_i}{T(m_0+t)} \qquad (1)$$
This can be solved to give
$$P(h) = \frac{\eta}{1+\alpha}\left(\frac{h+\alpha}{1+\alpha}\right)^{-\gamma-1} \qquad (2)$$
where $\alpha = Tp/(1-p)$ and $\gamma = T/[(T-1)(1-p)]$. The conclusion is that the act degree distribution $P(h)$ shows a so-called "shifted power law (SPL)" function form. The general form of SPL functions can be expressed as
$$P(x) \propto (x+\alpha)^{-\gamma} \qquad (3)$$
where $\gamma$ and $\alpha$ are constants. The function appears as a straight line with slope $-\gamma$ on the $\ln P(x)$ versus $\ln(x+\alpha)$ plane. For $\alpha = 0$, one finds that
$$P(x) \propto x^{-\gamma}, \qquad (4)$$
which indicates a power-law $P(x)$ distribution with scaling exponent $\gamma$. For $\alpha \to \infty$, it is easy to show that $P(x)$ tends to an exponential distribution,
$$\ln P(x) \propto -x. \qquad (5)$$
So a distribution with $0 < \alpha < \infty$ lies between a power law and an exponential distribution.
$V_q$ is a comment on $V_p$, and the parameter $T_y$ describes the semantic relationship between $V_q$ and $V_p$. The symbol $Q$ describes the set of the argument information:
$$Q = \langle T, V, E \rangle \qquad (3)$$
The parameters $V$ and $E$ indicate the information attribute set of views and the relationship set of view nodes, respectively. The parameter $T$ indicates the decision argument task on which the set of views comments.
4 The Argumentation Info-visualization Model
Visual modeling is a useful technique that helps one capture the structure and relationships of a system. The Unified Modeling Language (UML) is a modeling language for specifying, visualizing, constructing, and documenting the artifacts of software systems, as well as for business modeling and other non-software systems [10]. A UML model is made up of one or more diagrams. A diagram graphically represents things and the relationships among these things. Class diagrams are one of the most
fundamental diagram types in UML. The purpose of a class diagram is to depict the classes within a model. In an object-oriented application, classes have attributes (member variables), operations (member functions) and relationships with other classes. UML provides several ways of representing relationships between classes; each relationship represents a different type of connection between classes. Object diagrams specify instances of the classes at a certain point in time and the relations among them. Package diagrams are really special types of class diagrams whose focus is on how classes and interfaces are grouped together. Class diagrams provide a way to capture the structure of a system and the static relationships within it; this is not only effective for argumentation but also easy to maintain [11]. We can design the argument information as a class without operations; that is to say, a comment is an instance of the argument information, and the semantic relationships among comments are the relationships between objects. The argument information, organized in the form of object diagrams, is considered as a node and connected by directed arcs. As shown in Figure 1, "Idea" is the sub-standpoint while the other nodes are direct or indirect evaluations of it. A dynamic directed graph structure including all comments is constructed gradually as the discussion progresses. The graph reduces the interference of similar views on the experts, and the experts can concentrate on hot views and useful information.
Fig. 1. The Info-Visualization Model
Fig. 2. The package of the task
According to its structure and nature, the decision-making task is divided into several sub-issues when necessary. The argumentation information related to the same issue can
be grouped into a package. It is easy for experts to capture the state of the group argument with the argumentation info-visualization model (see Figure 1 and Figure 2). To help convey the correct information depending on the user's goals, we offer several views of the argumentation information, such as the task view (a diagram emphasizing the set of information related to a task selected by the user) and the subject view (a diagram focused on the information related to one "Idea"). What's more, different users may need to operate at different scales, so they are permitted to determine which attributes of the argumentation information are shown.
5 The Algorithms to Analyze Consensus
It is important to discern the consensus state correctly and in real time when organizing group argumentation in HWME. The contrast between conflicting views determines the consensus state, and this contrast is represented by the argument information in HWME [12]. The goal of the argument is to select the best one from the options named "Idea" in this paper, so we only need to focus on the "Idea" nodes. As analyzed in Section 3, the effective interactions lead to the network structure of a responded environment, characterized by a directed graph consisting of vertices and directed edges. Each vertex has three quality attributes: an attention quality attribute, an agreement quality attribute, and a disagreement quality attribute. These attributes have default values and may change as the argumentation proceeds. From the graph, we can obtain the consensus state by evaluating the degree of contrast between conflicting views. Next, we introduce the approach.
1. The default value of a vertex is equal to the value (weight) of the expert who states it:
$$\text{Value}(V_{spokesman}) = \text{Value}(spokesman) \qquad (4)$$
2. The value of an edge depends on the semantic relationship:
$$\text{Value}(E) = \begin{cases} 1 & \text{if it means "Argues for"}\\ a & \text{if it means "Informs"}\\ 0 & \text{if it means "Related"}\\ -a & \text{if it means "Queries"}\\ -1 & \text{if it means "Argues against"} \end{cases} \qquad (5)$$
The variable $a$, ranging from 0 to 1, is assigned by the expert when he/she comments.
3. $At(V)$ is short for the value of the attention quality attribute, $A(V)$ for the value of the agreement quality attribute, and $O(V)$ for the value of the disagreement quality attribute. Their default value is zero.
4. If $V_q$ comments on $V_p$ and $E$ denotes the semantic relationship between $V_q$ and $V_p$, then
$$A(V_p) = \sum_{V_q \to V_p} \text{Value}(V_q)\cdot\text{Value}(E), \quad \text{if } \text{Value}(E) > 0 \qquad (6)$$
$$O(V_p) = \sum_{V_q \to V_p} \text{Value}(V_q)\cdot\text{Value}(E), \quad \text{if } \text{Value}(E) < 0 \qquad (7)$$
$$At(V_p) = \sum_{V_q \to V_p} \big(1 + At(V_q)\big) \qquad (8)$$
$$\text{Value}(V_p) = A(V_p) + O(V_p) + \text{Value}(V_p) \qquad (9)$$
5. We can calculate these variables for every vertex iteratively, beginning with the vertices whose in-degree is zero (a code sketch of this procedure is given after the list). We then obtain the consensus state by comparing these variables of each "idea":
– It implies that most of the experts agree with the "idea" (v) if A(V) >> |O(V)|.
– It implies that most of the experts disagree with the "idea" (v) if A(V) << |O(V)|.
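As an illustration only, the following Python sketch implements steps 1-5 under the stated rules, assuming the comment graph is a directed acyclic graph given as a vertex-value map plus an edge list; the function and variable names are hypothetical and not part of the HWME system.

```python
from collections import defaultdict, deque

def evaluate_consensus(values, edges):
    # values: {vertex: weight of the expert who stated it}                 (Eq. (4))
    # edges:  list of (q, p, e) meaning V_q comments on V_p with value e   (Eq. (5))
    A = defaultdict(float)      # agreement attribute,    Eq. (6)
    O = defaultdict(float)      # disagreement attribute, Eq. (7)
    At = defaultdict(float)     # attention attribute,    Eq. (8)
    value = dict(values)

    preds = defaultdict(list)   # p -> [(q, e), ...]: comments received by p
    succs = defaultdict(list)
    indeg = {v: 0 for v in values}
    for q, p, e in edges:
        preds[p].append((q, e))
        succs[q].append(p)
        indeg[p] += 1

    # step 5: process vertices iteratively, starting from in-degree-zero vertices
    ready = deque(v for v, d in indeg.items() if d == 0)
    while ready:
        v = ready.popleft()
        for q, e in preds[v]:
            if e > 0:
                A[v] += value[q] * e
            elif e < 0:
                O[v] += value[q] * e
            At[v] += 1 + At[q]
        value[v] = A[v] + O[v] + value[v]            # Eq. (9)
        for p in succs[v]:
            indeg[p] -= 1
            if indeg[p] == 0:
                ready.append(p)
    return A, O, At, value

# toy example: an "Idea" I1 with two supporting comments and one query (a = 0.5)
vals = {"I1": 1.0, "C1": 0.8, "C2": 0.6, "C3": 0.7}
eds = [("C1", "I1", 1), ("C2", "I1", 1), ("C3", "I1", -0.5)]
A, O, At, val = evaluate_consensus(vals, eds)
print(A["I1"], O["I1"], At["I1"])   # 1.4, -0.35, 3.0
```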
T > R > P > S and 2R > T + S. When both firms adopt the collaborative strategy, the total payoff is largest. When the agents adopt different strategies, the party adopting the collaborative strategy suffers a loss; in this case the total payoff is less than the total payoff the firm agents gain when both adopt the collaborative strategy.

Table 1. The payoff matrix

              collaborator   competitor
collaborator  R, R           S, T
competitor    T, S           P, P
3.2 The Action Rules of Agents
1. The game rules of agents. Each firm agent plays the Prisoner's Dilemma game with its surrounding neighbors. In this model we simplify the payoff values: if both firms adopt the collaborative strategy, each of them gets one unit of payoff; if one adopts the collaborative strategy and the other the competitive strategy, the collaborating party gets zero units and the competing party gets R units.
2. The learning (evolution) rules of agents. Firm agents are placed in Moore or Von Neumann neighborhoods and play with their surrounding neighbors. When all firm agents have finished playing, we say that one generation of agents has completed the game. In every neighborhood, the firm agents that obtained higher payoffs are copied into the next generation; that is, all agents have the ability to learn. After finishing the game, each agent learns the strategy of the agent who obtained the highest payoff in the last game and then enters the next game. Finally, the new generation replaces the old generation.
3. The mobility rules of agents. In reality, collaborative competition among enterprises is not limited to a local place; firms can collaborate and compete with each other globally. So we assume the agents can move after each cycle, but we only consider the simple situation. In
every simulation cycle, each firm agent selects a direction at random. (In further research we will consider other movements, such as directed movement.) In the random-movement situation, if the grid cell in the chosen direction is empty and no other firm agent points to that cell, the firm agent moves to it in the next simulation cycle; otherwise, the firm agent stays put.
3.3 Design of the Simulation Rules
The flow of the simulation is as follows (a minimal code sketch is given after the list):
1. Initialize the location, strategy and payoff of the agents. The type of each firm agent is determined by the strategy it adopts.
2. Agents identify the types of their surrounding neighbors, play with them, and accumulate their own earnings.
3. Find the agent whose payoff is largest in each neighborhood, learn its strategy and copy it completely. At this point, the new generation of agents replaces the old generation that has played.
4. The new generation moves in the grid according to the mobility rules, and the next simulation cycle begins.
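The following Python sketch, offered only as an illustration, implements the game, imitation and random-movement steps above on a toroidal grid with the parameter values quoted in Section 4. The defector-defector payoff is taken as zero and the tie-breaking rule for two agents claiming the same empty cell is a simplification; all names are illustrative.

```python
import random

SIZE, P_FIRM, P_COLLAB, R = 35, 0.8, 0.5, 1.1   # initial parameters from Section 4

def neighbors(cell, kind="von_neumann"):
    x, y = cell
    d4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    d8 = d4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for dx, dy in (d4 if kind == "von_neumann" else d8):
        yield ((x + dx) % SIZE, (y + dy) % SIZE)

def payoff(s, t):
    # simplified payoffs of Section 3.2: C/C -> 1 each, D exploiting C -> R,
    # and (as an assumption of this sketch) 0 in all other cases
    if s == "C" and t == "C":
        return 1.0
    if s == "D" and t == "C":
        return R
    return 0.0

def init_grid():
    return {(x, y): ("C" if random.random() < P_COLLAB else "D")
            for x in range(SIZE) for y in range(SIZE)
            if random.random() < P_FIRM}

def step(grid, kind="von_neumann"):
    # 1. every firm plays the game with its neighbours and accumulates payoffs
    score = {c: sum(payoff(s, grid[n]) for n in neighbors(c, kind) if n in grid)
             for c, s in grid.items()}
    # 2. each firm imitates the highest-scoring agent in its neighbourhood (itself included)
    imitated = {c: grid[max([c] + [n for n in neighbors(c, kind) if n in grid],
                            key=score.get)]
                for c in grid}
    # 3. random movement: move to the chosen neighbouring cell only if it is empty
    new = {}
    for c, s in imitated.items():
        target = random.choice(list(neighbors(c, kind)))
        if target not in imitated and target not in new:
            new[target] = s
        else:
            new[c] = s
    return new

grid = init_grid()
for t in range(100):
    grid = step(grid)
print(sum(1 for s in grid.values() if s == "C"), "collaborators of", len(grid))
```

Changing `kind` to "moore" in `step` reproduces the Moore-neighborhood variant discussed in Section 4.2.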
4 Simulation Results
The simulation parameters are set initially as follows: the size of the market is 35×35, the proportion of firms is 0.8, the ratio of collaborative agents is 0.5, the neighborhood type is Von Neumann, and R in the payoff matrix equals 1.1. The result of the simulation is shown in Figure 3 and Figure 4. The agents are distributed randomly in a 35×35 grid. When the simulation time is twelve, the number of collaborators exceeds the number of competitors. Gradually, the payoff of a collaborator becomes higher than that of a competitor, so most firm agents adopt the collaborative strategy. We find that the collaborators take over the whole market when the simulation time reaches 633.
Fig. 3. The initial parameters
Fig. 4. Iterate 12 times
Observe the market data map shown in Figure 5, where the red curve represents competitors and the blue curve represents collaborators. In the initial state, the number of collaborators equals that of competitors. At first the competitors outnumber the collaborators; subsequently, the collaborators outnumber the competitors. The whole market evolves in the direction of cooperation.
Fig. 5. The data map in the initial parameters
Fig. 6. The ratio of collaborators is 0.8
As the model involves a number of parameters, the values of the different parameters have a direct impact on the simulation results. We therefore observe the evolution of the firms' collaborative competition by changing the ratio of collaborators, the type of neighborhood, and the value of R in the payoff matrix.
4.1 Changing the Proportion of Collaborators
Changing the ratio of collaborators in the above simulation parameters to 0.8, as shown in Figure 6, collaboration outweighs competition and the whole market evolves toward collaboration. When the ratio is raised to 0.9, the whole market again evolves toward collaboration, and compared with the previous simulations it reaches the fully collaborative state in the shortest time. We then lower the initial ratio of collaborators and set it to 0.2. From Figure 7 we can see that the market gradually evolves from competition to collaboration. When the ratio drops to 0.1, one possible outcome is that the market remains competitive and eventually evolves into pure competition, that is, no collaborators remain in the market. Another outcome is shown in Figure 8: the number of collaborators gradually exceeds the number of competitors and the market finally evolves into the collaborative state. After several rounds of simulation, we found that when the collaborators are initially scattered, the market ultimately evolves toward competition, whereas when the collaborators are initially clustered, the market gradually evolves into a cooperative state after a long period of time. From changing the ratio of collaborators, we can draw the following conclusions:
1. When collaboration initially outweighs competition in the market, short-term competition may increase a firm's profits for a while, but if a firm wants to increase profits over the long term, the collaborative strategy is the best approach. At the same time, the whole market quickly evolves into a cooperative and healthy state.
2. When competition initially outweighs collaboration in the market, firms in such a competitive market should adopt the collaborative approach to keep increasing profits rather than the competitive strategy. But if competition among firms far exceeds collaboration, there are two cases. If the collaborative firms are densely clustered, competitive firms change their strategy to collaboration in order to increase profits. On the contrary, if the collaborative firms are scattered in the market, the whole market ends up in a competitive environment and finally no collaborative firms remain. This also illustrates that collaboration within firm clusters is necessary to keep the industry on a track of sound progress.
Fig. 7. The ratio of collaborators is 0.2
Fig. 8. The ratio of collaborators is 0.1
4.2 Changing the Type of Neighborhood
We change the initial neighborhood type to Moore and keep the other parameters. The simulation results are shown in Figure 9. The curves become steeper; that is, the numbers of collaborators and competitors change more rapidly than with the Von Neumann neighborhood. Moreover, the entire market takes less time to evolve into the cooperative state. From changing the neighborhood type, it can be concluded that when a firm cooperates with other firms in a single trade, the larger the number of collaborators, the easier it is for the firm to adopt collaborative strategies to increase revenue, and thereby to establish itself in the market for a long time.
Fig. 9. Moore neighbors
Fig. 10. R=1.5
4.3 Changing the Payoff Value R
We change the value of R in the payoff matrix. When R = 1.5, as shown in Figure 10, after the beginning of the simulation the number of competitors rises rapidly and far exceeds the number of collaborators; the market finally evolves into a competitive state. If we continue increasing R, at R = 1.8 the market is quickly filled with competitors and driven into intense competition. From these changes of R, we find that in the competition-collaboration game, the higher the payoff of the competitors, in particular when it is much greater than the payoff of the collaborators, the more easily firms adopt the competitive strategy to increase profits, and finally the whole market is driven into cutthroat competition.
5 Conclusions
The collaborative competition of firms in reality exhibits complex and dynamic features of evolution. We use the advantages of multi-agent technology in modeling complex systems and combine the ideas of Holland's ECHO model to build a collaborative
competition model that contains the firms' individual features and many mechanisms of action. By changing the agents' location, strategy and payoff parameters, we simulate the learning behavior of firm agents and the interactions among agents. By creating the mechanism through which agents respond to collaborative competition, we observe the interaction between the environment and the agents. In the simulation, we explore the inherent laws of the collaborative competition between agents. As the collaborative competition between firms is a complicated process, many issues and details have not yet been taken into account in the model. In our future study, we will further consider the credit and punishment of agents, and we should design a mechanism for mergers between agents. By changing the simulation parameters, we will study in depth the mechanisms of the evolution of collaborative competition and its influencing factors.
References [1] Lee, M., Lee, J., Jeong, H.-J., Lee, Y., Choi, S., Gatton, T.M.: A cooperation model using reinforcement learning for Multi-agent, pp. 675–681. Springer, Heidelberg (2006) [2] Wang, T.-D., Fyfe, C.: Simulation of cooperation for price competition in oligopolies, pp. 718–725. Springer, Heidelberg (2006) [3] Axelrod, R.M.: The complexity of cooperation: Agent-based models of competition and collaboration. Princeton University Press, Princeton (1997) [4] Bengtsson, M., Kock, S.: Cooperation and competition in relationships between competitors in business networks. Journal of Bussiness & Industrial Marketing, 178–193 (1999) [5] Eriksson, K., Sharma, D.D.: Modeling uncertainty in Buyer-Seller cooperation. Journal of Business Research, 961–970 (2003) [6] Hausman, A., Fohnston, W.f., Oyedele, A.: Cooperative adoption of complex systems: a comprehensive model within and across networks. Journal of Business & Industrial Marketing, 200–210 (2005) [7] Mayoh, B.: Evolution of cooperation in Multi-agent Systems, pp. 701–710. Springer, Heidelberg (2002) [8] Khojasteh, M.R., Meybodi, M.R.: Evaluating learning automata as a model for cooperation in complex Multi-agent domains, pp. 410–417. Springer, Heidelberg (2007) [9] Esmaeilie, M., Aryanezhad, M.-B., Zeephongsekul, P.: A game theory approach in sellerbuyer supply chain. European Journal of Operational Research, 10–16 (2008) [10] Min, Z., Feiqi, D., Sai, W.: Coordination game model of co-opetition relationship on cluster supply chains. Journal of Systems Engineering and Electronics, 499–506 (2008) [11] Hulsmann, M., Grapp, J., Li, Y.: Strategic adaptivity in global supply chains-competitive advantage by autonomous cooperation. Int. J. Production Economics, 14–26 (2008) [12] Burkov, A., Boularias, A., Chaib-draa, B.: Competition and Coordination in Stochastic Games, pp. 26–37. Springer, Heidelberg (2007) [13] Holland, J.H.: Hidden order: How adaptation builds complexity. Addison Wesley Publishing Company, New York (1995)
Fuzzy Optimal Decision for Network Bandwidth Allocation with Demand Uncertainty
Lean Yu¹, Wuyi Yue², and Shouyang Wang¹
¹ Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China {yulean,sywang}@amss.ac.cn
² Department of Intelligence and Informatics, Konan University, Kobe 658-8501, Japan [email protected]
Abstract. In this paper, a fuzzy methodology is proposed to optimize network bandwidth allocation with demand uncertainty in communication networks (CNs). In the proposed methodology, uncertain traffic demands are first handled by fuzzification. Then a fuzzy optimization methodology is presented for the network bandwidth allocation problem with consideration of the trade-off between resource utilization and service performance in CNs. Accordingly, the optimal network bandwidth is allocated to obtain maximum network revenue in CNs. Finally, a numerical example is presented for the purpose of illustration.
1 Introduction
Optimal decision for network bandwidth allocation is one of the most important issues in communication networks (CNs); it is closely related to resource utilization, performance stability and network revenue management. In many past studies, network bandwidth optimization was usually formulated as a deterministic multicommodity flow (MCF) model, where the demand of each network channel was assumed to be a deterministic quantity (Yu et al., 2008; Wu et al., 2006). However, the deterministic MCF model may be unsuitable when the network demands are uncertain. In the presence of demand uncertainty, we cannot know the exact network demand, and thus it is difficult for network service providers to allocate an optimal network bandwidth capacity. If a large network bandwidth capacity is allocated, the possibility that the network bandwidth is fully utilized decreases; furthermore, the over-provisioned network bandwidth leads to extra maintenance costs. To ensure effective network bandwidth utilization, the provisioned network bandwidth capacity should be small, but with a small capacity the network may not satisfy the possible traffic demands, which increases the risk of reduced network revenue. Also, an under-provisioned network bandwidth capacity may degrade network service performance in CNs, causing network congestion or traffic jams. For these reasons, it is important for network service providers to allocate an optimal network bandwidth capacity in an environment of network demand uncertainty. In past studies, the uncertain demand was usually treated as a stochastic variable in order to allocate an optimal network bandwidth capacity in CNs (Wu et al., 2006; Mitra and
Wang, 2005). But in many practical applications, the network demands are often considered as possibilistic situations where the demands usually vary within the confidence interval due to uncertain environment. In such cases, uncertain network demand can be reasonably treated as a fuzzy number corresponding to the confidence interval. Based on the fuzzification treatment for network demands, this paper proposes a fuzzy method to optimize network bandwidth allocation in CNs. In the proposed methodology, uncertain traffic demand is first handled by a fuzzification way. Then a basic analysis of the network bandwidth allocation decision problem with the fuzzy traffic demand is provided. Finally, some important results about optimal network bandwidth allocation are derived based on the fuzzy optimization methodology. The main purpose of this paper is to make an optimal decision for network bandwidth allocation so that maximum profits from serving network demands are obtained. The rest of this paper is organized as follows. Section 2 presents some preliminaries about fuzzy theory. In Section 3, a fuzzy optimization methodology for network bandwidth allocation with demand uncertainty is formulated. For illustration, a numerical example is shown in Section 3. Section 4 concludes the paper.
2 Preliminaries about Fuzzy Number and Fuzzy Integral
In order to apply fuzzy theory to formulate the bandwidth allocation decision problem, some preliminary definitions about fuzzy numbers and fuzzy integrals are presented. Interested readers can refer to Dubois and Prade (1980) for more details about fuzzy theory.

Definition 1. A triangular fuzzy number $\tilde D$ is a fuzzy set of the real line $R = (-\infty, +\infty)$ whose membership function $\mu_{\tilde D}(B)$ has the following form:
$$\mu_{\tilde D}(B) = \begin{cases} L(B) = \dfrac{B-l}{m-l}, & l \le B \le m\\[4pt] R(B) = \dfrac{r-B}{r-m}, & m \le B \le r\\[4pt] 0, & \text{otherwise} \end{cases} \qquad (1)$$
where $L(B)$ and $R(B)$ are the left-shape and right-shape functions, respectively, of the fuzzy number $\tilde D$, and $l$, $m$ and $r$ are the left, middle and right values of the defining intervals of the fuzzy membership function, with $-\infty < l < m < r < +\infty$. For example, as shown in Fig. 1, when $B \in [l, m]$ the fuzzy membership function is $L(B)$; similarly, if $B \in [m, r]$ the fuzzy membership function is $R(B)$.

Definition 2. Let $F$ be the family of fuzzy sets on the real number set $R$. For each $\tilde D \in F$, we have an α-cut or α-level set $D(\alpha) = \{B \mid \mu_{\tilde D}(B) \ge \alpha\} = [D_l(\alpha), D_r(\alpha)]$ ($0 \le \alpha \le 1$, $l < r$).

Definition 3. Let $\lambda \in [0,1]$ be a predetermined parameter; then the total λ-integral value $I_\lambda(\tilde D)$ of $\tilde D$ can be defined as
$$I_\lambda(\tilde D) = (1-\lambda)I_L(\tilde D) + \lambda I_R(\tilde D) \qquad (2)$$
where $\tilde D$ is defined in Eq. (1), and $I_L(\tilde D)$ and $I_R(\tilde D)$ are the left and right integral values of $\tilde D$, which are given by
$$I_L(\tilde D) = \int_0^1 L^{-1}(\alpha)\,d\alpha, \qquad I_R(\tilde D) = \int_0^1 R^{-1}(\alpha)\,d\alpha \qquad (3)$$
where $L^{-1}(\alpha)$ and $R^{-1}(\alpha)$ are the inverse functions of $L(B)$ and $R(B)$, respectively. Usually, the total λ-integral value is used to rank fuzzy numbers.
Remark 1. The parameter $\lambda \in [0,1]$ in Eq. (2) reflects the decision-maker's degree of optimism about the market estimation, so it is also called the "optimistic coefficient". Usually a large λ indicates a high degree of optimism. In particular, $I_0(\tilde D) = I_L(\tilde D)$ (λ = 0) and $I_1(\tilde D) = I_R(\tilde D)$ (λ = 1) represent pessimistic and optimistic decision viewpoints, respectively, while $I_{0.5}(\tilde D) = 0.5[I_L(\tilde D) + I_R(\tilde D)]$ (λ = 0.5) provides a comparison criterion for moderately optimistic decision-makers (Li et al., 2002).
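As a small illustration of Definitions 1-3, the sketch below evaluates the total λ-integral value of a triangular fuzzy number; since the inverse shape functions are linear, the integrals in Eq. (3) have closed forms. The function name and the sample numbers are only examples, not part of the paper.

```python
def lambda_integral(l, m, r, lam):
    # For the triangular number (l, m, r): L^{-1}(a) = l + a(m - l), R^{-1}(a) = r - a(r - m),
    # so the integrals in Eq. (3) evaluate to I_L = (l + m) / 2 and I_R = (m + r) / 2.
    I_L = (l + m) / 2.0
    I_R = (m + r) / 2.0
    return (1 - lam) * I_L + lam * I_R               # Eq. (2)

# e.g. a demand estimate between 100 and 160 with mode 140 (illustrative numbers)
print(lambda_integral(100, 140, 160, lam=0.0))   # pessimistic view: 120.0
print(lambda_integral(100, 140, 160, lam=1.0))   # optimistic view: 150.0
print(lambda_integral(100, 140, 160, lam=0.5))   # moderate view: 135.0
```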
3 Bandwidth Allocation Decision with Demand Uncertainty
In this section, a fuzzy network bandwidth allocation decision method is proposed under uncertainty so that maximum network revenue can be achieved. Let $P(B, D)$ denote the total profit function obtained by transmitting messages in CNs, where $B$ is the network bandwidth capacity and $D$ is the possible network demand. Assume that the network demand is uncertain; a triangular fuzzy number $\tilde D$ with the membership function $\mu_{\tilde D}(B)$ described in Eq. (1) is used to describe the uncertain network demands. In order to obtain the maximum profit, an important problem is how to allocate a suitable network bandwidth capacity to satisfy all possible network demands. Suppose that the network bandwidth $B$ is determined; then the total profit function $\tilde P(B,\tilde D)$ of the network can be formulated as
$$\tilde P(B,\tilde D) = a\tilde D - cB - h\max\{0,\, B - \tilde D\} - s\max\{0,\, \tilde D - B\} \qquad (4)$$
where $\tilde P(B,\tilde D)$ is in the fuzzy sense associated with the fuzzy demand $\tilde D$; $a$ and $c$ are the unit revenue for serving network demands and the unit cost for each network bandwidth allocation, respectively; $h$ is the unit maintenance cost for redundant bandwidth capacity; and $s$ is the unit penalty cost for each unsatisfied traffic demand caused by network congestion or traffic jams. Now we would like to know, in the situation of demand uncertainty, how to allocate an optimal network bandwidth capacity $B^*$ to obtain the maximum profit from the CN system. To avoid unrealistic and trivial cases, we assume $0 < h < c < s < a < +\infty$. From Eq. (4), it is easy to see that the total profit function $\tilde P(B,\tilde D)$ depends on the uncertain demand $\tilde D$. Thus the total profit function $\tilde P(B,\tilde D)$ is also a fuzzy number, which has the same membership grade as the fuzzy demand $\tilde D$. That is, $\tilde D$ and $\tilde P(B,\tilde D)$ have the same shape of membership function, as illustrated in Fig. 1. According to Definition 2, we let $P(\alpha)$ denote the α-cut of $\tilde P(B,\tilde D)$. From Definition 1 and Fig. 1, it is easy to see that there are two typical scenarios with consideration of the values of the network demand $D$. In this paper, when the market estimation
Fig. 1. The membership function μD˜ (B) of triangular fuzzy number D˜
for the network demand is optimistic, it is suitable to select the right-shape function $R(B)$ ($m \le B \le r$) as the range for the network bandwidth capacity. On the contrary, if the market estimation is pessimistic, selecting the left-shape function $L(B)$ ($l \le B \le m$) as the range for designing the network bandwidth capacity is suitable. Here $m$, $r$, $l$ are defined in Section 2. According to the two scenarios, we have the following two propositions.

Proposition 1. If the network demand is estimated to be a pessimistic scenario, then the α-cut of $\tilde P(B,\tilde D)$ can be represented as follows:
$$P(\alpha) = \begin{cases} \big[\,aL^{-1}(\alpha) - cB - h(B - L^{-1}(\alpha)),\; aR^{-1}(\alpha) - cB - s(R^{-1}(\alpha) - B)\,\big], & 0 \le \alpha \le L(B),\\[4pt] \big[\,aL^{-1}(\alpha) - cB - s(L^{-1}(\alpha) - B),\; aR^{-1}(\alpha) - cB - s(R^{-1}(\alpha) - B)\,\big], & L(B) \le \alpha \le 1. \end{cases} \qquad (5)$$

Proof. In the pessimistic scenario, the network bandwidth capacity $B$ lies between $l$ and $m$, and the membership grade $\mu_{\tilde P}$ is the same as $L(\cdot)$. If $\alpha \le L(B)$, then the lower bound of the α-cut of $\tilde P(B,\tilde D)$ is $aL^{-1}(\alpha) - cB - h(B - L^{-1}(\alpha))$, because the network bandwidth capacity is greater than the network demand by an amount $(B - L^{-1}(\alpha))$. Also, the upper bound of the α-cut of $\tilde P(B,\tilde D)$ is $aR^{-1}(\alpha) - cB - s(R^{-1}(\alpha) - B)$, because the network bandwidth capacity does not satisfy the traffic demand. If $\alpha \ge L(B)$, then the network bandwidth capacity is always insufficient for the traffic demand defined in the α-cut. Thus the lower bound of the α-cut of $\tilde P(B,\tilde D)$ is $aL^{-1}(\alpha) - cB - s(L^{-1}(\alpha) - B)$ and the upper bound is $aR^{-1}(\alpha) - cB - s(R^{-1}(\alpha) - B)$. Thus when the network bandwidth capacity follows the left-shape function $L(B)$, the α-cut of $\tilde P(B,\tilde D)$ can be described as Eq. (5).

Likewise, when the network bandwidth capacity $B$ lies between $m$ and $r$, a similar proposition can be obtained, as shown below. Because the proof of Proposition 2 is similar to that of Proposition 1, it is omitted.

Proposition 2. If the network demand is estimated to be an optimistic scenario, then the α-cut of $\tilde P(B,\tilde D)$ can be represented as follows:
$$P(\alpha) = \begin{cases} \big[\,aL^{-1}(\alpha) - cB - h(B - L^{-1}(\alpha)),\; aR^{-1}(\alpha) - cB - s(R^{-1}(\alpha) - B)\,\big], & 0 \le \alpha \le R(B),\\[4pt] \big[\,aL^{-1}(\alpha) - cB - h(B - L^{-1}(\alpha)),\; aR^{-1}(\alpha) - cB - h(B - R^{-1}(\alpha))\,\big], & R(B) \le \alpha \le 1. \end{cases} \qquad (6)$$
Now the main task is to make the optimal decision for network bandwidth allocation from the α-cut of $\tilde P(B,\tilde D)$ in a CN system. As previously mentioned, the total profit $\tilde P(B,\tilde D)$ is a fuzzy number, so it can be ranked by existing methods for ranking fuzzy numbers. The network bandwidth capacity with the maximum profit $\tilde P(B,\tilde D)$ is the optimal network bandwidth capacity to be designed. In past studies, there were many methods for fuzzy number ranking (Chen and Hwang, 1992). However, most methods require the explicit form of the membership functions of all fuzzy numbers to be ranked, which is impossible in some cases. The method of Yager (1981), later modified by Liou and Wang (1992), does not require knowledge of the membership functions and can thus be applied. Using the ranking method in Definition 3, we have the following theorem.

Theorem 1. If the uncertain network demand is fuzzified into a triangular fuzzy number, then the optimal bandwidth capacity $B^*$ satisfies the following equation:
$$\lambda R(B^*) - (1-\lambda)L(B^*) = 2\lambda - \frac{2(s-c)}{s+h}. \qquad (7)$$
Proof. According to Definition 3 and Eqs. (2), (3), (5) and (6), we can calculate the corresponding total λ-integral value $I_\lambda(\tilde P)$ of $\tilde P(B,\tilde D)$ as follows:
$$\begin{aligned} I_\lambda(\tilde P) &= (1-\lambda)I_L(\tilde P) + \lambda I_R(\tilde P)\\ &= (1-\lambda)\Big[2(s-c)B - B(s+h)L(B) + (a+h)\int_0^{L(B)} L^{-1}(\alpha)\,d\alpha + (a-s)\int_{L(B)}^{1} L^{-1}(\alpha)\,d\alpha + (a-s)\int_0^1 R^{-1}(\alpha)\,d\alpha\Big]\\ &\quad + \lambda\Big[-2(c+h)B + B(s+h)R(B) + (a-s)\int_0^{R(B)} R^{-1}(\alpha)\,d\alpha + (a+h)\int_{R(B)}^{1} R^{-1}(\alpha)\,d\alpha + (a+h)\int_0^1 L^{-1}(\alpha)\,d\alpha\Big]. \end{aligned}$$
Using the above $I_\lambda(\tilde P)$, we can derive the optimal bandwidth with fuzzy demand. The first-order derivative of $I_\lambda(\tilde P)$ with respect to $B$ is
$$\frac{\partial I_\lambda(\tilde P)}{\partial B} = (1-\lambda)\big[2(s-c) - (s+h)L(B)\big] + \lambda\big[-2(c+h) + (s+h)R(B)\big]. \qquad (8)$$
The second-order derivative of $I_\lambda(\tilde P)$ with respect to $B$ is
$$\frac{\partial^2 I_\lambda(\tilde P)}{\partial B^2} = -(1-\lambda)(s+h)L'(B) + \lambda(s+h)R'(B). \qquad (9)$$
263
Corollary 1. If L(B∗ ) = (B∗ − l) (m − l) and R(B∗ ) = (r − B∗ ) (r − m), then the optimal network bandwidth capacity B∗ is B∗ =
(1 − λ )l(r − m) + λ r(m − l) + [2(s − c) (1 − λ )(r − m) + λ (m − l) −
2λ (s + h)](m − l)(r − m) (s + h) . (1 − λ )(r − m) + λ (m − l)
(10)
Proof. According to Eqs. (1) and (7), we have
λ
B∗ − l 2(s − c) r − B∗ − (1 − λ ) = 2λ − . r−m m−l s+h
By reformulation, the optimal bandwidth B∗ can be represented as Eq. (10).
In terms of different optimistic coefficients λ , we have the following three theorems (Theorems 2-4) and three corollaries (Corollaries 2-4). Theorem 2. If decision-makers have an optimistic market estimation, then the optimal network bandwidth capacity B∗ with fuzzy demand can be calculated by ∗ −1 2(c + h) , for (c + h) ≤ (s − c). (11) B =R s+h Proof. Using Eq. (7) and λ = 1, we have R(B∗ ) = 2 −
2(s − c) 2(c + h) = . s+h s+h
(12)
From Fig. 1, it is easy to find that Eq. (7) should lie between 0 and 1 so that the optimal network bandwidth capacity B∗ lies between m and r. Since c, s, h are larger than zero, the 2(c + h) (s + h) is always positive. The requirement of 2(c + h) (s + h) ≤ 1 implies (c + h) ≤ (s − c). Hence the optimal network bandwidth capacity B∗ is easily calculated, as shown in Eq. (11). If the R(B∗ ) satisfies the definition of Eq. (1), we have the following corollary. Corollary 2. If R(B∗ ) = (r − B∗ ) (r − m), then the optimal bandwidth B∗ is B∗ = r −
2(c + h) (r − m), for (c + h) ≤ (s − c). s+h
(13)
Proof. Combining R(B∗ ) = (r − B∗ ) (r − m) and Eq. (12), we have R(B∗ ) =
2(c + h) r − B∗ = . r−m s+h
By reformulation, the optimal bandwidth B∗ can be expressed as Eq. (13).
264
L. Yu, W. Yue, and S. Wang
Besides the optimistic estimation, other two theorems (Theorems 3 and 4) and corollaries (Corollaries 3 and 4) for pessimistic estimation and moderately optimistic estimation can be obtained, respectively. Since the proofs of them are very similar to the proofs of Theorem 2 and Corollary 2, their proofs are omitted here. Theorem 3. If decision-makers have a pessimistic market estimation, then the optimal network bandwidth capacity B∗ with fuzzy traffic demand is 2(s − c) , for (c + h) ≥ (s − c). (14) B∗ = L−1 s+h Corollary 3. If L(B∗ ) = (B∗ − l) (m − l), then the optimal network bandwidth capacity B∗ can be represented as B∗ = l +
2(s − c) (m − l), for (c + h) ≥ (s − c). s+h
(15)
Theorem 4. If decision-makers have a moderately optimistic market estimation, then the optimal network bandwidth capacity B∗ with fuzzy traffic demand satisfies L(B∗ ) − R(B∗ ) =
2(s − 2c − h) , for (3s − 4c) ≥ h and (4c + 3h) ≥ s. s+h
(16)
Corollary 4. If L(B∗ ) = (B∗ − l) (m − l) and R(B∗ ) = (r − B∗ ) (r − m), then the optimal network bandwidth capacity B∗ is B∗ = m +
2(s − 2c − h) (m − l)(r − m), for (3s − 4c) ≥ h and (4c + 3h) ≥ s. (s + h)(r − l)
(17)
Using the above theorems and corollaries, the optimal decision for network bandwidth allocation can be easily made. For illustration, a numerical example is presented below. Example. Considering a network bandwidth allocation decision problem with a triangular fuzzy demand D˜ = (100, 140, 160). Let the unit revenue, unit construction cost and unit maintenance cost for extra network bandwidth be, respectively, a = 20, c = 6 and h = 2. In the pessimistic market estimation (i.e., λ = 0), the unit penalty cost is s = 8. In such a market situation, we have (c + h) > (s − c). According to the Eq. (15), the optimal bandwidth capacity B∗ = 116. If the market has a moderately optimistic estimation (i.e., λ = 0.5), the unit penalty cost will increase to 12, i.e., s = 12 due to possible increasing demand. In this situation, we have (3s − 4c) ≥ h, and (4c + 3h) ≥ s. Using Eq. (17), the optimal network bandwidth B∗ = 131.11. If the market is estimated to be optimistic, the unit penalty cost will increase to 16, i.e., s = 16. In this situation, we have (c + h) < (s − c). Applying Eq. (13), the optimal network bandwidth B∗ = 143.22.
4 Conclusions In this paper, a fuzzy method was proposed to optimize the network bandwidth allocation with uncertain demands in communication networks (CNs). Through fuzzification
Fuzzy Optimal Decision for Network Bandwidth Allocation
265
processing for uncertain demands, we can obtain the optimal network bandwidth capacity based on different optimistic coefficients. For illustration, a simple numerical example was used to verify the effectiveness of the results about the optimal bandwidth capacity allocation. The experiments reveal that these results can be easily applied to many practical bandwidth allocation decision problems in CNs.
Acknowledgements This work is partially supported by the grants from the National Natural Science Foundation of China (NSFC No. 70221001), the Knowledge Innovation Program of the Chinese Academy of Sciences, and the GRANT-IN-AID FOR SCIENTIFIC RESEARCH (No. 19500070) and MEXT.ORC (2004-2008), Japan.
References 1. Yu, L., Yue, W., Wang, S.: Network bandwidth design under uncertainty. Memoirs of Konan University, Intelligence & Informatics Series 1(1), 91–98 (2008) 2. Wu, J., Yue, W., Wang, S.: Stochastic model and analysis for capacity optimization in communication network. Computer Communications 29(12), 2377–2385 (2006) 3. Mitra, D., Wang, Q.: Stochastic traffic engineering for demand uncertainty and risk-aware network revenue management. IEEE Transactions on Networks 13(2), 221–233 (2005) 4. Dubois, D., Prade, H.: Fuzzy Sets and System: Theory and Applications. Academic Press, New York (1980) 5. Li, L., Kabadi, S.N., Nair, K.P.K.: Fuzzy models for single-period inventory problem. Fuzzy Sets and Systems 132(3), 273–289 (2002) 6. Chen, S.J., Hwang, C.L.: Fuzzy Multiple Attribute Decision Making: Methods and Applications. Springer, Berlin (1992) 7. Yager, R.R.: A procedure for ordering fuzzy subsets of the unit interval. Information Sciences 24(2), 143–161 (1981) 8. Liou, T.S., Wang, M.J.: Ranking fuzzy numbers with integral values. Fuzzy Sets and Systems 50(3), 247–255 (1992)
A Comparison of SVD, SVR, ADE and IRR for Latent Semantic Indexing
Wen Zhang¹, Xijin Tang², and Taketoshi Yoshida¹
¹ School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Ashahidai, Tatsunokuchi, Ishikawa 923-1292, Japan {zhangwen,yoshida}@jaist.ac.jp
² Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, P.R. China [email protected]
Abstract. Recently, singular value decomposition (SVD) and its variants, singular value rescaling (SVR), approximation dimension equalization (ADE) and iterative residual rescaling (IRR), were proposed to conduct the job of latent semantic indexing (LSI). Although they are all based on the same linear algebraic method for term-document matrix computation, namely SVD, the basic motivations behind them concerning LSI differ from each other. In this paper, a series of experiments is conducted to examine their effectiveness for LSI in practical applications of text mining, including information retrieval, text categorization and similarity measure. The experimental results demonstrate that SVD and SVR perform better than the other proposed LSI methods in the above-mentioned applications. Meanwhile, ADE and IRR, because of the too large difference between their approximation matrices and the original term-document matrix in Frobenius norm, cannot achieve good performance in text mining applications using LSI.
Keywords: Latent Semantic Indexing, Singular Value Decomposition, Singular Value Rescaling, Approximation Dimension Equalization, Iterative Residual Rescaling.
1 Introduction
As computer networks become the backbones of science and economy, enormous quantities of machine-readable documents become available. The fact that about 80 percent of business is conducted on unstructured information [1] creates a great demand for efficient and effective text mining techniques, which aim to discover high-quality knowledge from unstructured information. Unfortunately, the usual logic-based programming paradigm has great difficulties in capturing the fuzzy and often ambiguous relations in text documents. For this reason, text mining, also known as knowledge discovery from texts, is proposed to deal with the uncertainty and fuzziness of languages and to disclose hidden patterns (knowledge) among documents. Typically, information is retrieved by literally matching terms in documents with terms of a query. However, lexical matching methods can be inaccurate when they are
used to match a user's query. Since there are usually many ways to express a given concept (synonymy), the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy and homonymy), so terms in a user's query will literally match terms in irrelevant documents. A better approach would allow users to retrieve information on the basis of the conceptual topic or meaning of a document. Latent Semantic Indexing (LSI) attempts to overcome the problem of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval, and assumes that there is some underlying or latent structure in word usage that is partially obscured by variability in word choice [2]. The rest of this paper is organized as follows. Section 2 introduces SVD and the recently proposed LSI methods SVR, ADE and IRR. Section 3 describes information retrieval, text categorization and similarity measure, the practical text mining applications used to examine the SVD-based LSI methods. Section 4 reports a series of experiments showing the performance of the SVD-based LSI methods on real datasets, which include an English and a Chinese corpus. Finally, concluding remarks and further research are given in Section 5.
2 SVD-Based LSI Methods This section introduces the SVD-based LSI methods, which include SVD, SVR, ADE and IRR. 2.1 Singular Value Decomposition The singular value decomposition is commonly used in the solution of unconstrained linear least square problems, matrix rank estimation, and canonical correlation analysis [3]. Given an m × n matrix A , where without loss of generality m ≥ n and rank ( A) = r , the singular value decomposition of A , denoted by SVD(A) , is defined as
$$A = U\Sigma V^{T} \qquad (1)$$
where $U^{T}U = V^{T}V = I_n$ and $\Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_n)$, with $\sigma_i > 0$ for $1 \le i \le r$ and $\sigma_j = 0$ for $j \ge r+1$. The first $r$ columns of the orthogonal matrices $U$ and $V$ define the orthogonal eigenvectors associated with the $r$ nonzero eigenvalues of $AA^{T}$ and $A^{T}A$, respectively. The columns of $U$ and $V$ are referred to as the left and right singular vectors, respectively, and the singular values of $A$ are defined as the diagonal elements of $\Sigma$, which are the nonnegative square roots of the $n$ eigenvalues of $AA^{T}$.
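For concreteness, the following sketch computes the rank-k approximation used for LSI from Eq. (1) with NumPy; the matrix and the value of k are arbitrary examples.

```python
import numpy as np

def lsi_svd(A, k):
    # Rank-k latent semantic approximation of a term-document matrix A (Eq. (1))
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

A = np.random.rand(200, 50)        # 200 terms x 50 documents (toy data)
A_k = lsi_svd(A, k=10)
print(np.linalg.norm(A - A_k))     # Frobenius-norm approximation error
```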
2.2 Singular Value Rescaling
The basic idea behind SVR is that the “noise” in original document representation vectors is from the minor vectors, that is, the vectors far from representative vectors.
Thus, we need to augment the influence of representative vectors and reduce the influence of minor vectors in the approximation matrix [4]. Following this idea, SVR adjusts the differences between major dimensions and minor dimensions in the approximation matrix by rescaling the singular values in Σ. The rationale of SVR can be expressed as Equation (2):
$$A = U\Sigma^{\alpha}V^{T} \qquad (2)$$
We can see that the difference between SVR in Equation (2) and SVD in Equation (1) is that the singular values in Σ are raised to an exponent α; that is, SVD can be regarded as the special case of SVR with α = 1. If we want to enlarge the differences between major dimensions and minor dimensions, Σ can be adjusted with α greater than 1; otherwise, Σ can be adjusted with α less than 1. With this method, the vectors carrying the major semantics of the documents are amplified so that they are clearly distinguished from the noisy vectors in the documents.
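A minimal sketch of this rescaling follows, assuming (as in the experiments of Section 3) that SVR is combined with truncation to the k retained dimensions; the function name and the default α = 1.35 follow the parameter setting quoted later and are otherwise illustrative.

```python
import numpy as np

def lsi_svr(A, k, alpha=1.35):
    # SVR approximation (Eq. (2)): truncate to rank k and raise the retained
    # singular values to the power alpha (alpha = 1 recovers plain truncated SVD)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(sigma[:k] ** alpha) @ Vt[:k, :]
```

2.3 Iterative Residual Rescaling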
Most of the content in this section can be regarded as a simplified introduction to reference [5]. Briefly, IRR conjectures that SVD removes two kinds of "noise" from the original term-document matrix: outlier documents and minor terms. However, if the aim is to characterize the relationships between the documents in a text collection rather than to look for the representative documents in the collection, that is, if we do not want to eliminate the outlier documents, then IRR is very useful for retaining the outlier documents in the approximation matrix while eliminating the minor dimensions (terms). In detail, two aspects make IRR different from SVD. The first is that the document vectors are rescaled by multiplying each by a constant, namely its Euclidean length raised to a common rescaling factor. In this way, the residual outlier documents left after subtracting the major eigenvectors are amplified more and more. The second difference from SVD is that only the left eigenvector with the largest eigenvalue is retained as a basis vector in each iteration and subtracted from the original matrix to produce the residual matrix. With these two differences, the outlier document vectors become major vectors in the residual matrix and are extracted as basis vectors to reconstruct the approximation matrix.
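The following is a simplified, illustrative sketch of the IRR iteration as summarized above (rescale residual document vectors by their length to a power q, take the dominant left singular vector, and subtract its contribution); it omits details of the full algorithm in [5], and all names are illustrative.

```python
import numpy as np

def lsi_irr(A, k, q):
    # A: term-document matrix (terms x documents); k: number of basis vectors;
    # q: rescaling factor applied to residual document-vector lengths
    R = A.astype(float).copy()
    basis = []
    for _ in range(k):
        scale = np.linalg.norm(R, axis=0) ** q        # per-document rescaling
        u = np.linalg.svd(R * scale, full_matrices=False)[0][:, 0]
        basis.append(u)
        R = R - np.outer(u, u @ R)                    # remove that direction
    B = np.column_stack(basis)
    return B @ (B.T @ A)                              # approximation in the IRR subspace
```

2.4 Approximation Dimension Equalization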
Based on the observation that singular values have the characteristic of low-rank-plus-shift structure [6], ADE flattens out the first k largest singular values with a fixed value, and uses other small singular values to relatively equalize the dimension weights after SVD decomposition. ADE extends the ability of SVD to compute the singular vectors and values of a large training matrix by implicitly adding additional ones with relatively equal weights to realize "extrapolating" the singular values [7]. With this method, ADE intends to improve the performance of information retrieval because document vectors will be flattened to become more similar to each other than before. In essence, we can regard
ADE as a method of reducing the discriminative power of some dimensions while enlarging the differences of other dimensions with minor singular values, so that document vectors in a certain range appear more similar after the ADE process, while the differences between documents in this range and documents outside it are maintained. More specifically, ADE equalizes the singular values in Σ of the approximated SVD matrix of the term-document matrix. For a matrix $A$ with singular values Σ as shown in Equation (3), and a number $k < r$, we define
$$\tilde{I}_k = I_k + \frac{1}{\sigma_k}\Sigma - \frac{1}{\sigma_k}\Sigma_k \qquad (3)$$
This diagonal matrix is illustrated graphically in Figure 1. After obtaining $\tilde{I}_k$, we use it to replace $\Sigma_k$ and approximate the term-document matrix by Equation (4).

Fig. 1. Combining dimension weights to form $\tilde{I}_k$

$$\tilde{A}_k = U_k \tilde{I}_k V_k^{T} \qquad (4)$$
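Read diagonally, Eq. (3) sets the k largest dimension weights to 1 and keeps the remaining ones at σi/σk. The sketch below implements this reading and is an illustration rather than a reference implementation; the function name is hypothetical.

```python
import numpy as np

def lsi_ade(A, k):
    # ADE sketch (Eqs. (3)-(4)): flatten the k largest dimension weights to 1
    # and keep the remaining ones at sigma_i / sigma_k
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    w = np.where(np.arange(len(sigma)) < k, 1.0, sigma / sigma[k - 1])
    return U @ np.diag(w) @ Vt
```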
3 Experiment Design In this section, parameter settings for above SVD-based LSI methods are specified and we describe information retrieval, text categorization and similarity measure for evaluation of indexing quality. 3.1 Parameter Setting
For SVD, SVDC and ADE, the only required parameter for them to compute latent subspace is preservation rate, which is equal to k / rank ( A) , where k is the rank of the approximation matrix. In most cases of a term-document matrix A , the number of index terms in A is much larger than the number of documents in A , so we can use the number of documents in A to approximate rank ( A) for computation simplicity. Moreover, the preservation rate of ADE is the proportion of singular values in Σ to be equalized. For example, if the preservation rate is set as 0.1 for ADE, then 10 percent of
singular values in Σ with the largest values will be equalized by replacing them with an identity matrix. For IRR and SVR, besides the preservation rate, a further parameter, a rescaling factor, is needed to compute the latent subspace. To compare the document indexing methods at different parameter settings, the preservation rate is varied from 0.1 to 1.0 in increments of 0.1 for SVD, SVR and ADE. For SVR, the rescaling factor is set to 1.35, as suggested in [4] for optimal average results in information retrieval. For IRR, the preservation rate is set to 0.1 and the rescaling factor is varied from 1 to 10, the same as in [5]. The preservation rate of IRR is set to 0.1 because the residual matrix converges to a zero matrix as the number of iterations increases; that is, the residual matrix approaches a zero matrix when more and more basis vectors are subtracted from the original term-document matrix. Consequently, all the singular vectors extracted at later iterations will be zero vectors if a large preservation rate is set for IRR. 3.2 Information Retrieval
In this research, for English information retrieval, 25 queries, which are uniformly distributed across the 4 categories, are developed to conduct the task of evaluating the semantic qualities of the SVD-based LSI methods. For Chinese information retrieval, 50 queries, which are uniformly distributed across the selected 4 categories, are designed for evaluation. 3.3 Text Categorization
In the experiments, support vector machine with linear kernel is used to categorize the English (Chinese) documents in the corpora. One-against–the-rest approach is used for multi-class categorization and three-fold cross validation is used to average the performance of categorization. 3.4 Similarity Measure
The basic assumption behind the similarity measure is that similarity should be higher for any document pair relevant to the same topic (intra-topic pair) than for any pair relevant to different topics (cross-topic pair). In this research, documents belonging to the same category are regarded as having the same topic, and documents belonging to different categories are regarded as cross-topic pairs. Firstly, all the document vectors in a category are taken out and document pairs are established by combining each document vector in the category with another document vector in the whole corpus. Secondly, the cosine similarity is calculated for each document pair, and then all the document pairs are sorted in descending order of their similarity values. Finally, formulas (5) and (6) are used to compute the average precision of the similarity measure.
$$\text{precision}(p_k) = \frac{\#\text{ of intra-topic pairs } p_j \text{ where } j \le k}{k} \qquad (5)$$
$$\text{average\_precision} = \frac{\sum_{i=1}^{m} p_i}{m} \qquad (6)$$
Here, $p_j$ denotes the document pair that has the jth largest similarity value among all document pairs, $k$ is varied from 1 to $m$, and $m$ is the total number of document pairs. The larger the average precision, the more document pairs in which both documents belong to the same category have larger similarity values than document pairs whose documents are in different categories. Because documents can be similar due to their similar contents or due to the statistical properties identifying their categories, the similarity measure is employed to assess the semantic quality and the statistical quality of the indexing terms jointly.
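As an illustration of Eqs. (5)-(6), the following function computes the average precision from a list of (similarity, is-intra-topic) pairs; the interface is hypothetical.

```python
def average_precision(pairs):
    # pairs: list of (similarity, is_intra_topic) for all document pairs.
    # Sort by similarity, then average precision@k over k = 1..m (Eqs. (5)-(6)).
    ranked = sorted(pairs, key=lambda x: x[0], reverse=True)
    intra, precisions = 0, []
    for k, (_, is_intra) in enumerate(ranked, start=1):
        if is_intra:
            intra += 1
        precisions.append(intra / k)          # precision(p_k)
    return sum(precisions) / len(precisions)  # average over all m pairs

print(average_precision([(0.9, True), (0.7, False), (0.6, True), (0.2, False)]))
```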
4 Results of Experiments This section describes the experimental results of SVD, SVR, ADE and IRR on three kinds of text mining tasks: information retrieval, text categorization and similarity measure. 4.1 The Corpora
The English corpus, Reuters-21578 distribution 1.0 is used for performance evaluation of our proposed method, which is available online (http://www.research.att.com/~lewis) and can be downloaded freely. It collects 21,578 news from Reuters newswire in 1987. Since 1991, it appeared as Reuters-22173 and was assembled and indexed with 135 categories by the personnel from Reuters Ltd in 1996. In this research, the documents from 4 categories as “crude” (520 documents), “agriculture” (574 documents), “trade” (514 documents) and “interest” (424 documents) are assigned as the target English document collection. That is, 2,042 documents from this corpus are selected for evaluation. After stop-word elimination and stemming processing, 50,837 sentences and 281,111 individual words are contained in these documents. As for the Chinese corpus, TanCorpV1.0 is used as our benchmark dataset, which is available in the internet (http://www.searchforum.org.cn/tansongbo/corpus.htm). On the whole, this corpus has 14,150 documents with 20 categories from Chinese academic journals concerning computer, agriculture, politics, etc. In this dissertation, documents from 4 categories as “agriculture”, “history”, “politics” and “economy” are fetched out as target Chinese document collection. For each category, 300 documents were selected randomly from original corpus so that totally 1,200 documents were used which have 219,115 sentences and 5,468,301 individual words in sum after morphological analysis. 4.2 Results on Information Retrieval
We can see from Figure 2 that, on Chinese information retrieval, SVD clearly has the best performance among all the SVD-based LSI methods, while on English information retrieval SVR outperforms all the other SVD-based LSI methods. It seems that the language type or document genre of the corpus has a decisive effect on the performance of SVD and SVR in information retrieval. The semantic quality of SVD is improved by SVR on the Chinese documents, while it is worsened by SVR on the English documents. That is to say, the effectiveness of augmenting the singular values in Σ to
improve semantic quality of document indexing completely depends on the specific documents to be retrieved. The performance of ADE is very stable on Chinese information retrieval at a lower level while on English information retrieval, its local maxima occur at the limits of preservation rates. Its stable performance illustrates that the singular values of ADE are indistinguishable in value from each other even at the preservation rate 0.1. However, its erratic performances in English information retrieval indicate that the semantic quality of ADE is greatly influenced by the number of singular values to be equalized. IRR, on both Chinese and English retrieval, has the poorest performance among all the SVD-based LSI methods. This outcome illustrates that document vectors indexed by IRR do not have the competitive capacity to capture semantics from documents.
Fig. 2. Performances of SVD-based LSI methods on English (left) and Chinese (right) information retrieval
Fig. 3. Performances of SVD-based LSI methods on English (left) and Chinese (right) text categorization
4.3 Results on Text Categorization
We can see from Figure 3 that SVD and SVR also outperform the other SVD-based LSI methods on both Chinese and English text categorization. On the English corpus, SVR is better than SVD, while on the Chinese corpus they have comparable performances. The
better performance of SVR over the other SVD-based indexing methods lies in that it augments the differences between the singular values in Σ. These differences are produced by applying an exponent greater than 1.0 to the singular values in Σ. Further, it can be deduced that the statistical quality of an indexing method can be improved by increasing the differences between its singular values once the matrix decomposition is completed. Although ADE and IRR are obviously worse than the other SVD-based methods on text categorization, there are some interesting behaviors in their performances. Regarding the Chinese corpus, IRR outperforms ADE overwhelmingly, but the outcome is the opposite on the English corpus, where IRR peaks in performance when its rescaling factor is set to 2.0.
4.4 Results on Similarity Measure
We can see from Figure 4 that SVD has the best performance on both the Chinese and the English corpus. SVR ranks second among all SVD-based LSI methods. That means SVR can appropriately capture the relationships between documents and their corresponding categories, but it cannot characterize the relationships among documents in a collection as well. As for ADE, on both Chinese and English similarity measure, local maxima in performance occur at preservation rates 0.1 and 1.0. At preservation rate 0.1, ADE changes very few singular values in Σ, and at preservation rate 1.0, all the singular values greater than 0 in Σ are equalized to 1.0. The results of ADE on similarity measure indicate that its best performance can only occur at two possible preservation rates: 0.1 or 1.0. For IRR, its performance on similarity measure remains stable across all rescaling factors from 1.0 to 10 on both the Chinese and the English corpus. Thus, we can conclude that for IRR, the rescaling factor is not the dominant factor influencing its capacity on similarity measure.
Fig. 4. Performances of SVD-based LSI methods on English (left) and Chinese (right) similarity measure
5 Concluding Remarks In this paper, experiments were carried out to comparatively examine the effectiveness of SVD-based LSI methods on text mining, using two corpora: a Chinese corpus and an English corpus. The experimental results demonstrate that SVD and SVR are
still better choices than the other methods for latent semantic indexing. ADE and IRR cannot deliver satisfactory performance in practical text mining applications because of the large differences, in Frobenius norm, between their approximation matrices and the original term-document matrix. Although the experimental results provide some clues about latent semantic indexing, a generalized conclusion cannot be drawn from this examination. Our work is an initial step, and further examination and investigation are needed to make it more convincing. One research direction supporting text mining is document representation [8]. In order to represent documents appropriately, we should improve not only the statistical quality but also the semantic quality of document indexing. Thus, more attention will be devoted to the areas of the Semantic Web and ontology-based knowledge management [9], especially to work that employs ontologies to describe the concepts in a collection of texts, so as to represent documents more precisely and to explore the relationships between concepts in textual resources automatically.
Acknowledgments This work is partially supported by the National Natural Science Foundation of China under Grant No.70571078 and 70221001 and by Ministry of Education, Culture, Sports, Science and Technology of Japan under the “Kanazawa Region, Ishikawa High-Tech Sensing Cluster of Knowledge-Based Cluster Creation Project”.
References 1. White, C.: Consolidating, accessing and analyzing unstructured data, http://www.b-eye-network.com/view/2098 2. Berry, M.W., Dumais, S.T., O’Brien, G.W.: Using linear algebra for intelligent information retrieval. SIAM Review 37(4), 573–595 (1995) 3. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn., pp. 72–73. The Johns Hopkins University Press (1996) 4. Yan, H., Grosky, W.I., Fotouhi, F.: Augmenting the power of LSI in text retrieval: Singular value rescaling. Data & Knowledge Engineering 65(1), 108–125 (2008) 5. Ando, R.K.: Latent Semantic Space: Iterative Scaling Improves Precision of Inter-document Similarity Measurement. In: Proceedings of SIGIR 2000, pp. 216–223 (2000) 6. Zha, H., Marques, O., Simon, H.D.: Large scale SVD and subspace-based methods for information retrieval. In: Ferreira, A., Rolim, J.D.P., Teng, S.-H. (eds.) IRREGULAR 1998. LNCS, vol. 1457, pp. 29–42. Springer, Heidelberg (1998) 7. Jiang, F., Littman, M.L.: Approximate Dimension Equalization in Vector-based Information Retrieval. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 423–430 (2000) 8. Zhang, W., Yoshida, T., Tang, X.J.: Text classification based on multi-word with support vector machine. Knowledge-Based Systems 21(8), 879–886 (2008) 9. Zhang, W., Yoshida, T., Tang, X.J.: Using Ontology to Improve Precision of Terminology Extraction from Documents. Expert Systems with Applications (2009) (in press)
The Bilevel Programming Model of Earthwork Allocation System Wang Xianjia, Huang Yuan, and Zhang Wuyue
Institute of Systems Engineering, Wuhan University, Wuhan 430072, P.R. China
[email protected] Abstract. The earthwork allocation, which is common in construction projects and directly affects the quality, costs and scheduling of projects, is a transportation problem with a hierarchical structure. A linear programming (LP) model cannot clearly reflect the characteristics of the system. Considering that bilevel programming (BLP) is one of the useful tools for solving problems with this structure, in this paper the BLP model of earthwork allocation is established. The objective of the upper level is to minimize the transportation cost, and the objective of the lower level is to balance the supply and demand of earthwork in the excavation and embankment activities. In addition, a hybrid particle swarm optimization algorithm is proposed to solve the model by combining the method of particle swarm optimization (PSO) with the simplex algorithm.
1 Introduction The earthwork allocation is a transportation problem. The basic transportation problem was initially proposed by Hitchcock. Koopmans (1949) put forward a method to optimize the transportation system and gave some applications. The American mathematician Dantzig (1954, 1956 and 1964) first established the linear programming model for transportation problems. Murtagh (1981) proposed an improved linear programming algorithm to solve transportation problems. Bazaraa and Shetty (1979) developed nonlinear programming theory and applied it to transportation problems. Lee, Thorne and Hill (1980) proposed a more economical method for transportation and compared it with other methods. All of these studies are of great significance for the earthwork allocation problem. In the course of engineering construction, project schedules are not fixed, but can be adjusted by making use of float time within the project duration when the schedule constraints are satisfied. For the earthwork allocation system, taking into account the adjustment of schedules, the earthwork supply of excavation activities and the demand of embankment activities would vary within a certain range at the relevant stages. So the decision-maker must not only determine the optimal transportation quantities but also balance the supply and demand of earthwork at all stages. However, the supply and demand cannot be identified by the transportation model; they are determined by the adjustment of the activity schedules. That is to say, a model of project scheduling optimization needs to be established, and this model is also affected by the costs of moving earthwork from cut sections to fill sections. Therefore, the Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 275–281, 2009. © Springer-Verlag Berlin Heidelberg 2009
decision on the earthwork allocation problem involves two levels, so the system presents a hierarchical structure. But in the recent literature, Moreb and Ahmad (1996), Cao et al. (2003) and Yuan (2006) regard the earthwork allocation as a transportation problem with fixed supply and demand of earthwork at all stages, and the LP model applied to depict the problem cannot reflect the hierarchical structure. Considering that the bilevel programming (BLP) model is one of the useful tools for solving problems with this structure (Wang and Feng 1995, Wang et al. 2007), we established a model of the earthwork allocation system based on BLP. The objective of the upper level is to minimize the transportation cost, and the objective of the lower level is to balance the supply and demand of earthwork in the excavation and embankment activities. In addition, a hybrid particle swarm optimization algorithm is proposed to solve the model by combining the method of particle swarm optimization (PSO) with the simplex algorithm.
2 The BLP Model of Earthwork Allocation System
2.1 The Model in the Upper Level
For the transportation problem in the upper level, a linear programming model is established to determine the optimal earth-moving quantities, with the objective of minimizing the total transportation cost subject to technological, physical and operational constraints. In this model, the supply and demand of earthwork at each stage need to be optimized by the model in the lower level. The project duration is divided into several stages so as to decompose the dynamic transportation problem within the project duration into a sequence of static problems, one at every stage. Set F as the objective function depicting the total earth-moving cost within the project duration:

\[
\min F = \sum_{t=1}^{n_S}\sum_{r=1}^{n_R}\left(
\sum_{i=1}^{n_W}\sum_{j=1}^{n_T} C_{W_iT_j}\, x_{W_iT_j}^{S_tR_r}
+ \sum_{i=1}^{n_W}\sum_{k=1}^{n_Z} C_{W_iZ_k}\, x_{W_iZ_k}^{S_tR_r}
+ \sum_{i=1}^{n_W}\sum_{l=1}^{n_Q} C_{W_iQ_l}\, x_{W_iQ_l}^{S_tR_r}
+ \sum_{k=1}^{n_Z}\sum_{j=1}^{n_T} C_{Z_kT_j}\, x_{Z_kT_j}^{S_tR_r}
+ \sum_{m=1}^{n_L}\sum_{j=1}^{n_T} C_{L_mT_j}\, x_{L_mT_j}^{S_tR_r}
+ \sum_{m=1}^{n_L}\sum_{k=1}^{n_Z} C_{L_mZ_k}\, x_{L_mZ_k}^{S_tR_r}
\right) \qquad (1)
\]
Here \(x_{W_iT_j}^{S_tR_r}, x_{W_iZ_k}^{S_tR_r}, x_{W_iQ_l}^{S_tR_r}, x_{Z_kT_j}^{S_tR_r}, x_{L_mT_j}^{S_tR_r}, x_{L_mZ_k}^{S_tR_r}\) are the decision variables. The constraint conditions are as follows. (a) Supply constraint of excavation activity. At stage t, the amount of earthwork moved from excavation activity i to all receivers should be equal to the supply of excavation activity i. The supply of earthwork is a decision variable of the model in the lower level.
\[
\sum_{r=1}^{n_R}\left( \sum_{j=1}^{n_T} x_{W_iT_j}^{S_tR_r} + \sum_{k=1}^{n_Z} x_{W_iZ_k}^{S_tR_r} + \sum_{l=1}^{n_Q} x_{W_iQ_l}^{S_tR_r} \right) = y_{W_i}^{S_t} \qquad (2)
\]
(b) Demand constraint of embankment activity. At stage t, the amount of earthwork moved from all providers to embankment activity j should be equal to the demand of embankment activity j. The demand of earthwork is a decision variable of the model in the lower level.

\[
\sum_{r=1}^{n_R}\left( \sum_{i=1}^{n_W} x_{W_iT_j}^{S_tR_r} + \sum_{k=1}^{n_Z} x_{Z_kT_j}^{S_tR_r} + \sum_{m=1}^{n_L} x_{L_mT_j}^{S_tR_r} \right) = y_{T_j}^{S_t} \qquad (3)
\]
(c) Equilibrium constraint of transfer site. The quantity of earthwork stored in transfer site k at the beginning of stage t+1 should be equal to the quantity stored in transfer site k at the beginning of stage t plus the quantity of earthwork moved into or out of transfer site k during stage t.

\[
A_{Z_k}^{S_t} + \sum_{r=1}^{n_R}\left( \sum_{i=1}^{n_W} x_{W_iZ_k}^{S_tR_r} + \sum_{m=1}^{n_L} x_{L_mZ_k}^{S_tR_r} - \sum_{j=1}^{n_T} x_{Z_kT_j}^{S_tR_r} \right) = A_{Z_k}^{S_{t+1}} \qquad (4)
\]
(d) Capacity constraint of borrow site. The amount of material moved from borrow site m to embankment activity j and transfer site k should be equal to or less than the material available in borrow site m.

\[
\sum_{t=1}^{n_S}\sum_{r=1}^{n_R}\left( \sum_{j=1}^{n_T} x_{L_mT_j}^{S_tR_r} + \sum_{k=1}^{n_Z} x_{L_mZ_k}^{S_tR_r} \right) \le V_{L_m} \qquad (5)
\]
(e) Capacity constraint of transfer site. The amount of earthwork stored in transfer site k at stage t−1 plus the amount of earthwork moved from excavation activity i and borrow site m to transfer site k at stage t should be equal to or less than the capacity of transfer site k.

\[
\left( \sum_{r=1}^{n_R}\sum_{m=1}^{n_L} x_{L_mZ_k}^{S_tR_r} + \sum_{r=1}^{n_R}\sum_{i=1}^{n_W} x_{W_iZ_k}^{S_tR_r} \right) + A_{Z_k}^{S_{t-1}} \le V_{Z_k} \qquad (6)
\]
(f) Supply constraint of transfer site. At stage t, the amount of earthwork moved from transfer site k to embankment activity j should be equal to or less than the storage quantity in transfer site k.

\[
\sum_{r=1}^{n_R}\sum_{j=1}^{n_T} x_{Z_kT_j}^{S_tR_r} \le A_{Z_k}^{S_t} \qquad (7)
\]
(g) Zero storage quantity constraint of transfer site. In order to improve the efficiency of earthwork allocations, we hope that the storage quantity of transfer site k is zero when the project is completed.
\[
\sum_{t=1}^{n_S}\sum_{r=1}^{n_R}\left( \sum_{m=1}^{n_L} x_{L_mZ_k}^{S_tR_r} + \sum_{i=1}^{n_W} x_{W_iZ_k}^{S_tR_r} - \sum_{r=1}^{n_R}\sum_{j=1}^{n_T} x_{Z_kT_j}^{S_tR_r} \right) = 0 \qquad (8)
\]
(h) Capacity constraint of landfill site. The amount of earthwork moved from excavation activity i to landfill site l should be equal to or less than the capacity of landfill site l.

\[
\sum_{t=1}^{n_S}\sum_{r=1}^{n_R}\sum_{i=1}^{n_W} x_{W_iQ_l}^{S_tR_r} \le V_{Q_l} \qquad (9)
\]
(i) Nonnegative constraint.

\[
x_{W_iT_j}^{S_tR_r},\; x_{W_iZ_k}^{S_tR_r},\; x_{W_iQ_l}^{S_tR_r},\; x_{Z_kT_j}^{S_tR_r},\; x_{L_mT_j}^{S_tR_r},\; x_{L_mZ_k}^{S_tR_r} \ge 0 \qquad (10)
\]
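To illustrate how the upper-level model can be assembled in practice, the sketch below encodes a drastically simplified single-stage, single-route instance with one site of each type as a linear program using SciPy's linprog. All costs, capacities and the fixed supply and demand values are invented, and in the full bilevel model the supply y_W and demand y_T would come from the lower-level problem rather than being constants.

```python
import numpy as np
from scipy.optimize import linprog

# one stage, one route, one excavation (W), embankment (T), transfer (Z),
# landfill (Q) and borrow (L) site; flows x = [x_WT, x_WZ, x_WQ, x_ZT, x_LT, x_LZ]
cost = np.array([4.0, 2.0, 1.0, 3.0, 6.0, 5.0])    # unit haul costs (invented)

y_W, y_T = 100.0, 120.0            # supply / demand, fixed here for the example
V_L, V_Q, V_Z = 80.0, 50.0, 60.0   # borrow, landfill, transfer capacities (invented)

A_eq = np.array([
    [1, 1, 1, 0, 0, 0],    # (2) excavation supply:  x_WT + x_WZ + x_WQ = y_W
    [1, 0, 0, 1, 1, 0],    # (3) embankment demand:  x_WT + x_ZT + x_LT = y_T
    [0, 1, 0, -1, 0, 1],   # (8) transfer site ends empty: x_WZ + x_LZ - x_ZT = 0
])
b_eq = np.array([y_W, y_T, 0.0])

A_ub = np.array([
    [0, 0, 0, 0, 1, 1],    # (5) borrow site capacity
    [0, 0, 1, 0, 0, 0],    # (9) landfill capacity
    [0, 1, 0, 0, 0, 1],    # (6) transfer site capacity
])
b_ub = np.array([V_L, V_Q, V_Z])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)  # x >= 0 is the default
print(res.status, res.fun, res.x)
```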
2.2 The Model in the Lower Level
In order to balance the supply and demand of earthwork at all stages, we choose the sum of the squared differences between the total supply of all excavation activities and the total demand of all embankment activities at each stage as the standard to evaluate the matching degree. However, taking into account that the cost of moving earthwork from cut sections to fill sections would influence the matching degree, we introduce β_t, the average unit cost of moving earthwork from cut sections to fill sections at a stage, to reflect the effect of the transportation cost on the project scheduling optimization. Set f as the objective function:

\[
\min f = \sum_{t=1}^{n_S} \beta_t \left( \sum_{i=1}^{n_W} y_{W_i}^{S_t} - \sum_{j=1}^{n_T} y_{T_j}^{S_t} \right)^{2} \qquad (11)
\]
where

\[
\beta_t = \frac{\displaystyle \sum_{r=1}^{n_R}\sum_{i=1}^{n_W}\sum_{j=1}^{n_T} C_{W_iT_j}\, x_{W_iT_j}^{S_tR_r}}{\displaystyle \sum_{r=1}^{n_R}\sum_{i=1}^{n_W}\sum_{j=1}^{n_T} x_{W_iT_j}^{S_tR_r}} \qquad (12)
\]
Here \(y_{W_i}^{S_t}\) and \(y_{T_j}^{S_t}\) are the decision variables. We can change the actual starting times of the excavation and embankment activities to adjust the activity schedules by taking advantage of the float time of the activities. For the excavation and embankment activities, the supply and demand of earthwork at a stage vary with their actual starting times. The relation between them can be expressed by piecewise functions. If the duration of a stage is shorter than the duration of excavation activity i, i.e. \(PT < D_{W_i}\), then the piecewise function is as follows:
\[
y_{W_i}^{S_t} =
\begin{cases}
0, & S_t \cdot PT - AS_{W_i} \le 0 \\
(S_t \cdot PT - AS_{W_i}) \cdot E_{W_i}, & 0 < S_t \cdot PT - AS_{W_i} \le PT \\
E_{W_i} \cdot PT, & PT < S_t \cdot PT - AS_{W_i} \le D_{W_i} \\
\left[ PT - (S_t \cdot PT - AS_{W_i} - D_{W_i}) \right] \cdot E_{W_i}, & D_{W_i} < S_t \cdot PT - AS_{W_i} \le D_{W_i} + PT \\
0, & D_{W_i} + PT < S_t \cdot PT - AS_{W_i}
\end{cases} \qquad (13)
\]
If the duration of a stage is equal to or longer than the duration of excavation activity i, i.e. \(PT \ge D_{W_i}\), then the piecewise function is as follows:

\[
y_{W_i}^{S_t} =
\begin{cases}
0, & S_t \cdot PT - AS_{W_i} \le 0 \\
(S_t \cdot PT - AS_{W_i}) \cdot E_{W_i}, & 0 < S_t \cdot PT - AS_{W_i} \le D_{W_i} \\
E_{W_i} \cdot D_{W_i}, & D_{W_i} < S_t \cdot PT - AS_{W_i} \le PT \\
\left[ PT - (S_t \cdot PT - AS_{W_i} - D_{W_i}) \right] \cdot E_{W_i}, & PT < S_t \cdot PT - AS_{W_i} \le D_{W_i} + PT \\
0, & D_{W_i} + PT < S_t \cdot PT - AS_{W_i}
\end{cases} \qquad (14)
\]
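Equations (13) and (14) can both be read as one clipping rule: the supply contributed by excavation activity i during stage t equals its excavation rate multiplied by the overlap between the stage interval and the activity's execution window. The sketch below implements that reading; the variable names mirror the paper's symbols and the numbers in the example are invented.

```python
def stage_supply(t, PT, AS, D, E):
    """Earthwork supplied by an excavation activity during stage t (eqs. 13-14).

    t  : stage index (1, 2, ...), so the stage covers ((t-1)*PT, t*PT]
    PT : duration of one stage
    AS : actual starting time of the activity (a lower-level decision)
    D  : duration of the activity
    E  : excavation rate (volume per unit time)
    """
    overlap = min(t * PT, AS + D) - max((t - 1) * PT, AS)  # time the activity runs inside the stage
    return E * max(0.0, overlap)

# invented example: PT = 10, activity starts at time 7, lasts 25, rate 3
print([stage_supply(t, 10, 7, 25, 3) for t in range(1, 5)])  # [9, 30, 30, 6]
```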
The relation between the demand of embankment activity j and its actual starting time can also be expressed by the similar piecewise functions. The constraint conditions are as follows. (a)
The actual starting time of each activity must be between its earliest starting time and its latest starting time.
\[
ES_{W_i} \le AS_{W_i} \le LS_{W_i}, \qquad ES_{T_j} \le AS_{T_j} \le LS_{T_j} \qquad (15)
\]

(b) The actual starting time of each activity must be equal to or later than the latest actual starting time of all its preceding activities.

\[
AS_{W_i} \ge \max\left( AS_{PW_i^{a}} \right), \qquad AS_{T_j} \ge \max\left( AS_{PT_j^{b}} \right) \qquad (16)
\]
3 The Algorithm for the Model The model of earthwork allocation system established above is a nonlinear BLP model. It is hard to obtain the global optimal solution. In this paper, a hybrid PSO algorithm is proposed to solve the model by combining the method of PSO with
the simplex algorithm. The PSO algorithm is designed to operate on the lower-level problem (Eberhart and Kennedy 1995) and the simplex algorithm is employed to solve the upper-level problem. The hybrid PSO algorithm can be stated as follows:
Step 1. Set k = 1 and let F* equal a maximum value. According to the constraints of the upper-level problem, generate an initial solution x'_k.
Step 2. Take x'_k into the lower-level model and generate the initial particles; using the PSO algorithm, obtain the global best particle y_k.
Step 3. Take y_k back into the upper-level model and, using the simplex algorithm, obtain the solution x_k and the value F_k. If F_k < F*, then set x* = x_k, y* = y_k, F* = F_k.
Step 4. If F* has not been improved for μ consecutive iterations, the optimal solution of the bilevel programming model is (x*, y*) with optimal value F*, and the hybrid PSO algorithm terminates; otherwise, set k = k + 1, generate another initial solution x'_k in the upper-level model and go back to Step 2.
Making use of the LINGO package, we implemented the hybrid PSO algorithm in Microsoft Visual C++ 6.0.
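The control flow of Steps 1–4 can be summarized in a few lines. The sketch below is only a skeleton written for illustration: initial_upper_solution, pso_lower and solve_upper are assumed placeholder functions standing for, respectively, the generation of a feasible upper-level solution, the PSO search on the lower-level problem, and the simplex solution of the upper-level LP; it is not the authors' C++/LINGO implementation.

```python
def hybrid_pso(initial_upper_solution, pso_lower, solve_upper, mu=20, max_iter=500):
    """Skeleton of the hybrid PSO / simplex scheme (Steps 1-4)."""
    best = (None, None, float("inf"))        # (x*, y*, F*)
    stall = 0
    for k in range(max_iter):
        x0 = initial_upper_solution()        # Step 1 / restart: feasible upper-level start
        y = pso_lower(x0)                    # Step 2: PSO returns the global best particle
        x, F = solve_upper(y)                # Step 3: simplex solves the upper-level LP
        if F < best[2]:
            best, stall = (x, y, F), 0
        else:
            stall += 1
        if stall >= mu:                      # Step 4: stop after mu rounds without improvement
            break
    return best                              # (x*, y*, F*)
```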
4 Application and Conclusions Consider a hydroelectric project in China, which involves the earthwork allocation problem. The BLP model is established and, by the hybrid PSO algorithm proposed above, the optimum distribution of earthwork is determined. We also established the LP model to solve the problem. The total cost optimized by the BLP model is ¥50,219,020, while the total cost optimized by the LP model is ¥69,160,640; the BLP model cut the cost by 27.4% compared with the LP model. According to the optimization of the LP model, the amount of earthwork moved from excavation activities to embankment activities takes up 60% of the total amount of fill, while according to the optimization of the BLP model it takes up 77.3% of the total amount of fill. Obviously, the BLP model is more effective and economical than the LP model for solving the earthwork allocation problem. The earthwork allocation, which is common in construction projects and directly affects the quality, costs and scheduling of the project, is a transportation problem with a hierarchical structure. If it is regarded as a single transportation problem and solved with a linear programming model, the hierarchical structure cannot be depicted. In this paper, a BLP model is established to describe the hierarchical structure of the problem. In the upper level, the objective is to minimize the transportation costs. In the lower level, the objective is to balance the supply and demand of earthwork in the excavation and embankment activities. Eventually, a hybrid PSO algorithm is proposed to solve the model by combining the method of PSO with the simplex algorithm.
Acknowledgment. This work was supported by the National Natural Science Foundation of China (Grant No. 60574071).
References Cao, S.R., Wang, X.J., Shen, M.L.: Systems Analysis and Constitution of Linear Programming Model for the Earth-rock Allocation System of Massive Water Resources and Hydropower Project. Engineering Sciences 5(7), 72–76 (2003) Dantzing, G.B.: Variables with Upper Bounds in Linear Programming. The RAND Corporation, Calif. (1954) Dantzing, G.B., et al.: A Primal-Dual Algorithm. The RAND Corporation, Calif. (1956) Dantzing, G.B., Johnson, D.L.: Maximum Payloads per Unit Time Delivered Through an Air Network. Operation Research 12(2), 232–248 (1964) Eberhart, R.C., Kennedy, J.: A New Optimizer Using Particle Swarm Theory. In: Proceedings Sixth Symposium on Micro Machine and Human Science, pp. 39–43 (1995) Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: IEEE International Conference on Neural Networks, pp. 1942–1948 (1995) Koopmans, T.C.: Optimum Utilization of the Transportation System. Econometrics 17(suppl.), 53–66 (1949) Lee, T.H., Thorne, D.H., Hill, E.E.: A Transportation Method for Economic Dispatching Application and Comparison. IEEE Trans on PAS 99, 2373–2385 (1980) Moreb, A.A.: Linear Programming Model for Finding Optimal Roadway Grades that Minimize Earthwork Cost. European Journal of Operational Research 23(8), 148–154 (1996) Mokhtar, S., Bazaraa, et al.: Nonlinear Programming Theory and Algorithms. Wiley, New York (1979) Murtagh, B.A.: Advanced Linear Programming, Computation and Practice. McGraw Hill, New York (1981) Wang, X.J., Feng, S.Y.: Optimal Theory of Bilevel System. Science Press, Beijing (1995) Wang, G.M., Wan, Z.P., Wang, X.J.: Bibliography on Bilevel Programming. Advances in Mathematics 36(5), 513–529 (2007) Yuan, J.F.: The Application Research on Linear Programming of the Distribution of Earth & Rock Works on the Right Bank of TGP. Journal of Hydroelectric Engineering 25(1), 99–103 (2006)
Knowledge Diffusion on Networks through the Game Strategy Shu Sun, Jiangning Wu, and Zhaoguo Xuan Institute of Systems Engineering, Dalian University of Technology, Dalian, P.R. China 116024
[email protected] Abstract. In this paper, we develop a knowledge diffusion model in which agents determine to give their knowledge to others according to some exchange strategies. The typical network namely small-world network is used for modeling, on which agents with knowledge are viewed as the nodes of the network and the edges are viewed as the social relationships for knowledge transmission. Such agents are permitted to interact with their neighbors repeatedly who have direct connections with them and accordingly change their strategies by choosing the most beneficial neighbors to diffuse knowledge. Two kinds of knowledge transmission strategies are proposed for the theoretical model based on the game theory and thereafter used in different simulations to examine the effect of the network structure on the knowledge diffusion effect. By analyses, two main observations can be found: One is that the simulation results are contrary to our intuition which agents would like to only accept but not share, thus they will maximize their benefit; another one is that the number of the agents acquired knowledge and the corresponding knowledge stock turn out to be independent of the percentage of those agents who choose to contribute their knowledge. Keywords: Knowledge diffusion; Game strategy; Network.
1 Introduction The ability to transmit knowledge effectively among individuals is important for organizational knowledge sharing and creation. According to some scholars, such ability represents a distinct source of competitive advantage for organizations over other institutional arrangements such as markets (Kogut and Zander, 1992). Hence, effective and efficient knowledge diffusion has become one of the crucial issues for most organizations. Knowledge diffusion modeling is the first step of the study. Here the network structure is adopted to simulate individuals transmitting knowledge with each other in the real world, where individuals namely agents are viewed as the nodes of the network and the edges are viewed as the existed social relationships for their mutual knowledge transmission. This work mainly focuses on the interplay between the network architecture and knowledge diffusion effect through different exchanging means based on the game strategy. In previous studies, Cowan and Jonard’s model (2005) is typical and well-known. In their work, the behavior of knowledge diffusion on Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 282–289, 2009. © Springer-Verlag Berlin Heidelberg 2009
networks was modeled and analyzed from only one aspect: the difference in knowledge determines whether knowledge diffuses between a pair of agents or not. However, the psychological factor involved on both sides of a knowledge exchange, which may influence the transmission effect to a certain extent, was not considered in Cowan's model. From this point of view, we develop a new theoretical model for knowledge diffusion by means of game theory, in which two kinds of choosing strategies for agents are proposed through the defined benefit functions. Some simulations regarding three properties of the network structure, i.e. the reconnected probability, the number of the nearest neighbors and the network scale, and one psychological factor, i.e. the percentage of knowledge contributors, have been carried out to examine the relationships between the network structure and the knowledge diffusion effect. The simulation results provide some evidence that different network structures and the agents' behavior do influence the "quantity" or "quality" of knowledge diffusion.
2 Theoretical Model Let I = {1,…,N} denote a finite set of agents. For any i, j ∈ I, define the binary variable γ(i, j) to take the value γ(i, j) = 1 if a connection exists between i and j, and γ(i, j) = 0 otherwise. The network G = {γ(i, j); i, j ∈ I} denotes the relationships between agents. Let N_i = {j ∈ I: γ(i, j) = 1} be the set of i's neighbors, and the number of i's neighbors is n_i. The small-world network proposed by Watts and Strogatz in 1998, namely the WS small-world network, is applied as the basic model in this work. Starting from a regular periodic lattice with n nearest neighbors, each edge of the network is, with probability p, disconnected from one of its vertices and reconnected to a vertex chosen uniformly at random, while avoiding self-connections and duplicate connections between two vertices. Suppose that agents in the above network are permitted to interact repeatedly only with their neighbors, whose actions can influence the agents' decisions on knowledge exchange. Two kinds of choosing strategies are defined for agents in terms of the benefit function. Each agent i can choose one of the two strategies δ_i ∈ {X, Y}, where X means giving its own knowledge to other agents, and Y means not doing so. Assume that agent i has n_{i,X} neighbors who prefer to share their knowledge by means of the X strategy and n_{i,Y} neighbors who use the Y strategy. The knowledge stock of agent i is characterized by a variable Q_i(t) that changes over time. Each time, two kinds of neighbors are taken into account for a given agent, contributing knowledge or not, and the same applies to the agent itself. After the agents communicate with each other, the benefit of an agent together with its neighbor falls into one of the following four cases, where the benefit function H_i(δ_i, δ_j) measures the balance between the benefit of absorbing knowledge and the corresponding cost (such as time, energy and financial resources). Cases 1 and 2: For agent i choosing strategy X. If one of i's neighbors also chooses X, there will be a cost for agent i when transmitting its own knowledge while also
acquiring knowledge from the other. In this case, the total benefit is set as Hi(X, X) = a; if the neighbor chooses Y, there is only a loss for agent i, and the benefit is Hi(X, Y) = −b. Cases 3 and 4: For agent i choosing strategy Y. If one of i's neighbors chooses X, there is no loss but only the benefit Hi(Y, X) = c; if the neighbor chooses Y, there is no interaction at all, and the benefit is Hi(Y, Y) = d. The payoff table is:

                P2: X        P2: Y
    P1: X       a, a         -b, c
    P1: Y       c, -b        d, d
For the above four cases, it is clear that d = 0 and a = c − b. Therefore the benefit of agent i at time t can be given by

\[
\Pi_i(t) =
\begin{cases}
b \cdot n_{i,Y} + d \cdot n_{i,X}, & \delta_i = X \\
a \cdot n_{i,Y} + c \cdot n_{i,X}, & \delta_i = Y
\end{cases} \qquad (1)
\]
According to Equation (1), the benefit of an agent for diffusing knowledge mainly depends on the strategies chosen by its neighbors. No matter what strategy the current agent chooses, as long as a neighbor prefers to give its knowledge, the knowledge stock of the agent will be increased. On the contrary, when a neighbor does not share its knowledge, there is no change in the knowledge stock of the agent. In this case, the knowledge stock of agent i can be defined as

\[
Q_i(t+1) =
\begin{cases}
Q_i(t) + \omega_1 n_{i,X}, & \delta_i = X \\
Q_i(t) + \omega_2 n_{i,X}, & \delta_i = Y
\end{cases} \qquad (2)
\]
We require ω1 > ω2 to reflect the fact that if an agent wishes to share its knowledge, then neighbors with the same willingness give back more knowledge to that agent.
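A minimal simulation of this model can be built on a Watts–Strogatz network with networkx. In the sketch below the payoffs follow the payoff table, the knowledge update follows equation (2), and agents are assumed to imitate the strategy of their best-scoring neighbor; this imitation rule and all numeric parameter values are illustrative assumptions, not necessarily the exact selection mechanism used in the paper.

```python
import random
import networkx as nx

N, n, p = 300, 4, 0.1                  # network scale, nearest neighbors, reconnected probability
a, b, c, d = 1.0, 1.0, 2.0, 0.0         # payoff parameters, with d = 0 and a = c - b
w1, w2 = 1.0, 0.5                       # knowledge gains, w1 > w2

G = nx.watts_strogatz_graph(N, n, p)
strategy = {i: 'X' if random.random() < 0.5 else 'Y' for i in G}   # X = contribute knowledge
Q = {i: 0.0 for i in G}                                            # knowledge stocks

def payoff(i):
    nX = sum(strategy[j] == 'X' for j in G[i])
    nY = sum(strategy[j] == 'Y' for j in G[i])
    return a * nX - b * nY if strategy[i] == 'X' else c * nX + d * nY

for t in range(100):
    for i in G:                                          # knowledge update, eq. (2)
        nX = sum(strategy[j] == 'X' for j in G[i])
        Q[i] += (w1 if strategy[i] == 'X' else w2) * nX
    pay = {i: payoff(i) for i in G}
    # synchronous imitation of the best-scoring agent in the neighborhood (incl. self)
    strategy = {i: strategy[max(list(G[i]) + [i], key=lambda j: pay[j])] for i in G}

print(sum(s == 'X' for s in strategy.values()) / N)      # final share of contributors
```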
3 Simulation Results To evaluate the theoretical model, simulation experiments on the knowledge diffusion effect are carried out with respect to three properties of the network structure, namely the reconnected probability p, the number of the nearest neighbors n and the network scale N, and one psychological factor, i.e. the percentage of the initial knowledge contributors P0. In the simulations, each factor is examined independently. 3.1 The Influence of the Reconnected Probability For the network with N=3000 and n=4, Figure 1 shows two cases corresponding to the reconnected probabilities 0.001 and 0.1 respectively, in which panel (a) illustrates the trends of the percentage of knowledge contributors Pc and panel (b) represents the distribution of knowledge stock Q. Both parameters Pc and Q can reflect the knowledge diffusion effect. For instance, as the system reaches a steady state as shown in Figure 1 (a), we find that more agents choose the strategy X in the case of
p=0.001 than p=0.1. From Figure 1 (b), we find that when p=0.1, nearly 800 agents acquire knowledge although most of their knowledge stock growth is stopping at 0.15; on the other hand, when p=0.001, there are no more than 400 agents obtaining knowledge, but their knowledge stock are significantly higher.
Fig. 1. The influence of reconnected probability
From the above simulation results, we can say that the number of agents who choose to give their knowledge has nothing to do with the number of agents who acquire knowledge. However, agents who like to contribute knowledge lead to a higher knowledge stock. Now the question arises: "which strategy is better for knowledge diffusion?" That depends on the concern, "quantity" or "quality". Concerning quantity, more agents need to obtain knowledge, so p=0.1 is appropriate. Concerning quality, some agents need to own a larger knowledge stock, in other words, some "experts" are needed; in this case, p=0.001 is appropriate. 3.2 The Influence of the Number of the Nearest Neighbors The influence of the number of initial nearest neighbors can be seen in Figure 2, in which the network scale is N=3000 and the reconnected probability is p=0.001. The numbers of initial nearest neighbors are set to n=4 and n=6, respectively. In the first fifty rounds, the percentage of knowledge contributors for n=4 is higher than that for n=6, but after that the curves cross over. When the stable situation is reached, all the agents choose to give their own knowledge to others for n=6, but for n=4 there is no distinct change in the percentage of knowledge contributors between the initial and stable situations. When n=4, the total number of agents with increased knowledge stock is about 450, and most of these agents' knowledge stock grows to between 0.8 and 1; when n=6, the total number of agents with higher knowledge stock is around 900, and nearly all of their knowledge stock reaches 2. So, from the case of n=6, we know that although all of the agents choose to contribute their knowledge, not all of them can acquire knowledge. In this case, whether we are concerned with "quantity" or "quality", n=6 is a good choice. That means the more nearest neighbors there are, the better for knowledge diffusion.
Fig. 2. The influence of the initial nearest neighbors’ number
Since there exists Hi(Y, X) > Hi(X, X) > Hi(Y, Y) > Hi(X, Y), it can be concluded that agents are willing to choose Y so as to obtain more benefit. But simulation results show that sometimes all agents would prefer to choose X. It results from the selection mechanism we used. For instance, if some agents always choose Y to get more benefit, their neighbors would like to choose the strategy with the best benefit till all of them choose Y. When this situation happens, there will be no benefit. Contrarily, those agents who would like to share knowledge with each other will get more benefit. Therefore, agents will give up strategy Y and choose strategy X instead by the selection mechanism mentioned before. When n becomes bigger, i.e. there are more nearest neighbors, all the agents would like to contribute their knowledge at last. 3.3 The Influence of the Network Scale Figure 3 shows the simulation results in terms of the network scale, in which N=1000 and N=3000 respectively. The other two parameters regarding the network structure are fixed at n=4 and p=0.001 respectively.
Fig. 3. The influence of the network scale
Figure 3 (a) shows the influence of the network scale on the percentage of knowledge contributors. During the first 100 rounds, the difference between the two network scales is not obvious; after that, there is a big gap. It indicates that when the network scale is
1000, nearly all the agents can acquire knowledge, and most of them can get a higher knowledge stock. From Figure 3 (b), we can see that when N=1000, nearly 1000 agents can acquire knowledge, and most of their knowledge stock stops at about 0.65; when N=3000, there are only about 300 agents obtaining knowledge, and their knowledge stock is much lower. So in general, when considering both the "quantity" and the "quality" of knowledge, the small-scale network is more helpful for knowledge diffusion; but when considering a higher knowledge stock, the large-scale network seems good. 3.4 The Influence of the Percentage of the Initial Knowledge Contributors Here we consider the influence of the percentage of the initial knowledge contributors, whose values are P0=0.5 and P0=0.7, respectively. The other three parameters are kept at N=3000, n=4 and p=0.001. The simulation results shown in Figure 4 indicate that different values of Pc generate different steady statuses. Although both of them drop greatly at the first round, the percentage of knowledge contributors will increase rapidly and reach a high level if there are more agents who like to contribute their knowledge in the initial status. From Figure 4 (b), we find that the total number of agents who obtained knowledge is about 400 with P0=0.5, and most of these agents' knowledge stock is between 0.7 and 0.8; when P0=0.7, the total number of agents who obtained knowledge is around 1000, and nearly all of their knowledge stock is 1.1. So, it is clear that no matter whether we are concerned with the "quantity" or the "quality" of knowledge, P0=0.7 is a good choice. That means the larger the percentage of the initial knowledge contributors, the better for knowledge diffusion. This factor implies a kind of psychological effect. We know that if most of the agents wish to contribute their knowledge, then the total knowledge stock of all agents will reach a high level, with the result that more agents are capable of acquiring knowledge.
Fig. 4. The influence of the initial knowledge contributor percentage
To summarize from the first charts of Figures 1 to 4, an unusual phenomenon can be found: there is a sudden drop in the percentage of knowledge contributors at the first round, and then this percentage grows gradually until the steady status is reached. This is because at the beginning some agents who select strategy Y obtain the best benefit, and hence most agents follow this way. But thereafter, those agents
who selected strategy Y no longer get the benefit, so they change their strategies to X, which results in the increase of the percentage of knowledge contributors.
4 Conclusions We have investigated the influence of the network structure as well as a psychological factor on knowledge diffusion with the proposed knowledge exchange strategies. One interesting finding is that the simulation results are contrary to the intuition that agents would prefer only to accept but not to share, so as to maximize their benefit. Agents do not always make decisions in this way; for instance, when the number of the nearest neighbors is larger, all the agents eventually choose to contribute their knowledge. Another finding is that the number of agents who acquire knowledge and the corresponding knowledge stock turn out to be independent of the percentage of those agents who choose to contribute their knowledge, i.e., there is no direct relationship between them. From the simulation results and analyses above, we can draw the following conclusions: firstly, a lower reconnected probability, a larger number of nearest neighbors, a larger network scale and a higher initial contributor percentage are network properties that help to raise the knowledge stock; secondly, a higher reconnected probability, a larger number of nearest neighbors, a smaller network scale and a higher initial contributor percentage are beneficial for transmitting knowledge widely. Besides the structure of the network, the agents' goodwill to contribute knowledge is also beneficial to knowledge diffusion in the whole network.
Acknowledgment This work has been supported by the Natural Science Foundation of China (NSFC) under Grant No. 70771019.
References 1. Cowan, R., Jonard, N.: Network structure and the diffusion of knowledge. Journal of Economic Dynamics & Control 28, 1557–1575 (2004) 2. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002) 3. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabási, A.: Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002) 4. Kim, B., Trusina, A., Holme, P., Minnhagen, P., Chung, J.S., Choi, M.Y.: Dynamic instabilities induced by asymmetric influence: Prisoners’ dilemma gamein small-world networks. PRE 66 (2002) 5. Chen, P.: Imitation, learning, and communication: central or polarized patterns in collective actions. In: Self-Organization, Emerging Properties and Learning, pp. 279–286. Plenum Press, New York (1991) 6. Berninghaus, S.K., Ehrhart, K.M., Keser, C.: Conventions and Local Interaction Structures: Experimental Evidence. Games and Economic Behavior 39, 177–205 (2002)
7. Wagner, C.S.: Network structure, self-organization, and the growth of international collaboration in science. Research Policy 34, 1608–1618 (2005) 8. Kogut, B., Zander, U.: What Firms Do? Coordination, Identity, and Learning. Organization Science, 502–518 9. Beckmann, M.J.: Knowledge networks in science: collaboration among equals. In: The Annals of Regional Science, pp. 233–242. Springer, Heidelberg 10. Ping, L., Liu, L., Wu, K., Leung, W.K.: Interleave division multiple-access. IEEE Transaction on Wireless Communications, 938–947 (2006) 11. Camerer, C.F., Knez, M.: Coordination in Organizations: A Game Theoretic Perspective. In: Organizational Decision Making, pp. 158–188. Cambridge Univ. Press, Cambridge (1996)
The Analysis of Complex Structure for China Education Network Zhu-jun Deng and Ning Zhang
Abstract. We collected the documents and links of the China Education and Research Network, which constitute a large complex directed network, the China Education Network (CEN), whose nodes are documents and whose edges are URLs. This paper analyzes some statistical properties of the China Education Network, including the degree distributions, average path length, clustering coefficient and community structure, based on the empirical data. By analyzing these data, we found that the in-degree and out-degree distributions of the CEN have power-law tails and that the network displays both small-world and scale-free properties. The CEN has a considerably small average path length, and its clustering coefficient is intermediate. As a large-scale complex network, the China Education Network clearly presents a community structure, in which the colleges within a school generally constitute communities, with a large modularity. Keywords: complex directed network, scale-free, topological, community structure.
1 Introduction The World Wide Web (WWW or Web) has revolutionized the way we access information. By April 2001, the Web was estimated to have over 4 billion pages and more than 28 billion hyperlinks, and to be growing rapidly at the rate of 7.3 million pages a day (Moore and Murray; Kleinberg et al.). The China Education and Research Network is the second biggest Internet network in China. More than 1300 universities, colleges and institutes are connected to the network. Even so, it is still very small compared with the WWW. Nevertheless, it is of great importance for us to analyze the CEN so as to better understand the Web's evolution and characteristics. In this paper, we use a Web spider to crawl the pages in the China Education Network and obtain more than 2.5 million documents with 31 million URLs, which, compared with the data collected in 2004, have changed tremendously: not only has the scale of the network grown, but some of the characteristics of the network have also changed. The data we use in this paper were collected in January 2008. However, these statistics cover only a part of the vast body of the CEN, which is still evolving. In the China Education Network, we treat each page as a node and the links between pages as edges, which together construct the network. Many statistical properties exist in this scale-free network; this paper analyzes some of them, such as the degree distribution, average path length, clustering coefficient and community structure.
Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 290–295, 2009. © Springer-Verlag Berlin Heidelberg 2009
The rest of this paper is organized as follows. Section 2 provides the analysis of out-degree distribution and in-degree distribution of the CEN. Section 3 presents the clustering coefficient of the CEN, while Section 4 provides the average path length of the CEN. Lastly, Section 5 describes the community structure underlying the CEN and Section 6 concludes and gives the future works.
2 Degree Distribution Node’s degree refers to the number of nodes which connect to this node in the network. That is to say it’s the node’s edges total number. In the China Education Network, there are both outgoing links and incoming links. For directed network each node has both an in-degree and an out-degree, which are the number of incoming links and outgoing links incident on this node respectively. We used the data collected to determine the probabilities Pout (k ) and Pin ( k ) that a document has k outgoing and incoming links, respectively. In the network with the probability gree
P(ki ) the de-
ki of a node i is calculated by the following formula: p ( ki ) = k i
∑k
j
(1)
j
Accordingly, we get the figures below indicating the in-degree distribution and out-degree distribution.(see Fig. 1 and 2) We find that the both the in-degree distribution and out-degree distribution, pout ( k ) and pin ( k ) , follow a power law over several orders of magnitude with a heavy tail. Their
rout and rin values are 3.2 and 2.1,
respectively.
Fig. 1. Distribution of outgoing links
Fig. 2. Distribution of incoming links
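The empirical in-degree and out-degree distributions plotted in Figures 1 and 2 can be computed with a few lines of code. The sketch below uses a small synthetic scale-free directed graph as a stand-in for the crawled CEN data; on log-log axes a power-law tail appears as an approximately straight line whose slope estimates the exponent.

```python
from collections import Counter
import networkx as nx

# stand-in for the crawled page graph: a synthetic scale-free directed network
G = nx.scale_free_graph(10000)          # MultiDiGraph with heavy-tailed degrees

def degree_distribution(degrees):
    counts = Counter(d for _, d in degrees)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}   # empirical P(k)

P_in = degree_distribution(G.in_degree())
P_out = degree_distribution(G.out_degree())

# on log-log axes a power-law tail P(k) ~ k^(-r) shows up as a straight line
print(list(P_out.items())[:5])
```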
3 Clustering Coefficient
The clustering coefficient refers to the average probability that nodes which link to the same node also link to each other (0 ≤ C ≤ 1). It reflects the tendency of nodes to build up groups in the network; that is to say, it measures how likely the friends of your friend are to be your friends in a social network. There have been many investigations of the clustering coefficient in small-world networks. A widely used definition, given by Watts and Strogatz, is as follows:

\[
C_i = \frac{\text{number of triangles connected to vertex } i}{\text{number of triples centered on vertex } i} \qquad (2)
\]
For vertices with degree 0 or 1, for which both numerator and denominator are zero, we put Ci = 0 . Then the clustering coefficient for the whole network is the average of
\(C_i\):

\[
C = \frac{1}{n} \sum_{i=1}^{n} C_i \qquad (3)
\]
We get a clustering coefficient of 0.4995082, without taking the direction of the China Education Network into consideration, while the clustering coefficient of random networks of the same size is less than 10^{-7}, far less than that of the China Education Network. In this way, we can regard the CEN as a highly clustered network, in which each page links to other pages and to the pages they link to.
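As a worked illustration of equations (2)–(3), the snippet below counts triangles and triples by hand on a toy undirected graph and checks the result against networkx's built-in average clustering; the toy graph is invented and unrelated to the CEN data.

```python
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])   # toy undirected graph

# C_i = triangles at i / triples centered on i, with C_i = 0 for degree < 2 (eq. 2)
def local_clustering(G, i):
    k = G.degree(i)
    if k < 2:
        return 0.0
    nb = list(G[i])
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if G.has_edge(nb[a], nb[b]))
    return 2.0 * links / (k * (k - 1))

C = sum(local_clustering(G, i) for i in G) / G.number_of_nodes()   # eq. (3)
print(C, nx.average_clustering(G))   # both give the same value
```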
4 Average Path Length In order to get the average path length of the network, we should first calculate the shortest distance between node pairs in China Education Network. The China Education Network contains 2528708 nodes, in which all the possible shortest paths
is 2528708 × (2528708 − 1), approximately 6 × 10^12. The calculation method is as follows: for two nodes i, j in the same connected component, l_ij is the minimum length of a path between them, and the average path length l is the average value of all l_ij. By using a parallel algorithm (MPI), we found the average path length l to be close to 14.95206, while the average path length within each school is 7.86. This is very small for a network of over 2 million nodes. It means that we can reach one document from another in the network within nearly 17 steps, and going from one document to another document in the same school takes only about eight steps.
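Computing all of the roughly 6 × 10^12 pairwise distances requires the kind of parallel MPI computation the authors describe; as a small-scale illustration of the quantity itself, the sketch below estimates the average path length by breadth-first search from a random sample of source nodes. The sampling shortcut is an assumption made here to keep the example small, not the authors' procedure.

```python
import random
import networkx as nx

def estimated_average_path_length(G, n_sources=100, seed=0):
    """Average shortest-path distance, estimated by BFS from sampled sources."""
    rng = random.Random(seed)
    nodes = list(G)
    total, count = 0, 0
    for s in rng.sample(nodes, min(n_sources, len(nodes))):
        lengths = nx.single_source_shortest_path_length(G, s)   # BFS distances from s
        for j, dist in lengths.items():
            if j != s:
                total += dist
                count += 1
    return total / count

G = nx.watts_strogatz_graph(2000, 6, 0.01)     # small stand-in network
print(estimated_average_path_length(G))
```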
5 Community Structure
As the physical meaning and mathematical characteristics of network properties have been investigated more deeply, it has been found that community structure is common in complex networks. Community means the existence of groups in a network, in which the nodes in the same group connect with each other closely while the nodes in different groups have few links, and the network is constructed from those groups. In other words, a community is a part of the network (nodes and their corresponding edges) in which the nodes connect with each other more closely. In this paper, we divided the China Education Network into several parts according to region, based on the notion that schools in the same region are more likely to link to each other and that 211 and 985 schools are more likely to link to each other. So we get eleven sub-graphs, which are the pages and their links in 211985a, 211985b, 211985c, 211only, Northeast, Northern_China, Central_China, Eastern_China, Southern_China, Northwest and Southwest.

Fig. 3. A part of CEN
Based on the above hypothesis, we put forward an algorithm to analyze the community structure of the CEN, and we obtain a very high modularity of around 0.9. Generally, networks with a higher modularity can be considered to have a more obvious community structure. Therefore, the China Education Network is a complex network with a pronounced community structure, in which the pages of the different colleges of each school constitute the communities; communities made up of pages from different schools are rare. From the experimental results, we got 6891 communities from the eleven sub-graphs, among which there are considerably large communities with more than 77000 nodes and also considerably small communities with fewer than ten nodes. Here we give a sub-graph of one of the eleven parts (see Fig. 3). We can see from the figure that the community structure is very distinctive: the sub-graph can be considered to consist of two communities. After checking those nodes, we found that the nodes in the same color all belong to the same college.
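The paper's own region-based grouping algorithm is not reproduced here, but the kind of modularity figure it reports can be illustrated with the greedy modularity method shipped with networkx, applied below to a small synthetic graph with planted communities standing in for one regional sub-graph.

```python
import networkx as nx
from networkx.algorithms import community

# stand-in for one regional sub-graph of the CEN, treated as undirected
G = nx.planted_partition_graph(l=4, k=50, p_in=0.2, p_out=0.002, seed=1)

communities = community.greedy_modularity_communities(G)
Q = community.modularity(G, communities)

print(len(communities), round(Q, 3))   # number of detected communities and their modularity
```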
6 Conclusions and Future Work
Many real networks are reported to be scale free, i.e. their degree distribution P(k) follows a power law. From the above analysis, the scale-free property can be detected in the China Education Network, and the same holds for the small-world property. By studying the topological properties of the China Education Network, we found that the network has power-law out-degree and in-degree distributions, a small average path length, a large clustering coefficient and an obvious community structure. We can see that the China Education Network is a large-scale network with great coherence, in which the home page of each school is the core. As the network is still evolving, there is still much we need to do. First of all, we need to update the data regularly. We also need to improve our software and hardware to obtain better coverage of the China Education Network. At the same time, it is clear that there is much to be done in understanding the relations between the structures and system behaviors of large networks like the China Education Network. Acknowledgments. This work was supported by the National Natural Science Foundation of China (Grant No. 70571074), the Shanghai Leading Academic Discipline Project (S30501) and the Natural Science Foundation of Shanghai (06ZR14144).
References Moore, A., Murray, B.H.: Sizing the web Cyveilliance Inc. White Paper (2000) Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: The web as a graph: measurements, models and methods. In: Asano, T., Imai, H., Lee, D.T., Nakano, S.-i., Tokuyama, T. (eds.) COCOON 1999. LNCS, vol. 1627, pp. 1–17. Springer, Heidelberg (1999) Juyong, P., Newman, M.E.J.: The statistical mechanics of networks. Phys. Rev. E (2004) Newman, M.E.J.: Detecting community structure in networks. Eur. Phys. J. B 38, 321–330 (2004) Laherrere, J., Sornette, D.: Stretched exponential distributions in nature and economy: "fat tails" with characteristic Scales. Eur. Phys. J. B. 2, 525–539 (1998)
Bollobás, B.: Degree sequences of random graphs. Discrete Math. 33, 1–19 (1981) Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440– 442 (1998) Xiaojun, N., Ning, Z., Meijuan, W.: Parallel algorithm (MPI) on solving the shortest-path problem of china educational network. Computer engineering and Applications 42(12) (2006) Boccaletti, S., Ivachenko, M., Latora, V., Pluchino, A., Rapisarda, A.: Phys. Rev. E 75, 045102 (2007) Girvan, M., Newman, M.E.J.: Proc. Natl. Acad. Sci. USA 99, 7821 (2002) Hopcroft, J., Khan, O., Kulis, B., Selman, B.: Proc. Natl. Acad. Sci. USA 101, 5249 (2004) Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Proc. Natl. Acad. Sci. USA. 101, 2658 (2004) Capocci, A., Servedio, V.D.P., Caldarelli, G., Colaiori, F.: Phys. A 352, 669 (2005) Latapy, M., Pons, P.: Proc. 20th Intl. Sympo. Comp. and Inf. Sci., 284–293 (2005) arXiv:physics/0512106 Eisler, Z., Kertesz, J.: Phys. Rev. E 71, 057104 (2005) arXiv:physics/0512106 Arenas, A., Guilera, A.D., Vicente, C.J.P.: Phys. Rev. Letts. 96, 114102 (2006) Arenas, A., Fernandez, A., Gomez, S. (2008) arXiv:physics/0703218 Yang, S.J.: Phys. Rev. E 71, 016107 (2005) Albert, R., Barabási, A.-L.: Statistical Mechanics in Complex Networks Rev. Mod. Phys. 74, 47–97 (2002) Ning, Z.: Complex network demonstration –China Education Network. Journal of Systems Engineering 21(4), 337–340 (2006)
Priority-Pointing Procedure and Its Application to an Intercultural Trust Project Rong Du, Shizhong Ai, and Cathal M. Brugha
School of Economics and Management, Xidian University, Xi'an, Shaanxi, China; School of Business, University College Dublin, Belfield, Dublin 4, Ireland
Abstract. In the Western cultural background, the Priority-Pointing Procedure (PPP), which is a qualitative research-based diagnostic procedure, has been proven to be able to point to a priority for action by measuring imbalances in the context of Nomology. As the starting point to prove its feasibility in the environment of the Eastern cultural background, we applied PPP to the research process of an Intercultural Trust Project, which was performed in Dublin, Ireland. In this paper we present the application of PPP in the environment of a mixed cultural background. We find that PPP is useful for defining variables and identifying imbalances. Keywords: systems science; strategy; intercultural trust.
1 Introduction As China’s economy grows, Chinese systems methodologies have attracted increasing system problems attention from researchers in both China and the other parts of the world. Gu and Zhu (2000) presented an outline of an Oriental systems methodology: the Wuli Shili Renli approach (WSR), which has been used successfully to guide in China’s systems projects (Gu and Tang, 2006). In the early 1990s a Chinese system scientist, Qian Xuesen proposed a Meta-synthesis method to tackle with open complex giant which cannot be effectively solved by traditional methods. The method emphasizes the synthesis of collected information and knowledge of various kinds of experts, and combining quantitative methods with qualitative knowledge. Since then, continuous endeavors have been taken to put those ideas into practice, and achievements have been obtained from the applications in practice (Gu, Wang and Tang 2007, Yu and Tu 2002). Outside China, there are also some researchers who are interested in the Chinese systems methodology and compare it with the other systems methodologies originated beyond China. For example, Brugha (2001) used a meta-decision-making approach to show that parallels can be drawn between Wuli Shili Renli approach (WSR) and the adjusting, convincing, and committing dimensions in Nomology, a generic metamodel that is based in decision science. He used the match between them to propose a meta-linguistic bridge between China and the West that could aid in the communication and sharing of systems experiences. He proposed as a research agenda that the bridge be used to explore how Chinese insights could help to illuminate Western systems experience. Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 296–303, 2009. © Springer-Verlag Berlin Heidelberg 2009
In recent years, more efforts have been made to apply systems methodologies to modeling activities. For example, Gu and Tang (2005) discussed two main issues, model integration and opinion synthesis, which are often confronted when applying a meta-synthesis approach, and demonstrated the development of an embryonic metasynthetic support prototype, which shows how to model complex problems, such as macro-economic problems in a Hall for Workshop on Meta-Synthetic Engineering with versatile resources in information collection, model integration and opinion synthesis. Makowski (2005) presented the methodological background and implementation of a structured modeling environment developed to meet the requirements of modeling activities undertaken to support intergovernmental negotiations aimed at improving European air quality. Liu, Dang and Wang (2006) studied the research areas of Chinese natural science basic research from a point view of complex network. Most studies of complex systems use as their approach trying to explore the principles of the complex real world. To understand these complex phenomena and to help people to make decisions, some researchers have proposed soft approaches that use the fundamental ideas incorporated in systems methodologies. For example, Gu and Tang (2006) suggested that appropriate methods should be designed and employed to address and tackle Wuli Shili and Renli elements in a theoretically informed and systematic way, and outlined the background, philosophy, process, principles, and some practical applications of the methodology. Makowski and Wierzbicki (2003) addressed modeling knowledge in terms of model-based decision and soft computations. Nakamori and Sawaragi (2000) emphasized a soft approach that uses both the logic and the educated intuition of people. This approach originates in Sawaragi's shinayakana systems approach that is based on the Japanese intellectual tradition, which, to some degree, matches the Chinese systems methodology. Aiming at solving the decision problems in complex systems, Brugha (2000, 2005) proposed a Priority-Pointing Procedure (PPP), a qualitative research-based diagnostic procedure. The purpose of PPP approach is to develop a suitable strategy, which is critical to any organisation. People need to have a strategic direction to answer the question ‘what should we do next?’. Generally, this simple question produces multi-criteria answers. It leads to a pointing towards a priority for action from amongst several alternatives that emerge from the responses (Brugha 2000, 2005). PPP was shown to be useful for defining variables. However, one might ask: how is the application of PPP to the real-life world? To answer this question, we have tested the PPP in some different contexts. In Ireland, PPP has been used to solve the strategic problems in a graduate business school, to diagnose Ireland’s preparation for entry into the European Monetary Union, and to solve Dublin’s transport problems. Furthermore, in recent years, the EMBA students who took the course “Strategic Direction and Decision Workshop” in University College Dublin have undertaken projects in their course, in which PPP was used to solve many different strategic problems in many companies and institutions. Considering the difference between the Eastern culture and Western culture, one may wonder whether PPP works well or not in the context of the Eastern culture, such as in a Chinese context. 
To test the cross-cultural functionality of PPP, we applied PPP to the research process of the Intercultural Trust Project, which was performed in Dublin, Ireland, to test variations in trust within relationship management between China and English speaking countries in Western Europe. It was shown that PPP is
useful for defining constructs and variables for research on intercultural trust, and it is also useful for identifying imbalances in the actions taken to build intercultural trust. We believe that the PPP approach might fit well with the Chinese systems methodology. In Section 2, we will give an introduction to the Intercultural Trust Project, and will provide the theoretical framework for PPP. We will present the application of PPP to the Intercultural Trust Project in Section 3. In Section 4 we will conclude our work and address some ongoing applications.
2 Research Background

2.1 Intercultural Trust Project

In December 2006, we initiated a project called the "Intercultural Trust Project". The authors of this paper are the main researchers on the project. The project was undertaken in Dublin, Ireland. On the basis of Brugha's research into Nomology and Du's previous research into trust (Du, Ai and Hu 2007), the project extended and developed the research through empirical surveys of the thought processes underpinning knowledge management in inter-cultural business practices. The specific focus was to test variations in trust within relationship management between China and English-speaking countries in Western Europe. The research questions included: How do the different cultural backgrounds affect the development of trust? What does this tell us about how to develop future business relationships in a global context? The project involved interviewing and surveying people in companies and non-profit organizations that have different mixes of Chinese and Western influences. The most vital aspect was to be able to ascertain tiny nuances of meaning about aspects of trust.

Guided by the fundamental theories in Nomology and the basic ideas in the Priority-Pointing Procedure, we conducted a field study in Dublin, Ireland, to study inter-cultural trust in Irish-Chinese cooperative relationships. In the field study, we organised or attended 10 seminars on inter-cultural trust. Based on the Priority-Pointing Procedure (Brugha 2000, 2005), we designed a set of open-ended questions about inter-cultural trust between Chinese people and Irish people, attempting to find the factors impacting on inter-cultural trust. We conducted a questionnaire survey, distributing 120 questionnaires by email and getting 21 completed questionnaires back, among which 20 are valid. Based on the information obtained from the answered questionnaires, we arranged further interviews about inter-cultural trust in order to learn about deeper feelings and experiences of inter-cultural trust. We have interviewed 16 people about Chinese-Irish inter-cultural trust.

2.2 Theoretical Framework

PPP is an outcome of inter-disciplinary research, built on a combination of different disciplines, such as philosophy, psychology, Nomology, management, and decision science. Brugha (1998a,b,c) validated the reasoning behind the constructs by the use of a multidisciplinary base in his previous work. Here we give only a brief introduction to its major framework.
Fig. 1. Priority-Pointing Wheel
Nomology is the study of the decision-making processes of the mind, literally the doctrine of law. Nomology is based on the premise that intelligent beings' choices tend to follow a common set of simple decision rules (Brugha 1998a,b,c). According to Nomology, decision-makers address their many different problems with the same approach, which is based on asking questions that have dichotomous answers (Brugha 2000). As shown in Figure 1, the theoretical framework of the Priority-Pointing Procedure (PPP) is constructed on the basis of the above nomological theory and its adjusting system. The Priority-Pointing Wheel consists of two major sides (planning and putting) with two major focuses (place and people) in the inner core, four general kinds of activities (propose, perception, pull, and push) in the middle layer, and eight principal activities (pounce, procedure, price, policy, promotion, productivity, pliability, practice) in the outer layer. PPP points to a priority for action by measuring imbalances in the context of the structure of adjustment decision-making from Nomology. Brugha (2000) depicted the details of the mechanism of the Priority-Pointing Wheel. Open-ended questions seek to determine whether a system has an energy deficit, i.e. a need for "punch", or a "prevention" block that should be reduced.
3 Application of PPP to the Intercultural Trust Project

3.1 Define the Objective

The objective is "to build inter-cultural trust between Chinese and Western people in business/management contexts". Here, business/management contexts refer to contexts in a variety of businesses, including businesses in either companies or non-profit organizations. In the objective definition, Chinese people include immigrants or temporary residents in Ireland and the UK who are from China, and the local Chinese people in China who have business relationships with Western people. Western people include the local Western people in Ireland and the UK who have business relationships with Chinese people and those Irish and British people who work in China.

3.2 Identify the Respondents

The potential respondents include Chinese immigrants or temporary Chinese residents in Ireland, and the local Western people in Ireland who have a relationship with
Chinese people in business/management contexts. When we picked our actual respondents, we chose people who are actively involved in and committed to the solution of building inter-cultural trust between Chinese and Western people in business/management contexts.

3.3 Survey Method and Questionnaire

The method used is based on six open-ended questions. Two questions were general and the other four were specific to the four sectors of activity: proposition, perception, pull and push. The six questions were equally divided into punch and prevention questions, and were given as follows.
– What should be done to build more inter-cultural trust between Chinese and Western people in their business relationships?
– What in general is preventing there being more inter-cultural trust between Chinese and Western people in their business relationships?
– What specific problems are preventing the building of inter-cultural trust between Chinese and Western people in their business relationships?
– What should be done to increase understanding of how to build inter-cultural trust between Chinese and Western people in their business relationships?
– What is preventing Chinese and Western people in business relationships from working better together to build inter-cultural trust between them?
– What organisational or institutional changes could help to build inter-cultural trust between Chinese and Western people in their business relationships?
To obtain the deeper thoughts of the respondents, some respondents were interviewed in person using a semi-structured approach.

3.4 Survey Interpretation and Variable Definition

The output of the intercultural trust survey was a diverse set of views. We used the adjustment theory (Brugha 1998b,c) to make coherent sense of the results. Analysis of the responses to the above questions showed that they fell into 8 categories based on the nature of the procedure being used by the respondents. These are given in the outer circle of Figure 2.

3.5 Synthesis and Analysis

In our questionnaire survey, we distributed 120 emails with the questionnaire attached and got 21 responses, among which 20 questionnaires were answered and 13 are valid for PPP synthesis and analysis. The response rate via e-mail is a bit low, but the results and conclusions are still meaningful in terms of PPP synthesis and analysis. The spread of answers by the 13 respondents involved in intercultural interactions (Figure 2) shows significant imbalances in terms of their selection from the 8 processes that represented their menu of alternatives. The first dichotomy, planning (66) versus putting (2), showed a strong imbalance. The subsequent dichotomy also showed an imbalance between place (43) and people (25). Within that, the imbalance was more on the planning side, with proposition (42) compared with perception (24). Within the proposition sector, procedure (41) compared with pounce (1) showed a very significant
imbalance. Within the perception sector, there was also a strong imbalance between price (23) and policy (1). The imbalances appeared to point in the direction of procedure and price, i.e. the priority may be either or both. This result was consistent with the main finding of the study: the biggest barrier to the solution of intercultural trust problems in Ireland, which remains an intractable problem to this day, was the division of responsibility for planning between different governmental departments, e.g. the Visa Offices and the Department of Enterprise, Trade and Employment, which between them have responsibility for most things to do with foreign immigrants and residents.
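To make this drill-down concrete, the short sketch below (Python, our illustration rather than anything used in the project) encodes the response counts reported above and follows the heavier side of each successive dichotomy of the Priority-Pointing Wheel. How the two "putting" answers split across the individual pull and push activities is not reported, so the split used in the sketch is an assumption made only to complete the example.

```python
# Illustrative sketch only (not the authors' software): drill down the
# Priority-Pointing Wheel by following the heavier side of each dichotomy.
# The counts follow the totals reported in the text; the split of the two
# "putting" answers across individual activities is assumed.

counts = {
    "pounce": 1, "procedure": 41,       # proposition sector (planning side, place focus)
    "price": 23, "policy": 1,           # perception sector (planning side, people focus)
    "promotion": 1, "productivity": 0,  # pull sector (putting side, people focus) -- assumed split
    "pliability": 1, "practice": 0,     # push sector (putting side, place focus)  -- assumed split
}

sectors = {
    "proposition": ("pounce", "procedure"),
    "perception": ("price", "policy"),
    "pull": ("promotion", "productivity"),
    "push": ("pliability", "practice"),
}

def sector_total(name):
    return sum(counts[a] for a in sectors[name])

def group_total(sector_names):
    return sum(sector_total(s) for s in sector_names)

# Successive dichotomies reported in the text.
print("planning", group_total(("proposition", "perception")),
      "vs putting", group_total(("pull", "push")))            # 66 vs 2
print("place", group_total(("proposition", "push")),
      "vs people", group_total(("perception", "pull")))       # 43 vs 25
print("proposition", sector_total("proposition"),
      "vs perception", sector_total("perception"))            # 42 vs 24
for s in ("proposition", "perception"):
    a, b = sectors[s]
    print(f"{s}: {a} {counts[a]} vs {b} {counts[b]}")          # 41 vs 1, 23 vs 1

# The priority points towards the principal activities on the heavier side at
# every level -- here procedure and price, as the paper concludes.
```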
Fig. 2. Suggestions of solutions to intercultural trust problem
3.6 Measuring Imbalances

Imbalances in the scores are measured as follows. Where a dichotomy is being compared, a sample proportion, denoted p', which comes from the participants' responses, can be used. For two scores to be in balance, the expected proportion, denoted p, should be 0.5. Comparing the procedure (41) and price (23) scores, the expected balanced score for each would be 32. A simple t-test shows the significance of this difference. An alternative is to use a chi-square measure where, as in this case, one of four sectors outscores the other three. Here the expected score based on balance is 68 divided by 4, i.e. 17. In the Intercultural Trust Project case, there were 64 answers for the "narrow planning" sector and only 4 for the other three sectors. This gives a highly significant chi-square score, and points to a clear need for more open "planning" work between East and West to develop better mutual relationships. Together the survey clearly indicates the need to propose more procedural opportunities for interaction between Chinese and Western people to get to know one another. After that it points to developing perceptions about one another, and the values that are important, i.e. highly "priced", in each other's culture.
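The following minimal sketch (Python, our reading of the tests rather than the authors' code) reproduces the two calculations: a binomial test of the procedure-versus-price dichotomy against a balanced proportion of 0.5 (for a 0.5 proportion this is essentially equivalent to the "simple t-test" mentioned above), and a chi-square test of one sector against the even 68/4 = 17 expectation. The even split of the remaining 4 answers across the other three sectors is an assumption made only to fill the cells.

```python
# Minimal sketch of the imbalance tests described above, under stated assumptions.
from scipy.stats import binomtest, chisquare

# Procedure (41) vs price (23): balanced scores would be 32 each (p = 0.5).
pair = binomtest(41, n=41 + 23, p=0.5)
print(f"procedure vs price: p-value = {pair.pvalue:.4f}")

# One sector holds 64 of the 68 answers; balance would put 68/4 = 17 in each.
observed = [64, 4 / 3, 4 / 3, 4 / 3]   # assumed even split of the other 4 answers
expected = [17, 17, 17, 17]
chi2, pval = chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.1f}, p-value = {pval:.2e}")   # highly significant
```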
3.7 Feedback to Participants

An essential part of the procedure is the feedback to respondents of a synthesis of the priority response, expressed in their own language or terminology. In the Intercultural Trust Project case, this feedback has served as a motivation for developing further surveys and interviews of people who have been dealing over a long period with intercultural trust issues.
4 Conclusion

In this paper, we have presented the application of PPP to the Intercultural Trust Project. Prompted by the result of the Priority-Pointing procedure in the Intercultural Trust Project, we are now working on empirical surveys of the thought processes underpinning knowledge management in inter-cultural business practices, and have extended this to a wider survey of Chinese involvement in business in Ireland, along with Dr. Lan Li of the Confucius Institute and the Institute of Chinese Studies, which are located in the Quinn School. The focus of the research is to ask questions such as, "How do the different cultural backgrounds affect the development of trust?" and "What does this tell us about how to develop future business relationships in a global context?" It will involve interviewing and surveying people in companies that have different mixes of Chinese and Western influences, both in Ireland and in China. The most vital aspect will be being able to ascertain tiny nuances of meaning about aspects of trust.

PPP as a general method can support high-level decision making and strategic actions. However, at the moment PPP is not yet a well-validated model. The theoretical framework of PPP is not yet fully developed with respect to its validity, basic principles, and the logic of its construction. So we still need to develop and validate PPP further. Furthermore, more applications of PPP to cases in Eastern cultural environments need to be performed so as to test whether PPP can work well in different cultural environments. At the moment, we are applying PPP to some cases in Xi'an, China. We will relate PPP to the Wuli-Shili-Renli systems approach and the Meta-synthesis approach in China.
Acknowledgments

This research is supported in part by the National Natural Science Foundation of China through grant 70871096. It is also supported by University College Dublin, Ireland.
References

Brugha, C.M.: The structure of qualitative decision making. European Journal of Operational Research 104(1), 46–62 (1998a)
Brugha, C.M.: The structure of adjustment decision making. European Journal of Operational Research 104(1), 63–76 (1998b)
Brugha, C.M.: The structure of development decision making. European Journal of Operational Research 104(1), 77–92 (1998c)
Brugha, C.M.: Relative measurement and the power function. European Journal of Operational Research 121, 627–640 (2000a)
Brugha, C.M.: An introduction to the priority-pointing procedure. Journal of Multi-Criteria Decision Analysis 9, 227–242 (2000b)
Brugha, C.M.: Systemic Thinking in China: A Meta-Decision-Making Bridge to Western Concepts. Systemic Practice and Action Research 14(3), 339–360 (2001)
Brugha, C.M.: Decision-maker centred MCDM: Some empirical tests and their implications. In: Multiple Criteria Decision Making in the New Millennium. Lecture Notes in Economics and Mathematical Systems, vol. 507, pp. 69–78. Springer, Heidelberg (2001)
Brugha, C.M.: Structure of multi-criteria decision-making. Journal of the Operational Research Society 55, 1156–1168 (2004)
Brugha, C.M.: Priority Pointing Within the Systems Development Life Cycle. International Journal of Knowledge and Systems Sciences 2(2), 25–32 (2005)
Brugha, C.M., Du, R., Ai, S.: An Integrated Knowledge Management Development System (IKMDS). International Journal of Knowledge and Systems Sciences 5(1) (2008)
Du, R., Ai, S., Hu, N.: Interpersonal Trust and Its Impacts on Knowledge Transfer Within Alliances. International Journal of Knowledge and Systems Sciences 4(1), 44–50 (2007)
Du, R., Brugha, C.M., Ai, S.: Implications from Decision Science for the Inter-Cultural Trust Development in Information Systems. In: Professional Development Workshop, OCIS Division, Academy of Management, 67th Annual Meeting, Philadelphia, USA, August 3-8 (2007a)
Du, R., Ai, S., Brugha, C.M.: Inter-Cultural Trust in Chinese-Irish Cooperative Relationships: A Field Study in Dublin, Ireland. In: The Inaugural International Conference of the UCD Confucius Institute for Ireland and the Irish Institute for Chinese Studies, Dublin (2007)
Du, R., Brugha, C.M., Ai, S.: The impact of cultures: a measuring instrument for intercultural trust between Irish and Chinese employees. In: The Inaugural Conference of the Association for Chinese Studies in Ireland, Dublin (2007b)
Du, R., Ai, S., Brugha, C.M.: A Moderating Model of Trust in Conflict Management. In: Proceedings of KSS 2008, Guangzhou, China, December 11-12 (2008)
Glaser, B., Strauss, A.: The Discovery of Grounded Theory. Aldine, New York (1967)
Gu, J., Zhu, Z.: Knowing Wuli, Sensing Shili, Caring for Renli: Methodology of the WSR Approach. Systemic Practice and Action Research 13(1), 11–20 (2000)
Gu, J., Tang, X.: Meta-synthesis approach to complex system modeling. European Journal of Operational Research 166(3), 597–614 (2005)
Gu, J., Tang, X.: Wuli-shili-renli system approach/theory and applications. Shanghai Press of Science and Technology Education, Shanghai (2006)
Gu, J., Wang, H., Tang, X.: Meta-synthesis method and systems. Science Press, Beijing (2007)
Liu, J., Dang, Y., Wang, Z.: Complex network properties of Chinese natural science basic research. Physica A: Statistical Mechanics and its Applications 366, 578–586 (2006)
Makowski, M.: A structured modeling technology. European Journal of Operational Research 166(3), 615–648 (2005)
Makowski, M., Wierzbicki, A.P.: Modeling Knowledge: Model-Based Decision and Soft Computations. In: Yu, X., Kacprzyk, J. (eds.) Applied Decision Support with Soft Computing, pp. 3–60. Springer, Berlin (2003)
Nakamori, Y., Sawaragi, Y.: Complex systems analysis and environmental modeling. European Journal of Operational Research 122(2), 178–189 (2000)
Suddaby, R.: From the editors: What grounded theory is not. Academy of Management Journal 49(4), 633–642 (2006)
Yu, J., Tu, J.: Meta-synthesis—Study of case. Systems Engineering—Theory and Practice 22(5), 1–7 (2002) (in Chinese)
Exploring Refinability of Multi-Criteria Decisions

Cathal M. Brugha
School of Business, University College Dublin, Ireland
Abstract. This paper used the Structured Multi-Criteria Methodology and the Direct-Interactive Structured-Criteria (DISC) Multi-Criteria Decision-Making (MCDM) system to explore the refinability of decisions in dialogue between Decision Advisors (DAs) and Decision-Makers (DMs). The study showed the importance of a sensitive DA/DM interaction, of using iterative cycles to complete stages, and that the DAs should have confidence in the full eight stage MCDM structured process when helping DMs to reach a decision.
1 Introduction

This paper is part of a stream of research that evinces the (critical) real criteria structures in decisions, methodologies and in Multi-Criteria Decision-Making (MCDM) (Brugha 2004). Called Nomology (Brugha 1998), this field starts from the premise that generic decision structures form the basis of many decisions made in practice. The paper commences with a review of a structured methodology for MCDM, shows its links with two similarly structured methodologies that have oriental connotations, and then uses the structure to evaluate an exploration with 27 students, who were each considering what they should do next year, to see how much it was possible to help them to refine aspects of their decision.

The Structured Methodology is intended for decision advisors (DAs), to help them to guide decision-makers (DMs) when making multi-criteria decisions. It was developed by incorporating experiences with MCDM methodologies, including guidelines about fundamental objectives from Keeney and Raiffa (Keeney and Raiffa 1976), the criteria for examining the objectives and attributes in a value tree from Von Winterfeldt and Edwards (Von Winterfeldt and Edwards 1986), and from Keeney (Keeney 1992) a set of the desirable properties of fundamental objectives. It concluded that MCDM information should satisfy the following criteria: it should be accessible, differentiable, abstractable, understandable, verifiable, measurable, refinable, and usable. It also showed that these criteria themselves fit into a structure. The structured criteria are presented in Table 1, matching corresponding guidelines where possible.

Applying the nomological approach to the three previous versions in Table 1 meant trying to evince the decision-making structure underlying a methodology for MCDM, fitting it into a generic structure, and then learning what it meant. It turned out that the fit was to an adjusting structure (Brugha 1998b). The interpretation is that an MCDM process is about shaping (as in adjusting) the information to help make a decision (Figure 1). The figure shows that the structured methodology is driven by a series of decisions, the first three of which are based on dichotomies. Firstly there is the need to plan the decision by forming the criteria tree, and then put it to use in measurement and
choice. Secondly there is a need for a structured engagement between the DMs, who are the people making the decision, and the DAs, who control the methodology and the systems, the place where the decision is made. Together these form four phases: proposing the factors in the decision, understanding the DMs' perceptions, seeing how these reveal a pull to prefer some alternatives, and making the push to decide.

Table 1. Structured Criteria and Previous Versions

Structured Criteria (Brugha 2004) | Fundamental Objectives (Keeney 1992) | Value Tree (Von Winterfeldt and Edwards 1986) | Set of Attributes (Keeney and Raiffa 1976)
Accessible     | Complete       | Complete                  | Complete
Differentiable | Decomposable   | Judgementally independent | Decomposable
Abstractable   |                | Well-defined              |
Understandable | Understandable |                           |
Verifiable     |                |                           |
Measurable     | Measurable     |                           |
Refinable      |                |                           |
Usable         | Operational    | Operational               |
               | Non-redundant  | Non-redundant             | Non-redundant
               | Concise        | Non-aggregatable          | Minimum Size
               | Essential      |                           |
               |                | Value-relevant            |
               | Controllable   |                           |
Within each phase a third dichotomy governs the eight principal activities: whether to focus on personal engagement and interaction between DA and DM, or to rely more on the position of either. In the first of eight stages the DA uses his/her position as being in charge of the process to access information from the DM. Drawing from Personal Construct Theory in Psychology, the DA uses laddering and other techniques to "pounce" on any information offered by the DM which could reveal an inherent criteria tree. The second stage uses a procedure similar to Grounded Theory in Sociology. Here the DA uses personal engagement with the DM to differentiate the resultant construct information into clusters of similar criteria. In the third stage the emphasis moves to the position of the DM, to try and evince his/her perceptions of why the criteria are important. Drawing from Critical Realism in the Philosophy of Science, the DA tries to abstract, using generic language, the inherent worth to the DM of each of the criteria - in nomological language, the price. In the final stage of planning the criteria tree the DA personally discusses the criteria tree and its various layers with the DM, trying to understand his/her policy for the decision. The resultant criteria tree is a nomological map which facilitates the DM in considering choices (Brugha 2005).

The first stage of putting the criteria tree to use involves the DA personally interacting with the DM to verify all aspects of the criteria tree, its processes and constructs. The DA should try to ensure that the DM has engaged fully in the process, and is clear about the contributing aspects. This is a promotion process in the sense that personally interacting with the DM also means leading him/her to use the system and to have confidence in his/her choices, evincing elements that may influence the choice. In some cases the outcome at this stage may be a decision.
Fig. 1. Structured Methodology
In the next stage the emphasis moves back to the position of the DM, to try and identify any alternatives that stand out from others, i.e. are more productive in meeting the aims of the decision. MCDM measurement tools are used as objective arbiters for the choice. The aim of the final phase is to support the DM if he/she wishes to push for a choice. MCDM software can help facilitate refining the decision by relieving the DM of excessive work. The nomological term here is pliability, the idea that the MCDM system should facilitate the DM being able to re-shape many aspects of the decision, the individual alternatives, the criteria, and the set of alternatives. Examples of refinements might be a new hybrid alternative made from combinations of alternatives, exploring a change of criteria weight, or reducing the number of alternatives. Software facilitates considering numerous possible refinements. Finally, the development of cases should help make the system usable in practice. Case studies are a very helpful platform for getting a decision process started. The foundation of the cases on real criteria based on generic structures ensures that the
map does not become disconnected from the problem as both it and the DMs’ team membership may change over a long period (O'Brien 2008). Several benefits arose from discovering the generic structure and seeing it as a map of the methodology. Firstly, it showed that 50% of the work is about discovering the criteria tree, and that this is a complex task that draws on many fields in management. Secondly it shows the importance of the DA/DM relationship. The DA knows the methodology and the DM knows the problem. Their interaction is a mutual learning process. Thirdly, the map revealed missing constructs or stages in earlier versions of the methodology (Table 1), in this case the importance of verifiability, and the need to incorporate modifications or refinability of the decision during the process. A test of the validity of the structured methodology in a large case study as part of a PhD (O'Brien 2008) led to an extension of the idea of refinability, from what was proposed originally (Brugha 2004).
2 Kawakita K-J Mind Mapping and Iterative Cycles

In Nomology it is important to get assurance about any claim that a structure is generic. Nomological structures are derived from actual practice. Consequently, such assurance can be found if the structures appear in different and unconnected practice. The process of making a nomological map of a criteria tree, described above, has parallels with the K-J Mind Mapping Method, which was developed in Japan by Jiro Kawakita. Both the KJ method and Nomology are believed to be universally applicable despite cultural diversity (Scupin 1997). Both also make reference to the use of the word "abduction" by C.S. Peirce (Peirce 1867), which Brugha (Brugha 2004) suggests should more properly be described as evincing.

The application of the KJ method involves four essential steps: Label Making, Group Organising, Chart Making, and Description. These correspond with the stages of forming a criteria tree. Label Making focuses on accessibility: writing down as many single ideas and points on individual labels as are deemed relevant to the question or topic being asked, recording them for later, when order will be put on the data. Group Organising involves clustering the labels from the previous step into related and differentiable groups that are hierarchies of "families". Similar to the way that personal engagement is used to differentiate criteria information above, group organising is described in the KJ method as done by subjectively clustering labels, with "feelings" dominating the logic. The third step of the KJ method, Chart Making, involves "devising a spatial patterning of the 'families' into a consistent unifying chart" (Scupin 1997). This corresponds to the abstractable and understandable steps above. The fourth step of the KJ method, Description, otherwise known as the Written or Verbal Explanation, corresponds to the verifiable step. This "must express in detail the interrelationships that are configured in the chart" (Scupin 1997).

Another factor that the Structured MCDM approach has in common with the K-J method is re-cycling to do more work on the diagram / criteria tree until DAs and DMs are satisfied. Brugha and Bowen (Brugha and Bowen 2005) have discussed several kinds of management system, both Chinese and Western, that have the same adjusting pattern as in Figure 1. They also suggest ways triple-loop learning could be interpreted in such a structure, one of which is as follows:
“No learning at all could be described as staying within the practice activity. Single loop would correspond to … both pliability and practice, i.e. the push sector. Then double-loop learning would … include both pull and push sectors. Finally, triple-loop learning would involve all of the adjusting activities.”
In this context the idea that all the factors in a multi-criteria decision should be refinable becomes very important. This led to the exploration described below.
3 Exploring Refinability

Four groups of three or four students on an MSc in Business Analytics in the Smurfit School of Business in University College Dublin were each given the task of being DAs for two DMs: students who were unsure about what to do the following year, and who were willing to be helped with making their decision over a period of several weeks in late 2008. They were given the generic structure and case examples of career decisions and asked to go through the methodology, giving special attention to observing and facilitating refinability by the DMs. They were asked to use the Direct-Interactive Structured-Criteria (DISC) scoring systems (Brugha 2002), starting with Utility Scoring (DISCUS) and changing when appropriate to Relative Intensity Measurement (DISCRIM) (Brugha 2004, 2004b).

The highest decision on the criteria tree was about a trade-off between "Will I be able for it?", "Will I Like it?" and "Will it be good for me?" Figure 2 shows one DM's DISCUS scores, each between zero and a maximum of 10, for "Will I Like it?", which had three alternatives, "Masters", "Job" or "Travel" [1]. Figure 3 shows the DISCRIM scores for the same DM for a second phase in which "Masters" and "Job" are compared relative to one another by sharing 10 points. The same sharing of 10 points was used to get the relative importance of the criteria in the tree (Figure 2). Figure 3 also shows a relative preference, by 5.7 to 4.3, to take an accounting job graduate programme instead of doing a masters degree. The DM indicated that the process cleared any doubt about her choice.

[1] Enviorment should be spelt Environment.

The four groups worked independently of each other, in all cases to the satisfaction of the DMs. The groups interacted quite differently with their DMs, with varying emphases on the use of DISCRIM to "discriminate" between close alternatives. This suggests that DA understanding of the process and skills with interaction can vary, and that there can be different measurability routes to a decision. Groups 3 and 4 helped all their DMs to reach a decision. In Group 1 two did not reach a decision; one ran out of time, and the other decided that he could not choose between the two close remaining alternatives because the choice depended on the state of the economy. In Group 2 only two out of the six DMs reached a decision. The other four stopped with two alternatives remaining. One may not have taken the process seriously. One was apprehensive about the system. One was conflicted at the highest level of the criteria tree between his preference and what he should do on a "rational basis". And one had problems with the criteria tree not being sufficiently orientated towards expressing his interest in music. This suggests that there can be different interpretations of the DA's role in refinability. The DAs here did not
push for refinements that might have resolved issues raised by the DMs. They focused more on the proposition phase, interacting with the DMs and trying to understand their criteria in psychological terms.
Fig. 2. First Phase: Utility scores for “Will I like it?”
Fig. 3. Second Phase: Relative Scores for “Will I like it?”
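As a rough illustration of how utility scores of this kind might be aggregated over a criteria tree, the sketch below computes a weighted sum for each alternative. The three top-level criteria are those named above, but the weights and the 0-10 scores are invented for illustration only; the sketch does not reproduce the DM's data or DISC's internal computations.

```python
# Illustrative only: a generic weighted-sum aggregation over a one-level
# criteria tree, in the spirit of the utility (DISCUS) phase.
# All numbers below are invented for the example.

criteria_weights = {              # relative importance (shares of 10 points) -- assumed
    "Will I be able for it?": 3,
    "Will I Like it?": 4,
    "Will it be good for me?": 3,
}

scores = {                        # utility scores out of 10 per alternative -- assumed
    "Masters": {"Will I be able for it?": 7, "Will I Like it?": 5, "Will it be good for me?": 8},
    "Job":     {"Will I be able for it?": 8, "Will I Like it?": 7, "Will it be good for me?": 6},
    "Travel":  {"Will I be able for it?": 9, "Will I Like it?": 8, "Will it be good for me?": 4},
}

def weighted_score(alternative):
    total_weight = sum(criteria_weights.values())
    return sum(criteria_weights[c] * scores[alternative][c]
               for c in criteria_weights) / total_weight

for alt in scores:
    print(f"{alt}: {weighted_score(alt):.2f}")

# In a DISCRIM-like second phase, two close alternatives would instead share
# 10 points on each criterion, as in Figure 3, rather than being scored
# independently on a 0-10 utility scale.
```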
Group 3 was very different: it got results with all of its six DMs, two of which were hybrid alternatives, which they helped to develop by suggesting combining aspects of other alternatives that scored highly on different parts of the criteria tree. This group reported that a common problem was that DMs found it difficult to “separate the constructs of the tree from the options in question”. An “interesting comment (one) DM made was that without someone guiding them through the tree and asking the hard questions, they would have been reluctant to face up to the realities that some criteria presented.” Another of their DMs reported that “on scoring the alternatives found he was developing a
deeper understanding of the choice as he was forced to assess the benefits of each of the sub-criteria.” This group saw the process as “interactive and ongoing”, with the tree “continuously altered until it completely makes sense, which results in the scoring making sense and hence producing an accurate and flexible result”. In Group 4 five of its cases made the decision in the first phase, and two went to a second phase where DISCRIM was used, leading to a decision. The group concluded it was important to help DMs who might have difficulties with revealing personal and emotional aspects of choices. This could affect accessibility if DMs felt embarrassed or sensitive about revealing private information. Where the attributes of the alternatives are very different it could affect differentiability because differences between alternatives would have to be expressed in terms of subjective criteria, particularly higher up the criteria tree. They felt that this might cause difficulties with measurability because “emotions are not quantifiable and therefore any scoring process would yield inaccurate and inconclusive results.” The study showed the importance of a sensitive DA/DM interaction, of using iterative cycles to complete stages, and that the DAs should have confidence in the full eight stage MCDM structured process when helping DMs to reach a decision.
References

Brugha, C.M.: The structure of qualitative decision making. European Journal of Operational Research 104(1), 46–62 (1998)
Brugha, C.M.: The structure of adjustment decision making. European Journal of Operational Research 104(1), 63–76 (1998b)
Brugha, C.M.: Direct-Interactive Structured-Criteria System (2002), http://www.mcdm.com
Brugha, C.M.: Structure of multi-criteria decision-making. Journal of the Operational Research Society 55(11), 1156–1168 (2004)
Brugha, C.M.: Phased multicriteria preference finding. European Journal of Operational Research 158(2), 308–316 (2004b)
Brugha, C.M.: Priority Pointing Within the Systems Development Life Cycle. International Journal of Knowledge and Systems Sciences 2(2), 25–32 (2005)
Brugha, C.M., Bowen, K.: Decision research using cognitive structures. Systemic Practice and Action Research 18(1), 67–88 (2005)
Keeney, R.: Value-Focused Thinking: A Path to Creative Decision-Making. Harvard University Press, Cambridge (1992)
Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley & Sons, New York (1976)
O'Brien, D., Brugha, C.M.: Adapting and Refining in Multi-Criteria Decision-Making. Journal of the Operational Research Society (2008) (accepted subject to revisions)
Peirce, C.: Collected Papers of Charles Sanders Peirce. In: Hartshorne, C., Weiss, P. (eds.). Harvard University Press, Cambridge (1867)
Scupin, R.: The KJ Method: A technique for analyzing data derived from Japanese ethnology. Human Organisation 56(2), 233–237 (1997)
Von Winterfeldt, D., Edwards, W.: Decision Analysis and Behavioral Research. Cambridge University Press, New York (1986)
Methodology for Knowledge Synthesis

Yoshiteru Nakamori
School of Knowledge Science, Japan Advanced Institute of Science and Technology,
1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
[email protected]

Abstract. This paper considers the problem of knowledge synthesis and proposes a theory of knowledge construction, which consists of three fundamental parts: a knowledge integration model, the structure-agency-action paradigm, and the evolutionally constructive objectivism. The first is a model of gathering and integrating knowledge, the second relates to necessary abilities when gathering knowledge in individual domains, and the third comprises a set of principles to evaluate gathered and integrated knowledge.
1 Introduction

Meta-synthesis (Gu and Tang, 2005) might be interpreted as a systems thinking for a holistic understanding of the emergent characteristics of a complex system, and for creating new systemic knowledge about a difficult problem confronted. With a similar purpose, Wierzbicki et al. (2006) proposed the informed, creative systemic approach, named informed systems thinking, which should serve as the basic tool of knowledge integration and should support creativity. This systems thinking emphasizes three basic principles: the principle of cultural sovereignty, the principle of informed responsibility, and the principle of systemic integration.

The problem here is: how can we fulfill a systemic integration in the context of knowledge synthesis? One of the answers to this is the theory of knowledge construction, which consists of three fundamental parts: a knowledge construction model (Nakamori, 2003), the structure-agency-action paradigm (Nakamori and Zhu, 2004), and the evolutionally constructive objectivism (Wierzbicki and Nakamori, 2006). The main characteristics of this theory are: fusion of the purposiveness paradigm and the purposefulness paradigm, interaction of explicit knowledge and tacit knowledge, and the requisition for knowledge coordinators.

This paper briefly introduces the knowledge construction model, the structure-agency-action paradigm, and the evolutionally constructive objectivism, and then summarizes the theory of knowledge construction, expecting further development of the methodology for knowledge synthesis.
2 Informed Systems Thinking

Wierzbicki et al. (2006) proposed to redefine systems science as the discipline concerned with methods for the intercultural and interdisciplinary integration of
knowledge, including soft inter-subjective and hard objective approaches, open and, above all, informed.
• Intercultural means an explicit accounting for and analysis of national, regional, even disciplinary cultures, means trying to overcome the incommensurability of cultural perspectives by explicit debate of the different concepts and metaphors used by diverse cultures.
• Interdisciplinary approach has been a defining feature of systemic analysis since Comte (1844), but has been gradually lost in the division between soft and hard approaches.
• Open means pluralist, as stressed by soft systems approaches, not excluding by design any cultural or disciplinary perspectives.
• Informed means pluralist as stressed by hard systems approaches, not excluding any perspectives by disciplinary paradigmatic belief.
A basic novel understanding related to this paradigm is the essential extension of the skeleton of science (Boulding 1956). Beside biological, human and social levels of systemic complexity, many new levels of complexity of civilization development emerge. Informed systems thinking consists of three principles:
• The principle of cultural sovereignty: We can treat all separate levels of systemic complexity as independent cultures, and generalize the old basic cultural anthropology: no culture shall be judged when using concepts from a different culture.
• The principle of informed responsibility: No culture is justified in creating a cultural separation of its own area; it is the responsibility of each culture to inform other cultures about its own development and be informed about development of other cultures.
• The principle of systemic integration: Whenever needed, knowledge from diverse cultures and disciplines might be synthesized by systemic methods, be they soft or hard, without a prior prejudice against any of them, following the principle of open and informed systemic integration.
3 Knowledge Construction Model A knowledge construction model called the i-System was proposed in Nakamori (2003), which is a systemic and process-like approach to knowledge creation. The five ontological elements, or subsystems of the i-System are Intervention (the will to solve problems), Intelligence (existing scientific knowledge), Involvement (social motivation), Imagination (other aspects of creativity), and Integration (systemic knowledge): •
Intervention: Taking action on a problem situation which has not been dealt with before. First we ask: what kind of knowledge is necessary to solve the new problem? Then the following three subsystems are called on to collect that knowledge.
Methodology for Knowledge Synthesis
• • • •
313
Intelligence: Raising our capability to understand and learn things. The necessary data and information are collected, scientifically analyzed, and then a model is built to achieve simulation and optimization. Imagination: Creating our own ideas on new or existing things. Complex phenomena are simulated based on partial information, by exploiting information technology. Involvement: Raising the interest and passion of ourselves and other people. Sponsoring conferences and gathering people's opinions using techniques like interview surveys. Integration: Integrating heterogeneous types of knowledge so that they are tightly related. Validating the reliability and correctness of the output from the above three subsystems.
We can interpret these elements variously - either as nodes, or dimensions of Creative Space, or subsystems. In the last interpretation, while the 1st and the 5th subsystems are, in a sense, autonomous, the 2nd, 3rd and 4th subsystems are dependent on others; it is generally difficult for them to complete their missions themselves, and thus we can introduce a lower level system with similar structure to the overall system. Even if the i-System stresses that the creative process begins in the Intervention dimension or subsystem and ends in the Integration dimension or subsystem, it gives no prescription how to move in between. There is no algorithmic recipe how to move between these nodes or dimensions: all transitions are equally advisable, according to individual needs. This implicily means that the i-System requires knowledge coordinators within the System; we have to refer to the abilities or agencies of coordinators who works in the above three dimensions: Intelligence, Imagination and Involvement.
4 Structure-Agency-Action Paradigm The structure-agency-action paradigm is adopted when understanding the i-System from a sociological viewpoint (Nakamori and Zhu, 2004). The i-System can be interpreted as as a structurationist model for knowledge management. Viewed through the i-System, knowledge is constructed by actors, who are constrained and enabled by structures that consist of a scientific-actual, a cognitive-mental and a social-relational front, mobilize and realize the agency of themselves and of others that can be differentiated as Intelligence, Imagination and Involvement clusters, engage in rational-inertial, postrational-projective and arational-evaluative actions in pursuing sectional interests. The following are the working definition of some keywords that are essential to the concerned paradigm. These keywords have quite different but deeply ingrained
meanings in other disciplines beyond contemporary social theories. • • •
Structure: the systemic, collective contexts and their underlying principles, which constrain and enable human action. Agency: the capability with which actors, who are socio-technologically embedded, reproduce and transform the world. Construction: the social process through which actors reproduce and transform structure and agency.
314
Y. Nakamori
This paper only summarizes the agency complexity. By Intelligence we mean the intellectual faculty and capability of actors: experience, technical skill, functional expertise, etc. The vocabulary related to intelligence addresses logic, rationality, objectivity, observation and reflexivity. The accumulation and application of intelligence are mission-led and rational-focused (Chia, 2004), discipline- and paradigm-bound, confined within the boundary of ‘normal science’ (Kuhn 1970), which leads to ‘knowing the game’ and incremental, component improvement (Tushman and Anderson, 1986). In the Imagination cluster we uncover intuition, innocence, ignorance, enlightenment skill and post-rationality, which leads to a vocabulary of ‘feeling the game’, playful, fun, chaotic, illogic, forgetting, up-setting, competency-destroying and risktaking. This brings us beyond the ‘thoroughly-knowledgeable’ (Archer, 1995) and ‘over-rationalized’ agents (Mestrovic, 1998) that are portrayed in Giddens’s structuration theory (Giddens, 1979). Involvement is the cluster in human agency that consists of interest, faith, emotion and passion, which are intrinsically related to intentionality and ‘habits of the heart’ (Bellah et al., 1985), as well as the social capital (Bourdieu, 1985), social skill and political skill (Garud et al., 2002) that make intentionality and ‘the heart’ being felt. As human agency, involvement can produce managerial and institutional effects, particularly in dealing with the social-relational front, in that it helps or hampers researchers’ efforts to ‘make the game’. Even if the actors worked well using their agencies, this does not prove the validity of the obtained knowledge. We need a theory for knowledge justification.
5 Evolutionally Constructive Objectivism The evolutionally constructive objectivism is considered for testing knowledge creation theories (Wierzbicki and Nakamori, 2006), which consists of three principles: •
• •
Evolutionary falsification principle: hypotheses, theories, models and tools develop evolutionarily, and the measure of their evolutionary fitness is the number of either attempted falsification tests that they have successfully passed, or of critical discussion tests leading to an inter-subjective agreement about their validity. Emergence principle: new properties of a system emerge with increased levels of complexity, and these properties are qualitatively different than and irreducible to the properties of its parts. Multimedia principle: words are just an approximate code to describe a much more complex reality, visual and preverbal information in general is much more powerful and relates to intuitive knowledge and reasoning; the future records of the intellectual heritage of humanity will have a multimedia character, thus stimulating creativity.
Based on these three fundamental principles, we can give now a detailed description of an epistemological position of constructive evolutionary objectivism, closer in fact to the current episteme of technology than to that of hard sciences. 1.
According to the multimedia principle, language is a simplified code used to describe a much more complex reality, while human senses (starting with vision) enable people to perceive the more complex aspects of reality. This more
Methodology for Knowledge Synthesis
2.
3.
4.
5.
6.
7.
315
comprehensive perception of reality is the basis of human intuition; for example, tool making is always based on intuition and a more comprehensive perception of reality than just language. The innate curiosity of people about other people and nature results in their constructing hypotheses about reality, thus creating a structure and diverse models of the world. Until now, all such hypotheses turned out to be only approximations; but we learn evolutionarily about their validity by following the falsification principle. Since we perceive reality as more and more complex, and thus devise concepts on higher and higher levels of complexity according to the emergence principle, we shall probably always work with approximate hypotheses. The origins of culture are both linguistic, such as stories, myths, and symbols, and technical, such as tools and devices used for improving human life. Both these aspects helped in the slow development of science - by testing, abstracting, and accumulating human experiences with nature and other people, and testing and refining the corresponding models and theories. This development is evolutionary and, as in any punctuated evolution, includes revolutionary periods. The accumulation of human experiences and culture results in and is preserved as the intellectual heritage of humanity with its emotive, intuitive, and rational parts, existing independently from the human mind in libraries and other depositories of knowledge. Human thought is imaginative, has emotive, intuitive and rational components, and develops out of perception, sensory experiences, social interaction, and interaction with the intellectual heritage of humanity, including interpretive hermeneutic processes. Objectivity is a higher value that helps us interpret the intellectual heritage of humanity and select those components that more closely and truthfully correspond to reality, or that are more useful either when constructing new tools or analyzing social behaviour. A prescriptive interpretation of objectivity is the falsification principle; when faced cognitively with increasing complexity, we apply the emergence principle. The sources of our cognitive power are related to the multimedia principle.
6 Knowledge Construction Theory

Now the paper proposes a theory of knowledge construction, which consists of three fundamental parts: the knowledge construction model, the structure-agency-action paradigm, and the evolutionally constructive objectivism. Although the final one was developed with the purpose of validating knowledge creation models such as the i-System, this paper reuses it as a principle to test the obtained knowledge. The main characteristics of this theory are:
• Fusion of the purposiveness paradigm and purposefulness paradigm,
• Interaction of explicit knowledge and tacit knowledge, and
• Requisition for knowledge coordinators.
With the i-System we always start by searching for and defining the problem, following the purposiveness paradigm. Since the i-System is a spiral-type knowledge construction
model, in the second round we use the i-System to find solutions following the purposefulness paradigm. However, it is almost always the case that we find an approximate solution and new problems. This paper accepts the idea of Nonaka and Takeuchi (1995) that new knowledge might be obtained by the interaction between explicit knowledge and tacit knowledge. The use of the i-System means that we inevitably have to treat objective knowledge such as scientific theories, available technologies, socio-economic trends, etc. as well as subjective knowledge such as experience, technical skill, hidden assumptions and paradigms, etc. The theory requires people who accomplish knowledge synthesis. Such persons need to have the abilities of knowledge workers in wide-ranging areas and of innovators. However, they cannot achieve satisfactory results unless they possess the ability to coordinate the opinions and values of diverse knowledge and people. We should establish an educational system to train human resources who will promote knowledge synthesis in a comprehensive manner.
7 Concluding Remarks

This paper considered the problem of knowledge synthesis, proposed a theory of knowledge construction, and reached the conclusion that we should nurture talented people called knowledge coordinators. How can we nurture such people? One of the answers is that we should establish knowledge science and educate young students in this discipline. However, at the present stage, knowledge science is more a theme-oriented interdisciplinary academic field than a single discipline. Its mission is to organize and process human-dependent information and to return it to society with added value. Its central guideline is the creation of new value (knowledge) - such innovation being the driving force of society - but it mainly deals with the research area involving social innovation (organizations, systems, and reorganization of the mind). However, society's progress is underpinned by technology, and the joint progress of society (needs) and technology (seeds) is essential, so knowledge science also bears the duty to act as a coordinator (intermediary) in extensive technological and social innovations.

In order to fulfill the above mission, knowledge science should focus its research on observing and modeling the actual process of carrying out the mission as well as on developing methods to carry out the mission. The methods can be developed mainly through three existing fields: the application of information technology/artistic methods (knowledge discovery methods, ways to support creation, knowledge engineering, cognitive science), the application of business science/organizational theories (practical uses of tacit knowledge, management of technology, innovation theory), and the application of mathematical science/systems theory (systems thinking, emergence principle, epistemology). However, it will take some time to integrate the above three fields theoretically and establish a new academic system. We should first attempt their integration in practical use (problem-solving projects), accumulate actual results and then establish them as a discipline in a new field.
References

Archer, M.S.: Realist social theory: The morphogenetic approach. University of Cambridge Press, Cambridge (1995)
Bellah, R.N., Madsen, R., Sullivan, M.M., Swidler, A., Tipton, S.M. (eds.): Habits of the heart. University of California Press, Berkeley (1985)
Boulding, K.: General systems theory: The skeleton of science. Management Science 2, 197–208 (1956)
Bourdieu, P.: The forms of capital. In: Richardson, J.G. (ed.) Handbook of theory and research for the sociology of education, pp. 241–258. Greenwood, New York (1985)
Chia, R.: Strategy-as-practice: Reflections on the research agenda. European Management Review 1, 29–34 (2004)
Comte, A.: A general view of positivism. Translation in 1865, London (1844)
Garud, R., Jain, S., Kumaraswamy, A.: Institutional entrepreneurship in the sponsorship of common technological standards: The case of Sun Microsystems and Java. Academy of Management Review 45(1), 196–214 (2002)
Giddens, A.: Central problems in social theory: Action, structure and contradiction in social analysis. Macmillan, London (1979)
Gu, J.F., Tang, X.J.: Meta-synthesis approach to complex system modeling. European Journal of Operational Research 166(3), 597–614 (2005)
Kuhn, T.S.: The structure of scientific revolutions, 2nd edn. University of Chicago Press, Chicago (1970)
Nakamori, Y.: Systems methodology and mathematical models for knowledge management. Journal of Systems Science and Systems Engineering 12(1), 49–72 (2003)
Nakamori, Y., Zhu, Z.C.: Exploring a sociologist understanding for the i-System. International Journal of Knowledge and Systems Sciences 1(1), 1–8 (2004)
Nonaka, I., Takeuchi, H.: The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford University Press, New York (1995)
Tushman, M.L., Anderson, P.: Technological discontinuities and organizational environments. Administrative Science Quarterly 31, 439–465 (1986)
Wierzbicki, A.P., Nakamori, Y.: Testing knowledge creation theories. In: IFIP-TC7 Conference, Cracow, Poland, July 23-27, 2007 (2006)
Wierzbicki, A.P., Zhu, Z.C., Nakamori, Y.: A new role of systems science: informed systems approach. In: Wierzbicki, A.P., Nakamori, Y. (eds.) Creative space: models of creative processes for the knowledge civilization age, ch. 6, pp. 161–215. Springer, Heidelberg (2006)
Study on Public Opinion Based on Social Physics

Yijun Liu 1,3, Wenyuan Niu 1,3, and Jifa Gu 2,3

1 Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China
2 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
3 Center for Interdisciplinary Studies of Natural and Social Sciences, Chinese Academy of Sciences, Beijing 100190, China
[email protected]

Abstract. Social combustion theory, social shock wave theory and social behavior entropy theory are the three basic theories of social physics. This paper studies public opinion formation based on social combustion theory, explores the public opinion evolution process based on social shock wave theory, and grasps individual behavior, specifically that of public opinion leaders, based on social behavior entropy theory.

Keywords: Public opinion formation; Public opinion evolution; Social Combustion Theory; Social Shock Wave Theory; Social Behavior Entropy Theory.
1 Introduction

After Comte put forward the idea of social physics nearly 200 years ago, social physics has experienced three development phases: classical social physics, modern social physics and contemporary social physics. As an interdisciplinary field, contemporary social physics uses the concepts, principles and methods of natural science, efficiently extended, properly integrated and rationally modified, to explore, simulate, explain and uncover the rules of social behavior and the order of economic operation [1]. During the last 50 years, great progress has been achieved in this field. Public opinion reflects the integration of mass consciousness, ideas and emotion that the public holds toward certain social realities and phenomena at different historical stages. The subject of opinion is the general public, the object is a particular focus of the community, and the ontology is the tendentious comments or remarks about this focus. "Public opinion comes before the unrest" has become a consensus. Before any major social change happens, there is always an aura in public opinion; during the change, oscillations are caused in public opinion; after the change, some public opinions persist to guide new social changes as experience, preparation and reference. Public opinion can be viewed as a social behavior of the public and a presentation that forms legal or moral restrictions. It can help to build a harmonious society; in contrast, it can also induce social trouble. Therefore, it is very significant to find out the rules of opinion formation and evolution and then guide opinion infection.
Social physics presents three main theories [2]: social combustion theory, social shock wave theory and social behavior entropy theory. Social combustion theory focuses on the mechanism of social stability; social shock wave theory explores the spatio-temporal distribution of social stability; social behavior entropy theory addresses the essential research on social stability. Social physics holds that the mechanism of opinion formation and evolution, like the process of a common incident, involves a latent period, an active period and a closing period. When large numbers of individuals or groups discuss some incident together, opinion enters the active period from the latent period. This indicates that opinion is built step by step and finally formed by the integration of local viewpoints with key points from opinion leaders. The level of opinion formation at different stages can be quantitatively decided by number, scale and intensity. In the following parts, social physics is applied to study opinion formation and evolution. In detail, social combustion theory is used to study the mechanism of opinion formation, social shock wave theory is used to explore the process of opinion evolution, and social behavior entropy theory is used to analyze the behavior of participants, mainly opinion leaders, in the opinion 'Ba'.
2 Studying the Mechanism of Opinion Formation Based on Social Combustion Theory

2.1 Social Combustion Theory

Social combustion theory, which draws a reasonable analogy between the natural burning phenomenon and social disorder, instability and turmoil, was proposed in 2001. In nature, burning involves not only a physical process but also a chemical process: the physical process concerns the balance and conversion of energy, and the chemical process mainly concerns the material change and its related conditions. Burning occurs only if all three basic conditions exist, namely the burning material, the catalyst, and the ignition temperature or "last straw"; any one of the three is indispensable. The mechanism of the combustion process in nature can also be used as a reference when studying social stability. In detail, the basic causes of social disorder, such as conflict between people and nature and disharmony between persons, can be viewed as the burning material. Non-rational judgments, malicious attacks by hostile forces and the deliberate pursuit of one-sided interests work as the catalyst. When both of the above exist, even a small emergency can become the ignition temperature or the last straw, resulting in mass incidents of a certain scale and impact and finally causing social instability and discord [3]. This research studies the mechanism of opinion formation based on social combustion theory. The wide range of attitudes, discussions and demands forms a collection of burning material. The hierarchical structure in the 'Ba' of public opinion creates more opportunities to move closer to consensus, which can be viewed as the social "catalyst". What ultimately triggers the formation of public opinion is usually an unexpected incident or an authoritative source of speech; that is "the last straw".

2.2 Determining the Formation of Opinion

During the process of collection and formation, public opinion presents two forms [4]: view flow and action flow. With view flow, the public continually express
their opinions on social realities and problems to vent their unhappiness. If a high degree of consensus is achieved among the view flows and the demands of the people cannot be met, the view flow upgrades to an action flow. That is, individual and unprompted actions become an organized and purposeful campaign, promoting the outbreak of mass emergencies. The critical point of opinion formation is the moment when view flow upgrades to action flow [5]. Social injustice invokes psychological dissatisfaction among people, which plays an important role in preparing opinion formation. The voices of appeal, the cumulative negative effects, and so on, can be viewed as the burning material arising from people's suffering. Besides, some sensitive words, such as "the rich", "money", "official" and "corruption", work as a catalyst that sharpens public discontent. With the above preconditions, even a small event can play the role of the last straw. At this moment, the three conditions of opinion formation are in place and a consensus of opinion has been reached. Without an active response, mass incidents damaging property and social stability will eventually happen. As a result, research on opinion formation should focus more on the mass incidents caused by day-to-day events, analyze the opinions against social order and stability that derive from public debate through continual friction and integration, and then give correct guidance or even defuse this destructive force in a timely manner, so as to avoid unexpected incidents and protect the security of the people and of social property. As an important part of an early warning system, public opinion research takes on a predictive role through grasping the mechanism of opinion formation.
3 Exploring the Opinion Evolution Process Based on Social Shock Wave Theory

3.1 Social Shock Wave Theory

The shock wave is one of the most important phenomena in high-speed gas movement. It is the strong compression wave produced by strongly compressed gas, also known as a strong discontinuity; the thin discontinuity layer is called the shock wave [6]. In this thin layer, speed, temperature, pressure and other physical quantities change quickly from their values ahead of the wave front to their values behind it, and the gradients of speed, pressure and temperature are large. Therefore, shock wave theory is not much concerned with the flow inside the wave, but rather explores the changes of physical quantities after passing through the shock wave. At present, some ideas are borrowed from shock wave theory to solve complex social problems, especially problems with wave phenomena, such as traffic [7] and the flow of people [8]. A crowd can be viewed as a continuous medium because any disturbance spreads through the crowd in the form of waves. Besides, due to individual differences, non-linear distortion occurs in the waves, which may result in a shock wave, that is, a crowding accident. Several models, such as the Ising model [9-12], the Sznajd model, the Deffuant model, the Krause-Hegselmann model [12-14], the rumor spread model, the bankruptcy model, and Monte Carlo models, have been proposed to study the process of opinion formation based on the social shock wave theory.
3.2 Modeling Opinion Evolution

Public opinion spreads as a surface phenomenon, exhibiting ups and downs. Because people accept views at different speeds and with different abilities, the intensity of reaction differs, and a wave of ups and downs can be felt due to the resulting differences in the intensity of spread. This status has been called the "wave of public opinion" [15]. The wave of public opinion spreads in a non-linear form and involves a number of people as participants. During opinion infection, the behaviors of the participants can be classified [16]; this paper summarizes them as "conformity", "power" and "egoism". In detail, "conformity" involves mostly psychological factors: participants fear loneliness and obey the majority. "Power" mainly involves moral values: power or prestige is a decisive factor, which is especially important in China. "Egoism" is driven by people's values: for some benefit, people may even change their words and deeds. Therefore, "conformity", "power" and "egoism" are fundamental for establishing the simulation rules of opinion infection.

Hypothesis: there are $N$ opinion subjects, each of which owns a viewpoint $o_i$, where $i = 1, 2, \ldots, N$.

Definition 1: The three basic elements of opinion infection are {$\sigma$: change of public behavior; $E$: environment of opinion infection; $t$: time of opinion infection}, as follows,

$$Y = F(\sigma, E; t) \qquad (1)$$

where $Y$ is the speed of opinion infection on some social phenomenon or incident.

Definition 2: The choice of actions of the public includes {$S$: choice of individual preference; $S'$: the interaction between individuals}, and can be expressed as

$$\sigma = f(S, S') \qquad (2)$$

Definition 3: The choice of individual preference is based on {$c$: "conformity"; $p$: "power"; $l$: "egoism"}, and can be expressed as

$$S = \psi(c, p, r, l) \qquad (3)$$

Among them, individual $i$, influenced by the above action models, builds its ability to adhere to its originally owned viewpoint between time $t$ and $t+1$. Opinion diffusion is the process in which each individual chooses or is persuaded, and participants (or part of them) finally agree on a behavior. Therefore, the law of gravity can be referred to in order to reflect the change of individual behavior between moments $t$ and $t+1$ due to interaction. That can be expressed as
$$\sigma_i = \sum_{j=1}^{N} k \, \frac{o_i \cdot o_j}{d_{ij}^{\alpha}} \qquad (4)$$

where $k$ is a constant coefficient, $d_{ij}$ represents the distance between individual $i$ and individual $j$, $\alpha$ is the power parameter, and $o_i \cdot o_j$ describes the consistency between individual $i$ and individual $j$.

If $o_i \cdot o_j > 0$, individual $i$ has the same viewpoint as individual $j$, and individual $i$ will hold its original viewpoint. If $o_i \cdot o_j < 0$, individual $i$ has the opposite viewpoint to individual $j$, and two conditions can be taken:

when $\sigma > 0$, individual $i$ will hold its original viewpoint;
when $\sigma < 0$, individual $i$ will change its proposition.
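To show how the update rule of Eq. (4) can be exercised numerically, the following Python sketch iterates one reading of the rule over binary viewpoints placed at random pairwise distances. It is a minimal sketch, not the authors' implementation; the constants k and alpha, the random geometry, the aggregation of all pairwise terms into a single pressure, and the stopping criterion are assumptions introduced here for illustration.

```python
import numpy as np

def evolve_opinions(o, d, k=1.0, alpha=2.0, steps=50):
    """Iterate sigma_i = sum_j k * (o_i * o_j) / d_ij^alpha over all individuals.

    o : array of viewpoints in {-1, +1}; d : pairwise distance matrix.
    An individual changes its viewpoint when the aggregate pressure from
    opposing neighbours dominates (sigma_i < 0), one reading of the rule above.
    """
    o = o.copy()
    n = len(o)
    for _ in range(steps):
        changed = False
        for i in range(n):
            sigma_i = sum(k * o[i] * o[j] / d[i, j] ** alpha
                          for j in range(n) if j != i)
            if sigma_i < 0:          # opposing pressure dominates
                o[i] = -o[i]         # individual i changes its proposition
                changed = True
        if not changed:              # stable configuration reached
            break
    return o

# toy example: 20 individuals at random positions on a line
rng = np.random.default_rng(0)
pos = rng.uniform(0, 10, size=20)
d = np.abs(pos[:, None] - pos[None, :])   # diagonal is never used (j != i)
o0 = rng.choice([-1, 1], size=20)
print(evolve_opinions(o0, d))
```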
4 Recognizing the Behavior of the Individuals, Specifically the Opinion Leaders, Based on Social Behavior Entropy Theory

4.1 Social Behavior Entropy Theory

Social behavior entropy captures the essence of social unrest. The entropy theory of physics is borrowed, by analogy, to explain how groups are composed from individuals. There are six principles of social behavior entropy theory [1], namely 1) the universal "minimal effort" principle, 2) the pursuit of "minimum entropy" principle, 3) the keeping of "psychological balance" principle, 4) the sustaining of "EQ resonance" principle, 5) the social orientation ("follow-the-trend") principle, and 6) the longing for social conventions that limit everyone except oneself. During the process of public opinion formation and evolution, we are mainly concerned with the "psychological balance" principle and the "EQ resonance" principle.

"Psychological balance" principle: if an individual can calm down through persuasion and self-reflection after suffering some unfairness, great help is provided for building a harmonious society. In other words, through persuasion the participants can achieve self-acceptance, self-awareness, self-experience and self-control.

"EQ resonance" principle: "EQ resonance" means that only the people who command the most respect, reputation or approbation can effectively play the persuading role. Whether for the public or for government leaders, such an exemplar is very significant.

Individuals always unconsciously follow some of the rules of social behavior entropy theory. Moreover, the above two principles indicate that individuals prefer to seek emotional support and attitudinal dependence from opinion leaders. This explains the indispensability of opinion leaders.
4.2 Recognizing Opinion Leaders

Public opinion often spreads through interaction between persons and thereby changes the attitudes and behavior of the audience. Generally, opinion is transmitted from the mass media to opinion leaders and in turn to the people whom the leaders want to influence, which is called secondary communication. Opinion leaders can be treated both as audience and as leaders who influence the audience; they play a very important role during opinion infection. Recognizing opinion leaders during opinion formation and evolution, and then finding out their behavior modes and paths, is an important method for guiding opinion infection. This article adopts the social network analysis (SNA) method to identify opinion leaders. SNA was proposed in the 1930s and enhanced in the 1970s, and it is a new paradigm of sociological research. SNA is used to recognize "opinion leaders" quantitatively because this approach precisely describes the relationships between the subjects of opinion. Each point in a social network represents a participant; connected points represent a group of individuals with similar viewpoints; a role in the network is a combination of points. Other concepts such as point, edge, degree, betweenness, cutpoint, component, subgroup and centralization are involved in SNA. A cutpoint is a point whose absence divides the network into segments. Such a point is important not only to the network but also to the other points. As a result, cutpoints play the "opinion leader" role among the subjects of opinion. The algorithm for computing and finding cutpoints is not studied in detail here.
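Although the cutpoint algorithm is not detailed in the paper, one plausible way to realize this step is with an off-the-shelf articulation-point routine, as in the following Python sketch using networkx. The interaction records and the undirected simplification of the opinion network are assumptions made purely for illustration.

```python
import networkx as nx

# hypothetical interaction records among opinion subjects (who replied to whom)
interactions = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
                ("d", "e"), ("e", "f"), ("f", "d")]

# treat the opinion 'Ba' as an undirected graph for structural analysis
G = nx.Graph()
G.add_edges_from(interactions)

# cutpoints (articulation points): removing them disconnects the network,
# so they are candidate "opinion leaders" bridging sub-communities
leaders = sorted(nx.articulation_points(G))
print("candidate opinion leaders:", leaders)

# a complementary indicator, betweenness centrality, is also mentioned in the text
print(nx.betweenness_centrality(G))
```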
5 Conclusions

From the perspective of social physics, this paper explores the mechanism of opinion formation and evolution based on social combustion theory, social shock wave theory and social behavior entropy theory. This research is significant for recognizing the essence of opinion and then guiding opinion infection efficiently. According to the results presented in this paper and previous research, the key value of opinion comes from its prediction and alert function, and proper media communication can be used to guide opinion infection toward the full, harmonious and sustainable development of society. Opinion keeps watch over the stability of society and can be used as a benchmark or weather vane to judge social stability and harmony. By giving an alert for potential mass incidents based on the analysis and forecasting of opinion formation and evolution, the prediction and alert function of opinion is fully realized. Building a fair and harmonious social environment can not only inhibit the breeding ground for rumors but also enhance the prestige and credibility of the government.
References [1] Fan, Z.M., Liu, Y.J., et al.: Social physics: The forefront of international research perspective. Science press, Beijing (2007) (in Chinese) [2] Niu, W.Y.: Social physics: significance of the discipline’s value and its application. Science, forum. 54(3), 32–35 (2001) (in Chinese)
[3] Niu, W.Y.: The Social physics and the warning system of China’s social stability. Bulletin of Chinese Academy of Sciences 1, 15–20 (2001) (in Chinese) [4] Liu, J.M.: The basis of opinion study. China Renmin university press, Beijing (1988) (in Chinese) [5] Liu, Y.J., Gu, J.F., Niu, W.Y.: Study on the Mechanism of Public Opinion Formation. In: Chen, G.Y. (ed.) Harmonious Development and Systems Engineering, Proceedings of the 15th Annual Conference of Systems Engineering Society of China, pp. 595–600 (2008) (in Chinese) [6] Zhi, Q.J.: A discussion on shock wave. Journal of Guizhou Normal University (Natural Sciences) 21(1), 25–27 (2003) (in Chinese) [7] Li, Z.L., Chen, D.W.: Study on the traffic congestion at bus stop based on traffic flow wave theory. Traffic and Computer 23(6), 62–65 (2005) (in Chinese) [8] Lu, C.X.: Analysis on the wave of pedestrians. China Safety Science Journal 16(2), 30– 34 (2006) (in Chinese) [9] Wu, Q.F., Kong, L.J., Liu, M.R.: Influence of person’s character upon the evolution of the cellular automata model for public opinion. Journal of Guangxi Normal University (Natural Sciences) 22(4), 5–9 (2004) (in Chinese) [10] Xiao, H.L., Deng, M.Y., Kong, L.J., Liu, M.R.: Influence of people’s moving on the opinion communication in the cellular automation public opinion model. Journal of Systems Engineering 20(3), 225–231 (2005) (in Chinese) [11] Zhang, Z.D.: Conjectures on exact solution of three - dimensional (3D) simple orthorhombic Ising lattices (2007), http://arxiv.org/abs/0705.1045 [12] Stauffer, D.: Sociophysics: the Sznajd model and its applications. Computer Physics Communications 146, 93–98 (2002) [13] Stauffer, D.: Sociophysics simulations. Arxiv, cond-mat., 1–8 (2002) [14] Stauffer, D.: Sociophysics Simulations II: Opinion Dynamics. Arxiv. Physics, 1–18 (2005) [15] Liu, J.M.: Principles of public opinion. Huaxia Publishing Co., Ltd., Beijing (2002) [16] Sha, L.S.: Social psychology. China Renmin university press, Beijing (2002) (in Chinese)
Context-Based Decision Making Method for Physiological Signal Analysis in a Pervasive Sensing Environment Ahyoung Choi and Woontack Woo GIST U-VR Lab., Gwangju 500-712, S.Korea {achoi,wwoo}@gist.ac.kr
Abstract. With the advent of light-weight, high-performance sensing and processing technology, pervasive physiological sensing devices have been actively studied. However, a pervasive sensing device is easily affected by external factors and environmental changes such as noise, temperature or weather. In addition, it is hard to deal with the internal factors of a user and with personal differences in physiological characteristics while measuring physiological signals with a pervasive sensing device. To address these issues, we propose a context-based decision making method for pervasive sensing environments that takes the user's age and gender and the sensing environment into account when detecting the user's normal physiological condition. From the research conducted, we found that context-based physiological signal analysis of multiple users' regular data showed reliable results and reduced errors. Keywords: Context-based decision making, Pervasive sensing environment, Physiological signal analysis.
1 Introduction

Pervasive physiological sensing devices for daily monitoring have been studied extensively [1-2]. However, these devices are not commonly used by ordinary consumers because the analysis results are fragile to environmental noise. In addition, their readings vary easily with internal changes and personal differences. Smart environments now provide a wide range of resources, such as distributed and embedded sensing devices, and they are quite useful and practical in the area of physiological signal sensing. These resources supply information on numerous external factors, like outdoor temperature, weather, humidity and luminance, as well as user profiles, which include information on user activity, energy expenditure, gender and age. By fusing this contextual information, we can obtain more reliable analysis results from a noisy sensory input: knowing the condition before measuring the physiological signal provides clues for understanding the user's status more precisely. However, general decision support systems in the health domain have commonly used statistical pattern classification methods to analyze the signal: they collect a large amount of data and find a general threshold to cover all types of users. Wanpracha et al. proposed a classification method to determine epilepsy from
the EEG signal [3]; however, each patient had different classification results. In the field of pervasive sensing and analysis, previous work on physiological decision making has focused on filtering noisy signals. Asada developed a ring-type sensor and minimized its errors by using a reference signal on the other side [4]. Rosalind proposed a stress analysis program using a wearable computer [5]. These studies were effective for filtering the motion artifact, but they did not reflect personal differences or provide adaptive analysis for individual users. Winston et al. described a decision making method using the user's activity information [6], in which the physiological status of the current user is integrated with activity inference results. In this work, we propose a context-based decision making method for physiological signals built on a probabilistic decision making process that considers the user's and environmental conditions. The information is analyzed based on the uncertainty of the influencing factors. The proposed method supports context-adaptive signal analysis and improves the classification rate of normal physiological status. In addition, this model provides an adaptive framework for the user's dynamic and changeable condition during monitoring. To analyze its effectiveness, we collect the normal physiological condition of multiple users and decide the users' status with standard, personalized, and group threshold values using a PhysioNet database. Finally, we conclude that the proposed context-based decision making model improves physiological status recognition. The remainder of this paper is organized as follows. We explain the proposed analysis method in Section 2. Section 3 shows the experimental setup and analysis results for verifying the proposed method. Finally, we conclude in Section 4 and illustrate the future direction of this research.
2 Context-Based Decision Making for Physiological Signals

Analyzing physiological signal status over only a short period of time may increase the chance of an incorrect diagnosis. For example, the human heart beats faster after exercise, and if a user were to visit a hospital soon after exercising, a misdiagnosis of heart problems becomes possible. Therefore, decision making with contextual information is needed in order to accurately analyze both the sensing conditions and the user's conditions. However, we do not know exactly which factors influence the physiological signal or how strongly they do so, and we therefore use probability theory to model the decision making algorithm; decision making algorithms addressing uncertainties have been studied before [7-8]. We estimate the ideal user status and classify normal conditions based on the user's type (gender, age) and other group models; the current user's own model is not given. We assume that both the data distribution and the error have Gaussian probability density functions. $T = \{t_1, t_2, t_3, \ldots, t_n\}$ refers to the types of user information (e.g., normal and abnormal), where $n$ is the number of types. $M = \{m_1, m_2, m_3, \ldots, m_p\}$ is the set of models of other groups (e.g., gender, age), where $p$ is the number of models. We abbreviate the user state as $u$, where 0 and 1 denote the normal and abnormal conditions, respectively. In order to find the ideal user physiological status $u$, we apply the MAP decision making method. To maximize the probability of the current user status, we model the following equation:
$$u^{*} = \arg\max_{u} P(u \mid d, m, t) = \arg\max_{u} \frac{P(u, d, m, t)}{P(d, m, t)} \propto \arg\max_{u} P(u, d, m, t) \qquad (1)$$

To simplify the equation, we assume the following decomposition:

$$P(u, d, m, t) = P(d \mid u, m, t) \cdot P(u, m, t) = P(d \mid u, m, t) \cdot P(m \mid u, t) \cdot P(u, t) = P(d \mid u, m, t) \cdot P(m \mid u, t) \cdot P(t \mid u) \cdot P(u) \qquad (2)$$

Finally we obtain a joint probability density function in terms of the current observation $d$. If we assume that each joint pdf is a normal distribution, then from the energy equation

$$E = e_1 + e_2 + e_3 + f(\Delta u) \qquad (3)$$

where $e_1$ is an energy function of $P(d \mid u, m, t)$, $e_2$ is an energy function of $P(m \mid u, t)$, and $e_3$ is an energy function of $P(t \mid u)$, the decision becomes

$$u^{*} = \arg\min_{u} E = \arg\min_{u} \{e_1 + e_2 + e_3 + f(\Delta u)\} \qquad (4)$$

where $e_1$, $e_2$ and $e_3$ are as follows:

$$e_1 = d - f_1(u, h, t), \quad e_2 = m - f_2(u, t), \quad e_3 = t - f_3(u) \qquad (5)$$
We define $e_1$ as a function of the difference between the individual distribution and the group-type distribution, and $e_2$ as a function of the difference between the group distribution and the ideal type distribution. Finally, $e_3$ is 0, because we assume that $P(t \mid u)$ is a constant, 0.5. The concept of context-based analysis is described in Fig. 1. Most previous works apply a statistical approach to establish a standard threshold and then analyze under this standard; in that case, the accuracy of the analysis results improves as the size of the data increases. The basic concept of adaptive physiological signal analysis is depicted in Fig. 1(b). In the proposed context-based analysis with user type information, we assume that the problem domain is a pervasive sensing device that records personal data and contextual information in real time over a long period. Physiological information is labeled dynamically, and the labeling information can be provided directly by the user's input or by sensing information from heterogeneous sensors and services. We utilize the context-labeled physiological signal information in the user database to determine the physiological status individually. For data labeling, we utilize contextual information. To solve the problem, we first estimate the ideal signal distribution by assuming that the observation and the ideal signal are very similar and that the noise is quite small. From the observation, we estimate the current ideal user signal. Then we estimate the density of the signal by using a normality test of the signals obtained from the three-channel sensors. If the distribution of the measurements is normal, we model the signal parametrically as a Gaussian distribution; if the distribution is not normal, we model the measurements non-parametrically.
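To make the decision rule of Eqs. (3)-(5) concrete, the sketch below evaluates an energy for each candidate state u in {0, 1} and picks the minimizer. It is only an illustrative reading of the formulation: the group models and the squared-deviation penalty terms are hypothetical placeholders, not the paper's actual mapping functions f1 and f2.

```python
import numpy as np

def decide_state(d, m, group_models):
    """Pick u* in {0: normal, 1: abnormal} by minimizing E = e1 + e2 + e3.

    d            : current observation (e.g. a heart-rate feature)
    m            : mean of the matched group model
    group_models : dict u -> (mean, std) describing the group-type distribution,
                   a stand-in for f1(u, h, t) and f2(u, t)
    """
    energies = {}
    for u, (mu, sigma) in group_models.items():
        e1 = ((d - mu) / sigma) ** 2      # observation vs. group-type distribution
        e2 = ((m - mu) / sigma) ** 2      # group model vs. ideal type distribution
        e3 = 0.0                          # P(t|u) assumed constant (0.5) in the text
        energies[u] = e1 + e2 + e3
    return min(energies, key=energies.get), energies

# hypothetical heart-rate models: normal ~ N(70, 10), abnormal ~ N(105, 15)
models = {0: (70.0, 10.0), 1: (105.0, 15.0)}
print(decide_state(d=95.0, m=72.0, group_models=models))
```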
Fig. 1. Concept diagram: (a) previous standard decision making; (b) context-based decision making
In the estimation step, we use a Kalman filter to estimate the ideal signal from the observations. In addition, we assume that the state X includes x1 and x2 (dimension: 2), that x1(t) and x2(t) are mutually independent, and that the state variation over time is constant. The current observation y(t) consists of the original states plus noise, where the noise is white Gaussian noise. After obtaining the final decision making model, we compute the differences between the previous and current estimated parameters. If the differences are small, we ignore the changes in the estimated results; however, if the estimation shows a large difference, we update the user model parameters. Finally, in the classification step, we compute the classification result assuming a 95% confidence interval.
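The estimation step described above can be illustrated with a minimal scalar Kalman filter that tracks a roughly constant underlying pulse value under white Gaussian observation noise. The process and measurement noise values and the update-threshold check are assumptions chosen for this sketch, not the parameters used in the paper.

```python
import numpy as np

def kalman_constant_state(y, q=1e-3, r=4.0, x0=None, p0=1.0):
    """1-D Kalman filter for a (nearly) constant state observed with white noise.

    y : sequence of noisy observations (e.g. beat-to-beat heart rate)
    q : process noise variance, r : measurement noise variance (assumed values)
    """
    x = y[0] if x0 is None else x0
    p = p0
    estimates = []
    for z in y:
        p = p + q                        # predict: constant-state model, uncertainty grows
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)              # update with the new observation
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
obs = 72.0 + rng.normal(0, 2.0, size=120)     # simulated noisy heart-rate readings
est = kalman_constant_state(obs)

# re-estimate the stored user model only when the parameter shift is large (assumed 5% rule)
prev_mean, new_mean = 70.0, est[-1]
if abs(new_mean - prev_mean) / prev_mean > 0.05:
    prev_mean = new_mean                      # update the stored user model
print(round(est[-1], 2), round(prev_mean, 2))
```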
3 Experimental Analysis

For this experiment, we evaluated the proposed context-based decision making method with real observations from measurement equipment. We made use of the Normal Sinus Rhythm RR Interval Database and the Congestive Heart Failure RR Interval Database in PhysioBank [9]. In the Normal Sinus Rhythm RR Interval Database, RR intervals of heart beats were obtained from 54 normal subjects; 30 subjects were male, aged 28.5 to 76, and the others were female, aged 58 to 73. The Congestive Heart Failure RR Interval Database included the subjects' heart failure measurement data (NYHA class III), with subjects aged 34 to 79. For the analysis, we selected 18 subjects from the data set: 9 sample measurements from subjects in an abnormal condition and the others from normal subjects. From the time series measurements, we selected a 5-minute RR sample series for each subject. The measurements were preprocessed to correct artifacts with a smoothing method and to remove third-order trends of the RR interval. After collecting the measurement data, the ideal pulse signal was estimated using the Kalman filter. After estimating the pulse signal, we computed the heart rate feature from the RR intervals, because this feature characterizes the signal in the time domain. We then estimated a density function to determine the normal and abnormal conditions of the current observation. Since there are numerous density estimation methods, we first ran Kolmogorov-Smirnov tests in MINITAB to verify the normality of the collected data. Finally, we obtained a probability density function for each data set.
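The feature extraction and normality check described above can be sketched as follows in Python; scipy's Kolmogorov-Smirnov test stands in for the MINITAB procedure used by the authors, and the detrending order and the synthetic RR data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def heart_rate_from_rr(rr_seconds, detrend_order=3):
    """Convert RR intervals (s) to instantaneous heart rate (bpm) after removing
    a low-order polynomial trend, mirroring the preprocessing step in the text."""
    rr = np.asarray(rr_seconds, dtype=float)
    t = np.arange(len(rr))
    trend = np.polyval(np.polyfit(t, rr, detrend_order), t)
    rr_detrended = rr - trend + rr.mean()          # keep the original mean level
    return 60.0 / rr_detrended

rng = np.random.default_rng(2)
rr = 0.85 + rng.normal(0, 0.03, size=300)          # ~5 minutes of simulated RR data
hr = heart_rate_from_rr(rr)

# Kolmogorov-Smirnov normality check against a fitted Gaussian
stat, p_value = stats.kstest(hr, "norm", args=(hr.mean(), hr.std(ddof=1)))
print(f"mean HR={hr.mean():.1f} bpm, KS p-value={p_value:.3f}")
# p_value > 0.05 -> model the density parametrically (Gaussian), else non-parametrically
```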
Fig. 2. Overall classification results with the standard, group, and personalized thresholds
In the decision making step, we determined whether the current condition of a subject was normal or abnormal using several thresholds: an individual threshold, a group threshold, and a general standard threshold. For the standard threshold, we used the range under 100 bpm, because we only collected fast heart-beating conditions from the abnormal subjects. For the group threshold, we categorized subjects into groups using the contexts of gender and age: the gender grouping had two classes, male and female, and the age grouping had three ranges, ages 20-39, 40-59, and 60-79. The individual threshold was computed from each subject's own distribution, following the 95 percent confidence interval of each density distribution. In the experiment, the overall classification performance with the group threshold increased, as displayed in Fig. 2. When we applied the standard threshold of under 100 bpm, most classification errors were significantly reduced for the normal sinus rhythm group; on the other hand, subjects with heart-related disease (NYHA class III) showed large classification errors. The age-gender group analysis, however, maintained the classification rate for both normal and abnormal subjects, as shown in Fig. 2. The average error rates of the personalized analysis, the group analysis with age-gender context, and the standard analysis were 5%, 19.23%, and 33.13%, respectively. From these experiments, we concluded that group analysis with age-gender contexts has a positive effect on physiological status classification compared with other deterministic and static classification thresholds. In addition, we compared the classification results in detail with several group thresholds, as shown in Fig. 3 and Fig. 4. For this analysis, we checked the types of errors in
each distribution, namely Type 1 errors and Type 2 errors, as in Fig. 3(a). A Type 1 error was defined as a positive result ("the subject is healthy") obtained from an unhealthy subject; a Type 2 error meant missing the correct result for a subject in a normal condition. Fig. 3(b)-(c) show the distribution of each gender group. From Fig. 3(d), we found that the overall false classification ratio was reduced when the gender group threshold was applied to decide the heart status. In addition, grouping analysis with the age context also showed a lower classification error than the standard analysis. However, in both cases Type 2 errors increased, because the standard threshold extends the possibility of detecting normal-condition subjects. From this experiment, moreover, we observed that group analysis with the age context gave a more beneficial classification result than applying the gender context. As a result, we concluded that the classification error rate was reduced by the group-based decision making method as well as by the personalized decision making method. Accordingly, we found that age and gender context, especially age context, can be used to estimate the current user status to some extent without knowledge of the user's current density distribution.
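The comparison of standard, group and individual thresholds with Type 1 and Type 2 error counts could be organized as in the following sketch; the threshold values, the 95% interval rule and the synthetic labels are assumptions standing in for the PhysioNet-derived values reported above.

```python
import numpy as np

def classify(hr, lower, upper):
    """Return 0 (normal) if the mean heart rate lies inside the interval, else 1."""
    return 0 if lower <= hr <= upper else 1

def error_counts(decisions, labels):
    """labels: 0 normal, 1 abnormal. Type 1: abnormal called healthy; Type 2: normal missed."""
    type1 = sum(1 for d, y in zip(decisions, labels) if d == 0 and y == 1)
    type2 = sum(1 for d, y in zip(decisions, labels) if d == 1 and y == 0)
    return type1, type2

rng = np.random.default_rng(3)
labels = [0] * 9 + [1] * 9
hrs = list(rng.normal(72, 8, 9)) + list(rng.normal(108, 12, 9))   # synthetic subjects

# standard threshold: anything under 100 bpm counts as normal
std = [classify(h, 0, 100) for h in hrs]
# group threshold: hypothetical 95% interval of an age-gender group, mean 74, std 9
lo, hi = 74 - 1.96 * 9, 74 + 1.96 * 9
grp = [classify(h, lo, hi) for h in hrs]

print("standard threshold errors (type1, type2):", error_counts(std, labels))
print("group threshold errors    (type1, type2):", error_counts(grp, labels))
```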
Fig. 3. Gender group classification result: (a) definition of the types of error; (b) distribution of the male group; (c) distribution of the female group; (d) classification result
Fig. 4. Age group classification result (a) Distribution of age 20-39 group (b) Distribution of age 40-59 group (c) Distribution of age 60-79 group (d) Classification result
4 Conclusion and Future Work

In this work, we propose a context-based decision making method for physiological signals that achieves better results than other deterministic methods such as a standard threshold. The proposed method supports probabilistic decision making with the context of gender and age. In the experiment conducted, deciding the normal heart-rate status with the context of gender and age produced better classification results than applying the standard threshold. We expect that user type context information, as well as gender and age information, can further improve the normal heart status detection ratio. For future study, we will analyze the lower heart failure cases with other databases to complete the analysis. In addition, we will extend the context information used for grouping to the user's body constitution and clinical history. Furthermore, we will build a model that estimates the current status from the user's history, other group models, and the user type with pervasive sensing devices by applying the observed relationships.

Acknowledgments. This research was supported by the CTI development project of KOCCA, MCST in S. Korea.
References 1. Robert, M., Neil, J.M., Paul, H., Peter, J.T., Martin, A.S.: A Wearable Physiological Sensor Suite for Unobtrusive Monitoring of Physiological and Cognitive State. In: IEEE EMBC 2007, pp. 5276–5281. IEEE Press, New York (2007) 2. Urs, A., Jamie, A.W., Paul, L., Gerhard, T., Francois, D., Michel, B., Fatou, K., Eran, B.S., Fabrizio, C., Luca, C., Andrea, B., Dror, S., Menachem, A., Etienne, H., Rolf, S., Milica, V.: AMON: A Wearable Multiparameter Medical Monitoring and Alert System. IEEE Transactions on Information Technology in Biomedicine 8, 415–427 (2004) 3. Wanpracha, A.C., Oleg, A.P., Panos, M.P.: Electroencephalogram (EEG) time series classification: Applications in epilepsy. Annals of Operations Research 148, 227–250 (2006) 4. Asada, H.H., HongHui, J., Gibbs, P.: Active noise cancellation using MEMS accelerometers for Motion tolerant wearable bio-sensors. In: IEEE EMBC 2004, pp. 2157–2160. IEEE Press, Los Alamitos (2004) 5. Rosalind, W.P., Charles, Q.D.: Monitoring stress and heart health with a phone and wearable computer. Motorola Offspring Journal (2002) 6. Winston, H., Wu, M.A., Batalin, L.K., Au, A.A., Bui, T., William, J.K.: Context-aware Sensing of Physiological Signals. In: IEEE EMBC 2007, pp. 5271–5275. IEEE Press, New York (2007) 7. Dianne, J.H., Robert, A.D.: Engaging multiple perspectives: A value-based decision-making model. Decision Support Systems 43, 1588–1604 (2007) 8. Meltem, O.z., Alexis, T.: Modelling uncertain positive and negative reasons in decision aiding. Decision Support Systems 43, 1512–1526 (2007) 9. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., Mark, R.G., Mietus, J.E., Moody, G.B., Peng, C.K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23), e215–e220 (2000)
A Framework of Task-Oriented Decision Support System in Disaster Emergency Response Jun Tian, Qin Zou, Shaochuan Cheng, and Kanliang Wang Management School of Xi’an Jiaotong University, Xi’an, China 710049
[email protected], [email protected], [email protected], [email protected]

Abstract. Based on an analysis of how the rescue process of the Wenchuan Earthquake in China was organized, this paper develops a task-oriented management model for disaster emergency response. The management mechanism of task generation in emergency response is established: four kinds of task generation mechanisms are studied and three decision-making patterns are suggested. The routes for producing the task system are discussed, in which the essential tasks are decomposed into sub-tasks and the task system is formed through a Work Breakdown Structure (WBS) process. A framework of a decision support system for emergency response is proposed, based on the Hall for Workshop of Meta-Synthetic Engineering. It can help the operation team transfer the predetermined plan into an execution plan during emergency response, and assign and dynamically supervise the task system.
1 Introduction

Emergency plan systems are the most important links in disaster emergency management: they can prevent the tense, disordered and chaotic situation that follows a sudden disaster and guarantee that rescue activities develop rapidly, orderly and effectively, thus reducing unnecessary losses and casualties [1, 2]. According to the theory of disaster emergency management, a disaster generally forms a cycle that can be divided into the phases of mitigation, preparedness, response and recovery [3]; the former two stages come before the occurrence of the disaster, and the latter two come after. The plan established before the disaster occurs (in the preparatory stage) is generally called the emergency predetermined plan [4]. When the disaster occurs, the actual situation may differ from what was assumed beforehand, so the predetermined plan often faces the questions of whether it can be carried out and how to carry it out. There must be an execution plan that puts the predetermined plan into action based on the actual conditions of the scene and the available resources. From the view of emergency response operation and coordination, a set of methods is needed to handle and organize the action scheme, from which the dynamic execution plan is produced and the essential tasks are generated and managed in the right sequence and in an effective manner. This paper studies the mechanism of task generation and its decision-making patterns according to an analysis of the case of
Wenchuan earthquake relief. A framework of a task-oriented decision support system based on the Hall for Workshop of Meta-Synthetic Engineering is then proposed.
2 Revelations from the Wenchuan Earthquake Relief

The great Wenchuan earthquake brought tremendous disaster to China. The emergency rescue work was unprecedentedly large, and the relief operation faced a complex situation and many unexpected difficulties. Nevertheless, the rapid response and highly effective reaction that the Chinese government displayed at the first moment attracted the attention of the world, and the speedy generation and deployment of emergency tasks won precious time for the rescue. After receiving the disaster emergency report, the Chinese government immediately established the general earthquake relief headquarters of the State Council. Nine essential tasks for the earthquake relief were promptly identified, including emergency search and rescue, hygienic protection, disaster forecasting and monitoring, life placement, infrastructure protection and recovery, production restoration, social security, social communication and propaganda, and water conservation facility management [5]. Based on the essential tasks, the main related duties were made clear and sub-tasks were decomposed. Following the generation of the emergency tasks, a system of large duty groups was established to take on the nine essential tasks. The management of this natural disaster can be called a "task-oriented" process: its character is to generate, order, break down, arrange, carry out, and supervise all tasks.
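As a simple illustration of such a task-oriented structure, the sketch below models essential tasks that are broken down into sub-tasks (a WBS-style tree) and tracked through their life cycle. The class names, statuses and the sample task are hypothetical; they are not part of the system described in the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """A node of the task system: essential tasks own sub-tasks (WBS breakdown)."""
    name: str
    owner: str = "unassigned"
    status: str = "generated"            # generated -> arranged -> executing -> done
    subtasks: List["Task"] = field(default_factory=list)

    def breakdown(self, *names: str) -> None:
        self.subtasks.extend(Task(n) for n in names)

    def assign(self, owner: str) -> None:
        self.owner, self.status = owner, "arranged"

    def leaves(self) -> List["Task"]:
        return [self] if not self.subtasks else [l for s in self.subtasks for l in s.leaves()]

    def progress(self) -> float:
        """Share of leaf tasks marked done, used for dynamic supervision."""
        leaves = self.leaves()
        return sum(t.status == "done" for t in leaves) / len(leaves)

# example: one essential task decomposed and supervised
rescue = Task("Emergency search and rescue")
rescue.breakdown("Locate trapped people", "Deploy rescue teams", "Medical triage")
rescue.subtasks[0].assign("Field team A")
rescue.subtasks[0].status = "done"
print(f"{rescue.name}: {rescue.progress():.0%} complete")
```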
3 The Mechanism of Task Generation in Emergency Response

By observing and summarizing the operation process of the Wenchuan earthquake disaster, as well as the handling of the snow disaster that occurred in south China at the beginning of 2008, we can identify four kinds of essential task generation mechanisms: organization leading, resource leading, problem leading and mixed leading.

(1) When the organization is the restricting factor in essential task production, the mechanism is called organization leading. The main essential tasks are produced according to the functions and duties of the existing organizations, combined with the needs of disaster rescue [9].

(2) Resource leading takes the resources as the main condition. In this situation, the corresponding resource demand is first analyzed according to the needs of the disaster situation; then, according to the availability of resources, resource reassignment and allocation are arranged, and the essential tasks are formed with resource allocation as the core.

(3) The pattern of producing tasks according to the real problems actually needed at the disaster scene is called problem leading. The corresponding task system is established to fulfill the goal of reducing the threat. The problem leading pattern takes the demands as its guidance; random factors or individual factors hold the leading position.
(4) Two or more leading patterns may be needed to produce the task system; this task production pattern is called the mixed pattern. It may involve more than two kinds of essential factors that are interdependent and interact with each other in the process of task production.

According to FEMA, there are also four styles of decision making based on who makes the decision: individual, consultation, group, and delegation [6]. To satisfy the requirements of real situations, four different decision-making ways can be considered: template-based decision, leader-team decision, expert decision, and public decision.
4 A Framework of Task-Oriented Decision Support System

According to the task-oriented management requirements and the task production mechanisms in emergency response operation, a framework of a task management decision support system is proposed, based on the Hall for Workshop of Meta-Synthetic Engineering [8], as shown in Figure 1.
Fig. 1. Framework of task-oriented decision support system in disaster emergency response
5 Conclusion

The task-oriented mechanism and the emergency management decision support system have the following characteristics: (1) integration of information and functions; (2) dynamic interactivity between people and computers; (3) serviceability. Although this set of methods was derived by summarizing and refining the organization process of the earthquake relief described in this paper, the corresponding rules and flows are general and may similarly be applied to the handling of other sudden natural disasters.
References [1] Alexander, D.: Principles of Emergency Planning and Management, pp. 4–11. Oxford University Press, New York (2002) [2] Reibstein, R.: Preventive Preparedness: The Highest-Value Emergency Planning. Environmental Quality Management, J., 13–19 (Winter 2005) [3] Chen, W.-F., Scawthorn, C.: Earthquake Engineering Handbook, pp. 10–15. CRC Press, LLC (2003) [4] Liu, T. (ed.): Emergency system construction and emergency predetermined pan compiling, pp. 13–14. Enterprise Management Press, Beijing (2004) [5] Task Management, Huayan Software, http://www.hotpm.com/products/hotoa/module.jsp?moduleID=36 [6] Announcement of the Constitution of National State Headquarter Team to Relief the Earth Quake of Sichuan, Government Information Website (18-05-2008), http://www.nlc.gov.cn/zfxx/2008/0518/article_320.htm [7] FEMA, Decision Making and Problem Solving (2005) [8] Yang, D.: The Hall for Workshop of Meta-synthetic Engineering from Quality Analysis to Quantity Methods, http://web.tongji.edu.cn/~yangdy/guide.html [9] The structure of Three Systems, in: China Earth Quake Information Net (05-06-2008), http://www.csi.ac.cn/manage/html/4028861611c5c2ba0111c5c558b 00001/_content/08_06/05/1212644646637.html [10] Website of the Department of Homeland Security of United State of America, http://www.whitehouse.gov/deptofhomeland/sect4.html [11] Tian, J., et al.: DSS development and applications in China. Decision Support Systems, J. 42(4), 2060–2077 (2007) [12] Zhang, X.-x., Zhang, P.: Research on visualization of group decision argument opinion’s distributing—Design and development of electronic common brain audiovisual room. J, Chinese Journal of Management Science 4 (2005) [13] Mak, H.-Y., et al.: Building online crisis management support using workflow systems. Decision Support Systems 25, 209–224 (1999) [14] Hirokawa, R.Y., et al.: Understanding the Sources of Faulty Group Decision Making: A Lesson from the Challenger Disaster. J, Small Group Behavior 19(4), 411–433 (1988) [15] Fan, W., Yuan, H.: Analysis of the Status Quo of Emergency Information Platform Construction of Our Country. J, Chinese Journal of Informationization Construction 10 (2006) [16] Rodriguez, H., et al.: Hand book of earth disaster research, pp. 1–15. Springer, Heidelberg (2006) [17] Wilkenfeld, J., Kraus, S., Holley, K.M., Harris, M.A.: GENIE: A decision support system for crisis nagotiations. J, Decision Support Systems 14, 369–391 (1995) [18] Weeks, M.R.: Organizing for disaster: Lessons from the military. J, Kelley School of Business 50, 479–489 (2007)
Study on the Developing Mechanism of Financial Network

Xiaohui Wang 1, Yaowen Xue 1, Pengzhu Zhang 2, and Siguo Wang 1

1 School of Economy and Management, Taiyuan University of Science and Technology, Shanxi 030024
2 School of Management, Shanghai Jiaotong University, Shanghai 200052, China
[email protected] Abstract. Financial network is a capital flow network made up of a great number of account nodes. Based on the theories of Economic Physics Behavior Economics and Complex Networks, the developing model of financial network has been constructed from the point of the weight properties of vertices and edges in financial network. According to the parsing of the model, it presents that the fi-
、
nancial network shows a power-law degree distribution
(p
k
∼ k −2 )in
the
condition that the time tends to infinity. Finally, the degree distribution of financial network is simulated with the experimental data on this thesis. Keywords: financial network, weight, degree distribution, developing mechanism.
1 Introduction Financial network is the path of capital flow in the economic system. The decaying or generation of an account may lead to structurally change in the financial network; a financial incident may give rise to local financial instability and global financial crisis. So, studying on the financial network topology and the developing and decaying mechanism of financial network are not only theoretical significance but also practical significance. By constructing and analyzing the developing model we mainly research the characteristics of complex network and statistical nature of financial network, which can be used as the theory foundation to study financial crisis mechanism and anti-destruction mechanism of the financial network in the future.
,
2 Assumption of the Developing Model of Financial Network The capital flow among the accounts is assumed only in three conditions: deposits, withdrawals and transfers. In this paper, the research object is the growth mechanism of financial network. Therefore, the assumption here is only to consider the addition of new account node and the capital flow among the old nodes which are deposits and transfers. By constructing the weighted network with directed links we can reflect the flux, direction and velocity of capital flow among the account nodes. Here, in order to Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 337–344, 2009. © Springer-Verlag Berlin Heidelberg 2009
338
X. Wang et al.
simplify the developing model of financial network, the assumption is the weight of edges and vertices is only decided by the flux of capital flow. In addition, we assume that the capital flow among the account nodes be continuous on condition that taking the appropriate time interval.
3 Construction of the Developing Model of Financial Network 3.1 Evolution Rules The change of the financial network topology is a gradual process; the process is sketched as follows (see Fig. 1):
Fig. 1. The evolution of the financial network
Note: the dots denote the bank accounts; the directed edges stand for the relationship of capital-transferring among the accounts. On the basis of nodes and edges of financial network, each node and edge was given a certain weight denoted by the weight of node and edge (see Fig. 2). The chart shows that the larger the shadow nodes is the higher weight of Fig. 2. The weight of the nodes gains. The principle of the edges is similar to the nodes and edges of finanvertices. cial network In the weighted financial network constructed in this paper, the weight of edges is decided by the ratio of capital amount among nodes in the total capital flow. It is described by a matrix wij ,
,
which stands for the weight between nodes i and j ( i size of financial network ). wij is shown as follows:
w ij =
S ij ( t ) k oi u t ( t )
∑
j =1
S ij ( t )
w ji =
= j = 1, 2,.....N , N is the S ji ( t )
k
i
(t )
∑ in
j =1
S ji ( t )
Study on the Developing Mechanism of Financial Network
Note: Sij (t ) denotes the capital flow from account node i to of t , ( Sij (t ) ∈ R );
339
j in the time interval
S ji (t ) stands for the capital flow from account node j to i in
the time interval of t , ( S ji (t ) ∈ R ). kout (t ) is the out-degree of node i in the time i
interval of
t ; kini (t ) denotes the in-degree of node i in the time interval of t . And
the more the amount of capital flow among nodes is gains. If
,the higher weight of the edges
i ki (t ) denotes the degree of node i , then ki (t ) = kout (t ) + kini (t ) .
In the financial network, the weight of edges reflects the interaction among the nodes. The weight of nodes is composed of the weight of edges connected with these nodes, and the weight of nodes originates from the weight of edges. The formular is as follows:
si =
∑ w +∑w ij
i j∈kout
Note:
ji
j∈kini
i si stands for the weight of node i , kout (t ) and kini (t ) are the same as
above. The weight of nodes reflects the situation of connectivity and information between the nodes and the edges comprehensively, which is the total response between the nodes and the edges. Based on the basic assumptions of the evolution of financial network, the character of account nodes and edges can be defined and expressed by the weight of nodes and edges. Both the character of the nodes and the edges codecide the developing mechanism of financial network. Moreover, compared with the original model, the financial network is also in the process of dynamical evolution with the growth of the system, and so does the weight of the nodes and edges (see Fig. 3). As the chart shows, the dotted line stands for potential connection of nodes Fig. 3. The evolution of the weight whose weight grows because the edges increase. of the nodes and edges of financial The evolution rules of financial network can be network summarized as follows: (1) With the addition of a new account node, the probability of connecting with any existing account is proportional to the weight of existing node si . (2) The original unconnected account nodes re-establish connections and the preferred probability of which is proportional to the product of the weight of two nodes si s j . 3.2 Model Construction The growth process of financial network mainly contains the following steps: (1)Developing: There are N 0 nodes and e0 edges in the financial network initially, and at each time step, a new account node is added to the financial network.
340
X. Wang et al.
Assuming that the new account node connect with a previous node which has existed in the financial network according to the probability of
p through m
n0 w0
( m ≤ N )new edges
s i → s i + w0 + δ
i
0
added (see Fig. 4). Firstly, the weight w0 is given to each new edge, and then the weight of the node i will also be added with the increase of the new connection edge between the two account nodes
Fig. 4. The developing model of the financial network denoting the increase f d
n0 and i . The situa-
si of the account node i changes is: si → si + w0 + δ , and δ is a random disturbance item. The reasons for the occurrence of δ is the new account node n0 sets up the connection with the original account node i ,and promotes the interaction between the account node i and the near account nodes, which also increase the weight of the account node i cor-
tion that the weight
respondingly and the increasing value is δ . The change of the weight between the account node i and its near account node j is Δwij . It can be denoted by the following
w → w + Δwij
ij formula: ij Δwij can be expressed as
: Δ wij = δ
wij si
(2) Preferred connectivity: The new added nodes’ preferred choices in the financial network are the account nodes with greater weight and creates the links based on the evolution rules of the financial network. Its preferential connecting probability is ∏( n0 → i ) ,
∏(n0 , i ) =
si → si + w1 + δ1
i
si N
∑s j =1
j
(3) The connections among the original account nodes: In the process of the growth of the financial network, the original unconnected account nodes may be re-established connections at any time(see Fig. 5). Assuming the connection is set up between the original account node i and
w1
Fig. 5. The developing model of the financial network denoting the increase of edges
its near node j by the probability
q (q = 1 − p ) , and there are m new edges established at every interval, so, the change of the node weight is: si → si + w1 + δ1 .
Study on the Developing Mechanism of Financial Network
341
s j wij s (t) dsi s s (t) s w w1 + ∑ s δ1 is ) = pm( N i w0 + ∑ ⋅ Ni δ ) + qm( N j dt ∑l sl s j ∑l sl ss ∑s j ∑ss (t) ∑sj (t) j =1
s=1
j =1
Together with the three steps above, the value that the weight of the account node
i varies in the unit time can be achieved: Without considering the change of the weight of the related account nodes resulted from the interaction among the account nodes, the above formula can be abbreviated as: According to this developing process, the interaction of two types connection mechanisms among the account nodes has promoted the growth and evolution of the whole financial network. Take continuance in time into account, the change of the degree and the weight of the account node can be expressed respectively as:
si(t+Δt)−si() t =[
t ∑ w(t+Δt)+ ∑ w(t+Δt)]−[ ∑ w()t + ∑w ()] ij
ji
j∈kiout (t+Δt)
ij
j∈kiin(t+Δt)
d k i (t ) = p ⋅m ⋅ dt
si (t ) N
∑
j =1
s j (t )
+ q ⋅m ⋅
j∈kiout (t)
s j (t )
∑
s =1
si (t )
⋅
N
ji
j∈kiin(t)
N
∑
s s (t )
j =1
s j (t )
k iout ( t +Δt ) k iin ( t +Δt ) dsi (t ) wij ( Δt ) d j + ∫ i w ji ( Δt )d j =∫i k out ( t ) k in ( t ) dt
4 Analysis of the Developing Model of the Financial Network Assuming that the mean weight of out-degree edges is w1 the in-degree edges is w2 .So,
∫
k iout ( t +Δt )
k
i out
(t )
,and the average weight of
w1 , w2 obey the law of normal distribution in (0-1). k i in ( t +Δt )
wij ( Δt ) d j + ∫ i k
in
(t )
w ji ( Δt )d j = w1 ⋅ Δk i out (t ) + w2 ⋅ Δk i in (t )
Also, w is assumed to be the mean of the edge weights. Then

s_i(t) = w·Δk_i(t)·t ≈ w·(dk_i(t)/dt)·t

so that

dk_i(t)/dt = p/(mλw²t) = 1/(λw²t)

and therefore

k_i(t) = −(p/(mλw²))·ln t + C = −(1/(λw²))·ln t + C   (t → ∞)

Then the degree distribution is

p(k(t)) = dP(k_i(t) ≤ k)/dk = (t/(t+N_0))·(c_1/k²)
As t → ∞, p(k) ∝ k^(−γ) with the exponent γ = 2. The conclusion is that the financial network shows a power-law degree distribution with this exponent when time tends to infinity. Certain statistical features of the financial network can be found based on the analytic results: the financial network has scale-free characteristics.
5 Simulation In this paper, we adopt the experimental data provided by the agent-based simulation platform of capital flow in the financial network. The experimental data are shown in Fig. 6.
Fig. 6. The information of transaction among accounts
Randomly selecting 20000 accounts from the large set of experimental data, the topology of these accounts can be obtained by programming. Because of the large number of accounts, the full network topology generated by the UCINET software is hard to distinguish, so only the relationships among part of the accounts are shown (see Fig. 7).
Fig. 7. The network topology of part of the account nodes
Using the UCINET and MATLAB software, the degree distribution of the financial network is obtained from these data and is shown in Fig. 8.
Fig. 8. The degree distribution
Plotting the logarithm of the node degree against the logarithm of its probability with the help of MATLAB shows that the degree distribution of the financial network growth model is close to a straight line in the log-log coordinate system. The fitted distribution chart is shown in Fig. 9.
Fig. 9. The degree-distribution fitting
Compared with the actual transfer data, certain errors exist because the experimental data are produced by the financial network simulation platform. Some differences also exist between the slope of the fitted straight line of the degree distribution in the log-log coordinate system and the analytic result of the model. Although there are certain errors, the result still reflects the characteristic of the financial network that it shows a power-law degree distribution.
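As a rough illustration of the fitting procedure just described, the slope of the degree distribution in log-log coordinates can be estimated by a least-squares straight-line fit; the synthetic degree sequence below is only a placeholder, since the original account data from the simulation platform are not reproduced here.

```python
import numpy as np

def fit_loglog_slope(degrees, bins=20):
    """Fit a straight line to the degree distribution in log-log coordinates."""
    degrees = np.asarray(degrees, dtype=float)
    edges = np.logspace(0.0, np.log10(degrees.max()), bins)
    counts, edges = np.histogram(degrees, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])        # geometric bin centers
    mask = counts > 0
    x = np.log10(centers[mask])
    y = np.log10(counts[mask] / counts.sum())
    slope, intercept = np.polyfit(x, y, 1)           # least-squares straight line
    return slope, intercept

# Placeholder degree sequence standing in for the 20000 sampled accounts:
rng = np.random.default_rng(0)
degrees = np.floor(rng.pareto(1.0, 20000) + 1.0)
print("fitted log-log slope (approx. -gamma):", fit_loglog_slope(degrees)[0])
```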
6 Conclusion This article studies the degree distribution in the process of dynamic evolution from the view of the growth of the financial network, and this is only the preliminary
research on the financial network. For any realistic network, it is impossible for the network to grow constantly without decay. Therefore, in future work the nodes of the financial network considered in this paper will be allowed to increase together with decay, and the decay of the edges will also be considered. When the financial network grows while also receding, different probabilities of growth and recession would exert different influences on the network. However, once the probabilities of growth and recession attain a certain ratio, whether or not the whole network will produce a financial crisis is not certain; the survivability and security of the whole financial network once a crisis happens will serve as the further research of the authors.
Solving Sudoku with Constraint Programming Broderick Crawford1 , Carlos Castro2, and Eric Monfroy3 1
Pontificia Universidad Católica de Valparaíso, Chile and Universidad Técnica Federico Santa María, Chile
[email protected] 2 Universidad Técnica Federico Santa María, Chile
[email protected] 3 LINA, Université de Nantes, Nantes, France and Universidad Técnica Federico Santa María, Valparaíso, Chile
[email protected] Abstract. Constraint Programming (CP) is a powerful paradigm for modeling and solving Complex Combinatorial Problems (generally issued from Decision Making). In this work, we model the well-known Sudoku puzzle as a Constraint Satisfaction Problem and solve it with CP, comparing the performance of different Variable and Value Selection Heuristics in its Enumeration phase. We encourage this kind of benchmark problem because it may suggest new techniques in constraint modeling and solving of complex systems, or aid the understanding of its main advantages and limits.
1 Introduction
Constraint Programming has been defined as a software technology used in complex system modeling and combinatorial optimization problems. The main idea of this paradigm is to model a problem by means of a declaration of variables and constraints and to find solutions that satisfy all the constraints. The Constraint Programming community uses a complete search approach alternating phases of constraint propagation and enumeration, where the propagation prunes the search tree by eliminating values that cannot participate in a solution [Apt, 2003]. When enumerating, two decisions have to be made: which variable is selected to be instantiated, and which value is assigned to the selected variable? To support these decisions we use enumeration strategies, which are constituted by variable and value selection heuristics [Monfroy et al., 2006].
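The alternation of enumeration decisions and backtracking can be outlined as follows. This is a generic sketch under simplifying assumptions (domains kept as Python sets, constraints checked only on complete assignments, propagation reduced to a failure test), not the Mozart implementation used later in this paper.

```python
def search(domains, constraints, select_var, select_val):
    """Skeleton of the propagate-and-enumerate loop (propagation is only stubbed)."""
    domains = {v: set(d) for v, d in domains.items()}
    if any(len(d) == 0 for d in domains.values()):        # failure detected
        return None
    if all(len(d) == 1 for d in domains.values()):        # complete assignment
        assignment = {v: next(iter(d)) for v, d in domains.items()}
        return assignment if all(c(assignment) for c in constraints) else None
    var = select_var(domains)                             # variable selection heuristic
    for val in select_val(domains[var]):                  # value selection heuristic
        trial = dict(domains)
        trial[var] = {val}
        result = search(trial, constraints, select_var, select_val)
        if result is not None:
            return result
    return None                                           # triggers backtracking above
```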
2 Variable Selection Heuristics
The main idea behind the choice of the next variable is to minimize the size of the search tree and to ensure that any branch that does not lead to a solution is pruned as early as possible; this was termed the "fail-first"
Table 1. Enumeration Strategies = Variable + Value Selection Heuristics

S1 = MiD+SVal   S2 = MiD+GVal   S3 = MiD+AVal   S4 = MiD+GAV
S5 = MaD+SVal   S6 = MaD+GVal   S7 = MaD+AVal   S8 = MaD+GAV
principle by Haralick and Elliot [Haralick and Elliot, 1980], described as ”To succeed, try first where you are most likely to fail” [Smith, 1996]. In this work we used the following 2 variable selection heuristics. Minimum Domain Size (MiD): at each enumeration step the domain of each one of the variables not yet instantiated is analyzed, then the variable with smaller domain size is selected; and Maximum Domain Size (MaD): the idea of this heuristic is similar to the previous one, nevertheless in this case it selects the variable with the greater domain size.
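A minimal sketch of the two variable selection heuristics, assuming the uninstantiated variables are kept in a dictionary mapping each variable to its set of remaining values:

```python
def min_domain(domains):
    """MiD: among variables not yet instantiated, pick the one with the smallest domain."""
    candidates = [v for v in domains if len(domains[v]) > 1]
    return min(candidates, key=lambda v: len(domains[v]))

def max_domain(domains):
    """MaD: among variables not yet instantiated, pick the one with the greatest domain."""
    candidates = [v for v in domains if len(domains[v]) > 1]
    return max(candidates, key=lambda v: len(domains[v]))
```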
3 Value Selection Heuristics
In choosing the value, we can try, if possible, a value which is likely to lead to a solution, and so reduce the risk of having to backtrack and try an alternative value (the "succeed-first" principle [Smith, 1996]). In this work we used the following 4 value selection heuristics. Smaller Value of the Domain (SVal): this heuristic establishes that the smallest value of the domain is always chosen. Greater Value of the Domain (GVal): it is similar to the previous one, but instead of choosing the smallest element of the domain, the greatest element is selected. Average Value of the Domain (AVal): this heuristic selects the value of the domain that is nearest to the middle of the domain; it calculates the arithmetic average between the limits (superior and inferior) of the domain of the selected variable, and in case of a tie the smallest value is selected. Immediately Greater Value than the Average Value of the Domain (GAV): this heuristic selects the smallest value of the domain that is also greater than the average value of the domain. Finally, once the heuristics to use are established, the enumeration strategies are composed according to Table 1.
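The four value orderings can be sketched in the same spirit; the handling of ties and of values exactly equal to the average is an assumption here, chosen to match the descriptions above as closely as possible.

```python
def sval(domain):
    """SVal: try the smallest value of the domain first."""
    return sorted(domain)

def gval(domain):
    """GVal: try the greatest value of the domain first."""
    return sorted(domain, reverse=True)

def aval(domain):
    """AVal: value nearest to the average of the domain bounds, ties -> smallest."""
    mid = (min(domain) + max(domain)) / 2.0
    return sorted(domain, key=lambda v: (abs(v - mid), v))

def gav(domain):
    """GAV: smallest value strictly greater than the average, then the rest."""
    mid = (min(domain) + max(domain)) / 2.0
    above = sorted(v for v in domain if v > mid)
    return above + sorted(v for v in domain if v <= mid)
```

An enumeration strategy such as S3 is then simply the pair (min_domain, aval) handed to a search routine like the one sketched in the Introduction.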
4 Constraint-Based Model of Sudoku
Sudoku is a puzzle played in a 9x9 matrix (standard sudoku) which, at the beginning, is partially full. This matrix is composed of 3x3 submatrices denominated "regions". The task is to complete the empty cells so that each column, row and region contain numbers from 1 to 9 exactly once [Simonis, 2005]. The CP model consists of the following constraints:

∀i ∈ {1, ..., 9}  Alldifferent{xi1, xi2, ..., xi9}    (1)
∀j ∈ {1, ..., 9}  Alldifferent{x1j, x2j, ..., x9j}    (2)
On the other hand, each cell in regions Skl with 0 ≤ k, l ≤ 2 must be different, which forces to include in the model the following constraint:
∀i, j  Alldifferent{xij, xi(j+1), xi(j+2), x(i+1)j, x(i+1)(j+1), x(i+1)(j+2), x(i+2)j, x(i+2)(j+1), x(i+2)(j+2)}    (3)

with i = k ∗ 3 + 1 and j = l ∗ 3 + 1.
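A compact way to enumerate the same three families of Alldifferent groups (rows, columns and 3x3 regions) is sketched below; representing each group as a list of cell coordinates is an illustrative choice, not the syntax of any particular constraint solver.

```python
def alldifferent_groups():
    """Cell-coordinate groups corresponding to constraints (1), (2) and (3)."""
    rows = [[(i, j) for j in range(9)] for i in range(9)]
    cols = [[(i, j) for i in range(9)] for j in range(9)]
    regions = [[(3 * k + di, 3 * l + dj) for di in range(3) for dj in range(3)]
               for k in range(3) for l in range(3)]
    return rows + cols + regions

def is_valid(grid):
    """Check a completed 9x9 grid against every Alldifferent group."""
    return all(len({grid[i][j] for (i, j) in group}) == 9
               for group in alldifferent_groups())
```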
Table 2. Sudoku solved with heuristic MiD

Source     Degree    S1 (E)   (B)   (t)    S2 (E)   (B)   (t)    S3 (E)    (B)    (t)    S4 (E)   (B)   (t)
SudokuMin  None-1       84     52    14      220    195    21     1308    1283     88      183    159    26
SudokuMin  None-2     2836   2815   153      271    249    23    11074   11048    603      124    102    22
The Times  Easy          7      3    11       17     13    11        7       3     10       17     13    12
The Times  Medium       16      6    11      174    164    19       16       6     11      174    164    26
The Times  Hard         27     16    11       24     18    11       27      16     11       24     18    12
Table 3. Sudoku solved with heuristic MaD

Source  Degree    S5 (E)    (B)    (t)     S6 (E)     (B)     (t)    S7 (E)    (B)    (t)     S8 (E)    (B)    (t)
                   18554   18537   1799    274476   274472   28149    24195   24169   2582    721773   72155   7484
                       -       -      -    121135   121113   12868        -       -      -     88720   88706   9763
                       -       -      -     93138    93105    9158        -       -      -         -       -      -

5 Analysis of Results
The benchmark problems were implemented and solved in the Mozart platform1 with the strategies listed in Table 1. Results are shown in Tables 2 and 3; each execution was limited to 10 minutes, and runs that found no result are indicated with the symbol "-". The performance evaluation was based on the following well-known indicators in constraint solving: Number of Backtracks (B), Number of Enumerations (E) or Nodes Visited, and Time (t). Observing the results obtained, the strategies built on the heuristic MiD (S1, ..., S4) behave better in those instances in which the search space grows, compared with strategies guided by the heuristic MaD (S5, ..., S8). Such differences arise mainly because the heuristic MiD leads as rapidly as possible to an unsolvable space, allowing the search tree to be pruned early. Different published instances have been used from The Times2 and the Minimum Sudoku page3.

1 www.mozart-oz.org
2 http://entertainment.timesonline.co.uk
3 http://people.csse.uwa.edu.au/gordon/sudokumin.php
6 Conclusions
In this work we showed that variable and value selection heuristics influence the efficiency of the resolution of Sudoku in Mozart. The efficiency of resolution was measured on the basis of performance indicators. The possibility of obtaining better results in the search process was shown using suitable criteria for the selection of variables and values. In fact, selecting a variable in a search process implies determining which descending nodes of the present space contain a solution. It is very important to detect early when the descending nodes do not lead to a solution, because in this way we avoid unnecessary calculations that force backtracking. Acknowledgements. The second author has been partially supported by the Chilean National Science Fund through the project FONDECYT 1070268. The third author has been partially supported by Escuela de Ingeniería Informática PUCV through the project INF-03/2008 and DGIP-UTFSM through a PIIC project.
References Apt, K.: Principles of constraint programming (2003), http://citeseer.ist.psu.edu/apt03principles.html Haralick, R., Elliot, G.: Increasing tree search efficiency for constraint satisfaction problems. Artificial Intelligence 14, 263–313 (1980) Monfroy, E., Castro, C., Crawford, B.: Adaptive enumeration strategies and metabacktracks for constraint solving. In: Yakhno, T., Neuhold, E.J. (eds.) ADVIS 2006. LNCS, vol. 4243, pp. 354–363. Springer, Heidelberg (2006) Simonis, H.: Sudoku as a constraint problem. In: Hnich, B., Prosser, P., Smith, B. (eds.) Proc. 4th Int. Works. Modelling and Reformulating Constraint Satisfaction Problems, pp. 13–27 (2005), http://4c.ucc.ie/~ brahim/mod-proc.pdf Smith, B.: Succeed-first or Fail-first: A Case Study in Variable and Value Ordering. Technical Report 96.26 (1996), http://citeseer.ist.psu.edu/194952.html
A Study of Crude Oil Price Behavior Based on Fictitious Economy Theory Xiaoming He1, Siwei Cheng2, and Shouyang Wang3 1
Research Centre on Fictitious Economy and Data Science, Chinese Academy of Sciences, School of Management, Graduate University of Chinese Academy of Sciences, Beijing 100190, China
[email protected] 2 Research Centre on Fictitious Economy and Data Science, Chinese Academy of Sciences, School of Management, Graduate University of Chinese Academy of Sciences, Beijing 100190, China
[email protected] 3 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, No.55 Zhongguancun East Road, Haidian District, Beijing 100190, China
[email protected] Abstract. The excessive fluctuation of the international crude oil price has aroused wide concern in society and academia. Based on the theory of fictitious economy, this paper has studied and explained crude oil price behavior from Jan 1946 to Dec 2008. It concludes that the long term prices of crude oil are subject to mean reversion in accordance with the decisive law of value, fluctuating around the long term marginal opportunity cost. However, at the same time the prices also appeared to deviate far from the long term marginal opportunity cost for several relatively long periods. This paper highlights four aspects of this issue: the diversification of international crude oil market participants, the structural changes of the participants, the evolution of the pricing mechanism, and the periodic change of the world economy.
1 Introduction Crude oil is a crucial strategic material as well as an essential industry material. The facts that the sweeping hikes in oil price since 2002 and the sudden falls accompanied with global financial crisis 2008 have often been cited as causing adverse macroeconomic impacts on aggregate output and employment, which is far beyond economic or academic expectancy. Thus how to explain and forecast volatility of oil price is one of the principal issues faced in economic society at the moment. In recent years, most of researches focused on the influencing, modeling and forecasting of short-term oil volatility, which is hard to explain the long-term behavior. The original research on explaining long term oil price is the famous exhaustible resources model by Hotelling (1931) established on the certainty hypothesis of reserve and cost, which is inconsistent with uncertainty in petrochemical industry reality. According to the hypothesis of oil market structure, the theories of explaining oil
volatility can be divided into two categories after the two oil crisis in 1970s: one is competitive oil price theory, such as Gately(1984), Krugman(2000), Alhajji and huettner(2000a); the other is monopolistic competition theory, including wealth maximization and capacity objective, such as Cremer and Weitzman(1976), Hnyilicza and Pindyck(1976,1978a), Adams and Marquez(1984), Gately and Kyle(1977), Gately(1983) etc. Krugman (2000) launched the concept of multiple equilibria theory, namely, given the backward-bending supply curve and a steep demand curve, there are stable equilibria at both the low price and the high price. But these theories can only explain the movements of oil prices in certain periods and given conditions. Chinese scholars, such as Qing Yang and Yuzhen Lu (2000), Xiaofeng Mei (2001), Zhizhong Pu (2006) studied the long-term high price equilibria and long-term low price equilibria from the perspective of long term supply and demand. While its limitation is that only spot market is considered, no attention is paid to the change of future market participants’ structure and the impact of external environment, which leads to powerless explain to higher oil price in recent years. In contrast with other literatures, the perspective and economic interpretation in this paper is different: a complexity system combined with fictitious economy perspective is adopted to analyze the long term periodic volatility of oil price guided by the theory of fictitious economy and the methodology of complexity science. The structure of this paper is organized as follows: Firstly, the fictitious economic features of international crude oil market is analyzed to develop the research perspectives; Secondly, this paper fitted the crude oil price from Jan 1946 to Dec 2008 according to exhaustible resources theory in order to reflect its long term equilibrium price; and then this paper filtered the crude oil prices according to Hodrick-Prescott filtering algorithm in order to reflect its periodical volatility; Thirdly, this paper explained the long term periodic volatility of oil price from the perspectives based on fictitious economy theory.
2 Theory and Methods According to complexity science and fictitious economy theory, the international crude oil market can be regarded as a complexity and fictitious economic system for following reasons: • The mainstays of the international crude oil market are consist of the natural and legal persons, who maintain extensive and close contacts with each other and comply with certain rules where activities are performed. According to their motivation, the mainstays of oil markets can be divided into three categories, which reflects the market structure: the first is the commercial investors for hedging, mainly referring to the corporations of upstream, middle-stream and downstream sections in the whole petrochemical industry; the second is the non-commercial investors for arbitraging, mainly including institutional investors, subvented agencies, and financial intermediaries, etc.; the last is the medium and small investors for speculating, mainly referring to numerous and dispersive individual investors. • The interaction occurring among the mainstays of the international crude oil market could produce self-organization effect and further form its hierarchical
structures and function structures, which eventually promote the development of oil markets. With the diversification of oil market participants, the interacting among market mainstays in accordance with the decisive law of value drives the evolution of pricing mechanism in international crude oil market, which is very important to the development of international crude oil derivatives markets for risk management. • The international crude oil market is an open system, and its hierarchical structures and function structures are continuously reorganized and improved during the process of adaptive and active learning. With the development of oil market, the participant structure and its changing has exerted great influence upon the crude oil pricing mechanism, which resulted in oil price deviating far from long term marginal opportunity cost for several relatively long periods. • The international crude oil market is a dynamic system, and it is constantly changing and developing. In addition, compared with traditional spot market, oil future contracts usually settle for cash, not involving the delivery of the underlying, which belongs to the fictitious economy system. According to the five features of fictitious economy system (Siwei Cheng, 1998), international crude oil future market is a complex, metastable, high-risk, parasitism, and cyclical system. As a result, international crude oil future market must leech on to real economy and international crude oil spot market and the crude oil future price must follow the requirement for law of value and reflect oil market fundamentals of supply and demand in real economy from a long term perspective. • Therefore, it is necessary to research international crude oil market in a complexity system and fictitious economy perspective, which means considering both the changing of internal structures and the interacting between oil market system and external environment. Specifically, the changing of internal structures includes the diversification of international crude oil market participants, the evolution of pricing mechanism in international crude oil market, the changing of the participant structure in international crude oil market; while the external environment change mainly mean the periodic change of world economy. According to exhaustible resources theory, the long term total marginal opportunity cost of exhaustible resources theoretically reflects the total cost that the whole society paid for extracting per unit resource. Therefore, in the long run, the exhaustible resource price should equal to the long term total marginal opportunity cost. • As a kind of exhaustible resource, crude oil long term total marginal opportunity cost (MOC) of is consist of three parts: marginal production cost (MPC), marginal user cost (MUC) and marginal external cost (MEC). • let Pt denote the price of extracted crude oil at time t, q t the quantity extracted at time t,
C (qt ) the production cost and the external cost of extracting q t units of
resources, then the marginal user cost process must be that: λt
λt
= Pt − C (qt ) '
in different periods during the extracting
352
X. He, S. Cheng, and S. Wang
• The oil producers rationally arrange the outputs in different periods to maximize their net profits present value ( V0 ) given the limited crude oil reserves (r denotes discount rate):
' ' ' • V = (P − C ' (q )) + P1 − C (q1 ) + P2 − C (q 2 ) + ⋅ ⋅ ⋅ + Pt − C (qt ) 0 0 0 1+ r (1 + r )2 (1 + r )t • According to “Hotelling Rule” (1931), when the equi-marginal principle is satisfied, the maximum net profit can be gained. The equi-marginal principle requires that the present value of net profit (marginal user cost) that come from oil extraction and sales at any time equal to each other, which in turn requires the present marginal net profit increase at the rate of discount in the future. Consider the basic Hotelling model of an exhaustible resource produced in a competitive market with a constant marginal cost of extraction, including external cost,
namely,
C ' (qt ) = c , then the oil price trajectory is:
P = (P0 − c )e rt + c
dP = r (P − c ) dt
or
• Obviously, under the certainty (resource reserves and extraction costs are certain) assumptions, the long term equilibrium price, namely the long term total marginal opportunity cost, grows exponentially. The oil long term total marginal opportunity cost exists objectively, but it is invisible. To estimate the oil long term total marginal opportunity cost, the oil price series is fitted with exponential form. The Hodrick-Prescott Filter is a smoothing method that is widely used among macroeconomists to obtain a smooth estimate of the long-term trend component of a series.
• Technically, the Hodrick-Prescott (H-P) filter is a two-sided linear filter that computes the smoothed series s of y by minimizing the variance of y around s, subject to a penalty that constrains the second difference of s. That is, the H-P filter chooses s to minimize:

Σ_{t=1}^{T} (y_t − s_t)² + λ Σ_{t=2}^{T−1} ((s_{t+1} − s_t) − (s_t − s_{t−1}))²
• The penalty parameter λ controls the smoothness of the series s. The larger the λ, the smoother the s. As λ → ∞, s approaches a linear trend. Generally, λ equals 14400 for monthly data.
• In order to reflect the long-term periodic features of crude oil price volatility more
scientifically and visually, this paper introduced Hodrick-Prescott filtering methodology to deal with the WTI international crude oil monthly spot price data.
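A sketch of the two data-processing steps described above, exponential fitting of the long-run trend and Hodrick-Prescott filtering with λ = 14400, is given below. It assumes the monthly price series is available as a pandas Series and uses statsmodels' hpfilter, one common implementation of the filter; the starting values for the curve fit are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from statsmodels.tsa.filters.hp_filter import hpfilter

def long_run_components(prices: pd.Series, lamb: float = 14400.0):
    """Exponential trend (proxy for the long-run marginal opportunity cost)
    and H-P trend/cycle of a monthly price series."""
    t = np.arange(len(prices), dtype=float)

    # exponential form P(t) = (P0 - c) * exp(r*t) + c, as in the Hotelling trajectory
    def hotelling(t, p0, c, r):
        return (p0 - c) * np.exp(r * t) + c

    start = [float(prices.iloc[0]), 1.0, 0.001]        # illustrative starting values
    params, _ = curve_fit(hotelling, t, prices.values, p0=start, maxfev=10000)
    exp_trend = pd.Series(hotelling(t, *params), index=prices.index)

    cycle, hp_trend = hpfilter(prices, lamb=lamb)      # hpfilter returns (cycle, trend)
    return exp_trend, hp_trend, cycle
```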
3 The Data This paper fitted the WTI (West Texas Intermediate) crude oil price from Jan 1946 to Dec 2008 in order to reflect its long term equilibrium price (long term total marginal opportunity cost). As it is shown in Fig 1, long term total marginal opportunity cost can explain 82% of the long term price behaviors.
But such a fitted curve obtained from the nominal oil price series is flawed in that it does not adjust for the impact of U.S. dollar depreciation. In order to adjust for the impact of the U.S. dollar's purchasing power, this paper deflates the crude oil price series with the U.S. monthly urban consumer price index (CPI-U) issued by the U.S. Bureau of Labor Statistics. The real oil prices are then fitted with an exponential form to reflect the oil long term total marginal opportunity cost, as shown in Fig. 2. This paper also adopted the H-P filtering algorithm to remove short term volatility and obtain the long term trend of WTI crude oil nominal spot prices. The result reflects well the long term periodic volatility trend of the real oil price, as shown in Fig. 3.
Fig. 1. The relationship between WTI crude oil nominal price and nominal long term total marginal opportunity cost Sources: Dow Jones, EIA
Fig. 2. The relationship between WTI crude oil real price and real long term total marginal opportunity cost Sources: Dow Jones, EIA, BLS
Fig. 3. The volatility period of WTI crude oil real price in 1946-2008 Sources: Dow Jones, EIA, BLS
Fig. 4. The relationship between long-term periodic volatility of WTI crude oil real price and real long term total marginal opportunity cost Sources: Dow Jones, EIA, BLS
4 Empirical Analysis

As shown in Fig. 4, it can be concluded that in the long run crude oil prices are subject to mean reversion in accordance with the decisive law of value, fluctuating around the long term total marginal opportunity cost. But at the same time the prices also appeared to deviate far from the long term marginal opportunity cost for several relatively long periods. According to the real oil price's periodic volatility relative to the long term total marginal opportunity cost, the long term volatility of the real oil price can be divided into six phases.

Phase Ⅰ (1946-1973): Almost all the international crude oil prices in this period were well below the long term total marginal opportunity costs, and the essential reason was that oil pricing was once the domain of western multinational petrochemical companies, which ultimately led to the outbreak of the first oil crisis in the early 1970s.

Phase Ⅱ (1973-1986): The international crude oil prices in this period were always far above the long term total marginal opportunity costs, and the essential reason was that oil pricing was the domain of the Organization of the Petroleum Exporting Countries (OPEC) after the first oil crisis. More importantly, the continuously high oil price stimulated the production of non-OPEC countries, which forced OPEC to give up the tactics of "restrict output for higher price" and trigger oil price wars, leading the oil price to slump below $15 per barrel.

Phase Ⅲ (1986-1992): The international crude oil prices in this period were close to the long term total marginal opportunity costs, and the essential reason was that oil pricing was determined by international oil futures markets, in which the commercial investors dominated among all participants, after the New York Mercantile Exchange (NYMEX) and the London International Petroleum Exchange (IPE) successfully launched the benchmark WTI (West Texas Intermediate) crude oil futures and Brent crude oil futures.

Phase Ⅳ (1992-2002): The international crude oil prices in this period were generally less than the long-term total marginal opportunity cost, and the essential reason was that the advent and development of the "New Economy", represented typically by information and communication technologies, dramatically decreased the oil production cost, while the increased external cost caused by the continuous increase of crude oil extraction had not been reasonably reflected in the oil price.

Phase Ⅴ (2002-2008): The international crude oil prices in this period were generally higher than the long term total marginal opportunity costs, and the essential reason was that aggregate demand had been slightly greater than aggregate supply in the international crude oil market for a long time, and the participant structure had gradually evolved from commercial-dominated to non-commercial dominated, which means that the oil price was mainly determined by its financial attribute and deviated far from the oil market fundamentals of supply and demand.

Phase Ⅵ (since July 2008): The international crude oil prices in this period fell sharply, approaching the long term total marginal opportunity costs. The essential reason is that the deteriorating financial crisis has led to the outflow of capital from oil markets and the slowing global economy has resulted in a decrease of oil demand growth.
In the short term, provided that the purchasing power of the U.S. dollar remains constant, this paper forecasts that the nominal long term total marginal opportunity cost will vary between $45 and $52 per barrel in 2008-2010, and the nominal average production cost is estimated at between $30 and $40 per barrel, compared with $30 per barrel in 2007. Therefore, the WTI international crude oil price will fluctuate between $35 and $52 per barrel until the economic recovery. Considered from a long-term perspective, however, the long term oil price will rise rapidly after the global financial crisis: in the first place, the long term total marginal opportunity cost will rise faster with the depletion of low-cost crude oil resources and the greater attention focused on external cost. In addition, most central banks, represented by the U.S. government, have injected massive funds into the markets during the global financial crisis, and the extra liquidity could raise inflation and flow into the international oil market, which would inevitably push up the nominal oil price.
6 Conclusions This paper has applied fictitious economy theory to analyze long term international crude oil price behavior from the perspectives of the changing internal structure of the oil market system and the interaction between the oil market system and the external environment. Our data suggest that the long term prices of crude oil from 1946 to 2008 are subject to mean reversion in accordance with the decisive law of value, fluctuating around the long term marginal opportunity cost. But at the same time the prices also appeared to deviate far from the long term marginal opportunity cost for several relatively long periods. Furthermore, based on our analysis and conclusions, the WTI international crude oil price is forecasted to fluctuate between $35 and $52 per barrel until the economic recovery and is expected to present a rapid rise after the global financial and economic crisis. This paper provided a new perspective to analyze the long term periodic volatility of the international crude oil price, which integrated the oil spot market and futures market in a systematic way and combined the crude oil commercial attribute with its financial attribute organically.
References Cheng, S.W.: On fictitious economy. Democracy and Construction Press (2003) Yu, L., Wang, S.Y., Lai, K.K.: Forecasting foreign exchange rates and international crude oil price volatility-TEI@I methodology. Hunan University Press, Changsha (2006) Yang, Q., Lu, Y.Z.: Preliminary research on oil forecasting. Journal of China University of Petroleum (Edition of Social Science) (2), 1–5 (2000) Mei, X.F.: Volatility analysis on international crude oil price. Master Thesis of China Center for Economic Research at Peking University (2001) Pu, Z.Z.: Study of long term periodic fluctuation of international crude oil price. International Petroleum Economics (6), 21–26 (2006) Song, Z.X., Fan, K.: World economic history. Economic Science Press, Beijing (1998) Adams, F.G., Marquez, J.: Petroleum price elasticity, income effects, and OPEC‘s pricing policy. Energy Journal 5(1), 115–128 (1976)
Alhajji, A.F., Huettner, D.: The target revenue model and the world oil market: emprical evidence from 1971 to 1994. The Energy Journal 21(2), 121–144 (2000) Cremer, J., Weitzman, M.L.: OPEC and the monopoly price of world oil. European Economic Review (8), 155–164 (1976) Ezzati, A.: Future OPEC price and production strategies as affected its capacity to absorb oil revenues. European Economic Review 8(2), 107–138 (1976) Gately, D., Kyle, J.F.: Strategies for OPEC’s pricing decisions. European Economic Review (10), 209–230 (1977) Gately, D.: OPEC: retrospective and prospects 1972-1990. European Economic Review (21), 313–331 (1983) Gately, D.: A ten-year retrospective: OPEC and the world oil market. Journal of Economics Literature (September 1984) Pindyck, R.S.: Uncertainty and exhaustible resource markets. Journal of Political Economy 88(6), 1203–1225 (1980) Pindyck, R.S.: The long-run evolution of energy prices. The Energy Journal 20(2), 1–25 (1999) Hodrick, R.J., Prescott, E.C., Postwar, U.S.: Business Cycles: An Empirical Investigation. Journal of Money, Credit and Banking 29(1), 1–16 (1997) Hotelling, H.: The Economics of Exhaustible Resources. Journal of Political Economy 39, 137– 175 (1931) Hnyilicza, E., Pindyck, R.S.: Pricing policies for a two-part exhaustible resource cartel, the case of OPEC. European economic review (8), 139–154 (1976) Krugman, P.: The energy crisis revisited (2000), http://web.mit.edu/krugman/www/opec.html
Study on the Method of Determining Objective Weight of Decision-Maker (OWDM) in Multiple Attribute Group Decision-Making Donghua Pan and Yong Zhang Institute of Systems Engineering, Dalian University of Technology, Dalian 116023, China
Abstract. In multi-attribute group decision-making, the aggregated result depends heavily on the objective weights of the decision makers. To obtain a more accurate aggregated result quickly, a method of determining OWDM to attributes in interactive decision-making is presented in this paper, which is based on refining the objective weight of decision makers down to the objective weight of decision makers to attributes. A definition of consensus degree and the flow of interactive decision-making based on the objective weight of decision makers to attributes are then proposed. Keywords: multi-attribute group decision-making; objective weight of decision-makers to attributes; consensus degree.
1 Introduction In multi-attribute decision-making (MADM), the decision-makers evaluate each attribute in each scheme. The result of each decision-maker's estimation is aggregated into the result of group decision-making according to a certain approach. So the method of aggregating decision-makers' estimations is important in group decision-making. In the decision-making process, how to determine decision-makers' weights is the key to the aggregation of decision-makers' estimations [1]. Decision-maker weight is a concept in MADM, which refers to the relative importance degree of each attribute's utility function when each utility function relative to the same nature is aggregated into the total utility function [2]. The problem of weight determination is faced in almost every weighted model that aggregates components into a total amount. In the aggregation of the group utility function, the weight represents the importance degree and policy-making power in group decision-making. Decision-maker weights can be divided into two classes based on the factors determining them. One class is the subject weights, which are assigned by considering the prior information of decision-makers; the assigned weights are an integrated quantitative representation of their knowledge, experience, capability, expectation and so on. The other is the objective weights, which are assigned based on the adverse judgment of decision-makers' estimation results [3]. Generally, the subject weights of decision-makers are called "static weights" and are pre-assigned. They are not influenced by the quality of decision-makers' estimation results. But OWDM are called "dynamic weights"
which could be changed with the quality of decision-makers’ estimation results. At present, AHP and Delphi approaches can be used for determining the subject weights[4]. And generally, the objective weights are assigned using similarity approach[5], reliability approach[6], consistency approach, or combination of several approaches[7]. In recent years, whether the subject or objective weights which are computed in most researches are the general weights of decision-makers. In contrary to traditional group decision-making, multiple attributes group decision-making is to thin the general estimation for the scheme to estimation of each attribute of the scheme, then integrate the result of attributes estimation into the general estimation of scheme. So, it has much higher complex degree. Because the decision-makers have the limitation of knowledge, different decision-makers have different cognition to each attribute, and the accuracy of each judgment is different. If the same weight is assigned for each attribute of each expert, it will be lack of rationalities. For improving the accuracy of estimation, the determination of the general weights of decision-makers should be thinned down the determination of weights of decision-makers to attributes. In this paper, the approach of determining the expert weight to attributes is proposed in the process of interactive decision-making. The information of matrix of decision-makers’ estimation is firstly extracted. Based on the similarity of matrixes, OWDM are determined. Then the consensus degree is gotten.
2 The OWDM to Attributes Nowadays, most research focus on determining general OWDM by adjusting the weights of decision-makers based on the deviate degree between the individuals decision-making result and group decision-making. Generally, the decision-makers whose decision-making result is much different from the group decision-making will be assigned smaller weight. Then the influence of that decision-maker is weakened to the group decision-making. There are two approaches. One is judgment-matrix which requests decision-makers comparing to the schemes. Another one is estimation-matrix which requests decision-makers giving estimation value. In this paper, we adopted the distance conformability degree approach for determining OWDM to attributes. The result of decision-makers’ general estimation is aggregated based on the each attribute estimation’s aggregation.
3 Interactive Flow of Group Decision-Making Based on the Objective Weight
Based on the analysis above, the flow of the group decision-making approach based on the objective weight is obtained as follows:
Step 1. Starting the process of group decision-making, the subject weights and the attribute weights are first determined using the subjective approach.
Step 2. The lowest acceptable consensus coefficient is assigned according to the degree of expert coherence required for the scheme.
Step 3. The organizer requests all decision-makers to give their estimation matrices for all attributes of all schemes.
Step 4. Compute the individual decision-making result for each scheme using the subject weights determined in Step 1.
Step 5. Compute the consensus coefficient of the N-th round. If the consensus coefficient is greater than the given value, go to Step 7, and the weights of the experts to the attributes are the final weights. Otherwise, the decision-maker who has the smallest weight on that attribute is asked to give the estimation matrix again, and go to Step 6. If the group cannot reach the final consensus, the opinion given by the decision-maker having the smallest weight can be excluded in order to reach consensus.
Step 6. Construct the estimation matrices again and carry out the next round of decision-making using the OWDM to attributes obtained in the last round, then go to Step 4.
Step 7. Sort the schemes and obtain the decision-making result.
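The interactive loop can be outlined in code as follows. The similarity-based weighting used here is only a stand-in for the distance conformability degree approach mentioned in Section 2, whose exact formula is not reproduced in this short paper, and the consensus measure is likewise an illustrative assumption.

```python
import numpy as np

def attribute_weights(estimates):
    """estimates: array of shape (n_experts, n_schemes, n_attributes).
    Weight each expert on each attribute by closeness to the group mean."""
    group = estimates.mean(axis=0)                       # (n_schemes, n_attributes)
    distance = np.abs(estimates - group).sum(axis=1)     # (n_experts, n_attributes)
    closeness = 1.0 / (1.0 + distance)
    return closeness / closeness.sum(axis=0, keepdims=True)

def decision_round(estimates, attr_weights, threshold=0.9):
    """One round: aggregate estimates and test a simple consensus coefficient."""
    w = attribute_weights(estimates)                     # objective weights to attributes
    aggregated = np.einsum('ea,esa->sa', w, estimates)   # weighted mean over experts
    scores = aggregated @ attr_weights                   # weighted sum over attributes
    expert_scores = estimates @ attr_weights             # (n_experts, n_schemes)
    consensus = float(np.mean(1.0 / (1.0 + np.abs(expert_scores - scores).mean(axis=1))))
    return scores, w, consensus >= threshold
```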
4 Conclusion For improving the efficiency and accuracy of group decision-making, the OWDM is refined down to the objective weight of decision-makers to attributes, and an interactive decision-making flow is designed. Based on the work mentioned above, we can conclude the following: although determining the weights of decision-makers to attributes is more complicated than determining general weights of decision-makers, in the situation where the decision-makers are cooperative and their subject weights are generally consistent, the approach of determining the objective weight of decision-makers to attributes yields faster opinion aggregation and better decision-making accuracy. Especially when there are more decision-makers and attributes, the approach of determining the objective weight of decision-makers to attributes has better practicability.
Acknowledgments The Authors are grateful to the editors, referees and the National Science Funds of P. R. China (70871017, 70431001).
References [1] Vargas, L.G.: An overview of the analytic hierarchy process and its application. European. Journal of Operational Research 48(1), 2–8 (1990) [2] Liang, L., Xiong, L., Wang, G.: New method for determining the objective weight of decision makers in group decision. Systems Engineering and Electronics 27(4), 653–655 (2005) (in Chinese) [3] Liu, P., Yan, X., Kuang, X.: Dynamic Weights of Experts in Interactive Decision-Making. Industrial Engineering and Management 5, 32–36 (2007) (in Chinese)
[4] Chen, W., Fang, T., Jiang, X.: Research on Group Decision Based on Delphi and AHP. Computer Engineering 29(5), 18–20 (2003) [5] Liu, Y., Xu, D., Jiang, Y.: Method of adaptive adjustment weights in multi-attribute group decision-making. Systems Engineering and Electronics 27(1), 45–48 (2007) (in Chinese) [6] Liang, L., Xiong, L., Wang, G.: A New Method of Determining the Reliability of Decision-makers in Group Decision. Systems Engineering 22(6), 91–94 (2004) (in Chinese) [7] Song, G., Zou, P.: The Method of Determining the Weight of the Decision-maker in Multi attribute Group Decision-making. Systems Engineering 19(4), 84–89 (2001) (in Chinese)
Machining Parameter Optimal Selection for Blades of Aviation Engine Based on CBR and Database Yan Cao1, Yu Bai1, Hua Chen1, and Lina Yang2 1
Advanced Manufacturing Engineering Institute, School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710032, China 2 Xi’an University of Science and Technology, Xi’an, 710054, China
[email protected] Abstract. Blades of aviation engine are usually composed of complex three dimensional twisted surfaces that request high geometrical precision. Their machining is very difficult. Hence, how to reuse successful machining technics becomes an important and effective measure to improve blade machining quality. Machining parameter optimization for blades of aviation engine based on CBR and database is discussed in the paper. The system architecture and workflow are presented. Machining parameter database based on CBR consists of a case library and a machining database. Both of them can not only run independently, but also be integrated through application interface. Case representation includes two aspects, namely problem and objective description and solution scheme. Similarity ratio calculation is divided into local similarity ratio and integral similarity ratio. Through system development, it is proven to be feasible that machining parameter optimal selection is realized based on CBR and database. Keywords: blade; aviation engine; CBR; machining parameter; decision– making system.
1 Introduction Blades of aviation engine are key components of aviation engines. Because of their complex structures and varieties, they have a great influence on the performance of aviation engines, their design and manufacture cycle, and the manufacturing workload of the whole aviation engine. Some blades are composed of complex three dimensional twisted surfaces that require high geometrical precision. Their machining is so difficult that the surfaces are divided into sub-zones that include blade basin machining, blade back side machining, air-in and air-out edge machining, damping stand machining, etc. In the machining process, machining parameters change greatly. All these problems affect the research and development of high performance aviation engines. The blades endure complex stress and micro-vibration that demand high quality of blade material, mechanical technics, heat treatment, and surface spray. Hence, how to reuse successful machining technics becomes an important and effective measure to improve blade machining quality. Although CBR (Case-Based Reasoning) has been applied to machining parameter optimal selection and the construction of machining databases [1] [2] [3], its applications to the machining of aviation engine blades are few [4].
2 The System Architecture and Workflow When machining technicians make machining scheme decisions, they usually adopt two methods. • Machining decision-making starts from scratch to design a new machining scheme. • According to machining requirements, an association of ideas proceeds to search similar machining schemes. Then, they are modified and improved to fulfill the current requirements. In the paper, CBR is used to improve machining scheme decisions capability. Machining parameter optimization for blades of aviation engine based on CBR and database is discussed in the paper. The system architecture and workflow are shown in Fig.1 and Fig.2.
Fig. 1. The architecture of the CBR system
The workflow of machining scheme decision-making based on CBR coupled with ANN can be divided into two phases. 1. Training phase. Its main contents are as follows. • Collect, classify, and describe cases using appropriate methods. • Construct training set and train ANN. • Keep instances in a case library. Organize and manage the case in the library by the stated rules. 2. Working phase. Its main contents are as follows. • Define the problem to be solved. • Retrieve the case that matches the problem best of all from the case library by stated rules. Take it as the initial solution to the problem.
• Modify the initial solution to achieve the new solution to the problem.
• Evaluate the new solution. If it is feasible, it is regarded as a new case kept in the case library. Otherwise, it fails to solve the problem.
The key factors to realize the problem solving process mentioned above are as follows [5] [6].
• Logical structure relating to the specific domain.
• Appropriate classification, storage, and index model of cases.
• Extraction method of similar cases.
• Modification method of similar cases.
• Evaluation system and methods of solutions.
• Interface processing module to deal with unsuccessful solutions.
Logical structure relating to specific domain. Appropriate classification, storage, and index model of cases. Extraction method of similar cases. Modification method of similar cases. Evaluation system and methods of solutions. Interface processing module to deal with unsuccessful solution. Start ERP system Database/Knowledge base management module
Analysis of machining requirements
Rule and knowledge acquisition Machining data management
Machining scheme evaluation system
Machining scheme case retrieval Database Knowledge base
Case modification and machining scheme decision-making
Knowledge base editing Knowledge base management
Machining scheme evaluation
Finished? Case library management module
N
Iterate
Y
Case input Machining scheme document output Case indexing Case modification and storage
Machining scheme case library
Case library editing Y
Store?
N End
Fig. 2. System workflow
3 Machining Parameter Database Structure Machining parameter database based on CBR consists of a case library and a machining database [4], as shown in Fig.3.
Fig. 3. Machining parameter database structure
Both of them can not only run independently, but also be integrated through application interface. Hereinto, the machining database stores part material, cutters, machine tools, machining parameters, etc. Cutter selection and machining parameters optimization can be realized through application programs that are based on a relational database. The case library stores successful machining cases, experience, rules, etc. By case searches, mapping, and modification, a reasonable machining scheme for a new part can be recommended. Newly generated machining scheme can also be stored in the case library for further use. The machining parameter database adopts C/S structure.
4 Case Library and Case Representation Case representation should at least include two aspects, namely problem and objective description, and solution scheme. The problem and objective description is consisted of non-control parameters and output parameters in machining process. The solution scheme is consisted of control parameters. Because the control parameters have different effects on machining quality, it is not necessary to include all parameters in the
case library. Hence, the case problem and objective description includes material number, cutter material, cutter abrasion resistance, rough and finishing machining requirement, cutting speed, cutting depth, amount of feed, material machining capability, and so on. 4.1 Control Parameters The control parameters are as follows. 1. Machine tool performance parameters. In the case library, it only includes machine tool number through that the machine tool performance can be retrieved from a machine tool database. 2. Cutter parameters. They include cutter number, cutter type, cutter model, manufacture, cutter material number, cutter material, cutting edge number, cutting speed, and cutting depth. 3. Cutting fluid parameters. They include cutting fluid type, cutting fluid model, manufacturer, and cutting fluid number. 4.2 Non-control Parameters The non-control parameters are as follows. 1. Part type parameters. They include part type and machining surface in the case library. 2. Machining type parameters. They include rough machining, semi-finishing machining, and finishing machining.
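For illustration only, the split between problem and objective description (non-control parameters) and solution scheme (control parameters) might be represented as two simple record types; the field names below paraphrase the lists in Sections 4.1 and 4.2 and are not the actual schema of the authors' database.

```python
from dataclasses import dataclass

@dataclass
class ProblemDescription:           # non-control parameters and requirements
    material_number: str
    cutter_material: str
    part_type: str
    machining_surface: str
    machining_type: str             # rough / semi-finishing / finishing

@dataclass
class SolutionScheme:               # control parameters
    machine_tool_number: str        # full performance data kept in the machine tool table
    cutter_number: str
    cutting_speed: float
    cutting_depth: float
    cutting_fluid_number: str

@dataclass
class MachiningCase:
    problem: ProblemDescription
    solution: SolutionScheme
```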
5 Case Mapping Algorithms
Similarity ratio calculation is divided into local similarity ratio and integral similarity ratio.
5.1 Local Similarity Ratio Calculation Methods
5.1.1 Numerical Method
If the ranges of attributes are numerical values, the similarity ratio is calculated using the following formula:

sim(x, y) = 1 / (1 + |x − y|)    (1)

Hereinto, sim(x, y) is the local similarity ratio, and x and y are the attribute values.
5.1.2 Fuzzy Logic Method
If the ranges of attributes are fuzzy logical values, the similarity ratio is calculated using the following formula:

sim(x, y) = f(x, y)    (2)
Hereinto, sim(x, y) is the local similarity ratio, x and y are the attribute values, and f(x, y) is a numerical function defined according to the actual attribute characteristics.
5.1.3 Enumeration Method
If the ranges of attributes are within a listed scope, the similarity ratio is determined in terms of machining knowledge.
5.2 Integral Similarity Ratio Calculation Method
When computing the integral similarity ratio, a weight value is usually assigned to each attribute to satisfy actual requirements. Because a case is retrieved according to the attributes of the new problem and objective description, partial attributes are taken into account instead of all attributes to calculate the integral similarity ratio. The formula is as follows:

sim(q, u) = sim([q1, q2, ..., qm], [u1, u2, ..., um]) = Σ_{i=1}^{m} wi · sim(qi, ui) / Σ_{i=1}^{m} wi    (3)
Hereinto, q - The new problem and objective description. qi is the attribute i of q. u - A source case in the case library. ui is the attribute i of u. m - The attribute number of the problem and objective description. wi - The weight value of attribute i of the local similarity ratio. 5.3 Nearest Neighbor Method In a CBR system, case retrieving is close related to index mechanism adopted. Different from database query, case retrieving in the CBR system is usually fuzzy. On the one hand, at moments, a similar case is retrieved instead of a totally same case. On the other hand, the condition for CBR retrieving is the attributes of problem and objective description, not all attributes of the case. Currently, commonly used searching methods of CBR are nearest neighbor method and inductive method. In the paper, the nearest method is adopted.
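Formulas (1)-(3) and the nearest neighbor retrieval can be combined into a short routine; the per-attribute weights and the enumeration similarity table below are placeholders to be filled from machining knowledge (for example, the grade weights 10, 5 and 1 introduced in Section 6.2).

```python
def local_similarity(x, y, kind="numeric", table=None):
    if kind == "numeric":                                # formula (1)
        return 1.0 / (1.0 + abs(x - y))
    if kind == "enumeration":                            # looked up from machining knowledge
        return table.get((x, y), 1.0 if x == y else 0.0)
    raise ValueError("fuzzy attributes need a domain-specific f(x, y)")

def integral_similarity(query, case, schema, weights):
    """Formula (3): weighted average over the attributes present in the query."""
    num = sum(weights[a] * local_similarity(query[a], case[a], *schema[a]) for a in query)
    den = sum(weights[a] for a in query)
    return num / den

def nearest_neighbor(query, case_library, schema, weights):
    """Retrieve the source case with the highest integral similarity ratio."""
    return max(case_library, key=lambda c: integral_similarity(query, c, schema, weights))
```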
6 Application of Similarity Ratio Calculation Methods The problem and objective description of a blade machining case includes material type, blade type, blank type, blade surface, and machining precision. The range of each property is as follows. 1. Material type: carbon steel, low alloy steel, high alloy steel, cast steel, stainless steel, chilled steel, ductile cast iron, gray cast iron, spheroidal graphite iron, ferroalloy, nickel-base alloy, cobalt-base alloy, titanium alloy, aluminum alloy, and copper alloy. 2. Blank type: founding, forging, and milling. 3. Heat treatment status: quenching, normalizing, tempering, and annealing. 4. Surface: blade body, blade basin, air-in and air-out edge, rabbet, and damping stand.
5. Machining precision: roughing machining, semi-finishing machining, and finishing machining. 6.1 Application of Local Similarity Ratio Calculation Method and Nearest Neighbor Method The calculation of the attribute similarity ratio, for attributes such as material, blank type, blade machining surface, heat treatment status, and so on, adopts the enumeration method. The similarity ratio of part rigidity and precision is obtained using the fuzzy logic method. The similarity ratio calculation of the material trademark adopts the nearest neighbor method. 6.2 Application of Integral Similarity Ratio Calculation Method According to the degree of influence of machine tool, cutter, cutting fluid, cutter material, cutter geometry, and cutting quantity standard, different weight values are assigned to the attributes of the problem and objective description. They are divided into three grades. From high grade to low grade, they are: 1. Material type, part shape, and machining surface. 2. Material trade mark. 3. Blank type, heat treatment status, and machining precision. According to the principle that the weight value at a higher grade should be greater than the sum of all weight values at a lower grade, the attribute weight value at grade three is 1, the attribute weight value at grade two is 5, and the attribute weight value at grade one is 10.
Fig. 4. Part type selection
7 Applications After a user chooses part to be machined, machining mode, machining feature, and machining cutter, parameter reasoning can be accomplished, as shown in Fig.4 and Fig.5.
Fig. 5. Parameter input
Fig. 6. Reasoning results
According to the abovementioned input, the optimal results can be found, as shown in Fig.6.
8 Conclusion Based on CBR technology, a machining parameter database, a case library, and their corresponding mapping algorithms are established. Thus, the accumulated machining data and experience can be used to machine new parts with high quality, and the self-learning problem, which cannot be solved by rule-based reasoning alone, is addressed. This is of great importance for machining parameter optimal selection in blade machining. Through system development, it is proven feasible to realize machining parameter optimal selection based on CBR and a database. Future research will focus on the systematic classification of more blade machining cases, more effective mapping algorithms, etc. Acknowledgments. The paper is supported by the Shaanxi Major Subject Construction Project and the President Fund of Xi'an Technological University.
References [1] Zhou, W., Tao, H., Gao, X.B.: Application research on intelligent optimization database of cutting parameters. Aeronautical Manufacturing Technology (18), 78–81 (2008) [2] Chen, P.J.: Study and realization of general-cutting database. Machine Building & Automation 36(3), 94–95, 98 (2007) [3] Jiang, Z.: Research state and progress prospect of metal cutting database. Mechanical Engineer (5), 104–106 (2006) [4] Bai, Y., Cao, Y., Yang, X.F.: Cutting parameter database system of aeroengine blade based on case-based reasoning. Machinery Design & Manufacture (11), 195–197 (2008) [5] Zheng, Y.Q., Lv, S.L.: Opening machining database system research on Case-Based Reasoning. Manufacturing Automation 29(11), 96–99 (2007) [6] Xiang, K.J., Liu, Z.Q., AI, X.: Development of high-speed cutting database system based on hybrid reasoning. Computer Integrated Manufacturing Systems 12(3), 420–427 (2006)
A Multi-regional CGE Model for China Na Li1, Minjun Shi2, and Fei Wang3 1
Graduate University of Chinese Academy of Sciences, Beijing 100049, China
[email protected] 2 Research Center On Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
[email protected] 3 School of International Trade and Economics, University of International Business and Economics, Beijing 100029, China
[email protected] Abstract. With the development of China’s economy, the regional diversity and interregional economic linkage have become more and more remarkable and been two important factors to study China’s national and regional economy. Based on the multi-regional input-output table for China, this paper develops a multiregional CGE (MRCGE) model for China that is expected to provide a useful tool for analysis of regional economy and regional policies. This model depicts regional diversities on scale and structure and interregional economic linkages, i.e. commodity flow, labor flow and capital flow. As an application of this model, this paper designs to increase the investment for Northwestern region to reveal the important effect that comes from the regional differences and linkages.
1 Introduction Input-output models, econometric models, and computable general equilibrium (CGE) models have been applied to the analysis of regional development and regional policy. When insight into policy impacts on several regions is needed, multi-regional CGE (MRCGE) models have more advantages than other models, because they can reveal regional differences and economic interactions across regions. China is characterized by diversity of natural environment and resources and by spatial heterogeneity of social-economic development because of its large territory. Meanwhile, economic linkages across regions are increasing with the liberalization and privatization of the economic system, so a multi-regional CGE model is necessary for the analysis of China's regional development issues. There has been some research on multi-regional (or multi-national) CGE models, such as GTAP [1], GTEM [2] and MMRF [3]. Only a few trials on multi-regional CGE models have been made for China up to now [4, 5, 6], and most of these models lack a description of economic interaction across regions. This paper focuses upon the framework of a multi-regional CGE model for China that aims to describe economic interaction across regions, including commodity flows, labor flows and capital flows, and that is expected to provide a useful tool for the analysis of regional economies and regional policies.
2 Framework of Multi-regional CGE Model for China The multi-regional CGE model for China includes seven parts. The first three parts are regional economic activities characterized at the regional level. The next three parts present the main economic linkages across regions. The last part is macroeconomic closure and equilibrium. Production technology: The model recognizes two broad categories of inputs in each sector: intermediate inputs and primary factors. Producers are constrained in their choice of inputs by a two-level nested production technology. At the first level, intermediate-input bundles and primary-factor bundles are used in fixed proportions to output (Leontief function). At the second level, the primary-factor bundle is a CES (Constant Elasticity of Substitution) combination of labor and capital. Local final demands: In each region, the household buys bundles of goods to maximize a Stone-Geary utility function subject to a household expenditure constraint. A linear expenditure system (LES) consumption function determines household expenditure. Government is not divided into regional governments and a central government but is modeled as a single sector; a Cobb-Douglas (C-D) consumption function determines government expenditure. Considering technological relationships between investment products, a Leontief function is used to determine the demands for the various investment products. Inventory investment in each regional sector is assumed to be fixed in the model. Import and export demands: In each region, an Armington function is applied to account for imperfect substitutability between locally produced output and imports in the local market. In minimizing costs with the Armington function, demand for local imports is determined by the sales of locally produced products and the price of locally produced products relative to the world market price. Local imports and locally produced products are formed into composite goods, which supply producers, households, government and investors. Likewise, it is assumed that exports from different regions are imperfect substitutes, and the ratio of regional exports results from the relative differences between local prices of regional exports. All regional exports add up to China's total exports. The demand for China's total exports on the world market is determined as an exponential function of the world market price relative to the price of China's total exports. Interregional commodity flows: Interregional commodity flows in the model involve intermediate demand, rural and urban household consumption, government consumption, gross fixed capital formation, and inventory investment. The commodities consumed can come not only from composite goods (local imports and locally produced products) but also from other regions' goods; accordingly, a CES function accounts for the substitutable relationship between them. Similarly, the commodities in each region can be provided not only to the local market but also to other regions or for export. Interregional investment allocation: According to Bai et al. [7], the differences in capital return rates between regions in China have become small in the past ten years. So, in our model, investment/capital is assumed to be mobile across regions. Capital usually moves from low-return regions to high-return regions, but this leads to a decline of the expected return rate in the high-return region because of competition. Finally, in the long run, the expected return rates in all regions are equal to the national average
expected return rate. This is the way investment is allocated in the long-run closure; thus the aggregate investment in each region is endogenous. Interregional labor allocation: The following scheme is designed to reflect regional labor flows and wage differences between regions and sectors. Total national labor supply is exogenous, but regional labor supply is endogenous. Labor can move imperfectly across regions. A regional distortion coefficient is applied to represent the difference between the regional wage and the national average wage: the regional wage equals the national average wage multiplied by the regional distortion coefficient, and the labor supply in each region is determined endogenously. Similarly, labor can move imperfectly across sectors within a region. A sectoral distortion coefficient represents the difference between the sectoral wage and the regional average wage, and the labor supply in each sector is determined endogenously. Macroeconomic closure and equilibrium: The governmental savings rate is exogenous, and governmental expenditure is endogenous. The exchange rate is exogenous, and the ratio of foreign savings to GDP is endogenous. The numeraire is the average price of national urban consumption. There are mainly three kinds of equilibrium conditions. (1) Labor: the sum of all regional labor supplies equals the total national labor supply, and the sum of all sectoral labor supplies in a region equals that region's labor supply. (2) Capital: the sum of all sectoral capital supplies in a region equals that region's capital supply. (3) IS equilibrium: total national investment equals total national savings.
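As a toy illustration of the two-level production nest described above (not the paper's calibrated model), the sketch below combines a CES labor-capital bundle with a Leontief nest over the intermediate-input bundle and the factor bundle; the share, substitution and input-output coefficients are made-up placeholders.

```python
def ces_factor_bundle(labor, capital, delta=0.6, rho=0.5, scale=1.0):
    """CES combination of labor and capital (second level of the nest)."""
    return scale * (delta * labor ** (-rho) + (1.0 - delta) * capital ** (-rho)) ** (-1.0 / rho)

def sector_output(intermediate_bundle, labor, capital, a_int=0.55, a_fac=0.45):
    """First level: Leontief (fixed proportions) over the intermediate-input bundle
    and the primary-factor bundle."""
    factor_bundle = ces_factor_bundle(labor, capital)
    return min(intermediate_bundle / a_int, factor_bundle / a_fac)

# e.g. output of a sector using 100 units of intermediates, 40 of labor, 60 of capital
y = sector_output(100.0, 40.0, 60.0)
```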
3 Data Multi-regional CGE models usually need an interregional input-output table as database for description of these linkages of economic activities across regions. In this model we use the Multi-regional Input-Output Table for China 2000, which includes eight regions (Northeast, North municipalities, North coast, Central coast, South coast, Central region, Northwestern, Southwest) and thirty sectors.
4 Simulations This paper simulated the effects of a policy that increases investment in the Northwestern region by 200 billion yuan on regional economic development, based on the multi-regional CGE model. The results show that the real GDP of the Northwestern region will increase by 3.6%, and the real GDP of the other regions will also increase at different levels: Northeast (1.85%), North municipalities (2.09%), North coast (1.30%), Central coast (1.73%), South coast (1.45%), Central region (0.96%) and Southwest (1.42%). The policy elicits a different economic response in each region, which reveals that the differences in policy response come from differences in regional economic structures and from the economic linkages across regions.
5 Conclusions and Perspective This paper developed a multi-region CGE model for China based on interregional input-output model. It can reflect the differences in economic scale and industrial
structure and economic interactions across regions, which can provide a powerful tool for regional planning and policy analysis. The further research needs to develop a dynamic multi-regional CGE model for China.
References
[1] Hertel, T.W.: Global Trade Analysis: Modeling and Applications. Cambridge University Press, New York (1997)
[2] Pant, H.M.: GTEM: Global Economy and Environment Model. Australian Bureau of Agricultural and Resource Economics (ABARE) Technical Report, Canberra (2007), http://www.abareconomics.com/interactive/GTEM (accessed June 15, 2008)
[3] Adams, P.D.: MMRF: A Dynamic Multi-Regional Applied General Equilibrium (CGE) Model of the Australian Economy. Draft documentation prepared for the Regional GE Modelling Course, Centre of Policy Studies, Monash University, July 16-21 (2007)
[4] Li, S.-t., He, J.-w.: A Three-regional Computable General Equilibrium (CGE) Model for China. In: The 15th International Input-Output Conference, Beijing (June 2005), http://www.iioa.org/pdf/15th%20Conf/shantong_jianwu.pdf (accessed May 15, 2007)
[5] Fei, W., Song-hong, G., Ezaki, M.: Labor Migration and Regional Development in China: A Regional CGE Analysis. China Economic Quarterly 5(4), 1067–1090 (2006)
[6] Xu, Z.-y., Li, S.-t.: The Effect of Inter-regional Migration on Economic Growth and Regional Disparity. The Journal of Quantitative & Technical Economics 2, 38–52 (2008)
[7] Bai, C., Xie, C.-t., Qian, Y.-y.: The Return to Capital in China. Comparative Studies 28, 1–22 (2007)
The Method Research of Membership Degree Transformation in Multi-indexes Fuzzy Decision-Making Kaidi Liu, Jin Wang, Yanjun Pang, and Jimei Hao 1
Institution of Uncertainty Mathematics, Hebei University of Engineering, Handan 056038
Abstract. The conversion of membership degrees is the key computation of fuzzy evaluation in multi-index fuzzy decision-making. The existing method deserves discussion, because redundant data in the index membership degrees, which are useless for object classification, are also used to compute the object membership degree. The new method is as follows: based on entropy, mine the knowledge about object classification hidden in every index, confirm the relation between object classification and index membership, eliminate the data in the index memberships that are redundant for object classification by defining a distinguishable weight, and extract the valid values to compute the object membership. A new membership degree conversion method is thus constructed that is not affected by redundant data and can be used for multi-index fuzzy decisions. Index terms: fuzzy decision-making; membership degree transformation; distinguishable weight; valid values; comparable values.
1 Introduction Many factors affect the decision goal in a decision system. Among these factors, the more important ones are selected and called indexes. These indexes are divided into several levels, so that the decision-making index system is a hierarchical structure: the top level contains one factor Q, called the general goal; the base level contains the controllable base indexes, so every base index (quantitative or qualitative) has no sub-index of its own; there are some intermediate levels between the top and base levels, and except for the base indexes, the indexes of the other levels have sub-indexes. To simplify the description, let the hierarchical structure have only one intermediate level, because computationally there is no difference between one intermediate level and two or more. If the question is simple, for example, if the decision-making goal is to determine the importance order of the base indexes with respect to the top goal (such as simple plan scheduling), Saaty's analytic hierarchy process [1], based on "multiple comparison", can be used: under the general goal, the importance of the base indexes is ranked, and after the importance values are normalized, the importance weights of the base indexes with respect to the top goal are obtained. Although the analytic hierarchy process is not perfect, it can solve such scheduling questions. But multi-index decision-making is more complex; it does not only require the importance ranking of indexes. For example, suppose the m_i base indexes belonging to index i of the intermediate level are quantitative, and each index j (j = 1 ~ m_i) varies continuously in an interval [a_j, b_j]; then index i changes continuously, which leads to variation of the top general goal Q. The goal of decision-making is: what is the status of the top general goal when the value of base index j (1 ≤ j ≤ m) is x_j ∈ [a_j, b_j]? Obviously, to solve this question, one must first discretize the continuous status of index i into P different classes (also called kinds); let C_k (k = 1 ~ p) represent the k-th class of index i. Generally, let {C_1, C_2, ⋯, C_p} be a division of the state space C that satisfies

$$C_i \cap C_j = \varnothing \ (i \neq j), \qquad \bigcup_{k=1}^{P} C_k = C \qquad (1)$$
Correspondingly, the value interval of index j is also divided into P sub-intervals; the value of index j lying in the k-th sub-interval represents that index i is of class C_k, and we say that this value of index j belongs to class C_k. Under this division, two values of index j in [a_j, b_j] that are very close but lie on either side of a boundary point belong to two different classes, which is unreasonable. The remedy is to let a gradually varying membership degree represent the degree to which the value x_j of index j belongs to class C_k, which is superior to an abrupt "belongs to" (represented by 1) or "does not belong to" (represented by 0). Letting a fuzzy membership degree μ_k(x_j) represent the degree to which the value x_j of index j belongs to class C_k is the great contribution of Zadeh [2]. When fuzzy membership degrees are used to represent an index belonging to class C_k, the following question arises. If m_i = 1, that is, index i has only one base index j, then doubtlessly the membership degree μ_k(x_j) of the value x_j of index j belonging to C_k is the membership degree of index i belonging to C_k. But when m_i ≥ 2, the situation changes: how should the membership degree of index i be determined from the membership degrees of its m_i base indexes? That is, how can the transformation from the membership degrees of the indexes j to the membership degree of index i be realized? Because this is inevitable in any multi-index decision-making, it must be answered explicitly. For a hierarchical structure, once the membership degree of index i belonging to C_k is obtained, the membership degree of the top general goal Z belonging to C_k can be obtained from the intermediate level. Every membership degree transformation at every level can be summarized in the following membership transformation model. Suppose that there are m indexes affecting object Q, where the importance weight of index j (j = 1 ~ m) with respect to object Q is λ_j(Q), satisfying

$$0 \leq \lambda_j(Q) \leq 1, \qquad \sum_{j=1}^{m} \lambda_j(Q) = 1 \qquad (2)$$
Every index is classified into P classes. C_K represents the K-th class, and C_K is prior to C_{K+1}. If the membership μ_{jK}(Q) of the j-th index belonging to C_K is given, where K = 1 ~ P and j = 1 ~ m, and μ_{jK}(Q) satisfies

$$0 \leq \mu_{jK}(Q) \leq 1, \qquad \sum_{K=1}^{P} \mu_{jK}(Q) = 1 \qquad (3)$$
what, then, is the membership μ_K(Q) of object Q belonging to C_K? Obviously, whether the conversion method used is correct determines whether the evaluation result is credible. For this membership transformation, there are four transformation operators in fuzzy comprehensive evaluation: M(∧, ∨), M(•, ∨), M(∧, ⊕) and M(•, +). However, after long research on applications, only M(•, +) is accepted by most researchers; it regards the object membership as a "weighted sum":

$$\mu_k(Q) = \sum_{j=1}^{m} \lambda_j(Q) \cdot \mu_{jk}(Q), \quad (k = 1 \sim p) \qquad (4)$$
The M(•, +) method is the mainstream membership transformation algorithm and is widely used [4-9]; it is also the basic method for realizing the membership transformation from a fuzzy set on universe U to a fuzzy set on universe V in fuzzy logical systems. But the M(•, +) method is in dispute in academic circles, especially in application fields. For example, Refs. [10, 11] pointed out that the "weighted sum" method is too simple and does not use the information sufficiently; their authors proposed a "subjective and objective comprehensive" method based on evidential reasoning and rough set theory to realize the membership transformation. In [11], in an improved fuzzy comprehensive evaluation, a new "comprehensive weight" is defined to compute the "weighted sum" instead of the index importance weight. Refs. [12-14] define an over-proof weight to compute the "weighted sum"; Ref. [15] avoids the membership degree transformation from index to goal and computes the goal membership degree by an optimal weight in fuzzy pattern recognition. However, these and many other existing membership transformation methods are not designed for object classification, so they cannot indicate which parts of the index memberships are useful for object classification and which parts are useless. This redundancy shows that a correct method for realizing the membership degree transformation has not yet been found and needs further study. To deal with the redundant data in existing membership transformation methods, this paper, based on entropy, mines the knowledge about object classification hidden in every index, confirms the relation between object classification and index membership, eliminates the data in the index memberships that are redundant for object classification by defining a distinguishable weight, and thereby explores a concrete way to compute the object membership degree without the interference of redundant data.
2 Distinguishable Weight and Effective Value of K-th Class Index Membership
From the viewpoint of classification, the questions of most concern are the following: Does every index membership play a role in the classification of object Q?
Are there redundant data in the index memberships for the classification of object Q? These questions are very important, because their answers decide which index memberships and which values are qualified to compute the membership of object Q. To find the answers, we analyze as follows. 2.1 Distinguishable Weight (1) Assume that μ_j1(Q) = μ_j2(Q) = ⋯ = μ_jp(Q); then the j-th index membership implies that the probability of classifying object Q into every grade is equal. Obviously, this information is of no use to the classification of object Q, and deleting the j-th index will not affect the classification. Let α_j(Q) represent the normalized, quantized value describing how much the j-th index contributes to classification; then in this case α_j(Q) = 0. (2) If there exists an integer K satisfying μ_jK(Q) = 1 while the other memberships are zero, then the j-th index membership implies that Q can only be classified into C_K. In this case, the j-th index contributes most to classification and α_j(Q) should attain its maximum value. (3) Similarly, if μ_jk(Q) is more concentrated over K, the j-th index contributes more to classification, i.e., α_j(Q) is larger. Conversely, if μ_jk(Q) is more scattered over K, the j-th index contributes less to classification, i.e., α_j(Q) is smaller. The above (1)-(3) show that α_j(Q), reflecting how much the j-th index contributes to classification, is decided by the extent to which μ_jk(Q) is concentrated or scattered over K, which can be described quantitatively by the entropy H_j(Q). Therefore, α_j(Q) is a function of H_j(Q):
$$H_j(Q) = -\sum_{k=1}^{p} \mu_{jk}(Q) \cdot \log \mu_{jk}(Q) \qquad (5)$$

$$v_j(Q) = 1 - \frac{1}{\log p} H_j(Q) \qquad (6)$$

$$\alpha_j(Q) = v_j(Q) \Big/ \sum_{t=1}^{m} v_t(Q), \quad (j = 1 \sim m) \qquad (7)$$
Definition 1. Suppose μ_jk(Q) (k = 1 ~ p, j = 1 ~ m) is the membership of the j-th index belonging to C_k and satisfies Eq. (3). Then α_j(Q), given by Eqs. (5), (6) and (7), is called the distinguishable weight of the j-th index corresponding to Q. Obviously, α_j(Q) satisfies

$$0 \leq \alpha_j(Q) \leq 1, \qquad \sum_{j=1}^{m} \alpha_j(Q) = 1 \qquad (8)$$
2.2 Effective Value of Index Membership The significance of α j (Q) lies in its “distinguishing” function, i.e., it is a measure that reveals the exactness of object Q being classified by j th index membership and even
the extent of that exactness. If α_j(Q) = 0, then by the properties of entropy μ_j1(Q) = μ_j2(Q) = ⋯ = μ_jp(Q). This implies that the j-th index membership is redundant and useless for classification, so naturally it cannot be utilized to compute the membership of object Q. Definition 2. Suppose μ_jk(Q) (k = 1 ~ p, j = 1 ~ m) is the membership of the j-th index belonging to C_k and satisfies Eq. (3), and α_j(Q) is the distinguishable weight of the j-th index corresponding to Q. Then

$$\alpha_j(Q) \cdot \mu_{jk}(Q), \quad (k = 1 \sim p) \qquad (9)$$

is called the effective distinguishable value of the K-th class membership of the j-th index, or the K-th class effective value for short. If α_j(Q) = 0, the j-th index membership is redundant and useless for the classification of object Q, so it cannot be utilized to compute the membership of object Q. Note that if α_j(Q) = 0, then α_j(Q) · μ_jk(Q) = 0. So in fact, computing the K-th class membership μ_k(Q) of object Q does not use μ_jk(Q) directly but rather α_j(Q) · μ_jk(Q). This is a crucial fact. When the index memberships are replaced by their effective values to compute the object membership, the distinguishable weight acts as a filter: in the process of membership transformation it deletes the index memberships that are useless for classification and the redundant values within the index memberships.
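A minimal numerical sketch of Eqs. (5)-(9) (not from the paper) might look as follows; it assumes the memberships in each row of U are non-negative and sum to 1.

```python
import numpy as np

def distinguishable_weights(U):
    """Distinguishable weights alpha_j(Q), Eqs. (5)-(7).
    U is an m x p matrix whose j-th row is (mu_j1(Q), ..., mu_jp(Q))."""
    U = np.asarray(U, dtype=float)
    m, p = U.shape
    logs = np.where(U > 0, np.log(U), 0.0)   # convention: 0 * log 0 = 0
    H = -(U * logs).sum(axis=1)              # entropy of each index, Eq. (5)
    v = 1.0 - H / np.log(p)                  # Eq. (6)
    return v / v.sum()                       # normalisation, Eq. (7)

def effective_values(U):
    """K-th class effective values alpha_j(Q) * mu_jk(Q), Eq. (9)."""
    U = np.asarray(U, dtype=float)
    return distinguishable_weights(U)[:, None] * U
```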
3 Comparable Value of K-th Class Index Membership and Membership Transformation
Undoubtedly, α_j(Q) · μ_jk(Q) is necessary for computing μ_k(Q). However, the problem is that, in general, the K-th class effective values of different indexes are not comparable and cannot be added directly, because for determining the K-th class membership of object Q these effective values usually differ in "unit importance". The reason is that, generally, the index memberships do not imply the relative importance of the different indexes. So when the K-th class effective values are used to compute the K-th class membership, they must first be transformed into K-th class comparable effective values. 3.1 Comparable Value Definition 3. Suppose α_j(Q) · μ_jk(Q) is the K-th class effective value of the j-th index, and β_j(Q) is the importance weight of the j-th index related to object Q. Then

$$\beta_j(Q) \cdot \alpha_j(Q) \cdot \mu_{jk}(Q), \quad (k = 1 \sim p) \qquad (10)$$

is called the comparable effective value of the K-th class membership of the j-th index, or the K-th class comparable value for short. Clearly, the K-th class comparable values of different indexes are comparable with each other and can be added directly.
3.2 Membership Transformation Definition 4. Suppose β_j(Q) · α_j(Q) · μ_jk(Q) is the K-th class comparable value of the j-th index of Q, where j = 1 ~ m. Then

$$M_k(Q) = \sum_{j=1}^{m} \beta_j(Q) \cdot \alpha_j(Q) \cdot \mu_{jk}(Q), \quad (k = 1 \sim p) \qquad (11)$$

is named the K-th class comparable sum of object Q. Obviously, the bigger M_k(Q) is, the more possible it is that object Q belongs to C_K. Definition 5. Suppose M_k(Q) is the K-th class comparable sum of object Q, and μ_k(Q) is the membership of object Q belonging to C_K. Then

$$\mu_k(Q) \triangleq M_k(Q) \Big/ \sum_{t=1}^{p} M_t(Q), \quad (k = 1 \sim p) \qquad (12)$$

Obviously, the membership degree μ_k(Q) given by Eq. (12) satisfies

$$0 \leq \mu_k(Q) \leq 1, \qquad \sum_{k=1}^{p} \mu_k(Q) = 1 \qquad (13)$$
Up to now, supposing that the index memberships and the index importance weights are given, the transformation from index membership to object membership is realized by Eqs. (5), (6), (7), (11) and (12). This transformation needs no prior knowledge and does not introduce wrong classification information. The above membership transformation method can be summarized as "effective value, comparable value and composition", and is denoted as M(1,2,3).
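Putting Eqs. (5)-(7), (11) and (12) together, a compact, illustrative sketch of the whole M(1,2,3) transformation could read:

```python
import numpy as np

def m123(U, beta):
    """M(1,2,3) membership transformation.
    U    : m x p matrix of index memberships mu_jk(Q)
    beta : length-m vector of index importance weights beta_j(Q)
    Returns the object membership vector (mu_1(Q), ..., mu_p(Q))."""
    U = np.asarray(U, dtype=float)
    beta = np.asarray(beta, dtype=float)
    m, p = U.shape
    # distinguishable weights, Eqs. (5)-(7)
    logs = np.where(U > 0, np.log(U), 0.0)
    H = -(U * logs).sum(axis=1)
    v = 1.0 - H / np.log(p)
    alpha = v / v.sum()
    # comparable sums, Eq. (11), and normalisation, Eq. (12)
    M = (beta * alpha) @ U
    return M / M.sum()
```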
4 Case A reinforced concrete beam bridge consists of 7 components, including the main beam, the pier platform, the foundation, etc., so its reliability is decided by these 7 components; the reliability of every component is affected by concrete factors, including carrying capacity, distortion, fracture, etc. Therefore, the reliability evaluation of the defect status of a beam bridge is a three-level hierarchical structure [20], as shown in Fig. 1. 4.1 Fuzzy Evaluation Matrix By Fig. 1, the reliability evaluation of the defect status of the beam bridge is a three-level hierarchical structure. Ref. [20] determines the importance weights of the 7 sub-indexes belonging to the reliability evaluation of the defect status of the beam bridge and the importance weights of the indexes belonging to every intermediate level by the analytic hierarchy process; and, for one beam bridge, determines the membership degree vector of every base index over the 5 evaluation classes {good, relatively good, medium, poor, very poor}. The resulting fuzzy evaluation matrix is given in Table 1.
Fig. 1. The reliability evaluation of defect status of Beam Bridge (hierarchy: the goal is decomposed into the components A1-A7 — main beam, diaphragm, support, bent beam, pier platform, pile foundation, foundation — and, for A1, A2, A4, A5 and A6, into the base factors carrying capacity, distortion and fracture)
In Table 1, the figures in parentheses after the indexes are their importance weights, and the vectors after the base indexes are their membership vectors over the 5 grades. The figures in the table are from Ref. [20]. 4.2 Steps in the M(1,2,3) Method With the data in Table 1, the evaluation process is as follows. (1) Base-level evaluation. Taking the membership degree transformation from Carrying capacity B11, Distortion B12 and Fracture B13 to Main beam A1 as an example, the steps are as follows: ① The evaluation matrix of A1 is
$$U(A_1) = \begin{pmatrix} 0.1 & 0.3 & 0.6 & 0 & 0 \\ 0 & 0.4 & 0.5 & 0.1 & 0 \\ 0 & 0.2 & 0.4 & 0.4 & 0 \end{pmatrix}$$
By the j th row ( j = 1 ~ 3) of U ( A1 ) , the distinguishable weights of B1 j are obtained and the distinguishable weight vector is α ( A1 ) = (0.3682,0.3447,0.2871)
Table 1. Fuzzy evaluation of the reliability evaluation of defect status of Beam Bridge

Goal                      Component level               Factor level                     Class membership degree
                                                                                         {good, relatively good, medium, poor, very poor}
The reliability of        Main beam A1 (0.21)           Carrying capacity B11 (0.680)    (0.1, 0.3, 0.6, 0, 0)
defect status of                                        Distortion B12 (0.170)           (0, 0.4, 0.5, 0.1, 0)
Beam Bridge Z                                           Fracture B13 (0.150)             (0, 0.2, 0.4, 0.4, 0)
                          Diaphragm A2 (0.06)           Carrying capacity B21 (0.850)    (0, 0.3, 0.7, 0, 0)
                                                        Distortion B22 (0.075)           (0, 0.2, 0.7, 0.1, 0)
                                                        Fracture B23 (0.075)             (0, 0.2, 0.5, 0.3, 0)
                          Support A3 (0.03)             —                                (0, 0.5, 0.5, 0, 0)
                          Bent beam A4 (0.15)           Carrying capacity B41 (0.700)    (0.1, 0.5, 0.4, 0, 0)
                                                        Distortion B42 (0.150)           (0.2, 0.5, 0.3, 0, 0)
                                                        Fracture B43 (0.150)             (0.1, 0.6, 0.3, 0, 0)
                          Pier platform A5 (0.23)       Carrying capacity B51 (0.800)    (0.4, 0.3, 0.3, 0, 0)
                                                        Distortion B52 (0.130)           (0.3, 0.5, 0.2, 0, 0)
                                                        Fracture B53 (0.070)             (0.4, 0.4, 0.2, 0, 0)
                          Pile foundation A6 (0.24)     Carrying capacity B61 (0.860)    (0.5, 0.3, 0.2, 0, 0)
                                                        Distortion B62 (0.070)           (0.4, 0.5, 0.1, 0, 0)
                                                        Fracture B63 (0.070)             (0.5, 0.4, 0.1, 0, 0)
                          Foundation souring A7 (0.08)  —                                (0.6, 0.4, 0, 0, 0)
②The importance weight vector of B11 ~ B13 is given as β ( A1 ) = (0.680, 0.170, 0.150)
③ Calculate the K-th class comparable values of B_{1j} (j = 1 ~ 3) and obtain the comparable value matrix of A1:

$$N(A_1) = \begin{pmatrix} 0.0250 & 0.0751 & 0.1502 & 0 & 0 \\ 0 & 0.0234 & 0.0293 & 0.0059 & 0 \\ 0 & 0.0086 & 0.0172 & 0.0172 & 0 \end{pmatrix}$$
④Compute the comparable sum of main beam A1 and obtain the comparable sum vector M ( A1 ) = (0.0250, 0.1072, 0.1968, 0.0231, 0)
⑤Compute the membership vector of main beam A1 μ ( A1 ) = (0.0711, 0.3044, 0.5589, 0.0656, 0)
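For readers who want to check the arithmetic, the following standalone snippet (not part of the original paper) reproduces steps ①-⑤ for main beam A1 with the data of Table 1:

```python
import numpy as np

# Evaluation matrix U(A1) and importance weights of B11-B13 from Table 1
U_A1 = np.array([[0.1, 0.3, 0.6, 0.0, 0.0],
                 [0.0, 0.4, 0.5, 0.1, 0.0],
                 [0.0, 0.2, 0.4, 0.4, 0.0]])
beta = np.array([0.680, 0.170, 0.150])

logs  = np.where(U_A1 > 0, np.log(U_A1), 0.0)
H     = -(U_A1 * logs).sum(axis=1)          # entropies, Eq. (5)
v     = 1.0 - H / np.log(5)                 # Eq. (6)
alpha = v / v.sum()                         # ~ (0.3682, 0.3447, 0.2871)

N  = (beta * alpha)[:, None] * U_A1         # comparable value matrix N(A1)
M  = N.sum(axis=0)                          # ~ (0.0250, 0.1072, 0.1968, 0.0231, 0)
mu = M / M.sum()                            # ~ (0.0711, 0.3044, 0.5589, 0.0656, 0)
print(alpha.round(4), mu.round(4))
```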
Similarly, the membership degree vectors of Diaphragm A2, Bent beam A4, Pier platform A5 and Pile foundation A6, namely μ(A2), μ(A4), μ(A5) and μ(A6), are obtained, and the membership degree vectors of Support A3 and Foundation souring A7, namely μ(A3) and μ(A7), are given directly. The fuzzy evaluation matrix U(Z) of the reliability evaluation of the defect status of the beam bridge Z consists of μ(A1), μ(A2), μ(A3), μ(A4), μ(A5), μ(A6) and μ(A7), as follows:

$$U(Z) = \begin{pmatrix} \mu(A_1) \\ \mu(A_2) \\ \mu(A_3) \\ \mu(A_4) \\ \mu(A_5) \\ \mu(A_6) \\ \mu(A_7) \end{pmatrix} = \begin{pmatrix} 0.0711 & 0.3044 & 0.5589 & 0.0656 & 0 \\ 0 & 0.2891 & 0.6909 & 0.0200 & 0 \\ 0 & 0.5 & 0.5 & 0 & 0 \\ 0.1132 & 0.5162 & 0.3707 & 0 & 0 \\ 0.3858 & 0.3357 & 0.2785 & 0 & 0 \\ 0.4921 & 0.3236 & 0.1842 & 0 & 0 \\ 0.6 & 0.4 & 0 & 0 & 0 \end{pmatrix}$$
(2) Top-level evaluation. From the matrix U(Z) and the importance weight vector (0.21, 0.06, 0.03, 0.05, 0.23, 0.24, 0.08), the membership vector of Z is obtained with the same algorithm: μ(Z) = (0.2840, 0.3660, 0.3362, 0.0138, 0)
(3) Class of the reliability evaluation. Let the quantitative vector of the classes C1 (good), C2 (relatively good), C3 (medium), C4 (poor), C5 (very poor) be (m_1, m_2, m_3, m_4, m_5) = (5, 4, 3, 2, 1). Then the reliability of the defect status of the beam bridge Z is

$$\eta(Z) = \sum_{k=1}^{5} m_k \cdot \mu_k(Z) \qquad (13)$$

In this study η(Z) = 3.9200; because η(Z) is close to 4, Z belongs to the "relatively good" class.
5 Conclusions The conversion of membership degrees is the key computation of fuzzy evaluation in multi-index fuzzy decision-making, but the existing transformation method is questionable. This paper analyzed the reason for the problem, obtained a solution, and finally built the M(1,2,3) model, which is free from the interference of redundant data, differs from M(•, +), and is a nonlinear model. M(1,2,3) provides a general method for the membership transformation of multi-index decision-making in application fields. Its theoretical value is that it provides a logically consistent transformation method for realizing the transformation from a fuzzy set on universe U to a fuzzy set on universe V in fuzzy logical systems. Starting from the index membership degrees of the base level, the membership degree vector of an index at the adjacent upper level is obtained by M(1,2,3); by the same computation, the membership degree vector of the top level is then obtained. Because every step is normalized, M(1,2,3) is suitable for membership transformations involving multiple levels, multiple indexes and large amounts of data.
References
[1] Saaty, T.L.: The Analytic Hierarchy Process. University of Pittsburgh, Pittsburgh (1988)
[2] Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338–353 (1965)
[3] Qin, S.-K., et al.: The Theory and Application of Comprehensive Evaluation, p. 214. Electronic Industry Publishing House, Beijing (2003)
[4] Geng, X.-f., Liu, K., Wang, D.-z.: Fuzzy Comprehensive Evaluation for Supply Chain Risk. Logistics Technology 26(8), 164–167 (2007)
[5] Xiao, L., Dai, Z.-k.: Multi-Level Fuzzy Comprehensive Evaluation Model for Risk of Information System. Journal of Sichuan University 36(5), 98–102 (2004)
[6] Li, H.-t., Liu, Y., He, D.-q.: The Risk Evaluation Method for Reliability of Information System Engineering. Journal of Beijing Jiaotong University 29(2), 62–64 (2005)
[7] Guozhong, M., Wenyong, M., Xiaodong, L.: Multi-level Fuzzy Evaluation Method for Civil Aviation System Safety. Journal of Southwest Jiaotong University 42(1), 104–109 (2007)
[8] Jun, Y., Jianlin, W., Pei, S., et al.: Method of Second Comprehensive Safety Evaluation and Its Application to Oil Safety Evaluation. China Safety Science Journal 17(6), 135–138 (2007)
[9] Xianbin, Z., Guoming, C.: Research on Fuzzy Comprehensive Evaluation Method for Oil & Gas Pipeline Failure Based on Fault Tree Analysis. Systems Engineering-Theory & Practice (2), 139–144 (2005)
[10] Guanglong, H., Zhonghua, S., Zhaotong, W.: A Method of Comprehensive Evaluation with Subjective and Objective Information Based on Evidential Reasoning and Rough Set Theory. China Mechanical Engineering 12(8), 930–934 (2001)
[11] Guo, J., Guo, J., Hu, M.-x.: The Improvement on Project Risk Fuzzy Evaluation. Industrial Engineering Journal 10(3), 86–90 (2007)
[12] Zeng, M.-r., Wang, C.-h.: The Application of Fuzzy Math in Quality of Water Evaluation. Fujian Environment 16(5), 7–9 (1999)
[13] Lin, Y., Xiao-ling, L.: The Application of Fuzzy Math in Quality of Water Evaluation for Huang Shui Area. Environment Detection of China 16(6), 49–52 (2000)
[14] Mei, X.-b., Wang, F.-g., Cao, J.-f.: The Application and Study of Fuzzy Comprehensive Evaluation in Quality of Water Evaluation. Global Geology 19(2), 172–177 (2000)
[15] Tian, J.-h., Qiu, L., Chai, F.-x.: The Application of Fuzzy Identification in Quality of Water Evaluation. Journal of Environmental Sciences 25(7), 950–953
[16] Zhang, W.-q.: The Current Status and Prospect of Research & Application for Data Mining. Statistics & Information Forum 19(1), 95–96 (2004)
[17] Jia, L., Li, M.: The Losing Model of Customers of Telecom Based on Data Mining. Computer Engineering and Applications, 185–187 (2004)
[18] Yang, W.-x., Ren, X.-m., Qin, W.-y., et al.: Research of Complex Equipment Fault Diagnosis Method. Journal of Vibration Engineering 13(5), 48–51 (2000)
[19] Gao, Y.-L.: Data Mining and Application in Project Diagnosis. Xi'an Jiaotong University (2000)
[20] Lu, Y., He, S.: Fuzzy Reliability Evaluation of Defective RC Beam Bridge. Journal of Traffic and Transportation Engineering 5(4), 58–62 (2005)
Study on Information Fusion Based Check Recognition System Dong Wang
Abstract. Automatic check recognition techniques play an important role in financial systems, especially in risk management. This paper presents a novel check recognition system based on multi-cue information fusion theory. For Chinese bank check, the amount can be independently determined by legal amount, courtesy amount, or E13B code. The check recognition algorithm consists of four steps: preprocessing, check layout analysis, segmentation and recognition, and information fusion. For layout analysis, an adaptive template matching algorithm is presented to locate the target recognition regions on the check. The hidden markov model is used to segment and recognize legal amount. Courtesy and E13B code are recognized by artificial neural network method, respectively. Finally, D-S evidence theory is then introduced to fuse above three recognition results for better recognition performance. Experimental results demonstrate that the system can robustly recognize checks and the information fusion based algorithm improves the recognition rate by 5~10 percent.
1 Introduction For its negotiability and security, the bank check has been widely used in financial systems, and the number of checks in use increases rapidly year by year. However, most checks are still processed in the traditional manual way, which is inefficient and inconvenient for risk management. Automatic processing techniques mainly use optical character recognition (OCR) algorithms to recognize the digits and characters in check images, which provides an accurate, efficient and secure check processing mode without human intervention. These techniques have attracted more and more attention recently and involve multiple disciplines such as artificial intelligence, image processing, fuzzy mathematics and pattern recognition. Although check recognition techniques have only recently been used in financial fields, check recognition has been one of the most active research topics in character recognition due to its extensive market demand. The A2iA CheckReader system developed by the French company A2iA is a successful check recognition system, which has been used by several commercial banks due to its high processing speed and recognition rate. The Center of Excellence for Document Analysis and Recognition at the State University of New York at Buffalo and the Centre for Pattern Recognition and Machine Intelligence at Concordia University in Canada have also developed advanced check recognition systems. In China, research on check recognition has made great progress recently. The Finance OCR system developed by the Pattern Recognition and Intelligent System Lab of Beijing University of Posts and Telecommunications can recognize the courtesy and legal amounts, the account number and the date of issue. Extensive experimental results show that its recognition rate is 63.23% when the false acceptance rate is less than 0.1%. Tian et al. [3]
proposed an unconstrained handwritten Chinese string recognition method based on the Hidden Markov Model (HMM) which incorporates both segmentation and recognition. Zhu et al. [4] proposed an improved nonlinear normalization method for the recognition of unconstrained handwritten Chinese characters based on density equalization of the exact character area; their experimental results show that the correct recognition rate is improved by around 1.5%. Zhao et al. [5] investigated automatic fault-tolerant check recognition technologies based on the electronic payment cipher and established a prototype check recognition system. This paper presents an information fusion based check recognition system, which is composed of four main modules: preprocessing, check layout analysis, character segmentation and recognition, and information fusion. The system sketch map is shown in Fig. 1. The preprocessing step is used for image binarization, noise removal and skew correction. The check layout analysis step locates the target regions with an adaptive template searching algorithm. The legal amount characters are segmented and recognized with an HMM algorithm, and the courtesy amount characters and the E13B code are recognized by an artificial neural network method. The information fusion step improves the recognition rate by combining the above recognition results with D-S evidence theory. The experimental results show that our system achieves fast and robust check recognition, and the recognition rate is improved by around 5%-10% with the information fusion algorithm.
Fig. 1. Sketch map of our system
Fig. 2. Check template
2 Preprocessing and Layout Analysis Check images always suffer from various kinds of noise such as bank seals, spots, wrinkled lines and optical noise. The preprocessing of check images includes binarization, noise removal and skew correction. This paper uses the Otsu algorithm [6] to calculate the threshold value for image binarization. We employ median filtering and connected component analysis to remove image noise. The skew correction is realized by the following two steps: (1) calculate the skew angle by the Hough transformation; (2) rotate the image by the detected skew angle. The layout analysis is used to locate the target rectangular regions and to provide efficient character extraction. Considering that the layout of the Chinese bank check is fixed, we propose an adaptive template searching algorithm to achieve the
character location and extraction. The check template used is shown in Fig. 2, where the three black rectangles represent the legal amount, the courtesy amount and the E13B code, respectively. The red rectangle indicates the rectangular borderline printed on the check, which can be regarded as the reference for template matching. The algorithm is composed of the following steps: (1) apply the Hough transformation to detect the bottom and top borderlines around the red rectangle region, and adjust the template positions accordingly; (2) calculate the vertical and horizontal projection histograms of the red rectangle region; with these histograms, we can detect whether the character string extends beyond the borderlines. As shown in Fig. 3b, the legal amount characters extend beyond the bottom borderline; we therefore iteratively adjust the borderline position to T + ΔT until the entire string is included in the target region, as shown in Fig. 3c; (3) determine the minimum enclosing rectangle and extract the character string based on the vertical and horizontal projection histograms.
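The paper gives no code; purely as an illustration of the preprocessing pipeline just described, a minimal OpenCV sketch could look as follows (the Canny thresholds and the Hough vote count are arbitrary placeholders, and only the strongest detected line is used for the skew estimate):

```python
import cv2
import numpy as np

def preprocess_check(gray):
    """Binarization, noise removal and skew correction for a grayscale check image."""
    # Otsu thresholding [6]
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # median filtering removes small speckle noise
    denoised = cv2.medianBlur(binary, 3)
    # estimate the skew angle from the dominant Hough line
    edges = cv2.Canny(denoised, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)
    angle = 0.0
    if lines is not None:
        theta = lines[0][0][1]                 # strongest line
        angle = np.degrees(theta) - 90.0       # deviation from horizontal
    h, w = denoised.shape
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(denoised, rot, (w, h), borderValue=255)
```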
(a) Source image
(b) The character string beyond the bottom borderline
(c) Dynamic borderline adjustment result Fig. 3. Results of adaptive template searching algorithm
3 Legal Amount Recognition Character segmentation and recognition are not two independent processing modules. On the one hand, character recognition is based on the segmentation results; on the other hand, the recognition results can be used to improve the segmentation accuracy. Knerr et al. [7] introduced an HMM algorithm for French check recognition, and Tian et al. [3] proposed a legal amount segmentation and recognition algorithm based on HMM. This paper also uses an HMM algorithm to recognize the legal amount. According to the characteristics of their vertical projections, the legal amount
characters are divided into 26 basic units, which are shown in Fig. 4. Allowing combinations of at least one and at most 3 units, and at most 2 characters, we obtain N = 366 states. The HMM parameters (π, A, B) are determined by the training procedure. The initial state distribution is represented by π = (π_1, …, π_N). A = (a_ij)_{N×N} denotes the state transition probability distribution, where a_ij = P(ω_j^{t+1} | ω_i^t). B = (b_jk)_{N×M} denotes the observation symbol probability distribution in state ω_j, where b_jk = P(o_k^t | ω_j^t) and M is the number of distinct observation symbols per state.
Fig. 4. 26 basic units used in our algorithm
Fig. 5. Types of line elements
This paper employs the directional element features of strokes in sub-regions to determine the observation state sequence. First, the character image is normalized to 64×64 pixels and divided into 8×8 sub-regions. Then the number of contour points belonging to each of four types of line elements is calculated; the four types of line elements are vertical, horizontal and two oblique lines slanted at ±45 degrees. The normalized numbers are used to generate the feature vector. Finally, the observation state sequence is determined by the recognition algorithm. The commonly used line element types are shown in Fig. 5, where (a)-(d) possess one type of line element and (e)-(l) possess two types of line elements. Our HMM-based check recognition algorithm is composed of two main steps. The model training step calculates the HMM model parameters λ = (π, A, B) and the parameters of the observation state generator. The recognition step includes the following: (1) segment the character string with an appropriate threshold value based on the vertical projection histogram; (2) extract the directional line element features of each segmented part and generate the observation state sequence accordingly; (3) find the optimal hidden state sequence using the Viterbi algorithm [8], which is formulated as Eq. 1.
$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = \omega_i, o_1, \ldots, o_t \mid \lambda) \qquad (1)$$
where q_1, q_2, …, q_t with q_t = ω_i is the optimal state sequence for the observation sequence o_1, o_2, …, o_t. Our proposed algorithm works efficiently and robustly on two sets of check samples. There are 900 check images in sample set I, of which 200 images are used for training and the others for testing. In sample set II, 600 images out of 2400 samples are used for training. The recognition rate of our algorithm is 90.32% and 91.68% on the two sets, respectively. The details are shown in Table 1.
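The Viterbi decoding step of Eq. (1) can be sketched as follows (a generic log-domain implementation, not the paper's production code; a small epsilon guards against zero probabilities):

```python
import numpy as np

def viterbi(pi, A, B, obs, eps=1e-12):
    """Most likely hidden state sequence for observation index sequence `obs`.
    pi: (N,) initial distribution, A: (N, N) transitions, B: (N, M) emissions."""
    logpi, logA, logB = (np.log(np.asarray(x) + eps) for x in (pi, A, B))
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # best log-probability ending in each state
    psi = np.zeros((T, N), dtype=int)   # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path
```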
Table 1. Legal amount recognition results

                                   Sample set I    Sample set II
Character number in training set   3581            10154
Character number in testing set    12935           32092
Recognized character number        11683           29422
Recognition rate (%)               90.32           91.68
4 Recognition of Courtesy Amount and E13B In this paper, artificial neural networks are used to recognize the courtesy amount and the E13B code in check images. The artificial neural network [9] is a nonlinear classifier which is commonly used for classification tasks with several classes, such as English letter or digit recognition. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation; it provides a promising way of modeling complex classification problems. With the help of the printed rectangular grids, the courtesy amount can be simply segmented based on grid region analysis. The E13B code is printed on the check and can be segmented using the margin regions between characters. Samples of the courtesy amount and the E13B amount code are shown in Fig. 6. The segmented characters are normalized to 32×32 pixel images after smoothing and noise removal. The features used for recognition include the black pixel distribution, the stroke line elements and frequency coefficients. The character is first divided into a 4×4 grid, and the number of black pixels in each sub-region is normalized to the [0,1] interval, generating a 16-dimensional feature vector. The stroke line elements have already been described in Section 3; extracting 4 directional features in each sub-region generates a 64-dimensional feature vector. The frequency feature is composed of 36 coefficients from the low-frequency part of the spectrum obtained by the fast Fourier transform (FFT). The feature of a digit image is thus represented by a 116-dimensional vector obtained by concatenating the above three features. The artificial neural networks are first trained with labeled samples; when a new image comes, its extracted feature vector is used as the input, and the output of the network gives the recognition result.
Fig. 6. Courtesy amount and E13B amount code
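A self-contained sketch of such a 116-dimensional feature vector (16 grid densities + 64 directional values + 36 FFT coefficients) is given below; the gradient-orientation histogram used here is only a simplified stand-in for the stroke line-element counting of Section 3, and the normalizations are illustrative.

```python
import numpy as np

def check_char_features(img):
    """116-d feature vector for a 32x32 binary character image (values in {0,1})."""
    assert img.shape == (32, 32)
    feats = []
    # 1) black-pixel density in a 4x4 grid -> 16 values in [0, 1]
    for i in range(4):
        for j in range(4):
            feats.append(img[i*8:(i+1)*8, j*8:(j+1)*8].mean())
    # 2) directional features: 4-bin orientation histogram per 4x4 sub-region -> 64
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = (np.degrees(np.arctan2(gy, gx)) + 180.0) % 180.0   # 0..180 degrees
    bins = np.minimum((ang // 45).astype(int), 3)            # 4 orientation bins
    for i in range(4):
        for j in range(4):
            sl = (slice(i*8, (i+1)*8), slice(j*8, (j+1)*8))
            hist = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(), minlength=4)
            total = hist.sum()
            feats.extend(hist / total if total > 0 else hist)
    # 3) 36 low-frequency FFT magnitude coefficients (6x6 block)
    spec = np.abs(np.fft.fft2(img))[:6, :6]
    feats.extend((spec / (spec.max() + 1e-9)).ravel())
    return np.asarray(feats)   # shape (116,)
```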
In this section, we also use the two sample sets for experiments. The courtesy amount recognition rate is 92.86% and 93.44%, respectively. The E13B amount code recognition rate is 94.73% and 95.91% respectively. More details are shown in Table 2.
Table 2. Courtesy amount recognition results

                                   Sample set I    Sample set II
Character number in training set   3396            9845
Character number in testing set    10691           29408
Recognized character number        9928            27479
Recognition rate (%)               92.86           93.44
Table 3. E13B code recognition results

                                   Sample set I    Sample set II
Character number in training set   2200            6600
Character number in testing set    7700            19800
Recognized character number        7294            18990
Recognition rate (%)               94.73           95.91
5 Information Fusion for Check Recognition The legal amount, the courtesy amount and the E13B amount code are recognized independently. In this section, the three recognition results are fused using D-S evidence theory [10] to improve the recognition rate. Given the frame of discernment Θ, the basic probability assignment (BPA) function is defined as m: 2^Θ → [0, 1] such that

$$m(\varnothing) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1 \qquad (2)$$

With the BPA function, the belief function and the plausibility function are defined as follows:

$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (3)$$

$$Pl(A) = 1 - Bel(\bar{A}) = \sum_{B \cap A \neq \varnothing} m(B) \qquad (4)$$

In this paper, the frame of discernment is Θ = {0, 1, …, 9} with A_i = {i}, i = 0, …, 9, and the BPA function is defined as follows:

$$m(X) = \begin{cases} m(A_i) & X = A_i \\ 1 - \sum_{j=0}^{9} m(A_j) & X = \Theta \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
For legal amount, courtesy amount and E13B amount code, the belief functions are defined as Bel1 ( A) , Bel2 ( B ) and Bel3 (C ) , respectively; the BPA functions are
represented by m_1(A), m_2(B) and m_3(C), respectively. The fused BPA function is calculated by Dempster's rule:

$$m(X) = \frac{\sum_{A \cap B \cap C = X} m_1(A)\, m_2(B)\, m_3(C)}{1 - \sum_{A \cap B \cap C = \varnothing} m_1(A)\, m_2(B)\, m_3(C)} \qquad (6)$$
The fused belief function and plausibility function can then be determined by Eqs. (3) and (4). The final recognition result is given by Eq. (7):

$$j = \arg\max_i m(A_i) \qquad (7)$$
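A small illustrative implementation of this fusion step (not the paper's code) is sketched below; each classifier's per-digit scores are assumed to be turned into a BPA with the leftover mass assigned to Θ, and Dempster's rule of Eq. (6) is applied pairwise.

```python
from itertools import product

THETA = frozenset(range(10))  # frame of discernment {0, ..., 9}

def scores_to_bpa(scores):
    """Assumed conversion: each digit's score becomes the mass of {digit},
    and any remaining mass goes to the whole frame Theta."""
    m = {frozenset([d]): s for d, s in scores.items() if s > 0}
    m[THETA] = max(0.0, 1.0 - sum(scores.values()))
    return m

def combine(m1, m2):
    """Dempster's rule of combination for two BPAs over subsets of THETA."""
    raw, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

def fuse_and_decide(m_legal, m_courtesy, m_e13b):
    m = combine(combine(m_legal, m_courtesy), m_e13b)
    return max(range(10), key=lambda i: m.get(frozenset([i]), 0.0))   # Eq. (7)
```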
The comparison with the three individual recognizers is shown in Table 4. The extensive experimental results demonstrate that the information fusion based algorithm in conjunction with D-S evidence theory improves the recognition rate considerably.

Table 4. Comparison of recognition results

                      Sample set I    Sample set II
Legal amount (%)      90.32           91.68
Courtesy amount (%)   92.86           93.44
E13B code (%)         94.73           95.91
D-S fusion (%)        98.82           99.14
6 Conclusion and Future Work This paper proposed an information fusion based check recognition system. We focus on check layout analysis and recognition algorithms for legal amount, courtesy amount and E13B amount code. The D-S evidence theory is introduced to fuse multiple results for improving recognition rate. The experimental results show that the system can process check images automatically and robustly. Some valuable extensions include: (1) automatic check classification with discrimination methods of printed and handwritten characters; (2) combine multimodal information to improve the system robustness. For example, the E13B code recognition results from a MICR reader can make the check recognition performance more accurate and powerful.
References [1] Gorski, N., Anisimov, V., Augustin, E., Baret, O., Maximov, S.: Industrial bank check processing: the A2iA CheckReader. International Journal on Document Analysis and Recognition 3, 196–206 (2001) [2] Xu, W.: A Research on Key Techniques in Bank Cheque OCR System Based on Statistical Classifier, PhD thesis, Beijing University of Posts and Telecommunications (2003) [3] Tian, S., Ma, G., et al.: Unconstrained handwritten Chinese string recognition system for the amount on bank checks. Journal of Tsinghua University 42(9), 1228–1232 (2002)
[4] Zhu, N., Zeng, S., et al.: An Improved Nonlinear Normalization Method and Its Application to Handwritten Legal Amount Recognition on Chinese Cheque. Journal of Computer Aided Design & Computer Graphics 17(6), 1246–1251 (2005) [5] Zhao, B., Wang, Y., et al.: Research on Check Automatic Fault-tolerant Recognition System Based on Electronic Payment Cipher. Systems Engineering-Theory & Practice 7, 13– 17 (2000) [6] Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. On Systems, Man, and Cybernetics 9(1), 62–66 (1979) [7] Knerr, S., Augustin, E., Baret, O.: Hidden Markov Model based word recognition and its application to legal amount reading on French checks. Computer Vision and Image Understanding 70(3), 404–419 (1998) [8] David, G., Forney, J.R.: The Viterbi algorithm. Proc. of the IEEE 61(3), 268–278 (1973) [9] Chen, Y., Wang, X., et al.: Artificial neural network theory and its applications. China Electric Power Press, Beijing (2002) [10] Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)
Crisis Early-Warning Model Based on Exponential Smoothing Forecasting and Pattern Recognition and Its Application to Beijing 2008 Olympic Games Baojun Tang and Wanhua Qiu School of Economics and Management, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
[email protected] Abstract. A large number of methods, such as discriminant analysis, logit analysis and recursive partitioning algorithms, have been used in the past for business failure prediction. Although some of these methods lead to models with a satisfactory ability to discriminate between healthy and bankrupt firms, they suffer from some limitations, often because they only give an alarm but cannot forecast. This is why we have undertaken research aimed at weakening these limitations. In this paper, we propose an Exponential Smoothing Forecasting and Pattern Recognition (ESFPR) approach and illustrate how it can be applied to business failure prediction modeling. The results are very encouraging and prove the usefulness of the proposed method for bankruptcy prediction. The Exponential Smoothing Forecasting and Pattern Recognition approach discovers relevant subsets of financial characteristics and represents in these terms all important relationships between the image of a firm and its risk of failure.
1 Introduction The development and use of models, able to predict failure in advance[1, 2], can be very important for the firms in two different ways. First, as “early warning systems”, such models can be very useful for those (i.e. managers, authorities) who have to prevent failure. Second, such models can be useful in aiding decision-makers of financial institutions in charge of evaluation and selection of the firms. This paper presents a new method called Exponential Smoothing Forecasting and Pattern Recognition approach for the analysis and prediction of business failure.
2 The Design of Crisis Early-Warning Model 2.1 Forecasting Process The Exponential Smoothing Forecasting Method [3, 4] is a typical time sequence forecasting method. It supposes that the future prediction value is related to the known values, and that recent data have a bigger influence on the prediction value while long-term data
have a smaller one; the influence decreases like a geometric series. Its advantage is that it saves a large amount of data and reduces the amount of data processing. The exponential smoothing forecasting method calculates the forecast value of the next period by taking the weighted average of the current actual value and the previous exponential smoothing value as the current exponential smoothing value. Its mathematical expression is

$$S(t)^{(1)} = \alpha x(t) + (1 - \alpha) S(t-1)^{(1)} \qquad (1)$$

where S(t)^{(1)} is the exponential smoothing value of period t; S(t-1)^{(1)} is the exponential smoothing value of period t-1; x(t) is the actual value of period t; t is the period number (t = 1, 2, …, k); and α is the smoothing coefficient (0 ≤ α ≤ 1).
The smoothing coefficient governs the allotment of weight between new and old data; different values of α let different periods play different roles in the forecasting process. When α is small, for example between 0.1 and 0.3, the influence of recent changes is small and the exponential smoothing value represents the long-term trend of the time sequence; when α is large, for example between 0.6 and 0.8, the reaction to recent changes is sensitive and the exponential smoothing value reflects the recent trend. If the α value cannot be judged, different α values can be tried and the one with the least forecasting error selected. Formula (1) is called linear exponential smoothing. In order to improve the fit of exponential smoothing to the time sequence, when the time sequence presents a linear trend, it is smoothed again on top of the linear smoothing; the purpose is to correct the lagging deviation of linear exponential smoothing. The calculation formula of quadratic exponential smoothing is

$$S(t)^{(2)} = \alpha S(t)^{(1)} + (1 - \alpha) S(t-1)^{(2)} \qquad (2)$$

If the time sequence presents a nonlinear distribution, then cubic exponential smoothing is needed; the formula is

$$S(t)^{(3)} = \alpha S(t)^{(2)} + (1 - \alpha) S(t-1)^{(3)} \qquad (3)$$

Hence the forecasting model is

$$v(t+T) = a(t) + b(t)T + c(t)T^2 \qquad (4)$$

where v(t+T) is the forecast value, T is the number of periods from the current time to the forecast time, and a(t), b(t), c(t) are parameters:

$$a(t) = 3S(t)^{(1)} - 3S(t)^{(2)} + S(t)^{(3)} \qquad (5)$$

$$b(t) = \frac{\alpha}{2(1-\alpha)^2} \left[ (6-5\alpha)S(t)^{(1)} - 2(5-4\alpha)S(t)^{(2)} + (4-3\alpha)S(t)^{(3)} \right] \qquad (6)$$

$$c(t) = \frac{\alpha^2}{2(1-\alpha)^2} \left[ S(t)^{(1)} - 2S(t)^{(2)} + S(t)^{(3)} \right] \qquad (7)$$

The exponential smoothing method is an iterative calculation process. The initial values S(0)^{(1)}, S(0)^{(2)} and S(0)^{(3)} must be determined before forecasting. When the time sequence is very long and the α value is large, the initial values have little influence on the forecast. According to general experience, S(0)^{(1)}, S(0)^{(2)} and S(0)^{(3)} can be taken as the original value of the first period, or as the average value of the first three periods.
2.2 Pattern Recognition Process Pattern recognition [5] is a classification method which separates distinct sets of objects or observations according to the characteristic observation values of the research objects. From the viewpoint of pattern recognition, the concept of early warning can be understood and redefined as follows: early warning compares and distinguishes a sample of unknown alarm degree with samples of known alarm degree, and then allocates the new objects or observations into the previously established early-warning mode categories. Usually the alarm degree is divided into several classes, such as no alarm, light alarm, medium alarm, heavy alarm and huge alarm. All crisis enterprise samples that have the same alarm degree constitute an early-warning mode set, and the early-warning mode sets of different alarm degrees represent different early-warning mode categories.
3 Parameter Calculation The enterprise finance crisis means that the enterprise loses an ability to compensate mature debt, which mainly includes technique failure and bankruptcy. The former indicates that the total property amount of the enterprise is more than the total liabilities amount. Because of its unreasonable financial condition, the enterprise can't immediately extinguish the debt, thus results to bankruptcy. The latter indicates the total property amount of the enterprise is less then the total liabilities amount, and leads to the enterprise bankrupt. Usually the enterprise management condition is divided into the crisis state and the non-crisis state two type, namely c=2. The sort of non-crisis enterprise (non-ST) is represented by ω1 , the sort of crisis enterprise(ST) is represented by ω 2 . Financial early-warning indexes contain 6 items, namely n=6. Through the parameter estimating, six financial indexes of 96 training samples in 2004 all obey multivariate normal distribution. Prior probability are respectively: The non-ST enterprise p (ω1 ) =0.9,
p(ω 2 ) =0.1, the value of the parameter ∑ 1 Condition probability density function p ( x | ω i ) are: ST enterprise
、∑ 、 μ 、μ 2
1
2
of
Crisis Early-Warning Model Based on Exponential Smoothing Forecasting
Σ1 =
Σ2 =
1.221 - 12.08
- 12.083 219.178
0.352 - 35.349
1.892 - 59.21
3.607 - 79.21
0.352 1.892
- 35.349 - 59.209
128.87 38.853
38.853 173.48
70.02 - 24.004
3.607 0.769
- 79.212 - 11.955
70.018 9.037
- 24.004 - 0.582
179.67 28
395
2.097 35.86
0.769 - 11.95
9.766 9.037 , μ1 = 11.35 - 0.582 13.09 8.001
28 22.81
0.343
- 9.007
- 0.908
- 0.881
- 4.101
2.633
1.059
- 9.007
666.02
- 33.82
- 14.429
139.02
- 171.61
66.44
- 0.908
- 33.82
69.193
13.69
13.384
41.557
- 0.881
- 14.43
13.692
34.37
71.83
21.133
- 4.101 2.633
139.02 - 171.611
13.384 41.557
71.835 21.133
1771.2 655.83
655.85 835.55
4.587
,μ2 =
4.891 - 39.43 - 29.51
Through calculation, discriminant function is
g 12 (x) = x T (W1 - W2 )x + (w 1 - w 2 ) T x + ω 10 - ω 20 ⎡ x1 ⎤ ⎢x ⎥ ⎢ 2⎥ ⎢ x3 ⎥ =⎢ ⎥ ⎢ x4 ⎥ ⎢ x5 ⎥ ⎢ ⎥ ⎣⎢ x6 ⎦⎥
T
⎡ 1.9950 ⎢- 0.0162 ⎢ ⎢ 0.0403 ⎢ ⎢ 0.0636 ⎢- 0.0030 ⎢ ⎣⎢ 0.0051
- 0.0164 - 0.0053 0.0009 - 0.0004 - 0.0025 0.00137
0.0403 0.0009 0.0031 - 0.0004 0.0032 - 0.0014
0.0635 - 0.0004 - 0.0004 0.0154 - 0.0030 0.0014
0.0051 ⎤ ⎡ x1 ⎤ ⎡ - 4.63 ⎤ 0.0014 ⎥ ⎢⎢ x2 ⎥⎥ ⎢ 0.506 ⎥ ⎥ ⎢ ⎥ - 0.0014 ⎥ ⎢ x3 ⎥ ⎢ - 0.249 ⎥ ⎥ ⎥⎢ ⎥ + ⎢ 0.0014 ⎥ ⎢ x4 ⎥ ⎢ - 0.299 ⎥ ⎢ ⎥ ⎢ ⎥ 0.0046 x5 0.294 ⎥ ⎥ ⎥⎢ ⎥ ⎢ - 0.0265 ⎦⎥ ⎣⎢ x6 ⎦⎥ ⎣⎢ 0.181 ⎦⎥
- 0.0030 - 0.0025 0.0032 - 0.0030 - 0.0054 0.0046
T
⎡ x1 ⎤ ⎢x ⎥ ⎢ 2⎥ ⎢ x3 ⎥ ⎢ ⎥ − 5.0851 ⎢ x4 ⎥ ⎢ x5 ⎥ ⎢ ⎥ ⎣⎢ x6 ⎦⎥
4 Sponsor Companies Crisis Early-Warning of Beijing 2008 Olympic Games 4.1 Sponsor Background Introduction of Beijing 2008 Olympic Games The main income of Beijing 2008 Olympic Games is going to depend on a sponsor. The sponsor plan of Beijing Olympic Games includes three layers of cooperation colleagues, sponsor companies and suppliers. In order to reduce the risk, BOCOG is very careful to select sponsor companies, and listed five measure standards, where qualification factors request to have a stronger economic strength, a good development
( )
Table 1. Financial indexes datum of ZTE Corporation unit % Time
parameter
Liquidity Ratio
˄year˅
Assert-liabilities Ratio˄%˅
Account
Ac-
Main
Net-
Receivable
count Pay-
Operation
asset Return
Turnover
able Turn-
Profit Ra-
Ratio
Ratio
over Ratio
tio˄%˅
˄%˅
2001
0
1.66
69.2
6.56
4.5
7.83
20.7
2002
1
2.09
56.2
8.87
4.23
6.11
19.8
2003
2
1.48
66.8
8.1
4.12
5.12
15.7
2004
3
1.44
66.5
8.30
4.23
4.69
14.8
2005
4
1.86
53.8
7.82
3.98
4.45
11.9
396
B. Tang and W. Qiu
foreground, a healthy financial condition, and have ample cash to pay the sponsor expenses in the future. 4.2 Crisis Early-Warning Calculation From 2001 to 2005, the six financial indexes datum of ZTE Corporation are listed in table 1. From 2001 to 2005, the six financial indexes datum of China Petrochemical Corporation are listed in table 2.
( )
Table 2. Financial indexes datum of China Petrochemical Corporation unit % Time
parameter
(year)
Liquidity Ratio
Assert-liabilities Ratio(%)
Account Receivable Turnover Ratio
Account Main Payable Operation Turnover Ra- Profit tio Ratio (%)
Net-asset Return Ratio(%)
2001
0
0.82
61.3
18.5
12.1
6.67
6.49
2002
1
0.83
58.5
23.
12.8
4.61
6.31
2003
2
0.82
53.7
11.8
13.1
5.62
5.98
2004
3
0.80
51.6
16.9
14.2
5.88
7.69
2005
4
0.81
52.7
26.7
19.3
7.02
11.6
From 2001 to 2005, the six financial indexes datum of XINHUA Metal Products CO., LTD. are listed in table 3.
(%)
Table 3. Financial indexes datum of XINHUA Metal Products CO., LTD. unit
time
parameter
Liquidity Ratio
˄year˅
Assert-
Account
Account
Main
Net-
liabilities
Receivable
Payable Turn-
Operation
asset Re-
Ratio˄%˅
Turnover Ra-
over Ratio
Profit Ratio
turn Ra-
˄%˅
tio
tio
˄%˅ 2001
1
5.503
30.71
3.432
24.654
9.06
10.56
2002
2
2.185
31.98
3.591
22.659
10.57
10.62
2003
3
1.817
33.53
3.291
25.1
9.03
10.02
2004
4
1.706
44.8
4.016
25.27
6.93
6.61
2005
5
1.794
38.7
5.203
26.33
4.39
5.87
Let smoothing coefficient
α =0.5 S (0) (1) S (0) ( 2 ) S (0) (3) be , , ,
original value of
the first period. By exponential smoothing formula and forecasting model, predicting result of six financial indexes of three enterprises in 2006, 2007, 2008 can be calculated. The results are shown in table 4, table5, table6 respectively.
Crisis Early-Warning Model Based on Exponential Smoothing Forecasting
397
Table 4. Prediction results of financial indexes of ZTE Corporation in 2006, 2007 and 2008 financial indexes
prediction results v˄6˅
prediction results v˄7˅
prediction results v˄8˅
(in 2006)
(in 2007)
(in 2008)
Liquidity Ratio
1.91
2.04
2.18
Assert-liabilities
49.6
43.5
37.1
Receivable
7.68
7.43
7.15
Account Payable Turn-
3.87
3.73
3.6
3.76
3.18
2.61
9.08
6.18
3.21
Ratio Account Turnover Ratio
over Ratio Main Operation Profit Ratio Net-asset Return Ratio
Table 5. Prediction results of financial indexes of China Petrochemical Corporation financial indexes
prediction results v˄6˅ (in 2006)
prediction results v˄7˅ (in 2007)
prediction results v˄8˅ (in 2008)
Liquidity Ratio
0.803
0.799
0.795
Assert-liabilities Ratio
50.8
49.5
48.3
Account
Receivable
31.0
37.1
43.4
Account Payable Turn-
22.2
25.8
29.5
7.75
8.65
9.58
14.1
17.0
20.1
Turnover Ratio
over Ratio Main Operation Profit Ratio Net-asset Return Ratio
Table 6. Prediction results of financial indexes of XINHUA Metal Products CO., LTD financial indexes
prediction results v˄6˅
prediction results v˄7˅
prediction results v˄8˅
(in 2006)
(in 2007)
(in 2008)
Liquidity Ratio
1.50
1.35
1.24
Assert-liabilities Ra-
41.3
42.4
43.4
Account
Receivable
6.08
7.26
8.63
Payable
27.4
28.7
30.1
Main Operation Profit
1.76
-1.44
-5.14
3.66
1.3
-1.36
tio
Turnover Ratio Account Turnover Ratio
Ratio Net-asset Return Ratio
398
B. Tang and W. Qiu
Substituting the financial indexes forecasting results of ZTE Corporation, China Petrochemical Corporation, XINHUA Metal Products CO., LTD. of 2006, 2007, 2008 in table 4,5,6 into minimum error rate discriminant function (17respectively, and obtain the g12 ( x ) value of three enterprises in three years, which are shown in table 7. Table 7. g12(x) value of ZTE Corporation, China Petrochemical Corporation and XINHUA Metal Products CO., LTD. in 2006,2007 and 2008 sponsor enterprise
g12(x) in 2006
g12(x) in 2007
g12(x) in 2008
ZTE Corporation
3.55
4.23
4.27
China Petrochemical Cor-
5.55
6.73
8.01
10.48
4.69
0.82
poration XINHUA Metal Products CO., LTD.
5 Conclusions If this method is used, we must pay attention to the followings: first, when earlywarning indexes are selected by the statistic technique, the indexes should have bigger difference between each set and smaller difference inside the set. Second, the selection of training sample of pattern recognition should be representative, and its distribution is as soon as possible equilibrium. Third, in this paper we discuss the classifying model based on pattern recognition under the condition that characteristic observation value x obeys multivariate normal distribution. If index variables obey other distribution, such as even distribution, гdistribution and βdistribution, the models can be solved similarly.
References [1] Altman, E.: Financial Ratios: Discriminant Analysis and the Prediction of Corporate Bankruptcy. Journal of Finance, 589–609 (spring, 1968) [2] Ohlson, J.S.: Financial Ratio and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research 27(2), 109–131 (1980) [3] Li, S., Liu, K.: New System and Algorithm of Exponential Smoothing Models of Time Series. Chinese Journal of Applied Probability and Statistics 21(4), 412–418 (2005) [4] Li, S., Liu, K.: Quadric Exponential Smoothing Model with Adapted Parameter and Its Applications. Systems Engineering-theory & Practice 20(2), 95–99 (2004) [5] Han, M., Cui, P.: A dynamic RBF neural network algorithm used in pattern recognition. Journal of Dalian University of Technology 12(9), 746–751 (2006)
Measuring Interdependency among Industrial Chains with Financial Data Jingchun Sun, Ye Fang, and Jing Luo School of Management, Xi’an Jiaotong University, Xi’an 710049, China
Abstract. Industrial chains exhibit strong interdependency within a large-scale resource-based enterprise group. When analyzing the independency effect in industrial chains, the interdependency of financial index is often ignored. In this paper, we will mainly focus on measuring the long-term interdependency effect by historical simulation and cointegration tests with financial data. A largescale coal-mining group is studied empirically as a case to explain the framework of independency analysis among the industrial chains. The results show that high degree of independency appears in production costs and marketing costs, and low degree appears in revenues and profits.
1 Introduction When analyzing industrial chains within a large-scale resource-based enterprise group, market, production, technology, financial etc. relationships are incorporated into the framework of interdependency [4]. That drastic fluctuation of the above mentioned factors from the upper stream industry could exert great influence on the downstream industries, and vice versa. The effects are mentioned in related researches [3]. Entropy theory and ecosystem stability theory are among the most popular approaches in interdependency research of industrial chains [7][8]. However, complexity of a large-scale resource-based enterprise group, which involving numerous factors and related to many domains, denies a universal approach in the real-life problems. In order to achieve an overall view of the problems based on unique structure of one industrial chain, the actual level of technology, and external conditions, researchers resorted to more qualitative approaches, to reveal the determinants for interdependency [8]. None of the above techniques, however, offers an accurate answer to the problems. Financial data from the monthly operation is provided with the possibility to explain how the industrial chains develop in a relatively holistic and stable way, but interdependency of financial index is often ignored. The reasons are two-fold. Firstly, the limited availability of historical data is frequently encountered in interdependency estimation, especially for production enterprises. Secondly, production process cannot be repeated many times to show the stable interdependency. Historical simulation approach is proposed to deal with the lack of data [2][5], which is generally simpler to implement than other methods. Monte Carlo method is generally used in simulation [1], but one serious methodological flaw of the method is the assumption of statistical independence of factors [6]. Hence, it is very important to Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 399–406, 2009. © Springer-Verlag Berlin Heidelberg 2009
400
J. Sun, Y. Fang, and J. Luo
develop historical simulation techniques that could deliver effective and efficient interdependency estimates. This study attempts to shed light on interdependency analysis of a coal-mining group corporation in an empirical way. Monte Carlo method is used to simulate the production cost, sales cost, revenue and profit based on historical data, and five industries, including coal mining, power generation, electrolytic aluminum, coal chemical products and building materials, are incorporated in the system. Then Granger causality tests on the simulation data can reveal the causality relationship among the industrial chains and the interdependency. This approach can effectively minimize errors resulted from analysis of insufficient source data.
2 Case Background and Source Data Diversification has been one critical strategy of the coal industry in China. In practice, coal industry is regarded the main basis of industrial chains, on which coal mining enterprises are able to tap into related market and further exploit new core competence [8]. According to general analysis of industrial chain extension, technology bears on inter-industry binding extent in a direct way: The heavier hi-tech departments weigh in one industrial chain, the greater influence those industries will exert on each other. But technology alone cannot assure coordination between industries. Inappropriate actualization of the strategy, such as diversifying domains blindly or setting foot in a wrong field may greatly jeopardize effects of industrial chains. Hence, working out influence factors and inherent laws of inter-industry systems within the large-scale coal mining enterprise group is conducive to disperse operational risk, reduce production cost and transaction cost, and expand the enterprises. As one of the top 520 state-owned large-scale key enterprises in China, YM Coal Corporation comprise four coalfields, including twelve production mines and one open-pit coal mine. In the year 2005, YM corporation’s raw coal output is more than sixteen million tons, and gross output value is nearly six billion RMB Yuan. The fact that YM Coal Corporation’s industrial chain involves several industries, each of which includes numerous influencing factors, makes comprehensive and quantitative analysis almost impossible. We took seventy-two groups of financial data as research subject to construct time series multiplication model, by which we can further carry out simulation and Granger causality test, so that the interdependency of industrial chains will be obtained. We focus on the production cost in the downstream involving four industries of coal mining, and other financial perspectives, such as sales cost, revenue and profit.
3 Multiplication Model of Time Series and Historical Simulation According to principle of multiplication models, aforementioned data will be further decomposed into three ingredients: (1) Trend factor, which embodies the general developing trend despite the fluctuations caused by external factors. (2) Periodical factor. Financial data of coal mining industry is greatly influenced by seasonal needs and
Measuring Interdependency among Industrial Chains with Financial Data
401
yearly financial regulations that a company must obey. So periodical factor could be described in a mathematical function that regresses annually. (3) Random factor, which represents all other factors that have impact on the model respectively. Random factor’s effect is minor compared with the former two factors. Table 1 shows all primary variables that appear in the model. Table 1. Variables involved in the model Variable
Symbol
Type
Appears in
Original data of month t
Yt
Time Series
Multiplication Model
Trend factor of month t
Gt
Time Series
Multiplication Model
Periodical factor of month t
Pt
Time Series
Multiplication Model
Random factor of month t
Et
Time Series
Multiplication Model
Year, Month
t, j
Subscripted Variable
Trend Factor
Random Variable
Trend Factor
White Noise
η
Polynomial Coefficient
αt
Regression Coefficient
Periodical Factor
Monthly weight
βt
Regression Coefficient
Periodical Factor
regression coefficients
θ
Regression Coefficient
Periodical Factor
Discrimination variables of months
Cn,t
0-1 Variable
Periodical Factor
Matrix
Periodical Factor
Vector of
Coefficient matrix of periodical factors X Least squares estimation for Pt
P ^t
Time Series
Periodical Factor
Vector form of Pt
P
Coefficients Vector
Periodical Factor
Vector form of μT
u
Coefficients Vector
Periodical Factor
To solve this problem, a multiplication model is employed, which has such a basic form:
Yt
Gt < Pt < Et
(1)
t (Time) could be expressed in terms of i and j :
t = (i − 1) × 12 + j
,
(2)
i=1, 2, 3,…,N, N represents numbers of years observed j=1, 2,…, 12. Moving average of the actual financial data is used to substitute the trend factor Pt as shown in formulae (3)
,
:
Gt + 6 =
11 1 11 (∑ Yt + j + ∑ Yt +1+ j ) 24 j = 0 j =0
t = 1, 2,3,..., T ...,12
(3)
The nature of (3) is an averaging digital filter that eliminates periodical components from Gt. To further quantify the trend factor Gt, many different functions have been
402
J. Sun, Y. Fang, and J. Luo
attempted to see which one fits the actual numbers best. In the YM coal corporation case, polynomial forms are superior to logarithmic forms and exponential ones, so we choose to adopt this form. k
Gt
¦ b B.M.
W
MW
W
MS
Alum. -> Chem.
W
B.M. -> Chem.
W
MW
B.M. -> Alum.
W
W
W
MS
Power -> Chem.
W
MS
Power -> B.M.
Alum. -> B.M.
W
W
W
MS
Coal -> Chem.
W
S
W
Coal -> B.M.
W
MS
Coal -> Alum.
W
MS
Chem. -> Coal
W
MS
W
MW
Alum. -> Coal
W
MW
W
MW
Chem. -> Alum.
W
S
S
S
Coal -> Power
Sales Prod. Rev. Cost Cost S
S
S
W
W
W
W
W
MW
W
W
Power -> Alum.
W
W
W
W
MS
Power -> Coal
MW
W
W
W
W
MS
Chem. -> Power
W
W
W
W
W
MS
B.M. -> Power
W
W
W
W
B.M. -> Coal
W
MW
W
W
Alum. -> Power
W
W
W
W
MW
W
W
W
Level S indicates strong interdependency that is greater than 0.6 Level MS medium strong interdependency that is between 0.4 and 0.6 Level MW medium weak interdependency that is between 0.2 and 0.4 Level W weak interdependency that is less than 0.2
404
J. Sun, Y. Fang, and J. Luo
5 Interdependency among Industrial Chains To explain how the industrial chains exert impact on each other in the sense of statistical significance based on the 10,000 groups of data in all four financial perspectives (production cost, sale cost, revenue, and profit), we define the measurement of interdependency between two industrial chains on the basis of cointegration relationship: the possibility to pass the Granger causality test. It can be easily derived that the possibility belongs to the interval [0, 1]. Here are the principles of classification (which is to provide more concise analysis. 5.1 Horizontal Analysis By taking the horizontal perspective to interpret the results, the possible interactions and its stability are examined within industrial chains of YM Coal Corporation.
Fig. 2. Causality relationships among industrial chains
By analyzing the system horizontally, we can come up with insights of YM Coal Corporation, as well as useful suggestions for its future development: 1. Profit (A): Granger causality relationships only exist between coal and power industries, but not significant. Power industry should be expanded as well as the coal mining industry. Since we are optimistic toward the prospect of the coal mining industry, the main focus should be placed on enlarging its scale. 2. Production cost (B): As Granger causality tests show, a positive feedback loop is formed by coal mining, aluminum and chemicals. Minor decrease of production cost in any of the three industries may lead to much more savings in other two industries, which may benefit the system greatly. Therefore, it will be most effective to reduce cost related to production, investment inside this loop. 3. Revenue(C): Only aluminum and chemicals are correlated significantly for this entry. Since the aluminum-chemicals mutual interaction are verified from 3 entries (B, C, D), it merits attention for the study on integration effect. 4. Sales cost (D): Close causality relationships are widespread under this category. As the figure shows, the coal mining industry should be responsible for the bindings because of its universal impact on other parts. 5.2 Vertical Analysis In Fig.3, BM is short for building materials, and Al for aluminum, Ch for Chemicals, Pw for Power, Cl for Coal. It is obvious that industrial chains in cost-related entries
Measuring Interdependency among Industrial Chains with Financial Data
405
(production and sales) have better chances to pass Granger causality test than those in the other two entries (profit and revenue), which indicates closer relationships among those industries in production and sales cost. One possible explanation is that, strong interdependency of costing indices is induced by high controllability of material flows within industrial chains. For instance, wastes from upstream industries (coal mining and processing) may easily turn into valuable raw materials for the downstream industries (building materials and chemicals), so production costs are saved. Revenue and profit, however, are influenced by more factors, including sales, marketing and operation, which are beyond the range of industrial chains, and may as well increase uncertainty of the system and further upset their Granger causalities.
Fig. 3. Cumulative probability to pass Granger causality test
6 Conclusions To sum up, industry integration is the trend that modern coal mining industry follows. Recycling of materials, reducing of pollution emissions, as well as cost savings are direct benefits from integrated industrial chains. Since extension modes of industrial chains are similar, the methodology can be easily applied to other cases. It is known that there are many newly invested projects in large-scale resource-based enterprise groups in China, meaning the short time series of financial data from the monthly operation will appear frequently when the methodology is applied, which will be involved in our future research.
References [1] Dubi, A.: Monte Carlo Applications in Systems Engineering. John Wiley & Sons. Inc., Chichester (2000) [2] Costello, A.: Comparison of historically simulated VaR: Evidence from oil prices. Energy Economics 30(5), 2154–2166 (2008) [3] Sucky, E.: The bullwhip effect in supply chains—An overestimated problem? International Journal of Production Economics (2008) (in press, corrected proof, available online)
406
J. Sun, Y. Fang, and J. Luo
[4] Dali, H.: The Connection of Enterprise Clusters Competitive Advantages with Industry Interrelation and Synergy. Chinese Journal of Management 3(6), 709–714 (2006) [5] Cabedo, J.D., Moya, I.: Estimating oil price ‘Value at Risk’ using the historical simulation approach. Energy Economics 25(3), 239–253 (2003) [6] van Dorp, J.R., Duffey, M.R.: Statistical dependence in risk analysis for project networks using Monte Carlo methods. International Journal of Production Economics 58(1), 17–29 (1999) [7] Templet, P.H.: Diversity and other Emergent Properties of Industrial Economies. Progress in Industrial Ecology 1, 24–38 (2004) [8] Chun-you, W., Hua, D., Ning, D.: Review on the Study of the Stability of Industrial Ecosystem. China Population, Resources and Environment 15(5), 20–25 (2005) (in Chinese)
Multi-objective Economic Early Warning and Economic ∗ Risk Measure Guihuan Zheng** and Jue Wang Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China Tel.: 86-10-62651377; Fax: 86-10-62541823
[email protected] Abstract. The leading indicators approach is the most prominent and widely used method for economic early warning. However, only the single target’ analysis is focused. In fact, there is more to any economy than a single overarching business target. In this paper, the multi-dimension business climate index approach is introduced to carry out multi-objective economic early warning and measure economic risk. First, the multi-dimension coincident index is constructed directly in the unified framework based on FHLR method. Second, vector space approach and probability analysis of multi-dimension impact point are introduced to provide early warning analysis of multi-objective and measure economic risk. Then, it is applied to research Chinese two-object economic system. The empirical results show that multi-dimension climate index approach may provide a coherent and evolving outlook for multi-objective early warning and build a consistent track record of predictions. Keywords: Business climate, Multi-dimension index, Multi-dimension analysis.
1 Introduction In early warning analysis, the leading indicators approach is the most prominent and widely used method for business cycle monitoring, as well as short term forecasting. However, the present leading indicators approach only can establish the single index for single objective early warning (Klein 1989; Lahiri and Moore 1991; Diebold and Rudebusch 1996; Forni et al. 2000 and 2005; Achuthan and Banerji 2004; Marcellino 2006; Carriero and Marcellino 2007), which limits its application. Actually, the economy is a highly complex system, and there is more to any economy than a single overarching business cycle for only one objective. Therefore, it is important and necessary to research many cycles for multi-objective economic early warning. Following these multi-objective economic early warning can help foresee when the behavior of the economy will depart from the norm propounded by the pundits. Consequently, economic risk may be measured via the results of multi-objective early warning. ∗
Supported by NSFC (70221001), CAS Special Funds for President Award Gainer, CAS Knowledge Innovation Program and People’s Bank of China. ** The corresponding author. Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 407–413, 2009. © Springer-Verlag Berlin Heidelberg 2009
408
G. Zheng and J. Wang
Economic Cycle Research Institute provided the idea of multi-cycles and multiindex, that is Economic Cycle Cube (ECC) (Banerji and Hiris 2001; Achuthan and Banerji 2004). However, ECC still uses the single leading index method to develop the early warning analysis of each dimension. It is still not the direct multi-objective research. Therefore, it is urgent to research on multi-objective economic early warning based on the traditional leading indicators approach for revealing the dynamic and complex system that is otherwise hidden form view. We should pull all of these indices together into an easy-to-use framework for capturing the nuances of the economy’s gyrations. The challenge of multi-objective early warning approach is to combine these multiple cycles (or objectives) into one coherent outlook. How can all of this be captured and monitored at one time? In this paper, the multi-dimension climate index approach, including the multidimension index construction and climate analysis, is introduced to realize multiobjective early warning and measure economic risk. The empirical results show that the multi-dimension climate index approach may provide a coherent and evolving outlook for multi-objective early warning and build a consistent track record of predictions.
2 The Multi-dimension Climate Index Approach 2.1 Multi-dimension Index Construction If there are some important monitor targets for the issue, only to construct composite index for single target and analysis them respectively is not sufficient because of the lack of consistency and comparability. Therefore, the multi-dimension index should be constructed in the integral frame simultaneously, which should be the base of the multi-objective economic early warning. The FHLR approach (Forni, Hallin, Lippi, Reichlin, 2000 and 2004), based on general dynamic-factor model (GDFM), is one method to construct composite index from the model point of view. Compared with the traditional non-model based method, there is no need to distinguish a priori between leading and coincident variable, all the necessary variables are studied to composite index directly (Marcellino, 2006). In fact, many cycles (i.e. multi-dimension coincident index) may be picked up directly in the unified framework via FHLR approach (Han et al. 2008). Furthermore, it proposed the use of dynamic principal components to better capture the dynamics of the model since the estimated factors are combinations of lags, contemporaneous values and leads of the single series. Suppose there are N variables taken into consideration, the values of which are denoted as {xit , i ∈ N } . The GDFM model is represented as:
xit = bi1 ( L)u1t + bi 2 ( L)u 2t + … + biq ( L)uqt + ξ it where L standing for the lag operator,
(1)
(u1t , u2t ,L, uqt ) / is orthonormal white noise
standing for the common shocks of the model.
χ it = xit − ξ it and ξ it
common component and the idiosyncratic component of
are called
xit , respectively. The details
Multi-objective Economic Early Warning and Economic Risk Measure
409
for the assumptions, identification and estimation for model are mentioned by Forni et al. (2000, 2004). The process of multi-dimension index construction is as follows: (i) Construction of a panel pooling several important and necessary economic indicators, including the reference variables of multi-targets. (ii) Identification of the parameter q for model (1). (iii) Estimation of the common component for each indicator in the unified framework, and getting the common component for reference variables of multitargets. 2.2 Economic Risk Measure For the multi-dimension index, the multi-dimension economic climate analysis should be introduced to measure economic risk. In this paper, vector space approach and probability analysis of multi-dimension impact point are introduced to analyze the dynamics of multi-dimension index and depict the economic risk. (1) Vector space approach In this paper, vector space approach is introduced to measure the deviation of current state to the average state of multi-dimensional index in order to getting its periodicity. In vector space, the distance and angle is the common tool to study its characteristics. Therefore, the deviation is described by distance and angle of vector space for depicting the risk of economic system. The deviation of distance represents the level shift and the deviation of angel represents the direction shift. It should be noted that the distance here is defined as the statistic distance rather than Euclidean distance. The statistic distance between two p-dimensional vectors, x = ( x1 , x2 , L , x p ) and y = ( y1 , y2 , L , y p ) , is defined as:
d ( x , y) = ( x − y)′ A( x − y) where A = S −1 , S is the covariance matrix for x. The angle is defined as: x1 y1 + x2 y 2 + L x p y p α = arccos( ) 2 2 x1 + x2 + L + x 2p ∗ y12 + y 22 + L + y 2p
(2)
(3)
(2) Probability analysis of multi-dimension impact point To make business warning signal system is one method in the classical prosperity warning analysis (Dong et al., 1998). The warning system uses some sensitive indicators that reflect the state of economic situation. After setting some threshold values for the status of these indicators, these indicators, just like the traffic lights in red, yellow and green, may give different signals in different economic situations. The threshold values for each status of a single indicator are set by probability calculation of impact point, combined with economic theory and historical experience. However, calculating threshold values for each single variable and combining into one composite index only represents the independent status of the variable itself or single monitor target. It is not enough to analyze the multi-dimension index since the dynamic interrelation cannot be considered. However, the probability analysis of multi-dimension impact point is to calculate the multi-dimension threshold for the multi-dimension
410
G. Zheng and J. Wang
index directly, which is proposed to overcome this disadvantage and measure economic risk from multi-objective. We cannot get the unique solution when to calculate the multi-dimension threshold value according to the probability of p. Thus, we first calculate the probability of impact point according to some multi-dimension boundaries decided by some equilibrium relations; then calculate the multi-dimension threshold value according to one probability of p by interpolation method. For a multi-dimension index, x = ( x1 , x2 , L, x p ) , assume xi ∈ N ( μ i , σ i 2 ) . (i) Normalization of xi :
ki = ( xi − μi ) / σ i
(4)
where μ i is the mean and σ i is the stand deviation for xi . Then ki ∈ N (0,1) . (ii) Calculation the probability of impact point according to one multi-dimension boundary: For the normalized time series ki , the probability for − 4 ≤ ki ≤ 4 is greater than 99%. Therefore, suppose the values of kij are from -4 to 4 incremental by 0.1, the probability for each j is defined as: p j = p{x ∈ (−∞, k1 j ⋅ σ 1 + μ1 ) × (−∞, k2 j ⋅ σ 2 + μ 2 ) × L × (−∞, k pj ⋅ σ p + μ p )}
(5)
(iii) Calculation the multi-dimension threshold value according to the probability of p. If there is a p j = p in step (ii), the threshold value is:
( x1 , x2 ,L, x p ) = (k1 j ⋅ σ 1 + μ1 , k2 j ⋅ σ 2 + μ2 ,L, k pj ⋅ σ p + μ p ) If not, we can find p j < p < p j +1 , suppose the threshold value for U and V respectively: U = (u1 , u2 ,L, u p ) = (k1 j ⋅ σ 1 + μ1 , k2 j ⋅ σ 2 + μ 2 ,L, k pj ⋅ σ p + μ p ) V = (v1 , v2 ,L, v p ) = (k1, j +1 ⋅ σ 1 + μ1 , k2, j +1 ⋅ σ 2 + μ 2 ,L, k p , j +1 ⋅ σ p + μ p )
(6)
p j and p j +1 are
(7)
The threshold value for p , ( x1 , x2 , L , x p ) , is calculated by interpolation method:
xi =
p − pj
p j +1 − p j
ui +
p j +1 − p
p j +1 − p j
vi
(8)
Generally, the threshold values with the probability of 0.1, 0.25, 0.75 and 0.9 should be calculated to divide the multi-dimension index into five signals: light blue, blue, green, yellow and red lamps (Dong et al., 1998).
3 Chinese Multi-objective Economic Early Warning 3.1 Indicators and Data
In this section, Chinese macro economic targets, growth and inflation, are considered. In fact, the unemployment should be included too. However, it must be omitted because of the lack of data. First, the related reference cycles are decided corresponding to two targets: value added of industry - the current growth rate and general consumer price index (CPI).
Multi-objective Economic Early Warning and Economic Risk Measure
411
Second, the relative important economic and financial indicators should be decided as follows: Value added of industry1; General consumer price index (CPI); Investment in fixed asset-investment completed2; Total retail sales of consumer goods1; The ratio of exports and imports; Exports1; Imports1; Sales ratio of industrial products1; Total government expenditure1; Total government revenue1; Total deposits3; Foreign exchange reserves at the month-end3; Output of crude steel1; Total energy production1; Electricity1; The number of new start project2; Retail price index1. (Note: 1 the current growth rate; 2 the accumulated growth rate; 3 the current growth rate of balance at period-end.) 3.2 Empirical Results
Chinese multi-dimension index is constructed over the period from Jan. 1999 to Dec. 2007 via FHLR method. For vector space approach, the distance and angle of multidimension index is calculated. Seen from Fig.1 and Fig.2, the state of two-dimension system departures from the mean very far from the level since the distance locates on the much high position; however, its fluctuation of direction tends to more stable because the angle wanders in the low level. 4
3.5
3.5
3
3
2.5 2.5
2 2
1.5
1.5 1
1
0.5
0.5
0
0
20
40
60
80
100
Fig. 1. The distance series
120
0 0
20
40
60
80
100
120
Fig. 2. The angle series
For the probability analysis of multi-dimension impact points, the relative threshold value calculated by the method introduced in Section 2.2 may be seen from Table 1, and the signal results over the period of Jan. – Dec. 2007 may be seen from Table 2. These signal results show the whole situation for this two-dimension system. Generally, the threshold values should be rectified according to the economic theory and historical experience, which is not covered in this paper. Table 1. The threshold values Probability 10% 25% 75% 90%
Threshold values Value added of industry 110.2719 112.2745 117.1960 119.3343
CPI 99.0985 100.3121 103.2947 104.5905
412
G. Zheng and J. Wang Table 2. The signal results
Signal Time Signal Time
○ 2007-01 ● 2007-07
○ 2007-02 ● 2007-08
◎
2007-03 ● 2007-09
※:light blue;⊙:blue;○:Green;◎:yellow;●:red
◎
2007-04 ● 2007-10
◎
2007-05 ● 2007-11
◎
2007-06 ● 2007-12
4 Conclusions In macro economy system, there are many important sub-systems needed to be researched, for example the three driving sub-system including investment, consumption, trade, and the financial sub-system including currency, credit, interest rate. Therefore, all of these problems should be resolved on the framework of multi-objective economic early warning via multi-dimension business climate index approach. It also may be used to measure economic risk. In this paper, the multidimension business climate index approach is introduced which is the great improvement in the research work of business climate since only the single index approach was covered before. However, only multi-dimension coincident index is constructed in this paper, the multi-dimension leading index construction should be researched further. Moreover, the multi-dimension analysis for coincident index and leading index is also an important issue for multi-objective early warning.
References [1] Achuthan, L., Banerji, A.: Beating the Business Cycle – How to Predict and Profit from Turning Points in the Economy. Doubleday, a division of Random House, Inc. (2004) [2] Diebold, F.X., Rudebusch, G.D.: Measuring business cycles: a modern perspective. The Review of Economics and Statistics 78, 67–77 (1996) [3] Hamilton, J.D.: A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357–384 (1989) [4] Klein, P.A.: Analyzing Modern Business Cycles: Essays Honoring Geoffrey H. Moore. M.E. Sharpe, Inc., USA (1989) [5] Banerji, A., Hiris, L.: A multidimensional framework for measuring business cycles. International Journal of Forecasting 17, 333–348 (2001) [6] Bonett, D.G., Price, R.M.: Statistical inference for generalized Yule coefficients in 2*2 contingency tables. Sociological Methods & Research 35, 429–446 (2007) [7] Einenberger, H.: Evaluation and analysis of similarity measures for content-based visual information retrieval. Multimedia Systems 12, 71–87 (2006) [8] Dong, W.Q., Gao, T.M., Jiang, S.Z., Chen, L.: Analysis and Forecasting Methods of Business Cycle. JiLin University Press, Changchun (1998) [9] Forni, M., Hallin, M., Lippi, M., Reichlin, L.: The Generalized Dynamic Factor Model: Identification and Estimation. Review of Economics and Statistics 82, 540–554 (2000)
Multi-objective Economic Early Warning and Economic Risk Measure
413
[10] Forni, M., Hallin, M., Lippi, M., Reichlin, L.: The Generalized Dynamic Factor Model: Consistency and Rates. Journal of Econometrics 119, 231–255 (2004) [11] Han, A., Zheng, G.H., Wang, S.Y.: The generalized dynamic factor model with an application to coincident index. Systems Engineering - Theory& Practice (in press, 2009) [12] Lahiri, K., Moore, G.H.: Leading Economic Indicators: New Approaches and Forecasting Records. Cambridge University Press, USA (1991)
An Analysis on Financial Crisis Prediction of Listed Companies in China’s Manufacturing Industries Based on Logistic Regression and Bayes Wenhua Yu1,2, Hao Gong1, Yuanfu Li2, and Yan Yue3 1
Commercial College, Chengdu University of Technology, Chengdu, Sichuan 610051, China 2 School of Economics and Management, Southwest Jiaotong University, Chengdu, Sichuan 610031, China 3 College of network education, Chengdu University of Technology, Chengdu, Sichuan 610059, China
[email protected] Abstract. In this paper, some listed companies in China’s manufacturing industries are taken as the research objects, and the ST companies’ financial indicators in 1~5 years before the occurrence of their financial crisis are collected. On the other hand, non-ST companies are selected as samples and then empirically analyzed by Logistic regression; the Logistic regression model is established to forecast financial crisis, and the prediction capacity of forecasting financial crisis in 1 5 years ahead of their occurrence are summed up. According to the established model, by using Bayes’ Theorem, the financial crisis probabilities of listed companies in China’s manufacturing industries in the next years are amended.
~
Keywords: Financial crisis prediction; Logistic regression; Bayes’ Theorem.
1 Introduction The financial crisis is also defined as the "financial distress", among which the most serious situation is "enterprise bankruptcy". Enterprise bankruptcy triggered by the financial crisis is actually a breach of contract, so the financial crisis can be referred to as "the risk of default." With the spread of global financial crisis recently, the market environment is undergoing increasingly fierce changes. Specifically speaking, the competition among enterprises is becoming fiercer and fiercer, and the financial risk in enterprises is increasing. Therefore, studying and forecasting the financial crisis of the listed companies will be of important practical significance to protect the interest of investors and creditors, prevent the occurrence of financial crisis, and help government departments monitor the quality of listed companies and the risk in stock market. At present, there are three types of methods to research financial crisis: (1)single variable discriminant analysis[1]; (2)multivariate statistical analysis methods, including linear probability model, Probit model, and Multivariate Discriminant Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 414–419, 2009. © Springer-Verlag Berlin Heidelberg 2009
An Analysis on Financial Crisis Prediction of Listed Companies
415
Analysis (MDA) and so on[2-8], among which the "Zeta" discrimination model established by Altman in 1968 is relatively more well-known[4]; (3)The artificial neural network based on information science [9-12] .According to many studies, the prediction efficiency of multi-variable model is significantly higher than that of the singlevariable model, but until now there haven’t been any evidences to prove that the prediction efficiency of neural network method is higher than that of the multiple statistical methods. Moreover, due to the differences in selecting samples and financial indicators, it’s rather difficult to compare the prediction efficiency of each model. This study has the following two characteristics: (1) Logistic regression is employed to study the prediction of financial crisis and analyzes the prediction efficiency of forecasting financial crisis in five years ahead of its occurrence. (2) On the basis of Logistic regression analysis, this paper uses Bayes’ Theorem to calculate the Posterior probability, and revises anterior probability which is calculated by Logistic regression model, and finally, compares the prediction efficiencies of the two methods.
2 Logistic Regression Model Logistic regression model is a non-linear regression model. The predictive value of variables is between 0 and 1, so Logistic regression model is actually derived from general linear multiple regression model. However, due to the fact that its error terms are subject to binomial distribution instead of normal distribution, so maximum likelihood method is employed for parameter estimation while fitting. "Likelihood" is used to express the function value of likelihood function: the greater the likelihood function value is, the better the fitting is. In researches, "-2Log Likelihood" is often used to describe the extent of fitting: the smaller its value is, the better the fitting is. Logistic regression model is expressed as the following formula:
p=
1 1+ e
− ( a + b1 X 1 +...+ bm X m )
(1)
In this formula, a is a constant, while bi is the coefficient of Logistic regression. In this paper, such act means that the independent variables will be substituted in the Logistic regression equation. 0.5 is selected as the threshold of probability while using the model to conduct discriminant classification. The object will be regarded as a company with financial crisis if p is more than 0.5; otherwise the object is a company in normal financial situation. This paper constructs Logistic Regression model with five financial indicators (X1: Earnings per share, X2: Net profit rate of operating income, X3: Net profit rate of current assets, X4: Return on net assets, and X5: Turnover rate of total assets) [12]. For example, according to the sample data in the 2 year before the outbreak of financial crisis, the following result is obtained: -2Log Likelihood=38.68. The equation can be expressed as:
p=
1 1+ e
− (1 . 09 + 0 . 32 X 1 + 0 . 44 X 2 − 34 . 3 X 3 + 1 . 94 X 4 − 2 . 8 X 5 )
416
W. Yu et al.
In accordance with the Logistic regression equation and the optimal discriminant point of 0.5, the original data in the year before financial crisis is substituted back in the equation for further discriminant, and the results are shown in Table 2. In the year before financial crisis, 2 of 45 companies which then suffer from financial crisis are judged by mistake, so the Error Type is 4.44%; 0 of the 50 companies which don’t accounts for 0%. In genundergo financial crisis are misjudged, so the Error Type eral, 2 of the 90 companies are misjudged, so the misjudgment rate is 2.11%. Similarly, the Logistic regression can be used to forecast the companies’ financial situation in 2~5 years before the occurrence of financial crisis.
Ⅰ
Ⅱ
3 The Combined Application of Bayes’ Theorem and Logistic Regression 3.1 Bayes’ Theorem Bayes’ Theorem can be expressed as Theorem 1.1: [13] Theorem 1.1: If N1, N2, … are mutually exclusive events, and ∞
U Ni = Ω
i =1
P( N i ) > 0,
i = 1, 2, L
Then for any event A, we will have: P ( N i | A) =
P( A | N i )P(N i )
,
∞
∑ P(N j =1
i = 1,2,L
(2)
j )P ( A | N j )
3.2 The Combined Application of Bayesian Theorem and Logistic Regression— —Empirical Analysis Take t-2, for example. Assuming that the major financial indicators of a manufacturing enterprise are as follows: X8 =-0.28, X10 = -0.09, X13=-0.05, X15=-0.1, X25=0.23. Substituting the indicators into the Logistic regression model, we have:
p=
1 1+ e
− (1.09 + 0.32 X 1 + 0.44 X 2 − 34.3 X 3 +1.94 X 4 − 2.8 X 5 )
= 86%
The results indicate that the probability of this company breaking out financial crisis 2 years later is 86%, while the probability of not breaking out financial crisis is: 1p = 1-86%=14% Table 1 shows the results of empirical analysis of 2 years before financial crisis: the accuracy of forecasting the companies with financial distress is 98%, and the Error Type accounts for 2%; The accuracy of forecasting the companies without fiaccounts for 6.52%. nancial distress is 93.48%, while the Error Type
。
Ⅰ
Ⅱ
An Analysis on Financial Crisis Prediction of Listed Companies
417
Table 1. Logistic regression model’s forecasting results in 2 years before the outbreak of financial crisis
Companies Companies trapped in financial crisis Companies in normal financial situation
Logistic regression model predicts companies’ being trapped in financial crisis
Logistic regression model predicts companies’ being in normal situation
Total
98.00%
2.00%
100%
6.52%
93.48%
100%
Combined with Table 1, let be: Ni denotes if the company breaks out financial crisis 2 years later. N1= company breaks out financial crisis 2 years later. N2= company does not break out financial crisis 2 years later. Ai denotes Logistic regression model forecasts whether the company will break out financial crisis or not 2 years later. A1 = Logistic regression model forecasts the company will break out financial crisis. A2 = Logistic regression model forecasts the company will not break out financial crisis. Then: P(N1)=86%, P(N2)=14%, P(A1|N1)=98%, P(A2|N1)=2%, P(A2|N2)=93.48%, P(A1|N2)=6.52%. According to the full probability formula, we have:
P ( A1 ) = P ( N 1 ) P ( A1 | N 1 ) + P ( N 2 ) P ( A1 | N 2 ) = 86% × 98% + 14% × 6.52% = 93.41%
P( A2 ) = P( N 1 ) P( A2 | N 1 ) + P( N 2 ) P( A2 | N 2 ) = 86% × 2% + 14% × 93.48% = 14.81% The posterior probability can be computed by Bayes’ formula: P ( N 1 ) P ( A1 | N 1 ) P ( N 1 | A1 ) = P ( A1 )
=
86% × 98% = 90.23% 86% × 98% + 14% × 6.52%
P( N 2 | A1 ) = 1 − P( N1 | A1 ) = 1 − 90.23% = 9.77% The posterior probability computed by Bayesian formula shows: the probability of the company being trapped in financial crisis is 90.23%, while the probability of the company being in normal situation is 9.77%.Utilizing Bayes’ Theorem to calculate the posterior probability based on the results of Logistic regression can be used to test all the samples in the testing set, and then we can get the forecasting accuracy. Taking into account the fact that the loss resulted from misjudging the companies with financial crisis (Error Type ) is far greater than that from misjudging the companies in normal financial situation, this paper compares the respective percents of Error Type I resulted from misjudgements by the two methods: one is the logistic regression, and the other is the combined application of Logistic regression and Bayes’ Theorem.
Ⅰ
418
W. Yu et al.
Table 2 shows the forecasting results of the two methods: combining Bayes’ Theorem and Logistic regression can decrease the number of misjudged samples, and increase the accuracy rate of forecasting significantly. Table 2. Comparison of Error Type
Years
t-1 t-2 t-3 t-4 t-5
Ⅰ between the two estimating methods
The number of companies with financial crisis
The number of misjudged samples
Error Type %
Error Type %
45 46 39 47 38
2 3 10 14 18
4.44 6.52 25.64 29.79 47.37
0.00 2.00 15.22 34.88 37.50
Logistic regression
Combination of Bayes’ Theorem and Logistic Regression The number of Error Type misjudged % samples 0 2 4 9 12
0.00 4.35 10.26 19.15 31.58
4 Conclusion By carrying out an analysis on the financial situations of listed companies in China’s manufacturing industries, and establishing the forecasting model of financial crisis with Logistic regression, then combining with Bayes’ Theorem, this paper discussed the financial early warning model, the research shows: (1) The results of logistic regression shows that for the long-term forecasting (4~5years before the outbreak of financial crisis), the prediction effect of using the five financial indicators selected in this paper is not good enough, however, by carrying on the empirical analysis of a lot of samples, it is a more effective way to improve the forecast accuracy with the combination of logistic regression and Bayes’ Theorem to forecast the financial crisis. Therefore, the financial early warning model can be used for tracking and forecasting financial crisis more effectively, thus providing references for investors, financial institutions and supervisors of stock markets to analyze and research the financial situation of enterprises. (2)The occurrence of financial crisis of listed companies is based on a gradual process but not a sudden incident. The financial indicators of China’s listed companies contain the information that can be used to forecast financial crisis, in other words, the financial indicators can be used to establish prediction models. With the help of financial early warning model, leadership of a company can find out the warning signs as early as possible, take some forward-looking measures, and enhance internal control, thereby preventing financial crisis. In addition, financial early warning model can also provide references for investors to evaluate the performance of the leadership and the investment value of the company. Also, with the help of the financial early warning model, the government can take the initiative to coordinate various relationships in advance, perfect the overall allocation of resources, strictly control and check the financial allocation to the companies that are going to get into bankruptcy so as to reduce the loss of state-owned assets and increase in bankruptcy cost, thus achieving optimal allocation of resources.
An Analysis on Financial Crisis Prediction of Listed Companies
419
Acknowledgement This research was supported by: 1. Key Projects in the National Science & Technology Pillar Program in the Eleventh Five-year Plan Period (2008BAG07B05); 2. The research fund project of College of Business of Cheng Du University of Technology (2008QJ24, Sxyzc08-07); 3. Scientific Research Fund of Sichuan Provincial Education Department (08SB073).
References Beaver, W.: Alternative accounting measures: predictors of failure. Accounting Review 10, 113–122 (1968) Martin, Daniel: Early warning of bank failure: a logit regression approach. Journal of Banking and Finance 11, 249–276 (1977) Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance 23, 189–209 (1968) Nong, W.S., Yi, L.X.: A Study of Models for Predicting Financial Distress in China’s Listed Companies. Economic Research Journal 6, 46–96 (2001) Ling, Z., Xin, Z.: Financial Distress Early Warning Based on MDA and ANN Technique. Systems Engineering 11, 49–56 (2005) Altman, Haldeman, Narayanan: ZETA analysis: A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance 1, 29–54 (1977) Zavgren, C.V.: Assessing the vulnerability to failure of American industrial firms: a logistic analysis. Journal of business finance and accounting 12, 19–45 (1985) Yu, W.-h., Li, Y.-f., Min, F.: An Analysis on Financial Distress Prediction of Listed Companies in China’s Manufacturing Industries Based on Multivariate Discriminant Analysis and Logistic Regression. In: Proceeding of 2008 International Conference on Risk and Reliability Management, vol. 11, pp. 137–141 (2008) Altman, E.I., Marco, G., Varetto, F.: Corporate distress diagnosis: comparison using linear discriminate analysis and neural networks. Journal of Banking and Finance 18, 505–529 (1994) Coats, P.K., Fant, L.F.: Recognizing financial distress patterns using a neural network tool. Financial Management 11, 142–155 (1993) Trippi, R.R., Turban, E.: Neural networks in finance and investing. Irwin Professional Publishing, Chicago (1996) Yang, S.-e., Wang, L.-p.: Research on Financial Warning for Listed Companies by Using BP Neural Networks and Panel Data. Systems Engineering-theory & Practice 2, 61–67 (2007) Wei, Z.-s.: The theory of probability and statistics. China higher education press, Beijing (2002)
Comparative Analysis of VaR Estimation of Double Long-Memory GARCH Models: Empirical Analysis of China’s Stock Market* Guangxi Cao1, Jianping Guo1, and Lin Xu2 1
School of Economics and Management, Nanjing University of Information Science & Technology, Nanjing 210044, P. R. China 2 Department of Economy, Party College of Sichuan Province Committee of CCP, Sichuan 610072, P. R. China
Abstract. GARCH models are widely used to model the volatility of financial assets and to measure VaR. Based on the long-memory, leptokurtosis and fat-tail characteristics of stock market return series, we compare the ability of double long-memory GARCH models with the skewed student-t distribution to compute VaR, through an empirical analysis of the Shanghai Composite Index (SHCI) and the Shenzhen Component Index (SZCI). The results show that the ARFIMA-HYGARCH model performs better than the others, and that at VaR levels less than or equal to 2.5 percent the double long-memory GARCH models evaluate in-sample VaRs better for long positions than for short positions, while the diametrically opposite conclusion holds for out-of-sample VaR forecasting ability. Keywords: VaR; long-memory; ARFIMA; HYGARCH; skewed student-t distribution.
1 Introduction The VaR method has been widely accepted and used by financial institutions and enterprises all over the world, such as banks, securities firms, insurance companies, fund management companies and trust companies, since it is more scientific than the variance method. There are two approaches to estimating VaR: the parametric method and the non-parametric method, of which the parametric method is widely used in practice. The traditional parametric method estimates VaR by calculating the expectation and variance of the sequence under the assumption that the return series follows a particular distribution, most commonly the normal distribution. This is a static parametric method, so it has too many defects to be used for financial time series, which exhibit heteroscedasticity and volatility clustering from a statistical point of view. GARCH models are the most common volatility models used to reflect the time-varying characteristics of financial markets and can effectively capture
Supported by Philosophy and Social Science of Department of Education of Jiangsu Province (No: 8SJB7900020) and Research Startup Fund of Nanjing University of Information Science & Technology (No:SK20080204).
the characteristics of heteroscedasticity and volatility clustering. Therefore, using GARCH models to estimate VaR has become a hot spot of risk research in recent years. Although the heteroskedasticity and clustering characteristics of financial time series had been known for many years, no appropriate class of time-series models reflected them until Engle (1982) put forward the ARCH model (autoregressive conditional heteroskedasticity model) in 1982 and Bollerslev constructed the GARCH model in 1986 (Bollerslev, 1986). Since then, many generalized GARCH models have been put forward, such as IGARCH (Engle and Bollerslev, 1986), EGARCH (Nelson, 1991), PARCH (Ding et al., 1993), and so on. Taking account of the long-memory characteristics of the volatility series (i.e., the variance series), Baillie, Bollerslev and Mikkelsen (1996) presented the FIGARCH model. Unfortunately, the GARCH-type models above did not consider the impact of the long memory of the series itself on the clustering, asymmetry, leverage and long-memory features of the volatility. Baillie, Chung and Tieslau (1996) began to consider the impact of long memory on volatility by using the ARFIMA-GARCH model to analyze the long-memory characteristic of the post-war inflation rates of 10 industrialized countries, including Japan and Ireland. Recently, some scholars have argued that there are significant long-memory characteristics in the return series of the Shanghai and Shenzhen stock markets in China (Cao and Yao, 2008); furthermore, the double long-memory characteristic of China's stock markets has been identified (Cao, 2008). Since GARCH-type models can portray the dynamic changes of stock return series and capture the clustering effect and asymmetry, they are used to measure VaR in the field of financial risk management. Laurent and Peters (2002) and Ricardo (2006) adopted GARCH models to forecast VaR. In China, Fan (2000) evaluated the VaR of China's stock market with the normal distribution method. However, as research has progressed, many scholars have found that most of China's stock market returns do not satisfy the assumption of independent variance and often do not obey the normal distribution, but instead exhibit leptokurtosis and fat tails. Therefore, in recent years VaR models that can capture the dynamic characteristics of the series have received more attention, and the student-t and GED distributions have been introduced as the assumed distributions of GARCH-type models. Gong and Chen (2005) made a comparative analysis of the accuracy of VaR estimated by GARCH-type models under the normal, student-t and GED distributions. Research on estimating VaR with GARCH-type models has three weaknesses. First, the double long-memory characteristic of financial time series is not considered. Second, most of the literature estimates VaR using GARCH-type models under the normal, student-t and GED distributions, which cannot characterize leptokurtosis and fat tails; the skewed student-t distribution may be a better distributional assumption. Third, the long and short positions of multi-asset management are not both taken into account; in other words, the estimation accuracy of in-sample VaR and the forecast accuracy of out-of-sample VaR are not considered at the same time.
In this paper, the in-sample and out-of-sample VaR are calculated by appropriate double long-memory GARCH class of models for SHCI and SZCI respectively. The computation accuracy is compared among different models, such as ARFIMA-GARCH,
ARFIMA-FIGARCH, ARFIMA-FIAPARCH and ARFIMA-HYGARCH, all with skewed student-t innovations. The empirical results indicate that both the in-sample VaRs estimated and the out-of-sample VaRs forecasted by the ARFIMA-HYGARCH model with skewed student-t innovations are more accurate than those of the other models. The rest of the paper is organized as follows. In Section 2, the ARFIMA-HYGARCH model, the VaR computation model and the test method are presented. In Section 3, the data and their selection are described. In Section 4, the comparative empirical analysis of the double long-memory GARCH-type models with skewed student-t innovations is presented. Conclusions are given in Section 5.
2 Methodology
The GARCH, FIGARCH and FIAPARCH models are traditional models and can be found in many textbooks, and the skewed student-t distribution can be found in Hansen (1994). In this section, therefore, we only introduce the ARFIMA-HYGARCH model and present the VaR computation and test methods. The other models, such as ARFIMA-GARCH, ARFIMA-FIGARCH and ARFIMA-FIAPARCH, can be obtained in a similar way by using GARCH, FIGARCH or FIAPARCH instead of HYGARCH.

2.1 The ARFIMA(p1,d1,q1)-HYGARCH(p2,d2,q2) Model
The ARFIMA(p1,d1,q1)-HYGARCH(p2,d2,q2) model has the following form:

  Φ(L)(1 − L)^{d1}(x_t − μ) = Θ(L) ε_t,                                            (1)
  ε_t = σ_t z_t,                                                                    (2)
  σ_t^2 = ω / (1 − β(L)) + { 1 − [α(L)/β(L)] [1 + α̂((1 − L)^{d2} − 1)] } ε_t^2,     (3)

where L is the lag operator, d1 < 0.5, μ is the unconditional mean of the stationary time series {x_t}, and z_t is an independently identically distributed (i.i.d.) random variable with mean 0. Φ(L) = 1 − φ_1 L − ⋯ − φ_{p1} L^{p1} and Θ(L) = 1 + θ_1 L + ⋯ + θ_{q1} L^{q1} are the autoregressive and moving-average operators of orders p1 and q1, and all of their roots lie outside the unit circle. Furthermore, d2 ≥ 0, α̂ ≥ 0, α(L) = Σ_{i=1}^{q2} α_i L^i and β(L) = Σ_{j=1}^{p2} β_j L^j. For an arbitrary d with d ≥ −1 (d = d1 or d2 denotes a long-memory parameter), (1 − L)^d = Σ_{j=0}^{∞} ϕ_j L^j, where ϕ_0 = 1 and ϕ_j = Π_{1≤k≤j} (k − 1 − d)/k, j = 1, 2, …. Traditionally, equations (1) and (3) are
called the conditional mean equation and the conditional variance equation, respectively, while equation (2) specifies the innovation (residual) distribution. The series {x_t} is a long-memory stationary process, i.e. it has the long-memory (persistence) characteristic, if 0 < d1 < 0.5; it is a short-memory stationary process, i.e. it has the short-memory (anti-persistence) characteristic, if −0.5 < d1 < 0. Additionally, provided that d2 > 0, the amplitude of the HYGARCH(p2,d2,q2) process is S = 1 − [δ(1)/β(1)](1 − α̂). The FIGARCH and GARCH models correspond to α̂ = 1 and α̂ = 0, respectively. The HYGARCH model can overcome some restrictions of the FIGARCH model: ① the covariance of the HYGARCH process is stationary (α̂ ≠ 1); ② the long-memory parameter d and the amplitude parameter S of the HYGARCH process can be estimated separately, avoiding the FIGARCH restriction S = 1; ③ when 0 < d2 < 1, the length of the long memory of the series increases with d2.
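For concreteness, a minimal numerical sketch of the fractional-differencing filter defined above — the weights ϕ_j of (1 − L)^d computed by the recursion ϕ_0 = 1, ϕ_j = ϕ_{j−1}(j − 1 − d)/j and truncated at a finite number of terms (the paper later truncates at 1000) — might look as follows in Python; the function names are illustrative assumptions, not part of the paper.

  import numpy as np

  def frac_diff_weights(d: float, n_terms: int = 1000) -> np.ndarray:
      """Coefficients phi_j of (1 - L)^d = sum_j phi_j L^j,
      via phi_0 = 1 and phi_j = phi_{j-1} * (j - 1 - d) / j."""
      phi = np.empty(n_terms + 1)
      phi[0] = 1.0
      for j in range(1, n_terms + 1):
          phi[j] = phi[j - 1] * (j - 1 - d) / j
      return phi

  def frac_diff(x: np.ndarray, d: float, n_terms: int = 1000) -> np.ndarray:
      """Apply the truncated filter (1 - L)^d to a series x."""
      phi = frac_diff_weights(d, n_terms)
      out = np.zeros(len(x))
      for t in range(len(x)):
          k = min(t, n_terms)
          # sum_{j=0}^{k} phi_j * x_{t-j}
          out[t] = np.dot(phi[: k + 1], x[t::-1][: k + 1])
      return out

For example, frac_diff_weights(0.4)[:3] returns (1, −0.4, −0.12), which matches the first terms of the binomial expansion of (1 − L)^{0.4}.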
2.2 VaR Computation Model and Test Method
Most researchers compute the VaR of financial asset returns to measure the risk of a long position when asset prices fall; that is, they assume investors hold only long trading positions and are concerned only with the depreciation of their portfolio. In today's financial markets, however, investors can hold both long and short trading positions. The one-step-ahead VaR is computed from the estimated model and its assumed distribution. The one-step-ahead forecasts of the conditional mean μ̂_t and the conditional variance σ̂_t^2 are computed conditional on past information. Under the skewed student-t distribution, the VaRs at the α quantile (also called the VaR level) for long and short trading positions are computed as follows (Tang and Shieh, 2006):

  VaR_long = μ̂_t − z_α σ̂_t,        (4)
  VaR_short = μ̂_t + z_{1−α} σ̂_t,    (5)

where z_α and z_{1−α} denote the left and right quantiles at α% of the skewed student-t distribution. Under the normal or student-t distribution, z_α = −z_{1−α}, so the forecasted long VaR equals the forecasted short VaR; this no longer holds under the skewed student-t distribution because of its asymmetry. The performance of the VaR estimates is evaluated by computing their failure rate for the time series, defined as the proportion of observations that exceed the forecasted VaR among all observations. The standard we use to judge a VaR model is the difference between the failure rate and the pre-specified VaR level α. If the failure rate is
very close to the pre-specified VaR level, we conclude that the VaR is estimated well, that is, the model used to compute the VaR is well specified. In this paper, we adopt Kupiec's LR test (Kupiec, 1995) to test the effectiveness of the VaR models. Denote the failure rate by f, equal to the ratio of the number of observations exceeding the VaR (N) to the total number of observations (T), and the pre-specified VaR level by α. The statistic of Kupiec's LR test is

  LR = 2{ ln[f^N (1 − f)^{T−N}] − ln[α^N (1 − α)^{T−N}] },    (6)

which follows a χ² distribution with 1 degree of freedom and is used to test the null hypothesis that the failure rate equals the pre-specified VaR level α.
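A small Python sketch of the unconditional coverage test in equation (6), assuming SciPy for the χ²(1) p-value (the function name and interface are illustrative only), could be:

  import numpy as np
  from scipy.stats import chi2

  def kupiec_lr_test(violations: int, n_obs: int, alpha: float):
      """Kupiec (1995) test of H0: failure rate == alpha.
      violations = N, n_obs = T, alpha = pre-specified VaR level."""
      f = violations / n_obs  # observed failure rate

      def loglik(p):
          # binomial log-likelihood; guard against log(0)
          p = min(max(p, 1e-12), 1 - 1e-12)
          return violations * np.log(p) + (n_obs - violations) * np.log(1 - p)

      lr = 2.0 * (loglik(f) - loglik(alpha))
      p_value = 1.0 - chi2.cdf(lr, df=1)
      return lr, p_value

  # e.g. 6 violations in 1000 one-step-ahead forecasts at the 1% level
  print(kupiec_lr_test(6, 1000, 0.01))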
3 Data
On December 16, 1996, a price fluctuation limit was imposed on China's stock market. Taking the impact of this restriction into account, this paper considers the closing prices of the SHCI and SZCI from December 17, 1996 to April 20, 2006. All data are taken from the Stock Star data stream. Before estimating the models, we convert the stock price indices to stock returns in order to obtain stationary series. The conversion formula is r_t = ln(p_t / p_{t−1}), where r_t and p_t denote the return and the closing price on date t, respectively.
4 Compared Empirical Results
The double long-memory characteristic of China's stock markets has been confirmed by Cao (Cao, 2008; Cao and Yao, 2008), so it is reasonable to select double long-memory GARCH models for the empirical analysis of China's stock markets. Additionally, most researchers believe that GARCH(1,1) is sufficient to describe the conditional variance (Lamoureux and William, 1993; Hamilton, 1994). The orders p1 and q1 of ARFIMA(p1,d1,q1) are selected over 0 ≤ p1 ≤ 3 and 0 ≤ q1 ≤ 3 according to the AIC, SIC and log-likelihood values and the Ljung-Box Q² statistics on the squared standardized residuals; the model with the lowest AIC and SIC values (or the largest log-likelihood) that simultaneously passes the Q² test is adopted. In this paper, most computations were performed with G@RCH 4.0 [16], where (1 − L)^d = Σ_{j=0}^{∞} ϕ_j L^j; in the empirical analysis the infinite series is truncated at 1000 terms. The parameters are estimated by quasi-maximum likelihood.
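As a sketch of the order-selection step just described, the following Python fragment grid-searches p1 and q1 between 0 and 3. Here fit_fn is a stand-in for whatever estimation routine is available (the paper itself uses G@RCH 4.0, which has no Python interface), and it is assumed to expose the AIC, SIC and a Ljung-Box Q² p-value; these names are assumptions for illustration only.

  from itertools import product

  def select_arfima_order(returns, fit_fn, max_p=3, max_q=3):
      """Pick (p1, q1) with the lowest AIC/SIC among models that pass
      the Ljung-Box Q^2 test on squared standardized residuals."""
      best, best_key = None, None
      for p, q in product(range(max_p + 1), range(max_q + 1)):
          res = fit_fn(returns, p, q)        # hypothetical estimation call
          if res.lb2_pvalue < 0.05:          # fails the Q^2 test -> discard
              continue
          key = (res.aic, res.sic)           # lower is better on both criteria
          if best_key is None or key < best_key:
              best, best_key = (p, q, res), key
      return best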
4.1 In-Sample VaR Computations
The in-sample VaR results for the SHCI and SZCI returns are presented in Table 1. Additionally, in order to compare the double long-memory GARCH models with a single long-memory GARCH model, the results of ARFIMA(2,d1,0)-GARCH(1,1) are also presented in Table 1. Table 1 contains the computed failure rates and the p-values of the corresponding Kupiec LR tests.

Table 1. In-sample VaR calculated by ARFIMA(2,d1,0)-GARCH-type(1,d2,1) models with skewed student-t distribution for the return series of SHCI and SZCI

VaR level (%) | SHCI: G | SHCI: FIG | SHCI: HYG | SHCI: FIAP | SZCI: G | SZCI: FIG | SZCI: HYG | SZCI: FIAP
Long position
5.0   | 0.053310 (0.47575) | 0.051533 (0.73985) | 0.053310 (0.47575) | 0.049311 (0.88059) | 0.051088 (0.81333) | 0.047090 (0.52256) | 0.049311 (0.88059) | 0.049311 (0.88059)
2.5   | 0.023101 (0.55889) | 0.021768 (0.31540) | 0.022657 (0.46942) | 0.022657 (0.46942) | 0.022657 (0.46942) | 0.022657 (0.46942) | 0.023545 (0.65530) | 0.022657 (0.46942)
1.0   | 0.006664 (0.09036) | 0.004887 (0.00680)* | 0.006664 (0.09036) | 0.009329 (0.74631) | 0.006220 (0.05264) | 0.005331 (0.0145)* | 0.006220 (0.05264) | 0.007108 (0.14585)
0.5   | 0.004443 (0.70224) | 0.003998 (0.48496) | 0.004443 (0.70224) | 0.004887 (0.93903) | 0.003110 (0.17154) | 0.003110 (0.17154) | 0.003110 (0.17154) | 0.0031097 (0.17154)
0.25  | 0.002666 (0.87640) | 0.002221 (0.78716) | 0.002666 (0.87640) | 0.002221 (0.78716) | 0.002666 (0.87640) | 0.002221 (0.78716) | 0.002221 (0.78716) | 0.0022212 (0.78716)
Short position
95.0  | 0.94758 (0.60090) | 0.94980 (0.96531) | 0.94891 (0.81333) | 0.94936 (0.88870) | 0.94713 (0.53640) | 0.95024 (0.95755) | 0.94847 (0.73985) | 0.94580 (0.36695)
97.5  | 0.97201 (0.37288) | 0.97512 (0.97036) | 0.97379 (0.71508) | 0.97423 (0.81674) | 0.97157 (0.30731) | 0.97335 (0.61875) | 0.97246 (0.44688) | 0.97290 (0.52904)
99.0  | 0.98978 (0.91762) | 0.99067 (0.74631) | 0.99067 (0.74631) | 0.99023 (0.91364) | 0.98712 (0.18810) | 0.99023 (0.91364) | 0.98889 (0.60428) | 0.98845 (0.47059)
99.5  | 0.99600 (0.48496) | 0.99645 (0.30486) | 0.99556 (0.70224) | 0.99556 (0.70224) | 0.99467 (0.82568) | 0.99600 (0.48496) | 0.99511 (0.93903) | 0.99511 (0.93903)
99.75 | 0.99689 (0.57690) | 0.99733 (0.87640) | 0.99689 (0.57690) | 0.99733 (0.87640) | 0.99689 (0.57690) | 0.99733 (0.87640) | 0.99689 (0.57690) | 0.99689 (0.57690)

Notes: G, FIG, HYG and FIAP denote the GARCH, FIGARCH, HYGARCH and FIAPARCH models, respectively. The numbers in Table 1 are VaR failure rates; the numbers in parentheses are p-values of the corresponding Kupiec LR tests. The superscript * indicates statistical significance at the 5% level.
From the in-sample VaR results in Table 1 we can draw the following conclusions. ① For the two stock index returns, at the 5% statistical significance level, the Kupiec LR test values of the in-sample VaRs computed by the ARFIMA
(2,d1,0)-FIAPARCH(1,d2,1), ARFIMA(2,d1,0)-HYGARCH(1,d2,1) and ARFIMA(2,d1,0)-GARCH(1,1) models cannot reject the null hypothesis at any VaR level for either long or short positions; the only exceptions are the values computed by the ARFIMA(2,d1,0)-FIGARCH(1,d2,1) model at the 1.0% VaR level for long positions. Furthermore, for small VaR levels α, the in-sample VaRs computed by the ARFIMA(2,d1,0)-HYGARCH(1,d2,1) model are more accurate than those of the other models. This also indicates that ARFIMA(2,d1,0)-HYGARCH(1,d2,1) with the skewed student-t distribution can describe the leptokurtosis and fat-tail behavior exhibited in the two stock index return series. ② In terms of the VaR failure rates, at every VaR level and for all positions, those computed by ARFIMA(2,d1,0)-HYGARCH(1,d2,1) are the smallest, while the opposite result is obtained for ARFIMA(2,d1,0)-GARCH(1,1), which indicates that the long-memory characteristic of the volatility series affects the estimation
Table 2. Out-of-sample VaR calculated by ARFIMA(2,d1,0)-GARCH-type (1,d2,1) with skewed student-t distribution for the return series of SHCI and SZCI
VaR level (%) | SHCI: G | SHCI: FIG | SHCI: HYG | SHCI: FIAP | SZCI: G | SZCI: FIG | SZCI: HYG | SZCI: FIAP
Long position
5.0   | 0.038835 (0.58904) | 0.038835 (0.58904) | 0.038835 (0.58904) | 0.048544 (0.94568) | 0.058252 (0.70773) | 0.058252 (0.70773) | 0.058252 (0.70773) | 0.058252 (0.70773)
2.5   | 0.029126 (0.79371) | 0.029126 (0.79371) | 0.029126 (0.79371) | 0.038835 (0.40485) | 0.038835 (0.40485) | 0.038835 (0.40485) | 0.038835 (0.40485) | 0.048544 (0.17438)
1.0   | 0.019417 (0.39496) | 0.019417 (0.39496) | 0.019417 (0.39496) | 0.019417 (0.39496) | 0.029126 (0.11294) | 0.029126 (0.11294) | 0.029126 (0.11294) | 0.029126 (0.11294)
0.5   | 0.019417 (0.11541) | 0.019417 (0.11541) | 0.019417 (0.11541) | 0.019417 (0.11541) | 0.009709 (0.54880) | 0.009709 (0.54880) | 0.009709 (0.54880) | 0.019417 (0.11541)
0.25  | 0.009709 (0.26666) | 0.009709 (0.26666) | 0.009709 (0.26666) | 0.019417 (0.02940)* | 0.00000 (1.0000) | 0.00000 (1.0000) | 0.009709 (0.26666) | 0.0097087 (0.26666)
Short position
95.0  | 0.94175 (0.70773) | 0.91262 (0.11373) | 0.91262 (0.11373) | 0.89320 (0.02069)* | 0.93204 (0.42663) | 0.91262 (0.11373) | 0.91262 (0.11373) | 0.92233 (0.23161)
97.5  | 0.98058 (0.70583) | 0.98058 (0.70583) | 0.98058 (0.70583) | 0.96117 (0.40485) | 0.97087 (0.79371) | 0.97087 (0.79371) | 0.97087 (0.79371) | 0.97087 (0.79371)
99.0  | 0.99029 (0.97618) | 0.99029 (0.97618) | 0.99029 (0.97618) | 0.98058 (0.39496) | 0.99029 (0.97618) | 0.99029 (0.97618) | 0.99029 (0.97618) | 0.98058 (0.39496)
99.5  | 1.0000 (1.0000)   | 0.99029 (0.54880) | 0.99029 (0.54880) | 0.99029 (0.54880) | 0.99029 (0.54880) | 0.99029 (0.54880) | 0.99029 (0.54880) | 0.99029 (0.54880)
99.75 | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 1.0000 (1.0000)   | 0.99029 (0.26666)

Notes: a failure rate of 1.0000 for a short position in Table 2 is equivalent to a failure rate of 0.0000 (no violations). The superscript * indicates statistical significance at the 5% level.
accuracy of the stock return series model. ③ Comparing the failure rates of the VaRs for long and short positions, we find that the double long-memory GARCH models perform much better than the ARFIMA-GARCH model when computing in-sample VaRs.

4.2 Out-of-Sample VaR Forecast
Comparing the in-sample VaRs with the estimation sample only tells us about the "past" performance of these VaR models. The real contribution of VaR computation is its forecasting ability, which provides investors or financial institutions with information about the biggest loss they may incur. In this subsection we present the empirical results on the forecasting ability of the double long-memory models used to compute VaR. The out-of-sample VaR is forecast one step ahead, which means that the VaR of day t+1 is computed conditional on the information available on day t. The forecasting interval runs from April 21, 2006 to September 27, 2006, giving 103 out-of-sample VaRs for the SHCI and SZCI returns. As in the in-sample analysis, the out-of-sample VaRs are evaluated using Kupiec's LR test; the empirical results are shown in Table 2. Because of the limited number of VaRs, it is easy to encounter zero failure rates at some pre-specified VaR levels, such as 0.25% and 0.5%. It is worth noting that a zero failure rate at the 0.25% or 0.5% level means the model used to forecast the VaRs performs very well and captures the leptokurtosis and fat-tail behavior of the stock index return series. According to Table 2, with the skewed student-t distribution, the results are as follows. At the 5% statistical significance level, the Kupiec LR test values of the out-of-sample VaRs computed by the ARFIMA(2,d1,0)-FIGARCH(1,d2,1) model reject the null hypothesis at the 0.25% VaR level for the short position and the 0.5% VaR level for the long position. Apart from this, for both long and short positions, the double long-memory GARCH models and ARFIMA(2,d1,0)-GARCH(1,1) perform notably well. The results show that ARFIMA(2,d1,0)-GARCH(1,1), ARFIMA(2,d1,0)-FIGARCH(1,d2,1) and ARFIMA(2,d1,0)-HYGARCH(1,d2,1) all achieve high forecast accuracy for the out-of-sample VaRs, for both long and short positions. Additionally, there are no significant differences in the accuracy of the out-of-sample VaRs forecasted by the ARFIMA(2,d1,0)-GARCH(1,1), ARFIMA(2,d1,0)-FIGARCH(1,d2,1) and ARFIMA(2,d1,0)-HYGARCH(1,d2,1) models. Moreover, when the VaR level α is less than or equal to 2.5%, the forecasting ability of the double long-memory GARCH models for out-of-sample VaRs is better for short positions than for long positions.
5 Conclusions
In this paper, we investigate the ability of double long-memory GARCH-type models with the skewed student-t distribution to compute VaR. The following conclusions are obtained. (1) In general, the double long-memory ARFIMA(2,d1,0)-HYGARCH(1,d2,1) model with the skewed student-t distribution performs better than the other models in computing VaRs, both in sample and out of sample.
(2) In terms of out-of-sample VaR forecasting ability, ARFIMA(2,d1,0)-FIGARCH(1,d2,1) and ARFIMA(2,d1,0)-HYGARCH(1,d2,1) both perform well, and the double long-memory GARCH models are better than the ARFIMA-GARCH model for in-sample VaR. (3) When the VaR level α is less than or equal to 2.5%, the estimation ability of the double long-memory GARCH models for in-sample VaR is better for long positions than for short positions, while the diametrically opposite result holds for out-of-sample VaR forecasting: the forecasting ability of the double long-memory GARCH models is better for short positions than for long positions when the VaR level α is less than or equal to 2.5%.
References [1] Baillie, R.T., Bollerslev, T., Mikkelsen, H.: Fractional Integrated Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics 74, 3–30 (1996) [2] Baillie, R.T., Chung, C.-F., Tieslau, M.A.: Analysing Inflation by the Fractionally Integrated ARFIMA-GARCH Model. Journal of Applied Econometrics 11, 23–40 (1996) [3] Bollerslev, T.: Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics 31, 307–327 (1986) [4] Cao, G.X.: Research of China Stock Market Fluctuation Based on Fractal Analysis. Economic Science Press, Beijing (2008) (in Chinese) [5] Cao, G.X., Yao, Y.: Empirical Research on Dynamic Spillover Effect and Correlation in Shanghai and Shenzhen Stock Markets. Systems Engineering 26(5), 47–54 (2008) (in Chinese) [6] Ding, Z., Granger, C.W.J., Engle, R.F.: A long Memory Property of Stock Market Returns and a New Model. Journal of Empirical Finance 1, 83–106 (1993) [7] Engle, R.E.: Autoregressive Conditional Heteroskedasticity with Estimation of the Variance of UK Inflation. Econometrics 50, 987–1008 (1982) [8] Engle, R.E., Bollerslev, T.: Modeling the Persistence of Conditional Variances. Econometric Reviews 5, 81–87 (1986) [9] Hamilton, J.D.: Time Series Analysis. Princeton University Press, Princeton (1994) [10] Hansen, B.: Autoregressive Conditional Density Estimation. International Economic Review 35, 705–730 (1994) [11] Jorion, P.: Value at Risk, 2nd edn. McGraw-Hill, New York (2001) [12] Kupiec, P.H.: Techniques for Verifying the Accuracy of Risk Measurement Models. Journal of Derivatives (3), 73–84 (1995) [13] Lamoureux, C.G., William, D.L.: Forecasting Stock Return Variance: Toward an Understanding of Stochastic Impied Volatilities. Review of Financial Studies 5, 293–326 (1993) [14] Laurent, S., Perters, J.P.: G@RCH 4.0, Estimating and Forecasting ARCH Models, Timberlake Consultants (2005) [15] Nelson, D.B.: Conditional Heterosdasticity in Asset Returns: A New Approach. Econometrica 59, 347–370 (1991) [16] Ricardo, A.: The Estimation of Market VaR Using GARCH Models and a Heavy Tail Distributions. Working Paper Series (2006) [17] Tang, T.L., Shieh, S.J.: Long-memory in Stock Index Futures Markets: A Value-at-risk Approach. Physica A 366, 437–448 (2006)
Estimation of Value-at-Risk for Energy Commodities via CAViaR Model* Zhao Xiliang1 and Zhu Xi2 1
Department of Economics, Xiamen University, Xiamen, China, 361005
[email protected] 2 Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China 20052
[email protected] Abstract. This paper uses the Conditional Autoregressive Value at Risk (CAViaR) model proposed by Engle and Manganelli (2004) to evaluate the value-at-risk of daily spot prices of Brent and West Texas Intermediate (WTI) crude oil covering the period May 21, 1987 to November 18, 2008, and compares the accuracy of the CAViaR, Normal-GARCH and GED-GARCH estimates. The results show that all methods do a good job at the low confidence level (95%): GED-GARCH is the best for the spot WTI price, and Normal-GARCH and Adaptive-CAViaR are the best for the spot Brent price. At the high confidence level (99%), however, Normal-GARCH does a good job for spot WTI, while GED-GARCH and the four CAViaR specifications do well for the spot Brent price, for which Normal-GARCH performs badly. The results suggest that CAViaR performs as well as GED-GARCH, since CAViaR models the quantile autoregression directly, but it does not outperform GED-GARCH, although it does outperform Normal-GARCH.
1 Introduction Since asset prices have become more and more volatile since the 1970s, the importance of effective risk management has never been greater. The past decade has witnessed the rapid development of techniques for measuring and managing market risk. One of the most popular approaches is the well-known "Value at Risk" (VaR) measure, which many financial institutions and risk managers have adopted as a first line of defense against market risk. Value at Risk is defined as the worst loss that might be expected from holding a security or portfolio over a given period of time (usually a day or two weeks for the purpose of regulatory capital reporting), given a specified level of probability (known as the "confidence level"). The inclusion of VaR models within the capital-adequacy framework provides an incentive for financial institutions to develop efficient models, that is, models that provide sufficient conservatism to meet supervisors' requirements while at the same time minimizing the capital that must be held. Cabedo and Moya (2003) used the historical simulation with ARMA forecasts *
This research was supported by China National Science Foundation (70703023).
(HSAF) approach to evaluate value at risk for daily spot Brent prices from 1992 to 1998 and tested the model out of sample for 1999. They found that the HSAF model fits the data more closely than standard historical simulation or the ARCH model. Costello et al. (2008) calculated VaR measures for the daily Brent crude oil price from May 20, 1987 through January 18, 2005; they used the first five years of data to estimate the models and treated the rest of the data as the out-of-sample investigation. Their results suggest that their semi-parametric GARCH outperforms the HSAF approach. Giot and Laurent (2003) calculated VaR measures for daily spot prices of Brent and West Texas Intermediate crude oil covering the period May 20, 1987 to March 8, 2002; in a five-year out-of-sample investigation, they showed that the skewed student APARCH model performed best on these data. Hung et al. (2008) adopted the GARCH model with the heavy-tailed (HT) distribution proposed by Politis (2004) to estimate one-day-ahead VaR for West Texas Intermediate crude oil, Brent crude oil, heating oil, propane and New York conventional regular gasoline, and further compared its accuracy and efficiency with the Normal-GARCH and t-GARCH models. They showed that the t-GARCH model is the least accurate and efficient model at both high and low confidence levels; the Normal-GARCH model is more efficient than the alternatives at low confidence levels for most series, but fails to achieve a reliable coverage rate; and the VaR forecasts obtained by the HT-GARCH model provide the most accurate coverage rate and the most efficient risk measures. Fan et al. (2008) calculated VaRs for daily spot West Texas Intermediate and Brent crude oil prices covering the period May 20, 1987 to August 1, 2006, using the last year as the out-of-sample period, and found that the VaR model based on GED-GARCH is more robust and accurate than the Normal-GARCH and HSAF methods. This study adopts a newly developed method, the Conditional Autoregressive Value at Risk (CAViaR) model proposed by Engle and Manganelli (2004), to estimate VaRs. Instead of modeling the whole distribution, CAViaR models the quantile directly. The rest of the paper is arranged as follows: Section 2 introduces the CAViaR model, Section 3 introduces the backtesting methods employed in this paper, Section 4 describes the data, Section 5 gives the empirical results, and the last section concludes.
2 Conditional Autoregressive Value at Risk (CAViaR) Model
The empirical fact that the volatilities of financial market returns cluster over time may be translated into statistical terms by saying that their distribution is autocorrelated. VaR is the loss that will be exceeded over a pre-specified holding period with a given probability; it is essentially the left quantile of the underlying asset's return distribution, which must therefore exhibit a similar behavior. CAViaR is a model of this kind, which models VaR directly through an autoregressive specification. Suppose the return series of a portfolio is {y_t}, t = 1, …, T; let p be the probability associated with the VaR, x_t a vector of time-t observables and β_p a vector of unknown parameters. Let VaR_t(β) ≡ f(x_{t−1}, β_p) denote the time-t p-quantile of the distribution of portfolio returns formed at time t−1. Engle and Manganelli (2004) put forward four specifications. Model 1 (SAV) Symmetric Absolute Value:
  VaR_t(β) = β1 + β2 VaR_{t−1}(β) + β3 |y_{t−1}|

Model 2 (AS) Asymmetric Slope:

  VaR_t(β) = β1 + β2 VaR_{t−1}(β) + β3 (y_{t−1})^+ + β4 (y_{t−1})^−

Model 3 (IGARCH) Indirect GARCH(1,1):

  VaR_t(β) = ( β1 + β2 VaR_{t−1}^2(β) + β3 y_{t−1}^2 )^{1/2}

Model 4 (Adaptive):

  VaR_t(β1) = VaR_{t−1}(β1) + β1 { [1 + exp(G [y_{t−1} − VaR_{t−1}(β1)])]^{−1} − p }

where (y_{t−1})^+ = max(y_{t−1}, 0) and (y_{t−1})^− = −min(y_{t−1}, 0). We will use these four CAViaR specifications to evaluate the VaR of crude oil prices in what follows.
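As an illustration, a minimal Python sketch of the Symmetric Absolute Value recursion (Model 1) is given below; the sign convention follows the specification as written above, and the function name and the choice of initial value are illustrative assumptions rather than part of the paper.

  import numpy as np

  def caviar_sav(returns, beta, var0):
      """SAV CAViaR recursion: VaR_t = b1 + b2*VaR_{t-1} + b3*|y_{t-1}|.
      returns: array of y_1..y_T; beta = (b1, b2, b3);
      var0: initial VaR, e.g. the empirical p-quantile of an initial window."""
      b1, b2, b3 = beta
      var = np.empty(len(returns))
      var[0] = var0
      for t in range(1, len(returns)):
          var[t] = b1 + b2 * var[t - 1] + b3 * abs(returns[t - 1])
      return var

In estimation, the parameters β would be chosen to minimize the regression-quantile objective function over this recursion, as in Engle and Manganelli (2004).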
3 Backtesting VaR
In order to evaluate the accuracy of the different models for VaR estimation, we employ Kupiec's (1995) unconditional coverage test statistic. First, define the hit sequence of VaR violations as

  I_{t+1} = 1 if y_{t+1} < VaR^p_{t+1},  and  I_{t+1} = 0 if y_{t+1} ≥ VaR^p_{t+1},

where VaR^p_{t+1} denotes the forecasted VaR at time t+1 and y_{t+1} is the actual return at time t+1. The total number of violations is T1, and the total number of observations under evaluation is T; the failure rate f = T1/T would equal the confidence level p if the model were perfectly accurate. Kupiec (1995) formulated a statistic to test the null hypothesis that the failure rate equals the confidence level p, that is, H0: E(I_{t+1}) = p, which follows a chi-squared distribution with one degree of freedom:

  LR_uc = −2 { ln[(1 − p)^{T−T1} p^{T1}] − ln[(1 − f)^{T−T1} f^{T1}] } ~ χ²(1).    (1)
4 Data Description and Preliminary Analysis
We use daily spot WTI and Brent crude oil prices from May 21, 1987 to November 18, 2008, quoted in US dollars per barrel and obtained from the U.S. Energy Information Administration.
Fig. 2. Daily spot WTI and Brent crude oil price returns (1987.5.21–2008.11.18). [Two time-series panels, Return of WTI and Return of Brent, plotted against the trading day.]
Table 1. Descriptive statistics for daily returns

                | WTI                 | Brent
Panel A: Estimation period (4237 observations)
Mean            | 0.01912             | 0.02017
S.D.            | 2.49404             | 2.36402
Skewness        | -1.26107            | -0.98834
Kurtosis        | 24.82951            | 21.84798
Min             | -40.63958           | -36.12144
Max             | 18.86765            | 17.33327
J-B             | 85350 (0.0000)      | 63405.59 (0.0000)
Q(10)           | 21.12 (0.0203)      | 22.81 (0.0115)
Q²(20)          | 123.04 (0.0000)     | 351.32 (0.0000)
Panel B: Forecast period (1000 observations)
Mean            | -0.0175             | -0.0094
S.D.            | 2.3831              | 2.1835
Skewness        | -0.1314             | -0.1458
Kurtosis        | 7.6240              | 5.3522
Min             | -12.8267            | -11.4688
Max             | 16.4137             | 11.4621
J-B             | 893.7661 (0.0000)   | 234.0857 (0.0000)
Q(10)           | 21.85 (0.0159)      | 11.58 (0.3142)
Q²(10)          | 214.04 (0.0000)     | 45.41 (0.0000)

Note: Figures in parentheses are p-values. Q(10) and Q²(10) are the Ljung-Box Q tests for 10th-order serial correlation of the returns and the squared returns, respectively.
There are 5237 observations in our data set. We divide them into two parts: the period from May 21, 1987 to October 18, 2004 is the in-sample period, and the period from October 19, 2004 to November 18, 2008 is the out-of-sample period. We use the in-sample series to estimate the VaR models, employ the estimated models to forecast the VaRs in the out-of-sample period, and then evaluate the accuracy of the different VaR estimation models. We define the returns as 100 times the difference of log prices, that is, y_t = 100·ln(p_t/p_{t−1}). Fig. 2 shows the returns of WTI and Brent crude oil over the whole period, and Table 1 reports the simple statistics of the WTI and Brent returns both in sample and out of sample. Clearly, all distributions are fat-tailed and leptokurtic. As indicated by the skewness and kurtosis coefficients, each return series is left-skewed and leptokurtic in the estimation period. The J-B normality test significantly rejects the hypothesis of normality for both periods. Moreover, the Ljung-Box Q(10) statistics for the returns indicate some serial dependence for both series in sample and for the WTI returns out of sample, but serial independence for the Brent returns out of sample. The Ljung-Box Q²(10) statistics for the squared returns indicate that the return series exhibit linear dependence and strong ARCH effects.
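The return construction and the diagnostics reported in Table 1 can be sketched in Python as follows. This assumes pandas, SciPy and a recent statsmodels whose acorr_ljungbox returns a DataFrame; note also that pandas reports excess kurtosis rather than the raw kurtosis shown in the table. The function name is illustrative.

  import numpy as np
  import pandas as pd
  from scipy import stats
  from statsmodels.stats.diagnostic import acorr_ljungbox

  def describe_returns(prices: pd.Series, lags: int = 10) -> dict:
      """r_t = 100*ln(p_t/p_{t-1}) plus Table-1-style diagnostics."""
      r = 100 * np.log(prices / prices.shift(1)).dropna()
      jb_stat, jb_p = stats.jarque_bera(r)
      lb = acorr_ljungbox(r, lags=[lags])         # serial correlation of returns
      lb2 = acorr_ljungbox(r ** 2, lags=[lags])   # ARCH effects (squared returns)
      return {
          "mean": r.mean(), "sd": r.std(), "skew": r.skew(),
          "excess_kurtosis": r.kurt(), "min": r.min(), "max": r.max(),
          "JB": (jb_stat, jb_p),
          "Q(10)": (lb["lb_stat"].iloc[0], lb["lb_pvalue"].iloc[0]),
          "Q2(10)": (lb2["lb_stat"].iloc[0], lb2["lb_pvalue"].iloc[0]),
      }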
5 Empirical Results and Analysis
5.1 Estimates for Normal-GARCH, GED-GARCH, and CAViaR Models
To perform the VaR analysis, we first estimate the GARCH(1,1) model with the usual normal distribution and with the generalized error distribution (GED), and then use the four CAViaR specifications proposed by Engle and Manganelli (2004) to estimate VaR. For each series, the same three kinds of models are estimated on the first 4237 daily returns, and the estimation period is then rolled forward by adding one new day; in this procedure, out-of-sample VaRs are computed for the next 1000 days. The estimation results of the Normal-GARCH, GED-GARCH and four CAViaR specifications for WTI and Brent crude oil during the in-sample period are reported in Table 2 and Table 3. Table 2 shows that the sums of the ARCH and GARCH coefficients in the two GARCH models are less than one, so the stationarity condition holds. The shape parameters in the GED-GARCH model are less than 2, revealing that the return distributions are leptokurtic and fat-tailed. Diagnostics of the standardized residuals show that, for the WTI returns, the two GARCH models are sufficient to correct the serial correlation of returns in the conditional variance equation; for the Brent return series, they are only sufficient at the 1% significance level. GARCH models of this kind are nested in a framework of i.i.d. variables, which might not be consistent with the characteristics of our return series shown in Table 1: the return series may not be i.i.d., since the Ljung-Box Q(10) statistics for the returns are large, indicating some serial correlation. CAViaR models, however, are valid even for non-i.i.d. sequences. Therefore, the CAViaR specifications are more general than GARCH models and can be used in situations with constant volatilities but changing error
distribution, or in situations where both error densities and volatilities are changing. Table 3 reports the results of the CAViaR models. The coefficients of the four specifications are highly significant at the 5% level, except the constant (β1) in the Asymmetric Slope and Indirect GARCH specifications; the coefficient of the autoregressive term (β2) is always highly significant, which confirms that the phenomenon of volatility clustering is relevant also in the tails. The results for the 1% VaR show that the Symmetric Absolute Value, Asymmetric Slope and Indirect GARCH models do a good job of describing the evolution of the left tail for Brent crude oil, and the Indirect GARCH model does a good job for WTI crude oil. The 5% VaR results show that all the models perform well for both series, except the Asymmetric Slope model for WTI crude oil.

Table 2. Estimation results of Normal-GARCH and GED-GARCH

        | Normal-GARCH: WTI | Normal-GARCH: Brent | GED-GARCH: WTI  | GED-GARCH: Brent
AR      | 0.8058 (0.000)    | -                   | 0.6860 (0.000)  | -
MA      | -0.8405 (0.000)   | -                   | -0.7195 (0.000) | -
ARCH    | 0.1070 (0.000)    | 0.1063 (0.000)      | 0.0808 (0.000)  | 0.0968 (0.000)
GARCH   | 0.8913 (0.000)    | 0.8859 (0.000)      | 0.9110 (0.000)  | 0.8939 (0.000)
Cons    | 0.0613 (0.0000)   | 0.0757 (0.000)      | 0.0639 (0.000)  | 0.0710 (0.000)
Shape   | -                 | -                   | 1.2920 (0.000)  | 1.3309 (0.000)
Q²(10)  | 15.6270 (0.1108)  | 21.5237 (0.0177)    | 15.048 (0.130)  | 22.9428 (0.0110)

Note: Figures in parentheses are p-values. Q²(10) is the Ljung-Box test statistic for the squared standardized residuals with 10 lags.
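A rolling one-step-ahead VaR exercise of the kind described in Section 5.1 might be sketched as follows, assuming the third-party Python arch package for GARCH(1,1) estimation; the Normal quantile is used here for simplicity, and a GED or skewed student-t quantile would be substituted to reproduce the other rows of Table 4. Function and variable names are illustrative.

  import numpy as np
  from scipy.stats import norm
  from arch import arch_model  # assumed third-party dependency

  def rolling_var(returns, n_out=1000, alpha=0.05):
      """Re-estimate a Normal-GARCH(1,1) each day and forecast the
      next-day long-position VaR at level alpha."""
      var_forecasts = []
      for i in range(n_out):
          window = returns[: len(returns) - n_out + i]
          res = arch_model(window, vol="GARCH", p=1, q=1,
                           dist="normal").fit(disp="off")
          f = res.forecast(horizon=1)
          mu = f.mean.values[-1, 0]
          sigma = np.sqrt(f.variance.values[-1, 0])
          var_forecasts.append(mu + norm.ppf(alpha) * sigma)
      return np.array(var_forecasts)

The resulting series of forecasts, together with the realized returns, feeds directly into the hit sequence and the LR_uc test of Section 3.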
5.2 VaR Performance
Table 4 reports the results of all models at the 95% and 99% confidence levels. Panel A shows that Normal-GARCH yields the highest VaR estimates, the lowest failure rate and the most volatile VaR estimates, and it passes the LRuc test. GED-GARCH yields smaller VaRs, a higher failure rate and less volatile VaR estimates, and it also passes the LRuc test. The four CAViaR estimates are less volatile than those of Normal-GARCH and GED-GARCH, and they also pass the LRuc test. For the WTI series, GED-GARCH performs best, although all models do a good job according to the LRuc test. For the Brent series, Normal-GARCH and Adaptive-CAViaR seem to outperform the other models, while the IGARCH-CAViaR model performs worst. Panel B shows a similar result for the WTI series: Normal-GARCH does the best job, followed by GED-GARCH, while the IGARCH-CAViaR model does the worst; the other three CAViaR models perform well, but not better than Normal-GARCH and GED-GARCH. For the Brent series we obtain quite a different result: Normal-GARCH performs badly, since it cannot pass the LRuc test at the 5% confidence level, whereas GED-GARCH, as well as the Symmetric Absolute
Table 3. Estimates and relevant statistics for the four CAViaR specifications
1% Value at Risk
           | SAV: WTI | SAV: Brent | AS: WTI | AS: Brent | Indirect GARCH: WTI | Indirect GARCH: Brent | Adaptive: WTI | Adaptive: Brent
Beta1      | 0.1314   | 0.2146     | -0.0242 | 0.0167    | 0.0154              | 0.2301                | 1.2082        | 1.1225
Std. error | 0.0417   | 0.0810     | 0.0259  | 0.0681    | 0.2438              | 0.2496                | 0.1061        | 0.1261
p-value    | 0.0008   | 0.0040     | 0.1756  | 0.4032    | 0.4749              | 0.1783                | 0.0000        | 0.0000
Beta2      | 0.9040   | 0.8688     | 0.9004  | 0.8674    | 0.8939              | 0.8505                |               |
Std. error | 0.0177   | 0.0336     | 0.0098  | 0.0307    | 0.0130              | 0.0147                |               |
p-value    | 0.0000   | 0.0000     | 0.0000  | 0.0000    | 0.0000              | 0.0000                |               |
Beta3      | 0.3952   | 0.4842     | 0.4762  | 0.5051    | 0.9507              | 1.2257                |               |
Std. error | 0.0933   | 0.1417     | 0.0433  | 0.1449    | 0.4233              | 3.0485                |               |
p-value    | 0.0000   | 0.0003     | 0.0000  | 0.0002    | 0.0124              | 0.3438                |               |
Beta4      |          |            | 0.3206  | 0.4495    |                     |                       |               |
Std. error |          |            | 0.0483  | 0.1384    |                     |                       |               |
p-value    |          |            | 0.0000  | 0.0006    |                     |                       |               |
RQ         | 355.40   | 326.31     | 351.14  | 328.01    | 362.92              | 321.12                | 441.70        | 396.70

5% Value at Risk
           | SAV: WTI | SAV: Brent | AS: WTI | AS: Brent | Indirect GARCH: WTI | Indirect GARCH: Brent | Adaptive: WTI | Adaptive: Brent
Beta1      | 0.1751   | 0.2251     | 0.0757  | 0.1373    | 0.4128              | 0.3378                | 0.5744        | 0.4439
Std. error | 0.0351   | 0.0737     | 0.0246  | 0.0858    | 0.1356              | 0.0991                | 0.0664        | 0.0534
p-value    | 0.0000   | 0.0011     | 0.0010  | 0.0548    | 0.0012              | 0.0003                | 0.0000        | 0.0000
Beta2      | 0.8886   | 0.8548     | 0.8985  | 0.8397    | 0.8880              | 0.8628                |               |
Std. error | 0.0184   | 0.0327     | 0.0130  | 0.0367    | 0.0121              | 0.0108                |               |
p-value    | 0.0000   | 0.0000     | 0.0000  | 0.0000    | 0.0000              | 0.0000                |               |
Beta3      | 0.1875   | 0.2499     | 0.2145  | 0.2365    | 0.2015              | 0.3088                |               |
Std. error | 0.0316   | 0.0297     | 0.0296  | 0.0519    | 0.0402              | 0.0484                |               |
p-value    | 0.0000   | 0.0000     | 0.0000  | 0.0000    | 0.0000              | 0.0000                |               |
Beta4      |          |            | 0.1356  | 0.2854    |                     |                       |               |
Std. error |          |            | 0.0271  | 0.0415    |                     |                       |               |
p-value    |          |            | 0.0000  | 0.0000    |                     |                       |               |
RQ         | 1147.70  | 1062.63    | 1159.89 | 1150.73   | 1062.93             | 1067.12               | 1208.25       | 1120.98

Notes: SAV, AS and Adaptive denote the Symmetric Absolute Value, Asymmetric Slope and Adaptive CAViaR specifications of Section 2.
Table 4. Forecasting performance summary for the different VaR models

Panel A. 95% VaR confidence level
                       | Mean VaR | Std. VaR | Min      | Max     | Failure rate | LRuc
WTI   GARCH-Normal     | -3.7564  | 1.3784   | -12.1296 | -2.1530 | 0.046        | 0.3457 (0.5566)
      GARCH-GED        | -3.6985  | 1.2756   | -10.8460 | -2.1911 | 0.049        | 0.0212 (0.8843)
      CAViaR-SAV       | -3.6923  | 1.1736   | -9.7911  | -2.0761 | 0.047        | 0.1932 (0.6603)
      CAViaR-AS        | -3.7220  | 1.0929   | -9.4961  | -2.2764 | 0.046        | 0.3457 (0.5566)
      CAViaR-IGARCH    | -3.5189  | 1.0973   | -10.2667 | -2.3384 | 0.053        | 0.1860 (0.6663)
      CAViaR-Adaptive  | -3.5250  | 0.8638   | -7.6005  | -1.8695 | 0.058        | 1.2843 (0.2571)
Brent GARCH-Normal     | -3.5622  | 0.9543   | -8.3678  | -2.8126 | 0.054        | 0.3287 (0.5665)
      GARCH-GED        | -3.5514  | 0.9305   | -8.1599  | -2.0164 | 0.057        | 0.9889 (0.32)
      CAViaR-SAV       | -3.5497  | 0.8746   | -8.1720  | -1.8643 | 0.055        | 0.5105 (0.4749)
      CAViaR-AS        | -3.5007  | 0.8980   | -7.9556  | -1.8740 | 0.06         | 1.9842 (0.1589)
      CAViaR-IGARCH    | -3.4945  | 0.9178   | -8.3820  | -2.0322 | 0.063        | 3.2987 (0.0693)
      CAViaR-Adaptive  | -3.5166  | 0.6221   | -6.3496  | -2.5900 | 0.054        | 0.3287 (0.5665)

Panel B. 99% VaR confidence level
                       | Mean VaR | Std. VaR | Min      | Max     | Failure rate | LRuc
WTI   GARCH-Normal     | -5.3133  | 1.9687   | -17.1100 | -3.0171 | 0.011        | 0.0978 (0.7544)
      GARCH-GED        | -5.8162  | 2.0207   | -17.0137 | -3.4468 | 0.008        | 0.4437 (0.5102)
      CAViaR-SAV       | -6.5322  | 2.7565   | -20.8542 | -2.7212 | 0.006        | 1.8862 (0.1696)
      CAViaR-AS        | -6.6588  | 2.5434   | -19.8145 | -3.2760 | 0.006        | 1.8862 (0.1696)
      CAViaR-IGARCH    | -6.4797  | 2.7136   | -22.0355 | -3.0564 | 0.005        | 3.0937 (0.0786)
      CAViaR-Adaptive  | -5.4453  | 1.3609   | -9.9922  | -3.2774 | 0.014        | 1.4374 (0.2306)
Brent GARCH-Normal     | -5.038   | 1.3497   | -11.8348 | -2.8126 | 0.017        | 4.0910 (0.0431)
      GARCH-GED        | -5.5377  | 1.4509   | -12.7239 | -3.1443 | 0.01         | 0.0000 (1.0000)
      CAViaR-SAV       | -5.9187  | 1.8174   | -15.5547 | -2.3880 | 0.01         | 0.0000 (1.0000)
      CAViaR-AS        | -5.9665  | 1.7032   | -14.7304 | -2.7242 | 0.009        | 0.1045 (0.7465)
      CAViaR-IGARCH    | -6.0422  | 1.9195   | -16.1704 | -2.6831 | 0.009        | 0.1045 (0.7465)
      CAViaR-Adaptive  | -5.4562  | 1.1364   | -8.9353  | -3.1934 | 0.014        | 1.4374 (0.2306)

Note: Figures in parentheses are p-values of the LRuc test.
Value CAViaR model, deliver a perfect forecast: both failure rates are 1%, exactly equal to the pre-specified probability of violations. The other three CAViaR specifications also do a good job and clearly pass the LRuc test.
6 Conclusion
This paper uses the Conditional Autoregressive Value at Risk (CAViaR) model proposed by Engle and Manganelli (2004) to evaluate the value-at-risk of daily spot prices of Brent and West Texas Intermediate crude oil covering the period May 21, 1987 to November 18, 2008, and compares the accuracy of the CAViaR, Normal-GARCH and GED-GARCH estimates. The results show that all methods do a good job at the low confidence level (95%): GED-GARCH is the best for the spot WTI price, and Normal-GARCH and Adaptive-CAViaR are the best for the spot Brent price. At the high confidence level (99%), however, Normal-GARCH does a good job for spot WTI, while GED-GARCH and the four CAViaR specifications do well for the spot Brent price, for which Normal-GARCH performs badly. The results suggest that CAViaR performs as well as GED-GARCH, since CAViaR models the quantile autoregression directly, but it does not outperform GED-GARCH, although it does outperform Normal-GARCH.
References [1] Cabedo, J.D., Moya, I.: Estimating Oil Price “Value at Risk” Using the Historical Simulation Approach. Energy Economics 25, 239–253 (2003) [2] Caostello, A., Asem, E., Gardner, E.: Comparison of Historically Simulated Var: Evidence from Oil Prices. Energy Economics 30, 2154–2166 (2008) [3] Crouhy, M., Galai, D., Mark, R.: Risk Management. McGraw-Hill, New York (2001) [4] Engle, R.F., Manganelli, S.: CAViaR: Conditional Autoregressive Value at Risk by Regression Quantiles. Journal of Business and Economic Statistics 22(4), 367–381 (2004) [5] Fan, Y., Zhang, Y.J., Tsai, H.T., Wei, Y.M.: Estimating ’Value at Risk’ of Crude Oil Price and Its Spillover Effect Using the Ged-Garch Approach. Energy Economics (2008), doi:10.1016/j.eneco.2008.04.002 [6] Giot, P., Laurent, S.: Market Risk in Commodity Markets: A Var Approach. Energy Economics 25, 435–457 (2003) [7] Hung, J.C., Lee, M.C., Liu, H.C.: Estimation of Value-at-Risk for Energy Commodities Via Fat-Tailed Garch Models. Energy Economics 30, 1173–1191 (2008)
An Empirical Analysis of the Default Rate of Informal Lending—Evidence from Yiwu, China Wei Lu, Xiaobo Yu, Juan Du, and Feng Ji School of Management, University of Science and Technology of China, Hefei, China
Abstract. This study empirically analyzes the underlying factors contributing to the default rate of informal lending. This paper adopts snowball sampling interview to collect data and uses the logistic regression model to explore the specific factors. The results of these analyses validate the explanation of how the informal lending differs from the commercial loan. Factors that contribute to the default rate have particular attributes, while sharing some similarities with commercial bank or FICO credit scoring Index. Finally, our concluding remarks draw some inferences from empirical analysis and speculate as to what this may imply for the role of formal and informal financial sectors. Keywords: Informal Lending, Default Rate, Logistic Regression, FICO.1
1 Introduction Mark Schreiner (2000) defined informal finance as contracts or agreements conducted without reference or recourse to the legal system to exchange cash in the present for promises of cash in the future. Meanwhile, many scholars make a lot of illuminating research into the reasons why informal finance exists. Steel (1997) stated that, because of the relation of region, occupation and consanguinity, creditors have advantage in information about debtors’ credibility, and income so that the hazard of adverse select caused by asymmetric information can be eliminated or decreased. And they have information advantages in supervising the process of loan. Bell et al (1997) explained from the demand side that a parallel market structure may exhibit extensive rationing in the regulated segment, and hence the spillover of unsatisfied demand into the unregulated segment of the market is caused. Luc Tardieu pointed out that banks in the formal sector tended to rely on “hard” information for their lending decisions, such as books of accounts, ratios, financial business plan, etc. By contrast, informal sources value “soft” information: face-to-face relationships,confidential information, informal business plan, etc. Informal credit achieves a low default rate on loans to rural or the underprivileged population that are considered as highly risky clients by formal financial sectors. Many scholars have also done lots of innovative empirical studies about the low default in informal lending. Irfan Aleem found the default rate was only 1.5% to 2% in his study on the rural credit market in Pakistan. Ranjeet Ranade et al. found that the repayment rates of the comparatively poorer farmers were 1
FICO Score is a credit score developed by Fair Isaac & Co, www.myfico.com
better than that of the financially better-off farmers during the survey in Indian informal credit market. And they constructed a game theoretic model to show, in the face of asymmetric information, how the necessity to build trust had led to this behavior. Some Chinese scholars analyzed the low default rate in microfinance using the comparative and classification analysis methods as well as the micro-credit models.
2 The Outline of Yiwu Informal Lending
Yiwu is a small commercial city in the most prosperous part of eastern China, with a population of a little more than 700,000. In 2006 its GDP per capita reached 6,500 U.S. dollars, about three times the national average2, and it grew at a pace of 10 percent in five consecutive years. According to some unofficial surveys, the volume of grassroots capital has surprisingly passed ten billion. Most of the natives do business in different sectors ranging from foreign trade to real estate, and it is common for them to borrow money from relatives or friends for their operations. Usually there is no extra procedure in informal lending except a receipt that records the amount, the names of the creditor and debtor, the monthly interest and the deadline. As can be seen from Chart 1, the interest rate of informal lending was much higher than the benchmark rate, and it was on an upward trend as the central bank increased the benchmark rate. As reflected in Fig. 1, the default rate of more than 90% of the debts in our sample is less than 5%, which equals the ceiling on non-performing debt that state-owned commercial banks are required to observe. According to Fig. 2, the maturity of more than 85% of the debts in our sample is shorter than 6 months.

Fig. 1. Distribution of default rate [pie chart; categories: less than 3%, 3% to 5%, more than 5%]

Fig. 2. Distribution of maturity [pie chart; categories: less than three months, 3 to 6 months, more than 6 months]
The rest of the paper is organized as follows. In Part 3, the preliminary reasons why informal lending has a low default rate are presented. In Part 4, the underlying factors contributing to the low default rate are explored with the methodology of logistic regression. Finally, informal lending is compared with commercial bank lending and FICO.
3 Data Descriptions and Sampling From July-2006 to July-2007, we monitored 35 debts in Yiwu informal credit market. Additionally, we made a number of interviews with moneylenders. When seeking 2
Data Source: http://number.cnki.net
information on screening and risk evaluation, the lenders are likely to be better informed than the borrowers. However, much of the information we have about the lending activities is based on information from the demand side: the interest rate, loan size, collateral and even repayment are easy to obtain information on by asking the borrower, which also allows us to double-check the information obtained from lenders. Snowball sampling is an approach for locating information-rich key informants. Using this approach, a few potential respondents are contacted and asked whether they know of anybody with the characteristics being sought in the research. We use this approach to identify the resources within the community and to select the creditors and debtors best suited to the needs of our survey.

3.1 Default Comparisons between Informal Lending and Commercial Debt
The rate of non-performing debt should be controlled below the 5% benchmark3 according to the China Banking Regulatory Commission (CBRC) regulation. We assume the null hypothesis H0: the default rate in our sample is not lower than that of commercial debt, and the alternative hypothesis H1: the default rate in our sample is indeed lower than that of commercial debt:

  H0: μ ≥ 0.05,   H1: μ < 0.05.    (1)
A left-tailed test was conducted; the results are as follows.

Table 1. One-Sample Statistics
             | Number | Mean   | Std. Deviation | Std. Error Mean
Default Rate | 35     | 0.0268 | 0.02352        | 0.00398

Table 2. One-Sample T Test (α = 0.05)
             | T      | df | Sig.  | Mean Difference
Default Rate | -5.847 | 34 | 0.000 | -0.02325

NB: the 90% confidence interval of the difference is (-0.0300, -0.0165).
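The left-tailed one-sample t test above can be reproduced with SciPy as in the sketch below; observed_default_rates stands for the 35 sampled default rates and is not part of the paper, and the function name is illustrative.

  from scipy import stats

  def left_tailed_ttest(sample, mu0=0.05):
      """One-sample test of H0: mu >= mu0 against H1: mu < mu0."""
      t_stat, p_two = stats.ttest_1samp(sample, popmean=mu0)
      # halve the two-sided p-value for the left tail
      p_left = p_two / 2 if t_stat < 0 else 1 - p_two / 2
      return t_stat, p_left

  # e.g. left_tailed_ttest(observed_default_rates) should reproduce
  # T = -5.847 with a left-tail p-value of approximately 0.000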
,
,
We can know T =-5.847 Pd =0.000 Pl =0.0000 ∂ci
(2)
(3)
Then it is straightforward that overconfident investors (c=70%
intermediate
Low
30%30%
2.8-1.2
1.4-0.6
0.7-0.3
ε_t | Ω_{t−1} ~ N(0_{2×1}, H_t), and the conditional covariance matrix follows the bivariate BEKK(1,1) recursion

  H_t = C'C + A' ε_{t−1} ε'_{t−1} A + B' H_{t−1} B,

where C = (c11 c12; 0 c22) is upper triangular and A = (a11 a12; a21 a22), B = (b11 b12; b21 b22) are the 2×2 ARCH and GARCH coefficient matrices,
where α_ij represents the ARCH effect, or the short-run persistence of shocks, and β_ij represents the GARCH effect, or the contribution of such shocks to long-run persistence. If α_ij or β_ij with i ≠ j can be shown to be statistically significantly different from zero, there is a volatility spillover from country i to country j. Maximum likelihood estimation is used to estimate the parameters, and the Ljung-Box Q statistic is used to test the robustness of the BEKK-MGARCH model.
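A minimal numerical sketch of the bivariate BEKK(1,1) covariance recursion written out above, in Python with NumPy (the function name and the initialization are illustrative assumptions), is:

  import numpy as np

  def bekk_cov_path(eps, C, A, B, H0):
      """H_t = C'C + A' eps_{t-1} eps_{t-1}' A + B' H_{t-1} B.
      eps: (T, 2) residuals; C upper triangular; H0: initial 2x2 covariance."""
      T = eps.shape[0]
      H = np.empty((T, 2, 2))
      H[0] = H0
      CC = C.T @ C  # constant term
      for t in range(1, T):
          e = eps[t - 1].reshape(2, 1)
          H[t] = CC + A.T @ (e @ e.T) @ A + B.T @ H[t - 1] @ B
      return H

In estimation this recursion is embedded in the Gaussian log-likelihood, which is then maximized over the elements of C, A and B.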
3 Empirical Results
Univariate inspection of the data reveals that neither return series is normally distributed, as indicated by the significant Jarque-Bera test statistics. The crr of Russia (crr-Ru) displays a mild negative skewness, while the crr of Kazakhstan (crr-Ka) is positively skewed. The series also display significant leptokurtic behavior, as evidenced by large kurtosis relative to the Gaussian distribution. According to the LB Q(p)
statistics, there is autocorrelation in the return series at the indicated lags (5). The ADF statistics reveal that the two series are stationary. Table 1 shows the estimation results. As shown in Table 2, the standardized residuals do not exhibit significant serial correlation at the indicated lags (15); thus, the proposed model is an adequate representation for describing the dynamic interaction of country risk returns.

Table 1. The estimation of parameters

Effect     | ARCH                                   | GARCH
Parameter  | a11     | a21     | a12      | a22     | b11     | b21      | b12     | b22
Estimate   | 0.3337* | 1.1168* | -0.1264* | 0.2860* | 0.8475* | -0.1017* | 0.1648* | 0.3790*
Std. error | 0.0482  | 0.1409  | 0.0035   | 0.0057  | 0.0008  | 0.0025   | 0.0199  | 0.0153

Notes: *: significant at the 5% level.
Table 2. Adequacy of model: LB-Q tests for autocorrelation, Lag(15)

        | Residual: Statistic | p-value | Residual^2: Statistic | p-value
crr-Ru  | 22.108              | 0.15    | 10.780                | 0.77
crr-Ka  | 9.858               | 0.89    | 5.226                 | 0.99

Notes: Residual^2 means the squared residuals.
As shown in Table 1, there is bidirectional spillover between Russia and Kazakhstan. It can be concluded that the magnitude of the ARCH effect from Kazakhstan to Russia is stronger than that from Russia to Kazakhstan, while the magnitude of the GARCH effect is stronger from Russia to Kazakhstan. Shocks to crr-Ru have a negative influence on crr-Ka in the short run but a positive one in the long run, while shocks to crr-Ka have a positive influence on crr-Ru in the short run but a negative one in the long run. Since the two countries have opposite country risk settings, they are complementary choices for investment portfolio selection. For international investors interested in the petroleum industry in Russia and Kazakhstan, it is advisable to diversify investment and to adjust operational strategy flexibly according to country risk changes in the two countries.
4 Conclusions This paper attempts to quantitatively investigate the interaction of country risk between countries using BEKK model to capture the dynamic spillover effects. Empirical results show that there are significant bidirectional country risk spillover effects with the asymmetrical volatility between Russia and Kazakhstan, and short-term pattern of volatility spillover differs from the long-term pattern. For foreign entities who try to optimize investment or imports portfolio, these results provide useful policy implication: Russia and Kazakhstan are complementary choices.
Acknowledgments. This research is supported by the National Key Technologies R&D Program (NO.2006BAB08B01), from the Ministry of Science and Technology of P.R. China.
References [1] Kalyuzhnova, Y., Nygaard, C.: State governance evolution in resource-rich transition economies: An application to Russia and Kazakhstan. Energy Policy 36, 1829–1842 (2008) [2] Mitchell, J.V.: A new political economy of oil. The Quarterly Review of Economics and Finance 42, 251–272 (2002) [3] Breunig, R.V., Chia, T.C.: Sovereign ratings and oil-producing countries: Have sovereign ratings run ahead of fundamentals? (2008), http://ssrn.com/abstract=1138494 [4] Spanjer, A.: Russian gas price reform and the EU–Russia gas relationship: Incentives, consequences and European security of supply. Energy Policy 35, 2889–2898 (2007) [5] Zhang, Y.J., Fan, Y., Tsai, H.T., Wei, Y.M.: Spillover effect of US dollar exchange rate on oil prices. Journal of Policy Modeling 30, 973–991 (2008) [6] Hoti, S.: Small island tourism economies and country risk ratings. Mathematics and Computers in Simulation 68, 553–566 (2005) [7] Hoti, S., McAleer, M., Pauwels, L.L.: Modelling international tourism and country risk spillovers for Cyprus and Malta. Tourism Management 28, 1472–1484 (2007) [8] Benítez, P.C., McCallum, I., Obersteiner, M., Yamagata, Y.: Global potential for carbon sequestration: Geographical distribution, country risk and policy implications. Ecological Economics 60, 572–583 (2007)
Modeling the Key Risk Factors to Project Success: A SEM Correlation Analysis Juan Song1,2, Jianping Li1,*, and Dengsheng Wu1,2 1
Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China 2 Graduate University of Chinese Academy of Sciences, Beijing 100039, China
[email protected],
[email protected],
[email protected] Abstract. Researchers have put forward a number of project risk factors in different levels or dimensions. However, they mostly focus on the assessment of a single factor’s effect on project performance, neglecting of the relationship between factors and risk factors’ impact on project result due to the relationships. Taking domestic construction projects as an example and using structural equation model, this study analyses the correlation between risk factors and risk factors’ effect on project success. This study intends to help project managers better understanding of domestic projects risk characteristics, and provide positive recommendations for project management effectively. Keywords: project success, key risk factors, correlation analysis.
1 Introduction Researchers have always attached great importance to risk management. From their point of view, researchers put forward a number of project risk factors in different levels or dimensions. Boehm [1] provided the lists of the top risk factors in software projects. Shen et al. [2] identified 58 risks associated with Sino-Foreign construction joint ventures. However, they mostly focused on the assessment of a single factor’s impact on project performance, neglecting of relationship between factors and systematical impact on project result due to the relationships. That is not conducive to the theoretical study and the practice. On the basis of the review of international research literature, taking domestic construction projects as an example and using structural equation model, this study analyses the correlation between risk factors and its effect on project success. The study is structured as follows: In the next section, the literature is reviewed and reassessed. Following this, research method is presented. Thirdly, empirical study result and the main findings are presented. Fourthly, the comparison of the empirical result with the literature is analyzed. Finally, implications and limitations of this study are discussed and conclusions are drawn on. *
Corresponding author.
2 Literature Review The most common definition of project risk is in terms of exposure to specific factors that present a threat to achieving the expected outcomes of a project. But there is another definition in terms of uncertainty – such as “the chance of something happening that will have an impact on objective”[3].Some researchers tend to emphasize the two-edged nature of risks, such as “a threat and a challenge” [4], “the chance of something happening that will have an impact on objectives; may have a positive or negative impact” [3].The definition of risk used in this study is: A risk is the key factor to project success, where effective management of these factors will contribute to project success, and if not will lead to project failure. Lists of the key risks or success factors in projects are common in the literature. This study summarizes four international journals from 1990 to 2007 ---Journal of Information System Management, International Journal of Project Management, Construction Management and Economics, Project Management Journal, where the total fifty-six papers related to key factors or risks to project success or project failure were analyzed. There are thirty-one papers for IT projects and twenty-five papers for construction projects in the total fifty-six papers. Twenty-nine items of key risks or factors are summarized. Table 1 shows top-10 key risks or factor in IT projects and construction projects according to the count of citations. Table 1. Top-10 key risks or factors in IT projects and construction projects Rank
Rank   Construction Projects                                              IT Projects
1      Project objective is or not clear                                  Senior management provide or not effective support for project
2      Senior management provide or not effective support for project     Project planning's rationality
3      Communication and feedback                                         Change management
4      Project manager competency                                         Project objective is or not clear
5      Project planning's rationality                                     Client involvement
6      Project team competency                                            Communication and feedback
7      Monitoring and controlling                                         Project team competency
8      Client involvement                                                 Technology is or isn't proven or familiar
9      Resource allocation                                                Resource allocation
10     Stockholder management                                             Organizational culture and structure
According to Table 1, it can be seen that there are similarities and differences between the key risks or factors to construction projects success and to IT projects success. The factors “Senior management provide or not effective support for project” ,“Project objective is or not clear”, “Communication and feedback”, “ Project planning’s rationality”, “ Project team competency”, “Client involvement” and “Resource allocation” are important for both types of projects. The difference is that construction projects more focus on these factors ---“Project manager competency”, “Monitoring
and controlling” and “Stockholder management” ,and IT projects more emphasize on these factors --- “Change management”, “Technology is or isn’t proven or familiar” and “Organizational culture and structure”.
3 Research Method 3.1 SEM Introduction Structural equation model is one of statistical analysis techniques and used to analyze the intrinsic relationship between the variables or the variables groups, which can not only effectively reduce the variable dimensions, but also deduce the direct effects and the indirect effects of the independent variables on the dependent variables. Structural equation model includes two main variables: latent variable and manifest variables. Latent variable, also known as component, factor or construct, can not be directly measured. Manifest variable, also known as observed variable or indicator, which can be directly observed and accurately measured. A latent variable often corresponds to a number of manifest variables, and can be seen as the abstract and generalization of its corresponding manifest variables. A typical structural equation model usually includes two main parts: measurement model and structural model. Measurement model describes the relationship of a latent variable with its manifest variables; structure model described the causal link between latent variables. In this paper, linear structural relationships (LISREL) version 8.70 was used analysis the relationship of risk factor with project success. 3.2 Questionnaire Investigation The data was gathered using structure questionnaires. The domestic construction contractor’s project managers were investigated. The investigation is executed on-site during PMP (B class) certification training for contractor’s project managers. As these peoples have achieved international PM certification standards and had more rich experience in project management, the quality of the questionnaire responses is ensured. Using on-site investigation, on the one hand, ensures the recovery rate of questionnaires, on the other hand, facilitates the correct understanding about the true meaning of questionnaire and avoiding ambiguity. The determination of survey is divided into two stages, which are the pilot survey and the final survey. The preliminary list of key factors was presented on the basis of literature at home and abroad. This preliminary list was further refined and many item reworded through the pilot survey and interviews with academic experts and practicing professionals in the domestic construction industry. As a result of this process, a list comprising of five groups of risks and a total of twenty-seven risk factors (seeing Table 2) was finally formulated into a final questionnaire. The questions solicited evaluations on a 5-point scale. For example, the level of impacting of engineering technology on project success was determined by asking the respondent the following question:“According to your assessment, which level does engineering technology
impact on project success?" The answer was given on a scale from 1 (not at all) to 5 (extremely seriously).

Table 2. Key risk classification and key risk factors

Project itself risk (R1): Adapt to corporate strategy (R11); Feasibility study approved (R12); Project objectives are or not clear (R13); Project scope is or not uncertain (R14); Delivery time is or not realistic (R15)
Client risk (R2): Project bidding is or not normal (R21); Reasonable contract (R22); Sufficient funds (R23); Project management competency (R24); Project sponsor/champion (R25); Communication (R26)
Contractor risk (R3): Contractor competency (R31); Project planning (R32); Control and change (R33); Project team competency (R34); Communication (R35); Technology (R36)
Other parties risk (R4): Supervisor (R41); Designer (R42); Supplier (R43); Subcontractor (R44)
Project environment risk (R5): Government approval (R51); Economy stability (R52); Social stability (R53); Natural and geological conditions (R54); Laws and norms (R55); Industry competition (R56)
3.3 Data Analysis and Result According to the theoretical framework of structural equation model of and the risk classification and risk factors in Table 2, this paper proposed the model of the impacting of risk factor on construction project success, the result of which was presented in Figure 1. Fitting index of the model includes: χ2=725.35, DF =319, CFI = 0.65, RMSEA =0.08. Since these tests achieve or exceed the required fit criteria, the structural equation model should be accepted. From Figure 1, it can be seen, five categories of risks have significant impacts on project success. Contractor risk impacts on project success most significantly, followed by client risk, project itself risk, project environment risk and other parties risk. According to the path coefficient, the effect of each risk factor was calculated, top-10 of which have been presented in Table 3. “Sufficient funds” impacts on project success most significantly. Top-10 risk factors comprise of three contractor risk factors, three client risk factors, two project itself risk factors, one project environment risk factor and two other parties risk factors.
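As an illustration of how such a model can be specified and estimated outside LISREL, a heavily simplified sketch using the Python package semopy is shown below; the model description, the variable names (R11, R12, ..., success) and the data file are placeholders, not the authors' actual questionnaire data or specification.

```python
import pandas as pd
import semopy

# Simplified specification: two latent risk groups and their effect on project success
desc = """
R1 =~ R11 + R12 + R13 + R14 + R15
R2 =~ R21 + R22 + R23 + R24 + R25 + R26
success ~ R1 + R2
"""

data = pd.read_csv("questionnaire.csv")   # placeholder file with the 5-point ratings
model = semopy.Model(desc)
model.fit(data)
print(model.inspect())                    # path coefficients and standard errors
print(semopy.calc_stats(model))           # fit indices such as CFI and RMSEA
```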
Fig. 1. The model of the impact of risk factors on project success

Table 3. The effects of top-10 risk factors on project success

Rank   Risk factors                          Effects on project success
1      Sufficient funds                      0.789
2      Reasonable contract                   0.779
3      Economy stability                     0.765
4      Project bidding is or not normal      0.760
5      Delivery time is or not realistic     0.756
6      Supplier                              0.748
7      Contractor's communication            0.739
8      Project control and change            0.710
9      Contractor project team competency    0.710
10     Subcontractor                         0.697
Table 4 presents the correlation between five groups of project risks. In this matrix, the first value is parameter estimate (non-standardized), and the second is the standard error, the third is t-value. It is generally believed that parameter estimate is significant when t-value is greater than 2. Table 4 shows that all t-values are greater than 2, so there are significant correlations between five groups of risk. Risk does not exist independently, which related with other risk. These risks impact on project success as a whole system. It is concluded only managing risks systematically can improve project performance due to the correlations of risks. Contractor, client and other parties should work cooperatively to manage potential risks effectively and in time.
Table 4. The correlation of risk factors (each cell: parameter estimate, standard error in parentheses, t-value; diagonal entries equal 1)

Variable  R1                         R2                         R3                         R4
R2        0.9139 (0.0473) 19.3172
R3        0.8680 (0.0597) 14.5471    0.9103 (0.0447) 20.3793
R4        0.6043 (0.1257) 4.8096     0.7776 (0.0834) 9.3238     0.7202 (0.0972) 7.4133
R5        0.6886 (0.0974) 7.0732     0.8181 (0.0626) 13.0686    0.8553 (0.0520) 16.4518    0.7860 (0.0727) 10.8050
4 Comparison
Table 5 presents the top-10 risk factors of domestic construction projects (Table 3), international construction projects (Table 1) and domestic IT projects [5]. The top-10 risk factors of the domestic IT projects are cited from a paper published in the Chinese Journal of Management, whose authors, Liu et al., investigated the risk factors of one hundred and twenty-eight IT projects in 2007; thirty-five risk factors were classified, measured and assessed [5].

Table 5. The comparison of top-10 risk factors

International construction projects                           Domestic construction projects        Domestic IT projects
Project objective is or isn't clear                            Sufficient funds                      Integration with other information systems
Senior management provide effective support for project       Reasonable contract                   Estimation of project time
Communication and feedback                                     Economy stability                     Coordination between user's department
Project manager competency                                     Project bidding is or not normal      Conflict with corporate culture
Project planning's rationality                                 Delivery time is or not realistic     Technology complexity
Project team competency                                        Supplier                              Milestone schedule monitoring
Monitoring and control                                         Contractor's communication            Project change management
Client involvement                                             Project control and change            User support
Resource allocation                                            Contractor project team competency    Market need
Stockholder management                                         Subcontractor                         Project team training

From Table 5, it is evident that there are similarities between international construction projects and domestic construction projects in "Communication and feedback", "Project team competency", "Project planning's rationality" and "Stockholder management", whereas project objective and senior management support are more important in international construction projects, and sufficient funds and a reasonable contract are more important in domestic construction projects. It can also be seen that estimation of project time and project team competency are important risk factors of both domestic construction projects and domestic IT projects, but market need impacts IT projects significantly while economy stability is important to construction projects. Personnel and resources are important to construction projects, and technology is important to IT projects. From the above, it can be concluded that the similarities between domestic construction projects and international construction projects are greater than the similarities between domestic construction projects and domestic IT projects. It can be concluded tentatively that the type of project affects project risk identification.
5 Conclusion Managing project risks has been recognized as a very important process in order to achieve project success. This paper presents the research result obtained through questionnaire investigation conducted in China. A total of twenty-seven key risk factors were ascertained based on a subjective assessment of level of impact on project success. Using structural equation model, this paper analyses the correlation between five groups of risks and the influence of each risk factor on project success. The recognized risks are mainly related to contractors, followed by client risk, project itself risk, project environment risk and other parties risk, and there are significant correlations between five groups of risks. These risk factors were compared with the findings of a parallel survey in domestic IT projects to ascertain the generic risk factors in both types of project and highlight the unique risk factors associated with domestic construction projects. In addition, these risk factors were compared with the key risk factors of international construction projects which were summarized on the basis of the international literature. This study only reflects the contractor's point of view. The investigation related with client, designer, supervisor and other parties involved should been carried out to obtain a more comprehensive point of view. Moreover, in future research, we will focus on the risk factors in the process of trustworthy software development. Acknowledgments.This research has been supported by a grant from National Natural Science Foundation of China (#90718042).
References
[1] Boehm, B.W.: Software risk management: principles and practices. IEEE Software 8, 32–41 (1991)
[2] Shen, L.Y., Wu, G.W.C., Ng, C.S.K.: Risk assessment for construction joint ventures in China. Journal of Construction Engineering and Management 127, 76–81 (2001)
[3] Standards Australia: Risk Management AS/NZS 4360. Standards Association of Australia, Strathfield (2004)
[4] Flanagan, R., Norman, G.: Risk Management and Construction. Blackwell Science Pty Ltd, Victoria, Australia (1993)
[5] Liu, S., Zhang, J., Chen, T., Cong, G.: Risk assessment and avoidance strategies for IT projects in enterprises. Chinese Journal of Management 5, 498–506 (2008)
Research on R&D Project Risk Management Model Xiaoyan Gu1,2, Chen Cai1,*, Hao Song1,2, and Juan Song1,2 1
Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China 2 Graduate University of Chinese Academy of Sciences, Beijing 100049, China
[email protected],
[email protected],
[email protected],
[email protected] Abstract. R&D project is an exploratory high-risk investment activity and has potential management flexibility. In R&D project risk management process, it is hard to quantify risk with very little past information available. This paper introduces quality function deployment and real option in traditional project risk management process. Through waterfall decomposition mode, R&D project risk management process is constructed step by step; through real option, the managerial flexibility inherent in R&D project can be modeled. In the paper, first of all, according to the relation matrix between R&D project success factors and risk indexes, risk priority list can be obtained. Then, risk features of various stages are analyzed. Finally, real options are embedded into various stages of R&D project by the risk features. In order to effectively manage R&D risk in a dynamic cycle, the steps above should be carried out repeatedly. Keywords: R&D project, risk management, quality function deployment, real option.
1 Introduction Tools and techniques for project risk management have been developed to assist project managers in past decades, such as venture evaluation review technique, probabilistic risk assessment, Bayesian risk analysis, fault tree analysis, event tree analysis, and so on. The underlying assumption in most of these tools and techniques is that past information is available regarding both the risk probability and risk impact. For R&D project, the distinctive feature is that it is an exploratory high-risk investment activity, with very little or no relevant previous experience and past data [1]. Therefore the traditional risk management methods based on statistical theory are not appropriate, it is essential to find a new way to manage R&D project risk. However, there was limited research in R&D project risk management. Asher Tishler [2] presented a two-stage model describing the optimal choice of R&D risk among R&D programs with the same expected outcome. Heikoa.Gerlach [3] considered R&D project risk in the term of project selection, but didn’t analyze R&D risk concretely. Sary Regev [4] proposed a R&D project risk management method based on knowledge gap, which includes robust function and opportunity function. However, it is difficult *
Corresponding author.
to construct and quantify the two functions in specific project. In this paper, we describe an approach for R&D project risk management based on quality function deployment and real option. This method can avoid calculating risk probability and risk impacts and take into account potential management flexibility in R&D project.
2 Quality Function Deployment and R&D Real Option (1) Quality function deployment Quality function deployment is a complex system with the function of input, output and process. Waterfall decomposition mode is its overall implementation philosophy, and the House of Quality (HOQ) is its essence which is an extended bivariate matrix. Through waterfall decomposition mode and house of quality, user demands can be decomposed, and then deployed into concretely technical requirements and quality control requirements [5, 6]. (2) R&D real option The advantage of real option is that the managerial flexibility inherent in R&D projects can be modeled. The main options in R&D investment include continue option, expansion option, switch option, abandonment option, and deferral option. The continue option is a standard European call option, allowing the manager to proceed to the next project stage, The expansion option allows increasing benefits through an additional investment once the original project was terminated. The deferral option gives the owner the right to defer a decision until a later point in time. The switch option allows the manager to choose between continuing with the next stage right away or to do some more research in the current stage. The abandonment option represents the possibility of abandoning the project [7].
3 R&D Project Risk Management Model In general, project risk management process includes: risk identification, risk analysis, risk response and risk control. In the risk identification stage, project managers should use perceptual knowledge and experiments combined with some statistic methods to identify and classify potential risks. In the risk analysis stage, it is necessary to objectively analyze the probability of risk occurrence and assess the impact, then sort risks and form risk priority list. In the risk response stage, for high priority risks, main risk response strategies should be selected and implemented. Risk control is continuing to carry out risk management activities in entire project life cycle [8, 9, 10]. In the paper, the basic idea of QFD is embedded in the process of R&D project risk management. Through waterfall structure and HOQ, the result of previous step can be transferred to next step; ultimately risk response strategies can be obtained. In every risk management phase, we establish HOQ respectively. First, we connect R&D success factors with R&D risk indexes, and acquire risk index priority list. Second, we establish the relation matrix between risk indexes and R&D project stages. Finally, we decide how to deal with R&D project risks according to real options. In the paper, the first HOQ is named as risk identification HOQ, the second HOQ named as risk analysis HOQ, the third HOQ named as risk response HOQ. R&D project risk management model is shown in Fig 1.
Fig. 1. R&D project risk management model (flow from risk identification through risk analysis to risk response and risk control, linking R&D project success factors, risk indexes, project stages and real options via relation matrices)
3.1 Risk Identification In the risk identification phase, first of all, we should define R&D project success factors and connect R&D project success factors with risk indexes. According to the relation matrix between R&D project success factors and risk indexes, we can remove the risk indexes from the preliminary risk index system which have little relationship with R&D project success factors and establish a better risk evaluation index system. In this stage, it is important to acquire R&D project success factor scores. Success factor score is the weighted average score of different experts. Score is determined by 5,3,1,0. The value of 5, 3, 1 or 0 is used to be on behalf of strong influence, moderate, weak, and no impact respectively. 3.2 Risk Analysis For R&D project, risk analysis involves two sub phases, the former sub phase is intended to obtain risk index priority list, and the latter sub phase is intended to get the risk feature of individual stage in R&D project. (1) the former sub phase of risk analysis In order to obtain risk index priority list, traditional risk management methods depend on risk probability and risk impact, while our method is on the basis of HOQ. The relative importance of success factors can be used as the importance of risk indexes. Through calculating the value of risk importance according to the relation matrix between R&D project success factors and risk indexes in risk identification HOQ, risk index priority list can be acquired. The greater the risk importance value is, the higher the risk priority is. The value of risk importance can be explained by the following equation:
$$S = A^{T}B, \qquad A^{T} = (\alpha_1\ \alpha_2\ \cdots\ \alpha_m)$$

where
A -- the matrix of project success factors
B -- the relation matrix between risk indexes and success factors
S -- the matrix of risk importance values
α i --the importance of R&D project success factor m --the number of R&D project success factors (2) the latter sub phase of risk analysis A typical R&D project can be divided into four stages: basic research stage, prototype design stage, product development stage, market-oriented stage. In the first stage, the product concept is defined. In the second stage, the prototype is produced. In the third stage, products are manufactured. In the fourth stage, products are put into market. Through constructing the relation matrix between risks and stages, the main risks of every stage can be acquired. Then, the risk feature of every stage can be decided. 3.3 Risk Response In the risk response phase, through the risk feature of every stage, risk managers can determine which real options should be embedded into various stages. The real options dealing with risks include continue option, expansion option, switch option, abandonment option, and deferral option. 3.4 Risk Control Risk control plays an important role in R&D project implementation, such as supervision and control, tracking project implementation stages, adjusting risk priority list, remodeling corresponding risk management strategy, and making R&D project risk management be in a dynamic cycle. In order to effectively control risks, it is necessary to repeat the steps above.
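As a small illustration of the prioritization step in the first sub-phase of risk analysis (Section 3.2), a NumPy sketch with made-up scores (not the example data of Section 4) might look as follows:

```python
import numpy as np

# a: importance scores of m success factors; B: m x q relation matrix with scores 5/3/1/0
a = np.array([5, 3, 5, 1])                     # placeholder success-factor scores
B = np.array([[3, 1, 5],
              [3, 3, 0],
              [1, 1, 3],
              [3, 0, 1]])                      # placeholder relation matrix
S = a @ B                                      # risk importance values, S = A^T B
priority = np.argsort(-S)                      # risk indexes ordered by importance, highest first
print(S, priority)
```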
4 Example In this section, we illustrate R&D project risk management process through constructing a specific example of electronic product. In the example, R&D project risk management index system is shown in table 1. First, R&D project success factors are inputted into the matrix of risk identification HOQ. In this example, in order to calculate more easily, the relative importance calculation process of R&D project success factors and risk indexes revising process are not listed, the result is given as shown in table 2. The next, through calculating and sorting the value of risk importance according to the relation matrix between success factors and risk indexes, we can obtain risk indexes priority list. All data of relation matrix are obtained by expert scoring. Then, risk index priority list is taken into risk analysis house of quality. The sigh # demonstrates the relationship between risk indexes and R&D project stages. It is shown in table3. In the phase, managers should construct the relation matrix between risk indexes and R&D project stages. Then, the main risks in every stage can be obtained. In the example, the risk feature of every stage can be decided by top10 risks. The results are as follows: in the basic research stage, the risk feature is characterized by β11 , β12 , β 23 , β 33 ; in the prototype stage, the risk feature is characterized by β12 , β13 , β 22 , β 23 , β 31 , β 33 ; in the
product development stage, the risk feature is characterized by β23 and β33; and in the market-oriented stage, by β41 and β42.

Table 1. R&D project risk management index system

R&D project success factors: α1 technology innovation, α2 high quality, α3 compatibility, α4 durability, α5 easy to manufacture, α6 easy to maintain, α7 easy to upgrade, α8 cost within budget, α9 completed on time, α10 core competence, α11 profit
Risk indexes: β1 technology risk (β11 technical difficulty, β12 product complexity, β13 demand change); β2 sustentation risk (β21 material, β22 core members outflow, β23 staff skill, β24 device); β3 management risk (β31 communication risk, β32 coordination ability, β33 fund); β4 environment risk (β41 market demand, β42 market competition, β43 social impact, β44 legal risk)
R&D project stages: χ1 basic research stage, χ2 prototype design stage, χ3 product development stage, χ4 market-oriented stage
R&D real options: ε1 continue, ε2 expansion, ε3 switch, ε4 abandonment, ε5 deferral

Table 2. Risk identification house of quality
success factor
score
α1 α2 α3 α4 α5 α6 α7 α8 α9 α 10 α 11
β1
β2
β3
β4
β11 β12
β13
β 21 β 22 β 23 β 24
β 31 β 32 β 33
β 41
5
3
5
1
1
3
3
1
1
0
3
3
5
1
1
3
3
3
0
3
3
1
1
1
1
1
3
3
0
0
1
3
1
1
0
1
1
1
1
3
1
1
3
1
3
1
3
1
0
1
1
0
0
1
3
1
3
1
1
0
3
1
0
0
1
3
1
3
1
1
0
1
1
0
0
3
1
1
1
1
1
1
0
0
0
0
3
3
1
1
3
1
1
0
1
1
1
0
1
1
1
3
3
1
0
5
1
1
3
1
1
1
1
3
3
3
1
1
1
0
5
5
3
3
0
3
3
3
3
3
3
3
1
0
3
3
1
1
1
0
5
3
1
3
1
3
3
3
1
1
5
β 42 β 43
β 44
0
0
1
3
1
1
0
1
1
1
5
3
1
1
total
72
65
47
44
81
62
41
60
53
67
103
93
26
34
order
4
6
10
11
3
7
12
8
9
5
1
2
14
13
Table 3. Risk analysis house of quality order
χ1
4
#
6
#
β11 β12 β13 β 21 β22 β 23 β 24 β31 β32 β33 β 41 β 42 β 43 β 44
10
χ2
# #
3
# #
#
12
# #
8
#
9 5
χ4
#
11 7
χ3
# #
#
1
#
2
#
14
#
13
#
Finally, the real options in every stage can be decided through the respective risk feature. It is shown in Table 4.

Table 4. Risk response house of quality

       ε1    ε2    ε3    ε4    ε5
χ1                              #
χ2     #           #
χ3           #           #
χ4                              #
In Table 4, the sign # indicates the relationship between R&D project stages and real options. Through the analysis, we obtain the following results: in the first stage, the deferral option should be embedded; in the second stage, the continue option and the switch option should be embedded; in the third stage, the expansion option and the abandonment option should be embedded; in the fourth stage, the deferral option should be embedded.
5 Conclusion In the paper, we combine project risk management process with quality function deployment model and real option, and establish R&D project risk management model. First, through risk identification house of quality, the relation matrix between R&D project success factors and risk indexes can be established. It is helpful to form more reasonable R&D risk evaluation index system. Meanwhile, according to the relation
matrix calculation results, risk indexes priority list can be acquired. Then, through risk analysis house of quality, the relationship between risk indexes and R&D project stages can be decided, and the risk feature of every stage can be inferred. Finally, through risk response house of quality, the real options are embedded into various stages by the risk features. In order to effectively manage R&D project risk in a dynamic cycle, we should repeat the steps above. The introduction of quality function deployment model and real option can deal with difficulties of R&D project risk management, which gets rid of calculating risk probability and risk impacts and involves potential management flexibility in R&D. Thus, R&D project risk management can be effectively carried out. In practice, we can try to use this method to manage main risk in the trustworthiness software process. Acknowledgments. This paper is supported by the National Science Foundation of China under Grant No.90718042 and by the Young Talent Front Project of Knowledge Innovation Program in Chinese Academy of Sciences under Grant No.0700561C01.
References
[1] Feng, G., Wu, C.: Risk management of critical R&D project: research overview. Research and Development Management 17(12), 1–5 (2005)
[2] Tishler, A.: How risky should an R&D program be? Economics Letters 99(5), 268–271 (2008)
[3] Gerlach, H., Rønde, T., Stahl, K.: Project choice and risk in R&D. The Journal of Industrial Economics 51(3), 53–81 (2005)
[4] Regev, S., Shtub, A., Ben-Haim, Y.: Manage project risk as knowledge gaps. Project Management Journal 12, 17–25 (2006)
[5] Guinta, L.R., Praizler, N.C.: The QFD Book: The Team Approach to Solving Problems and Satisfying Customers through Quality Function Deployment. AMACOM Books, New York (1993)
[6] Kyeong, K.J., Han, H.C., Chui, S.H., et al.: A knowledge based approach to the quality function deployment. The 23rd International Conference on Computers and Industrial Engineering 35(10), 233–236 (1998)
[7] Schneider, M.: Making real options work for practitioners: a generic model for valuing R&D projects. R&D Management 1, 85–106 (2008)
[8] Tao, L., Li, Z.: Project Management. China Renmin University Press, Beijing (2005)
[9] Ward, S.: Requirements for an effective project risk management process. Project Management Journal 9, 37–43 (1999)
[10] The PMI Standards Committee: A Guide to the Project Management Body of Knowledge. Project Management Institute, Philadelphia (2000)
Software Risks Correlation Analysis Using Meta-analysis Hao Song1,2,3, Chen Cai1,*, Minglu Li1,2, and Dengsheng Wu1,2 1
Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190, China 2 Graduate University of Chinese Academy of Sciences, Beijing 100039, China 3 School of Statistics and Mathematics, Shandong Economics University, Jinan 250017, China
[email protected],
[email protected],
[email protected],
[email protected] Abstract. Software risk identification is the first and critical activity adopted in software risk management process. Checklist, as a low cost and easy-to-operate method, is hard to get a comprehensive view of the whole kinds of risks due to culture diversity and location restriction. After getting a synthetic risk list by examining a series of checklists, we formed a legible and integral impression of the frequency and importance of the whole risks with the help of the meta-analysis framework. The top ten risks were listed compared with Boehm’s and SPSS was used to analyze the correlations between different categories and major risks. Keywords: Software risk, meta-analysis, correlation analysis.
1 Introduction Software development is a process full of all kinds of risks and uncertainties. The Standish Group CHAOS Report in 2007 revealed 35 percent of software projects started in 2006 completed successfully, meaning they were on time, on budget and met user requirements. While 46 percent were challenged because they had time or cost overruns or couldn’t totally meet the user requirements. The situation in China perhaps is more serious for most of the enterprises are small and CMMI level are not very high. The National Natural Science Foundation of China set up “The major research plan of trustworthy software” in 2007 to face the challenge of software project and promote the development of the software enterprises in our country. Boehm divided software risk management into two steps which were risk assessment and risk control [1]. Risk assessment was compound of risk identification, risk analysis and risk prioritization. A modified checklist, an easy and low cost way for risk identifying, was put forward on the base of surveying several experienced project managers. It contained top ten software items which were treated as the most serious risks in 1991. From then on, researchers and software project managers became especially interested in checklist and brought forward different software checklists. Barki and Houstan conducted a comprehensive review of software risk literature and proposed *
Corresponding author.
an extended risk lists which were significant different from Boehm’s top ten risk list [2,3]. Other researchers like Jones, Tony, Keil etc concluded their own important software risks by surveying experts or analyzing questionnaires [4-8]. However, it is difficult to shape a global impression of all risks for the reason that some risks varies with various cultural environments and they are changing with technological process at the same time. Meta-analysis is the most famous and important tool in evidence-based medicine. It provides unbiased quantitative and statistical conclusion about some controversial and contradictory results. The framework of meta-analysis was taken in this study and two questions were focused on. What are the components of software development risk and what are the correlations between the different categories of software risk?
2 Meta-analysis 2.1 Meta-analysis Introduction A traditional meta-analysis starts with stating the question definitely and designing research plan. System and all-round literature review is one of the most significant characteristic of Meta-analysis in contrast to other method. Three approaches, computer-based retrieval, manual search and expert interview can be taken for searching the literatures and literature should be inspected for quality and correctness before statistical analysis is conducted. 2.2 Search Process The goal in this meta-analysis is to obtain all the software development risks mentioned in literature. After checking the literatures at hand, we noticed that the statement is varying with time. IS (Information System) was popular in 1990s, but rarely mentioned in recent years. A wide scope of keywords should cover all possible changes for fear of missing some important literature. Five search engines were chosen considering their important roles in software risk and relative software study. We set tiles, keywords, abstract etc as filter to search and identify the related material. Different languages exist in all engines and English was chosen as search language for its popularity and university. In fact, a few literatures discuss software risk as their direct or only purpose. They mention the risk factors as beginning of other work. And as a result most of the final literatures do not focus on risk research. But it doesn’t disturb our choice of treating them as a source to summarize risk factors. The final literatures consist of several kinds of software risks. The first kind of risks is drawn by surveying experienced experts and project managers like the eight checklists referred above. The second kind of risks is concluded by non-English literature review. We take them as representative for risks in their non-English speaking countries. The final literature set includes 23 selected papers ranging from 1988 to 2008. 2.3 Basic Statistical Analysis We choose eight researchers’ checklists as risk source, who are Barki, Jones, Tony, Keil, Houstan, Schmidt, Murthi and Wallace. In the first phase of the study, we
divided the risk items into six dimensions, which are requirements risk, user risk, developer risk, project management risk, development risk and environment risk. This is an obvious way to categorize the items and the final list consisted of 56 risk factors covering most of the common risks (the full checklist can be provided upon request).

Table 1. Keywords and search engines

Keywords: software risk; development risk; project risk; risk management; IT risk; IS risk; risk list; checklist
Search engines: Google Scholar (scholar.google.cn); Elsevier ScienceDirect (www.sciencedirect.com); IEEE/IEE Electronic Library (www.ieeexplore.ieee.org); Ei Engineering Village (www.engineeringvillage2.org.cn); ACM Digital Library (portal.acm.org/portal.cfm)
All risks were counted according to the above 6 categories checklist. Of the six categories, project management risks’ total times, 87, is the highest compared with the second position, requirements risks, 63 times and the third position, developer risks, 56 times (see figure 1. cumulative number). The total risk times of these three categories accounts for 64% of the whole times. Environment risks were mentioned the least times, 20 only.
Fig. 1. Cumulative number
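The counting behind Figure 1 can be reproduced with a few lines of pandas once the synthesized checklist is stored in long format (one row per risk item per source paper); the file and column names below are assumptions for illustration, not the authors' actual data set.

```python
import pandas as pd

# Long-format checklist: one row per (paper, risk item, category)
df = pd.read_csv("risk_checklist.csv")                    # placeholder file

category_counts = df["category"].value_counts()           # how often each category is cited
top_risks = df["risk_item"].value_counts().head(10)       # top-ten risks by citation count
n_papers = df["paper_id"].nunique()
probability = top_risks / n_papers                        # share of papers citing each risk
print(category_counts, probability, sep="\n")
```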
2.4 Top Ten Risks The risk “Lack of adequate knowledge and skills in project personnel” is mentioned almost in all literatures. The other four risks, “Lack of frozen requirements”, “Unclear
system requirements", "Inadequate cost estimating" and "Excessive schedule pressure" are the next four most often cited items. Table 2 shows the top ten risks ordered by citation count. The first one, "Lack of required knowledge and skills in the project personnel", is similar in expression to Boehm's "Personnel shortfalls", the first item in his top ten risk list. The other common risks are "Lack of frozen requirements", "Inadequate cost estimating" and "Excessive schedule pressure", while the other six risks are quite different from Boehm's top ten risk factors.

Table 2. Top ten risks
Risk name                                                     Category                   Probability
Lack of required knowledge/skills in the project personnel    Developer Risk             91.30%
Lack of frozen requirements                                   Requirements Risk          69.57%
Unclear system requirements                                   Requirements Risk          69.57%
Inadequate cost estimating                                    Project Management Risk    65.22%
Project leader's experience/availability                      Project Management Risk    60.87%
Incorrect system requirements                                 Requirements Risk          56.52%
Excessive schedule pressure                                   Project Management Risk    56.52%
Instability and lack of continuity in project staffing        Developer Risk             52.17%
Lack of an effective project management methodology           Project Management Risk    52.17%
3 Statistical Analysis 3.1 Cluster Analysis Using K-means cluster, we can get some reasonable rules about risk distribution. The categories’ frequency is chosen as a variable for our cluster analysis. And when number of clusters is 4, the distances between each cluster is the biggest. So we divide all literatures into 4 clusters. Cluster one has the most cases, 11 literatures (48%). And most of these literatures have moderate number of risks. 3.2 Correlation Analysis Are there some relations existing between these risks? Literal comprehension tells us the answer probably is yes. For example the attitude from users, “lack of cooperation, commitment and involvement”, may affect the quality of requirements and lead to some “bad” consequences, “incorrect, unclear and unstable requirements”. The properties of developer seem being closely connected with development process. The purpose of correlation analysis is to find out the mutual relations between the different categories.
Software Risks Correlation Analysis Using Meta-analysis
563
Table 3. Correlation coefficient matrix (Pearson correlation, two-tailed significance in parentheses)

                      Requirements      User             Developer         Project management    Development      Environment
Requirements          1                 0.366 (0.086)    0.097 (0.661)     0.429* (0.041)        0.273 (0.207)    -0.014 (0.948)
User                                    1                0.031 (0.887)     0.055 (0.804)         0.365 (0.087)    0.169 (0.442)
Developer                                                1                 0.541** (0.008)       0.262 (0.228)    0.435* (0.038)
Project management                                                         1                     0.296 (0.171)    0.201 (0.358)
Development                                                                                      1                0.157 (0.475)
Environment                                                                                                       1
*Correlation is significant at the 0.05 level (2-tailed). ** Correlation is significant at the 0.01 level (2-tailed).
From the above table we can see project management risks are correlated significantly with developer risks and requirements risks. The Pearson correlation coefficients are 0.541 and 0.429. The experience and skill of developer can affect the process of project management. More risks occur from developer, move risks occur from project management. And so the Pearson correlation coefficient is positive and significant at the 0.01 level (2-tailed). For the similar reason requirements risks are correlated positively and significantly with project management risks. The third significant coefficient (0.435) is between developer risks and environment risks. Some risks like “subcontractor”, “insufficient resource”, are under control of developers to much extent. The only negative correlation is between requirements and environment. But it’s not significant and we can’t tell requirements risks are affecting the environment risks conversely. There is no significant correlation in this experiment except these three ones. 3.3 Causal Diagram A similar statistical analysis proceeds with the major risks and the causal diagram based on analysis result and previous study provides a clearer impression of the relations between the risks. An arrow points to a risk, representing an effect and the risk at the other end represents a cause. Double arrows mean each risk at both bottoms is cause and effect. The sign on line is negative when the effect moves in the opposite direction of the cause and positive in the same direction.
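A correlation matrix of this kind, together with two-tailed p-values, can be computed directly from the category frequency table; the sketch below uses scipy and assumes a data frame with one column per risk category (hypothetical file and column names).

```python
import pandas as pd
from scipy import stats

freq = pd.read_csv("category_frequencies.csv")     # placeholder: one column per category
cols = freq.columns
corr = pd.DataFrame(index=cols, columns=cols, dtype=float)
pval = pd.DataFrame(index=cols, columns=cols, dtype=float)
for a in cols:
    for b in cols:
        r, p = stats.pearsonr(freq[a], freq[b])    # Pearson correlation and 2-tailed p-value
        corr.loc[a, b], pval.loc[a, b] = r, p
print(corr.round(3))
print(pval.round(3))
```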
Fig. 2. Causal diagram. The diagram links the major risks (failure to gain user commitment, need to integrate with other systems, incorrect requirements, project leader's experience, low morale, excessive schedule pressure, top management support, resource insufficiency, unclear requirements, inadequate cost estimation, inadequate configuration control, lack of an effective management methodology) with signed cause-effect arrows; * marks relations significant at the 0.05 level (2-tailed).
Statistical analysis proved our guesses to a large extent. But more data and experiments are needed to prove a more precise conclusion or model. This experiment is a meaningful attempt to search the relationships between the risks.
4 Conclusion The collected literatures show project management, requirements and developer risks occur more often than other categories. The top ten risks in the present paper are significantly different from Boehm’s because risks are changing with time and technology. Three significant positive correlations between project management and developer, project management and requirements, developer and environment are found through statistical analysis. No significant negative correlation is found. Acknowledgments. This research has been supported by a grant from National Natural Science Foundation of China (#90718042).
References
[1] Boehm, B.W.: Software risk management: principles and practices. IEEE Software 8, 32–41 (1991)
[2] Barki, H., Rivard, S., Talbot, J.: Toward an assessment of software development risk. Journal of Management Information Systems 10(2), 203–225 (1993)
[3] Houstan, D.X., Mackulak, G.T., Collofello, J.S.: Stochastic simulation of risk factor potential effects for software development risk management. The Journal of Systems and Software 59, 247–257 (2001)
[4] Jones, C.: Assessment and Control of Software Risks. Yourdon Press, Englewood Cliffs (1994)
[5] Moynihan, T.: How experienced project managers assess risk. IEEE Software 14, 35–41 (1997)
[6] Keil, M., Cule, P., Lyytinen, K., Schmidt, R.: A framework for identifying software project risk. Communications of the ACM 41(11), 76–83 (1998)
[7] Schmidt, R., Lyytinen, K., Keil, M., Cule, P.: Identifying software project risks: an international Delphi study. Journal of Management Information Systems 17(4), 5–36 (2001)
[8] Wallace, L., Keil, M., Rai, A.: Understanding software project risk: a cluster analysis. Information and Management 42, 115–125 (2004)
[9] Miller, J.: Applying meta-analytical procedures to software engineering experiments. The Journal of Systems and Software 54, 29–39 (2000)
A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment Jingli Liu1,2, Jianping Li2, Weixuan Xu2, and Yong Shi3 1
School of Management, University of Science and Technology of China Institute of Policy and Management, Chinese Academy of Sciences, China 3 Data Technology and Knowledge Economy of Chinese Academy of Sciences, China {manager,ljp,wxu}@casipm.ac.cn,
[email protected] 2
Abstract. Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM had excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principle component analysis (KPCA) and linear programming form of least square support vector machine. With this method sparseness and robustness is obtained while solving large dimensional and large scale database. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be a satisfactory one. Keywords: LS-SVM, Kernel principle component analysis, credit risk assessment.
1 Introduction Support vector machine was first introduced by Vapnik in his literature: The nature of Statistical Learning Theory [1] and Statistic Learning Theory [2]. In a paper named least squares support vector machine (LS-SVM) classifier [3], which is a modified version of SVM by changing the inequality constraints into equality constraints and substituting the -loss function by a sum of squared error loss function, the authors pointed out that LS-SVM had excellent generalization performance and low computational cost. Suykens, et al. [4] illustrated that the effective number of the parameters of LS-SVM is controlled by the regularization and the application of 2-norm reduces the sparseness of the SVM. Wei, et al. [5] employed a l 1 norm representation of object function to improve the sparseness and robustness of LS-SVM, which was called LS-SVM-LP. The solution of the objective function is based on basis pursuit (BP) which was first initialized by Chen, et al. [6]. The essential of the LS-SVM-LP is to modify the sum of the squared error function
$\frac{1}{2}\gamma \sum_{i=1}^{n} e_i^{2}$, a 2-norm, into the 1-norm $\|\beta\|_{1}$ and to assign each element of the sample
data matrix a corresponding Lagrange multiplier. In this paper, we propose a combined
approach of LS-SVM-LP with kernel principle component analysis (KPCA) to evaluate credit risk. KPCA has already been used in drawing principle components from the feature space and obtaining lower dimensional feature space. The paper is organized as follows. In section 2 and section 3, the basics of KPCA and LS-SVM-LP are briefly introduced respectively. In section 4, we present the twolayer LS-SVM algorithm and the numerical test result on a real credit card database of a U.S. commercial bank and a comparison with other methods. Conclusions including further works are given in section 5.
2 Background

2.1 Kernel Principle Component Analysis

Kernel Principle Component Analysis (KPCA), introduced by Bernhard, et al. [7], has been widely used in nonlinear feature extraction. In fact, KPCA is a linear form of PCA in a nonlinear feature space, obtained by first mapping the sample into a high-dimensional feature space. A detailed algorithm was given in [8]; we illustrate the algorithm below. Suppose we have a training set $S = \{x_1, x_2, \ldots, x_l\}$, where each $x_i \in \mathbb{R}^k$, $i = 1, 2, \ldots, l$, is of dimension $k$, and the corresponding set of mapped data points in the feature space is $\{\phi(x_i)\}$, $i = 1, 2, \ldots, l$.

1. Choose a kernel function $\kappa$.
2. Compute the kernel matrix $K_{ij} = \phi(x_i) \cdot \phi(x_j) = \kappa(x_i, x_j)$, $i, j = 1, 2, \ldots, l$.
3. Center the kernel matrix: $K \leftarrow K - \frac{1}{l}jj'K - \frac{1}{l}Kjj' + \frac{1}{l^{2}}(j'Kj)\,jj'$, where $j$ is the column vector whose $l$ elements are all 1.
4. Compute the eigenvalues and eigenvectors of $K$: $[V, \Lambda] = \mathrm{eig}(K)$.
5. Normalize the eigenvectors: $\alpha^{j} = \frac{1}{\sqrt{\lambda_j}}\,\nu_j$, $j = 1, 2, \ldots, k$.
6. Project the data: $\tilde{x}_i = \big(\sum_{i=1}^{l} \alpha_i^{j}\,\kappa(x_i, x)\big)_{j=1}^{k}$.
7. The transformed data set is $\tilde{S} = \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_l\}$.
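A minimal NumPy sketch of these steps, assuming an RBF kernel and the cumulative eigenvalue ratio rule discussed below, could look like this (illustrative only, not the authors' implementation):

```python
import numpy as np

def kpca(X, gamma=1.0, ratio=0.85):
    """Kernel PCA with RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    l = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)                          # kernel matrix
    J = np.ones((l, l)) / l
    K = K - J @ K - K @ J + J @ K @ J                # centering in feature space
    eigval, eigvec = np.linalg.eigh(K)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    p = np.searchsorted(np.cumsum(eigval) / eigval.sum(), ratio) + 1
    alphas = eigvec[:, :p] / np.sqrt(eigval[:p])     # normalized eigenvectors
    return K @ alphas                                # projected (transformed) data

X_new = kpca(np.random.default_rng(0).normal(size=(50, 8)))
```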
From the formulations above, we can see that KPCA is a useful tool for dimension reduction which adopts the same idea as that in PCA, that is, to choose the number of principle components according to the sum of the first p largest eigenvalues’ proportion to the total sum of the eigenvalues. Ordinarily we can choose the ratio to be 85% [13] or higher. 2.2 LS-SVM-LP In this section, we will briefly introduce the basic works of LS-SVM-LP model. Wei, et al. [5] proposed a modified version of LS-SVM which assigned each element of the
data matrix $A$ a Lagrange multiplier $\alpha_{ki}$. Given training data of $N$ points $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_k \in \mathbb{R}^n$ is the $k$-th input pattern and $y_k \in \mathbb{R}$ is the $k$-th output pattern, $k = 1, 2, \ldots, N$, LS-SVM-LP aims at constructing a classifier of the form

$$y(x) = \operatorname{sign}\Big[\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} \alpha_{ij}\, y_i\, k(x_{ij}, x_j)\Big) + b\Big],$$

where $\alpha_{ij}$ are the multipliers of the corresponding Lagrange function and $b$ is a real constant. Suppose the data matrix is $A = \{x_{ij}\}$, $i = 1, \ldots, N$, $j = 1, \ldots, n$; the proposed equality constraints in LS-SVM-LP are

$$y_k\big[w^{T}(i)\,\varphi(x_{ki}) + b\big] = 1 - e_k, \quad k = 1, \ldots, N,\; i = 1, 2, \ldots, n. \tag{3}$$

The corresponding Lagrange function is

$$L(w, b, e, \alpha) = J(w, b, e) - \sum_{k=1}^{N}\sum_{i=1}^{n} \alpha_{ki}\big\{ y_k\big(w^{T}(i)\varphi(x_{ki}) + b\big) + e_k - 1 \big\}. \tag{4}$$

By adopting the idea of the BP method [6], which changes the objective function into $\|\beta\|_1$ to obtain a sparse solution of the problem, the final program to be solved becomes

$$\min \|\beta\|_1 \tag{5}$$
$$\text{s.t.}\quad B \times \beta = C, \tag{6}$$

where

$$B = \begin{bmatrix} 0_{1\times 1} & Y^{T}_{1\times(N\times n)} \\ y_{N\times 1} & K_{N\times(N\times n)} \end{bmatrix}_{(N+1)\times(N\times n+1)}, \tag{7}$$

$y_{N\times 1}$ is the output pattern of the sample data, $0_{1\times 1}$ is the scalar 0, $Y^{T}_{1\times(N\times n)}$ consists of $y^{T}_{N\times 1}, \ldots, y^{T}_{N\times 1}$ repeated $n$ times, and $K_{N\times(N\times n)} = [K_1, \ldots, K_n]$, where

$$K_i = \begin{bmatrix}
y_1 y_1 k(x_{1i}, x_{1i}) + \tfrac{1}{\gamma} & y_1 y_2 k(x_{1i}, x_{2i}) & \cdots & y_1 y_N k(x_{1i}, x_{Ni}) \\
y_2 y_1 k(x_{2i}, x_{1i}) & y_2 y_2 k(x_{2i}, x_{2i}) + \tfrac{1}{\gamma} & \cdots & y_2 y_N k(x_{2i}, x_{Ni}) \\
\vdots & \vdots & \ddots & \vdots \\
y_N y_1 k(x_{Ni}, x_{1i}) & y_N y_2 k(x_{Ni}, x_{2i}) & \cdots & y_N y_N k(x_{Ni}, x_{Ni}) + \tfrac{1}{\gamma}
\end{bmatrix}, \quad i = 1, 2, \ldots, n, \tag{8}$$

$$k(x_{ij}, x_{kl}) = \varphi(x_{ij}) \cdot \varphi(x_{kl}), \quad i, k = 1, 2, \ldots, N,\; j, l = 1, 2, \ldots, n. \tag{9}$$

The solution $\beta$ of the linear system is $\beta = \begin{bmatrix} b \\ \alpha \end{bmatrix}$, where $\alpha = (\alpha_{11}, \ldots, \alpha_{N1}, \alpha_{12}, \ldots, \alpha_{N2}, \ldots, \alpha_{1n}, \ldots, \alpha_{Nn})^{T}$, and $C = \begin{bmatrix} 0 \\ l \end{bmatrix}$, where $l$ is the column vector with all elements equal to 1. The object of the program is to find the sparse solution of $\beta$ using the $l_1$ norm, which is equivalent to finding the sparse solution of $\|\alpha\|_1$. The final solution of this problem proved to be
robust and sparse, which is a good one.
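The linear program (5)-(6) is a standard basis-pursuit problem; as a hedged illustration (not the authors' implementation), it can be solved with scipy by splitting beta into positive and negative parts.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(B, C):
    """Solve min ||beta||_1 subject to B @ beta = C as a linear program."""
    m, n = B.shape
    c = np.ones(2 * n)                       # objective: sum of u + v, with beta = u - v
    A_eq = np.hstack([B, -B])                # B @ (u - v) = C
    res = linprog(c, A_eq=A_eq, b_eq=C, bounds=[(0, None)] * (2 * n), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# Tiny random example, a placeholder for the (N+1) x (N*n+1) system of the paper
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 12))
beta_true = np.zeros(12)
beta_true[[1, 7]] = [1.5, -2.0]
beta_hat = basis_pursuit(B, B @ beta_true)
```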
3 KPCA-LS-SVM-LP

When facing a high-dimensional database, we try to extract the most relevant features in feature space. One of the most efficient ways to lower the dimension of the feature space is kernel principal component analysis (KPCA). In order to obtain a sparse solution of the SVM and to lower the dimension of the input space, we combine KPCA and LS-SVM-LP. First, KPCA is used to obtain the number of principal components, which is smaller than the original data dimension. Then the Lagrange multipliers are obtained by LS-SVM-LP. The method therefore reduces the dimension of the input data and the computational complexity, while retaining the sparse and robust solution that is a good property of LS-SVM-LP. Hsu et al. [10] mentioned that, in order to make good use of SVM and obtain a satisfactory solution, the following procedure should be followed:

1. Transform the data to the format of the SVM software.
2. Conduct simple scaling on the data.
3. Consider the RBF kernel K(x, y) = exp(−‖x − y‖² / σ²).
4. Use cross-validation to find the best parameters σ² and γ.
5. Use the best parameters σ² and γ to train the whole training set.
6. Test.
We adopt their ideas in our algorithm design and illustrate the algorithm below.

3.1 Algorithm

First, scale the data into [−1, 1] or [0, 1]. Then, adopting the RBF kernel k(x_i, x_j) = exp(−‖x_i − x_j‖² / σ²), σ² > 0, a 5-fold cross-validation grid search is used to find the optimized σ² and penalty parameter γ. Finally, the optimized σ² and γ are used to train the whole training set and to evaluate the testing set. Below is our detailed algorithm for the two-layer LS-SVM-LP approach to classification.
1) Transform the data into the format of the Matlab software and normalize it.
2) Initialize the parameters σ² and γ, then use 5-fold cross-validation to select σ² and γ.
3) Use KPCA to obtain the p principal components.
4) Solve (5)-(6) by least squares linear programming to obtain the Lagrange multipliers α and the scalar b.
5) Use the classification function computed from equation (7) to classify the testing data.
6) Compute the three types of errors.
7) If the solution is not satisfactory, change the original σ² and γ, then return to 4).

3.2 Numerical Results and Analysis

In order to test the performance of the proposed method, we used a U.S. commercial bank credit card database consisting of 5000 samples and 65 attributes. There are two classes, good and bad, with 4285 and 815 creditors, respectively. To show the efficiency of the proposed method and to compare it with other methods, we compute three types of errors:

error  = (# misclassifications) / (# sample data)
error1 = (# misclassifications of good as bad) / (# sample data)
error2 = (# misclassifications of bad as good) / (# sample data)        (10)
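The seven steps above can be prototyped in a few lines. The sketch below uses scikit-learn's KernelPCA and an RBF-kernel SVC as a stand-in for the LS-SVM-LP solver (the SVC penalty C plays the role of γ, and gamma is taken as 1/σ²), and it assumes the labeling convention that good applicants are +1 and bad applicants are −1. It illustrates the workflow and the three error rates of Eq. (10); it is not the authors' Matlab implementation.

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def kpca_pipeline_errors(X_train, y_train, X_test, y_test, p=40):
    # 1) scale the data into [0, 1]
    scaler = MinMaxScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # 3) reduce to p kernel principal components with an RBF kernel
    kpca = KernelPCA(n_components=p, kernel="rbf").fit(X_train)
    Z_train, Z_test = kpca.transform(X_train), kpca.transform(X_test)

    # 2)/4) 5-fold grid search over sigma^2 (via gamma = 1/sigma^2) and the penalty
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"gamma": [1/50, 1/1000, 1/5000, 1/10000],
                         "C": [1, 5, 10, 50]},
                        cv=5)
    grid.fit(Z_train, y_train)

    # 5)/6) classify the testing data and compute the errors of Eq. (10)
    y_hat = grid.predict(Z_test)
    n = len(y_test)
    error  = np.mean(y_hat != y_test)
    error1 = np.sum((y_test == 1) & (y_hat == -1)) / n   # good classified as bad
    error2 = np.sum((y_test == -1) & (y_hat == 1)) / n   # bad classified as good
    return error, error1, error2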
The 5-fold cross-validation experimental results are listed in Table 1 and Table 2.

Table 1. Classification errors for different σ² and γ when P = 40

Fold #   σ²      γ    error   error1   error2   P
1        50      1    0.27    0.31     0.27     40
2        1000    10   0.22    0.36     0.16     40
3        5000    5    0.29    0.14     0.32     40
4        5000    10   0.19    0.29     0.14     40
5        10000   10   0.21    0.19     0.20     40
average               0.24    0.26     0.22     40

† P is the extracted number of principal components; σ² and γ are the optimized parameter values that give the lowest prediction error.
Table 2. Classification errors for different σ² and γ when P = 10

Fold #   σ²      γ     error   error1   error2   P
1        1       0.1   0.26    0.27     0.25     10
2        100     5     0.27    0.27     0.27     10
3        5000    10    0.29    0.28     0.29     10
4        10000   10    0.29    0.28     0.29     10
5        5000    50    0.21    0.33     0.18     10
average                0.26    0.29     0.26     10

† P is the extracted number of principal components; σ² and γ are the optimized parameter values that give the lowest prediction error.
From the tables above, we can see that the number of kernel principal components is smaller than the number of original data attributes (65). Several conclusions can be drawn from the tables. (1) The number of data attributes has been reduced while the accuracy remains good, which verifies that the method is efficient in dealing with high-dimensional data. (2) There is a trade-off between error1 and error2 when the extracted number of kernel principal components p is fixed: if error1 is lowered, error2 rises, and vice versa, so the final decision depends on the decision maker's preference. (3) When the extracted number of kernel principal components is fixed, the classification errors (the three errors defined above) do not change rapidly with different optimized kernel parameters σ² and γ. When p is fixed, the values of error and error2 gradually increase with the increase of γ, while a larger σ² yields a smaller value of error2. Smaller values of p yield higher values of all three types of errors, which may be caused by the loss of important information contained in attributes that are discarded during the application of KPCA.
Finally, we compare the empirical results with other models. The comparison is listed in Table 3.

Table 3. Comparison with other methods

Model            error1 (%)   error2 (%)   error (%)
MLCP             24.49        59.39        30.18
MCNP             49.03        17.18        43.84
Decision tree    47.91        17.3         42.92
Neural network   32.76        21.6         30.94
SVM-MK           24.25        17.24        23.22
KPCA-LS-SVM      24.00        26.00        22.00
According to the numerical tests, the proposed method performs well in terms of error and error1. It is therefore a suitable method for credit card risk assessment and can serve as one of the alternatives in the assessment toolbox.
4 Conclusions

This paper presents a new method, a combination of KPCA and LS-SVM-LP, for obtaining a sparse and robust solution of a credit risk assessment model that copes with high-dimensional data. The advantage of this method is that it can handle high-dimensional data and solve high-dimensional problems. The experimental results demonstrate that the proposed method is effective for credit risk evaluation. In this paper, the kernel parameters used in KPCA and LS-SVM-LP are not the same; the relationship between the two kernel parameters needs to be studied further.
Acknowledgements

We would like to thank Dr. Liwei Wei and Dr. Zhenyu Chen for their helpful discussions on this topic, for providing us with their latest Matlab code of LS-SVM-LP, and for advice on how to make it run effectively. This research has been partially sponsored by a grant from the National Natural Science Foundation of China (#70531040) and the 973 Project (#2004CB720103), Ministry of Science and Technology, China.
References

[1] Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
[2] Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
[3] Suykens, J.A.K., Vandewalle, J.: Least Squares Support Vector Machine Classifiers. Neural Processing Letters 9, 293–300 (1999)
[4] Van Gestel, T., Suykens, J.A.K., Baesens, B., et al.: Benchmarking least squares support vector machine classifiers. Machine Learning 54(1), 5–32 (2004)
[5] Wei, L., Chen, Z., Li, J., Xu, W.: Sparse and robust least squares support vector machine: a linear programming formulation. In: Proceedings of the 2007 IEEE International Conference on Grey Systems and Intelligent Services, China, November 18-20 (2007)
[6] Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Review 43, 129–159 (2001)
[7] Schölkopf, B., Smola, A.J., Müller, K.-R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 10(5), 1299–1319 (1998)
[8] Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
[9] Zhang, R., Fang, K.: An Introduction to Multivariate Statistical Analysis. Science Press (2003)
[10] Hsu, C.-W., Chang, C.-C., Lin, C.-J.: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Credit Risk Evaluation Using a C-Variable Least Squares Support Vector Classification Model Lean Yu1 , Shouyang Wang1 , and K.K. Lai2 1
Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China {yulean,sywang}@amss.ac.cn 2 Department of Management Sciences, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
[email protected]

Abstract. Credit risk evaluation is one of the most important issues in financial risk management. In this paper, a C-variable least squares support vector classification (C-VLSSVC) model is proposed for credit risk analysis. The main idea of this model is based on the prior knowledge that different classes may have different importance for modeling and that more weight should be given to those classes with more importance. The C-VLSSVC model can be constructed by a simple modification of the regularization parameter in LSSVC, whereby more weight is given to the least squares classification errors of important classes than to those of unimportant classes, while keeping the regularized term in its original form. For illustration purposes, real-world credit datasets are used to test the effectiveness of the C-VLSSVC model.
1 Introduction

The recent financial crisis has made more and more people aware of the importance of credit risk. As a result, credit risk evaluation has become a major focus for academic researchers and business practitioners. In the past decades, different hard-computing techniques, such as discriminant analysis (Altman, 1968), logit analysis (Wiginton, 1980), probit analysis (Grablowsky and Talley, 1981), mathematical programming (Glover, 1990), and k-nearest neighbor (KNN) classification (Henley and Hand, 1996), have been applied to credit risk evaluation. Because there is a nonlinear relationship between default probability and credit patterns, these hard-computing techniques did not generate good performance on credit risk evaluation tasks. For this reason, some emerging soft-computing techniques, such as artificial neural networks (ANN) (Yu et al., 2008a), evolutionary algorithms (EA) (Chen and Huang, 2003), and support vector machines (SVM) (Yu et al., 2008b), have also been used to evaluate credit risk. Empirical results have revealed that soft-computing techniques are advantageous over traditional hard-computing techniques in credit risk evaluation tasks because of their flexibility with respect to tolerable classification errors. Among these soft-computing techniques, SVM is reported to be the best one in many practical credit risk evaluation experiments (Yu et al., 2008b). In terms of classification and regression problems, SVM can be categorized into support vector classification (SVC) and support vector regression (SVR) (Vapnik, 1995). In this paper, the
SVC is used to judge whether a customer will default or not. However, the solution of standard SVC is obtained by solving a convex quadratic programming (QP) problem (Vapnik, 1995). An important shortcoming of QP is that it might lead to a high computational cost when a large-scale problem is computed. To avoid this shortcoming, the least squares support vector machine classifier (LSSVC for short), first proposed by Suykens and Vandewalle (1999), can be used, where the solution is obtained by solving a set of linear equations instead of a QP problem. In practical credit risk classification problems, the classification of bad customers is more important than the classification of good customers, because bad customers usually lead to direct economic losses for firms. Therefore, the prior knowledge that different classes may have different importance for modeling should be taken into account, and it is advisable to give more weight to those classes with more importance. In view of this idea, an innovative approach called C-variable least squares support vector classification (C-VLSSVC) is proposed, which uses a variable regularization parameter in LSSVC to classify faithful and delinquent customers. The C-variable LSSVC can be obtained by a simple modification of the regularization parameter C in LSSVC, whereby the data of important classes are penalized more heavily than the data of less important classes. The primary objective of this paper is to propose a new learning paradigm, C-variable LSSVC, that can significantly reduce the computational cost and improve the classification capability, and to examine whether the prior knowledge that the data of important classes should receive more weight than the data of less important classes can be utilized by LSSVC, especially in credit risk evaluation and classification. In addition, our model provides a solution to an unsolved issue of Zhou et al. (2009). The remainder of this paper is organized as follows. In Section 2, the least squares support vector classification (LSSVC) model is briefly reviewed. Section 3 presents the formulation of the C-variable LSSVC (C-VLSSVC) in detail. For further illustration, two typical credit datasets are used and the corresponding results are reported in Section 4. Finally, some concluding remarks are drawn in Section 5.
2 Least Squares Support Vector Classification (LSSVC)

Assume that there is a training dataset {x_i, y_i} (i = 1, 2, …, N), where x_i ∈ R^N is the i-th input pattern and y_i is its corresponding observed result, a binary variable. In credit risk evaluation models, x_i denotes the attributes of applicants or debtors and y_i is the observed outcome of repayment obligations: if the customer defaults, y_i = 1, otherwise y_i = −1. The SVM first maps the input data into a high-dimensional feature space through a mapping function φ(·) and finds the optimal separating hyperplane with minimal classification errors. The separating hyperplane can be represented as

z(x) = w^T φ(x) + b = 0        (1)

where w is the normal vector of the hyperplane and b is the bias, which is a scalar. Suppose that φ(·) is a nonlinear function that maps the input space into a higher-dimensional feature space. If the set is linearly separable in this feature space, the classifier should be constructed as

w^T φ(x_i) + b ≥ 1    if y_i = 1
w^T φ(x_i) + b ≤ −1   if y_i = −1        (2)

which is equivalent to

y_i (w^T φ(x_i) + b) ≥ 1,   i = 1, …, N        (3)
In order to deal with data that are not linearly separable, the previous analysis can be generalized by introducing nonnegative variables ξ_i ≥ 0, such that (3) is modified to

y_i [w^T φ(x_i) + b] ≥ 1 − ξ_i,   ξ_i ≥ 0,   i = 1, …, N        (4)

The nonnegative ξ_i in (4) are those for which the data point x_i does not satisfy (3). Thus the term Σ_{i=1}^{N} ξ_i can be considered a measure of the amount of misclassification, i.e., of the tolerable misclassification errors. According to the structural risk minimization (SRM), or margin maximization, principle, the risk bound is minimized by formulating the following optimization problem:

Minimize    J(w, b; ξ_i) = (1/2) w^T w + C Σ_{i=1}^{N} ξ_i
Subject to  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i,   ξ_i ≥ 0,   i = 1, …, N        (5)

where C is a regularization parameter controlling the trade-off between margin maximization and tolerable classification error. Searching for the optimal hyperplane in (5) is a quadratic programming (QP) problem (Vapnik, 1995). When a large-scale problem is computed, the QP may lead to a high computational cost. For this reason, Suykens and Vandewalle (1999) proposed a least squares version of support vector machines. In the least squares support vector classification (LSSVC) model, the following optimization problem is formulated:

Minimize    J(w, b; ξ_i) = (1/2) w^T w + (C/2) Σ_{i=1}^{N} ξ_i²
Subject to  y_i (w^T φ(x_i) + b) = 1 − ξ_i,   i = 1, …, N        (6)

Using (6), one can define a Lagrangian function:

L(w, b, ξ_i; α_i) = (1/2) w^T w + (C/2) Σ_{i=1}^{N} ξ_i² − Σ_{i=1}^{N} α_i [ y_i (w^T φ(x_i) + b) − 1 + ξ_i ]        (7)

where α_i is the i-th Lagrangian multiplier. The conditions for optimality can be obtained from (7):

∂L/∂w = 0    ⇒   w = Σ_{i=1}^{N} α_i y_i φ(x_i)
∂L/∂b = 0    ⇒   Σ_{i=1}^{N} α_i y_i = 0
∂L/∂ξ_i = 0  ⇒   ξ_i = α_i / C,   i = 1, 2, …, N
∂L/∂α_i = 0  ⇒   y_i [w^T φ(x_i) + b] − 1 + ξ_i = 0,   i = 1, 2, …, N        (8)

After elimination of w and ξ_i, the solution is given by the following set of linear equations:

Σ_{j=1}^{N} α_j y_i y_j φ(x_i)^T φ(x_j) + b y_i + α_i / C − 1 = 0,   i = 1, 2, …, N
Σ_{i=1}^{N} α_i y_i = 0        (9)
Using the Mercer condition, the kernel function can be defined as K(x_i, x_j) = φ(x_i)^T φ(x_j) for i, j = 1, 2, …, N. Typical kernel functions include the linear kernel K(x_i, x_j) = x_i^T x_j, the polynomial kernel K(x_i, x_j) = (x_i^T x_j + 1)^d, the Gaussian (RBF) kernel K(x_i, x_j) = exp(−‖x_i − x_j‖² / σ²), and the MLP kernel K(x_i, x_j) = tanh(β x_i^T x_j + θ), where d, σ, β and θ are kernel parameters specified by the user beforehand. Accordingly, in matrix form, the linear equations in (9) can be rewritten as

[ Ω     Y ] [ α ]     [ 1 ]
[ Y^T   0 ] [ b ]  =  [ 0 ]        (10)

where b is a scalar and Ω, Y, α and 1 are given by (11), (12), (13) and (14), respectively:

Ω = y_i y_j φ(x_i)^T φ(x_j) + (1/C) I        (11)
Y = (y_1, y_2, …, y_N)^T        (12)
α = (α_1, α_2, …, α_N)^T        (13)
1 = (1, 1, …, 1)^T        (14)

where I is the unit matrix. Since Ω in (11) is positive definite, the solution for the Lagrangian multipliers α can be obtained from (10), i.e.,

α = Ω^{-1} (1 − bY)        (15)

Substituting (15) into the second matrix equation in (10), we obtain

b = (Y^T Ω^{-1} 1) / (Y^T Ω^{-1} Y)        (16)

Here, since Ω is positive definite, Ω^{-1} is also positive definite. In addition, since Y is a nonzero vector, Y^T Ω^{-1} Y > 0, so b can always be obtained. Substituting (16) into (15), α can be obtained, and the solution for w follows from the first equation in (8). Using w and b, the separating hyperplane in (1) can be determined. One distinct advantage of LSSVC is that the optimal solution of (1) can be found by solving a set of linear equations instead of the quadratic programming (QP) problem used in standard SVC. Thus the computational cost can be reduced when large-scale problems need to be computed.
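A minimal sketch of the LSSVC training step described above: given a precomputed kernel matrix, it builds Ω as in (11) and recovers b and α from (15)-(16) by solving linear systems rather than forming the inverse of Ω explicitly. This is an illustration of the equations, not a reference implementation.

import numpy as np

def lssvc_fit(K, y, C):
    # K is the N x N kernel matrix K[i, j] = K(x_i, x_j); y is in {-1, +1}^N.
    N = len(y)
    Omega = np.outer(y, y) * K + np.eye(N) / C      # Eq. (11)
    ones = np.ones(N)
    Oi_1 = np.linalg.solve(Omega, ones)             # Omega^{-1} 1
    Oi_Y = np.linalg.solve(Omega, y)                # Omega^{-1} Y
    b = (y @ Oi_1) / (y @ Oi_Y)                     # Eq. (16)
    alpha = Oi_1 - b * Oi_Y                         # Eq. (15): Omega^{-1} (1 - bY)
    return alpha, b

def lssvc_decision(alpha, b, y, K_test_train):
    # sign( sum_i alpha_i y_i K(x_i, x) + b ) for the test points
    return np.sign(K_test_train @ (alpha * y) + b)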
3 C-Variable LSSVC (C-VLSSVC)

As mentioned in Section 2, the regularization parameter C in (5) and (6) determines the trade-off between the regularized term and the tolerable empirical errors. With an increase of C, the relative importance of the empirical errors grows relative to the regularized term, and vice versa. Usually, in standard SVC and LSSVC, the empirical risk function has an equal weight C for the misclassification error function (Vapnik, 1995) and for the least squares error function (Suykens and Vandewalle, 1999). That is, the regularization parameter C is a constant, a fixed value, in standard SVC and LSSVC. However, many practical applications have shown that a fixed regularization parameter C is unsuitable for classification tasks with prior knowledge. Considering the prior knowledge that different classes might have different importance for classification tasks, more weight should be given to those classes with more importance. In the case of LSSVC for credit risk evaluation, more weight should be given to the default class, taking into account the prior knowledge that default customers might lead to larger economic losses for firms than good customers. For this purpose, the regularization parameter C should be replaced by a variable regularization parameter C_i to emphasize the important classes in the sample data {(x_i, y_i)}_{i=1}^{N}. In terms of the prior knowledge that important classes might provide more decision information than unimportant classes in practical classification tasks, the variable regularization parameter C_i should satisfy C_i(I_A) > C_i(I_B), where I_A and I_B are the subscript sets of the A-th (important) and B-th (unimportant) classes of the training data, respectively. Since the auto-adaptive regularization parameter C_i changes automatically with the class, C_i is called a variable regularization parameter, which gives more weight to the important classes. In practical applications, the form of C_i often depends on the prior knowledge available. In credit risk evaluation, a typical segment form is often used, defined as

C_i = Σ_{l=1}^{Z} C_l λ_{il}   (i = 1, 2, …, N;  l = A, B, …, Z)        (17)

where C_l is a constant corresponding to the l-th class of the training data, Z is the number of classes, n is the number of training samples, and

λ_{il} = 1 if i ∈ I_l, and 0 otherwise        (18)

where I_l is the subscript set of the l-th class of the training data. Based on the variable regularization parameter C_i, a new LSSVC called C-variable LSSVC (C-VLSSVC) can be introduced. Similar to (6), the optimization problem of C-VLSSVC for classification can be formulated as

Minimize    J(w, b; ξ_i) = (1/2) w^T w + (C_i/2) Σ_{i=1}^{N} ξ_i²
Subject to  y_i (w^T φ(x_i) + b) = 1 − ξ_i,   i = 1, …, N        (19)

Using the Lagrangian theorem, the final solution is similar to (15) and (16). The only difference is the value of Ω due to the introduction of the variable regularization parameter C_i. In the case of C-VLSSVC, the value of Ω is calculated by

Ω = y_i y_j φ(x_i)^T φ(x_j) + (1/C_i) I        (20)

According to (19) and (20), the LSSVC algorithm can still be used, except that the regularization parameter value C_i differs for every training data point; thus the computation and simulation procedures of LSSVC can be reused with a simple modification of the regularization parameter from a fixed value C to a variable parameter C_i in terms of (17) and (18).
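The only change C-VLSSVC requires in a solver such as the one sketched in Section 2 is the diagonal term: instead of a scalar 1/C, Eq. (20) adds 1/C_i per sample, with C_i assembled from the class constants of Eqs. (17)-(18). A hedged sketch follows; the class labels and the C_by_class mapping are illustrative placeholders, not values from the paper.

import numpy as np

def cvlssvc_omega(K, y, labels, C_by_class):
    # labels[i] gives the class of sample i (e.g. "bad" / "good");
    # C_by_class maps each class to its constant C_l in Eq. (17),
    # with a larger value for the more important (default) class.
    Ci = np.array([C_by_class[l] for l in labels], dtype=float)   # Eqs. (17)-(18)
    return np.outer(y, y) * K + np.diag(1.0 / Ci)                 # Eq. (20)

Passing this Ω to the same linear-system solution (15)-(16) reproduces the C-VLSSVC classifier.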
4 Experimental Results

In this section, two real-world credit datasets (the German and Australian credit datasets) are used to test the performance of C-VLSSVC. The datasets are obtained from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). The German credit dataset consists of 1000 instances, including 700 instances of creditworthy applicants (good) and 300 default instances (bad). For each instance, 24 input variables describe 19 attributes, 4 of which are changed to dummy variables. For the Australian dataset, all variables have been transformed into meaningless symbolic data to protect confidentiality. It contains 690 instances, among which 383 instances belong to the Good class, regarded as good applicants in this study, and 307 to the Bad class, regarded as bad applicants. Each instance has 14 explanatory variables and 1 observed variable. In the experiments, we use 5-fold cross-validation to test the performance of the model. For comparison purposes, several commonly used credit risk evaluation models, such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logit analysis (LogA), k-nearest neighbour (KNN), artificial neural networks (ANN), and least squares support vector machines (LSSVC), are also used. In addition, three evaluation criteria, Type I accuracy, Type II accuracy and total accuracy (Yu et al., 2008b, 2009), are used. The computational results are reported in Table 1; the values for LDA, QDA, LogA, KNN and LSSVC are taken from Table 5 in Zhou et al. (2009).

Table 1. Evaluation results of different models

                German Dataset                       Australian Dataset
Model           Type I (%)  Type II (%)  Total (%)   Type I (%)  Type II (%)  Total (%)
LDA             72.00       74.57        73.80       80.94       92.18        85.94
QDA             66.57       69.33        67.40       66.12       91.38        80.14
LogA            50.33       88.14        76.80       85.90       86.32        86.09
KNN             27.00       90.57        71.50       81.72       54.40        69.57
ANN             46.89       73.46        69.43       72.56       83.61        78.94
LSSVC           49.67       88.86        77.10       85.12       89.25        86.96
C-VLSSVC        63.48       91.34        79.34       88.83       93.29        91.88

As can be seen from Table 1, several important conclusions can be drawn. (1) In terms of the three criteria, the proposed C-VLSSVC model obtains the best performance on the two credit datasets, revealing that the C-VLSSVC model is an effective technique for credit risk evaluation. (2) Generally, the performance on the German dataset is worse than that on the Australian dataset. There are two possible reasons: on the one hand, the credit market in Germany is more complex than that of Australia; on the other hand, there is more nonlinearity in the German dataset than in the Australian dataset. (3) According to the results for Type II accuracy and total accuracy, the C-VLSSVC model performs the best on both credit datasets, which indicates the strong classification capability of the proposed C-VLSSVC model in credit risk evaluation. However, from the viewpoint of Type I accuracy, the C-VLSSVC model is the best of all the listed approaches for the Australian dataset, but for the German dataset the LDA model is the best approach. The reason for this phenomenon is unknown and is worth exploring further.
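For reference, the three evaluation criteria can be computed as below. The convention that Type I accuracy refers to the bad (default) class and Type II to the good class is an inference consistent with the German-dataset totals in Table 1 (700 good, 300 bad), not a definition quoted verbatim from the cited sources.

import numpy as np

def evaluation_criteria(y_true, y_pred, bad_label=1):
    # Type I: accuracy on the bad (default) class; Type II: accuracy on the good class.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    bad, good = (y_true == bad_label), (y_true != bad_label)
    type1 = np.mean(y_pred[bad] == y_true[bad])
    type2 = np.mean(y_pred[good] == y_true[good])
    total = np.mean(y_pred == y_true)
    return type1, type2, total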
5 Concluding Remarks

In this paper, a new least squares support vector classification (LSSVC) model, called the C-variable LSSVC (C-VLSSVC) model, is proposed for credit risk evaluation. The empirical results show that, across different models for the test cases of two main credit datasets and on the basis of different evaluation criteria, the proposed C-VLSSVC model performs the best. In the two presented cases, the total accuracy is the highest, indicating that the proposed C-variable LSSVC model can be used as a promising tool for credit risk evaluation.
Acknowledgements

This work is partially supported by grants from the National Natural Science Foundation of China (NSFC No. 70221001), the Knowledge Innovation Program of the Chinese Academy of Sciences (CAS No. 3547600), and the NSFC of China and RGC of Hong Kong Joint Research Scheme (Project No. N CityU110/07).
References

1. Altman, E.I.: Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance 23, 589–609 (1968)
2. Wiginton, J.C.: A note on the comparison of logit and discriminant models of consumer credit behaviour. Journal of Financial and Quantitative Analysis 15, 757–770 (1980)
3. Grablowsky, B.J., Talley, W.K.: Probit and discriminant functions for classifying credit applicants: A comparison. Journal of Economics and Business 33, 254–261 (1981)
4. Glover, F.: Improved Linear Programming Models for Discriminant Analysis. Decision Sciences 21, 771–785 (1990)
5. Henley, W.E., Hand, D.J.: A k-NN classifier for assessing consumer credit risk. Statistician 45, 77–95 (1996)
6. Yu, L., Wang, S.Y., Lai, K.K.: Credit risk assessment with a multistage neural network ensemble learning approach. Expert Systems with Applications 34(2), 1434–1444 (2008a)
7. Chen, M.C., Huang, S.H.: Credit scoring and rejected instances reassigning through evolutionary computation techniques. Expert Systems with Applications 24, 433–441 (2003)
8. Yu, L., Wang, S.Y., Lai, K.K., Zhou, L.G.: Bio-Inspired Credit Risk Analysis - Computational Intelligence with Support Vector Machines. Springer, Berlin (2008b)
9. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
10. Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9(3), 293–300 (1999)
11. Zhou, L.G., Lai, K.K., Yu, L.: Credit scoring using support vector machines with direct search for parameters selection. Soft Computing 13, 149–155 (2009)
12. Yu, L., Wang, S.Y., Lai, K.K.: An Intelligent-Agent-Based Fuzzy Group Decision Making Model for Financial Multicriteria Decision Support: The Case of Credit Scoring. European Journal of Operational Research 195(3), 942–959 (2009)
Ecological Risk Assessment with MCDM of Some Invasive Alien Plants in China Guowen Xie1, Weiguang Chen2, Meizhen Lin3, Yanling Zheng1, Peiguo Guo1, and Yisheng Zheng1 1
College of Life Science, Guangzhou University, Guangzhou 510006, China
[email protected] 2 School of Foreign Studies, Guangzhou University, Guangzhou 510006, China 3 College of Geographical Science, Guangzhou University, Guangzhou 510006, China
Abstract. Alien plant invasion is an urgent global issue that threatens the sustainable development and health of ecosystems. The study of its ecological risk assessment (ERA) can help us prevent and reduce the invasion risk more effectively. Based on the theory of ERA and the analytic hierarchy process (AHP) method of multi-criteria decision-making (MCDM), and through analyses of the characteristics and processes of alien plant invasion, this paper discusses the methodologies of ERA of alien plant invasion. The assessment procedure consists of risk source analysis, receptor analysis, exposure and hazard assessment, integral assessment, and countermeasures of risk management. The indicator system of risk source assessment, as well as the indices and formulas applied to measure the ecological loss and risk, were established, and a method for comprehensively assessing the ecological risk of alien plant invasion was worked out. The results of the ecological risk analysis of 9 representative invasive alien plants in China show that the ecological risk of Erigeron annuus, Ageratum conyzoides, Alternanthera philoxeroides and Mikania micrantha is high (grades 1-2), that of Oxalis corymbosa and Wedelia chinensis comes next (grade 3), while Mirabilis jalapa, Pilea microphylla and Calendula officinalis rank last (grade 4). Risk strategies are put forward on this basis.
Keywords: alien plant, invasion, ecological risk assessment, multi-criteria decision-making (MCDM), analytic hierarchy process (AHP), risk strategies.
1 Introduction

Bio-invasions have caused serious ecological consequences (Ehrenfeld, 2003) and economic losses (Pimentel et al., 2001) on local and global scales. China, with its rapid economic development, will introduce more and more species along with the expansion of foreign trade and tourism. As a result, the probability of alien species invasion in China increases, and bio-invasion is coming to hinder municipal and regional economies and sustainable social development. Biodiversity and natural ecological systems have been severely threatened by invasive alien species (Xie et al., 2001). The yearly economic loss caused by invasive alien species exceeds 7 billion US dollars
(Fan and Li, 2001). Within this field, plant invasion is an important subject, since it seriously threatens local natural resources, biodiversity, the ecological environment and farming-forestry-grazing-fishery production, and leaves lasting damage. Consequently, it is necessary to conduct ecological risk assessment of alien plants so as to predict possible damage and prepare effective risk management. MCDM techniques deal with problems whose alternatives are predefined and in which the decision-maker ranks the available alternatives. MCDM has proved to be a promising and growing field of study since the early 1970s, and many applications in engineering, business, and the social sciences have been reported. Carlsson and Fullér (1996) classified MCDM methods into four distinct types: (i) outranking, (ii) utility theory, (iii) multiple objective programming, and (iv) group decision and negotiation theory. One of the methods classified under utility theory is the analytic hierarchy process (AHP), which has proved to be one of the most widely applied MCDM methods (Vaidya and Kumar, 2006). The AHP provides an ideal platform for complex decision-making problems. It uses objective mathematics to process the subjective and personal preferences of an individual or a group in decision making (Saaty, 2001). The AHP works on the premise that decision-making for complex problems can be handled by structuring them into a simple and comprehensible hierarchical structure (Tesfamariam and Sadiq, 2006). Based on the theory of ecological risk assessment, this paper explores an MCDM method for the ecological risk assessment of invasive alien plants and establishes a primary model for the risk evaluation system, appraisal criteria and comprehensive evaluation of invasive alien plants.
2 Materials and Methods

2.1 Materials and Data Source

Among the alien weeds in China, most come from the American and European continents, and most of them belong to the Compositae. Nine of these alien plants, namely Ageratum conyzoides, Mikania micrantha, Erigeron annuus, Pilea microphylla, Alternanthera philoxeroides, Wedelia trilobata, Oxalis corymbosa, Mirabilis jalapa and Calendula officinalis, were selected as research subjects according to their weediness, distribution and harmfulness. The original habitat, way of introduction, time of introduction, and living state (weediness) in the habitat were ascertained (Li and Xie, 2002), and a databank was set up to study the biological, reproductive, environment-adaptive, community-ecological and spreading mechanical factors relevant to invasions. Comparisons were made in adaptability, lasting capacity and harmfulness among these plants to decide the risk assessment index system and the weights in the ecological risk.

2.2 Analysis Methods

In accordance with the quantitative analysis method for harmful species, the dangerousness values of invasive plants in the comprehensive evaluation are classified into R_i and R_ij, regarded as the assessment values of grade 1 and grade 2. Values are assigned to R_i and R_ij according to the criteria of the invasive species assessment index, the range of which is basically classified into 4 grades. Finally, the dangerousness of invasive harmful species R is calculated according to formulas (1), (2), (3) and (4) of the quantitative analysis of harmful species to determine the grade of dangerousness.
3 The Construction of the MCDM Assessment System and Model

3.1 The Construction of the MCDM Assessment System for Ecological Risk

The index system is classified into three layers, namely the target layer, the criterion layer, and the index layer. The target layer (R): being the first layer, and calculated from the 4 indexes of the criterion layer, it refers to the risk of invasive alien plants (R). The criterion layer (R_i): being the second layer, and calculated from the relevant indexes, it is composed of invasiveness (R1), adaptability (R2), diffusibility (R3) and perniciousness (R4).
Table 1. Framework of risk assessment system for invasive alien plants

R (Risk of invasive alien plants)
  R1 Invasiveness
    R11 occurrence level in habitat place: area of occurrence, ratio of area-occurrence
    R12 ways of introduction: intentional introduction (market price in foreign countries or other areas, trade volume in foreign countries or other areas, planted areas of crops in foreign countries or other areas); unintentional introduction (import volume from the alien species habitat, import volume of farm produce, number of tourists and volume of ballast water)
    R13 control measures: expert grading
  R2 Adaptability
    R21 adaptive capacity: expert grading
    R22 adverse resistance: expert grading
    R23 climate adaptability: temperature similarity, illumination similarity, rainfall similarity
    R24 adaptability of other restrictive factors: soil similarity
  R3 Diffusibility
    R31 growth rate: increase rate of fresh weight, increase rate of dry weight
    R32 reproductive capacity: mode of reproduction, generation length, number of offspring from an individual biosome
    R33 diffusibility: propagule mobility, spread range
    R34 scope of eco-adaptation: area of suitable climate zone, area of adaptive soil, area of host plants
    R35 control mechanisms: types and distribution of natural enemies, rate and cost of pesticide prevention and treatment
  R4 Perniciousness
    R41 economic importance: whether or not a quarantine object, economic importance of the mischievous object, production and profits of relevant economic activities, loss and preventive cost of relevant domestic industries
    R42 eco-environmental importance: possibility of hybridizing with local species, species diversity index, community diversity index, landscape diversity index, change of ecological system function
    R43 human-health importance: number of potential patients, mortality of patients, cost of prevention and cure, economic loss of patients
The index layer (R_ij): this is the concretization of the criterion-layer indexes and the foundation of the risk assessment of invasive alien plants. In this research, 15 quantitative indexes of the index layer are determined (Table 1).
3.2 Model of Risk Assessment of Alien Plants

The functions and relations of the risk assessment indexes of alien species differ, and they can be classified into accumulating, multiplicative and substitutive relations according to their contributions.

1) Accumulating relation of indexes. When indexes are not dependent on each other and contribute independently to the upper-grade value, they are in an accumulating relation. In this research, some grade 2 indexes are in an accumulating relation, and the calculating formula is

R_i = Σ W_i R_ij / Σ W_i        (1)

where R_i is the grade 1 index, R_ij the grade 2 index, and W_i the weight.

2) Multiplicative relation of indexes. When indexes are dependent on each other and contribute jointly to the upper-grade value, they are in a multiplicative relation. In this research, the grade 1 indexes are in a multiplicative relation, and the calculating formula is

R = Π_{i=1}^{4} R_i        (2)

where R is the value of the comprehensive assessment.

3) Substitutive relation. When one index is the greatest and substitutes for the other indexes of the same grade in contributing to the upper-grade index, the relation of such indexes is substitutive. In this research, grade 2 indexes in this situation use the calculating formula

R_i = Max(R_i1, R_i2, …, R_ij)        (3)

4) Mathematical model of comprehensive risk assessment. A model, based on the logical and mathematical relations between the index framework and the indexes, is set up to calculate the comprehensive risk value:

R = (R1 ∗ R2 ∗ R3 ∗ R4)^(1/4)        (4)

3.3 Division of the Risk Grades of Alien Plants

The risk grading of alien species should take the current grading system of harmful species as its reference and starting point, facilitating a scientific and effective risk assessment of alien species that presents the relations between the risk value and the risk grading,
resulting in preventive and control measures of various levels for various risks. Meanwhile, in order to compare risk levels and to conduct risk management, the significance of invasiveness indicated by the risk grading and its position in the harmful species grading system should be clarified. This research puts forward the following grading scheme (Table 2).

Table 2. Criteria for risk grading of alien plants

Risk grade   Risk level             VCA (R)    Invasiveness characterization                                                          Management strategy
1            Extremely dangerous    3.2-4.0    Extremely high risk, according with 1st-grade pests or malignant weeds in hazard      no introduction
                                               characterization
2            Highly dangerous       2.7-3.2    High risk, according with 2nd-grade pests or malignant weeds in hazard                no introduction
                                               characterization
3            Moderately dangerous   2.0-2.7    Moderate risk, according with 3rd-grade pests or common/general weeds in hazard       no introduction
                                               characterization
4            Low dangerous          1.0-2.0    Low invasive risk                                                                     introduction with risk control measures
5            Non-dangerous          0-1.0      No invasive risk                                                                      introduction without control measures

VCA: value of comprehensive assessment
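Formulas (1)-(4) and the grading scheme of Table 2 translate directly into a short computation. The sketch below is illustrative and assumes the criterion values R1-R4 have already been obtained from the grade 2 indexes; the grade thresholds mirror the VCA ranges of Table 2.

import numpy as np

def criterion_value(weights, values):
    # Accumulating relation, Eq. (1): R_i = sum(W * R_ij) / sum(W)
    w, v = np.asarray(weights, float), np.asarray(values, float)
    return (w * v).sum() / w.sum()

def criterion_value_max(values):
    # Substitutive relation, Eq. (3): R_i = max(R_i1, ..., R_ij)
    return max(values)

def comprehensive_risk(R1, R2, R3, R4):
    # Eq. (4): R = (R1 * R2 * R3 * R4) ** (1/4)
    return (R1 * R2 * R3 * R4) ** 0.25

def risk_grade(R):
    # Map the value of comprehensive assessment to a grade (Table 2)
    if R > 3.2:
        return 1   # extremely dangerous
    if R > 2.7:
        return 2   # highly dangerous
    if R > 2.0:
        return 3   # moderately dangerous
    if R > 1.0:
        return 4   # low dangerous
    return 5       # non-dangerous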
4 Results and Analysis

The original habitat, way of introduction and time of introduction of the nine alien plants are presented in Table 3. In accordance with the criteria of the invasive species assessment index and formulas (1), (2), (3) and (4), the various analytic indexes of the invasive plants, taken as judging indexes, were assigned values. Table 4 presents the value assignment, after calculation and sorting, of the dangerousness judging indexes of the 9 invasive plants in China. From the statistical analysis (Table 4), the R value of Erigeron annuus, Ageratum conyzoides and Alternanthera philoxeroides is higher than 3.2, which falls into the scope of no introduction. However, since these plants have already been planted over a large area in China, strict measures should be taken to prevent inter-regional introduction into non-invaded areas, while in the invaded areas measures should be taken to lessen the harmful effects. The R value of Wedelia trilobata and Oxalis corymbosa ranges from 2.14 to 2.28, which falls into the scope of strict restriction. Measures should be taken to prevent them from further spreading into non-invaded areas. At the same time, more attention should be paid to these exotic weeds, and restoration research on the invaded ecological systems should be strengthened.
Table 3. Investigation results of basic information of the nine alien plants

Alien plant                   Original habitat   Habit             Value        VI   Introduced time
Ageratum conyzoides           the Americas       annual herb       weed         OI   19th century
Mikania micrantha             the Americas       perennial liane   weed         OI   1920s
Erigeron annuus               North America      biennial herb     weed         OI   1886
Pilea microphylla             South America      annual herb       weed         OI   1920s
Alternanthera philoxeroides   Brazil             perennial herb    feed, weed   AI   1892
Wedelia trilobata             the Americas       perennial herb    ornamental   AI   1970s
Oxalis corymbosa              the Americas       perennial herb    ornamental   AI   1850s
Mirabilis jalapa              South America      annual herb       OM           AI   18th century
Calendula officinalis         Mediterranean      annual herb       OM           AI   19th century

VI: way of introduction; OI: occasional introduction; AI: artificial introduction; OM: ornamental, medicine.

Table 4. Assessment results of the ecological fatalness of the nine alien plants

Alien plant                   R1    R2   R3     R4     R      Sequence
Erigeron annuus               2.8   3    2.95   3.67   3.46   1
Ageratum conyzoides           2.5   3    2.85   3.67   3.38   2
Alternanthera philoxeroides   2.4   3    2.85   3.67   3.26   3
Mikania micrantha             2.2   3    2.55   3.67   3.15   4
Oxalis corymbosa              1.9   3    2.35   2.93   2.28   5
Wedelia trilobata             1.6   2    2.15   2.85   2.14   6
Mirabilis jalapa              1.5   2    1.85   2.80   1.95   7
Pilea microphylla             1.4   2    1.75   2.74   1.73   8
Calendula officinalis         1.2   1    1.35   2.33   1.12   9
As for Mirabilis jalapa, Pilea microphylla and Calendula officinalis, measures of controlled introduction and pre-mature excavation are advisable. It is also advisable to cultivate more fine local flower breeds to take the place of exotic varieties. These plants have already been introduced into China and are spreading year by year; in this respect, measures should be taken to prevent further spreading.
5 Risk Strategies

1) More careful quarantine inspection and more careful introduction: since most invasive plants are introduced or brought in by people, strict quarantine inspection can minimize the possibility of exotic plant invasion.
2) Comprehensive prevention and control: integrate mechanical, artificial, chemical and biological means to cope with invasive plants. Since the effects of invasive exotic plants are long-lasting, comprehensive prevention and control measures are most effective when based on persistent research and experiment.
3) Early warning and early risk assessment: for exotic plants in the process of successful colonization, early warning analysis and ecological risk assessment are important in order to prepare measures to prevent invasions.
4) Development and practice of effective ecological restoration: for already-invaded areas, make full use of energy, feedstuff and medicine development to control spreading and to lessen harmful effects. When exotic species are under control or eliminated, the affected areas should be restored soon so as to prevent exotic species from invading again.
5) Improving relevant laws and regulations and strengthening education: clarify the object, scope, risk assessment, ecological restoration and responsibility for compensation in the laws and regulations concerning invasive exotic species, and strengthen education programs to make the public better informed of the dangers and characteristics of invasive exotic plants and to get them involved in exotic species management.
6 Conclusion

Based on the theory of ERA and the analytic hierarchy process (AHP) method of multi-criteria decision-making (MCDM), and through analyses of the characteristics and processes of alien plant invasion, this paper has discussed the methodologies of ERA of alien plant invasion. The indicator system of risk source assessment, as well as the indices and formulas applied to measure the ecological loss and risk, were established, and a method for comprehensively assessing the ecological risk of alien plant invasion was worked out. The results of the ecological risk analysis of 9 representative invasive alien plants in China show that the ecological risk of Erigeron annuus, Ageratum conyzoides, Alternanthera philoxeroides and Mikania micrantha is high (grades 1-2), that of Oxalis corymbosa and Wedelia chinensis comes next (grade 3), while Mirabilis jalapa, Pilea microphylla and Calendula officinalis rank last (grade 4).
Acknowledgement. This study was financially supported by the National Natural Science Foundation of China (grant no. 30470146, 39460011).
References

1. Carlsson, C., Fullér, R.: Fuzzy multiple criteria decision-making: recent developments. Fuzzy Set Syst. 78, 139–153 (1996)
2. Ehrenfeld, J.G.: Effects of exotic plant invasions on soil nutrient cycling processes. Ecosystems 6, 503–523 (2003)
3. Fan, X.H., Li, W.M.: Research on quarantine strategy for bio-safety protection in China. Biodiversity Sci. 9, 439–445 (2001) (in Chinese)
4. Li, Z.Y., Xie, Y.: Invasive alien species in China. China Forestry Publishing House, Beijing (2002) (in Chinese)
5. Li, M., Qing, J.Q.: A study on the methods of comprehensive evaluation for PRA. Plant Quarantine 12(1), 52–55 (1998)
6. Pimentel, D., McNair, S., Janecka, J., et al.: Economic and environmental threats of alien plant, animal, and microbe invasions. Agri. Ecosys. Environ. 84, 1–20 (2001)
7. Saaty, T.L.: How to make a decision? In: Saaty, T.L., Vargas, L.G. (eds.) Models, Methods, Concepts and Applications of the Analytic Hierarchy Process, ch. 1. Kluwer, Dordrecht (2001)
8. Tesfamariam, S., Sadiq, R.: Risk-based environmental decision-making using fuzzy analytic hierarchy process (F-AHP). Stoch. Environ. Res. Risk Assess. 21, 35–50 (2006)
9. Vaidya, O.S., Kumar, S.: Analytic hierarchy process: an overview of applications. Eur. J. Oper. Res. 169, 1–29 (2006)
10. Wu, X.W., Luo, J., Chen, J.K., et al.: Spatial patterns of invasive alien plants in China and its relationship with environmental and anthropological factors. J. Plant Ecol. 30(4), 576–584 (2006)
11. Xie, Y., Li, Z.Y., Gregg, W.P., Li, D.M.: Invasive species in China - an overview. Biodiversity and Conserv. 10, 1317–1341 (2001)
Empirically-Based Crop Insurance for China: A Pilot Study in the Down-middle Yangtze River Area of China Erda Wang*, Yang Yu, Bertis B. Little, Zhongxin Chen, and Jianqiang Ren Center for Risk Management Research, School of Management, Dalian University of Technology, Dalian, 116023, Tel.: 0411-84707090; Fax: 0411-84707090
[email protected]

Abstract. The factors that caused slow growth in crop insurance participation and its ultimate failure in China were multi-faceted, including high agricultural production risk, low participation rates, inadequate public awareness, high loss ratios, and insufficient and interrupted government financial support. Thus a clear and present need for data-driven analyses and empirically based risk management exists in China. In the present investigation, agricultural production data for two crops (corn, rice) in five counties in Jiangxi Province and Hunan Province were analyzed for the design of a pilot crop insurance program in China. A crop insurance program was designed which (1) provides 75% coverage, (2) offers a 55% premium rate reduction for the farmer compared to the catastrophic coverage most recently offered, and (3) uses the currently approved governmental premium subsidy level. A safety net for Chinese farmers that helps maintain agricultural production at a level of self-sufficiency, and that costs less than half of the current plans, thus requires one change to the program: ≥80% of producers must participate in an area.
1 Introduction

Chinese agriculture faces a significant challenge because the country holds more than one-fifth of the world's population but only 10% of its arable land. An important limitation is that China has only one quarter of the world average water resources per person (OECD, 2005), which is a major risk factor in agricultural production. Agricultural production risks of crop failure or decreased yields are caused mainly by adverse weather events (drought, excess precipitation, floods), followed in small part by pests, diseases, and fire. Few economic sectors are as vulnerable to climatic (stochastic) variation (Dismukes and Glauber, 2000; Glauber and Collins, 2004). Government intervention to provide assistance to agriculture has been widely adopted in the People's Republic of China since its foundation in 1949, but the assistance programs have not been data driven. Historically, the effectiveness of different forms of government assistance (ad hoc disaster relief, emergency loans, crop insurance) has not been systematically analyzed and is not empirically driven. The major handicap in China is the lack of data and analysis on which to base a risk management program (Lin, 2000; Zhang and Xu, 2005; Luan and Cheng, 2007; World Bank, 2003). More importantly, the risks have not been quantitatively evaluated.

* Corresponding author.
Chinese agricultural production is significantly affected by multiple natural hazards, including seasonal floods, drought, typhoons, hail, freezes, pests, earthquakes, and wildlife. Approximately 24 percent of all hectares sown had agricultural production losses of 10 percent or more. The five leading causes of loss in the 2004 crop year were drought (52%), floods (28%), hail (10%), frost/freeze (6%), and typhoons (4%). On average, 30 percent of farmland was affected by flood, hail, freeze, and drought disasters, and a 6 to 9 percent total crop value loss occurred between 1980 and 2004 (World Bank, 2005; OECD, 2004; Chen, 2007; Li, 1996). In the present investigation, an actuarially based pilot crop insurance program for maize, rice, sorghum, soy beans, and wheat is developed for five neighboring counties in the Down-middle Yangtze River area of China: Nan County and Yuanjiang City in Hunan Province, and Hukou County, Duchang County and Boyang County in Jiangxi Province.
2 Methods and Materials

2.1 Study Area

The five-county area contains 223,610 hectares (559,025 acres) of cultivated land, 82% irrigated and 18% non-irrigated. The high proportion of irrigated land is primarily due to the large paddy areas, which are mostly irrigated. Over 22 years (1981, 1985-2005), average crop yields were 4.57 tons/ha for rice, 0.75 tons/ha for wheat, 1.7 tons/ha for corn, 1.34 tons/ha for soy beans, and 0.32 tons/ha for sorghum. Yearly crop yields also varied within provinces and between counties because of annual variation in rainfall, especially for non-irrigated crop production.

2.2 Analytical Methods

Agricultural data for 1981 to 2005 were collected at the county level for the five counties. Data are at the crop-type and practice level for corn, soybeans, rice, and sorghum. Data included: (1) total arable acres, (2) arable acres cultivated by crop and by practice (irrigated vs. non-irrigated), (3) yields, (4) daily weather, (5) amount of chemical fertilizer used, and (6) commodity prices by year. Yield data were detrended for county-specific rainfall, year, and fertilizer amounts (tons per hectare). For the survival function, let t be time and T* denote the survival time; assume T* > 0, and

S(t) = Pr[T* > t] = 1 − Pr[T* < t] = 1 − F(t)        (1)
where F(t) is the CDF of T*. It is assumed that S(0) = 1, S(∞) = 0, and that if t1 < t2, then

S(t1) > S(t2)        (2)

where S(t) is the probability of survival at time t conditional on a specific crop surviving to time t. Losses at these levels were modeled under a survivor and proportional hazards model using PROC LIFETEST. The procedure produces the crop-specific proportional hazard function:

h_i = lim_{δ→0} Pr[ t1 ≤ T* < t1 + δ | T* ≥ t1 ] / δ        (3)
where h_i is crop specific and δ is the proportion censored; δ_i = 1 if failure occurs, i.e., t_i = T_i*, and 0 if right-censored, i.e., T_i = C_i. The hazard is the instantaneous rate of failure at time t, conditional on survival to time t, so that the following holds:
S(t) = exp{ −∫₀ᵗ h(τ) dτ }        (4)
and

h(t) = − [dS(t)/dt] / S(t)        (5)
The exact time of failure of a given crop type is not known, and some crops will not fail by the end of the interval. Therefore, right censoring is applied and the computations account for this. Crop-specific actuarial tables are constructed from

Ŝ(t) = Π p̂_j   for t ∈ [t_j, t_{j+1}]        (6)
where p̂_j = (n_j − m_j) / n_j is the estimated probability of surviving the interval [t(j), t(j+1)] conditional on survival up to t(j). Exponential regression was used to directly compute the survival probabilities (i.e., 1 minus the probability of failure) under the hazards model, deriving the classic "exponent" used in actuarial risk analysis. The exponential distribution is essentially the time between two consecutive Poisson events with intensity λ events per unit time, i.e., the waiting time between failure times. Let X be the distribution of the interval time and x the time after which the second event occurs. In the instance where X > x, the probability is Pr(X > x) = 1 − F(x), where F(x) is the CDF of X (i.e., no Poisson events have occurred before time x). Hence the probability of no events is Pr(0) = e^(−λx), which yields the function

1 − F(x) = e^(−λx)        (7)
Given F(x) = 1 − e^(−λx) for x > 0 and λ > 0, the derivative f(x) = dF(x)/dx gives the probability density function λe^(−λx) for all x > 0. Where λ is replaced by 1/δ, f(x) = (1/δ)e^(−x/δ) for x > 0 and δ > 0. If the time at the event (i.e., a crop loss of a given level) is t0, then the probability of the time at the event exceeding t0 is

Pr(X > t0) = 1 − Pr(X < t0) = 1 − F(t0) = e^(−λt0)        (8)
Thus, the exponential regression model used to compute the waiting time between crop failure events is

ln(Y) = ln(b0) + b1 t0        (9)

where Y is the waiting time between crop failure events, i.e., the survival probability, b0 is the intercept and b1 is the exponential regression coefficient. Implemented in this application to crop insurance, this gives
Y = b0 · exp(b1 t0)        (10)
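The life-table estimate of Eq. (6) and the exponential regression of Eqs. (9)-(10) can be sketched as follows. The input arrays (numbers at risk and numbers failed per interval) are placeholders for the detrended county loss data described above; this is an illustrative sketch, not the SAS procedure used by the authors.

import numpy as np

def life_table_survival(n_at_risk, n_failed):
    # Eq. (6): S_hat(t_j) = prod_{k<=j} (n_k - m_k) / n_k
    n = np.asarray(n_at_risk, float)
    m = np.asarray(n_failed, float)
    return np.cumprod((n - m) / n)

def fit_exponential_regression(t, survival):
    # Eqs. (9)-(10): ln(Y) = ln(b0) + b1 * t, hence Y = b0 * exp(b1 * t)
    b1, ln_b0 = np.polyfit(t, np.log(survival), 1)
    return np.exp(ln_b0), b1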
The model is tested using a Monte Carlo simulation extrapolated to 100 years and cycled 10,000 times to produce a stable solution, and a historical (empirical) simulation (Table 1). The Monte Carlo solution is compared to the empirical analysis. The survival functions can be compared using a log-rank test, because censoring precludes the use of the Wilcoxon rank-sum test. Using data from two samples, (T_ij, δ_ij) for i = 1, 2 and j = 1, 2, …, n, the null hypothesis to be tested is
H0: S1(t) = S2(t),   where S_j(t) = Pr[T_j* > t] for j = 1, 2        (11)
The log-rank test of H0 is

X1 = (O1 − E1)² / V1  ~  χ²        (12)
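Given the observed and expected failure counts and their variance, the statistic in Eq. (12) and its p-value against a chi-square distribution with one degree of freedom can be computed as in this small sketch.

from scipy.stats import chi2

def log_rank_test(O1, E1, V1):
    # Eq. (12): X1 = (O1 - E1)^2 / V1, compared to chi-square with 1 d.f.
    x1 = (O1 - E1) ** 2 / V1
    return x1, chi2.sf(x1, df=1)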
Table 1. Exponential regression analysis of corn and rice survival probabilities for Jiangxi and Hunan Provinces

Crop   Practice        EXP(B): B   SE      Intercept: A   SE      R²
Corn   Irrigated       -0.031      0.001   1.016          0.018   0.98
Corn   Non-irrigated   -0.038      0.002   1.031          0.022   0.95
Corn   Combined        -0.035      0.002   1.024          0.02    0.97
Rice   Irrigated       -0.023      0.001   1.074          0.013   0.97
Rice   Non-irrigated   -0.027      0.001   1.02           0.021   0.94
Rice   Combined        -0.025      0.001   1.045          0.017   0.96

Significance of B: 0.001 within crop* and 0.001 between crops** for the irrigated and non-irrigated practices; -- for the combined rows.
* Irrigated vs. non-irrigated for corn and rice.  ** Irrigated corn vs. irrigated rice, non-irrigated corn vs. non-irrigated rice.
The simulated models for crop insurance are compared against those currently used by the US crop insurance program for actuarial concordance using the log-rank test. Commodity price data are used to analyze the cost of losses for each crop with 5-, 10-, 20-, and 100-year loss functions. Using the rates of loss expected per year, the raw and "loaded" (premium, reserve, and delivery costs) economic burden of insuring these losses is computed using the probabilities produced by Eq. (10). Coverage levels are evaluated by employing critical values of the Z-distribution as noted above, using the 75% coverage level as the reference. Following derivation of the loss probabilities based on observed data, the pilot program is developed for the 5 counties in China. A simplified formula used is Annual Risk Premium = Pc * Lp * Cp and Producer Premium = Pc * Lp * Cp * Sp, where Pc is the
crop price per land unit (hectare), Lp is the annual loss probability (b0 · e^(b1 t0)), Cp is the probability of catastrophic losses (5-, 10- and 20-year events), and Sp is the governmental premium subsidy proportion. Three types of insurance plans are evaluated using the loss model developed here: (1) income protection (IP), (2) the group risk plan (GRP), and (3) group risk income protection (GRIP).
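The premium formulas reduce to straightforward products, as the sketch below shows, with Lp taken from the fitted exponential model of Eq. (10). The example subsidy value in the comments is illustrative only.

def annual_risk_premium(Pc, Lp, Cp):
    # Annual Risk Premium = Pc * Lp * Cp:
    # crop value per hectare x annual loss probability x catastrophic-loss probability
    return Pc * Lp * Cp

def producer_premium(Pc, Lp, Cp, Sp):
    # Producer Premium = Pc * Lp * Cp * Sp, with Sp the subsidy proportion as
    # defined in the text (e.g. Sp = 0.5 under the 50% subsidy of Tables 2-3)
    return annual_risk_premium(Pc, Lp, Cp) * Sp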
3 Results

The approach is similar to the Group Program in the US crop insurance program, where isolated losses do not qualify (Table 3). In addition, high area participation is required; it was assumed that > 80% of farmers participate in the program. One of the incentives offered to farmers to participate in the US crop insurance program was linking crop insurance with participation in the USDA Farm Service Agency (FSA) disaster programs. For example, farmers who did not participate in the US crop insurance program were not eligible for FSA disaster payments.

Table 2. Break-even with adverse selection in crop insurance for rice and corn using the 2004 PICC book of business: simulated for Jiangxi and Hunan Provinces, 2006

| | Rice | Corn |
| Price/Tonne (T) | ¥ 2,140 | ¥ 1,229 |
| Mean T/hectare | 5.62 | 3.75 |
| Price per hectare | ¥ 12,026.80 | ¥ 4,608.75 |
| Total hectares (2006 Hunan, five counties) | 365,771 | 2,138 |
| Hectare price | ¥ 12,026.80 | ¥ 4,608.75 |
| Total liability | ¥ 4,399,847,544 | ¥ 9,846,479 |
| Total indemnity | ¥ 585,179,723 | ¥ 1,309,581 |
| RISK PREMIUM*, 75% coverage | ¥ 1,599.85 /ha | ¥ 612.53 /ha |
| Producer | ¥ 799.93  447.65 /ha | ¥ 306.26 /ha |
| Government | ¥ 799.93  671.48 /ha | ¥ 306.26 /ha |
| A&O (20%) | ¥ 319.97  112.91 /ha | ¥ 122.51 /ha |
| Loss ratio1 | 0.40 | |
| Producers loss ratio2 | 1.05 | |
| Program loss cost | 0.13 | 0.13 |
| Subsidy/hectare | ¥ 1,119.90 | ¥ 428.77 |
| Total gov't cost** | ¥ 409,626,942.90 | ¥ 916,710.26 |
| Reserve | ¥ 117,035,746.87 /year | ¥ 261,917.83 /year |

*Based on life-table estimated loss rates by the World Bank, mean ± SE: rice 0.15 ± 0.02 (95% CI 0.11 to 0.19); corn 0.19 ± 0.02 (95% CI 0.15 to 0.23). **Government premium subsidy plus A&O.

The results indicate that crop insurance could be affordable with the recommended government subsidy (50%) and would provide the safety net that the Chinese Central Government wishes
to provide Chinese farmers. The empirically based 75% coverage program (Table 2 and Table 3) is less costly (55% less) than the CAT coverage offered most recently in China. Insurance products for intermediate levels of coverage between 50% and 75% may also be offered. These results indicate the feasibility of providing coverage higher than 50% at a price that is less than half (45%) of the CAT coverage that was flawed by adverse selection. The crop insurance design presented here (Table 3) as the optimal program at 75% coverage (25% deductible) has a total risk premium that is reasonable at the farm-unit level, a government subsidy very similar to that of the US program, and a built-in mechanism to avoid adverse selection (group losses).

Table 3. Crop insurance for rice and corn: Jiangxi and Hunan Provinces, 2006

| | Rice | Corn |
| Price/Tonne (T) | ¥ 2,140 | ¥ 1,229 |
| Mean T/hectare | 5.62 | 3.75 |
| Price per hectare | ¥ 12,026.80 | ¥ 4,608.75 |
| Total hectares (2006 Hunan, five counties) | 365,771 | 2,138 |
| Hectare price | ¥ 12,026.80 | ¥ 4,608.75 |
| Total liability | ¥ 4,399,847,544 | ¥ 9,846,479 |
| Total indemnity | ¥ 263,990,852 | ¥ 590,788.74 |
| Total premium | ¥ 263,991,561.54 | |
| RISK PREMIUM*, 75% coverage | ¥ 721.74 /ha | ¥ 276.32 /ha |
| Producer | ¥ 364.87 /ha | ¥ 138.16 /ha |
| Government | ¥ 364.87 /ha | ¥ 138.16 /ha |
| A&O (20%) | ¥ 144.35 /ha | ¥ 55.26 /ha |
| Loss ratio1 | 0.50 | |
| Producers loss ratio2 | 0.99 | |
| Program loss cost | 0.06 | 0.13 |
| Government cost per hectare | ¥ 509.22 | ¥ 193.42 |
| Total gov't cost** | ¥ 409,626,942.90 | ¥ 916,710.26 |
| Reserve*** | ¥ 186,257,908.62 /year | ¥ 118,157.75 /year |

*Based on life-table estimated loss rates (Table 2) from Hunan Province empirical data, mean ± SE: rice 0.02 ± 0.004 (99% CI 0.016 to 0.024); corn 0.025 ± 0.004 (99% CI 0.017 to 0.033). **Government premium subsidy plus A&O. ***Approximately 20% of premium per year.
4 Discussion

A long-standing challenge for China is the implementation of an agricultural insurance program that can overcome poor loss-management experience, low participation and adverse selection. Institutionally, a "one state insurance company" philosophy runs counter to the theory of risk distribution that underlies all insurance principles, and there is limited appreciation of the need for different premium rates for different risk scenarios, underscoring the importance of proper underwriting for agricultural insurance (Babcock and Hart, 2005; Wu, 1999; Young, Vandeveer, and Schnepf, 2001).
In our example (Table 2), we suggested 75% coverage with a 50% government subsidy. The present model is empirically based and delivers a 50% increase in coverage over the CAT programs previously implemented. However, the Chinese government must mandate that > 80% of farmers in any given area participate, and not allow a national average of participation to be used as a substitute. In summary, the proposed program design provides 75% coverage for a price that is 45% of the catastrophic coverage and requires only one stipulation beyond current Chinese government agricultural policy on crop insurance: a mandate of > 80% participation by area.
References

Babcock, B.A., Hart, C.E.: ARPA subsidies, unit choice, and reform of the U.S. crop insurance program 45(2), 11–70 (2005)
Chen, L.: Price analysis and its application on Chinese agricultural insurance products. Nankai Economic Study (4), 203–211 (2007)
Chambers, R.G., Quiggin, J.: Decomposing input adjustments under price and production uncertainty. Am. J. Agr. Econ. 83, 20–34 (2001)
Duncan, J., Myers, R.J.: Crop insurance under catastrophic risk. Am. J. Agr. Econ. 82, 842–855 (2000)
Glauber, J.W., Collins, K.J.: Risk management and the role of the federal government. Agricultural Management and the Role of the Government 5, 143–183 (2004)
Dismukes, R., Glauber, J.: Crop and revenue insurance: premium discounts attractive to producers. Agricultural Outlook AGO 269(3), 4–6 (2000)
Yu, Y., Wang, E.D.: Current situation and new mode development of Liaoning agricultural insurance program. Scientific Management Research 25(4), 149–152 (2007)
Lin, Y.: Restudy on institution, technology and the development of Chinese Agriculture 22, 13–15 (2000)
Luan, J., Cheng, J.: Construction of industrial chain-based agricultural risk management system. Journal of Issues on Agr. Economics 3, 23–27 (2007)
OECD: China in the Global Economy: Rural Finance and Credit Infrastructure. Paris (2004), ISBN 92-64-01528-0
World Bank: World development report of year 2003 – Persisting development in the changing world: improving institutions, mode of increase and life quality, ch. 37. Chinese Finance and Economics Publishing (2003)
Wu, J.: Crop insurance, acreage decisions and nonpoint-source pollution. Amer. J. Agr. Econ. 81, 305–320 (1999)
Young, C.E., Vandeveer, M.L., Schnepf, R.D.: Production and price impacts of U.S. crop insurance programs. Amer. J. Agr. Econ. 83(5), 1196–1203 (2001)
A Response Analysis of Economic Growth to Environmental Risk: A Case Study of Qingdao

Yunpeng Qin1, Jianyue Ji2, and Xiaoli Yu2

1 School of Environmental Science and Engineering, Ocean University of China, Qingdao 266071, China
2 School of Economics, Ocean University of China, Qingdao 266071, China
[email protected]

Abstract. Economic growth is one source of environmental pollution risk. Taking Qingdao as a sample and using principal component analysis, the Granger causality test and the impulse response function, this paper aims to find the relationship between economic growth and environmental pollution. The result shows that economic growth is a Granger cause of environmental risk and that the effect is influenced by time lags. The influence is progressive, gradual, and long-standing.
1 Introduction

Environmental risk refers to the probability of adverse events, and their consequences, that occur in the natural environment or pass through the natural environment and are harmful to human society and the natural environment [1]. Environmental risk has become a social focus, and most studies concentrate on environmental risk assessment [2][3]. Because the environment has a certain bearing capacity of its own [4], the relationship between economic growth and environmental risk differs from other relationships. Taking Qingdao as an example, this paper analyzes the relationship between economic growth and its consequent environmental risk.
2 Comprehensive Evaluation on Environmental Risk

2.1 Selection and Source of Environmental Risk Variables

The measures we need are ones that can reflect the environmental changes caused by human economic activity. We choose the total discharge amount of wastewater, industrial wastewater, industrial waste gas, industrial solid wastes and industrial dust as our assessment measures. The larger the five variables' values are, the higher the potential environmental risk is. We select variables for Qingdao from 1982 to 2007 as our data sample, all taken from the Qingdao Statistical Yearbook (1983-2008 editions).
2.2 Comprehensive Evaluation on Environmental Risk Using Principal Component Analysis

Using MATLAB 7.0, the PCA results for the environmental risk level are listed in Table 1.

Table 1. PCA results of environmental risk level

| Variables | Principal Component 1 | Principal Component 2 |
| Total discharge amount of wastewater | 0.5501 | 0.0514 |
| Discharge amount of industrial wastewater | 0.0592 | 0.8284 |
| Emission amount of industrial waste gas | 0.5358 | 0.1127 |
| Discharge amount of industrial solid wastes | 0.5365 | 0.0899 |
| Discharge amount of industrial dust | -0.3450 | 0.5388 |
| Variance contribution rate (%) | 64.19 | 24.68 |
| Cumulated variance contribution rate (%) | 64.19 | 88.87 |
As shown in Table 1, the cumulated variance contribution rate of the first two principal components amounts to 88.87%, reflecting most of the variables' information. So we select the first two principal components for analysis. The comprehensive evaluation score of environmental risk is given in Table 2, which reflects the deteriorating environmental risk situation with slight fluctuations.

Table 2. Comprehensive evaluation score of environmental risk

| Year  | 1982 | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 |
| Score | -3.54 | -3.13 | -2.59 | -2.06 | -2.21 | -1.48 | -1.37 | -1.45 | -1.35 | -0.48 | -0.67 | -0.55 | -0.54 |
| Year  | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 |
| Score | -0.04 | -0.12 | 0.16 | -0.15 | -0.19 | -0.76 | 1.35 | 1.72 | 2.26 | 3.14 | 3.48 | 4.43 | 4.63 |
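A minimal sketch of one common way to form such a composite score, weighting the first two principal components by their variance contribution rates, is shown below. The paper does not spell out its exact weighting, so this is an assumption; the function name and the scikit-learn dependency are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def environmental_risk_score(X):
    """Composite environmental-risk score from the first two principal components.
    X: (years x 5) matrix of the five discharge/emission indicators."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the indicators
    pca = PCA(n_components=2).fit(Z)
    scores = pca.transform(Z)                          # PC1 and PC2 scores per year
    w = pca.explained_variance_ratio_                  # e.g. about 0.64 and 0.25
    return (scores * w).sum(axis=1) / w.sum()          # variance-weighted composite
```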
3 Granger Causality Test between Economic Growth and Environmental Risk

The Granger causality test can help us determine whether there is a causal relationship between economic growth and environmental risk, where GDP per capita represents the level of economic growth and the comprehensive evaluation score stands for environmental risk.

3.1 Unit Root Test

Before employing the Granger causality test, the time series should be checked to make sure they are stationary; the method we use is the augmented Dickey-Fuller (ADF) test. Using Eviews 5.1, we carry out the unit root test under three specifications: Trend and Intercept, Intercept, and None. The results are listed in Table 3, where LNG, DE and DLNG respectively denote the natural logarithm of GDP per capita and the first order difference
sequences of environmental risk (E) and of LNG; DDLNG refers to the second order difference sequence of LNG. Table 3 shows that E and LNG are both non-stationary processes. The first order difference sequence of E is stationary under all three specifications, so E is an integrated process of order 1, E ~ I(1). The second order difference sequence of LNG turns out to be stationary, so LNG is an integrated process of order 2, LNG ~ I(2).

Table 3. Results of the unit root test (ADF statistics under the Trend and Intercept / Intercept / None specifications)

| Variable | Trend & Intercept | Intercept | None | Conclusion |
| E | -0.74 | 0.77 | 0.28 | Non-stationary |
| DE | -5.17 | -4.94 | -3.19 | Stationary |
| Prob*(DE) | 0.0018 | 0.0006 | 0.0027 | - |
| LNG | -2.66 | -0.21 | 2.50 | Non-stationary |
| DLNG | -2.88 | -2.96 | -0.88 | Non-stationary |
| DDLNG | -4.77 | -4.90 | -5.03 | Stationary |
| Prob*(DDLNG) | 0.0047 | 0.0007 | 0.0000 | - |
The conclusions in Table 3 are consistent at the significance levels α = 0.01 and 0.05.

3.2 Analysis by Granger Causality Test

E and LNG are not integrated of the same order and cannot be used directly under the conditions of the Granger causality test. They are both stationary time series after second-order differencing, so DDE and DDLNG can be used to analyze the nexus. Using Eviews 5.1, the results are listed in Table 4. As shown, the null hypothesis that DDLNG does not Granger-cause DDE is rejected at lag 6, where the F-statistic is 3.71; that is, DDLNG does Granger-cause DDE at lag 6. Both null hypotheses are accepted at all other lags, showing that economic growth does not bring about environmental risk at once.

Table 4. Results of the Granger causality test (F-statistics; A = accept, R = reject)

| Lags | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| DDE does not Granger-cause DDLNG | 0.15 A | 0.13 A | 0.12 A | 0.21 A | 0.29 A | 0.07 A | 0.08 A |
| DDLNG does not Granger-cause DDE | 0.64 A | 0.25 A | 0.17 A | 0.70 A | 2.48 A | 3.71 R | 2.40 A |
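The paper uses Eviews 5.1; a rough equivalent of the unit-root and causality steps in Python with statsmodels is sketched below. The data arrays, the chosen ADF specifications and the maximum lag are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, grangercausalitytests

def unit_root_and_causality(e, lng, maxlag=7):
    """e: environmental risk score E; lng: ln(GDP per capita), 1982-2007 (illustrative)."""
    dde, ddlng = np.diff(e, n=2), np.diff(lng, n=2)    # second-order differences
    for name, series in [("DDE", dde), ("DDLNG", ddlng)]:
        for reg in ("ct", "c"):                        # trend-and-intercept / intercept ADF
            stat, pval = adfuller(series, regression=reg)[:2]
            print(name, reg, round(stat, 2), round(pval, 4))
    # Column order matters: the test asks whether the 2nd column Granger-causes the 1st.
    data = pd.DataFrame({"DDE": dde, "DDLNG": ddlng})
    return grangercausalitytests(data[["DDE", "DDLNG"]], maxlag=maxlag)
```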
4 Impulse Response Function Analysis between Economic Growth and Environmental Risk

Using Eviews 5.1, we establish a Vector Autoregressive (VAR) model, which is shown to be stationary, with DE and DLNG as endogenous variables (the detailed process and results are omitted here). The impulse response function curves are depicted in Figure 1.
As shown in Figure 1, economic growth has an immediate strong response to its own standard-deviation innovation in the early period, increasing by about 7%. Then it goes downward and reaches its minimum at about minus 1% in the fourth or fifth year. Later it goes upward and finally stabilizes at about zero. Economic growth has an immediate response to the standard-deviation environmental risk innovation in the early period, dropping by about 2%. Subsequently, it rises until the sixth year and finally remains stable at about zero, showing a permanent effect. Environmental risk responds strongly to its own standard deviation, increasing by about 40%, and begins to decline before the second year, reaching its minimum. Then it ascends and declines again, showing a clear sine-wave form and a permanent effect.
Fig. 1. Response to Cholesky one S.D. innovations ± 2 S.E. (panels: response of DLNG to DLNG, response of DLNG to DE, response of DE to DLNG, response of DE to DE)
Environmental risk does not respond immediately to the standard-deviation innovation from economic growth, which is in line with the result of the Granger causality test. DE rises to its maximum in the second year and then declines until the third year. It then ascends from the third year to the sixth year and finally stabilizes at about zero. The whole process displays a wave form, showing a permanent effect.
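The paper produces Figure 1 in Eviews; a minimal sketch of the same VAR/impulse-response step with statsmodels is shown below. The lag-selection rule, the horizon of 10 periods and the input data are assumptions for illustration.

```python
import pandas as pd
from statsmodels.tsa.api import VAR

def impulse_responses(de, dlng, periods=10):
    """DE and DLNG are the first differences of the risk score and of ln(GDP per capita)."""
    data = pd.DataFrame({"DLNG": dlng, "DE": de})
    results = VAR(data).fit(maxlags=2, ic="aic")   # lag order chosen by AIC (an assumption)
    irf = results.irf(periods)                     # orthogonalized (Cholesky) impulse responses
    irf.plot(orth=True)                            # the four panels shown in Figure 1
    return irf.orth_irfs
```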
5 Conclusions

By employing PCA, the Granger causality test and the impulse response function, we analyzed the dynamic relationship between economic growth and environmental risk. The conclusion is that Granger causality does exist between economic growth and environmental risk, and that it is affected by time lags. The impact of economic growth on environmental risk is progressive, gradual, and long-lasting, which must not be neglected in economic development.
Acknowledgement

This research was supported by Soft Science of Qingdao City (No. 08R-01) and the China Development Research Foundation.
References

[1] Baxter, J., Eyles, J.: The utility of in-depth interviews for studying the meaning of environmental risk. The Professional Geographer 51(2), 307–320 (2004)
[2] Atmadja, J., Bagtzoglou, A.C.: Pollution source identification in heterogeneous porous media. Water Resources Research 37(8), 2113–2125 (2001)
[3] Jones, R.N.: An environmental risk assessment/management framework for climate change impact assessments. Natural Hazards 23(2-3), 197–230 (2001)
[4] Rydin, Y.: Land use planning and environmental capacity: reassessing the use of regulatory policy tools to achieve sustainable development. Journal of Environmental Planning and Management 41(6), 749–765 (1998)
A Multiple Criteria and Multiple Constraints Mathematical Programming Model for Classification*

Peng Zhang1, Yingjie Tian1, Dongling Zhang1, Xingquan Zhu2, and Yong Shi1,3

1 FEDS Research Center, Chinese Academy of Sciences, Beijing 100190, China
2 Dep. of Computer Sci. & Eng., Florida Atlantic University, Boca Raton, FL 33431, USA
3 College of Inform. Science & Technology, Univ. of Nebraska at Omaha, Nebraska, USA
[email protected], {tianyj,zdl,yshi}@gucas.ac.cn, [email protected]

Abstract. Mathematical programming has been widely used in data classification. A general strategy for building classifiers is to optimize a global objective function such as the square-loss function. However, in many real-life situations, optimizing only one single objective function can hardly produce a satisfactory classifier. Thus a series of models based on multiple criteria mathematical programming (MCMP) have been proposed recently, such as the multiple criteria linear programming (MCLP) model and the linear discriminant analysis (LDA) model. In this paper, we argue that, due to the inherent complexity of real-world data, multiple criteria mathematical programming may also be inadequate to identify a genuine classification boundary. Under this observation, we present a multiple criteria multiple constraints mathematical programming (MC2MP) model for classification. More specifically, we extend a recent multiple criteria programming model, the MEMBV model, into a multiple constraints MEMBV model.

Keywords: MCLP, MEMBV, Multiple Criteria Multiple Constraints Program.
* This research has been partially supported by grants from the National Natural Science Foundation of China (#70621001, #70531040, #70501030, #10601064, #70472074), National Natural Science Foundation of Beijing #9073020, 973 Project #2004CB720103, Ministry of Science and Technology, China, and BHP Billiton Co., Australia.

1 Introduction

Discovering useful knowledge from large-scale data is a non-trivial task [1]. Many classification models have been built to find useful knowledge in large-scale data. Traditionally, these models are built by optimizing a global objective function; for example, decision trees are built by minimizing information entropy, and most neural network models are built by minimizing a square-loss function. However, in many real-life situations, classification models that merely optimize one single objective function are usually inadequate to identify the genuine decision boundary. In contrast, models optimizing
multiple objective functions are more likely to decrease the error rate. For example, in 2008, Zhang et al. [2] proposed a Minimal Error and Maximal Between-class Variance (MEMBV) model and reported rather high classification accuracy on benchmark UCI datasets. In this paper, we argue that although many studies have shown that classification models based on multiple criteria mathematical programming (MCMP) can achieve good results, MCMP models may also be inadequate for some especially hard-to-separate datasets. Under this observation, we extend the MCMP model into a multiple criteria multiple constraints mathematical programming (MC2MP) model; specifically, we extend the MEMBV model into a multiple constraints MEMBV model to improve its performance. The rest of this paper is organized as follows: in the second section, we give a short introduction of the Multiple Criteria Linear Programming (MCLP), Linear Discriminant Analysis (LDA) and Minimal Error and Maximal Between-class Variance (MEMBV) models; in the third section, we extend the MEMBV model into a multiple constraints MEMBV formulation; in the fourth section, we present a DC algorithm to solve this new multiple constraints MEMBV model; in the fifth section, we conclude the paper with some discussions.
2 MCLP, LDA and MEMBV Models

In this section, we give a short introduction of three classification models based on multiple criteria mathematical programming. The first model is the MCLP model, proposed by Shi et al. [4]. MCLP is a linear programming model. For a specific training instance xi, if xi is a correctly classified example, MCLP tries to find a projection direction w that maximizes the Euclidean distance (denoted by βi) between the projected location w·xi and the boundary b; otherwise, if xi is a misclassified example, MCLP defines αi to denote the Euclidean distance from w·xi to the boundary b. By doing so, MCLP can be formulated as follows:
Maximize Σi βi   (1)
Minimize Σi αi   (2)
s.t.  w·xi − αi + βi = b,  xi ∈ G1
      w·xi + αi − βi = b,  xi ∈ G2
      αi, βi ≥ 0

The second model is the LDA model [3]. Consider a two-group classification problem: group G1 has N1 examples, denoted by X1, and group G2 has N2 examples, denoted by X2. LDA tries to find an optimal decision boundary b (determined by the projection direction w) such that X1 and X2 are separated as far as possible. More specifically, LDA maximizes the between-class variance wᵀS_B w and minimizes the within-class variance wᵀS_w w as follows:
Maximize wᵀS_B w   (3)

Minimize wᵀS_w w   (4)

Combining (3) and (4), we get the formulation of the LDA model as follows:

Maximize J_F(w) = (wᵀS_B w) / (wᵀS_w w)   (5)
As far as the MEMBV model is concerned, it takes "maximizing between-class variance", inherited from LDA, and "minimizing the error rate", inherited from MCLP, as its two objective functions (as shown in Figure 1):
Maximize wᵀS_B w   (6)
Minimize Σi αi   (7)
s.t.  w·xi − αi ≤ b,  xi ∈ G1
      w·xi + αi ≥ b,  xi ∈ G2
      αi ≥ 0

where w is the projection direction and b is the classification boundary. When combining (6) and (7) into one single objective function through a compromise factor c, we get the MEMBV formulation as follows:

Minimize Σi αi − c·wᵀS_B w   (8)
s.t.  w·xi − αi ≤ b,  xi ∈ G1
      w·xi + αi ≥ b,  xi ∈ G2
      αi ≥ 0

From formulation (8) we can see that MEMBV is, in fact, a concave quadratic programming model; we discuss how to solve it in the next section.

Table 1. Models from the point of view of multiple criteria programming

| Model | Maximize: between-class variance | Maximize: distance to boundary | Minimize: within-class variance | Minimize: error rate |
| MCLP  |   | √ |   | √ |
| LDA   | √ |   | √ |   |
| MEMBV | √ |   |   | √ |
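As a small illustration of the LDA criterion summarized in Table 1, the sketch below computes the between-class and within-class scatter matrices and the direction that maximizes J_F(w) in Eq. (5) in closed form. It is a standard textbook construction added here for clarity, not code from the paper; the function name is an assumption.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Between-class scatter S_B, within-class scatter S_w, and the projection
    direction maximizing J_F(w) = (w^T S_B w) / (w^T S_w w) from Eq. (5)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    SB = np.outer(m1 - m2, m1 - m2)                  # between-class scatter
    SW = np.cov(X1.T) * (len(X1) - 1) + np.cov(X2.T) * (len(X2) - 1)  # pooled within-class scatter
    w = np.linalg.solve(SW, m1 - m2)                 # maximizer of J_F, up to scale
    jf = (w @ SB @ w) / (w @ SW @ w)
    return w, jf
```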
3 Multiple Constraints MEMBV Model

As discussed above, classification models based on MCMP have shown their effectiveness on most real-life datasets; for example, Zhang et al. [2] reported excellent performance of MEMBV on benchmark UCI datasets. However, due to the complexity of real-world data, MCMP models are sometimes inadequate to achieve the best classification boundary. To cope with this difficulty, in this section we extend the MCMP model into the MC2MP model. More specifically, we extend the MEMBV model into a multiple constraints MEMBV model by relaxing the boundary b into a linear combination of a left limit bl and a right limit br, denoted γ1·bl + γ2·br, where γ1 + γ2 = 1. Figure 1 illustrates the multiple constraints MEMBV model.
Minimize Σi αi − c·wᵀS_B w   (9)
s.t.  w·xi − αi ≤ γ1·bl + γ2·br,  xi ∈ G1
      w·xi + αi ≥ γ1·bl + γ2·br,  xi ∈ G2
      αi ≥ 0
Fig. 1. An illustration of the Multiple Constraints MEMBV model
4 Solution of the Multiple Constraints MEMBV Model

The multiple constraints MEMBV model is a concave quadratic programming model. Concave quadratic programming is an NP-hard problem, and it is very difficult to obtain the global optimal solution, especially for large-scale problems. In order to solve (9) efficiently, we propose an algorithm that converges to a local optimal solution of (9). We first introduce some notation. Let x = (w, α, bl, br), f(x) = Σi αi − c·wᵀS_B w, and

X = { (w, α, bl, br) :  w·xi − αi ≤ γ1·bl + γ2·br, xi ∈ G1;  w·xi + αi ≥ γ1·bl + γ2·br, xi ∈ G2;  αi ≥ 0, i = 1, ..., n }

be the feasible region of model (9). Let

U_X(x) = 0 if x ∈ X,  +∞ if x ∉ X.

Then (9) is equivalent to the following problem:

Minimize f(x) + U_X(x)   (10)

Rewrite f(x) + U_X(x) in the form f(x) + U_X(x) = g(x) − h(x), where

g(x) = (1/2)ρ‖x‖² + Σi αi + U_X(x),   h(x) = (1/2)ρ‖x‖² + c·wᵀS_B w,

and ρ > 0 is a small positive number. Then g(x) and h(x) are convex functions. By applying the simplified DC algorithm [5] of Le Thi Hoai An and Pham Dinh Tao to problem (10), we obtain the following algorithm:

Algorithm 1. Given an initial point x⁰ ∈ R^(3n+1) and a parameter ε > 0, at each iteration k ≥ 1 compute x^(k+1) by solving the convex quadratic program

(Q_k): minimize (1/2)ρ‖x‖² + Σi αi − (h'(x^k), x),  x ∈ X.

The stopping condition is ‖x^(k+1) − x^k‖ ≤ ε.

Theorem 1. After finitely many iterations, Algorithm 1 terminates at a local minimizing solution of (10).
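A minimal sketch of Algorithm 1 is given below, assuming the convex subproblem (Q_k) is handed to an off-the-shelf QP solver (cvxpy here). The solver choice, the construction of S_B from the group means, and all parameter values are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def dc_membv(X1, X2, c=1.0, gammas=(0.5, 0.5), rho=1e-3, eps=1e-4, max_iter=50):
    """DC iteration for the multiple constraints MEMBV model (9) (illustrative sketch)."""
    n1, d = X1.shape
    n = n1 + X2.shape[0]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    SB = np.outer(m1 - m2, m1 - m2)                  # assumed between-class scatter S_B

    w, a = cp.Variable(d), cp.Variable(n, nonneg=True)
    bl, br = cp.Variable(), cp.Variable()
    bound = gammas[0] * bl + gammas[1] * br          # gamma1*bl + gamma2*br
    cons = [X1 @ w - a[:n1] <= bound,                # constraints of model (9)
            X2 @ w + a[n1:] >= bound]

    wk, ak, blk, brk = np.zeros(d), np.zeros(n), -1.0, 1.0   # initial point x^0
    for _ in range(max_iter):
        gw = rho * wk + 2 * c * (SB @ wk)            # w-block of h'(x^k)
        lin = gw @ w + rho * (ak @ a + blk * bl + brk * br)
        obj = 0.5 * rho * (cp.sum_squares(w) + cp.sum_squares(a)
                           + cp.square(bl) + cp.square(br)) + cp.sum(a) - lin
        cp.Problem(cp.Minimize(obj), cons).solve()   # convex subproblem (Q_k)
        step = np.linalg.norm(np.concatenate(
            [w.value - wk, a.value - ak, [bl.value - blk, br.value - brk]]))
        wk, ak, blk, brk = w.value, a.value, float(bl.value), float(br.value)
        if step <= eps:                              # stopping condition
            break
    return wk, blk, brk
```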
5 Conclusions

In this paper, we argue that, due to the inherent complexity of real-world data, learning models based on optimizing one single objective function or multiple criteria can be inadequate for identifying the genuine decision boundaries. Under this observation, we relax the constraints of the MEMBV model and propose a new multiple constraints MEMBV model. Since this new multiple constraints MEMBV model is a concave quadratic programming problem, we use the DC algorithm to transform it into a sequence of convex quadratic programming problems. In the future, we will test our new model on benchmark UCI datasets.
References

1. Olson, D., Shi, Y.: Introduction to Business Data Mining. McGraw-Hill/Irwin (2007)
2. Zhang, P., Tian, Y., Zhang, Z., Li, A., Zhu, X.: Select Objective Functions for Multiple Criteria Programming. In: Proc. of IEEE/WIC/ACM International Conference on Web Intelligence (2008)
3. Fisher, R.A.: The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179–188 (1936)
4. Shi, Y., Peng, Y., Xu, W., Tang, X.: Data Mining via Multiple Criteria Linear Programming: Applications in Credit Card Portfolio Management. International Journal of Information Technology and Decision Making 02(1), 131–151 (2002)
5. An, L.T.H., Tao, P.D.: Solving a class of linearly constrained indefinite quadratic problems by D.C. algorithms. Journal of Global Optimization 11, 253–285 (1997)
New Unsupervised Support Vector Machines

Kun Zhao1, Ying-jie Tian2, and Nai-yang Deng3

1 Logistics School, Beijing Wuzi University
[email protected]
2 Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences
[email protected]
3 College of Science, China Agricultural University
[email protected]

Abstract. Support Vector Machines (SVMs) have been dominant learning techniques for more than ten years and have mostly been applied to supervised learning problems. Recently, nice results have been obtained by two-class unsupervised classification algorithms in which optimization problems based on Bounded C-SVMs, Bounded ν-SVMs and Lagrangian SVMs, respectively, are relaxed to Semi-definite Programming. In this paper we propose another approach to the unsupervised classification problem, which directly relaxes a modified version of the primal problem of SVMs with label variables to a semi-definite program. The preliminary numerical results show that our new algorithm often obtains more accurate results than other unsupervised classification methods, although the relaxation has no tight bound, as shown by an example whose approximation ratio of optimal values can be arbitrarily large.

Keywords: Support Vector Machines, Semi-definite Programming, unsupervised learning, Kernel.
Introduction
Efficient convex optimization techniques have had a profound impact on the field of machine learning. Most of them have been used in applying quadratic programming techniques to Support Vector Machines (SVMs) and kernel machine learning[1]. Semi-definite Programming (SDP) extends the toolbox of optimization methods used in machine learning, beyond the current unconstrained, linear and quadratic programming techniques. Semi-definite Programming has showed its utility in machine learning. Lanckreit et al [2] show how the kernel matrix can be learned from data via SDP techniques. De Bie and Cristanini [3] relax two-class transduction problem to SDP
Supported by the Key Project of the National Natural Science Foundation of China (No.10631070),the National Natural Science Foundation of China (No.10601064) and Funding Project for Academic Human Resource Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality. Corresponding author.
Y. Shi et al. (Eds.): MCDM 2009, CCIS 35, pp. 606–613, 2009. c Springer-Verlag Berlin Heidelberg 2009
New Unsupervised Support Vector Machines
607
based on transductive SVMs. Xu et al [4] develop methods to two-class unsupervised and semi-supervised classification problems based on Bounded C− SVMs in virtue of relaxation to SDP in the foundation of [2,3]. Zhao et al [5,6] present other versions which are based on Bounded ν− SVMs and Lagrangian SVMs respectively. The virtues of [5] are the facility of selecting parameter and better classification results, while the superiority of [6] is the much less consumed CPU seconds than other algorithms [4,5] for the same data set. All of the unsupervised classification algorithms mentioned above have complicated procedures of relaxing NP-hard problem to Semi-definite Programming, because they all need to find the dual problem twice. Moreover, approximation ratios in general of these SDP relaxations have no any data-independent upper bounds[7].
2
Unsupervised Classification Algorithm
Considering the supervised classification problem, and training set is T={(x1 , y1 ), . . . , (xl , yl )} where yi ∈ {−1, +1} is corresponding output of input xi . The goal of SVMs is to find the linear classifier f (x) = (w · x) + b in Hilbert space that maximizes the minimum misclassification margin l
min
w∈H,b∈R,ξ∈Rl
) 1 w2 + C ξi 2 i=1
(1)
s.t. yi (w · xi + b) ≥ 1 − ξi , i = 1, 2, . . . , l ξi ≥ 0, i = 1, 2, . . . , l
(2) (3)
The problem (1)-(3) is the primal problem of standard Support Vector Machines (C-SVMs)[8]. In order to avoid the complicated procedure mentioned in Sect. 1, we will modify problem (1)-(3) in order to solve unsupervised classification problem appropriately. When variable ξi is replaced by ξi2 , the constraints (3) can be dropped. So we get the modified primal problem of Support Vector Machines l
min
w∈H,b∈R,ξ∈Rl
) 1 w2 + C ξi2 2 i=1
(4)
s.t. yi (w · xi + b) ≥ 1 − ξi2 , i = 1, 2, . . . , l
(5)
When the labels yi , i = 1, · · · , l are unknown, a NP-hard optimization problem for unsupervised classification problem is formulated as follows: l
min
yi ∈{1,−1}l ,w∈H,b∈R,ξ∈Rl
) 1 w2 + C ξi2 2 i=1
s.t. yi (w · xi + b) ≥ 1 − ξi2 , i = 1, 2, . . . , l −ε ≤
l ) i=1
yi ≤ ε
(6) (7) (8)
608
K. Zhao, Y.-j. Tian, and N.-y. Deng
Constraint about class balance −ε ≤
l )
yi ≤ ε should be added ( ε is an
i=1
integer), otherwise we can simply assign all the data to the same class and get unbounded margin; moreover, it can avoid noisy data’s influence in some sense. Set λ = (y T , wT , b, ξ T )T , then we get 1 T λ A0 λ λ 2 s.t. λT Ai λ ≥ 1, i = 1, 2, . . . , l
min
−ε ≤
(eTl , 0Tn+l+1 )λ
≤ε
(9) (10) (11)
where A0 = Diag(0l×l , In×n , 0, 2CIl×l ) ⎛ ⎞ 0l×l Bi1l×(n+1) 0l×l Ai = ⎝ Bi1Tl×(n+1) 0(n+1)×(n+1) 0(n+1)×l ⎠ 0l×l 0l×(n+1) Bi2l×l
(12) (13)
Bi1l×(n+1) (i = 1, 2, . . . , l) is the matrix which its’ elements in ith row are 1 T (xi , 1) and the rest elements are all zeros. Bi2l×l (i = 1, 2, . . . , l) is the matrix 2 which its’ element in ith row ith column is 1 and the rest elements are all zeros too. Let λλT = M and relax λλT = M to M 0 and diag(M )l = el . Then we get Semi-definite Programming 1 Tr(M A0 ) M 2 s.t. Tr(M Ai ) ≥ 1, i = 1, 2, . . . , l
min
−εe ≤
Ml (eTl , 0Tl+1+n )T
M 0, diag(M )l = el
≤ εe
(14) (15) (16) (17)
where diag(M )l denotes the first l diagonal elements of matrix M , and Ml denotes the first l rows of matrix M . In the formulation (6)-(8), xi = φ(xi ) where the mapping φ corresponds to a kernel function. In [2,4] kernels can be applied to dual problems directly, so the corresponding mapping to some kernel needs not to be known. But in our method, the primal problem is focused, so the mapping needs to be known in prior. Therefore we should seek a mapping which inner product can approximate the given kernel. In [9], such mappings are constructed for a given kernel based on the following theorem: Theorem 1. Assume that training data set T is drawn from some distribution D over Rn and labeled +1 or -1 by some unknown function. f denotes the combined distribution over labeled examples. If f has margin γ in the φ-space
New Unsupervised Support Vector Machines
H induced by kernel K(·, ·), then with probability ≥ 1 − δ, for d =
609
8 1 1 [ + ln ], ε γ2 δ
and draw x1 , . . . , xd from D, the mapping ˜ φ(x) = (K(x, x1 ), . . . , K(x, xd ))T
(18)
produces a distribution f that is linearly separable with error at most ε. Based on Theorem 1, for a given kernel K(·, ·), we first construct the map˜ ˜ i ), i = 1, · · · , l, then get the following optimization ping φ(·), and let xi = φ(x problem from the problem (14)-(17): 1 0 ) Tr(M A 2 i ) ≥ 1, i = 1, 2, . . . , l s.t. Tr(M A
min
(19)
M
(20)
Ml (eTl , 0Tl+1+d )T
≤ εe −εe ≤ M 0, diag(M )l = el
(21) (22)
where 0 = Diag(0l×l , Id×d , 0, 2CIl×l ) A ⎛ ⎞ l×(d+1) 0l×l 0l×l Bi1 ⎟ i = ⎜ T A 0(d+1)×(d+1) 0(d+1)×l ⎠ ⎝ Bi1 l×(d+1)
0l×l
0l×(d+1)
(23) (24)
Bi2l×l
l×(d+1) (i = 1, 2, . . . , l) is the matrix which its’ elements in ith row are Bi1 1 ˜ (φ(xi )T , 1) and the other elements are all zeros. Bi2l×l (i = 1, 2, . . . , l) is the 2 same as that in problem (14)-(17). After getting the optimal solution M ∗ to problem (19)-(22), training data ’s labels y ∗ T are obtained by the following rounding method: 1) Find the first l elements t = (t1 , . . . , tl )T in eigenvector corresponding to the largest eigenvalue of the matrix M ∗ . Construct the vector y = (¯ y1 , · · · , y¯l )T = l ) (sgn(t1 ), · · · , sgn(tl ))T . If y satisfy the constraint −ε ≤ y¯i ≤ ε, set y ∗ = y, i=1
which is final label of data and two classes of data are clustered. l ) 2) If y dose not satisfy the constraint −ε ≤ y¯i ≤ ε, let δ = |y T e−ε|, the labels i=1
y ∗ is obtained from y by the following way: select δ smallest absolute values of ti from the majority class, and change the corresponding labels in y.
3
Numerical Results
In order to evaluate the performance of our unsupervised classification algorithm (Primal-SDP), we will compared it with C-SDP[4], ν-SDP[5], L-SDP[6],
610
K. Zhao, Y.-j. Tian, and N.-y. Deng Table 1. Results about six algorithms on three synthetic data sets Algorithm
AI
Gaussian
Circles
Primal-SDP L-SDP ν-SDP C-SDP K-means DBSCAN
2/19 2/19 2/19 2/19 2/19 7/19
7/30 7/30 5/30 7/30 2/30 9/30
3/20 4/20 3/20 4/20 10/20 3/20
Table 2. Results about four algorithms on three synthetic data sets with polynomial kernel Algorithm
AI
Gaussian
Circles
Primal-SDP L-SDP ν-SDP C-SDP
2/19 2/19 2/19 2/19
4/30 4/30 5/30 5/30
2/20 2/20 2/20 2/20
30
20
8
7
25
15
6 10 5
20
5
4
0
3
15
2
−5
1 −10
10
0 −15
−1
5
2
4
6
8
10
12
14
16
18
−2
0
1
2
3
4
5
6
7
−20 −20
8
−15
−10
−5
0
5
10
15
20
Fig. 1. Results by Primal-SDP on the three synthetic data sets 30
20
8
7
25
15
6 10 5
20
5
4
0
3
15
2
−5
1 −10
10
0 −15
−1
5
2
4
6
8
10
12
14
16
18
−2
0
1
2
3
4
5
6
7
8
−20 −20
−15
−10
−5
0
5
10
15
20
Fig. 2. Results by Primal-SDP on the three synthetic data sets with polynomial kernel
the straightforward k-means algorithm and DBSCAN [10] on three synthetic data sets, AI, Gaussian and Circles, almost the same as those in [4], using the SeDuMi library [11]. The parameter ε in Primal-SDP, C-SDP, ν-SDP and L-SDP equals 1; the parameter ν = 0.5 in ν-SDP, and C = 100 in Primal-SDP, C-SDP and L-SDP. The parameter k (number of objects in a neighborhood of an object) in DBSCAN is 3. The classification results are given in Table 1, where each number is the misclassification fraction. In order to classify inseparable data sets, it seems feasible to use the kernel-induced features F1(x) = (K(x, x1), ..., K(x, xd)) to approximate the mapping corresponding to the kernel K(·, ·). Consider the polynomial kernel with c = 1 and d = 2. The classification results are given in Table 2; each number is the misclassification fraction.
Table 3. Results of four algorithms on three synthetic data sets with Gaussian kernel

| Algorithm | AI | Gaussian | Circles |
| Primal-SDP | 1/19 | 3/30 | 2/20 |
| L-SDP | 1/19 | 3/30 | 2/20 |
| ν-SDP | 2/19 | 3/30 | 2/20 |
| C-SDP | 2/19 | 4/30 | 2/20 |
Table 4. Results of four algorithms on the Digit data sets

| Algorithm | Digit23 | Digit56 | Digit17 | Digit09 |
| Primal-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
| L-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
| ν-SDP | 2/20 | 2/20 | 1/20 | 2/20 |
| C-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
Fig. 3. Results by Primal-SDP on the three synthetic data sets with Gaussian kernel

Table 5. Results of four algorithms on the Digit data sets with polynomial kernel

| Algorithm | Digit23 | Digit56 | Digit17 | Digit09 |
| Primal-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
| L-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
| ν-SDP | 1/20 | 2/20 | 1/20 | 2/20 |
| C-SDP | 2/20 | 2/20 | 2/20 | 2/20 |
Using the Gaussian kernel with parameter σ = 1, we obtain the better classification results shown in Table 3. We also run our algorithm on real data sets, which can be obtained from http://www.cs.toronto.edu/~roweis/data.html, including the Face and Digit data sets. To evaluate clustering performance, we take a labeled data set, remove the labels, run the unsupervised classification algorithms, label each of the resulting clusters with the majority class according to the original training labels, and then measure the number of misclassifications. The results are shown in Table 4; each number is the misclassification fraction. Using the polynomial and Gaussian kernels with the same parameters as for the synthetic data, the classification results are shown in Table 5 and Table 6, respectively.
Table 6. Results of six algorithms on the Face and Digit data sets with Gaussian kernel

| Algorithm | Digit23 | Digit56 | Digit17 | Digit09 | Face12 | Face34 | Face56 | Face78 |
| Primal-SDP | 2/20 | 2/20 | 2/20 | 2/20 | 2/20 | 2/20 | 2/20 | 2/20 |
| L-SDP | 2/20 | 2/20 | 2/20 | 2/20 | 4/20 | 2/20 | 2/20 | 2/20 |
| ν-SDP | 2/20 | 1/20 | 1/20 | 2/20 | 2/20 | 2/20 | 1/20 | 2/20 |
| C-SDP | 2/20 | 2/20 | 2/20 | 2/20 | 4/20 | 2/20 | 2/20 | 2/20 |
| K-means | 4/20 | 5/20 | 4/20 | 4/20 | 2/20 | 4/20 | 4/20 | 2/20 |
| DBSCAN | 10/20 | 10/20 | 10/20 | 10/20 | 10/20 | 10/20 | 10/20 | 10/20 |
Fig. 4. The images in each row form a cluster discovered by Primal-SDP with the Gaussian kernel
4 Conclusions

From Sect. 3 we can see that Primal-SDP performs better than other unsupervised classification algorithms. Obviously, the Semi-definite Programming relaxation of problem (6)-(8) provides a lower bound, i.e., z_SDP^min ≤ z_NP^min, where z_SDP^min and z_NP^min are the optimal objective function values of the Semi-definite Programming (14)-(17) and of problem (6)-(8), respectively. There is in general no data-independent upper bound on z_NP^min / z_SDP^min, as shown by the following example. Consider the data set {x1, x2} ⊂ R², with x1 ≠ x2, x1 ∈ R²₊ and x2 ∈ R²₋.
min_{y∈{1,−1}², w∈R², b∈R, ξ∈R²}  (1/2)‖w‖² + C Σ_{i=1}² ξi²   (25)
s.t.  yi((w · xi) + b) ≥ 1 − ξi²,  i = 1, 2   (26)
      −ε ≤ Σ_{i=1}² yi ≤ ε   (27)
Select the parameter ε = 1 and y ∈ {1, −1}²; then either y1 = 1 and y2 = −1, or y1 = −1 and y2 = 1. The optimal objective function value satisfies z_NP^min ≥ (1/2)‖w*‖². For this data set we easily get w* = y1x1 + y2x2 and b* = 0. The optimal objective function value is therefore z_NP^min ≥ (1/2)(y1x1 + y2x2)ᵀ(y1x1 + y2x2) = (1/2)(x1ᵀx1 + x2ᵀx2 − 2x1ᵀx2). For the Semi-definite Programming (14)-(17), I is a feasible solution, so z_SDP^min ≤ 1 + 2C. Hence z_NP^min / z_SDP^min ≥ (x1ᵀx1 + x2ᵀx2 − 2x1ᵀx2) / (2 + 4C), which can be arbitrarily large depending on the data set {x1, x2}.

Acknowledgements. We would like to thank Professor Zhi-Quan Luo of the University of Minnesota, who shared his insights with us in discussions.
References

1. Schoelkopf, B., Smola, A.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)
2. Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research 5 (2004)
3. De Bie, T., Cristianini, N.: Convex methods for transduction. In: Advances in Neural Information Processing Systems (NIPS-2003), vol. 16 (2003)
4. Xu, L., Neufeld, J., Larson, B., Schuurmans, D.: Maximum margin clustering. In: Advances in Neural Information Processing Systems (NIPS-2004), vol. 17 (2004)
5. Zhao, K., Tian, Y.J., Deng, N.Y.: Unsupervised and Semi-supervised Two-class Support Vector Machines. In: Sixth IEEE International Conference on Data Mining workshops, Hong Kong, December 2006, pp. 813–817 (2006)
6. Zhao, K., Tian, Y.-J., Deng, N.-Y.: Unsupervised and semi-supervised lagrangian support vector machines. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007. LNCS, vol. 4489, pp. 882–889. Springer, Heidelberg (2007)
7. Zhao, K.: Unsupervised and Semi-supervised Support Vector Classification. Ph.D. Thesis (2008)
8. Deng, N.Y., Tian, Y.J.: A New Method of Data Mining: Support Vector Machines. Science Press (2004)
9. Balcan, M.F., Blum, A., Vempala, S.: Kernels as Features: On Kernels, Margins and Low-dimensional Mappings. Machine Learning 65, 79–94 (2006)
10. Ester, M., Kriegel, H., Sander, J., Xu, X.: A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
11. Sturm, J.F.: Using SeDuMi 1.02, a Matlab Toolbox for Optimization over Symmetric Cones. Optimization Methods and Software 11-12, 625–653 (1999)
Data Mining for Customer Segmentation in Personal Financial Market

Guoxun Wang1,2, Fang Li2, Peng Zhang1, Yingjie Tian1, and Yong Shi1

1 Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
{guoxunwang,lifg011,tianyingjie1213}@163.com
2 School of Computer Science and Information Engineering, Henan University, China
[email protected]

Abstract. Personal financial market segmentation plays an important role in retail banking. It is widely admitted that conventional approaches to customer segmentation have many limitations: they are knowledge based and often produce biased results. In contrast, data mining can deal with massive amounts of data without missing useful knowledge. Because of the large volume of unlabeled transaction data, in this paper we propose a clustering ensemble method based on a majority voting mechanism, together with two alternative approaches to further enhance the performance of customer segmentation in real banking business. Through experiments and examination in a real business environment, we conclude that our model reflects the true characteristics of the various types of customers and can be used to find customers' investment preferences.

Keywords: Personal financial market, Customer segmentation, Data mining, Clustering ensembles, Classification.
1 Introduction

Customer segmentation in the personal financial market is the process of dividing clients into several groups based on their characteristics (including demographic characteristics and investment behavior characteristics), so that customers in the same group exhibit similar investment buying. At present, personal financial services in the commercial banks of China are still at an initial stage. The customer segmentation methods used in this field are very simple, and most of them are based on questionnaires; they therefore often produce biased results due to the respondents' subjectivity and the limited available samples. This has led to a serious situation in which conflicts between banks and customers are increasingly intense because of the lack of personalized services. At the same time, some commercial banks have already built data warehouses that store a large volume of data related to transaction records and personal information, for instance purchasing records, investment buying, age, career, marital status, education background and so on. How to extract the potentially useful information from this massive, incomplete and noisy data and build smart customer segmentation models has become the consensus of business decision-makers.
It is inevitable to use data mining to solve this problem. Data mining has achieved great success in the business world [5]. For example, retail stores routinely use data mining tools to learn about the purchasing habits of their customers. Other successful applications of data mining include credit card management, insurance, telemarketing, telecommunication and human resource management. Clustering is a major technology in data mining. It divides a group of individuals into several clusters so that individuals in the same cluster are close to each other, while individuals in different clusters are dissimilar. Because of these characteristics, clustering is very suitable for customer segmentation, especially where labeled samples are unavailable. In this paper, we propose a clustering ensemble method based on a majority voting scheme and two alternative approaches to further enhance the effectiveness of customer segmentation. The remainder of this paper is organized as follows. In Section 2, we focus on the clustering ensemble method; Section 3 introduces the classification methods used to improve prediction accuracy; in Section 4, we show the experimental results of clustering and classification. Finally, we conclude the paper with a summary.
2 Clustering Ensemble Algorithms

Clustering algorithms provide means to explore and ascertain structure within data by organizing it into groups or clusters. Many clustering algorithms exist in the literature [1, 2], such as methods based on non-parametric density estimation, central clustering, square-error clustering, and so on. But in exploratory data analysis, each clustering algorithm has different strengths and weaknesses. For instance, K-means [8] is one of the simplest and most widely used clustering methods. A drawback of the K-means algorithm is that different and possibly inconsistent data partitions may be produced by different initializations, especially when there are dirty data. In order to produce a consistent result, the clustering ensemble method [11] has been adopted. In recent years a series of studies and experiments have shown that clustering ensemble methods can be more advantageous than a single algorithm in terms of robustness and stability. They combine several data partition results into a unified division result by some strategy. The majority voting scheme is the most common way to solve this problem; the idea behind it is that the judgment of a group is superior to that of individuals. Theoretical foundations and the behavior of this technique can be found in [9, 10]. However, at present almost all of these methods are constrained to a single clustering method and merge results generated by different parameter settings, as in voting-k-means. In this paper, the proposed method solves this problem by combining several data partition results generated by different clustering methods such as K-means, DBSCAN, EM and so on. It works as follows. First, we assume that D is a data set containing N objects, M is the co-association matrix built from the partition results (initialized to null), and R represents the number of clustering methods. We run these clustering methods and get R different partitions of the data. The results are combined into a co-association matrix [3], where each cell (i, j) represents the number of times the given
sample pair has co-occurred in a cluster. Each co-occurrence is therefore a vote towards gathering the pair in one cluster. By dividing the co-association matrix by R and setting a proper threshold, we can easily obtain a result based on the majority vote. When combining several clustering results, it is very important to align the labels of distinct partitions. We solve this problem in the same way as voting-k-means: the purpose is to find the pair of clusters with the highest matching score, where the matching score is defined by the fraction of shared samples. The procedure can be depicted as follows:
Input: partitions P1, P2; n, the total number of samples.
Output: P2', a reordering of P2 matching the clusters in P1.
Let: Pi = partition i: (nci, C1^i, ..., C_nci^i); nci = number of clusters in partition i; C_j^i = {s_l : s_l ∈ cluster j of partition i} = list of samples in the j-th cluster of partition i; X_j^i with X_j^i(k) = 1 if s_k ∈ C_j^i and 0 otherwise, k = 1, 2, ..., n, the binary-valued vector representation of cluster C_j^i.
Steps:
1. Convert each C_j^i into its binary-valued vector X_j^i: C_j^i → X_j^i, i = 1, 2; j = 1, 2, ..., nci.
2. Set P2'(i) = 0, i = 1, 2, ..., nc2 (new cluster indexes).
3. Do min{nc1, nc2} times:
   - Find the best matching pair of clusters (k, l) between P1 and P2 according to the match coefficient
     (k, l) = argmax_{i,j} { X_i^{1T} X_j^2 / (X_i^{1T} X_i^1 + X_j^{2T} X_j^2 − X_i^{1T} X_j^2) }.
   - Rename C_l^2 as C_k^1: P2_new_indexes(l) = k.
-Remove Ck1 and Cl2 from P1 and P2,respectively.
4. If nc1 > nc2 go to step 5; otherwise fill in empty locations in P2new_ indexes with arbitrary labels in the set {nc1 +1,...,nc2 }. 5. Return P2' .
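A minimal Python sketch of the procedure above and of the co-association majority vote described in this section is given below. It is an illustrative re-implementation under the stated definitions, not the bank's code; function names are assumptions.

```python
import numpy as np

def match_labels(p1, p2):
    """Greedily relabel partition p2 against p1 using the match coefficient
    |Ci ∩ Cj| / |Ci ∪ Cj| from the procedure above."""
    p1, p2 = np.asarray(p1), np.asarray(p2)
    out = np.full_like(p2, -1)
    c1, c2 = list(np.unique(p1)), list(np.unique(p2))
    while c1 and c2:
        best, pair = -1.0, None
        for i in c1:
            for j in c2:
                a, b = p1 == i, p2 == j
                score = (a & b).sum() / float((a | b).sum())
                if score > best:
                    best, pair = score, (i, j)
        i, j = pair
        out[p2 == j] = i                      # rename cluster j of p2 as cluster i of p1
        c1.remove(i); c2.remove(j)
    nxt = p1.max() + 1
    for j in c2:                              # leftover p2 clusters get fresh labels
        out[p2 == j] = nxt
        nxt += 1
    return out

def majority_vote_ensemble(partitions, threshold=0.5):
    """Co-association matrix: cell (i, j) counts how often samples i and j share a
    cluster over the R base partitions; dividing by R and thresholding gives the vote."""
    partitions = [np.asarray(p) for p in partitions]
    R, n = len(partitions), len(partitions[0])
    co = np.zeros((n, n))
    for p in partitions:
        co += (p[:, None] == p[None, :]).astype(float)
    return co / R >= threshold
```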
3 Classification Based on Clustering

There are several choices for building a forecasting or classification model for customer segmentation in business environments. The first choice is to use the clustering results as the prediction model directly. When a new customer arrives, this
method calculates the distances from the data point to each cluster center and then assigns the point to the cluster whose center is nearest. It can be easily implemented and also maintains relatively high accuracy. However, owing to the time-consuming computation of distances between each data point and the cluster centers, it is inefficient on large data sets with many attributes. So we also solve this problem with a classification approach, in the following steps. First, we add a label to every data point in the data set according to the clustering results. Second, classification methods such as C5, logistic regression, MCLP (Multiple Criteria Linear Programming) [12], MCQP (Multiple Criteria Quadratic Programming, derived from MCLP) and SVM (Support Vector Machine) [13] are trained on the newly labeled data set. In this paper, we selected MCLP, MCQP and SVM as the classification models because of their excellent performance in many different spheres. Finally, the resulting classifier can be used by the CB bank for customer segmentation in personal financial marketing. In recent years, MCLP models have exhibited a powerful ability to classify different kinds of real-life data, such as credit card data, network intrusion data, VIP e-mail data, and biological data. Hence, we give a short introduction to the formulation of the MCLP model here. Assume a two-group classification problem {G1, G2}. Given a training sample Tr = {A1, A2, ..., An}, where n is the total number of records in the training sample, each training instance Ai (i = 1, ..., n) has r attributes. A boundary scalar b is used to separate G1 and G2. Thus a vector X = (x1, x2, ..., xr) ∈ R^r can be identified to establish the following linear inequality [5]:

AiX < b, for some Ai ∈ G1;  AiX ≥ b, for some Ai ∈ G2   (1)
(1)
To formulate the criteria and complete constraints for data separation, some variables will be introduced. α i is defined to measure the overlapping of two-group boundary for record Ai , that means if Ai ∈ G1 but we misclassified it into G 2 or vice versa, there is a distance α i and the value equals AiX − b . Then β i is defined to measure *
Ai is correctly * , where b = b + α i or
the distance of record Ai from its adjusted boundary b , that means if classified, there is a distance β i and the value equals AiX − b
*
b* = b - α i . To separate the two groups as far as possible, two objective functions should be designed which minimize the overlapping distances and maximize the distances between classes. Supposed while
β
q q
α
p p
denotes for the relationship of all overlapping α i
denotes for the aggregation of all distances
βi
. The final correctly
classified instances are depended on simultaneously minimize
β
q q
α
p p
and maximize
. Thus, a generalized bi-criteria programming model can be formulated as [7]:
618
G. Wang et al.
(Model 1)
Minimize
α
p p
and maxmize
β
q q
(2)
subject to : AiX - α i + β i - b =0,
Ai ∈ G1 AiX + α i - β i - b =0, Ai ∈ G 2
Where Ai ψis given,
α = (α 1, α 2,..., α n)T ≥0, β = ( β 1, β 2,..., β n)T ≥0, X
and b are unrestricted. When choosing linear formulation for (2), we get the original multiple criteria linear programming (MCLP) model as follows: (Model 2)
Minimize
n
n
i =1
i =1
wα ∑ α i 2 − wβ ∑ β i 2
(3)
Subject to:
Ai x − α i + βi − b = 0, ∀Ai ∈ G1 Ai x + α i − βi − b = 0, ∀Ai ∈ G2
α i , βi ≥ 0, i = 1,..., n Where Ai is given, X and b are unrestricted.
4 Experiments A major China commercial bank provided the data for this study. This CB Bank plans to clustering customers into three kinds, namely conservative customers, moderate customers and speculative customers. After the data cleaning, format conversion and data integration, at last, we built a table contains 30287 records and 143 attributes, and each record represents and only represents a customer. 4.1 Clustering Results
In our experiment, we chose the K-means, EM, DBSCAN and OPTICS as the foothold of the proposed methods. It is prefer to generate three clusters according to the need in personal financial marketing. So, we let the K-means generate three clusters and the others output three or four cluster by adjusting their parameters. And then we applied the proposed clustering ensemble method by the manner mentioned above. Firstly, we use the proposed method to do clustering without any feature selection and get the result as shown in Table 1. We can see that nearly half of the people were assigned to the cluster 0, while one fourth of them were marked with cluster 2 and thirty percent located in the cluster 1. The number of attributes directly affects the speed of clustering. Due to hundreds of millions of records in the operational environment of the CB Bank, it is almost impossible to generate so many variables and do clustering on them. The best way to solve this problem is by feature extraction. In this paper, the F-score algorithm [4] was chose for feature extraction. F-score is a simple technique which measures the
Data Mining for Customer Segmentation in Personal Financial Market
619
ability of a feather to discriminate two sets of real number. The larger the F-score is, the more discriminative this feature is. Finally, we selected 8 attributes which have the largest F-score value for next clustering. The Table 2 shows the result based on the 8 attributes selected by F-score algorithm. Form table 2, it is clearly to see that despite only 8 attributes were used in the clustering, the outcome still maintains the data distribution depicted by 143 attributes in table 1. Table 1. Results of clustering ensembles with all 143 attributes Cluster NO. 0 1 2
Records 14223 8745 7319
Percentage 46.96% 28.87% 24.17%
Table 2. Results of clustering ensembles with 8 attributes selected by F-score Cluster NO. 0 1 2
Records 6699 7503 16085
Percentage 22.12% 24.77% 53.11%
By analyzing the figures in Table 3, we found that customers in cluster 2 make most of their assets invested in time deposits which has the lowest risk level. This value as high as 0.83, far beyond values of the other two groups. And they get the lowest value in other properties which on behalf of high-risk financial products. So we determined that customers in cluster 2 are conservative. On the contrary, customers in cluster 1 get the highest value from properties 2 to 8 which all represent the high-risk financial products. So we confirm that cluster 1 represent the speculative customers. The values in cluster 0 are between each of the cluster 1 and cluster 2, which mean that customers in this cluster are moderate investors. And the variances in each cluster also show the same characteristics. Table 3. Attributes selected by F-score and their mean in each cluster Attribute NO. 1 2 3 4 5 6 7 8
Mean Cluster 0 0.025 0.011 0.020 0.012 0.013 0.016 0.040 0.034
Cluster 1 0.093 0.213 0.258 0.239 0.317 0.311 0.670 0.670
Variance Cluster 2 0.842 0.006 0.009 0.012 0.010 0.010 0.055 0.063
Cluster 0
0.007 0.002 0.002 0.002 0.003 0.003 0.010 0.748
Cluster 1
0.035 0.043 0.045 0.041 0.054 0.046 0.083 4.367
Cluster 2
0.029 0.001 0.001 0.002 0.002 0.001 0.041 0.651
We also randomly selected 200 customers from the dataset and asked experts working in this field to judge the accuracy of the clustering results manually. The results show that 178 customers were assigned correctly, an accuracy rate of 89%. Therefore, our clustering model can be safely applied to the next step of customer segmentation in the CB Bank.

4.2 Classification Results
In this paper, ten-fold cross-validation is used to test the performance of the MCLP, MCQP and SVM models. From Table 4, we can see that the average training accuracies of all three models are above 93%. The average accuracy of MCLP on the testing sets is nearly 90%; MCQP performs better than MCLP with a 92.6% testing accuracy; and the SVM achieved the best results. The results indicate that the separations among the three clusters are represented very well by our models.

Table 4. Results of the MCLP, MCQP and SVM classifiers on the new dataset

             Accuracy of MCLP        Accuracy of MCQP        Accuracy of SVM
             Training   Testing      Training   Testing      Training   Testing
Cluster 1    97.50%     90.62%       97.30%     95.72%       98.63%     95.44%
Cluster 2    93.74%     89.30%       95.45%     90.35%       97.76%     91.88%
Cluster 3    90.10%     88.47%       93.22%     91.88%       97.14%     93.59%
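The ten-fold protocol behind Table 4 can be sketched as follows. Since the MCLP/MCQP solvers are not reproduced here, an off-the-shelf SVM stands in as the classifier, which is an assumption for illustration only; the data matrix and cluster labels are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def ten_fold_report(X, y, n_splits=10):
    """Average training/testing accuracy per class over a stratified 10-fold split."""
    classes = np.unique(y)
    train_acc = {c: [] for c in classes}
    test_acc = {c: [] for c in classes}
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for tr, te in skf.split(X, y):
        clf = SVC(kernel="rbf", C=1.0).fit(X[tr], y[tr])       # stand-in classifier
        for c in classes:
            tr_c = tr[y[tr] == c]                               # training samples of class c
            te_c = te[y[te] == c]                               # testing samples of class c
            train_acc[c].append(np.mean(clf.predict(X[tr_c]) == c))
            test_acc[c].append(np.mean(clf.predict(X[te_c]) == c))
    return ({c: np.mean(v) for c, v in train_acc.items()},
            {c: np.mean(v) for c, v in test_acc.items()})

# Hypothetical usage on the 8-attribute customer table with cluster labels y:
# train_by_cluster, test_by_cluster = ten_fold_report(X8, y)
```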
5 Conclusion

In this paper, we proposed a clustering ensemble method based on the majority voting scheme. We also introduced two alternative methods to further improve the performance of customer segmentation: the first takes the clustering results as the prediction model directly, and the second moves from clustering to classification. We also briefly described the basic data processing steps such as data cleaning, format conversion and data integration. After analyzing the 143 attributes, we finally obtained 8 attributes by F-score to build the clustering and classification models. After strict examination in the actual business, experts in personal financial marketing confirmed that our models properly reflect the true characteristics of the various kinds of customers and can be used to identify the investment tendencies of customers. Consequently, these models can substantially support the process of providing personalized services.
Acknowledgements

In the process of preparing this paper, the authors received much support from Nie Guangli and Zhang Zhan, to whom we express our deepest appreciation. This research has been partially supported by grants from the National Natural Science Foundation of China (#70621001, #70531040, #10601064).
References 1. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988) 2. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Computing Surveys 31(3), 264–323 (1999) 3. Fred, A.: Finding consistent clusters in data partitions. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 309–318. Springer, Heidelberg (2001) 4. Chen, Y.W., Lin, C.J.: Combining SVMs with various feature selection strategies (2005) 5. Olson, D.L., Shi, Y.: Introduction to Business Data Mining, pp. 8–9. McGraw-Hill/Irwin (2005) 6. Fisher, R.A.: The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179–180 (1936) 7. Zhang, J., Shi, Y., Zhang, P.: Several Multi-criteria Programming Methods for Classification. Computers & Operations Research (2007), doi:10.1016/j.cor. 2007.11.001 8. MacQueen, J.B.: Some Methods for classification and Analysis of Multivariate Observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967) 9. Lam, L., Suen, C.Y.: Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Systems, Man, and Cybernetics 27(5), 553–568 (1997) 10. Lam, L.: Classifier combinations: Implementations and theoretical issues. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 77–86. Springer, Heidelberg (2000) 11. Strehl, A., Ghosh, J.: Cluster ensembles-a knowledge reuse framework for combing multiple partitions [J]. Journal of Machine Learning Research 3(3), 583–617 (2003) 12. Shi, Y., Wise, W., Lou, M., et al.: Multiple Criteria Decision Making in Credit Card Portfolio Management. Multiple Criteria Decision Making in New Millennium, 427–436 (2001) 13. Vapnik, V.: Statistical Learning Theory. John Wiley&Sons (1998)
Nonlinear Knowledge in Kernel-Based Multiple Criteria Programming Classifier Dongling Zhang1,2, Yingjie Tian1,*, and Yong Shi1,3 1
Research Center on Fictitious Economy and Data Science, Graduate University of Chinese Academy of Sciences, Beijing 100190, China 2 Beijing University of Science and Technology, Beijing 100083, China 3 College of Information Science and Technology, University of Nebraska at Omaha, Omaha NE 68182, USA
[email protected],
[email protected],
[email protected] Abstract. The Kernel-based Multiple Criteria Linear Programming (KMCLP) model is used as a classification method which can learn from training examples. In contrast, in traditional knowledge-based approaches, data are classified only by prior knowledge. Some works combine these two classification principles to overcome the drawbacks of each approach. In this paper, we propose a model that incorporates nonlinear knowledge into KMCLP in order to solve problems whose input consists not only of training examples but also of nonlinear prior knowledge. On the real-world case of breast cancer diagnosis, the model shows better performance than the model based solely on training data. Keywords: Kernel-based MCLP, nonlinear prior knowledge, classification.
1 Introduction

Multiple Criteria Linear Programming (MCLP) is a classification method based on a set of classified training examples [1]. By solving a linear programming problem, MCLP can find a hyperplane to separate two classes. The principle of the MCLP classifier is to train on the training set and then obtain a separation model that can be used to predict the label of new data. However, the MCLP model is only applicable to linearly separable data. To facilitate its application to nonlinearly separable data sets, the kernel-based multiple criteria linear programming (KMCLP) method was proposed by Zhang et al. [2], which introduces a kernel function into the original MCLP model to make it possible to solve nonlinearly separable problems. Likewise, there are many other prevalent classifiers, such as Support Vector Machines, Neural Networks, Decision Trees etc., which share the same principle of learning solely from training examples. This inevitably brings some disadvantages. One problem is that noisy points may lead to poor results. The other, more important one is that when training samples are hard to obtain or when sampling is costly, these methods become inapplicable. Different from the above empirical classification methods, another commonly used principle for classifying data is prior knowledge. Two well-known traditional methods are rule-based reasoning and expert systems. In these methods,
* Corresponding author.
prior knowledge can take the form of logical rules which are well recognized by computers. However, these methods also suffer from the requirement that the pre-existing knowledge must not contain imperfections [3]. Moreover, as is well known, most knowledge is tacit in that it exists in people's minds, so it is not an easy task to acquire perfect knowledge. Recent works combine the above two classification principles to overcome the drawbacks of each approach. Prior knowledge can be used to aid the training set to improve classification ability; conversely, training examples can be used to refine prior knowledge. Among such combination methods, Knowledge-Based Artificial Neural Networks (KBANN) and Knowledge-Based Support Vector Machines (KBSVM) are two representatives. KBANN is a hybrid learning system which first inserts a set of hand-constructed, symbolic rules into a neural network [4]. KBSVM provides a novel approach that incorporates prior knowledge as constraints into the original support vector classifier [5, 6]. Some works focus on incorporating nonlinear knowledge into nonlinear kernel classification problems [7]. In addition to the application to classification problems, [8] has shown the effectiveness of introducing prior knowledge into function approximation. Based on these previous works, this paper proposes an approach to incorporate nonlinear prior knowledge into the kernel-based multiple criteria linear programming (KMCLP) model. It is both necessary and possible for the KMCLP model to make better use of knowledge to achieve better outcomes in classifying nonlinearly separable data. This approach also extends the application of KMCLP to cases where nonlinear prior knowledge is available. The outline of the paper is as follows. We start by giving a brief review of the KMCLP model in Section 2, together with an overview of nonlinear prior knowledge and how to express it as logical implications. Section 3 then introduces how to incorporate the nonlinear prior knowledge into the KMCLP model. This model is capable of generating a classifier when both real data and nonlinear prior knowledge are available. To demonstrate the effectiveness of the model, the results of experiments are provided in Section 4. Finally, the conclusion is given in Section 5.
2 KMCLP Model and Prior Knowledge

2.1 A Brief Review of the KMCLP Model

Kernel-based multiple criteria linear programming (KMCLP) is a classification method [2] which originated from the multiple criteria linear programming (MCLP) model [9]. The derivation of KMCLP can be described as follows. Suppose the training set of the classification problem is X, which contains n observations. Each observation has r attributes (or variables), which can take any real value, and a two-valued class label G (Good) or B (Bad). The ith observation of the training set can be described by X_i = (X_i1, ..., X_ir), where i can be any number from 1 to n. In linear discriminant analysis, the purpose is to determine the optimal coefficients (or weights) for the attributes, denoted by W = (w_1, ..., w_r), and a boundary value (scalar) b to separate two predetermined classes G (Good) and B (Bad); that is,
X_i1 w_1 + ⋯ + X_ir w_r ≤ b, X_i ∈ B (Bad)   and   X_i1 w_1 + ⋯ + X_ir w_r ≥ b, X_i ∈ G (Good).
Based on two linear discriminant analysis criteria, maximizing the minimum distance (MMD) and minimizing the sum of the deviations (MSD), MCLP tries to find a compromise solution of both [9]. Experiments and application results showed that it can achieve satisfactory results. KMCLP introduces a kernel function into the original MCLP model to make it possible to solve nonlinearly separable problems. It assumes that the solution of the MCLP model can be described in the following form:

w = ∑_{i=1}^{n} λ_i y_i X_i    (1)
Here, n is the sample size of the data set, X_i represents each training sample, and y_i is the class label of the ith sample, which can be +1 or −1. Substituting this w into the two-class MCLP model [9] and replacing (X_i · X_j) with K(X_i, X_j), the kernel-based multiple criteria linear programming (KMCLP) nonlinear classifier is formulated as [2]:
Minimize  d_α^+ + d_α^− + d_β^+ + d_β^−

Subject to:
α* + ∑_{i=1}^{n} α_i = d_α^− − d_α^+
β* − ∑_{i=1}^{n} β_i = d_β^− − d_β^+
λ_1 y_1 K(X_1, X_1) + ... + λ_n y_n K(X_n, X_1) = b + α_1 − β_1, for X_1 ∈ B    (2)
......
λ_1 y_1 K(X_1, X_n) + ... + λ_n y_n K(X_n, X_n) = b − α_n + β_n, for X_n ∈ G
α_1, ..., α_n ≥ 0, β_1, ..., β_n ≥ 0, λ_1, ..., λ_n ≥ 0, d_α^+, d_α^−, d_β^+, d_β^− ≥ 0

The above model can be used as a nonlinear classifier, where K(X_i, X_j) can be any nonlinear kernel, for example the RBF kernel K(x, x') = exp(−q‖x − x'‖²). α* and β* in the model need to be given in advance. With the optimal solution of this model (λ, b, α, β), we can obtain the discrimination function to separate the two classes:
λ_1 y_1 K(X_1, z) + ... + λ_n y_n K(X_n, z) ≤ b, then z ∈ B
λ_1 y_1 K(X_1, z) + ... + λ_n y_n K(X_n, z) ≥ b, then z ∈ G    (3)
where z is the input vector to be evaluated, with r attributes.

2.2 Prior Knowledge to Classify Data
Prior knowledge in some classifiers usually consists of a set of rules such as: if A then x ∈ G (or x ∈ B), where condition A involves the attributes of the input data. For example, If L ≥ 5 and T ≥ 4 Then RECUR, and If L = 0 and T ≤ 1.9 Then NONRECUR, where L and T are two of the attributes of the training samples.
The conditions in the above rules can be written as inequalities of the form Cx ≤ c, where C is a matrix derived from the condition, x represents an individual sample, and c is a vector. In some works [5, 10, 11], this kind of knowledge is imposed as constraints of an optimization problem, thus forming a classification model with both training samples and prior knowledge. We notice that the set {x | Cx ≤ c} can be viewed as a polyhedral convex set, which is a linear region in the input space. But if the shape of the region describing the knowledge is nonlinear, for example {x | ‖x‖² ≤ c}, how should such knowledge be handled? Supposing the region is a nonlinear convex set, we describe it by g(x) ≤ 0. If a data point lies in this region, it must belong to class B. Then such nonlinear knowledge takes the form:

g(x) ≤ 0 ⇒ x ∈ B
h(x) ≤ 0 ⇒ x ∈ G    (4)

Here g(x): R^r → R^p (x ∈ Γ) and h(x): R^r → R^q (x ∈ Δ) are functions defined on subsets Γ and Δ of R^r which determine the regions in the input space. All data satisfying g(x) ≤ 0 must belong to class B and all data satisfying h(x) ≤ 0 must belong to class G. With the KMCLP classifier, this knowledge is equivalent to:

g(x) ≤ 0 ⇒ λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) ≤ b, (x ∈ Γ)
h(x) ≤ 0 ⇒ λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) ≥ b, (x ∈ Δ)    (5)

This implication can be written in the following equivalent logical form [12]:

g(x) ≤ 0, λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b > 0 has no solution x ∈ Γ.
h(x) ≤ 0, λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b < 0 has no solution x ∈ Δ.    (6)

If the above expressions hold, then there exist v ∈ R^p, r ∈ R^q, v, r ≥ 0 such that:

−λ_1 y_1 K(X_1, x) − ... − λ_n y_n K(X_n, x) + b + v^T g(x) ≥ 0, (x ∈ Γ)
λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b + r^T h(x) ≥ 0, (x ∈ Δ)    (7)

Adding slack variables to the above two inequalities, they are converted to:

−λ_1 y_1 K(X_1, x) − ... − λ_n y_n K(X_n, x) + b + v^T g(x) + s ≥ 0, (x ∈ Γ)
λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b + r^T h(x) + t ≥ 0, (x ∈ Δ)

These statements can be added as constraints of an optimization problem.
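To make the conversion concrete, the sketch below checks inequality (7) (without slack) empirically at a finite sample of points from the knowledge region, for a given kernel classifier (λ, b) and multiplier v. This pointwise check is only an illustration under the assumption that sampling the region is acceptable; the disk-shaped region, the sample points and all parameter values are hypothetical, and in the model of Section 3 the multipliers are determined by the optimization itself.

```python
import numpy as np

def rbf(a, b, q=1.0):
    return np.exp(-q * np.sum((a - b) ** 2))

def classifier_score(x, X_train, y_train, lam, b, q=1.0):
    """sum_i lambda_i * y_i * K(X_i, x) - b, the KMCLP-style score of point x."""
    return sum(l * y * rbf(xi, x, q) for l, y, xi in zip(lam, y_train, X_train)) - b

def g_disk(x, center, radius):
    """Convex knowledge region g(x) <= 0 : a disk with given center and radius (hypothetical)."""
    return np.sum((x - center) ** 2) - radius ** 2

def check_knowledge(samples, X_train, y_train, lam, b, v, center, radius, q=1.0):
    """Verify -score(x) + b + v*g(x) >= 0 (inequality (7)) at sampled points inside the region.

    v is a scalar multiplier here because g is scalar-valued in this example.
    An empty return list means the implication holds on the sampled points.
    """
    violations = []
    for x in samples:
        if g_disk(x, center, radius) <= 0:                 # x lies in the region Gamma
            lhs = -classifier_score(x, X_train, y_train, lam, b, q) \
                  + v * g_disk(x, center, radius)
            if lhs < 0:
                violations.append((x, lhs))
    return violations
```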
3 Nonlinear Knowledge in KMCLP Model

Suppose there is a series of knowledge sets as follows:

If g_i(x) ≤ 0, Then x ∈ B  (g_i(x): R^r → R^{p_i}, x ∈ Γ_i, i = 1, ..., k)
If h_j(x) ≤ 0, Then x ∈ G  (h_j(x): R^r → R^{q_j}, x ∈ Δ_j, j = 1, ..., l)    (8)
Based on the theory in the last section, we convert this knowledge into the following constraints: there exist v_i ∈ R^{p_i}, i = 1, ..., k, and r_j ∈ R^{q_j}, j = 1, ..., l, with v_i, r_j ≥ 0, such that:

−λ_1 y_1 K(X_1, x) − ... − λ_n y_n K(X_n, x) + b + v_i^T g_i(x) + s_i ≥ 0, (x ∈ Γ_i)
λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b + r_j^T h_j(x) + t_j ≥ 0, (x ∈ Δ_j)    (9)
These constraints can easily be imposed on the KMCLP model (2) as constraints acquired from prior knowledge, giving the nonlinear knowledge KMCLP classifier:

Min  (d_α^+ + d_α^− + d_β^+ + d_β^−) + C (∑_{i=1}^{k} s_i + ∑_{j=1}^{l} t_j)

s.t.
λ_1 y_1 K(X_1, X_1) + ... + λ_n y_n K(X_n, X_1) = b + α_1 − β_1, for X_1 ∈ B,
...
λ_1 y_1 K(X_1, X_n) + ... + λ_n y_n K(X_n, X_n) = b − α_n + β_n, for X_n ∈ G,
α* + ∑_{i=1}^{n} α_i = d_α^− − d_α^+,
β* − ∑_{i=1}^{n} β_i = d_β^− − d_β^+,    (10)
−λ_1 y_1 K(X_1, x) − ... − λ_n y_n K(X_n, x) + b + v_i^T g_i(x) + s_i ≥ 0, s_i ≥ 0, i = 1, ..., k, (x ∈ Γ_i)
λ_1 y_1 K(X_1, x) + ... + λ_n y_n K(X_n, x) − b + r_j^T h_j(x) + t_j ≥ 0, t_j ≥ 0, j = 1, ..., l, (x ∈ Δ_j)
α_1, ..., α_n ≥ 0, β_1, ..., β_n ≥ 0, λ_1, ..., λ_n ≥ 0, (v_i, r_j) ≥ 0, d_α^−, d_α^+, d_β^−, d_β^+ ≥ 0
In this model, all the inequality constraints are derived from the prior knowledge. The last objective term C (∑_{i=1}^{k} s_i + ∑_{j=1}^{l} t_j) accounts for the slack error and attempts to drive the error variables to zero. Note that if we set the parameter C to zero, the knowledge is not taken into account at all, and the model becomes equal to the original KMCLP model. Theoretically, the larger the value of C, the greater the impact of the knowledge sets on the classification result. Several parameters need to be set before the optimization process. Apart from C, discussed above, the others are the kernel parameter q (if we choose the RBF kernel) and the ideal compromise solution α* and β*. We want to obtain the best bounding plane (λ, b) by solving this model to separate the two classes, and the discrimination function of the two classes is:
λ_1 y_1 K(X_1, z) + ... + λ_n y_n K(X_n, z) ≤ b, then z ∈ B
λ_1 y_1 K(X_1, z) + ... + λ_n y_n K(X_n, z) ≥ b, then z ∈ G    (11)
where z is the input vector to be evaluated, with r attributes, X_i represents each training sample, and y_i is the class label of the ith sample.
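Once model (10) has been solved, applying the discrimination function (11) is straightforward. The sketch below assumes the optimal multipliers λ, the threshold b, the training data and an RBF kernel parameter q are already available from a solved instance; it does not perform the optimization itself.

```python
import numpy as np

def rbf_kernel_matrix(A, B, q=1.0):
    """K[i, j] = exp(-q * ||A_i - B_j||^2), the RBF kernel used in the paper."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-q * sq)

def kmclp_predict(Z, X_train, y_train, lam, b, q=1.0):
    """Discrimination function (11): class G if sum_i lam_i y_i K(X_i, z) >= b, else B."""
    K = rbf_kernel_matrix(X_train, Z, q)      # shape (n_train, n_eval)
    scores = (lam * y_train) @ K              # sum_i lam_i * y_i * K(X_i, z) for each z
    return np.where(scores >= b, "G", "B"), scores

# Hypothetical usage, with lam_opt and b_opt taken from a solved instance of model (10):
# labels, scores = kmclp_predict(Z_new, X_train, y_train, lam_opt, b_opt, q=1.0)
```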
4 Experiment on Wisconsin Breast Cancer Data

To prove the effectiveness of model (10), we apply it to the Wisconsin breast cancer prognosis data set, which provides both prior knowledge and training samples for predicting recurrence or nonrecurrence of the disease. This data set concerns 10 features obtained from a fine needle aspirate. For each feature, the mean, standard error, and worst or largest value were computed for each image, resulting in 30 features. In addition, two histological features, tumor size and lymph node status, obtained during surgery from breast cancer patients, are also included in the attributes [12, 13]. Accordingly, we separate the features into four groups F1, F2, F3 and F4, which represent the mean, standard error, worst or largest value of each image, and the histological features, respectively. We plotted each point and the prior knowledge in the 2-dimensional space of the last two attributes in Fig. 1.
Fig. 1. WPBC data set and prior knowledge
The prior knowledge consists of three regions, which correspond to the following three implications:

‖(5.5 x_iT, x_iL) − (5.5 × 7, 9)‖ + ‖(5.5 x_iT, x_iL) − (5.5 × 4.5, 27)‖ − 23.0509 ≤ 0 ⇒ X_i ∈ RECUR

(−x_iL + 5.7143 x_iT − 5.75, x_iL − 2.8571 x_iT − 4.25, −x_iL + 6.75) ≤ 0 ⇒ X_i ∈ RECUR

(1/2)(x_iT − 3.35)² + (x_iL − 4)² − 1 ≤ 0 ⇒ X_i ∈ RECUR
Here, x_iT is the tumor size and x_iL is the number of lymph nodes of training sample X_i. In Fig. 1, the ellipse near the upper-right corner corresponds to the knowledge of the first implication, the triangular region corresponds to the second implication, and the ellipse at the bottom corresponds to the third implication. The red circle points represent the recurrence samples, while the blue cross points represent the nonrecurrence samples. Before the analysis, we scaled the attributes to [0, 1]. In order to balance the samples in the two classes, we randomly chose 46 samples, the exact number of recurrence samples, from the nonrecurrence group. We chose the value of q from the range [10^−6, ..., 10^6] and found that the best value of q for this scaled data set is 1. The leave-one-out cross-validation method is used to measure the classification accuracy of our method. Experiments are conducted with respect to combinations of the four subgroups of attributes; C = 0 means the model takes no account of the knowledge. The results are shown in Table 1.

Table 1. The accuracies of classification on the Wisconsin breast cancer data set

        F1 and F4   F1, F3 and F4   F3 and F4   F1, F2, F3 and F4
C=0     51.807%     59.783%         57.609%     63.043%
C=1     56.522%     66.304%         63.043%     64.13%
The table shows that when classified by our model with knowledge (C=1), the accuracies are considerably higher than the results without knowledge (C=0). However, as can be seen from Fig. 1, the knowledge here is not precise enough to produce a dramatic improvement in precision, yet it clearly influences the classification result. With more precise knowledge, the classifier would be more accurate.
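The experimental protocol described above (scaling the attributes to [0, 1], balancing the two classes by drawing 46 nonrecurrence samples, and leave-one-out cross-validation) can be sketched as below. A standard SVM stands in for the KMCLP model, whose solver is not reproduced here; the array names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def balanced_loo_accuracy(X_rec, X_non, q=1.0, seed=0):
    """Scale to [0, 1], balance the classes, and report leave-one-out accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_non), size=len(X_rec), replace=False)   # 46 nonrecurrence samples
    X = np.vstack([X_rec, X_non[idx]])
    y = np.array([1] * len(X_rec) + [-1] * len(X_rec))
    X = MinMaxScaler().fit_transform(X)                            # scale attributes to [0, 1]

    hits = 0
    for tr, te in LeaveOneOut().split(X):
        clf = SVC(kernel="rbf", gamma=q, C=1.0).fit(X[tr], y[tr])  # stand-in classifier
        hits += int(clf.predict(X[te])[0] == y[te][0])
    return hits / len(X)

# Hypothetical usage: X_rec holds the 46 recurrence samples, X_non the nonrecurrence ones.
# acc = balanced_loo_accuracy(X_rec, X_non, q=1.0)
```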
5 Conclusions

In this paper, we presented a model that incorporates nonlinear knowledge into KMCLP in order to solve problems whose input consists not only of training examples but also of nonlinear prior knowledge. Nonlinear prior knowledge in the form of convex sets in the input space of the given data can be expressed as logical implications, which can further be converted into inequalities. Incorporating such constraints into the original KMCLP model, we obtain the final model, which is a linear programming formulation. Solving it with standard optimization software, we obtain the separating hyperplane of the two classes. The application to the breast cancer data indicated that the new model is effective when knowledge is added to the original KMCLP model. Acknowledgments. This work has been partially supported by grants from the National Natural Science Foundation of China (#10601064, #70531040, #70621001).
References 1. Kou, G., Liu, X., Peng, Y., Shi, Y., Wise, M., Xu, W.: Multiple criteria linear programming to data mining: models, algorithm designs and software Developments. Optimization Methods and Software 18 (2003) 2. Zhang, Z., Zhang, D., Tian, Y., Shi, Y.: Kernel-based Multiple Criteria Linear Program. In: Proceeding of Conference on Multi-criteria Decision Making (2008) (working paper) 3. Towell, G.G., Shavlik, J.W., Noordewier, M.O.: Refinement domain theories by knowledge-based artificial neural network. In: The proceedings of the Eighth National Conference on Artificial Intelligence, pp. 861–866 (1990) 4. Towell, G.G., Shavlik, J.W.: Knowledge-Based Artificial Neural Networks. Artificial Intelligence 70 (1994) 5. Fung, G., Mangasarian, O.L., Shavlik, J.: Knowledge-based support vector machine classifiers. In: NIPS 2002 Proceedings, Vancouver, pp. 9–14 (2002) 6. Mangasarian, O.L.: Knowledge-based linear programming. SIAM Journal on Optimization 15, 375–382 (2005) 7. Mangasarian, O.L., Wild, E.W.: Nonlinear Knowledge in Kernel Approximation. IEEE Transactions on Neural Networks (to appear) 8. Mangasarian, O.L., Shavlik, J.W., Wild, E.W.: Knowledge-Based Kernel Approximation. Journal of Machine Learning Research 5, 1127–1141 (2004) 9. Olson, D., Shi, Y.: Introduction to Business Data Mining. McGraw-Hill/Irwin (2007) 10. Fung, G.M., Mangasarian, O.L., Shavlik, J.: Knowledge-based nonlinear kernel classifiers. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS, vol. 2777, pp. 102– 113. Springer, Heidelberg (2003) 11. Zhang, D., Tian, Y., Shi, Y.: Knowledge-incorporated MCLP Classifier. In: Proceeding of Conference on Multi-criteria Decision Making (2008) (working paper) 12. Mangasarian, O.L., Wild, E.W.: Nonlinear knowledge in kernel machines. Computational & Applied Mathematics Seminar, Mathematics Department University of California at San Diego (April 2007) 13. Murphy, P.M., Aha, D.W.: UCI machine learning repository (1992), http://www.ics.uci.edu/~mlearn/MLRepository.html
A Note on the 1-9 Scale and Index Scale In AHP* Zhiyong Zhang1,**, Xinbao Liu2, and Shanlin Yang2 1
Management School, Hefei University of Technology Hefei 230009, Anhui Province, People’s Republic of China and Logistics School,Beijing Wuzi University, Beijing 101149, People’s Republic of China
[email protected] 2 Management School, Hefei University of Technology, Hefei 230009, Anhui Province, People’s Republic of China
[email protected],
[email protected] Abstract. This paper demonstrates that the 1-9 scale and the index scale used in the Analytical Hierarchy Process (AHP) can both be derived from the same scale-selection criteria. Both are consistent with Saaty's basic thoughts about scale selection in the AHP. These two different kinds of ratio scales are the results of two different ways of applying the Weber-Fechner psychophysical law. We believe that the index scale is theoretically preferable to the 1-9 scale and hope that more use of the index scale will be encouraged in the future. However, given the rich experience accumulated with it, the existing 1-9 scale can still be used in practice. Keywords: Analytical Hierarchy Process, Pair Comparison, Psychophysical law, Ratio Scale, 1-9 Scale.
1 Introduction

The existing 1-9 scale in the Analytical Hierarchy Process (AHP) was first introduced by Saaty, the originator of the AHP decision making theory, in the 1970s (Saaty 1977, 1980). This widely used 1-9 scale and its definition are described in Table 1. About thirty years ago, Dr. Saaty tested the 1-9 scale, the index scale and about twenty other scales when he wanted to choose a suitable ratio scale for the pairwise comparisons in the AHP (Saaty 1980, 1994, 1996). Based on the testing results, the 1-9 scale was adopted for the AHP, while the index scale and many other ratio scales were rejected by Dr. Saaty. Since then the 1-9 scale has become the most widely used ratio scale in the AHP.
* This paper was supported by the Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality PHR (IHLB).
** Corresponding author.
Along with the development of the AHP decision making theory, the 1-9 scale has been used for over 30 years. Although the AHP theory has been widely applied in practice, many people have raised doubts about the 1-9 scale and suggested that the index scale could be better than the existing 1-9 scale (Wang and Ma 1993; Hou and Shen 1995; Lu 2001, etc.). The index scale and its definition are also described in Table 1 in a similar way.

Table 1. The 1-9 Scale and the Index Scale in the AHP

1-9 Scale   Index Scale          Intensity of importance (Definition)
1           3^(0/4) = 1.0000     Equal importance (E)
2           3^(1/4) = 1.3161     Weak (W)
3           3^(2/4) = 1.7321     Moderate importance (M)
4           3^(3/4) = 2.2795     Moderate plus (M+)
5           3^(4/4) = 3.0000     Strong importance (S)
6           3^(5/4) = 3.9482     Strong plus (S+)
7           3^(6/4) = 5.1962     Very strong or demonstrated importance (VS)
8           3^(7/4) = 6.8385     Very, very strong (VVS)
9           3^(8/4) = 9.0000     Extreme importance (Ex)

Reciprocals of above: if activity i has one of the above nonzero numbers assigned to it when compared with activity j, then j has the reciprocal value when compared with i.
See also: Saaty 1980, 1994, 1996.
In this paper, we will demonstrate that the existing 1-9 scale and the index scale used in the AHP are the results of two different ways of applying the psychophysical law proposed by Weber (1834, Ernest Heinrich Weber 1795-1878) and Fechner (1860, Gustav Theodor Fechner 1801-1887), which is the most important condition that any proper scale used to represent people's judgments should satisfy. These two ratio scales are both consistent with Saaty's basic scale-selection thoughts. In the next section we provide detailed discussions.
2 Analysis

In this section, we first summarize Saaty's basic scale-selection criteria. Then, with a brief introduction to the Weber-Fechner psychophysical law, we show the deduction processes of the 1-9 scale and the index scale based on the two different ways of applying the law.
2.1 Saaty's Scale-Selection Thoughts and the Psychophysical Law

Saaty's thoughts about the selection of a scale in the AHP can be summarized as follows (Saaty 1980, 1994, 1996): the psychophysical law of Weber-Fechner should be satisfied; 1) using 9 as the upper limit of a scale to represent people's judgments is reasonable and sufficient in practice; 2) using nine consecutive values, or 9 scale points, to represent people's judgments is also sufficient and reasonable in most practical cases. Regarding stimulus-response relationships, according to the law of Weber-Fechner, physical stimuli S_1, S_2 and their sensory responses M_1, M_2 are connected by relationship (1):

M_1 − M_2 = α log(S_1 / S_2),   α > 0    (1)
where α is a positive constant. This implies that a geometric sequence of noticeable stimuli yields an arithmetic sequence of responses (Batschelet 1973; Saaty 1980, 1994, 1996).
2.2 The Index Scale

The index scale was already suggested in many different ways years ago (Saaty 1980, 1994, 1996; Wang and Ma 1993; Hou and Shen 1995; Lu 2001, etc.). In fact, this index scale can also be deduced from the psychophysical law of Weber-Fechner and Saaty's basic scale-selection thoughts. From the psychophysical law of Weber-Fechner we know that the stimuli producing noticeable responses follow a geometric progression (Batschelet 1973). In making pairwise comparisons of relatively comparable activities, this noticeable geometric stimuli series is given by (2) (Saaty 1980, 1994, 1996).
S_i = S_0 α^i,   i = 0, 1, 2, ..., 8    (2)

where S_i is the ith noticeable ratio stimulus value and α is the smallest noticeable ratio stimulus value, which is a constant. Considering the two other conditions of Saaty's basic scale-selection thoughts above, we have
α = (S_8 / S_0)^(1/8) = 9^(1/8) = 3^(1/4) ≈ 1.316    (3)
and

S_i / S_0 = 3^(i/4),   i = 0, 1, 2, ..., 8    (4)
In making pairwise comparisons of relatively comparable activities, this noticeable geometric stimuli series is given by (4). If we hold that the noticeable stimuli series given by (4) should be used as the ratio scale in the AHP, we obtain the index scale shown in Table 1. Obviously, the deduction process of this index scale is also in accordance with Saaty's thoughts about the selection of a scale in the AHP.

2.3 The 1-9 Scale

The deduction process of the existing 1-9 scale is similar to that of the index scale discussed above; the detailed deduction of the 1-9 scale can be found in Saaty's books (Saaty 1980, 1994, 1996). The only difference between the two is that Dr. Saaty chose the noticeable arithmetic sequence of responses, i.e. 1, 2, 3, ..., as the ratio scale for the AHP.
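The derivation of (2)-(4) can also be reproduced numerically. The short sketch below computes the index scale values 3^(i/4) and lists them next to the 1-9 scale points, matching the second column of Table 1.

```python
# Reproduce the index scale of Table 1: alpha = 9**(1/8) = 3**(1/4),
# and S_i / S_0 = 3**(i/4) for i = 0, ..., 8.
alpha = 9 ** (1 / 8)                                  # smallest noticeable ratio stimulus, ~1.3161
index_scale = [3 ** (i / 4) for i in range(9)]
for point_1_to_9, value in zip(range(1, 10), index_scale):
    print(f"1-9 scale {point_1_to_9}  ->  index scale {value:.4f}")
# Printed values: 1.0000, 1.3161, 1.7321, 2.2795, 3.0000, 3.9482, 5.1962, 6.8385, 9.0000.
```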
3 Conclusions

With the above analysis, we know that the existing widely used 1-9 scale and the once-rejected index scale in the AHP can both be derived from the same scale-selection thoughts. They are the results of two different ways of applying the Weber-Fechner psychophysical law, and both ratio scales are consistent with Saaty's basic thoughts about scale selection in the AHP. We consider the index scale theoretically preferable to the 1-9 scale, but given the rich experience of application, the 1-9 scale can still be used in practice.

3.1 Different Applications of the Psychophysical Law Lead to Different Scales

From the above analysis we know that the 1-9 scale and the index scale can both be derived from the psychophysical law of Weber-Fechner and Saaty's basic scale-selection thoughts. Their difference is due to the different ways of using the psychophysical law, so both ratio scales are consistent with Saaty's basic scale-selection thoughts. If one considers that the noticeable stimuli series should be used as the ratio scale, the index scale is the proper scale in the AHP. Otherwise, if one holds that the noticeable sequence of responses should be used as the ratio scale in the AHP directly, the existing 1-9 scale is preferred.

3.2 The Index Scale Is Preferable to the 1-9 Scale in Theory

The existing 1-9 scale is simple, straightforward and easy to use. As discussed above, however, we believe that the index scale has a sounder theoretical basis and as such may be somewhat preferable to the 1-9 scale in theory. This preference does not seem illogical, because we think that, as the basis of pairwise comparisons, a proper scale should not be based directly on people's sensory responses but on the physical stimuli which these sensory responses reflect. So the use of the index scale ought to be encouraged in the future.
3.3 The 1-9 Scale Can Still Be Used in Practice

Although we consider that the index scale may be theoretically better than the 1-9 scale, we believe that people can keep on using the existing 1-9 scale in practice, because the long history of use has made people much more familiar with the characteristics of the 1-9 scale than with those of the index scale, and this experience can partly compensate for the weaker theoretical basis. On the other hand, using the index scale in the AHP is suggested in order to gain more experience with it in practice.
Acknowledgement Sincere thanks to Dr. Thomas L. Saaty and Dr. Luis G. Vargas for their helpful comments and pertinent suggestions to the initial versions of this paper.
References Batschelet, E.: Mathematics for Life Scientists. Springer, New York (1973) Hou, Y., Shen, D.: The index scale and its comparison with several other kinds of scales. Systems Engineering Theory and Practice 15(10), 43–46 (1995) (in Chinese) Lu, Y.: A comparison research on the scale system of the AHP, Scientific decision making theory and practice, pp. 50–58. Ocean Press, Beijing (2001) (in Chinese) Saaty, T.L.: A Scaling Method for priorities in Hierarchical Structures. Journal of Mathematical Psychology 15, 59–62 (1977) Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill International Book Company, New York (1980) Saaty, T.L.: Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process. AHP Series, vol. VI, p. 527. RWS Publ. (1994) Saaty, T.L.: Decision Making with Dependence and Feedback: The Analytic Network Process, p. 370. RWS Publ. (1996) Wang, H., Ma, D.: The scale evaluation and new scale method in the AHP. Systems Engineering Theory and Practice 13(5), 24–26 (1993) (in Chinese)
Linear Multi-class Classification Support Vector Machine Yan Xu1 , Yuanhai Shao2 , Yingjie Tian3 , and Naiyang Deng2 1
College of Science, China Agricultural University, Beijing, China 100083 College of Application and Science, University of Science and Technology, Beijing 100083
[email protected] 2 College of Science, China Agricultural University, Beijing, China 100083 3 Graduate University of CAS, China, 100080
Abstract. Support Vector Machines (SVMs) for classification have been shown to be promising classification tools in many real-world problems. How to effectively extend binary SVC to multi-class classification is still an on-going research issue. In this article, instead of solving the quadratic programming (QP) problem of the algorithm in [1], a linear programming (LP) problem with a linear objective function is introduced in our algorithm, leading to a new algorithm for the multi-class problem named linear multi-class classification support vector machine. Numerical experiments on artificial data sets and benchmark data sets show that the proposed method is comparable to the algorithm in [1] in error rate, while being roughly ten times faster with the same robustness. Keywords: Support vector classification, Linear programming, Multi-class classification.
1 Introduction
Support vector classification (SVC), motivated by results of statistical learning theory [1,2], refers to constructing a decision function from an input space X = R^n onto an unordered set of classes Y = {Θ_1, Θ_2, ..., Θ_k}, where k ≥ 2 is an integer, based on an independently and identically distributed training set

T = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ∈ (X × Y)^l
(1)
Binary classification and multi-class classification correspond to k = 2 and k ≥ 3, respectively. For both cases support vector classification (SVC) is very promising, and there are many existing SVM algorithms for binary classification. However, multi-class problems such as optical character recognition, text classification, medical analysis and so on are more frequent than binary classification in the real world. SVC also serves as an effective tool for cancer classification [4]. Some of the well-known binary classification learning algorithms have been extended to handle multi-class problems, such as the 'one versus the rest' method [6], the 'one versus one' method [7,8] and the 'error-correcting output code' method [9], in short
output coding, the term used by Dietterich and Bakiri (1995). Crammer and Singer proposed a method [5] (called CS-SVC in this article) which seeks a good matrix

W = (w_1^T; w_2^T; ...; w_k^T)    (2)

of size k × n over R; each row of the matrix W corresponds to a class y ∈ Y, and the decision function is given by

f(x) = arg max_r {(w_r · x)}
(3)
That is, each row of the matrix W is multiplied by an instance x, and the label of the instance x is the class whose row gives the maximum product value. In this article, following the idea of [5], we propose a new algorithm based on linear programming support vector classification, which uses a linear objective function instead of a quadratic one; thus the optimal solution follows directly from solving an LP instead of a QP. The rest of this article is organized as follows: Section 2 reviews some basic work on SVM classification in the linear programming formulation; Section 3 proposes our algorithm CS-LSVC (k-class linear Support Vector Machine based on CS-SVC [5]); in Section 4 experiments on both artificial data sets and benchmark data sets illustrate the superiority of CS-LSVC over CS-SVC; Section 5 concludes the article.
2 Two-Class Classification in Linear Programming Formulation
In this section we briefly review SVC in linear programming formulation. Consider a binary classification problem with a training set T = {(x1 , y1 ), (x2 , y2 ), · · ·, (xl , yl )} ∈ (X × Y)l
(4)
where the inputs x_i ∈ X = R^n and the outputs y_i ∈ Y = {−1, +1}, i = 1, ..., l. In the standard SVC framework, the main goal of the classification problem is to find a hyperplane (w · x) + b = 0 which separates the two classes with the largest margin. The corresponding classification rule is given by f(x) = sgn((w · x) + b) and is usually constructed by solving a quadratic programming problem. The primal model is

min_{w,b,ξ}  (1/2)‖w‖² + C ∑_{i=1}^{l} ξ_i    (5)
s.t.  y_i((w · x_i) + b) ≥ 1 − ξ_i,  i = 1, ..., l,    (6)
      ξ_i ≥ 0,  i = 1, ..., l.    (7)
As a variation, a linear programming formulation of SVC has also been proposed, see e.g. [10]. For a two-class problem with training set T = {(x_1, y_1), ..., (x_l, y_l)} ∈ (X × Y)^l, where x_i ∈ X = R^n and y_i ∈ Y = {1, 2}, i = 1, ..., l, we can also consider finding a matrix W = (w_1, w_2) which satisfies

(w_1 · x_i) ≥ (w_2 · x_i),  y_i = 1;    (8)
(w_2 · x_i) ≥ (w_1 · x_i),  y_i = 2;    (9)

namely y = f(x) = arg max_{r=1,2} (w_r · x), and its optimization problem is as follows:

min_{w,ξ}  (1/2)(‖w_1‖² + ‖w_2‖²) + C ∑_{i=1}^{l} ξ_i    (10)
s.t.  (w_1 · x_i) − (w_2 · x_i) ≥ 1 − ξ_i,  y_i = 1,    (11)
      (w_2 · x_i) − (w_1 · x_i) ≥ 1 − ξ_i,  y_i = 2,    (12)
      ξ_i ≥ 0,  i = 1, ..., l.    (13)
We can also obtain its linear programming formulation with a kernel:

min_{α¹,α²,ξ}  −∑_{i=1}^{l} (α_i¹ + α_i²) + C ∑_{i=1}^{l} ξ_i    (14)
s.t.  ∑_{i=1}^{l} α_i¹ K(x_i, x_j) − ∑_{i=1}^{l} α_i² K(x_i, x_j) ≥ 1 − ξ_j,  y_j = 1,    (15)
      ∑_{i=1}^{l} α_i² K(x_i, x_j) − ∑_{i=1}^{l} α_i¹ K(x_i, x_j) ≥ 1 − ξ_j,  y_j = 2,    (16)
      α_i^r, ξ_i ≥ 0,  i = 1, ..., l,  r = 1, 2.    (17)
In the following, we extend this to the multi-class classification problem.
3 The CS-LSVC Learning Machine
Given the training set T defined by (1), we call the decision function homogeneous when b = 0. Crammer and Singer [5] proposed an algorithm whose primal problem is as follows:

min_{w,ξ}  (1/2) ∑_{r=1}^{k} ‖w_r‖² + C ∑_{i=1}^{l} ξ_i    (18)
s.t.  (w_{y_i} · x_i) − (w_r · x_i) ≥ 1 − δ_{y_i,r} − ξ_i,    (19)
      i = 1, ..., l,  r ∈ {1, ..., k},    (20)
which implies ξ_i ≥ 0, since for r = y_i the constraint reads 0 ≥ 0 − ξ_i. The dual problem formulation of CS-SVC in terms of α is

min_α  f(α) = (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} (x_i · x_j) ᾱ_i^T ᾱ_j + ∑_{i=1}^{l} ᾱ_i^T ē_i    (21)
s.t.  ∑_{r=1}^{k} α_i^r = 0,  i = 1, ..., l,    (22)
      α_i^r ≤ 0  if y_i ≠ r,    (23)
      α_i^r ≤ C  if y_i = r,    (24)
      i = 1, ..., l,  r = 1, ..., k,    (25)

where

ᾱ_i = [α_i^1, ..., α_i^k]^T,  ē_i = [e_i^1, ..., e_i^k]^T,    (26)
e_i^r = 1 − δ_{y_i,r} = 0 if y_i = r, 1 if y_i ≠ r,    (27)
α = (ᾱ_1, ..., ᾱ_l) = [α_1^1, ..., α_l^1; ... ... ...; α_1^k, ..., α_l^k].    (28)
We can obtain the decision matrix W = (w_1^T, ..., w_k^T)^T by solving this dual quadratic program (QP). Suppose (w_1*, w_2*, ..., w_k*, ξ*) and α* are the optimal solutions of the primal problem (18)-(20) and the dual problem (21)-(25); then the decision matrix is constructed as w_r* = ∑_{i=1}^{l} α_i^{r*} x_i by applying the necessary and sufficient KKT conditions of the QP. As a variation, a linear programming formulation of SVC has also been proposed, see e.g. [10]. Following this idea, and using the strong duality property that the primal optimal value at (w_1*, w_2*, ..., w_k*, ξ*) equals the dual value at α*, we can obtain the Linear Programming (LP) formulation of CS-SVC with (26)-(28):

min_{α,ξ}  −∑_{i=1}^{l} ᾱ_i^T ē_i + C ∑_{i=1}^{l} ξ_i    (29)
s.t.  ∑_{i=1}^{l} α_i^{y_j} (x_i · x_j) − ∑_{i=1}^{l} α_i^r (x_i · x_j) ≥ e_j^r − ξ_j,    (30)
      j = 1, ..., l,  r ∈ {1, ..., k}.    (31)
Suppose α* is the optimal solution of (29)-(31); then the decision matrix W is constructed as w_r* = ∑_{i=1}^{l} α_i^{r*} x_i and the decision function is given by

f(x) = arg max_r {(w_r · x)}.
(32)
It is easy to extend this to nonlinear separation problems by introducing a kernel K(x, x') = (φ(x) · φ(x')), where φ(·) is a nonlinear function which maps the input
space into a higher dimensional space in which an optimal solution is constructed. However, this function is not explicitly constructed, because the decision function can also be expressed in terms of K(x, x'). The multi-class classifier is designed by solving

min_{α,ξ}  −∑_{i=1}^{l} ᾱ_i^T ē_i + C ∑_{i=1}^{l} ξ_i    (33)
s.t.  ∑_{i=1}^{l} α_i^{y_j} K(x_i, x_j) − ∑_{i=1}^{l} α_i^r K(x_i, x_j) ≥ e_j^r − ξ_j,    (34)
      j = 1, ..., l,  r ∈ {1, ..., k},    (35)
with (26)-(28); finally we obtain the algorithm CS-LSVC.

Algorithm (CS-LSVC)
(1) Input the training set

T = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ∈ (X × Y)^l,    (36)

X = R^n, Y = {Θ_1, Θ_2, ..., Θ_k} where k ≥ 2 is an integer, drawn independently and identically distributed;
(2) Choose a suitable kernel function K(x, x') and parameter C, construct and solve the CS-LSVC problem (33)-(35), and obtain the optimal solution

α* = [α_1^{1*}, ..., α_l^{1*}; ... ... ...; α_1^{k*}, ..., α_l^{k*}];    (37)

(3) Construct the decision function

f(x) = arg max_r ∑_{i=1}^{l} α_i^{r*} K(x_i, x).    (38)
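Step (3) of the algorithm can be sketched as below, assuming the optimal multiplier matrix α* has already been obtained from step (2) with any general-purpose LP package. The RBF (Gauss) kernel and the parameter σ = 0.3 follow the experiments in Section 4; the array names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma=0.3):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), the Gauss kernel used in Section 4."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma ** 2))

def cs_lsvc_predict(Z, X_train, alpha_star, sigma=0.3):
    """Decision function (38): f(x) = argmax_r sum_i alpha_i^{r*} K(x_i, x).

    alpha_star : (k, l) matrix of optimal multipliers, one row per class.
    Z          : (m, n) matrix of points to classify.
    Returns class indices in {1, ..., k}.
    """
    K = rbf_kernel(X_train, Z, sigma)          # shape (l, m)
    scores = alpha_star @ K                    # shape (k, m): one score per class and point
    return np.argmax(scores, axis=0) + 1

# Hypothetical usage, with alpha_star solved from the LP (33)-(35):
# y_pred = cs_lsvc_predict(Z_test, X_train, alpha_star, sigma=0.3)
```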
4 Experiments
In this section, in order to compare our algorithm with its counterpart in [5], we use the same test problems, including artificial data sets and benchmark data sets. Our experiments are carried out using Matlab v7.0 on an Intel Pentium IV 2.0 GHz PC with 512 MB of RAM.

4.1 Experiments on Artificial Data Sets
The training set T is generated from Gaussian distributions on R². It contains 150 examples in K = 3 classes, each of which has 50 examples, as shown in Fig. 1. In this experiment, the parameter C is varied, and the
Fig. 1. The training set T(Artificial distribution)
Table 1. The effect of the parameter C in the algorithms

C        0.01     0.1      1        10       100      1000
Error1   0.1467   0.1067   0.0267   0.0267   0.0133   0
Error2   0.1920   0.1494   0.0400   0.0267   0.0133   0.0133
FSV1     0.46     0.55     0.65     0.79     0.81     0.81
FSV2     0.55     0.58     0.77     0.81     0.83     0.87
Gauss kernel K(x, x') = exp(−‖x − x'‖²/2σ²) with σ = 0.3 is employed. The results are summarized in Table 1, where Error is the total error rate and FSV refers to the fraction of support vectors; Error1 and FSV1 refer to CS-SVC and Error2 and FSV2 to CS-LSVC. From Table 1, it can be observed that the behavior of our algorithm is similar to that in [5] in the following aspects: the parameter C provides an upper bound on the fraction of margin error vectors and a lower bound on the fraction of support vectors; in addition, increasing C allows both more margin error vectors and more support vectors. In fact, the time consumed by the algorithm in [5] is 1.9s while our algorithm takes 0.1s, so our algorithm clearly costs much less time. This point is strengthened in the experiments on benchmark data sets below. Future research on CS-LSVC includes comprehensive testing of the algorithm and parameter selection.

4.2 Experiments on Benchmark Data Sets
In this subsection, our algorithm is tested on a collection of three benchmark data sets, 'Iris', 'Wine' and 'Glass', from the UCI machine learning repository [11]. Each data set is first split randomly into ten subsets; one of these subsets is reserved as a test set and the others are combined as the training set, and this process is repeated ten times. For the data sets 'Iris' and 'Wine', the linear kernel
Table 2. Results comparison

         CS-SVC              Our algorithm
         Error     Time      Error     Time
Iris     1.7       3.4s      1.9       0.2s
Wine     2.8       3.9s      3.3       0.3s
Glass    5.6       12.3s     6.1       1.0s
and the polynomial kernel K(x, x') = (x · x')^d with degree d = 3 are employed, respectively. For the data set 'Glass', the Gauss kernel K(x, x') = exp(−‖x − x'‖²/2σ²) with σ = 0.25 is employed. We compare the obtained results with those of the algorithm CS-SVC in Table 2. In Table 2, the 'Error' columns give the error percentage, i.e. the percentage of examples finally assigned to the wrong classes, and the 'Time' columns give the number of seconds consumed by the corresponding algorithm. Remarkably, Table 2 shows that the time consumed by our algorithm is much less than that of CS-SVC while the errors stay at the same level. It can also be observed from Table 1 and Table 2 that as the parameter C increases, the test error rates of both our algorithm and CS-SVC decrease. Generally speaking, our algorithm is faster than CS-SVC by over ten times.
5 Conclusions
In this paper, we have proposed a new algorithm for multi-class classification based on solving a linear program. Because this new algorithm has the same structure as CS-SVC, it can also be shown to have good robustness. Experiments have shown that our algorithm is considerably faster, usually more than ten times faster than CS-SVC, while the same level of error is kept; therefore, it is suitable for large-scale data sets. Future research includes comprehensive testing of the new algorithm and its parameter selection.
References 1. Vapnic, V.: Statistical Learning Theory. Wiley, Chichester (1998) 2. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995) 3. Brown, M., Grundy, W., Lin, D., Christianini, N., Sugnet, C., Ares Jr., M., Haussler, D.: Support vector machine classification of microarray gene expression data, UCSC-CRL 99-09, Department of Computer Science, University California Santa Cruz. Santa Cruz, CA (1999) 4. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906–914 (2000)
5. Crammer, K., Singer, Y.: On the Learnability and Design of Output Codes For Multiclass Problems. Machine Learning 47, 201–233 (2002) 6. Bottou, L., Cortes, C., Denker, J.S., et al.: Comparision of classifier methods: a case study in handingwriting digit recognition. In: LAPR (ed.) Proceedings of the international Conference on Pattern Recognition, pp. 77–82. IEEE Computer Society Press, Los Alamitos (1994) 7. Hastie, T.J., Tibshirani, R.J.: Classification by Pairwise Coupling. In: Jordan, M.I., Kearns, M.J., Solla, S.A. (eds.) Advances in Neural Information Processing systems, vol. 10, pp. 507–513. MIT Press, Cambridge (1998) 8. Krebel, U.: Pairwise Classification and Support Vector Machines. In: Sch¨ olkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advanced in Kernel Methods: Support Vector learning, pp. 255–268. MIT Press, Cambridge (1999) 9. Dietterich, T.G., Bakiri, G.: Sovling multi-class learning problems via Errorcorrecting Output Codes. Journal of Artificial Intelligence Research (2), 263–286 (1995) 10. Bennett, K.P.: Combining Support Vector and Mathematical Programming methods for classification. In: Sch¨ olkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advanced in Kernel Mehtods: Support Vector learning, pp. 307–326. MIT Press, Cambridge (1999) 11. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. University of California, Irvine (1998), http://www.ics.uci.edu/mlearn/MLRepository.html
A Novel MCQP Approach for Predicting the Distance Range between Interface Residues in Antibody-Antigen Complex Yong Shi1,*, Ruoying Chen1,2, Jia Wan1, and Xinyang Zhang1 1
Research Center of Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100090, China
[email protected] 2 College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China
[email protected],
[email protected] Abstract. Antibody-antigen association plays a key role in the immune system. The distance range between interface residues in a protein complex is one of the interface features. Three machine learning approaches, namely Multiple Criteria Quadratic Programming (MCQP), Linear Discriminant Analysis (LDA) (SPSS 2004) and the Decision Tree based See5 (Quinlan 2003), are used to predict the distance range between interface residues in antigen-antibody complexes. We explore how different patch sizes affect the accuracy for different distance ranges, and the results of the three approaches are compared with each other.
1 Introduction

An antibody is composed of heavy chains and light chains, each of which includes constant regions and variable regions. Antibody function is associated with antibody-antigen interaction of high specificity and affinity. The effects of residue mutations on the affinity and specificity of antibodies have been studied in past years [1,2]. Antibody-antigen interaction is one kind of protein-protein interaction, and many features of interfaces, such as electrostatic interactions, hydrogen bonds, salt bridges, hydrophobicity, residue composition and residue-residue contact preferences [3-7], have been dissected. To explore the principles of antibody-antigen binding, the distance between interface residues has also been investigated [8,9]. When two residues from different chains are within a certain distance, both of them are regarded as contacting. It has been noted that if the distance between two Cα atoms, one from each protein, is less than 12 Å, the two residues are flagged as interface residues [10]; in other words, they are in contact across the interface. In previous studies, surface patches have been used to analyze protein-protein interaction sites [11,12]. Y. Shi and his co-workers proposed a method named Multiple Criteria Quadratic Programming (MCQP) [9]. On the basis of multiple criteria, the model can divide data sets into different groups. In this study, we adopted the MCQP approach for predicting
* Corresponding author.
the distance range between interface residues in antigen-antibody complexes. We obtained 15 samples according to the different distance ranges and sequence patch sizes. The samples are trained and tested using MCQP, and the results of MCQP are compared with the results of two widely accepted classification tools: the Decision Tree based See5 [13] and Linear Discriminant Analysis (LDA) [14].
2 Construction of a Non-redundant Dataset

2.1 Collection of Complex Structures and Selection of Interface Residues

In this study, the basic complex structures, each composed of an antibody and its binding antigen, are selected from the PDB crystallographic database [15]. Missing residues in the complexes are filled using HyperChem 5.1 for Windows (Hypercube, FL, USA). After the removal of redundant information, 37 complex structures are collected. Interface residues in a protein-protein complex are identified as follows. Fariselli's suggestion [10] is that two residues from different chains are marked as interface residues if the distance between their Cα atoms is under 12 Å. In this work, the coordinate of the Cα atom of a residue is used as the coordinate of this residue. We chose distances of 8 Å, 10 Å and 12 Å as the cutoff values. After thorough computation, we obtained 329 residues belonging to distance range 8 Å, 508 residues belonging to distance range 10 Å, and 668 residues belonging to distance range 12 Å.

2.2 Extraction of Sequence Features

We chose the sequence patch, describing the neighbor relationship in the sequence, as the sequence feature, and we selected 5 sequence patch sizes, namely 1, 2, 3, 4 and 5. Following the explanation adopted in other work, if the sequence patch size is set to 2, the sequence patch contains the target residue and 2 residues at the front and back neighboring positions respectively (a total of 5 residues) in the sequence [8,9]. Each type of residue corresponds to one of twenty dimensions in the basic vector, so when the sequence patch size is set to 2, the sequence feature is a 100-dimensional vector.

2.3 Evaluation of Prediction Accuracy

Ten-fold cross-validation is applied as the evaluation measure of classification accuracy. The mean value of the accuracy in the ten-fold cross-validation test is used as the accuracy measure of each experiment. The classification accuracy is composed of two parts: the accuracy of correctly predicting residues in the distance range and the accuracy of correctly predicting residues out of the distance range. Other indicators, such as the Type I error, Type II error and correlation coefficient, can be obtained from the cross-validation test for analyzing the effectiveness of the method.
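The feature construction in Sections 2.1-2.2 can be sketched as follows: residues whose Cα atoms from different chains lie within the cutoff are flagged, and each residue is described by a one-hot sequence patch of 20·(2s+1) dimensions (100 dimensions for patch size s = 2). The amino-acid ordering and the data structures are hypothetical and assume standard one-letter residue codes.

```python
import numpy as np
from scipy.spatial.distance import cdist

AA = "ACDEFGHIKLMNPQRSTVWY"                       # 20 residue types, one dimension each

def interface_flags(ca_chain_a, ca_chain_b, cutoff=12.0):
    """Flag residues of chain A whose Calpha lies within `cutoff` Angstrom of any Calpha of chain B."""
    d = cdist(ca_chain_a, ca_chain_b)             # pairwise Calpha-Calpha distances
    return d.min(axis=1) <= cutoff

def patch_feature(sequence, index, size=2):
    """One-hot encoding of the residue at `index` plus `size` neighbours on each side.

    For size = 2 this yields 5 residues x 20 types = 100 dimensions, as in the text.
    """
    vec = np.zeros((2 * size + 1) * len(AA))
    for slot, pos in enumerate(range(index - size, index + size + 1)):
        if 0 <= pos < len(sequence):              # positions outside the chain stay all-zero
            vec[slot * len(AA) + AA.index(sequence[pos])] = 1.0
    return vec

# Hypothetical usage: flags = interface_flags(ca_A, ca_B, cutoff=8.0)
#                     x = patch_feature("ACDKLM", index=2, size=2)
```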
3 Comparison and Analysis

The results of MCQP are compared with the results of two widely accepted classification tools: LDA and See5. The following tables (Tables 1-9) summarize the averages of the 10-fold cross-validation test-set results of LDA, See5 and MCQP for each dataset.
When the distance range is 8 Å, the results of the ten-fold cross-validation tests for MCQP, See5 and LDA are listed in Tables 1, 2 and 3. The results indicate that increasing the sequence patch size helps improve the prediction. In the comparison of the three approaches, the MCQP method shows strong advantages in the accuracy of correctly predicting residues in the distance range, the Type II error and the correlation coefficient for the same sequence patch size.

Table 1. The results of the ten-fold cross-validation tests for the distance range 8 Å with MCQP

MCQP (distance range 8 Å)   Accuracy (in range)   Accuracy (out of range)   Type I error   Type II error   Corr. coef.
Sequence patch size 1       62.27%                79.42%                    24.84%         32.21%          42.32%
Sequence patch size 2       73.29%                81.09%                    20.51%         24.78%          54.55%
Sequence patch size 3       79.73%                84.21%                    16.53%         19.40%          64.00%
Sequence patch size 4       80.19%                85.92%                    14.94%         18.74%          66.22%
Sequence patch size 5       80.18%                89.16%                    11.91%         18.19%          69.62%
Table 2. The results of the ten-fold cross-validation tests for the distance range 8 Å with See5

See5 (distance range 8 Å)   Accuracy (in range)   Accuracy (out of range)   Type I error   Type II error   Corr. coef.
Sequence patch size 1       1.82%                 99.72%                    13.50%         49.61%          7.55%
Sequence patch size 2       16.41%                99.15%                    4.94%          45.74%          27.71%
Sequence patch size 3       27.36%                98.58%                    4.94%          42.43%          36.95%
Sequence patch size 4       29.48%                98.62%                    4.48%          41.69%          38.89%
Sequence patch size 5       29.79%                98.88%                    3.62%          41.52%          39.66%
Table 3. The results of the ten-fold cross-validation tests for the distance range 8 Å with LDA

LDA (distance range 8 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 29.79% | 98.88% | 3.62% | 41.52% | 39.66%
Sequence patch size 2 | 29.79% | 98.88% | 3.62% | 41.52% | 39.66%
Sequence patch size 3 | 29.79% | 98.88% | 3.62% | 41.52% | 39.66%
Sequence patch size 4 | 74.77% | 91.73% | 9.96% | 21.57% | 67.48%
Sequence patch size 5 | 75.68% | 92.85% | 8.64% | 20.75% | 69.56%
When the distance range is 10 Å, the prediction results of the MCQP method are listed in Table 4. Similar to the results for the distance range 8 Å, the prediction results also improve as the sequence patch size increases.
The results of See5 and LDA are summarized in Tables 5 and 6. The MCQP method performs better than See5 and LDA in the accuracy of correctly predicting residues in the distance range and in the Type II error rate. The correlation coefficients of MCQP and LDA are close.

Table 4. The results of the ten-fold cross-validation tests for the distance range 10 Å with MCQP

MCQP (distance range 10 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 63.11% | 80.07% | 24.00% | 31.54% | 43.81%
Sequence patch size 2 | 71.78% | 82.96% | 19.18% | 25.38% | 55.09%
Sequence patch size 3 | 76.58% | 84.69% | 16.66% | 21.66% | 61.47%
Sequence patch size 4 | 79.27% | 86.85% | 14.23% | 19.27% | 66.31%
Sequence patch size 5 | 80.34% | 89.53% | 11.53% | 18.01% | 70.17%
Table 5. The results of the ten-fold cross-validation tests for the distance range 10 Å with See5

See5 (distance range 10 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 30.71% | 98.02% | 6.06% | 41.42% | 38.85%
Sequence patch size 2 | 36.61% | 97.62% | 6.11% | 39.37% | 43.20%
Sequence patch size 3 | 50.39% | 97.09% | 5.46% | 33.82% | 53.69%
Sequence patch size 4 | 51.57% | 97.34% | 4.90% | 33.22% | 55.01%
Sequence patch size 5 | 52.76% | 97.24% | 4.98% | 32.70% | 55.83%
Table 6. The results of the ten-fold cross-validation tests for the distance range 10 Å with LDA

LDA (distance range 10 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 64.76% | 76.76% | 26.41% | 31.46% | 41.82%
Sequence patch size 2 | 69.49% | 82.98% | 19.68% | 26.89% | 52.95%
Sequence patch size 3 | 72.05% | 89.73% | 12.48% | 23.75% | 62.77%
Sequence patch size 4 | 74.21% | 92.03% | 9.70% | 21.89% | 67.32%
Sequence patch size 5 | 75.79% | 93.74% | 7.63% | 20.53% | 70.68%
When the distance range is 12 Å, the prediction results of the MCQP method are listed in Table 7, and the results for See5 and LDA are shown in Tables 8 and 9. For this distance range the results of LDA are better than those of MCQP and See5, while MCQP outperforms See5.
Table 7. The results of the ten-fold cross-validation tests for the distance range 12 Å with MCQP

MCQP (distance range 12 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 56.26% | 76.19% | 29.74% | 36.47% | 33.11%
Sequence patch size 2 | 60.05% | 79.23% | 25.70% | 33.52% | 40.02%
Sequence patch size 3 | 63.96% | 80.07% | 23.76% | 31.04% | 44.61%
Sequence patch size 4 | 64.39% | 82.48% | 21.39% | 30.15% | 47.66%
Sequence patch size 5 | 66.03% | 84.36% | 19.15% | 28.71% | 51.26%
Table 8. The results of the ten-fold cross-validation tests for the distance range 12 Å with See5

See5 (distance range 12 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 40.72% | 97.40% | 5.99% | 37.83% | 46.27%
Sequence patch size 2 | 47.31% | 97.62% | 4.78% | 35.06% | 51.99%
Sequence patch size 3 | 55.24% | 97.75% | 3.91% | 31.41% | 58.54%
Sequence patch size 4 | 58.38% | 97.49% | 4.12% | 29.92% | 60.71%
Sequence patch size 5 | 60.48% | 97.84% | 3.45% | 28.77% | 62.87%
Table 9. The results of the ten-fold cross-validation tests for the distance range 12 Å with LDA

LDA (distance range 12 Å) | Accuracy (in range) | Accuracy (out of range) | Type I error | Type II error | Correlation coefficient
Sequence patch size 1 | 63.32% | 73.94% | 29.16% | 33.16% | 37.47%
Sequence patch size 2 | 65.42% | 86.65% | 16.95% | 28.52% | 53.28%
Sequence patch size 3 | 69.31% | 90.88% | 11.62% | 25.24% | 61.64%
Sequence patch size 4 | 73.05% | 93.63% | 8.02% | 22.35% | 68.14%
Sequence patch size 5 | 76.80% | 94.33% | 6.88% | 19.74% | 72.25%
4 Conclusions

The geometric complementarity of two surfaces is important for the high affinity of antibody-antigen interaction. The mechanism of antibody-antigen interaction is dissected based on the distance between interface residues in the antibody-antigen complex. In this study, the sequence patch size has been studied to show how this factor affects the prediction accuracy. The results of MCQP are compared with the results of two classification tools: the decision-tree-based See5 and Linear Discriminant Analysis (LDA). As the patch size increases, all three approaches gradually perform better. When the distance range is 8 or 10 Å, the MCQP method shows a clear advantage.
The findings indicate that there are positive correlations between the sequence information of the antibody and the antibody-antigen interaction.

Acknowledgments. This research has been partially supported by a 973 Project grant (2004CB720103) from the Ministry of Science and Technology, China, and by the National Natural Science Foundation of China (#90718042, #09110421A1, #70621001, #70531040). We thank Drs. Gang Kou and Yi Peng for many useful comments and suggestions.
References

[1] Pons, J., Rajpal, A., Kirsch, J.F.: Energetic analysis of an antigen/antibody interface: alanine scanning mutagenesis and double mutant cycles on the HyHEL-10/lysozyme interaction. Protein Science 8, 958–968 (1999)
[2] Lang, S., Xu, J., Stuart, F., Thomas, R.M., Vrijbloed, J.W., Robinson, J.A.: Analysis of Antibody A6 Binding to the Extracellular Interferon γ Receptor α-Chain by Alanine-Scanning Mutagenesis and Random Mutagenesis with Phage Display. Biochemistry 39, 15674–15685 (2000)
[3] Bogan, A.A., Thorn, K.S.: Anatomy of hot spots in protein interfaces. J. Mol. Biol. 280(1), 1–9 (1998)
[4] Jones, S., Thornton, J.M.: Principles of protein-protein interactions. Proc. Natl. Acad. Sci. U.S.A. 93(1), 13–20 (1996)
[5] Sheinerman, F.B., Norel, R., Honig, B.: Electrostatic aspects of protein-protein interactions. Curr. Opin. Struct. Biol. 10(2), 153–159 (2000)
[6] Xu, D., Tsai, C.J., Nussinov, R.: Hydrogen bonds and salt bridges across protein-protein interfaces. Protein Eng. 10(9), 999–1012 (1997)
[7] Lo Conte, L., Chothia, C., Janin, J.: The atomic structure of protein-protein recognition sites. J. Mol. Biol. 285(5), 2177–2198 (1999)
[8] Shi, Y., Zhang, X., Wan, J., Wang, Y., Yin, W., Cao, Z., Guo, Y.: Predicting the distance between antibody's interface residue and antigen to recognize antigen types by support vector machine. Neural Comput. & Applic. 16, 481–490 (2007)
[9] Shi, Y., Wan, J., Zhang, X., Kou, G., Peng, Y., Guo, Y.: Comparison study of two kernel-based learning algorithms for predicting the distance range between antibody interface residues and antigen surface. International Journal of Computer Mathematics, 1–11 (2007)
[10] Fariselli, P., Pazos, F., Valencia, A., Casadio, R.: Prediction of protein-protein interaction sites in heterocomplexes with neural networks. Eur. J. Biochem. 269, 1356–1361 (2002)
[11] Jones, S., Thornton, J.M.: Analysis of protein-protein interaction sites using surface patches. Journal of Molecular Biology 272, 121–132 (1997)
[12] Jones, S., Thornton, J.M.: Prediction of protein-protein interaction sites using patch analysis. Journal of Molecular Biology 272, 133–143 (1997)
[13] Quinlan, J.: See5.0 (2003), http://www.rulequest.com/see5-info.html
[14] SPSS Incorporation: SPSS for Windows, release 12.0.0 (2004), http://www.spss.com/
[15] Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., et al.: The Protein Data Bank. Nucl. Acids Res. 28, 235–242 (2000)
Robust Unsupervised Lagrangian Support Vector Machines for Supply Chain Management

Kun Zhao1, Yong-sheng Liu1, and Nai-yang Deng2

1 Logistics School, Beijing Wuzi University
[email protected], [email protected]
2 College of Science, China Agricultural University
[email protected]

Abstract. Support Vector Machines (SVMs) have been dominant learning techniques for more than ten years and have mostly been applied to supervised learning problems. In recent years, two-class unsupervised and semi-supervised classification algorithms based on Bounded C-SVMs, Bounded ν-SVMs, Lagrangian SVMs (LSVMs) and a robust version of Bounded C-SVMs, all relaxed to Semi-definite Programming (SDP), have obtained good classification results. However, the method based on the robust version of Bounded C-SVMs is very time-consuming, so it is necessary to find a faster method that is at least as accurate. We therefore propose robust unsupervised and semi-supervised classification algorithms based on Lagrangian Support Vector Machines and apply them to the evaluation of supply chain management performance. Numerical results confirm the robustness of the proposed method and show that our new unsupervised and semi-supervised classification algorithms based on LSVMs often obtain almost the same accuracy as other algorithms while being considerably faster. Keywords: Lagrangian Support Vector Machines, Semi-definite Programming, unsupervised learning, semi-supervised learning, robustness.
1 Introduction
The supply chain is a worldwide network of suppliers, factories, warehouses, distribution centers, and retailers through which raw materials are acquired, transformed, and delivered to customers [1]. However, data uncertainty is present in many real-world optimization problems. For example, in supply chain optimization, actual material requirements and other resources are not precisely known when critical decisions need to be made [2]. Therefore, the evaluation of supply chain management performance under perturbations is an important problem.
Supported by the Key Project of the National Natural Science Foundation of China (No. 10631070), the National Natural Science Foundation of China (No. 10601064) and the Funding Project for Academic Human Resource Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality. Corresponding author.
Efficient convex optimization techniques have had a profound impact on the field of machine learning. Lanckriet et al. show how the kernel matrix can be learned from data via semi-definite programming techniques [3]. De Bie and Cristianini relax the two-class transduction problem to semi-definite programming based transductive Support Vector Machines [4]. Building on [3][4], Xu et al. [5] develop methods for the two-class unsupervised and semi-supervised classification problem based on Bounded C-Support Vector Machines by means of a relaxation to Semi-definite Programming. Zhao et al. proposed two-class unsupervised and semi-supervised classification algorithms based on Bν-SVMs and LSVMs respectively [6,7] and a robust version of unsupervised and semi-supervised classification algorithms based on Bounded C-Support Vector Machines [8]. The reason that the algorithms mentioned above run slowly is that their semi-definite relaxations have many variables. In order to decrease the number of variables, we seek an SVM formulation with fewer constraints, and it therefore seems better to use Lagrangian Support Vector Machines to solve the unsupervised classification problem.
2 Robust Unsupervised Classification Algorithms

2.1 Robust Unsupervised Classification Algorithm with Polyhedrons (PRLSDP)
When considering measurement noise, we assume that the training data x_i ∈ R^n, i = 1, 2, …, l, are perturbed to \tilde{x}_i; concretely, \tilde{x}_{ij} = x_{ij} + \hat{x}_{ij} z_{ij}, i = 1, 2, …, l, j = 1, 2, …, n, with \|z_i\|_p ≤ Ω, where z_i is a random variable. When its norm is chosen as the l_1 norm, the perturbation region of x_i is clearly a polyhedron. We consider Lagrangian Support Vector Machines [9] with the training data perturbed as above. In Sim's robust framework [2], the constraint y_i((w · \tilde{x}_i) − b) ≥ 1 − ξ_i is equivalent to

y_i((w \cdot x_i) - b) - 1 + \xi_i \ge \Omega t_i, \quad t_i \ge 0,   (1)

|\hat{x}_{ij}| w_j y_i \le t_i, \quad -|\hat{x}_{ij}| w_j y_i \le t_i, \quad j = 1, \ldots, n.   (2)
Using a method similar to that in [8], we obtain the optimization problem

\min_{M,\theta,h_i,\hat{h}_i,\kappa_i,\varphi_i,\hat{\varphi}_i} \; \frac{1}{2}\theta   (3)

\text{s.t.} \quad \kappa \ge 0,\ \varphi_i \ge 0,\ \hat{\varphi}_i \ge 0,\ i = 1,\ldots,l,   (4)

M \succeq 0,\ -\varepsilon e \le M e \le \varepsilon e,\ h \ge 0,\ \operatorname{diag}(M) = e,   (5)

\begin{pmatrix} G \circ (\tilde{I} M \tilde{I}^{T}) + \frac{1}{C}\begin{pmatrix} I & 0_{l\times 2ln} \\ 0_{2ln\times l} & 0_{2ln\times 2ln} \end{pmatrix} & \varsigma \\ \varsigma^{T} & \theta \end{pmatrix} \succeq 0.   (6)
When the solution M* is obtained, set y* = sgn(t_1), where t_1 is the eigenvector corresponding to the maximal eigenvalue of M*.
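As a small numerical illustration of the labelling step just described, the sketch below recovers y* = sgn(t_1) from a solved matrix M* with numpy; the toy matrix is made up and only serves to show the mechanics.

```python
# Minimal sketch of the label-recovery step: y* = sgn(t1), where t1 is the
# eigenvector of M* corresponding to its maximal eigenvalue.
import numpy as np

def labels_from_M(M_star):
    eigvals, eigvecs = np.linalg.eigh(M_star)   # eigh: ascending eigenvalues for a symmetric M*
    t1 = eigvecs[:, -1]                         # eigenvector of the maximal eigenvalue
    y = np.sign(t1)
    y[y == 0] = 1.0                             # break ties arbitrarily
    return y

if __name__ == "__main__":
    # toy, made-up M* close to a rank-one +/-1 labelling
    y_true = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
    M_star = np.outer(y_true, y_true) + 0.05 * np.random.default_rng(0).standard_normal((5, 5))
    M_star = 0.5 * (M_star + M_star.T)
    print(labels_from_M(M_star))
```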
2.2 Robust Unsupervised Classification Algorithm with Ellipsoids (ERLSDP)

As in Sect. 2.1, when the norm of the random variable z_i is chosen as the l_2 norm, the perturbation region of x_i is clearly an ellipsoid. In Sim's robust framework [2], the constraint y_i((w · \tilde{x}_i) − b) ≥ 1 − ξ_i is equivalent to

y_i((w \cdot x_i) - b) - 1 + \xi_i \ge \Omega t_i,   (7)

(t_i, y_i \hat{x}_{i1} w_1, \ldots, y_i \hat{x}_{in} w_n)^{T} \in L_i^{n+1}.   (8)
Using a method similar to that in Sect. 2.1, we obtain the semi-definite programming problem

\min_{M,\theta,\kappa_i,h_i} \; \frac{1}{2}\theta   (9)

\text{s.t.} \quad (\kappa_i, h_{i1}, \ldots, h_{in}) \in L_i^{n+1}, \ i = 1, \ldots, l,   (10)

-\varepsilon e \le M e \le \varepsilon e,\ M \succeq 0,\ \operatorname{diag}(M) = e,   (11)

\begin{pmatrix} G \circ (\tilde{I} M \tilde{I}^{T}) + \frac{1}{C\Omega^{2}}\begin{pmatrix} I & 0_{l\times ln} \\ 0_{ln\times l} & 0_{ln\times ln} \end{pmatrix} & \eta \\ \eta^{T} & \theta \end{pmatrix} \succeq 0.   (12)
3 Numerical Results

In this section we test our algorithms (PRLSDP and ERLSDP) through numerical experiments using the SeDuMi library [10]. In order to evaluate the influence of the robust tradeoff parameter Ω, we vary Ω from 0.25 to 1.25 in increments of 0.25 on the synthetic data set AI, with parameters ε = 1 and C = 100. The directions of the data perturbations are produced randomly. The results are shown in Table 1; each entry is the misclassification rate and the time consumed by each algorithm for the given parameter.
4 Application on Supply Chain Management

Table 1. Results and time consumed (seconds of CPU) as the parameter Ω changes from 0.25 to 1.25 on the synthetic data set AI with PRC−SDP, PRLSDP, ERC−SDP and ERLSDP

Ω | 0.25 | 0.5 | 0.75 | 1 | 1.25
PRC−SDP | 6/19 (14.6250) | 4/19 (12.8125) | 2/19 (14.2656) | 2/19 (12.8750) | 2/19 (12.3594)
PRLSDP | 4/19 (12.5938) | 2/19 (12.3125) | 2/19 (12.8594) | 0 (11.6563) | 0 (12.0625)
ERC−SDP | 4/19 (16.6875) | 4/19 (18.1563) | 4/19 (17.1875) | 4/19 (18.2969) | 4/19 (18.1406)
ERLSDP | 2/19 (8.7813) | 2/19 (7.500) | 0 (8.1719) | 0 (8.5781) | 2/19 (8.0156)

An index system for evaluating the performance of supply chain management is set up according to the theory of the balanced scorecard, and we consider the perturbation
of data in supply chain management [11] with ellipsoids, since ellipsoid perturbations are the most common in supply chain management. Because the self-evaluated performance scores are inaccurate, we discard the scores and keep the values of the attributes. The trend in the evaluation scores of the supply chain management data is almost the same as the result obtained from the scores without perturbations. Therefore, the index system for evaluating the performance of supply chain management according to the theory of the balanced scorecard is stable, and the self-evaluated scores are relatively exact, but the evaluation function obtained by ERLSDP can better reflect their degree of discrimination.
5 Conclusions

In this paper we have established robust unsupervised classification algorithms based on Lagrangian Support Vector Machines. From Sect. 3 we can see that our new algorithms (PRLSDP and ERLSDP) often obtain almost the same accuracy as other algorithms while being considerably faster. The application of ERLSDP to the evaluation of supply chain management performance also gives good results.
References

1. Mark, S.F., Mihai, B., Rune, T.: Agent-Oriented Supply-Chain Management. The International Journal of Flexible Manufacturing Systems 12, 165–188 (2000)
2. Sim, M.: Robust Optimization. PhD thesis (June 2004)
3. Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L., Jordan, M.: Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research 5 (2004)
4. De Bie, T., Cristianini, N.: Convex methods for transduction. In: Advances in Neural Information Processing Systems (NIPS 2003), vol. 16 (2003)
5. Xu, L., Neufeld, J., Larson, B., Schuurmans, D.: Maximum margin clustering. In: Advances in Neural Information Processing Systems (NIPS 2004), vol. 17 (2004)
6. Zhao, K., Tian, Y.J., Deng, N.Y.: Unsupervised and Semi-supervised Two-class Support Vector Machines. In: Sixth IEEE International Conference on Data Mining Workshops, Hong Kong, December 2006, pp. 813–817 (2006)
7. Zhao, K., Tian, Y.-J., Deng, N.-Y.: Unsupervised and semi-supervised Lagrangian support vector machines. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007. LNCS, vol. 4489, pp. 882–889. Springer, Heidelberg (2007)
8. Zhao, K., Tian, Y.J., Deng, N.Y.: Robust Unsupervised and Semi-supervised Bounded C-Support Vector Machines. In: Proceedings of the Seventh IEEE International Conference on Data Mining Workshops, Omaha, NE, USA, October 2007, pp. 331–336 (2007)
9. Mangasarian, O.L., David, R.M.: Lagrangian Support Vector Machines. Journal of Machine Learning Research 1, 161–177 (2001)
10. Sturm, J.F.: Using SeDuMi 1.02, A Matlab Toolbox for Optimization over Symmetric Cones. Optimization Methods and Software 11-12, 625–653 (1999)
11. Chun-ming, Y., Hui-min, M., Dan, L., Yi, L.: Application Research of Backpropagation Neural Network in Supply Chain Management Performance Index. Industrial Engineering and Management 5 (2005)
A Dynamic Constraint Programming Approach

Eric Monfroy1, Carlos Castro2, and Broderick Crawford3

1 LINA, Université de Nantes, Nantes, France and Universidad Técnica Federico Santa María, Valparaíso, Chile
[email protected]
2 Universidad Técnica Federico Santa María, Chile
[email protected]
3 Pontificia Universidad Católica de Valparaíso, Chile and Universidad Técnica Federico Santa María, Chile
[email protected]

Abstract. Constraint Programming (CP) is a powerful paradigm for solving Combinatorial Problems (generally issued from Decision Making). In CP, Enumeration Strategies are crucial for resolution performances. In a previous work, we proposed to dynamically change strategies showing bad performances, and to use metabacktrack to restore better states when bad decisions were made. In this work, we design and evaluate strategies to improve resolution performances of a set of problems.
1 Introduction
In [Castro et al., 2005], we proposed a framework for adaptive strategies and metabacktracks to find solutions more quickly and with less variance in solution time, i.e., trying to avoid very long runs when other strategies (or sequences of strategies) can lead quickly to a solution. This framework aims at dynamically detecting bad decisions made by enumeration strategies: instead of predicting the effect of a strategy, we evaluate the efficiency of running strategies, and we replace the ones showing bad results. When this is not sufficient we also perform metabacktracks (several levels of backtracks) to quickly undo several “bad” enumerations and restore a “better” context. In this work, using our framework, we design some dynamic enumeration strategies and metabacktracks and we evaluate them.
2 Dynamic Enumeration Strategies
We consider 12 basic/static enumeration strategies (i.e., strategies that are usually fixed for the whole solving process but that we use as part of our dynamic strategies) based on 3 variable selection criteria: min selects the variable with the smallest domain (the first one in the order of appearance of variables in the problem for tie-breaking); Mcon is the most constrained selection, i.e.,
selection of the (first) variable that appears the most often in the constraints; and ran randomly selects a variable which is not yet instantiated. Variable selection criteria are combined with 4 value selection criteria: Min (resp. Mid, Max, Ran) selects the smallest (resp. the middle, the largest, a random) value of the domain of the selected variable. The observations (snapshots) we consider focus on the search tree to draw some indicators on the resolution progress:

– Maxd: the maximum depth reached in the search tree,
– d: the depth of the current node,
– s: the size of the current search space,
– f, f̄: the percentage of variables fixed by enumeration (respectively by enumeration or propagation),
– v, vf, vfe: the number of variables, the number of fixed variables, the number of variables fixed by enumeration.
The indicators we compute are the following, where F is the last taken snapshot and F− the previous one:

– δn1 = Maxd_F − Maxd_{F−} represents a variation of the maximum depth,
– δn2 = d_F − d_{F−}: if positive, the current node is deeper than the one explored at the previous snapshot,
– δn3 = 100 · (s_{F−} − s_F)/s_{F−}: a percentage of reduction since F−; if positive, the current search space is smaller than the one at snapshot F−,
– δn4 = f̄_F − f̄_{F−} (respectively δn5 = f_F − f_{F−}): if positive, reflects an improvement in the degree of resolution (resp. of the resolution made by enumeration),
– δn6 = d_{F−} − vfe_{F−}: an indicator of thrashing.

Each strategy has a priority of 1 at the beginning. Then, the priority p of the last running strategy is updated as follows, based on 5 of the indicators:

reward: if Σ_{i=1}^{5} δn_i ≥ 10 then p = p + 1
penalty: if Σ_{i=1}^{5} δn_i ≤ 0 then p = p − 3

The dynamic strategies are based on a set of static enumeration strategies and the decision rules shown above. We consider two types of strategies. The first type uses a set of strategies and always applies the strategy of highest priority (we randomly choose one of the highest when tie-breaking is needed, and the first static strategy is also chosen randomly). We consider the following dynamic strategies: Dynmin, DynMcon, DynRan, and DynAll, which apply this selection process to the set of min strategies, Mcon strategies, Ran strategies, and all 12 strategies, respectively. The second type of dynamic strategies uses a partitioning of the basic strategies into subsets: when a strategy s must be changed, the strategy of highest priority from a different subset than the one containing s is selected. The strategy DynAllPack partitions the 12 strategies into 3 subsets, i.e., the group of min strategies, the group of Mcon strategies, and the group of Ran strategies. The metabacktrack is triggered when the thrashing becomes too important:

meta: if δn6 > 4 then meta-backtrack
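A compact sketch of the control loop sketched above: the six indicators are computed from two consecutive snapshots, the reward/penalty rules update the priority of the running strategy, and δn6 triggers the metabacktrack. The Snapshot container, the exact mapping of f versus f̄ in δn4/δn5, and all names are illustrative assumptions; the thresholds (10, 0, +1, −3, 4) follow the text.

```python
# Illustrative sketch of the snapshot indicators, the reward/penalty priority
# update and the metabacktrack trigger described in the text.
from dataclasses import dataclass

@dataclass
class Snapshot:
    maxd: int      # Maxd: maximum depth reached in the search tree
    d: int         # d: depth of the current node
    s: float       # s: size of the current search space
    f: float       # f: % of variables fixed by enumeration
    f_bar: float   # f_bar: % of variables fixed by enumeration or propagation
    vfe: int       # vfe: number of variables fixed by enumeration

def indicators(prev: Snapshot, cur: Snapshot):
    dn1 = cur.maxd - prev.maxd                  # variation of the maximum depth
    dn2 = cur.d - prev.d                        # > 0: current node is deeper than before
    dn3 = 100.0 * (prev.s - cur.s) / prev.s     # % of search-space reduction since prev
    dn4 = cur.f_bar - prev.f_bar                # improvement of the overall degree of resolution
    dn5 = cur.f - prev.f                        # improvement of the resolution made by enumeration
    dn6 = prev.d - prev.vfe                     # thrashing indicator
    return dn1, dn2, dn3, dn4, dn5, dn6

def update_priority(p: int, dn) -> int:
    total = sum(dn[:5])                         # the reward/penalty rules use 5 of the indicators
    if total >= 10:
        return p + 1                            # reward
    if total <= 0:
        return p - 3                            # penalty
    return p

def metabacktrack_needed(dn6: float) -> bool:
    return dn6 > 4                              # meta: trigger when thrashing is too important
```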
Table 1. Static enumeration strategies vs. metabacktracks

Strategy | Q150 | Q250 | M10 | M11 | L20 | L25 | Ps
minAVG | 0 | 0 | 75 | 75 | 0 | 0 | 25
mb10min | 9 | 0 | 0 | 12 | 0 | 0 | 4
rsmin | 6 | 3 | 3 | 9 | 0 | 0 | 4
MconAVG | 75 | 78 | 95 | 100 | 75 | 75 | 83
mb10Mcon | 78 | 90 | 34 | 84 | 18 | 50 | 59
rsMcon | 78 | 75 | 28 | 87 | 9 | 68 | 58
RanAVG | 78 | 83 | 95 | 100 | 75 | 80 | 85
mb10Ran | 81 | 78 | 81 | 93 | 18 | 90 | 74
rsRan | 75 | 84 | 78 | 93 | 12 | 90 | 72
AllAVG | 51 | 54 | 88 | 92 | 50 | 52 | 64
mb10All | 65 | 63 | 0 | 10 | 28 | 40 | 34
mb10AllPack | 76 | 92 | 0 | 2 | 32 | 62 | 44
rsAll | 72 | 68 | 0 | 11 | 30 | 32 | 36
rsAllPack | 71 | 89 | 0 | 4 | 23 | 67 | 42
Table 2. Static strategies vs. dynamic enumeration strategies and metabacktracks

Strategy | Q150 | Q250 | M10 | M11 | L20 | L25 | Ps
minAVG | 0 | 0 | 75 | 75 | 0 | 0 | 25
Dynmin + mb10min | 18 | 2 | 2 | 21 | 0 | 0 | 7
Dynmin + rsmin | 13 | 2 | 0 | 21 | 0 | 0 | 6
MconAVG | 75 | 78 | 95 | 100 | 75 | 75 | 83
DynMcon + mb10Mcon | 72 | 78 | 24 | 91 | 16 | 43 | 54
DynMcon + rsMcon | 76 | 76 | 34 | 81 | 7 | 37 | 52
RanAVG | 78 | 83 | 95 | 100 | 75 | 80 | 85
DynRan + mb10Ran | 89 | 89 | 75 | 100 | 18 | 91 | 77
DynRan + rsRan | 81 | 83 | 68 | 94 | 24 | 82 | 72
AllAVG | 51 | 54 | 88 | 92 | 50 | 52 | 64
DynAll + mb10All | 63 | 65 | 0 | 1 | 29 | 37 | 33
DynAllPack + mb10AllPack | 78 | 95 | 1 | 7 | 22 | 76 | 47
DynAll + rsAll | 56 | 55 | 0 | 8 | 18 | 34 | 29
DynAllPack + rsAllPack | 76 | 87 | 0 | 8 | 23 | 74 | 45
We consider 2 techniques of metabacktrack. The first technique (mb10) performs n steps of backtrack, i.e., mb10 goes up n nodes in the search tree, undoing n enumerations; n is computed as 10% of the total number of variables of the problem. The second technique (rs) is a restart, i.e., the metabacktrack jumps to the root of the search tree.
3 Experimental Results
We are interested in finding the first solution in less than 10 min., a time limit that we call the timeout. Our prototype implementation in Oz (http://www.mozart-oz.org) fixes the constraint propagation process: arc-consistency [Mackworth, 1977] computation (with dedicated algorithms for global constraints) with a look-ahead strategy [Kumar, 1992]. The snapshots are taken every 80 ms. Tests were run on an Athlon XP 2000 with 256 MB of RAM. We consider classical problems: 150-queens (Q150),
250-queens (Q250), magic squares of size 10 (M10) and 11 (M11), and Latin squares of size 20 (L20) and 25 (L25). In all tables the first six columns represent problems and rows represent strategies: a cell is the percentage of timeouts over 100 runs (i.e., how many times a strategy was not able to find a solution in 10 min.). The last column (Ps) represents the average performance of strategies on the set of problems: it is thus the average number of timeouts (in percentage) obtained by a strategy over 600 runs (100 per problem). A row xxxAVG is the average of some strategies on a problem.
4 Conclusion
We have presented some dynamic enumeration strategies and metabacktracks based on running-strategy performances. Our dynamic approach is able to detect bad cases in order to repair strategies and states. The experimental results show that, on average over a set of problems, our dynamic strategies and metabacktracks significantly improve the solving process.

Acknowledgements. The second author has been partially supported by the Chilean National Science Fund through the project FONDECYT 1070268. The third author has been partially supported by Escuela de Ingeniería Informática PUCV through the project INF-03/2008 and DGIP-UTFSM through a PIIC project.
References

Castro, C., Monfroy, E., Figueroa, C., Meneses, R.: An approach for dynamic split strategies in constraint solving. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds.) MICAI 2005. LNCS, vol. 3789, pp. 162–174. Springer, Heidelberg (2005)
Kumar, V.: Algorithms for Constraint-Satisfaction Problems: A Survey. A.I. Magazine 13(1), 32–44 (Spring 1992)
Mackworth, A.K.: Consistency in Networks of Relations. AI 8, 99–118 (1977)
The Evaluation of the Universities' Science and Technology Comprehensive Strength Based on Management Efficiency

Baiqing Sun, Yange Li, and Lin Zhang

School of Management, Harbin Institute of Technology, Harbin 150001, China
Abstract. This paper proposes an evaluation method for universities' science and technology strength that eliminates the effects of advantageous or disadvantageous objective conditions of the objects being evaluated and truly reflects the improvement of comprehensive science and technology strength that is due to subjective effort; it is therefore fair and objective. Because the benchmark indexes and present indexes of the universities' comprehensive science and technology strength change dynamically, the management efficiency method puts universities under pressure, which helps university managers to find gaps and improve their management. It is shown that the management efficiency method has an incentive effect on all universities and has broad application prospects in the area of performance evaluation. Keywords: Management efficiency; evaluation indexes; performance evaluation.
1 Introduction

Universities are an important component of the national innovation system and play an important role in the strategy of prospering the nation. In the real economy, many industries, units and departments face the following situation; taking universities as an example, differences in objective basic conditions influence the actual operating results, and the performance evaluation of universities with good objective basic conditions is always better than that of those with poor ones. However, this does not mean that the subjective effort and ability of managers working under poor basic conditions are lower than those of managers working under good conditions. Although previous scholars recognized this problem, they did not give an appropriate solution; thus, performance evaluation results only reflect the comprehensive strength of the evaluated object and do not accurately reflect the degree of subjective effort of the managers or the contribution of their ability to the performance. This is why universities with poor performance always attribute their inefficiency to disadvantageous objective basic conditions while lacking an analysis of their own subjective efforts and ability. In order to fairly and objectively reflect the degree of subjective effort and ability of university managers, this paper puts forward the evaluation method of management efficiency, which truly reflects the improvement of the universities' science and
technology comprehensive strength achieved by the managers' subjective effort. The application results show that the evaluation obtained by the management efficiency method can reflect the subjective initiative of universities under different objective basic conditions, which can stimulate the scientific research enthusiasm and enterprising spirit of the universities and impel managers to actively sum up experience, find problems, tap the potential for scientific research, adopt coping measures and implement standardized management in order to seek development in competition.
2 Method of Management Efficiency

First, the past science and technology strength is evaluated by some quantitative method; the evaluation result obtained, which reflects the objective basic condition of a university, is called the reference (benchmark) index. In the same way, the present index is the evaluation result of the present science and technology strength. Supposing (x_j, y_j) is the j-th university's index state, we call
T = \{(x, y) \mid \sum_{j=0}^{n} \lambda_j x_j \le x,\ \sum_{j=0}^{n} \lambda_j y_j \ge y,\ \sum_{j=0}^{n} \lambda_j = 1,\ \lambda_j \ge 0,\ j = 0, 1, 2, \ldots, n\}

the possibility set of index states composed of the index states (x_j, y_j), where (x_0, y_0) = (0, 0). The possibility set of index states T is obviously a convex set, that is, if (x′, y′) ∈ T and (x″, y″) ∈ T, then (λx′ + (1 − λ)x″, λy′ + (1 − λ)y″) ∈ T for 0 ≤ λ ≤ 1. The DEA model measuring the management efficiency evaluation value of the j_0-th university is as follows:

\max \ Z
\text{s.t.} \quad \sum_{j=0}^{n} \lambda_j x_j \le x_{j_0},\quad \sum_{j=0}^{n} \lambda_j y_j \ge Z\, y_{j_0},\quad \sum_{j=0}^{n} \lambda_j = 1,\ \lambda_j \ge 0,\ j = 0, 1, 2, \ldots, n   (1)
where x_j is the j-th university's benchmark index and y_j is its present index, j = 0, 1, 2, …, n, with x_0 = y_0 = 0. If the optimal value of linear programming (1) is Z^0 = 1, then we say the university is on the production frontier of the possibility set of index states T. If Z^0 is the optimal value of linear programming (1), let \bar{x}_{j_0} = x_{j_0} and \bar{y}_{j_0} = Z^0 y_{j_0}; it is easy to see that (\bar{x}_{j_0}, \bar{y}_{j_0}) is on the production frontier of the possibility set of index states, and (\bar{x}_{j_0}, \bar{y}_{j_0}) is the projection of the j_0-th university's index state (x_{j_0}, y_{j_0}) onto T. Supposing Z^0 is the optimal value of linear programming (1), the management efficiency evaluation value

\eta = y_{j_0} / \bar{y}_{j_0} \times 100\% = 1/Z^0 \times 100\%

expresses the percentage of the evaluated university's present index relative to the maximum present index achievable under the same reference conditions.
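Linear program (1) can be solved with any LP solver; the sketch below uses scipy.optimize.linprog purely for illustration. The arrays x and y hold the benchmark and present indexes for j = 0, …, n with (x_0, y_0) = (0, 0), and the function returns the optimal value Z^0 together with η = 1/Z^0 × 100%. The variable layout and solver choice are assumptions, not part of the paper.

```python
# Illustrative sketch: solve linear program (1) for university j0 and return
# the management-efficiency evaluation value eta = 1/Z0 * 100%.
import numpy as np
from scipy.optimize import linprog

def management_efficiency(x, y, j0):
    """x[j], y[j]: benchmark and present indexes, j = 0..n with x[0] = y[0] = 0."""
    m = len(x)                                  # m = n + 1 decision weights lambda_0..lambda_n
    # decision vector: [lambda_0, ..., lambda_n, Z]; maximize Z  <=>  minimize -Z
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # sum_j lambda_j * x_j <= x_{j0}
    a1 = np.concatenate([np.asarray(x, float), [0.0]])
    # sum_j lambda_j * y_j >= Z * y_{j0}   <=>   -sum_j lambda_j * y_j + Z * y_{j0} <= 0
    a2 = np.concatenate([-np.asarray(y, float), [y[j0]]])
    A_ub = np.vstack([a1, a2])
    b_ub = np.array([x[j0], 0.0])
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum_j lambda_j = 1
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    Z0 = res.x[-1]
    return Z0, 100.0 / Z0                       # (Z0, eta in percent)
```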
3 The Evaluation Index System of University's Comprehensive Science and Technology Strength

In order to fully reflect the universities' comprehensive science and technology strength, the evaluation should be based on the reality of universities and follow the principles of optimizing the subject structure, achieving a reasonable and optimal configuration of resources and improving the comprehensive competitiveness of universities. First, a statistical survey is carried out to obtain the summarized values of the mutually related statistical indexes, which explain the quantitative and development circumstances. If several independent but related indexes compose the evaluation index system, they can reflect the whole process of scientific research activities in terms of input and output. On this basis, the thesis sets up the evaluation index system of the university's comprehensive science and technology strength.
4 Application Example

This paper is based on the evaluation index system of the university's comprehensive sci-tech strength given in Table 1 and adopts sample index data for 1999-2000 from 28 universities directly under the State Education Commission. It takes the first-level scores of the 28 universities in 1999 as the reference indexes and their first-level scores in 2000 as the present indexes, and calculates the evaluation results of university sci-tech comprehensive strength in 2000. The calculation results show that one kind of university with a lower management efficiency evaluation value has low values in both the reference indexes and the present indexes, such as East China University of Science and Technology and Tongji University. The other kind of university with a lower management efficiency evaluation value declined in the present indexes compared with the reference indexes, such as Sun Yat-sen University. We can see that the management efficiency evaluation value does not depend on the reference indexes, that is, not on the objective basic condition, but on the increase of the present indexes. It is worth noting that although Peking University only ranked No. 4, it still belongs to the universities with a big improvement in sci-tech comprehensive strength. This is because the evaluation results do not depend on the objective basic condition but on the increase of the present indexes. This shows that the ranking orders of the various universities can truly reflect their sci-tech comprehensive strength and position; for example, the advancement range in sci-tech comprehensive strength of top universities like Tsinghua University and Peking University is wider than that of universities further behind like South China University of Technology
and Huazhong University of Science and Technology, while universities with close ranking orders are almost the same in advancement range. Therefore, more importance should be attached to the grade of the advancement range in university sci-tech comprehensive strength (the very large, the larger, the average, the smaller, no progress, and so on) rather than to excessive emphasis on the ranking orders.
5 Conclusions

The evaluation of university science and technology comprehensive strength by means of management efficiency can motivate all universities: universities with good objective basic conditions cannot sit back and relax, and those with poor objective basic conditions need not feel that catching up is hopeless, because as long as the advancement range is wide, a high management efficiency evaluation result can still be achieved; thus all universities feel both hope and pressure. The management efficiency method enables colleges and universities to enhance their sense of urgency, which is beneficial not only for finding the gaps between them but also for tapping their potential, thereby improving operation and seeking broader space for development; it therefore has broad application prospects.

Acknowledgements. The authors thank the Key Project (70131001) supported by the Natural Scientific Research Foundation of China, the Project (40000045-607259) supported by the Natural Scientific Research Foundation of Heilongjiang Province, the Project (HIT. NSRIF. 2008.59) supported by the Natural Scientific Research Innovation Foundation of Harbin Institute of Technology, and the foundation of the National Center of Technology, Policy and Management, Harbin Institute of Technology.
MCDM and SSM in the Public Crisis Management: From the Systemic Point of View

Yinyin Kuang and Dongping Fan

School of Public Administration, South China Normal University
[email protected]

Abstract. From the perspective of system science, this paper analyzes the system mechanism of public crisis formation and argues that the essence of public crisis is a variety of bifurcations of the social system. Accordingly, the main characteristic of public crisis is shown as an ill-structured problem situation. In public crisis management, MCDM, as a hard system approach, is strong in coping with managerial problems that have clear objectives and a well-defined structure. When ill-structured problem situations are encountered, SSM can be a supplementary methodology to MCDM.
1 Introduction

With the process of globalization, the new revolution of technology and people's pursuit of utility, human society is undergoing different kinds of bifurcations. Accordingly, the possibility of public crisis has greatly increased. How to recognize and deal with crises and how to improve and strengthen crisis management have become urgent issues. Public crisis management has tremendous complexity, and such complexity determines the diversity of management methodologies. From the perspective of systems thinking, MCDM, which belongs to the hard systems approach, is better suited to managerial problems with clear objectives and a well-defined structure. SSM, created by Peter Checkland and his colleagues, is used in ill-structured problem situations of the human activity system, that is, where the problems faced are not well defined but messy and vague, with no explicit objectives. MCDM and SSM have different methodological characteristics; however, they are complementary in public crisis management, and both should be applied in the management process in order to improve and strengthen it.
2 Public Crisis Is a Variety of Bifurcations of the Social System

The social system is a self-organizing system; thus, the fluctuation-bifurcation mechanism of self-organization can help us understand the formation of crises. In the evolutionary process of self-organization, when the system evolves from one state to several possible states, it will evolve towards a certain bifurcation point and select one state from several different stable states. However, it is unpredictable, and no judgment can be made about which steady state the system will reach (Prigogine 2005; Xian, Fan and Zhang 2006). It depends on the fluctuation of the system in the
self-organization process. When the system has a stable or equilibrium structure, the fluctuation would usually be very slight and can be offset by the negative feedback effect of the structure. But once the control parameter values approach a certain critical point (bifurcation point), a significant change of the system is observed (Tsonis 1992; Xian, Fan, Zhang 2006). In this case the system becomes unstable or unbalanced, and a slight fluctuation will grow fast and become a giant fluctuation which represents a new kind of ordered structure. Once it is amplified and stabilized by some kind of positive feedback mechanism, it becomes the new predominant ordered structure. This is the fluctuation-bifurcation mechanism of self-organization: a system arrives at an ordered state through the process of fluctuation and bifurcation. The fluctuation-bifurcation mechanism of self-organization can help to explain the discontinuous change of the social system (Laszlo 1988). From the systemic point of view, the essence of the public crisis is just the different kinds of bifurcations (Laszlo 1988) of the social system. When the acting force from outside (the natural environment) or inside (various unstable social elements) exceeds the critical stress point that stabilizes the social system, the crisis will burst out. Before arriving at the critical point, the behavior of the system is relatively ordered except for some periodic fluctuation; however, once it exceeds the critical point, the system order will be broken and the system will fall into chaos. Nevertheless, the chaos will eventually give way to a new kind of order, that is, a new social state after the crisis (Figure 1).
Fig. 1. The Fluctuation-Bifurcation process of the Social System
3 Public Crisis Is an Ill-Structured Problem Situation

The system mechanism of public crisis formation helps us to gain a more profound understanding of its characteristics. When the crisis breaks out, as the system is approaching or has already entered the chaotic state, it will display high nonlinearity, unpredictability, uncertainty, and extreme sensitivity to the initial conditions (Prigogine 2005; Xian, Fan, Zhang 2006). Accordingly, the public crisis has the following characteristics:
– Public crises happen suddenly.
– Crisis events involve high uncertainty and randomness.
– A crisis has a wide range of influences and a certain degree of diffusivity as well; one crisis is able to arouse another, namely a chain reaction (Xie, Zhang, Zhong 2003).
The former three characteristics determine the main characteristic of public crisis: it is shown as an ill-structured problem situation. For example, the crisis event is related to long-standing historical problems and touches the core values and moral principles of the society (Xie, Zhang, Zhong 2003); the problems involved in the crisis show strong complexity and relate to many interest groups; the crisis has intense influence, the objectives are not definite, and the solutions are diverse; or the objectives are clear but conflicting, it is difficult to order them by importance, and it therefore seems impossible to find a feasible solution accepted by the different parties in a short time.
4 MCDM Is a Hard Systems Approach

MCDM (Multi-Criteria Decision Making) developed from single-objective (single-criterion) decision making (such as Linear Programming, Non-Linear Programming and Dynamic Programming), a method of hard OR in the early days. Completely different from single-objective decision making, MCDM uses multiple criteria and focuses on situations where the criteria are in conflict. Accordingly, the goal of MCDM changes from selecting the best solution to choosing the most satisfactory solution (Zeleny 1982). MCDM therefore has a much stronger ability to cope with complex problems than single-objective decision making, and this ability has been further enhanced with the rise and development of fuzzy MCDM. From the perspective of systems thinking, MCDM belongs to the hard systems approach. "Hard systems thinking", a term given by Checkland (1981), is a general designation for the different kinds of methodologies for real-life problem solving developed during the Second World War and in the period after it. The systems approaches related to this general designation are systems engineering, systems analysis and operational research, but it has become clear that other approaches such as decision science and management cybernetics can be added to the list (Jackson 2005). Hard systems approaches can be used in areas where problems are well defined and the objectives of a particular system can be identified clearly. In addition, the means of hard systems methodology are quantitative measurement and optimization, and technical factors tend to predominate. As a result, the process of such a methodology is a "how-oriented" activity which can be linearized to attain the aim step by step. Hard systems approaches have the following basic characteristics: 1. The process is basically linear. Taking the MCDM process as an example, it can be described in five steps (Chankong & Haimes 1983):
– Initial step (general description of the overall demand and objective)
– Problem expression (clearly define the objectives set and basic elements)
– Establishment of the system model
– Analysis and evaluation
– Planning for action
Although feedback exists within the process, that is, information from a later step may be fed back to the previous one for correction, the process is not emphasized as a learning cycle. The last step of the process is planning for action; once the objective has been attained, the process is finished, so we say the MCDM process is basically linear. 2. The main characteristic and advantage of the hard systems approach is its method of global optimization. As H. Chestnut (1967) illustrated in Systems Engineering Methods, "the systems engineering method recognizes each system is an integrated whole even though composed of diverse, specialized structures and sub-functions. It further recognizes that any system has a number of objectives and that the balance between them may differ widely from system to system. The methods seek to optimize the overall system functions according to the weighted objectives and to achieve maximum compatibility of its parts." 3. The hard systems problem is defined as the gap between the objective state S1 and the current state S0; problem solving is to choose the best project to eliminate the distance S1 − S0 (Checkland 1981). Here the objective is clear, but in situations where the objective is unclear, the hard systems approach may not be of use.
5 SSM and Its Characteristics

SSM is a response to the difficulty of applying hard systems thinking to human activity systems, which are complex, fuzzy and pluralistic. The problems in this area are usually intangible and messy, namely ill-structured (Checkland 1981, 1999; Jackson 2000). How can such a large number of ill-structured problems in human activity systems be dealt with? Checkland presents a seven-stage model in Systems Thinking, Systems Practice (Checkland 1981), which is the best known today.
Fig. 2. The currently preferred representation of SSM (adapted from Checkland, 1999, P. 49)
However, Checkland no longer uses the seven-stage model, since he found it too limiting. In order to stress that the learning cycle can be commenced at any stage and that SSM is to be used flexibly and iteratively (Jackson 2000), in its latest account SSM is presented as the four-activity model (Figure 2). This model consists of the following four activities (Checkland 1999):

1. Finding out (perceiving and expressing) a problem situation.
2. Formulating some relevant purposeful activity models.
3. Debating the situation, seeking systemically desirable and culturally feasible changes, and the accommodations between conflicting interests which will enable action to be taken.
4. Taking action to improve the problem situation.

Four characteristics of SSM are noteworthy. Firstly, SSM is concerned with the problem situation rather than the problem, because for some ill-structured problems, even deciding which system the problem belongs to is itself a problem. Therefore, different from the hard systems approach, the first step of SSM is to build up the richest possible picture of the problem situation in order to make it clear, rather than setting out the problem too early. In addition, since most human activity systems have no well-defined objectives, SSM does not attempt to determine what the objectives are but rather chooses relevant human activity systems, prepares "root definitions" from these relevant systems, and constructs the conceptual models. As a result, while the hard systems approach leads to the design of systems, SSM leads to the implementation of agreed changes. Secondly, SSM lays great stress on the interpretation of the pluralistic values behind the system objectives. For a human activity system, different people have different interpretations of what the problem situation is, what objectives the system should achieve and how to improve the system, because people hold different weltanschauungen, world views and cultural backgrounds. Thirdly, in the process of interpreting the system, SSM attaches great importance to the intervention of social, political and cultural factors, which is reflected by the "two strands" version of SSM (Checkland and Scholes 1990). This model gives equal space to the culture stream of analysis and to the logic-based stream. Fourthly, SSM is a learning cycle. As M. C. Jackson (2000) illustrated in his book Systems Approaches to Management, "Problem resolving in social systems is, for Checkland, a never-ending process of learning, in which participants' attitudes and perceptions are continually tested and changed, and they come to entertain new conceptions of desirability and feasibility."
6 The Roles of MCDM and SSM in Public Crisis Management

It goes without saying that MCDM plays an important role in public crisis management. For example, it applies to risk analysis of hazards, decision making for resource allocation, and the establishment and evaluation of emergency plans. However, as mentioned in the second part of this paper, the public crisis is an ill-structured situation, so MCDM is still not able to break away from its limitations as a hard systems approach in public crisis management, that is, in the
absence of well-defined problems, agreed goals and objectives, and an obvious hierarchy of systems to be engineered, this approach cannot be used at the very beginning. The global financial tsunami, for instance, is a typical ill-structured situation. It broke out all of a sudden, and it has widespread impact involving quite a few interest groups. Facing such difficulties, every country takes action to cope with it. However, what actually is the problem? It is vague from the beginning. Though people give various explanations for the cause of this economic crisis, most of them usually see only the fuse which caused its explosion, while the relevant systems behind it have not been made explicit; these include the political and cultural systems. A hard systems approach cannot lead us to find out these wider relevant systems, so it is hard to find clear and agreed objectives or criteria. In the absence of these, using a hard systems approach too early can only lead to distorting the problem situation and to jumping to premature conclusions (Jackson 2000). Thus, SSM can be a supplementary methodology to MCDM, and it plays an important role in public crisis management. Firstly, SSM can be of help in situations where the problem is not well defined and the objectives are not clear. Actually, MCDM also attaches importance to the problem-expression stage of its process, but when facing ill-structured or unstructured problems, while the decision maker is still not clear about the nature of the problem, it cannot be guaranteed that the main defined objective and its objectives set are accurate (Keeney, Raiffa 1976). Accordingly, the optimization function built upon them cannot help us to select a satisfactory solution. Public crisis has a high degree of uncertainty, which is a typical ill-structured situation; thus, for a decision maker, "what to do" is even more important than "how to do it" at the beginning. In this case, SSM can help us to clarify the situation, lead to feasible and desirable changes to improve the problem situation, and, under special circumstances, to tackle more structured problems with hard approaches. Secondly, SSM can be of help in coordinating conflicting values and interests. Like SSM, MCDM also recognizes pluralistic goals and values. This is particularly reflected in solution selection and evaluation (such as MADM), since the pluralism of attributes or criteria itself reflects a pluralism of values. But some of the attributes are incommensurable and hard to quantify, which increases the difficulty of exerting this pluralism (Chen 1987, Feng 1990). Furthermore, compared with SSM, MCDM does not include a wide-ranging debate about possible changes among those concerned with the problem situation, and does not seek an accommodation of different worldviews that develops over changes that are both desirable and feasible. Therefore, when facing interest conflicts among multiple parties in which the values of the participants are highly inconsistent, such as religious conflict, MCDM becomes weak and SSM is a very good supplementary methodology to MCDM. Thirdly, SSM can be of help in improving crisis management in the long run. The process of MCDM is an open-circuit linear process while SSM is a closed-loop learning process. SSM is more suitable for managerial problems which need long-term improvement, rather than problems which can be solved in a short time.
On the other hand, SSM attaches great importance to the culture stream, emphasizes the impact of social politics, culture and history, and also stresses publicity and fairness during the decision-making process. Therefore, SSM can be combined with MCDM to strengthen its comprehensiveness and humanity in every decision-making process during the prevention or restoration stages of crisis management.
7 Conclusions

Positivism and humanism have always been two poles of tension in social studies. In the process of handling problems of human activity systems, it seems as if hard systems methodology and soft systems methodology embody just these two poles, of which the former tends towards logical thinking while the latter tends towards the culture stream. Accordingly, the former belongs to objectivism, functionalism and the reflectionist epistemology, while the latter belongs to subjective constructionism, hermeneutics and pluralism. MCDM, as an outstanding achievement of the development of hard systems approaches, has had its ability to handle system complexity continuously enhanced with the development of research, and it plays an important role in public crisis management. However, just as we cannot ignore the tension of humanism, we cannot ignore the application of SSM either, which is also a response to the great complexity of public crisis management. Hard systems methodology and soft systems methodology are both needed in public crisis management.
References

1. Checkland, P.B.: Systems Thinking, Systems Practice, pp. 160–181, 201–226. John Wiley & Sons, Chichester (1981)
2. Checkland, P.B., Scholes, J.: Soft Systems Methodology in Action, vol. 91. John Wiley & Sons, Chichester (1990)
3. Checkland, P.B.: Systems Thinking, Systems Practice (new edn., including a 30-year retrospective). John Wiley & Sons, Chichester (1999)
4. Chen, W.: Analysis of decision-making. Science Press, Beijing (1987)
5. Chestnut, H.: Systems Engineering Methods, vol. 10. Wiley, New York (1967)
6. Chankong, V., Haimes, Y.Y.: Multiobjective Decision Making: Theory and Methodology Series, vol. 8. Elsevier Science Publishing Co., Inc., Amsterdam (1983)
7. Feng, S.: The approach and application of multiple criteria decision-making, Guangzhou, China (1990)
8. Heath, R.: Crisis management for managers and executives. Pearson Education Limited, London (1999)
9. Jackson, M.C.: Systems Approaches to Management, pp. 246–247, 254–256. Kluwer/Plenum, New York (2000)
10. Jackson, M.C.: Systems Thinking — Creative Holism for Managers (Chinese edn.), vol. 17, 47. China Renmin University Press (2005)
11. Keeney, R.L., Raiffa, H.: Decisions with multiple objectives: preferences and value tradeoffs. John Wiley & Sons, New York (1976)
12. Laszlo, E.: Design for evolution — Managing the coming bifurcation of society (Chinese edn.), pp. 1–17. Social Sciences Academic Press, Beijing (1988)
13. Laszlo, E.: Evolution — The Grand Synthesis (Chinese edn.), vol. 30, pp. 179–199. Social Sciences Academic Press, Beijing (1988)
14. Prigogine, I., Stengers, I.: Order out of Chaos (Chinese edn.), pp. 160–169. Century Publishing Group, Shanghai Translation Publishing House, Shanghai (2005)
15. Tsonis, A.A.: Chaos — from theory to applications, pp. 103–115. Plenum Press, New York, London (1992)
16. Lan, X., Qiang, Z., Kaibin, Z.: Crisis Management in China - The Challenge of the Transition, pp. 27–35, 161–194. Tsing Hua University Press, Beijing (2003)
17. Yan, Z., Fan, D., Zhang, H.: An Introduction to Systems Science, pp. 348–351. The People Press, Beijing (2006)
18. Zeleny, M.: Multiple Criteria Decision Making. McGraw-Hill, New York (1982)
The Diagnosis of Blocking Risks in Emergency Network∗
Xianglu Li¹, Wei Sun¹, and Haibo Wang²
¹ School of Economics & Management, Zhongyuan University of Technology, Zhengzhou 450007, China
[email protected]
² Sanchez School of Business, Texas A&M International University, Laredo, TX 78041
∗ Project supported by the Natural Science Foundation of Henan Province (No. 0411011400).
Abstract. For a maximum flow f* in a network N, if an arc set B has the property that increasing the capacity of every arc in B increases the maximum flow value v* of the network while decreasing the total expense Z, then B is called a "blocking set" and a network containing a blocking set is called an "ill-conditioned network". In this paper we first characterize blocking sets and ill-conditioned networks, and then present the optimal investment decision for increasing capacity and removing the ill condition.
Keywords: network flow, minimum cost and maximum flow, blocking set, ill-conditioned network.
1 Introduction
In a traffic jam or in an emergency evacuation, a network risks being blocked by saturated flow, which lowers transportation efficiency, congests access routes and may bring financial losses and even deaths. Generally, blocking problems are found in ill-conditioned networks whose capacity is unreasonably distributed. Studying the diagnosis of blocking risks in emergency networks and discovering the true defects of an ill-conditioned network plays a significant role in improving the network, enhancing its transportation efficiency, and avoiding and removing blocking risks. This paper defines the notion of a "blocking set" from the minimum-cost maximum-flow point of view, characterizes ill-conditioned networks, and then derives the optimal decision on increasing capacity and removing the ill condition.
2 Problem of Minimum Cost and Maximum Flow and Blocking Set
In network flow theory, the min-cost max-flow problem is a classical problem with broad applications. We are given a flow network N = (s, t, V, A, c, w), where V = {1, 2, ..., n} is the vertex set, A is the arc set, and s, t ∈ V are the source and the sink respectively. For an arc (i, j) ∈ A, c(i, j), w(i, j) and f(i, j) stand for the capacity limit, the weight (cost per unit) and the flow respectively. The min-cost max-flow problem is the following linear program:

$$\min z = \sum_{(i,j)\in A} w(i,j)\, f(i,j) \qquad (1.1)$$

subject to

$$\sum_{j:(i,j)\in A} f(i,j) - \sum_{j:(j,i)\in A} f(j,i) = \begin{cases} v^{*} & \text{if } i = s \\ -v^{*} & \text{if } i = t \\ 0 & \text{otherwise} \end{cases} \qquad (1.2)$$

$$0 \le f(i,j) \le c(i,j) \quad \forall (i,j) \in A \qquad (1.3)$$

where v* is the value of the maximum flow (see [1-4]). Solving this optimization problem means seeking the optimal transportation plan for a given network structure and parameter distribution. Viewed from another angle, it also reveals the disadvantages and shortcomings of the network, which can then be improved. For instance, owing to an unreasonable capacity distribution, some networks are blocked at certain bottlenecks, forcing the flow onto long detours and making transportation costly. Once these blockages are removed, the flow value increases while the total cost decreases; such a network is called an ill-conditioned network. The blocking places are the key to enhancing network efficiency. The following concepts are therefore introduced.
Definition 1. For a maximum flow f* in the network N, an arc set B is called a "blocking set" if it has the following property: when the capacity of every arc in B is increased, the maximum flow value v* of the network increases while the total expense Z decreases.
For instance, consider the network N shown in Figure 1, where the pair (c, w) on each arc gives its capacity and weight.
Fig. 1. (a) The original network, with arc labels (s, a) = (2, 1), (s, b) = (1, 1), (a, b) = (1, 10), (a, t) = (1, 1), (b, t) = (2, 1); (b) the same network with the capacities of (s, b) and (a, t) raised to 2.
The maximum flow of the network in Figure 1(a) is f(s, a) = f(b, t) = 2, f(s, b) = f(a, t) = f(a, b) = 1; the flow value is v* = 3 and the total cost Z is 16. In this network B = {(s, b), (a, t)} is a blocking set, because if the capacity of each arc of B is increased by 1, as shown in Figure 1(b), the maximum flow becomes f(s, a) = f(s, b) = f(a, t) = f(b, t) = 2, the flow value increases to v* = 4, and the total cost Z is reduced to 8. This "more delivery, less cost" effect is similar to the paradox of the transportation problem (see [5, 6, 7]); however, that paradox is caused by an unreasonable allocation of supply
and distribution, whereas the present blocking problem is caused by an unbalanced distribution of capacity. Moreover, the transportation problem concerns a special network, while the minimum-cost problem discussed here concerns general networks. References [8, 9, 10] also touch on blocking situations in their discussion of the minimum saturated flow: in the congestion that accompanies an emergency evacuation, the flow cannot be increased by using forward arcs alone, so a block arises and the network is easily blocked; the smaller the saturated flow, the worse the performance. Although "blocking" in the min-cost max-flow problem is a different concept, the purpose of studying this diagnosis problem is the same: to find where the essential defect of an ill-conditioned network lies, so as to improve the network and enhance its transportation efficiency.
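A minimal sketch of the Figure 1 computation, assuming the networkx library is available (networkx is not mentioned by the authors); the arc data are those read off Figure 1, and the expected outputs match the discussion above.

```python
# Minimal sketch (assumes networkx; not the authors' code): min-cost maximum
# flow of the two networks of Figure 1.
import networkx as nx

def min_cost_max_flow(arcs, s='s', t='t'):
    """arcs: {(i, j): (capacity, weight)} -> (max-flow value, total cost)."""
    G = nx.DiGraph()
    for (i, j), (cap, w) in arcs.items():
        G.add_edge(i, j, capacity=cap, weight=w)
    flow = nx.max_flow_min_cost(G, s, t)          # min-cost maximum flow
    value = sum(flow[s][j] for j in G.successors(s))
    return value, nx.cost_of_flow(G, flow)

# Figure 1(a): arc labels (capacity, weight).
net_a = {('s', 'a'): (2, 1), ('s', 'b'): (1, 1), ('a', 'b'): (1, 10),
         ('a', 't'): (1, 1), ('b', 't'): (2, 1)}
print(min_cost_max_flow(net_a))                   # (3, 16)

# Figure 1(b): capacities of the blocking set B = {(s, b), (a, t)} raised by 1.
net_b = dict(net_a)
net_b[('s', 'b')] = (2, 1)
net_b[('a', 't')] = (2, 1)
print(min_cost_max_flow(net_b))                   # (4, 8): more flow, less cost
```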
3 The Definition of Blocking Set
A feasible solution f of the above linear program (1.1)-(1.3) is a maximum flow. For a path P from s to t (not necessarily a directed path), an arc that points along (respectively against) the direction from s to t is called a forward (respectively backward) arc. Regarding P as a set of arcs, the set of its forward arcs is denoted by P⁺ and the set of its backward arcs by P⁻, so that P = P⁺ ∪ P⁻. The path P from s to t is a flow-increasing (augmenting) path if all of its forward arcs are unsaturated and all of its backward arcs carry positive flow, that is,

$$f(i,j) < c(i,j) \ \ \forall (i,j) \in P^{+}, \qquad f(i,j) > 0 \ \ \forall (i,j) \in P^{-}.$$

Lemma 1 ([1, 2]). A flow f is a maximum flow if and only if there is no flow-increasing path.
A partition (S, S̄) of the vertex set V with s ∈ S and t ∈ S̄ is called an s-t cut. Its set of forward arcs is (S, S̄) = {(i, j) ∈ A | i ∈ S, j ∈ S̄} and its set of backward arcs is (S̄, S) = {(i, j) ∈ A | i ∈ S̄, j ∈ S}. The capacity of the cut (S, S̄) is

$$c(S,\bar S) = \sum_{(i,j)\in (S,\bar S)} c(i,j).$$

The minimum cut is the cut with minimum capacity, and the famous max-flow min-cut theorem reads:
Lemma 2 ([1, 2]). The value of the maximum flow equals the capacity of the minimum cut.
Corollary ([1, 2]). A cut (S, S̄) is a minimum cut if and only if, for any maximum flow f, all of its forward arcs (i, j) ∈ (S, S̄) are saturated and all of its backward arcs (i, j) ∈ (S̄, S) carry no flow.
Now consider the characterization of a blocking set. The weight of any subset X ⊆ A is defined as the sum of the weights of its arcs, that is,

$$w(X) = \sum_{(i,j)\in X} w(i,j).$$

Theorem 1. With regard to a maximum flow f, a subset B ⊆ A is a minimum blocking set if and only if the following conditions are satisfied: (1) all arcs in B are saturated; (2) every minimum cut contains some arc of B among its forward arcs; (3) there is a path P from s to t whose set of saturated forward arcs is exactly B, whose backward arcs all carry positive flow, and for which w(P⁺) < w(P⁻).
Proof. Suppose B is a minimum blocking set of the maximum flow f. If an arc e ∈ B were unsaturated, the flow could be increased and the cost decreased without extending the capacity of e, so B \ {e} would still be a blocking set, contradicting minimality; hence condition (1) holds. Next, if the forward arcs of some minimum cut (S, S̄) contained no arc of B, then extending the capacities of B could not increase the flow value, so condition (2) holds. Finally, for condition (3), suppose the capacity of every arc of B is increased by one unit, giving the network N′. There must then be a flow-increasing path P in N′ along which augmenting the flow decreases the cost. This implies two facts: first, since P is a flow-increasing path, f(i, j) > 0 for all backward arcs (i, j) ∈ P⁻; second, the cost change caused by augmenting along P is w(P⁺) − w(P⁻) < 0, so condition (3) holds.
To search for blocking sets, an auxiliary digraph G on the vertex set V is constructed: every original arc keeps its weight w(i, j), and for every arc (i, j) with f(i, j) > 0 a reverse arc (j, i) with weight −w(i, j) is added; the set of arcs added in this way is denoted A′, and A ∪ A′ is taken as the arc set of G. (Note: the weight of the original arcs is still w(i, j).)
Theorem 2. There is a blocking set in the network N if and only if there is an s-t directed path of negative cost in the auxiliary digraph G.
Proof. There is a blocking set in N if and only if there is a minimum blocking set, which by Theorem 1 corresponds to an s-t path satisfying its conditions, that is, to an s-t directed path of negative cost in the auxiliary digraph G.
This yields an algorithm for finding blocking sets: search for a shortest s-t directed path P̂ in the auxiliary digraph G. If its weight is negative, then P̂⁺, the set of saturated original arcs on P̂, forms a minimum blocking set; further negative-cost paths yield further minimum blocking sets, and the union of any number of minimum blocking sets is again a blocking set. It is well known that very simple algorithms exist for finding a shortest path between two points in a weighted digraph (see [2, 3]).
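A minimal sketch of this search, under the construction of G stated above (original arcs keep weight w(i, j); every positive-flow arc gets a reverse arc of weight −w(i, j)); Bellman-Ford relaxation is used because the arc weights may be negative. The function name and data layout are illustrative, not the authors'.

```python
# Sketch (assumption, not the authors' code): build the auxiliary digraph G from
# a maximum flow f and look for a negative-cost s-t directed path.
def find_blocking_set(arcs, flow, s, t):
    """arcs: {(i, j): (capacity, weight)}; flow: {(i, j): flow value}."""
    # Assumed construction of G: every original arc keeps weight w(i, j); every
    # arc with positive flow also gets a reverse arc (j, i) of weight -w(i, j).
    edges = [(i, j, w, ('orig', (i, j))) for (i, j), (c, w) in arcs.items()]
    edges += [(j, i, -w, ('rev', (i, j)))
              for (i, j), (c, w) in arcs.items() if flow.get((i, j), 0) > 0]

    nodes = {i for (i, j), _ in arcs.items()} | {j for (i, j), _ in arcs.items()}
    dist = {v: float('inf') for v in nodes}
    pred = {}
    dist[s] = 0.0
    for _ in range(len(nodes) - 1):                      # Bellman-Ford relaxations
        for u, v, w, tag in edges:
            if dist[u] + w < dist[v]:
                dist[v], pred[v] = dist[u] + w, (u, tag)

    if dist[t] >= 0:
        return None                                      # network is "normal"
    # Walk predecessors back from t; saturated original arcs on the path form B.
    blocking, v = set(), t
    while v != s:
        u, (kind, arc) = pred[v]
        cap, _ = arcs[arc]
        if kind == 'orig' and flow.get(arc, 0) >= cap:   # saturated forward arc
            blocking.add(arc)
        v = u
    return blocking

arcs = {('s', 'a'): (2, 1), ('s', 'b'): (1, 1), ('a', 'b'): (1, 10),
        ('a', 't'): (1, 1), ('b', 't'): (2, 1)}
flow = {('s', 'a'): 2, ('s', 'b'): 1, ('a', 'b'): 1, ('a', 't'): 1, ('b', 't'): 2}
print(find_blocking_set(arcs, flow, 's', 't'))           # {('s', 'b'), ('a', 't')}
```

On the Figure 1 data the negative-cost path is s → b → a → t (cost 1 − 10 + 1 = −8), whose saturated forward arcs are exactly the blocking set B = {(s, b), (a, t)} found in Section 2.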
4 Definition of Normal Network
Definition 3. If a network N has no blocking set, it is called a normal network; otherwise it is ill-conditioned.
By Theorem 2, if a network N is normal, then the auxiliary digraph G has no s-t directed path of negative cost. Hence a shortest-path algorithm can be used to test whether a network is normal, and the test can be completed in polynomial time.
For some special networks a more direct judgment is meaningful. The following is a class of networks with good performance.
Definition 4 ([11]). A digraph is called a two-terminal series-parallel digraph if it admits the following recursive construction:
(1) The digraph formed by a single arc (s, t) is a series-parallel digraph, with starting point s and end point t.
(2) If G1 and G2 are series-parallel digraphs with starting points s1, s2 and end points t1, t2 respectively, then the digraph obtained by either of the following operations is again a series-parallel digraph:
- Parallel composition: identify the starting points s1 and s2 (the new starting point) and identify the end points t1 and t2 (the new end point).
- Series composition: identify the end point t1 of G1 with the starting point s2 of G2 (s1 becomes the starting point and t2 the end point).
Theorem 3. A network based on a series-parallel digraph is always normal, regardless of how capacities and weights are assigned.
Proof. We use induction on the number of arcs of the network N; the conclusion is clear when N consists of a single arc. Assume the conclusion holds whenever the number of arcs is less than k, and consider a network with k arcs. Two cases arise.
(1) If N is obtained from N1 and N2 by parallel composition, then N1 and N2 are normal by the induction hypothesis. If N were not normal, then by Theorem 2 there would be an s-t
directed path of negative cost in the auxiliary digraph G of N; such a path would lie entirely within N1 or within N2, contradicting their normality.
(2) If N is obtained from N1 and N2 by series composition, then N1 and N2 are again normal by the induction hypothesis. If N were not normal, there would be a negative-cost path P; letting P1 and P2 be its sub-paths within N1 and N2, at least one of P1 and P2 would be a negative-cost path, contradicting the normality of N1 and N2.
Therefore the conclusion holds for any series-parallel network. This completes the proof.
In other words, an evacuation (dredging) network designed as a series-parallel digraph can never exhibit blocking, no matter how the capacities are assigned, so it is a network with good performance. It is easy to see that the network shown in Figure 1 is not series-parallel.
5 Investment Problem
When a blocking set is found in the network N, the capacity of every arc in the blocking set must be extended in order to eliminate the ill condition. Since extending capacity requires investment in practice, an optimal investment strategy for eliminating blocking is developed as follows. A maximum flow f in the network N is already known. Because every saturated arc may belong to some blocking set, its capacity may need to be extended. We therefore introduce, for each saturated arc e, a parallel arc e′ of unlimited capacity and define its weight w(e′) as the cost of extending the capacity c(e) by one unit. To balance the one-time investment of upgrading the network against the long-run gain in network performance (reduced freight cost), the unit cost w(e′) is multiplied by a proportionality factor; this yields a new weighted network, denoted N̂. If the flow value is required to reach v̂ = v* + Δv (Δv > 0) after the upgrade, the optimal investment problem becomes the problem of finding a minimum-cost flow of value v̂ in the new network N̂, for which quite mature algorithms exist (see [2, 3]). After the optimal solution is obtained, the flow on each added arc e′ gives the capacity to be added to the corresponding original arc e.
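A sketch of this investment computation, again assuming networkx. The "parallel arcs" are modelled with dummy nodes so that an ordinary digraph suffices, the expansion path is assumed to pay both the freight and the scaled expansion cost (one possible reading of the description above), and the expansion costs and balancing factor alpha are hypothetical values, not data from the paper.

```python
# Sketch (assumption): capacity-expansion decision as a min-cost flow of value
# v_hat = v* + delta_v in the enlarged network N-hat described in Section 5.
import networkx as nx

def expansion_plan(arcs, flow, s, t, delta_v, expand_cost, alpha=1.0):
    """arcs: {(i,j): (cap, w)}; expand_cost: cost per unit of new capacity."""
    G = nx.DiGraph()
    for (i, j), (cap, w) in arcs.items():
        G.add_edge(i, j, capacity=cap, weight=w)
        if flow.get((i, j), 0) >= cap:                 # saturated: allow expansion
            dummy = ('exp', i, j)
            # Expansion path pays freight w plus the scaled expansion cost.
            G.add_edge(i, dummy, weight=int(round(alpha * expand_cost[(i, j)])))
            G.add_edge(dummy, j, weight=w)             # uncapacitated expansion path
    v_hat = sum(flow.get((s, j), 0) for j in {j for (i, j) in arcs if i == s}) + delta_v
    G.add_node(s, demand=-v_hat)
    G.add_node(t, demand=v_hat)
    result = nx.min_cost_flow(G)                       # network simplex
    # New capacity to buy on each saturated arc = flow routed via its dummy node.
    return {(i, j): result[i][('exp', i, j)]
            for (i, j) in expand_cost if ('exp', i, j) in G}

arcs = {('s', 'a'): (2, 1), ('s', 'b'): (1, 1), ('a', 'b'): (1, 10),
        ('a', 't'): (1, 1), ('b', 't'): (2, 1)}
flow = {('s', 'a'): 2, ('s', 'b'): 1, ('a', 'b'): 1, ('a', 't'): 1, ('b', 't'): 2}
cost = {e: 5 for e in arcs}                            # hypothetical expansion costs
print(expansion_plan(arcs, flow, 's', 't', delta_v=1, expand_cost=cost))
```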
References
1. Ford, L.R., Fulkerson, D.R.: Flows in Networks. Princeton University Press, Princeton (1962)
2. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. Prentice Hall, New Jersey (1983)
3. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, New Jersey (1982)
4. Minieka, E.: Optimization Algorithms for Networks and Graphs (Chinese edn.). China Railway Publishing House, Beijing (1984)
5. Yun, L.: Conditions of the Transportation Problem "Paradox". Journal of Operations Research 3(1), 7–13 (1984)
6. Yun, L.: The "Paradox" Problem in Linear Programming. Journal of Operations Research 5(1), 79–80 (1986)
7. Charnes, A., Duffua, S., Ryan, M.: The More-for-Less Paradox in Linear Programming. European J. Oper. Res. 31, 194–197 (1987)
8. Ning, X.: Directed Network Minimal Flow Problem and Its Branch-and-Bound Algorithm. System Engineering 14(5), 61–66 (1996)
9. Ning, X.: A Bidirected Flow-Increasing Algorithm for Solving the Problem of Minimum Flow in the Network. System Engineering 15(1), 50–57 (1997)
10. Lin, Z., Li, X., Deng, J.: The Minimum Saturated Flow Problem in Emergency Networks. Operations Research Transactions 5(2), 12–20 (2001)
11. Valdes, J., Tarjan, R.E., Lawler, E.L.: The Recognition of Series-Parallel Digraphs. SIAM Journal on Computing 11(2), 298–313 (1982)
How Retailer Power Influence Its Opportunism Governance Mechanisms in Marketing Channel?– An Empirical Investigation in China* Yu Tian and Xuefang Liao School of Business, Zhongshan University, Guangzhou 510275
[email protected] Abstract. This study aims to investigate the relationship among power, norms, contract and opportunism, analysis the effect of retailer power on the choice of norms and contracts, examine the influence of retailer power on opportunism, and discuss the moderating effect of communication between retailer’s power and governance mechanism, and also the moderating effect of monitoring between governance mechanism and opportunism. We got the following conclusions: (1) The retailer power affects supplier’s choice of governance mechanisms. (2) The use of different types of governance mechanisms from supplier has a significant negative impact on retailer’s opportunism behavior. (3) Communication significantly decreases the frequency of supplier’s use of contract, but does not show a significant effect on decreasing the frequency of the use of norms; monitoring does not show a significant moderating effect between governance mechanism and opportunism behavior. Keywords: Power; Governance Mechanism; Communication; Monitoring; Opportunism.
1 Introduction As more and more supermarkets swarm into retailing market, retailers’ opportunism behaviors emerge more frequently. Opportunism behaviors lead to great channel conflicts. It reveals when a channel member get the interests at the expense of other members’(Wathne and Heide 2000), it makes members’ channels objectives, ideas and actions have a confrontation, leading to reduce the economic reward that channel members get, and also lower the cooperation between the psychological and social satisfaction(Wathne and Geyskens 2005). Channel managers need to adopt appropriate governance mechanism to prevent opportunism (Heide 1994). Channel power influences the relationship between channel partners (Gaski et al. 1985). The rise of power not only increases channel partners’ opportunism (Geyskens et al. 1999), but also has an effect on the use of governance mechanism (kumar et al. 1995). Supplier’s perception of the channel control is a positive liner relationship when the supplier’s power increased, but when the distributor’s power increased, *
This research is supported by the National Natural Science Foundation of China (# 70872117).
distributors perception of the channel control from supplier is an inverted U-shaped relationship (Kim and Hsieh 2006). Unfortunately there is no literature make in-depth study on the influence that channel power has on governance mechanisms. We organize the article as follow: we present a conceptual framework in Part 2, which power influences the choice of contracts and norms, and power has a restraining effect on opportunism, and then examine the moderating effect of communication and governance mechanism. Second, we describe the measures and the data from Chinese manufacturing industry; discuss results in Part 3, the conclusion and managerial implications in Part 4.
2 Theory and Hypothesis Through defined the expectations, responsibilities and role requirements of channel members, formal contract can build healthy channel relation and prevent opportunistic risk (Cannon et al. 2000). Channel members exposure on social, political and economic environment agree relation norms, pressure and social agreement will reduce the evasion of responsibility and opportunism, even in the face of threatened lock, norms can effectively circumvent opportunism (Heide and John 1992). The use of relations can guide both partners focus on the win-win strategies and objectives and long-term relationship-oriented (Bello and Gilliland 1997), promote cooperation when conflict and change happened to make trade continue and bilateral (Macneil IR 1978). Formal contract often detail the responsibilities and obligations of both sides, clear the the process of supervision and punishment of disobedience, determine the final benefits or outputs (Laura Poppo and Todd Zenger 2002). In channels, the more power the retailers have, the greater felling of insecurity the suppliers get. In order to obtain long-term economic interests and cooperation relationship, suppliers will more inclined to adopt relation norms which to maintain long-term trading relations and formal contact which to protect the vulnerable side. H1: The power of retailer has a positive effect on the use of norms of supplier H2: The power of retailer has a positive effect on the use of contract of supplier A formal contract lists commitment for long-term cooperation, limits partners‘s opportunism behavior through a clear penalty terms, so detailed contract can effectively reduce the incidence of opportunistic behavior (Zhang et al. 2000). Relation norms are the expectations of two sides, which, at the minimum level, are agreed by relevant decision-makers of both sides, it is a series of hidden rules that both sides shared (Heide and John 1992). As an inherent constraints of a mutual selfrestraint and control mechanisms, relation norms is rule of ethics in the cooperation among enterprises, it can enhance identity of each other and establish trust relationship between enterprises, reduce opportunism behavior (Zhang et al. 2003). H3: supplier’s use of relation norms has a negative effect on retailer’s opportunism H4: supplier’s use of contract has a negative effect on retailer’s opportunism Communication is the process that information sender sent the information to recipient for getting response from the receiver (Bogle et al. 1992). Communication creates a good atmosphere for channel governance. Through the channel information sharing, or even participating in each other’s plans and decision of objectives, will help to regular the expectations and behavior of channel members (Henderson 2002), by communication, channel partners can provide valuable and helpful business
information and trade secrets, gain a deeper understanding of possible future changes as well as of each other's strengths and weaknesses, improve relational attitudes and strengthen the relationship; thus communication can play a regulating role in the choice of channel governance mechanisms.
H5: the level of communication between two channel partners will enhance the degree to which the supplier uses relation norms
H6: the level of communication between two channel partners will weaken the degree to which the supplier uses contract
By monitoring output performance or channel members' behavior, monitoring can reduce the degree of information asymmetry (Balakrishnan and Koza 1993; Celly and Frazier 1996). Although relation norms reflect the needs and acceptable behavior of both sides, and may not be objected to by either side, expectations about channel activities may still remain fuzzy; by strengthening monitoring, the two sides continually adjust their awareness of their roles, obtain better performance from the relation norms, and build a harmonious relationship (Sungmin et al. 2007). Since it is unrealistic to make plans that foresee all future changes (Macneil 1978, 1980), bringing continuous monitoring into the execution of a formal contract allows responses to environmental change and thus achieves transactional flexibility.
H9: the degree of monitoring will enhance the degree to which relation norms reduce opportunism behavior
H10: the degree of monitoring will enhance the degree to which contract reduces opportunism behavior
The theoretical framework of the study is shown in Figure 1.
Fig. 1. Structure of the theoretical framework: Power → Relation norms (H1) and Formal contract (H2); Relation norms → Opportunism (H3) and Formal contract → Opportunism (H4); Communication moderates the power-governance links (H5, H6) and Monitor moderates the governance-opportunism links (H7, H8).
3 Data and Analysis The scale of channel power are designed based on the research of Gaski and Nevin(1985).The scale of governance Mechanisms are designed based on the research of Jap and Ganesan(2000). Relation norms as a second scale includes three dimensions which are information exchange, solidarity and participation, then measure each dimension by scale. The scale of formal contract is designed based on the research of Cannon et al.(2000). The scale of opportunism is designed based on the research of Gundlach, et.al (1995).The scale of communication used is designed based on the research of Sheng, et.al (2006). The scale of monitoring is designed based on the research of Heide, et al. (2007).
The survey respondents came from two sources: questionnaires e-mailed to suitable MBA graduates of the School of Business, Sun Yat-sen University, and MBA students currently studying at the school. In total 168 questionnaires were collected, of which 137 were valid, a response rate of 81.6 percent. SPSS and LISREL were used to analyze the reliability and validity of the model. All factors have coefficient α greater than 0.70, and the results of the confirmatory factor analysis (NFI, NNFI and CFI above 0.94, RMSEA of 0.048) show that the measurement model fits the data well. LISREL was then used to assess model fit without considering the moderating variables. The results in Table 1 show that H1, H2, H3 and H4 are all supported.

Table 1. Standard estimate values

Hypothesis | Relation | Standard estimate | T value | Verdict
H1 | P → N | 0.26 | 2.62 | Supported
H2 | P → C | 0.35 | 3.34 | Supported
H3 | N → O | -0.34 | -2.87 | Supported
H4 | C → O | -0.22 | -2.55 | Supported
(Model fit: χ²/df = 1.86, NFI = 0.93, NNFI = 0.95, CFI = 0.96, RMSEA = 0.080)

An interaction term was used in the regression analysis to test the influence of the moderating variables; Table 2 reports the results obtained with SPSS 12.0 (a sketch of this kind of interaction test follows the table).

Table 2. Results of the regression analysis (dependent variable: relation norms)

Independent variable | Beta (standardized) | ΔF | ΔR² | Sig. F change
Power (P) | 0.428 | 10.252 | 0.071 | 0.002
Communication (COM) | 0.385 | 1.127 | 0.074 | 0.001
P × COM | -0.178 | 3.812 | 0.001 | 0.758
Model R² | 0.146 | - | - | -
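The following sketch illustrates the kind of interaction-term test reported in Table 2, using Python and statsmodels rather than SPSS; the data frame, its column names and the simulated coefficients are purely illustrative and are not the authors' data.

```python
# Sketch (illustrative, not the authors' SPSS analysis): a moderating effect
# tested with a mean-centred interaction term, as in Table 2.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 137                                    # same order as the valid sample size
power = rng.normal(size=n)
comm = rng.normal(size=n)
norms = 0.4 * power + 0.3 * comm - 0.2 * power * comm + rng.normal(size=n)
df = pd.DataFrame({'power': power, 'communication': comm, 'relation_norms': norms})

# Mean-centre the predictors before forming the product term.
df['power_c'] = df['power'] - df['power'].mean()
df['comm_c'] = df['communication'] - df['communication'].mean()

model = smf.ols('relation_norms ~ power_c * comm_c', data=df).fit()
print(model.summary())                     # the power_c:comm_c row is the moderation test
```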
H5’s rejection and H6’s acceptance indicate while communication cannot increase suppliers’ use of relation norms, it will decrease suppliers’ use of contract. Contract is expensive to create and use comparing to relation norms, since it is costly to draft a complicated contract (Laura Poppo and Todd Zenger 2002). Therefore, it would help reducing transaction costs for both parties to have common understanding of products, contract details, etc. by sufficient communication. The rejection of H7 and H8 proves that monitor has no significant impact between governance mechanism and opportunism behavior. More than 70 percent enterprises
reviewed in this article have fixed property investment more than 50 million yuan, while 60 percent of all enterprises have been in industries for more than 8 years. In the process of development of enterprises, management system will be regulated and institutionalized, as for their cooperation with retailers. Monitoring is an essential and common practice in transactions, which both parties understand and will not antagonize it. As a result, monitoring has relatively minor influence in an institutionalized enterprise, and wouldn’t significantly change the other party’s opportunism behaviors which directly jeopardize relationship between both.
4 Conclusion Our conclusion includes: a) retailers’ power significantly affects suppliers’ choice of governance mechanism; b) when retailers have greater power in transactions, suppliers tend to use contract and relation norms in channel relationship; c) such governance mechanism will significantly reduce opportunism behavior of retailers, regardless adopting contract or relation norms, d) communication reduces suppliers’ usage of contracts considerably, but not that much of relation norms, while monitoring fail to have obvious affect on opportunism behavior. Well-build communication between parties reduces usage of expensive contracts. Among the marketing channels in China, communications definitely affect the enterprises’ choice of opportunism governance mechanism. cooperating enterprises should facilitate communication between them, develop mutual solution of problems and accidents, and enhance communications with feedbacks. Although we haven’t found monitor has significant influence on governance mechanism and opportunism behavior, We cannot infer that monitor is not needed in transactions between organizations.
References
1. Bello, D.C., Gilliland, D.I.: The Effect of Output Controls, Process Controls, and Flexibility on Export Channel Performance. Journal of Marketing 61(1), 22–38 (1997)
2. Cannon, J.P., Achrol, R.S., Gundlach, G.T.: Contracts, Norms, and Plural Form Governance. Journal of the Academy of Marketing Science 28(2), 180–194 (2000)
3. Gaski, J.F., Nevin, J.R.: The Differential Effects of Exercised and Unexercised Power Sources in a Marketing Channel. Journal of Marketing Research 22(2), 130–142 (1985)
4. Heide, J.B., Wathne, K.H., Rokkan, A.I.: Interfirm Monitoring, Social Contracts, and Relationship Outcomes. Journal of Marketing Research 44(3), 425–433 (2007)
5. Sheng, S., Brown, J.R., Nicholson, C.Y., Poppo, L.: Do Exchange Hazards Always Foster Relational Governance? An Empirical Test of the Role of Communication. International Journal of Research in Marketing 23(1), 63–77 (2006)
6. Poppo, L., Zenger, T.: Do Formal Contracts and Relational Governance Function as Substitutes or Complements? Strategic Management Journal 23(8), 707–725 (2002)
Applications in Oil-Spill Risk in Harbors and Coastal Areas Using Fuzzy Integrated Evaluation Model
Chaofeng Shao, Yufen Zhang, Meiting Ju, and Shengguang Zhang
College of Environmental Science and Engineering, Nankai University, 300071 Tianjin, China
[email protected], [email protected], [email protected], [email protected]
Abstract. Based on a statistical analysis of the causes and impacts of oil-spill accidents in ports and offshore areas of China, an index system for the possibility of oil-spill risk and an index system for impact assessment were set up. Taking Tianjin Port as an example, the weight of each evaluation factor was determined by the analytic hierarchy process (AHP), and the membership functions of the evaluation factors were established using a lower semi-trapezoidal distribution. The possibility and the impacts of oil spills were then evaluated with a multi-level fuzzy integrated evaluation model, and the overall oil-spill risk of Tianjin Port was determined with a two-dimensional risk matrix.
Keywords: Fuzzy comprehensive evaluation model, Oil spilling accident, Risk assessment, Harbor, Index system.
1 Introduction Oil spilling is a leading type of sudden environmental pollution accident in harbors and offshore areas. According to the statistics from Ministry of Transport of People’s Republic of China, the number of oil spilling accidents is 2635 in China from 1973 to 2006, including 69 big ones. Total amount of oil spilling accumulate up to 37 000 tons, and direct economic losses exceeds billions of dollars. To control oil spilling accidents effectively, scholars at home and abroad have simulated and forecasted probability and impacts of oil spilling, by Random theory method, Fuzzy math method, and so on[1,2], with the tendency to digitization and simulation[3]. In the paper, multi-level fuzzy integrated evaluation model was used to assess the possibility and the impact of oil spilling in harbors and offshore areas, and two-dimensional risk matrix was used to determine the overall effect of oil spilling risk.
2 To Establish the Fuzzy Integrated Evaluation Model
The weight values of the evaluation factors are complicated to determine. Two criteria matter: the contribution of each individual factor and the interaction between factors. In this paper the overall factor set was divided into several relevant subsets according to the actual situation and the purpose of the evaluation, and the weight value of each factor was then analyzed and taken into account in the integrated evaluation. The steps are as follows.
2.1 To Establish the Indexes of Evaluation
The index system is a fuzzy subset composed of the actual values of the n evaluation factors:

$$U = \{u_1, u_2, u_3, \ldots, u_n\} \qquad (1)$$

2.2 To Establish the Evaluation Criterions
The evaluation set includes all the appropriate levels of the evaluation criteria:

$$V = \{V_1, V_2, V_3, \ldots, V_m\} \qquad (2)$$

where m is the number of grades; m = 5 in this paper.
2.3 To Determine the Weight
In fuzzy integrated evaluation, the weight reflects the status or role of each factor in decision making [4]. The weights of the indexes are determined by their contribution to the evaluation level. In this paper, the AHP method and the Delphi method were used to determine the weights. We obtain the judgment matrix S = (u_{ij})_{m×m} as follows:

$$S = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1m} \\ u_{21} & u_{22} & \cdots & u_{2m} \\ \vdots & \vdots & & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mm} \end{bmatrix} \qquad (3)$$

where u_{ij} is the importance of factor i relative to factor j. Based on the judgment matrix, the eigenvector associated with the maximum eigenvalue is calculated and used as the weight distribution A = (a_1, a_2, ..., a_n).
2.4 To Establish the Membership Function
Many methods are available in practice, such as the subjective appraisal method, the fuzzy statistical method, the definition analysis method, variable models, the relative selection method, the filter function method and the dualistic contrast compositor method [5, 6]. The grading lines of the evaluation factors are usually described by membership degrees; here the lower semi-trapezoidal distribution is adopted to determine them. The evaluation results for factor u_i form the single-factor fuzzy evaluation subset R_i = (r_{i1}, r_{i2}, ..., r_{im}). As a result, the fuzzy membership matrix R = (r_{ij})_{n×m} is built up as follows:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix} \qquad (4)$$

where r_{ij} is the membership degree of factor u_i to grade j.
2.5 Integrated Evaluation
The fuzzy integrated evaluation should take all factors into account. Combining the fuzzy weight vector A with the single-factor fuzzy evaluation matrix R yields the fuzzy comprehensive evaluation vector B. After the weight subset w of each level of index factors is determined, the multi-factor fuzzy comprehensive evaluation of the indicators is carried out bottom-up on the basis of the single-factor evaluation.
(1) Initial evaluation. For the factors U_{ij} subordinate to U_i, suppose the fuzzy weight subset of U_i is A_i and the evaluation matrix of its factors is R_i; the assessment subset B_i is then

$$B_i = A_i \cdot R_i = (w_1, w_2, \ldots, w_m) \cdot \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix}, \quad i = 1, 2, \ldots, n \qquad (5)$$

where A_i is the weight obtained by AHP and R_i is the corresponding membership matrix of the evaluation factors.
(2) Second-level evaluation. Suppose the partitioned factor set U = {U_1, U_2, ..., U_N} has weight subset A and membership matrix R; the total assessment vector is B = A · R with

$$R = \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_N \end{bmatrix} = \begin{bmatrix} A_1 \cdot R_1 \\ A_2 \cdot R_2 \\ \vdots \\ A_N \cdot R_N \end{bmatrix}, \qquad v = \frac{\sum_{j} b_j v_j}{\sum_{j} b_j} \qquad (6)$$

and the final result v is obtained by the weighted-average method based on the fuzzy comprehensive evaluation index b_j, where b_j is the fuzzy comprehensive evaluation value and v_j the corresponding evaluation criterion value.
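A minimal sketch of Eqs. (3)-(6), reading the composition operator "·" as ordinary weighted-sum matrix multiplication: AHP weights are taken from the principal eigenvector of a pairwise judgment matrix, composed with a membership matrix, and defuzzified by the weighted average. The judgment matrix and membership matrix below are hypothetical; only the grade values (0.5, 2.0, 3.5, 4.5, 5.0) are taken from Section 4.2.

```python
# Sketch of the model in Eqs. (3)-(6) with illustrative numbers (not the paper's data).
import numpy as np

def ahp_weights(S):
    """Principal-eigenvector weights of a pairwise judgment matrix S (Eq. 3)."""
    vals, vecs = np.linalg.eig(np.asarray(S, dtype=float))
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / w.sum()

def fuzzy_evaluate(A, R, grades):
    """B = A . R (Eqs. 5-6) and weighted-average value v = sum(b*v)/sum(b)."""
    B = np.asarray(A) @ np.asarray(R)
    return B, float(B @ np.asarray(grades) / B.sum())

# Hypothetical 3-factor judgment matrix (reciprocal, 1-9 scale).
S = [[1, 3, 5],
     [1/3, 1, 2],
     [1/5, 1/2, 1]]
A = ahp_weights(S)                      # roughly (0.65, 0.23, 0.12)

# Hypothetical single-factor membership matrix R (3 factors x 5 grades).
R = [[0.4, 0.3, 0.2, 0.1, 0.0],
     [0.2, 0.3, 0.3, 0.1, 0.1],
     [0.1, 0.2, 0.4, 0.2, 0.1]]
grades = [0.5, 2.0, 3.5, 4.5, 5.0]      # grade values used in Section 4.2
B, v = fuzzy_evaluate(A, R, grades)
print(A, B, v)
```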
3 Environmental Risk Assessment Index System of Oil Spilling
3.1 The Possibility Index System of Oil-Spill Risk
Risk assessment of oil spills in harbors and coastal areas should take into account not only the basic condition of the oil tankers, but also the weather and water environment of the harbor and the nature of the oil. The risk-assessment index system for oil spills is therefore a multi-factor, multi-level complex system with many important and closely correlated components. When analyzing the risk of an oil spill at sea, six aspects should be considered: the status of the harbor and offshore oil tankers, the nature of the oil carried, the harbor and offshore environment, statistical analysis of past port and offshore tanker accidents, the items checked by the authorities, and suggestions from experts.

Table 1. The evaluation factors for the possibility and the effect of oil spills
Level 1: Possibility of the oil-spill risk (U)
- Status of the oil tankers (U1): Age of the oil tanker (U11); Status of daily maintenance (U12); Type of the oil tanker (U13); Tonnage of the tanker (U14); Certificate (U15); Technical status (U16)
- Sea environment (U2): Traffic density (U21); Conditions of the channels and navigation (U22); Wind conditions (U23); Visibility (U24); Waves (U25)
- Human factors (U3): Psychological factors (U31); Physiological factors (U32); Operation skills (U33); Educational level (U34)
- Factors of organization and management (U4): Responsibility of production safety and security checks (U41); Rules, regulations and education of production safety (U42); Staffing and training (U43); Construction and operation of the safety management system (U44)
Level 1: Results and impacts of oil spilling (M)
- Hazards of the oil (M1): Toxicity of the oil (M11); Persistence of the oil (M12); Flammability of the oil (M13)
- Amount of oil spilled (M2): Over 50 t (M21); 5-50 t (M22); Below 5 t (M23)
- Environmental sensitivity (M3): Location of environmentally sensitive objects (M31); Population distribution (M32); Distribution of offshore aquaculture (M33)
According to above factors and relevant evaluation standards, index system for the possibility of oil spill risk is set up, as Tab.1. 3.2 The Index System for Assessing the Effect of Oil Spills in Harbors and Coastal Areas As most oil spill accidents are abrupt, it does not mean that the oil tankers are safe to the area even if we determine the possibility of oil spill is small. The reason is that an accident of oil spill has great impact to surrounding environment. There are two methods to assess the degree of threat after oil spilling: one is application of the monitoring system of oil spill, the other is on-site assessment according to the water pollution emergency plan in harbors. Oil spill monitoring and simulation system on the sea is based on the data from oil-spill locale, such as the amount, property and location of oil spill. On-site assessment would take the following factors into account, the types and amount of oil, property and location of the accident, weather and sea conditions, the impacts of pollutants to the environment, and so on. Integrating the evaluation factors referred in above two methods, the index system for assessing the degree of threat after oil spilling is established, see Tab.1.
4 Environmental Risk Assessment of Oil Spills in Tianjin Port
4.1 Introduction of Tianjin Port
Tianjin Port, the largest artificial harbor in China, lies on the west coast of the Bohai Sea; it is a major foreign-trade port of North China and a link connecting West Asia and North-East Asia. At present Tianjin Port trades with 180 countries and 400 harbors all over the world, with more than 400 sailings per month. Its economic hinterland exceeds 5,000,000 square kilometers, about 52% of China. In 2007 the cargo throughput of Tianjin Port reached 300 million tons and its container throughput reached 7.1 million TEU.
4.2 Risk Assessment of Oil Spills in Tianjin Port
According to the theory and methods above, the relevant parameters and evaluation criteria of the factors were determined on the basis of the Statistical Yearbook of Tianjin Port, the judgments of 20 experts, and a field survey.
(1) Weights of the evaluation factors. Using the AHP method described above, the weights of the evaluation factors were determined as follows:
A_U = (0.40, 0.24, 0.13, 0.13); A_U1 = (0.40, 0.24, 0.13, 0.13)
A_U2 = (0.21, 0.08, 0.32, 0.07, 0.12, 0.21); A_U3 = (0.30, 0.25, 0.16, 0.11, 0.18)
A_U4 = (0.22, 0.35, 0.25, 0.18); A_M = (0.15, 0.60, 0.25)
A_M1 = (0.52, 0.10, 0.38); A_M2 = (0.05, 0.10, 0.85); A_M3 = (0.62, 0.10, 0.28)
(2) Membership degree matrices of the evaluation factors. Because the evaluation factors are of very different types, grading standards for most of them are not formalized; in this paper they were determined on the basis of the statistics of Tianjin Port in recent years. The membership matrices of the third-layer evaluation factors were set up as follows:

$$R_{U1} = \begin{bmatrix} 0.17 & 0.35 & 0.29 & 0.13 & 0.06 \\ 0.48 & 0.18 & 0.18 & 0.08 & 0.08 \\ 0.60 & 0.20 & 0.15 & 0.05 & 0 \\ 0.13 & 0.22 & 0.25 & 0.23 & 0.17 \\ 0.60 & 0.19 & 0.12 & 0.05 & 0.04 \\ 0.60 & 0.19 & 0.12 & 0.05 & 0.04 \end{bmatrix} \quad R_{U2} = \begin{bmatrix} 0.20 & 0.16 & 0.33 & 0.15 & 0.16 \\ 0.37 & 0.28 & 0.20 & 0.07 & 0.08 \\ 0.22 & 0.38 & 0.25 & 0.07 & 0.08 \\ 0.15 & 0.32 & 0.34 & 0.11 & 0.08 \\ 0.20 & 0.11 & 0.18 & 0.15 & 0.36 \end{bmatrix}$$

$$R_{U3} = \begin{bmatrix} 0.33 & 0.38 & 0.24 & 0.05 & 0 \\ 0.42 & 0.40 & 0.16 & 0.02 & 0 \\ 0.30 & 0.30 & 0.24 & 0.08 & 0.08 \\ 0.28 & 0.34 & 0.23 & 0.07 & 0.08 \end{bmatrix} \quad R_{U4} = \begin{bmatrix} 0.22 & 0.30 & 0.30 & 0.10 & 0.08 \\ 0.31 & 0.32 & 0.22 & 0.07 & 0.08 \\ 0.20 & 0.15 & 0.30 & 0.15 & 0.20 \\ 0.27 & 0.31 & 0.22 & 0.08 & 0.12 \end{bmatrix}$$

$$R_{M1} = \begin{bmatrix} 0.21 & 0.15 & 0.31 & 0.13 & 0.20 \\ 0.24 & 0.15 & 0.27 & 0.14 & 0.20 \\ 0.28 & 0.18 & 0.33 & 0.13 & 0.08 \end{bmatrix} \quad R_{M2} = \begin{bmatrix} 0 & 0 & 0 & 0.20 & 0.80 \\ 0 & 0.20 & 0.60 & 0.20 & 0 \\ 0.80 & 0.20 & 0 & 0 & 0 \end{bmatrix} \quad R_{M3} = \begin{bmatrix} 0.40 & 0.18 & 0.24 & 0.10 & 0.08 \\ 0.08 & 0.10 & 0.24 & 0.18 & 0.40 \\ 0.16 & 0.14 & 0.30 & 0.16 & 0.24 \end{bmatrix}$$

The membership matrices of the second-layer evaluation factors were obtained through the initial fuzzy comprehensive evaluation (see Sect. 2.5):

$$R_U = \begin{bmatrix} 0.47 & 0.23 & 0.18 & 0.08 & 0.04 \\ 0.24 & 0.23 & 0.26 & 0.11 & 0.15 \\ 0.35 & 0.36 & 0.21 & 0.05 & 0.03 \\ 0.25 & 0.29 & 0.26 & 0.09 & 0.11 \end{bmatrix} \quad R_M = \begin{bmatrix} 0.24 & 0.16 & 0.31 & 0.13 & 0.15 \\ 0.68 & 0.19 & 0.06 & 0.03 & 0.04 \\ 0.30 & 0.16 & 0.26 & 0.12 & 0.16 \end{bmatrix}$$

The possibility of an oil spill in Tianjin Port and its impact were then determined by the second-level fuzzy comprehensive evaluation (see Sect. 2.5):

$$B_U = (0.32, 0.23, 0.19, 0.08, 0.07), \qquad B_M = (0.55, 0.19, 0.17, 0.08, 0.10)$$

The values of the possibility and the impact were determined by the weighted-average method (see Sect. 2.5):

$$v_U = \frac{0.32 \times 0.5 + 0.23 \times 2.0 + 0.19 \times 3.5 + 0.08 \times 4.5 + 0.07 \times 5.0}{0.32 + 0.23 + 0.19 + 0.08 + 0.07} = 2.47$$

$$v_M = \frac{0.55 \times 0.5 + 0.19 \times 2.0 + 0.17 \times 3.5 + 0.08 \times 4.5 + 0.10 \times 5.0}{0.55 + 0.19 + 0.17 + 0.08 + 0.10} = 1.95$$
(3) Overall assessment of the oil-spill risk in Tianjin Port. The possibility of oil spills is divided into five ranks according to the evaluation criteria above (see Sect. 2.2):

U = {u_1, u_2, u_3, u_4, u_5} = {1, 2, 3, 4, 5}

Accordingly, the consequence of oil spills is also divided into five ranks:

M = {m_1, m_2, m_3, m_4, m_5} = {1, 2, 3, 4, 5}

According to the two-dimensional risk matrix "risk = possibility × consequence" (see Table 2), the comprehensive level of oil-spill risk is calculated.

Table 2. Risk matrix
Possibility \ Consequence | Negligible | Normal | Medium | Big | Serious
Low possibility | 1 | 2 | 3 | 4 | 5
Lower possibility | 2 | 4 | 6 | 8 | 10
Average possibility | 3 | 6 | 9 | 12 | 15
Higher possibility | 4 | 8 | 12 | 16 | 20
High possibility | 5 | 10 | 15 | 20 | 25
According to table.2, final risk level could be divided into 5 types: very low(1-5), low(6-10), acceptable(11-15), high(16-20), very high(21-25). Through the above assessing process, the comprehensive value of oil-spill risk in Tianjin Port was determined as 4.39, which was acceptable.
5 Conclusions Applications of fuzzy integrated evaluation model can provide valuable information to control risk factors. The key process was to ascertain the index system and evaluation criteria of these factors, and establish the membership function. Through the analysis and calculation, the effect of each risk-factor to the final result would be confirmed. According to the result, oil-spill risk in harbors and coastal areas would be controlled.
Acknowledgments These studies were funded by Science& Technology Department of Tianjin(No. 07ZCGYSF01900). And Tianjin Port (Group) Corporation Limited provided large numbers of valuable information and data.
References
1. Cummins, E.J.: The Role of Quantitative Risk Assessment in the Management of Foodborne Biological Hazards. International Journal of Risk Assessment and Management 3, 318–330 (2008)
2. Suter II, G.W., Barnthouse, L.W., O'Neill, R.V.: Treatment of Risk in Environmental Impact Assessment. Environmental Management 3, 295–303 (1987)
3. IPIECA Report Series: A Guide to Contingency Planning for Oil Spills on Water (1993)
4. Feibiao, H., Manyin, Z.: Application of Fuzzy Mathematics to Comprehensive Evaluation for Dynamic Change of Water Quality. Technology of Water Treatment 1, 76–79 (2008)
5. Zhihong, Z., Yi, Y., Jingnan, S.: Entropy Method for Determination of Weight of Evaluating Indicators in Fuzzy Synthetic Evaluation for Water Quality Assessment. Journal of Environmental Sciences 5, 1020–1023 (2006)
6. Mingjie, X.: Application of Modified Fuzzy Comprehensive Evaluation in Water Quality Assessment. Water Sciences and Engineering Technology 4, 6–9 (2007)
Coexistence Possibility of Biomass Industries Sun Jingchun and Hou Junhu School of Management, Xi’an Jiaotong University, Xi’an 710049, China
[email protected] Abstract. This research aims to shed light on the mechanism of agricultural biomass material competition between the power generation and straw pulp industries and the impact on their coexistence. A two-stage game model is established to analyze including factors such as unit transportation cost, and profit spaces for the firms. The participants in the competition are a biomass supplier, a power plant and a straw pulp plant. From the industrial economics perspective, our analysis shows that raw material competition will bring about low coexistence possibility of the two industries based on agricultural residues in a circular collection area.
1 Introduction The straw pulp and the power generation from agricultural residues in China will develop rapidly due to the increasing demand for power and paper products. Agricultural residues is common material of these industries, and it is feared that the raw material competition between the paper mills and biomass power plant will result in a rapid increase in biomass materials prices, which will pose a potential threat to both industries. Thus the competition for raw materials between these two industries is a valuable topic that relates to the sustainability of the biomass supply, and the further development of the two industries. The present research aims to shed light on the nature and mechanisms of agricultural biomass material competition between power generation industry and straw pulp industry by analyzing agricultural biomass material market, and identify a coexistence possibility for both pulp-paper industry and biomass power generation industry in a constantly changing market.
2 Model Constructions
For analytic simplicity, it is assumed that large-scale logistics firms collect, transport and deposit the biomass within a circular area around a central storage facility; more complicated and costly collection patterns are not considered for the moment. Let a be the base price increment for selling biomass, p_s the procurement price of biomass from farmers, c_0 the unit operation cost of biomass, c_t the unit transportation cost of biomass per kilogram per metre, q_s the agricultural biomass output per unit area, k the ratio of utilized biomass quantity to biomass output, and p the inverse demand function; then

$$p = a + p_s + b\sqrt{Q} + c_0 \qquad (1)$$

where b is defined as

$$b = \frac{2 c_t}{3 \sqrt{\pi k q_s}} \qquad (2)$$
The timing of the game is as follows. Stage 1: for any given price increment, the downstream buyers choose their individual Nash-equilibrium biomass quantities in a Cournot game so as to maximize profits. Stage 2: the upstream supplier chooses the Nash-equilibrium price increment given the equilibrium of Stage 1. If Q_j(a), j = A, B, denotes the optimal quantity of buyer j (A denotes the power plant and B the paper mill) at base price increment a, we have:
Proposition 1. The response functions of the two biomass buyers at any a are

$$Q_A(a) = \frac{4 (p_e + p_p - 2a)(3 p_e - 2 p_p - a)}{25 b^{2}} \qquad (3)$$

$$Q_B(a) = \frac{4 (p_e + p_p - 2a)(3 p_p - 2 p_e - a)}{25 b^{2}} \qquad (4)$$
3 Coexistence Possibilities of Biomass Industries
Both the biomass-fuel power plant and the paper-pulp plant estimate that they will need no more than half of the maximum biomass supply Q_max of the circular area. Let L denote the sum of their estimated maximum profits. Then

$$p_e (Q_{\max}/2) + p_p (Q_{\max}/2) = L \qquad (5)$$

which can be simplified as

$$p_e + p_p = 2L / Q_{\max} = M \qquad (6)$$

where M is defined as M = 2L/Q_max.
Numerical examples of material supply for both firms: from formulas (3) and (4) the figures below can easily be derived.
Definition 1. The possibility of material supply for both firms at price-increment level a is defined as

$$P_{coexist}(a) = \operatorname{Prob}\{Q_A(a) \ge 0 \text{ and } Q_B(a) \ge 0\} \qquad (7)$$
Proposition 2. If p_e + p_p = M holds, M_1, M_2, M_3, M_4 are the points of interception shown in Fig. 1, and |X, Y| denotes the distance between two points X and Y in the plane, then the possibility of material supply satisfies

$$P_{coexist}(a) = \frac{|M_2, M_3|}{|M_1, M_4|} = 0.2 - \frac{2a}{5M} \qquad (8)$$
Fig. 1. Fixed profit spaces for the buyers
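A small numerical sketch of the response functions (3)-(4) and the coexistence possibility (8); all parameter values are hypothetical, and the truncation of (8) to the interval [0, 1] is an added assumption, not part of the paper.

```python
# Sketch (illustrative parameter values, not the paper's data): evaluating the
# response functions (3)-(4) and the coexistence possibility (8).
from math import pi, sqrt

def b_coefficient(c_t, k, q_s):
    """Eq. (2): b = 2*c_t / (3*sqrt(pi*k*q_s))."""
    return 2 * c_t / (3 * sqrt(pi * k * q_s))

def responses(a, p_e, p_p, b):
    """Eqs. (3)-(4): equilibrium purchases of the power plant and the pulp mill."""
    q_a = 4 * (p_e + p_p - 2 * a) * (3 * p_e - 2 * p_p - a) / (25 * b ** 2)
    q_b = 4 * (p_e + p_p - 2 * a) * (3 * p_p - 2 * p_e - a) / (25 * b ** 2)
    return q_a, q_b

def coexistence_possibility(a, M):
    """Eq. (8): P_coexist(a) = 0.2 - 2a/(5M), truncated here to [0, 1]."""
    return max(0.0, min(1.0, 0.2 - 2 * a / (5 * M)))

# Hypothetical numbers: transport cost, utilisation ratio, biomass density, prices.
b = b_coefficient(c_t=0.0004, k=0.5, q_s=0.3)
for a in (0.0, 5.0, 10.0):
    print(a, responses(a, p_e=260.0, p_p=300.0, b=b),
          coexistence_possibility(a, M=560.0))
```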
4 Conclusions Generally speaking, the agricultural biomass is very cheap when it is not utilized in large-scale. The local government expects that the income of farmers will increase from agricultural residues and will encourage the investors to invest in the projects based on biomass. Practically the investors are optimistic about the supply of raw materials in the beginning, but when the projects are built and operated, they find that the price gradually rises and material competition appears. The research shows that only one project is proper for the operation in a circular collection area, and the possibility to run two firms is very small.
How Power Mechanism Influence Channel Bilateral Opportunism* Yu Tian and Shaodan Chen School of Business, Zhongshan University, Guangzhou 510275
[email protected] Abstract. In the background of marketing channel power asymmetry structure, this article discuss the relation between power dominant member’s use of power mechanism and the opportunism behavior of both Power disadvantage member and the power dominant member itself, and test whether distributive fairness perception and procedural fairness perception have moderate effects on this relation. The result shows that, the power dominant member’s use of coercive power will increase the opportunistic tendency of both sides; in contrast, the power dominant member’s use of noncorecive power will inhibit such tendency. Distributive fairness perception and procedural fairness perception negatively moderate the relation between power dominant member’s use of noncorecive power and power disadvantage member’s opportunism. Procedural fairness perception also negatively moderates the relation between power dominant member’s use of coercive power and the other side’s opportunism. Keywords: power asymmetry; bilateral opportunism; power mechanism.
1 Introduction The opportunism in marketing channel means by not fully revealing the relevant information or distorted information, one channel partner benefits own at the cost of sacrificing other partners’ interests (Wathne and Heide 2000). It sets channel partners’ goals against behavior, results in the reduction of economic rewards, the willing to cooperate and social satisfaction fall off (Wathne and Geyskens 2005). Channel member control each other by power and the use of power (Heide 1994). Coercive power results in frequent channel conflict (Frazier and Rody 1991); the use of noncoercive power can form a good atmosphere and enhance mutual understanding so as to promote better cooperation (Moore et al. 2004). The use of coercive and noncorecive power both have impact on channel member’s opportunism (Provan and Skinner 1989). Principal-agent theory contends that information incompleteness and information asymmetry not only exist in the client side, agents may also be in a position of information disadvantage, and client also has opportunism impulse and cause bilateral opportunism (Kim and Wang 1998). Whether power use has an impact on both sides of channel member’s opportunism? *
This research is supported by the National Natural Science Foundation of China (# 70872117).
This article plan to: First, propose a conceptual model about the impact of power dominant member’s use of power mechanism on bilateral opportunism, and the moderate effect of fairness perception will be involved; Second, propose research hypothesis; Third, design scales, analyze the data from Chinese manufacturing, and describe the results; At last this article will summarize the findings.
2 Theory and Hypothesis Total power is the summation of the partners’ dependence on each other in marketing channel; Power asymmetry is the difference of the partners’ dependence on each other (Jap and Ganesan 2000). Even if in a power dominant position, a channel member does not tend to use coercive power. As both sides have great motivation to prevent loss, the higher the total power is, the less the penalty behavior will be taken (Kumar et al. 1998).High total power leads to more use of noncorecive power(Lusch and Brown 1996). In the following hypothesizes, we use A presents the power dominant member, B presents the power disadvantage member. H1a: the higher the total power is, A tends to use less coercive power H1b: the higher the total power is, A tends to use more noncoercive power According to conflict spiral theory, since having clear understanding of the power dominant member’s control advantage, the power disadvantage member behaviors as required without being imposed coercive power (Kumar 1998). The level of channel partners’ power is positively related to the use of noncoercive strategy, negatively related to the use of coercive strategy (Frazier and Rody 1991). H2a: the higher the power asymmetry is, A tends to use less coercive power H2b: the higher the power asymmetry is, A tends to use more noncoercive power When use coercive power, considering own benefit without the other side’s feeling, the power dominant member may behavior not in accordance with contract provisions, or force the other side to comply with the requirement in excess of contract. When get familiar with the power dominant member’s policies, the power disadvantage member will try every means to circumvent the policies without being found. When use noncorecive power, the power dominant member shows his respect. It is helpful to foster common business rules and values, enhance the realization of the objectives of consistency, effectively to avoid both sides’ opportunistic behavior. H3a: A’s use of coercive power is positively related with B’s opportunism. H3b: A’s use of noncoercive power is negatively related with B’s opportunism. H4a: A’s use of coercive power is positively related with own opportunism. H4b: A’s use of noncoercive power is negatively related with own opportunism. Distributive fairness is a firm’s comparison of its actual income to those the firm deems it deserves because of its pay. Procedural fairness is the distributor’s perception of the fairness of process and the procedures when doing transactions with suppliers, emphasing on supplier’s behavior, the elements that impact on procedural fairness are mainly under supplier control (Kumar 1995).
In an asymmetric relationship, the power dominant member can get the other side’s trusting and commitments when the other side percepts distributive fairness and procedural fairness (Kumar 2005). Distributive fairness and procedural fairness are impacted by the element under power dominant member’s control. When develop a contract, they can establish policies, procedure and methods for their own interests. To them, fairness is relatively unimportant. H5a: Distributive fairness perception negatively moderates the impacts of A uses: (1) coercive power (2) noncoercive power on B’s opportunism H5b: Procedural fairness perception negatively moderate s the impacts of A uses: (1) coercive power (2) noncoercive power on B’s opportunism H6a: Distributive fairness perception has no moderate effect on the impacts of A uses: (1) coercive power (2) noncoercive power on own opportunism H6b: Procedural fairness perception has no moderate effect on the impacts of A uses: (1) coercive power (2) noncoercive power on own opportunism. According to the analysis above, we establish a conceptual model.
[Figure: total power (H1a, H1b) and power asymmetry (H2a, H2b) drive the power-dominant member's power mechanism (coercive power, noncoercive power), which in turn affects the power-disadvantaged member's opportunism and the power-dominant member's opportunism (H3a, H3b, H4a, H4b); distributive fairness and procedural fairness moderate these effects (H5, H6).]
Fig. 1. Conceptual model
3 Data and Analysis

We measure total power and power asymmetry through total dependence and dependence asymmetry (Kumar 1995). The dependence scales are based on Kumar (1995); the scales for the use of coercive and noncoercive power are based on Skinner et al. (1992); the opportunism scales are based mainly on Wuyts and Geyskens (2005); and the scales for distributive fairness perception and procedural fairness perception are based mainly on Kumar (1995). The survey ran from June to August 2008; a total of 500 questionnaires were distributed by e-mail and post, and 265 valid questionnaires were returned. Because the research setting is an asymmetric channel power structure, we selected the 148 questionnaires that represent power-dominant members. We used SPSS 16.0 to test reliability; all constructs' coefficients α exceed 0.70, so the scales of this study have good reliability. We used AMOS 7.0 for confirmatory
factor analysis in three separate models. After excluding three items whose standardized factor loadings were too small, all remaining factor loadings are statistically significant and greater than 0.5, indicating good convergent validity. The discriminant validity analysis also shows that each construct has good discriminant validity. We then used structural equation path analysis in AMOS 7.0 to test the model without the moderating variables. With CMIN/DF = 1.422, CFI = 0.919, NNFI = 0.909 and RMSEA = 0.054, the model fits well. Overall, H1b, H2a, H2b, H3a, H3b, H4a and H4b are supported, while H1a is not (Fig. 2).
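The reliability analysis above was run in SPSS 16.0; purely as an illustration of the underlying computation (not the authors' procedure), the following minimal Python sketch computes a scale's coefficient α from simulated, hypothetical item responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) array of scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical example: 148 respondents answering a 4-item scale with correlated items.
rng = np.random.default_rng(0)
base = rng.normal(size=(148, 1))
items = base + 0.5 * rng.normal(size=(148, 4))
print(f"alpha = {cronbach_alpha(items):.2f}")  # well above the 0.70 threshold for these simulated items
```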
[Figure: path diagram from TP and PA to CP and NP, and from CP and NP to DO and AO, annotated with standardized path coefficients and T values in parentheses.]
Fig. 2. Path coefficients chart without moderating variables¹

Table 1. Analysis of the moderating effects²

Model      Path coefficient of interaction   T value   Model fit index                                      Support hypothesis

Moderating effect of distributive fairness perception
CP → DO    -0.131                            -1.529    CMIN/DF=1.597, CFI=0.955, NNFI=0.945, RMSEA=0.064    no
NP → DO    -0.231***                         -2.644    CMIN/DF=1.859, CFI=0.937, NNFI=0.922, RMSEA=0.076    yes
CP → AO    -0.077                            -0.922    CMIN/DF=1.675, CFI=0.934, NNFI=0.923, RMSEA=0.068    yes
NP → AO    -0.080                            -0.935    CMIN/DF=1.877, CFI=0.914, NNFI=0.901, RMSEA=0.077    yes

Moderating effect of procedural fairness perception
CP → DO    -0.155*                           -1.816    CMIN/DF=1.405, CFI=0.955, NNFI=0.946, RMSEA=0.052    yes
NP → DO    -0.235***                         -2.716    CMIN/DF=1.746, CFI=0.927, NNFI=0.906, RMSEA=0.071    yes
CP → AO    -0.031                            -0.371    CMIN/DF=1.565, CFI=0.925, NNFI=0.914, RMSEA=0.066    yes
NP → AO    -0.054                            -0.632    CMIN/DF=1.754, CFI=0.911, NNFI=0.890, RMSEA=0.072    yes
¹ TP: total power; PA: power asymmetry; CP: power-dominant member's use of coercive power; NP: power-dominant member's use of noncoercive power; DO: power-disadvantaged member's opportunism; AO: power-dominant member's opportunism. The data in Fig. 2 are standardized path coefficients, and the data in brackets are T values.
² In Table 1, * represents P…

(2) Buy-in constraints are x_j ≥ d_j if x_j > 0, which restrict the minimum amount that is to be purchased, j = 1, 2, …, n. (3) Round-lot constraints are x_j = y_j e_j, y_j ∈ N, which restrict the smallest volumes e_j that can be purchased for each security, j = 1, 2, …, n.
The constrained multiobjective portfolio selection model is formulated as

\max \Big\{ E\big(\sum_{j=1}^{n} r_j x_j\big),\ -\mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big),\ \sum_{j=1}^{n} P_M(\tilde{L}_j) x_j \Big\}

\text{s.t.}\quad \sum_{j=1}^{n} \mathrm{Sign}(x_j) \le K,\quad x_j \ge d_j \ \text{if } x_j > 0,\quad x_j = y_j \cdot e_j,\ y_j \in N,\quad \sum_{j=1}^{n} x_j = 1,\ x_j \ge 0,\ j = 1, 2, \ldots, n. \qquad (1)
For simplicity, the feasible region of (1) is denoted by X. Because of these complex real-world constraints, it is not easy to solve the multiobjective mixed-integer programming problem (1) with traditional algorithms.
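To make the formulation concrete, the following minimal NumPy sketch evaluates the three objectives of (1) and checks its constraints for a candidate portfolio. The inputs mu (expected returns), cov (covariance matrix), pm (precomputed liquidity scores P_M(L̃_j)), d, e and the tolerance handling are illustrative assumptions, not part of the original model:

```python
import numpy as np

def objectives(x, mu, cov, pm):
    """Objective values of model (1): expected return, variance, and liquidity."""
    ret = float(mu @ x)        # E(sum_j r_j x_j)
    var = float(x @ cov @ x)   # Var(sum_j r_j x_j)
    liq = float(pm @ x)        # sum_j P_M(L~_j) x_j
    return ret, var, liq

def is_feasible(x, K, d, e, tol=1e-9):
    """Check the cardinality, buy-in, round-lot and budget constraints of (1)."""
    x = np.asarray(x, dtype=float)
    if np.any(x < -tol):                               # x_j >= 0
        return False
    held = x > tol
    if held.sum() > K:                                 # sum_j Sign(x_j) <= K
        return False
    if np.any(x[held] < d[held] - tol):                # x_j >= d_j whenever x_j > 0
        return False
    lots = x / e
    if np.any(np.abs(lots - np.round(lots)) > tol):    # x_j = y_j * e_j with y_j integer
        return False
    return abs(x.sum() - 1.0) <= tol                   # sum_j x_j = 1
```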
3 Compromise Approach-Based Genetic Algorithm

To overcome the difficulty of evaluating a large set of efficient solutions and selecting the best one on the non-dominated surface, Gen and Cheng [14] proposed the compromise approach-based genetic algorithm for obtaining a compromise solution of a multiobjective programming problem. Here, we design a compromise approach-based genetic algorithm to solve problem (1).

3.1 Compromise Approach

Assume that the ideal point of problem (1) is Z^* = (R^*, V^*, L^*) and the anti-ideal point is Z_* = (R_*, V_*, L_*). For each feasible solution x ∈ X, the regret function r(x, p) (p ≥ 1) is defined by the following weighted L_p-norm:

r(x, p) = \left\| Z(x) - Z^* \right\|_{p,w} = \left( w_1^p \left| \frac{E\big(\sum_{j=1}^{n} r_j x_j\big) - R^*}{R^* - R_*} \right|^p + w_2^p \left| \frac{\mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big) - V^*}{V^* - V_*} \right|^p + w_3^p \left| \frac{\sum_{j=1}^{n} P_M(\tilde{L}_j) x_j - L^*}{L^* - L_*} \right|^p \right)^{1/p},
where the weights w_1, w_2 and w_3 are assigned to the objectives to emphasize their different degrees of importance. R^*, V^* and L^* can be obtained by solving the following three mathematical programming problems, respectively:

\max_{x \in X} E\big(\sum_{j=1}^{n} r_j x_j\big) \qquad (2)

\min_{x \in X} \mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big) \qquad (3)

\max_{x \in X} \sum_{j=1}^{n} P_M(\tilde{L}_j) x_j \qquad (4)
Similarly, R_*, V_* and L_* can be obtained by solving the following three mathematical programming problems, respectively:

\min_{x \in X} E\big(\sum_{j=1}^{n} r_j x_j\big) \qquad (5)

\max_{x \in X} \mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big) \qquad (6)

\min_{x \in X} \sum_{j=1}^{n} P_M(\tilde{L}_j) x_j \qquad (7)
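As an illustration of the compromise approach, here is a minimal Python sketch of the weighted L_p regret of an objective vector, assuming the ideal and anti-ideal values are already available (e.g., from (2)-(7), or from the proxy points introduced below); the weights and p = 2 in the example call are arbitrary:

```python
import numpy as np

def regret(z, z_ideal, z_anti, w, p=2):
    """Weighted L_p regret r(x, p) of an objective vector z = (R, V, L)
    with respect to the ideal point z_ideal and the anti-ideal point z_anti."""
    z, z_ideal, z_anti, w = (np.asarray(a, dtype=float) for a in (z, z_ideal, z_anti, w))
    # Deviation from the ideal value, normalized by the ideal/anti-ideal range.
    scaled = np.abs((z - z_ideal) / (z_ideal - z_anti))
    return float(np.sum((w * scaled) ** p) ** (1.0 / p))

# Illustrative call: return/variance/liquidity of a candidate vs. arbitrary ideal points.
print(regret(z=(0.10, 0.040, 0.60),
             z_ideal=(0.15, 0.020, 0.80),
             z_anti=(0.05, 0.090, 0.30),
             w=(0.4, 0.3, 0.3), p=2))
```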
In fact, it is not an easy task to find the optimal solutions of the linear or nonlinear mixed-integer programming models (2)-(7). The concept of a proxy ideal point proposed by Gen and Cheng [12] is therefore adopted here in place of the actual ideal point. The proxy ideal points are expected to gradually approach the real ideal points as the evolution progresses. We use the proxy ideal point Z^{p*} = (R^{p*}, V^{p*}, L^{p*}) and the proxy anti-ideal point Z^{p}_* = (R^{p}_*, V^{p}_*, L^{p}_*) in place of the actual ideal point Z^* = (R^*, V^*, L^*) and anti-ideal point Z_* = (R_*, V_*, L_*), respectively.
Let P denote the current population. The proxy ideal point Z^{p*} = (R^{p*}, V^{p*}, L^{p*}) is calculated by

R^{p*} = \max_{x \in P} E\big(\sum_{j=1}^{n} r_j x_j\big), \quad
V^{p*} = \min_{x \in P} \mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big), \quad
L^{p*} = \max_{x \in P} \sum_{j=1}^{n} P_M(\tilde{L}_j) x_j.

The proxy anti-ideal point Z^{p}_* = (R^{p}_*, V^{p}_*, L^{p}_*) is calculated by

R^{p}_* = \min_{x \in P} E\big(\sum_{j=1}^{n} r_j x_j\big), \quad
V^{p}_* = \max_{x \in P} \mathrm{Var}\big(\sum_{j=1}^{n} r_j x_j\big), \quad
L^{p}_* = \min_{x \in P} \sum_{j=1}^{n} P_M(\tilde{L}_j) x_j.
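A minimal sketch of how the proxy ideal and anti-ideal points could be extracted from the current population, reusing the hypothetical inputs mu, cov and pm from the earlier sketch:

```python
import numpy as np

def proxy_points(population, mu, cov, pm):
    """Proxy ideal Z^{p*} and proxy anti-ideal Z^{p}_* over the current population P."""
    rets = np.array([float(mu @ x) for x in population])
    variances = np.array([float(x @ cov @ x) for x in population])
    liqs = np.array([float(pm @ x) for x in population])
    z_proxy_ideal = (rets.max(), variances.min(), liqs.max())  # (R^{p*}, V^{p*}, L^{p*})
    z_proxy_anti = (rets.min(), variances.max(), liqs.min())   # (R^{p}_*, V^{p}_*, L^{p}_*)
    return z_proxy_ideal, z_proxy_anti
```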
3.2 Genetic Algorithm

The steps of the compromise approach-based genetic algorithm for solving (1) are as follows.

(1) Representation structure: A solution x = (x_1, x_2, …, x_n) of problem (1) is represented by the chromosome V = (v_1, v_2, …, v_n), where the genes v_1, v_2, …, v_n are restricted to the interval [0, 1].

(2) Handling the constraints: Randomly generate a point from the hypercube [0, 1]^n and check its feasibility. If it satisfies the constraints of problem (1), i.e., V ∈ X, it is accepted as a chromosome. Otherwise, the repair mechanisms proposed in [13] are used to make it satisfy the constraints of (1); a code sketch of these repair steps follows the list below.
① Keep the K largest values of x_j and set all other x_j to zero.

② Perform the normalization x_j' = x_j / \sum_{j=1}^{n} x_j to ensure that the random point satisfies \sum_{j=1}^{n} x_j = 1.

③ To satisfy the buy-in constraints, set all x_j below their buy-in thresholds d_j to zero after the maximal-number-of-assets repair and the normalization have been applied, and then perform the normalization again.

④ To meet the round-lot constraints, round each x_j down to the nearest round-lot level, x_j' = x_j - (x_j \bmod e_j), after the cardinality repair, buy-in repair and normalization have been applied. The remainder of the rounding process, \sum_{j=1}^{n} (x_j \bmod e_j), is then spent in quantities of e_j on those x_j with the largest values of x_j \bmod e_j until all of the remainder is spent.
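A minimal NumPy sketch of the four repair steps above (the arrays d and e hold the buy-in thresholds d_j and lot sizes e_j; tie-breaking and floating-point edge cases are glossed over, so this is a sketch rather than the authors' exact procedure):

```python
import numpy as np

def repair(x, K, d, e):
    """Repair a random point from [0, 1]^n toward the constraints of model (1)."""
    x = np.asarray(x, dtype=float).copy()

    # Step 1: keep only the K largest components, zero out the rest.
    x[np.argsort(x)[:-K]] = 0.0
    # Step 2: normalize so the weights sum to one.
    x /= x.sum()
    # Step 3: zero out holdings below their buy-in thresholds, then normalize again.
    x[(x > 0) & (x < d)] = 0.0
    x /= x.sum()
    # Step 4: round each weight down to its round-lot grid.
    rem = x % e
    x -= rem
    leftover = rem.sum()
    # Spend the leftover in lots of e_j on the assets with the largest remainders.
    for j in np.argsort(-rem):
        if leftover >= e[j] and rem[j] > 0:
            x[j] += e[j]
            leftover -= e[j]
    return x

# Illustrative call with hypothetical data:
# x = repair(np.random.default_rng(1).random(20), K=10, d=np.full(20, 0.01), e=np.full(20, 0.005))
```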
(3) Initializing process: Initial feasible chromosomes V_1, V_2, …, V_{Npop} are generated by repeating the above process Npop times, where Npop is the number of chromosomes.

(4) Evaluation function: After the regret value of each chromosome V has been calculated, the fitness of each chromosome is computed by
eval(x) = \frac{r^{\max} - r(x, p) + \varepsilon}{r^{\max} - r^{\min} + \varepsilon},
where ε ∈ (0, 1) is a random number, and r^{max} and r^{min} denote the maximal and minimal regret values in the current generation, respectively.

(5) Selection process: The selection process is based on spinning the roulette wheel Npop times. Each time, a single chromosome is selected for the new population as follows. Calculate the cumulative probability q_i for each chromosome x_i:

q_0 = 0, \qquad q_i = \sum_{j=1}^{i} eval(x_j), \quad i = 1, 2, \ldots, N_{pop}.

Then generate a random number r in [0, q_{N_{pop}}] and select the i-th chromosome x_i such that q_{i-1} < r \le q_i.
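A minimal sketch of steps (4) and (5), computing the fitness values from the regret values of the current generation and then spinning the roulette wheel; the random-number handling (NumPy generator, 0-based indices) is an illustrative assumption:

```python
import numpy as np

def fitness(regrets, rng=None):
    """eval(x) for each chromosome, computed from the regret values of the generation."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(0.0, 1.0)               # the random epsilon, approximately in (0, 1)
    r = np.asarray(regrets, dtype=float)
    return (r.max() - r + eps) / (r.max() - r.min() + eps)

def roulette_select(evals, n_select, rng=None):
    """Spin the roulette wheel n_select times using the cumulative sums q_i."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.concatenate(([0.0], np.cumsum(evals)))  # q_0 = 0, q_i = sum_{j<=i} eval(x_j)
    picks = []
    for _ in range(n_select):
        r = rng.uniform(0.0, q[-1])
        i = np.searchsorted(q, r, side="left")     # smallest i with q_{i-1} < r <= q_i
        picks.append(max(i - 1, 0))                # 0-based index of the chosen chromosome
    return picks

# Illustrative use: select a new population of the same size.
rng = np.random.default_rng(42)
regrets = rng.random(10)
picks = roulette_select(fitness(regrets, rng), n_select=10, rng=rng)
```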