Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
6077
Emilio Corchado, Manuel Graña Romay, Alexandre Manhaes Savio (Eds.)
Hybrid Artificial Intelligence Systems 5th International Conference, HAIS 2010 San Sebastián, Spain, June 23-25, 2010 Proceedings, Part II
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Emilio Corchado
Universidad de Salamanca, Spain
E-mail: [email protected]

Manuel Graña Romay
Facultad de Informática, UPV/EHU, San Sebastián, Spain
E-mail: [email protected]

Alexandre Manhaes Savio
Facultad de Informática, UPV/EHU, San Sebastián, Spain
E-mail: [email protected]

Library of Congress Control Number: 2010928917
CR Subject Classification (1998): I.2, H.3, F.1, H.4, I.4, I.5
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-13802-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13802-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
The 5th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2010) has become a unique, established and broad interdisciplinary forum for researchers and practitioners who develop and apply symbolic and sub-symbolic techniques aimed at building highly robust and reliable problem-solving systems, and who bring together the most relevant achievements in this field. Overcoming the rigid encasing imposed by the rising orthodoxy in artificial intelligence, which has partitioned researchers into so-called areas or fields, interest in hybrid intelligent systems is growing because they give designers the freedom to devise innovative solutions to the ever-increasing complexity of real-world problems. Noise and uncertainty call for probabilistic (often Bayesian) methods, while the huge amounts of data in some cases call for fast heuristic (in the sense of suboptimal and ad hoc) algorithms able to give answers in acceptable time frames. High dimensionality demands linear and non-linear dimensionality reduction and feature extraction algorithms, while imprecision and vagueness call for fuzzy reasoning and the formalization of linguistic variables. Nothing prevents real-life problems from mixing these difficulties, presenting huge quantities of noisy, vague, high-dimensional data; the design of solutions must therefore be free to resort to any tool of the trade to attack the problem. Combining diverse paradigms poses the challenging problem of interfacing, computationally and methodologically, several previously incompatible approaches. This, then, is the setting of the HAIS conference series, and its increasing success is proof of the vitality of this exciting field.
This volume of Lecture Notes in Artificial Intelligence (LNAI) includes the accepted papers presented at HAIS 2010, held in the framework of the prestigious "Cursos de Verano" of the Universidad del País Vasco at the beautiful venue of the Palacio de Miramar, San Sebastián, Spain, in June 2010. Since its first edition in Brazil in 2006, HAIS has become an important forum for researchers working on fundamental and theoretical aspects of hybrid artificial intelligence systems based on the use of agents and multi-agent systems, bioinformatics and bio-inspired models, fuzzy systems, artificial vision, artificial neural networks, optimization models and the like. HAIS 2010 received 269 technical submissions. After a rigorous peer-review process, the International Program Committee selected 133 papers, which are published in these conference proceedings. In this edition emphasis was put on the organization of special sessions. Fourteen special sessions, containing 84 papers, were organized on the following topics:

• Real-World HAIS Applications and Data Uncertainty
• Computational Intelligence for Recommender Systems
• Signal Processing and Biomedical Applications
• Methods of Classifiers Fusion
• Knowledge Extraction Based on Evolutionary Learning
• Systems, Man, and Cybernetics by HAIS Workshop
• Hybrid Intelligent Systems on Logistics
• Hybrid Reasoning and Coordination Methods on Multi-Agent Systems
• HAIS for Computer Security
• Hybrid and Intelligent Techniques on Multimedia
• Hybrid ANNs: Models, Algorithms and Data
• Hybrid Artificial Intelligence Systems Based on Lattice Theory
• Information Fusion: Frameworks and Architectures
The selection of papers was extremely rigorous in order to maintain the high quality of the conference, and we would like to thank the Program Committee for their hard work in the reviewing process. This process is very important for the creation of a conference of high standard, and the HAIS conference would not exist without their help. The large number of submissions is certainly not only testimony to the vitality and attractiveness of the field but also an indicator of the interest in the HAIS conferences themselves. As a follow-up of the conference, we anticipate further publication of selected papers in special issues scheduled for the following journals:

• Information Sciences, Elsevier
• Neurocomputing, Elsevier
• Journal of Mathematical Imaging and Vision, Springer
• Information Fusion, Elsevier
• Logic Journal of the IGPL, Oxford Journals
HAIS 2010 enjoyed outstanding keynote speeches by distinguished guest speakers:

• Gerhard Ritter, University of Florida (USA)
• Mihai Datcu, Paris Institute of Technology, Telecom Paris (France)
• Marios Polycarpou, University of Cyprus (Cyprus)
• Ali-Akbar Ghorbani, University of New Brunswick (Canada)
• James Llinas, Universidad Carlos III de Madrid (Spain)
• Éloi Bossé, Defence Research and Development Canada (DRDC Valcartier) (Canada)
We would like to fully acknowledge support from the GICAP Group of the University of Burgos, the BISITE Group of the University of Salamanca, the GIC (www.ehu.es/ccwintco), the Vicerrectorado de Investigación and the Cursos de Verano of the Universidad del País Vasco, the Departamento de Educación, Ciencia y Universidades of the Gobierno Vasco, Vicomtech, and the Ministerio de Ciencia e Innovación. The IEEE Systems, Man & Cybernetics Society, through its Spanish chapter, and the IEEE Spanish Section also supported this event. We also want to extend our warm gratitude to all the Special Session Chairs for their continuing support of the HAIS series of conferences.
We wish to thank Alfred Hofmann and Anna Kramer from Springer for their help and collaboration in this demanding publication project. The local organizing team (Alexandre Manhães Savio, Ramón Moreno, Maite García-Sebastián, Elsa Fernández, Darya Chyzhyk, Miguel Angel Veganzones, Iván Villaverde) did a superb job; without their enthusiastic support the whole conference burden would have crushed our frail shoulders.
June 2010
Emilio Corchado Manuel Graña
Organization
Honorary Chair

Carolina Blasco – Director of Telecommunication, Regional Government of Castilla y León (Spain)
María Isabel Celáa Diéguez – Regional Minister of Education, Basque Government (Spain)
Marie Cottrell – Institute SAMOS-MATISSE, Université Paris 1 (France)
Daniel Yeung – IEEE SMCS President (China)
General Chairs

Emilio Corchado – University of Salamanca (Spain)
Manuel Graña – University of the Basque Country (Spain)
International Advisory Committee

Ajith Abraham – Norwegian University of Science and Technology (Norway)
Carolina Blasco – Director of Telecommunication, Regional Government of Castilla y León (Spain)
Pedro M. Caballero – CARTIF (Spain)
Andre de Carvalho – University of São Paulo (Brazil)
Juan M. Corchado – University of Salamanca (Spain)
José R. Dorronsoro – Autonomous University of Madrid (Spain)
Mark A. Girolami – University of Glasgow (UK)
Petro Gopych – Universal Power Systems USA-Ukraine LLC (Ukraine)
Francisco Herrera – University of Granada (Spain)
César Hervás-Martínez – University of Córdoba (Spain)
Tom Heskes – Radboud University Nijmegen (The Netherlands)
Lakhmi Jain – University of South Australia (Australia)
Samuel Kaski – Helsinki University of Technology (Finland)
Daniel A. Keim – Computer Science Institute, University of Konstanz (Germany)
Isidro Laso – D.G. Information Society and Media (European Commission)
Witold Pedrycz – University of Alberta (Canada)
Xin Yao – University of Birmingham (UK)
Hujun Yin – University of Manchester (UK)
Michal Wozniak – Wroclaw University of Technology (Poland)
Publicity Co-chairs

Emilio Corchado – University of Salamanca (Spain)
Manuel Graña – University of the Basque Country (Spain)
Program Committee

Manuel Graña – University of the Basque Country (Spain) (PC Co-chair)
Emilio Corchado – University of Salamanca (Spain) (PC Co-chair)
Agnar Aamodt – Norwegian University of Science and Technology (Norway)
Jesús Alcalá-Fernández – University of Granada (Spain)
Rafael Alcalá – University of Granada (Spain)
José Luis Álvarez – University of Huelva (Spain)
Davide Anguita – University of Genoa (Italy)
Bruno Apolloni – Università degli Studi di Milano (Italy)
Antonio Aráuzo-Azofra – University of Córdoba (Spain)
Estefania Argente – University of Valencia (Spain)
Fidel Aznar – University of Alicante (Spain)
Jaume Bacardit – University of Nottingham (UK)
Antonio Bahamonde – University of Oviedo (Spain)
Javier Bajo – Universidad Pontificia de Salamanca (Spain)
John Beasley – Brunel University (UK)
Bruno Baruque – University of Burgos (Spain)
José Manuel Benítez – University of Granada (Spain)
Ester Bernadó – Universitat Ramon Llull (Spain)
Richard Blake – Norwegian University of Science and Technology (Norway)
Juan Botía – University of Murcia (Spain)
Vicente Botti – Universidad Politécnica de Valencia (Spain)
Robert Burduk – Wroclaw University of Technology (Poland)
José Ramón Cano – University of Jaén (Spain)
Cristóbal José Carmona – University of Jaén (Spain)
Blanca Cases – University of the Basque Country (Spain)
Oscar Castillo – Tijuana Institute of Technology (Mexico)
Paula María Castro Castro – Universidade da Coruña (Spain)
Jonathan Chan – King Mongkut's University of Technology Thonburi (Thailand)
Richard Chbeir – Bourgogne University (France)
Enhong Chen – University of Science and Technology of China (China)
Camelia Chira – University of Babes-Bolyai (Romania)
Sung-Bae Cho – Yonsei University (Korea)
Darya Chyzhyk – University of the Basque Country (Spain)
Juan Manuel Corchado – University of Salamanca (Spain)
Emilio Corchado – University of Salamanca (Spain)
Rafael Corchuelo – University of Seville (Spain)
Guiomar Corral – Universitat Ramon Llull (Spain)
Raquel Cortina Parajon – University of Oviedo (Spain)
Carlos Cotta – University of Málaga (Spain)
José Alfredo F. Costa – Universidade Federal do Rio Grande do Norte (Brazil)
Leticia Curiel – University of Burgos (Spain)
Alfredo Cuzzocrea – University of Calabria (Italy)
Keshav Dahal – University of Bradford (UK)
Theodoros Damoulas – Cornell University (USA)
Ernesto Damiani – University of Milan (Italy)
Bernard De Baets – Ghent University (Belgium)
Enrique de la Cal – University of Oviedo (Spain)
Javier de Lope Asiain – Universidad Politécnica de Madrid (Spain)
Marcilio de Souto – Universidade Federal do Rio Grande do Norte (Brazil)
María José del Jesús – University of Jaén (Spain)
Ricardo del Olmo – University of Burgos (Spain)
Joaquín Derrac – University of Granada (Spain)
Nicola Di Mauro – University of Bari (Italy)
António Dourado – University of Coimbra (Portugal)
Richard Duro – University of Coruña (Spain)
Susana Irene Díaz – University of Oviedo (Spain)
José Dorronsoro – Universidad Autónoma de Madrid (Spain)
Pietro Ducange – University of Pisa (Italy)
Talbi El-Ghazali – University of Lille (France)
Aboul Ella Hassanien – University of Cairo (Egypt)
Marc Esteva – Artificial Intelligence Research Institute (Spain)
Juan José Flores – University of Michoacana (Mexico)
Alberto Fernández – Universidad Rey Juan Carlos (Spain)
Alberto Fernández – University of Granada (Spain)
Elías Fernández-Combarro Álvarez – University of Oviedo (Spain)
Elsa Fernández – University of the Basque Country (Spain)
Nuno Ferreira – Instituto Politécnico de Coimbra (Portugal)
Richard Freeman – Capgemini (Spain)
Rubén Fuentes – Universidad Complutense de Madrid (Spain)
Giorgio Fumera – University of Cagliari (Italy)
Bogdan Gabrys – Bournemouth University (UK)
João Gama – University of Porto (Portugal)
Matjaz Gams – Jozef Stefan Institute Ljubljana (Slovenia)
Jun Gao – Hefei University of Technology (China)
Tom Heskes – Radboud University Nijmegen (The Netherlands)
Isaías García – University of León (Spain)
José García – University of Alicante (Spain)
Salvador García – University of Jaén (Spain)
Neveen Ghali – Azhar University (Egypt)
Adriana Giret – Universidad Politécnica de Valencia (Spain)
Jorge Gómez – Universidad Complutense de Madrid (Spain)
Pedro González – University of Jaén (Spain)
Petro Gopych – Universal Power Systems USA-Ukraine LLC (Ukraine)
Juan Manuel Górriz – University of Granada (Spain)
Maite García-Sebastián – University of the Basque Country (Spain)
Manuel Graña – University of the Basque Country (Spain)
Maciej Grzenda – Warsaw University of Technology (Poland)
Arkadiusz Grzybowski – Wroclaw University of Technology (Poland)
Jerzy Grzymala-Busse – University of Kansas (USA)
Anne Håkansson – Stockholm University (Sweden)
Saman Halgamuge – The University of Melbourne (Australia)
José Alberto Hernández – Universidad Autónoma del Estado de Morelos (Mexico)
Carmen Hernández – University of the Basque Country (Spain)
Francisco Herrera – University of Granada (Spain)
Álvaro Herrero – University of Burgos (Spain)
Sean Holden – University of Cambridge (UK)
Vasant Honavar – Iowa State University (USA)
Vicente Julián – Universidad Politécnica de Valencia (Spain)
Konrad Jackowski – Wroclaw University of Technology (Poland)
Yaochu Jin – Honda Research Institute Europe (Germany)
Ivan Jordanov – University of Portsmouth (UK)
Ulf Johansson – University of Borås (Sweden)
Juha Karhunen – Helsinki University of Technology (Finland)
Frank Klawonn – University of Applied Sciences Braunschweig/Wolfenbuettel (Germany)
Andreas König – University of Kaiserslautern (Germany)
Mario Köppen – Kyushu Institute of Technology (Japan)
Rudolf Kruse – Otto-von-Guericke-Universität Magdeburg (Germany)
Bernadetta Kwintiana – Universität Stuttgart (Germany)
Dario Landa-Silva – University of Nottingham (UK)
Soo-Young Lee – Brain Science Research Center (Korea)
Lenka Lhotská – Czech Technical University in Prague (Czech Republic)
Hailin Liu – Guangdong University of Technology (China)
Otoniel López – Universidad Autónoma de Madrid (Spain)
Karmele López – University of the Basque Country (Spain)
Teresa Ludermir – Universidade Federal de Pernambuco (Brazil)
Julián Luengo – University of Granada (Spain)
Wenjian Luo – University of Science and Technology of China (China)
Núria Macià – Universitat Ramon Llull (Spain)
Kurosh Madani – University of Paris-Est Creteil (France)
Ana Maria Madureira – Instituto Politécnico do Porto (Portugal)
Roque Marin – University of Murcia (Spain)
Yannis Marinakis – Technical University of Crete (Greece)
José Fco. Martínez-Trinidad – INAOE (Mexico)
José Luis Martínez – University of Castilla-La Mancha (Spain)
Jacinto Mata – University of Huelva (Spain)
Giancarlo Mauri – University of Milano-Bicocca (Italy)
David Meehan – Dublin Institute of Technology (Ireland)
Gerardo M. Méndez – Instituto Tecnológico de Nuevo León (Mexico)
Abdel-Badeeh M. Salem – Ain Shams University (Egypt)
Masoud Mohammadian – University of Canberra (Australia)
José Manuel Molina – University Carlos III of Madrid (Spain)
Claudio Moraga – European Centre for Soft Computing (Spain)
Marco Mora – Universidad Católica del Maule (Chile)
Ramón Moreno – University of the Basque Country (Spain)
Susana Nascimento – Universidade Nova de Lisboa (Portugal)
Martí Navarro – Universidad Politécnica de Valencia (Spain)
Yusuke Nojima – Osaka Prefecture University (Japan)
Alberto Ochoa – Juarez City University/CIATEC (Mexico)
Albert Orriols – Universitat Ramon Llull (Spain)
Rubén Ortiz – Universidad Rey Juan Carlos (Spain)
Vasile Palade – Oxford University (UK)
Stephan Pareigis – Hamburg University of Applied Sciences (Germany)
Witold Pedrycz – University of Alberta (Canada)
Elzbieta Pekalska – University of Manchester (UK)
Carlos Pereira – Universidade de Coimbra (Portugal)
Antonio Peregrín – University of Huelva (Spain)
Lina Petrakieva – Glasgow Caledonian University (UK)
Gloria Phillips-Wren – Loyola College in Maryland (USA)
Han Pingchou – Peking University (China)
Camelia Pintea – University of Babes-Bolyai (Romania)
Julio Ponce – Universidad Autónoma de Aguascalientes (Mexico)
Khaled Ragab – King Faisal University (Saudi Arabia)
José Ranilla – University of Oviedo (Spain)
Javier Ramírez – University of Granada (Spain)
Romain Raveaux – La Rochelle University (France)
Carlos Redondo – University of León (Spain)
Raquel Redondo – University of Burgos (Spain)
Bernadete Ribeiro – University of Coimbra (Portugal)
Ramón Rizo – University of Alicante (Spain)
Peter Rockett – University of Sheffield (UK)
Adolfo Rodríguez – University of León (Spain)
Rosa M. Rodríguez Maraña – University of León (Spain)
Katya Rodriguez-Vázquez – Universidad Nacional Autónoma de México (Mexico)
Fabrice Rossi – TELECOM ParisTech (France)
António Ruano – University of Algarve (Portugal)
Ozgur Koray Sahingoz – Turkish Air Force Academy (Turkey)
Wei-Chiang Samuelson Hong – Oriental Institute of Technology (Taiwan)
Luciano Sánchez – University of Oviedo (Spain)
José Santamaría – University of Jaén (Spain)
Alexandre Savio – University of the Basque Country (Spain)
Fatima Sayuri Quezada – Universidad Autónoma de Aguascalientes (Mexico)
Gerald Schaefer – Aston University (UK)
Robert Schaefer – AGH University of Science and Technology (Poland)
Javier Sedano – University of Burgos (Spain)
Leila Shafti – Universidad Autónoma de Madrid (Spain)
Dragan Simic – Novi Sad Fair (Serbia)
Konstantinos Sirlantzis – University of Kent (UK)
Dominik Slezak – University of Regina (Canada)
Cecilia Sönströd – University of Borås (Sweden)
Ying Tan – Peking University (China)
Ke Tang – University of Science and Technology of China (China)
Nikos Thomaidis – University of the Aegean (Greece)
Alicia Troncoso – Universidad Pablo de Olavide de Sevilla (Spain)
Eiji Uchino – Yamaguchi University (Japan)
Roberto Uribeetxeberria – Mondragon University (Spain)
José Valls – University Carlos III of Madrid (Spain)
Miguel Ángel Veganzones – University of the Basque Country (Spain)
Sebastian Ventura – Universidad de Córdoba (Spain)
José Luis Verdegay – University of Granada (Spain)
José Ramón Villar – University of Oviedo (Spain)
José Ramón Cano – University of Jaén (Spain)
Krzysztof Walkowiak – Wroclaw University of Technology (Poland)
Guoyin Wang – Chongqing University of Posts and Telecommunications (China)
Michal Wozniak – Wroclaw University of Technology (Poland)
Zhuoming Xu – Hohai University (China)
Ronald Yager – Iona College (USA)
Hujun Yin – The University of Manchester (UK)
Constantin Zopounidis – Technical University of Crete (Greece)
Huiyu Zhou – Brunel University (UK)
Rodolfo Zunino – University of Genoa (Italy)
Urko Zurutuza – Mondragon University (Spain)
Special Session Committees

Real-World HAIS Applications and Data Uncertainty

José Ramón Villar – University of Oviedo (Spain)
André Carvalho – University of São Paulo (Brazil)
Camelia Pintea – University of Babes-Bolyai (Romania)
Eduardo Raúl Hruschka – University of São Paulo (Brazil)
Oscar Ibañez – European Centre for Soft Computing (Spain)
Paula Mello – University of Bologna (Italy)
Javier Sedano – University of Burgos (Spain)
Adolfo Rodríguez – Universidad de León (Spain)
Camelia Chira – University of Babes-Bolyai (Romania)
José Ramón Villar – University of Oviedo (Spain)
Luciano Sánchez – University of Oviedo (Spain)
Luis Oliveira – University of Oviedo (Spain)
María del Rosario Suárez – University of Oviedo (Spain)
Carmen Vidaurre – Technical University of Berlin (Germany)
Enrique de la Cal – University of Oviedo (Spain)
Gerardo M. Méndez – Instituto Tecnológico de Nuevo León (Mexico)
Ana Palacios – University of Oviedo (Spain)
Luis Junco – University of Oviedo (Spain)
Signal Processing and Biomedical Applications

Juan Manuel Górriz – University of Granada (Spain)
Carlos G. Putonet – University of Granada (Spain)
Elmar W. Lang – University of Regensburg (Germany)
Javier Ramírez – University of Granada (Spain)
Manuel Graña – University of the Basque Country (Spain)
Maite García-Sebastián – University of the Basque Country (Spain)
Alexandre Savio – University of the Basque Country (Spain)
Ana María Pefeito Tome – University of Aveiro (Portugal)
Elsa Fernández – University of the Basque Country (Spain)
Isabel Barbancho – University of Málaga (Spain)
Diego Pablo Ruiz Padillo – University of Granada (Spain)
Fermín Segovia Román – University of Granada (Spain)
Ingo Keck – University of Regensburg (Germany)
Manuel Canton – University of Almeria (Spain)
Miriam Lopez Perez – University of Granada (Spain)
Rosa Chaves Rodríguez – University of Granada (Spain)
Roberto Hornero – University of Valladolid (Spain)
Andres Ortiz – University of Malaga (Spain)
Diego Salas-Gonzalez – University of Granada (Spain)
Ignacio Álvarez – University of Granada (Spain)
Ignacio Turias – University of Cadiz (Spain)
Jose Antonio Piedra – University of Almeria (Spain)
Maria del Carmen Carrión – University of Granada (Spain)
Ruben Martín – University of Seville (Spain)
Methods of Classifiers Fusion

Michal Wozniak – Wroclaw University of Technology (Poland)
Álvaro Herrero – University of Burgos (Spain)
Bogdan Trawinski – Wroclaw University of Technology (Poland)
Giorgio Fumera – University of Cagliari (Italy)
José Alfredo F. Costa – Universidade Federal do Rio Grande do Norte (Brazil)
Konrad Jackowski – Wroclaw University of Technology (Poland)
Konstantinos Sirlantzis – University of Kent (UK)
Przemyslaw Kazienko – Wroclaw University of Technology (Poland)
Bruno Baruque – University of Burgos (Spain)
Jerzy Stefanowski – Poznan University of Technology (Poland)
Robert Burduk – Wroclaw University of Technology (Poland)
Emilio Corchado – University of Salamanca (Spain)
Igor T. Podolak – Jagiellonian University (Poland)
Vaclav Snasel – VSB-Technical University of Ostrava (Czech Republic)
Elzbieta Pekalska – University of Manchester (UK)
Bogdan Gabrys – Bournemouth University (UK)
Knowledge Extraction Based on Evolutionary Learning

Sebastián Ventura – University of Córdoba (Spain)
Amelia Zafra – University of Córdoba (Spain)
Eva Lucrecia Gibaja – University of Córdoba (Spain)
Jesus Alcala-Fernández – University of Granada (Spain)
Salvador García – University of Jaén (Spain)
Mykola Pechenizkiy – Technical University of Eindhoven (The Netherlands)
Pedro González – University of Jaén (Spain)
Antonio Peregrin – University of Huelva (Spain)
Rafael Alcalá – University of Granada (Spain)
Cristóbal Romero – University of Córdoba (Spain)
Ekaterina Vasileya – Technical University of Eindhoven (The Netherlands)
Systems, Man, and Cybernetics by HAIS Workshop

Emilio Corchado – University of Salamanca (Spain)
Juan M. Corchado – University of Salamanca (Spain)
Álvaro Herrero – University of Burgos (Spain)
Bruno Baruque – University of Burgos (Spain)
Javier Sedano – University of Burgos (Spain)
Juan Pavón – University Complutense Madrid (Spain)
Manuel Graña – University of the Basque Country (Spain)
Ramón Rizo – University of Alicante (Spain)
Richard Duro – University of A Coruña (Spain)
Sebastian Ventura – University of Córdoba (Spain)
Vicente Botti – Polytechnical University of Valencia (Spain)
José Manuel Molina – University Carlos III of Madrid (Spain)
Lourdes Sáiz Barcena – University of Burgos (Spain)
Francisco Herrera – University of Granada (Spain)
Leticia Curiel – University of Burgos (Spain)
César Hervás – University of Córdoba (Spain)
Sara Rodríguez – University of León (Spain)
Hybrid Intelligent Systems on Logistics

Camelia Chira – Babes-Bolyai University (Romania)
Alberto Ochoa Zezzati – Juarez City University (Mexico)
Arturo Hernández – CIMAT (Mexico)
Katya Rodríguez – UNAM (Mexico)
Fabricio Olivetti – University of Campinas (Brazil)
Gloria Cerasela Crisan – University of Bacau (Romania)
Anca Gog – Babes-Bolyai University (Romania)
Camelia-M. Pintea – Babes-Bolyai University (Romania)
Petrica Pop – North University Baia-Mare (Romania)
Barna Iantovics – Petru Maior University Targu-Mures (Romania)
Hybrid Reasoning and Coordination Methods on Multi-agent Systems

Martí Navarro – Universidad Politécnica de Valencia (Spain)
Javier Bajo – Universidad Pontificia de Salamanca (Spain)
Juan Botía – Universidad de Murcia (Spain)
Juan Manuel Corchado – Universidad de Salamanca (Spain)
Luís Búrdalo – Universidad Politécnica de Valencia (Spain)
Stella Heras – Universidad Politécnica de Valencia (Spain)
Vicente Botti – Universidad Politécnica de Valencia (Spain)
Vicente J. Julián – Universidad Politécnica de Valencia (Spain)
Rubén Ortiz – Universidad Rey Juan Carlos (Spain)
Rubén Fuentes – Universidad Complutense de Madrid (Spain)
Adriana Giret – Universidad Politécnica de Valencia (Spain)
Alberto Fernández – Universidad Rey Juan Carlos (Spain)
Marc Esteva – IIIA-CSIC (Spain)
Carlos Carrascosa – Universidad Politécnica de Valencia (Spain)
HAIS for Computer Security (HAISfCS)

Álvaro Herrero – University of Burgos (Spain)
Emilio Corchado – University of Salamanca (Spain)
Huiyu Zhou – Queen's University Belfast (UK)
Belén Vaquerizo – University of Burgos (Spain)
Cristian I. Pinzón – University of Salamanca (Spain)
Dante I. Tapia – University of Salamanca (Spain)
Javier Bajo – Pontifical University of Salamanca (Spain)
Javier Sedano – University of Burgos (Spain)
Juan F. De Paz Santana – University of Salamanca (Spain)
Sara Rodríguez – University of Salamanca (Spain)
Raquel Redondo – University of Burgos (Spain)
Leticia Curiel – University of Burgos (Spain)
Bruno Baruque – University of Burgos (Spain)
Ángel Arroyo – University of Burgos (Spain)
Juan M. Corchado – University of Salamanca (Spain)
Hybrid and Intelligent Techniques on Multimedia

Adriana Dapena Janeiro – Universidade da Coruña (Spain)
José Martínez – Universidad Autónoma de Madrid (Spain)
Otoniel López – Miguel Hernandez University (Spain)
Ramón Moreno – University of the Basque Country (Spain)
Manuel Graña – University of the Basque Country (Spain)
Eduardo Martínez – University of Murcia (Spain)
Javier Ruiz – Polytechnic University of Catalonia (Spain)
José Luis Martínez – Universidad de Castilla-La Mancha (Spain)
Daniel Iglesia – Universidade da Coruña (Spain)
Elsa Fernández – University of the Basque Country (Spain)
Paula Castro – Universidade da Coruña (Spain)
Hybrid ANNs: Models, Algorithms and Data

César Hervás-Martinez – University of Córdoba (Spain)
Pedro Antonio Gutiérrez – University of Córdoba (Spain)
Francisco Fernández-Navarro – University of Córdoba (Spain)
Aldo Franco Dragoni – Università Politecnica delle Marche (Italy)
Ángel Manuel Pérez-Bellido – University of Alcalá (Spain)
Daniel Mateos-García – University of Alcalá (Spain)
Germano Vallesi – Università Politecnica delle Marche (Italy)
José C. Riquelme-Santos – University of Sevilla (Spain)
Sancho Salcedo-Sanz – University of Alcalá (Spain)
Ekaitz Zulueta-Guerrero – University of the Basque Country (Spain)
Emilio G. Ortíz-García – University of Alcalá (Spain)
Kui Li – Xidian University (China)
Liang Yu – (China)
Alicia D'Anjou – University of the Basque Country (Spain)
Francisco José Martínez-Estudillo – University of Córdoba (Spain)
Juan Carlos Fernández – University of Córdoba (Spain)
Lin Gao – Hefei University of Technology (China)
Javier Sánchez-Monedero – University of Córdoba (Spain)
Hybrid Artificial Intelligence Systems Based on Lattice Theory

Vassilis Kaburlasos – Technological Educational Institution of Kavala (Greece)
Cliff Joslyn – Pacific Northwest National Laboratory (USA)
Juan Humberto Sossa Azuela – Centro de Investigación en Computación (Mexico)
Angelos Amanatiadis – Technological Educational Institution of Kavala (Greece)
George Papakostas – Democritus University of Thrace (Greece)
Gonzalo Urcid – National Institute of Astrophysics, Optics and Electronics (Mexico)
Peter Sussner – State University of Campinas (Brazil)
Radim Belohlavek – Palacky University (Czech Republic)
Theodore Pachidis – Technological Educational Institution of Kavala (Greece)
Vassilis Syrris – Aristotle University of Thessaloniki (Greece)
Anestis Hatzimichailidis – Technological Educational Institution of Kavala (Greece)
Gonzalo Aranda-Corral – University of Huelva (Spain)
Kevin Knuth – University at Albany (USA)
Manuel Graña – University of the Basque Country (Spain)
Gerhard Ritter – University of Florida (USA)
Lefteris Moussiades – Technological Educational Institution of Kavala (Greece)
Isabelle Bloch – Ecole Nationale Supérieure des Télécommunications (France)
Information Fusion: Frameworks and Architectures

José Manuel Molina López – University Carlos III (Spain)
Javier Bajo – University of Salamanca (Spain)
Jose M. Armingol – University Carlos III (Spain)
Juan A. Besada – Universidad Politécnica de Madrid (Spain)
Miguel A. Patricio – University Carlos III (Spain)
Arturo de la Escalera – University Carlos III (Spain)
Eloi Bosse – Defence R&D Canada (Canada)
Jesus Garcia – University Carlos III (Spain)
Jose M. Molina – University Carlos III (Spain)
Antonio Berlanga – University Carlos III (Spain)
James Llinas – University Carlos III (Spain)
Ana M. Bernardos – Universidad Politécnica de Madrid (Spain)
Local Organizing Committee

Manuel Graña – University of the Basque Country (Spain)
Darya Chyzhyk – University of the Basque Country (Spain)
Elsa Fernández – University of the Basque Country (Spain)
Maite García-Sebastián – University of the Basque Country (Spain)
Carmen Hernández Gómez – University of the Basque Country (Spain)
Alexandre Manhães Savio – University of the Basque Country (Spain)
Ramón Moreno – University of the Basque Country (Spain)
Miguel Angel Veganzones – University of the Basque Country (Spain)
Iván Villaverde – University of the Basque Country (Spain)
Table of Contents – Part II
SIFT-SS: An Advanced Steady-State Multi-Objective Genetic Fuzzy System . . . . . 1
   Michel González, Jorge Casillas, and Carlos Morell

Evolving Multi-label Classification Rules with Gene Expression Programming: A Preliminary Study . . . . . 9
   José Luis Ávila-Jiménez, Eva Gibaja, and Sebastián Ventura

Solving Classification Problems Using Genetic Programming Algorithms on GPUs . . . . . 17
   Alberto Cano, Amelia Zafra, and Sebastián Ventura

Analysis of the Effectiveness of G3PARM Algorithm . . . . . 27
   J.M. Luna, J.R. Romero, and S. Ventura

Reducing Dimensionality in Multiple Instance Learning with a Filter Method . . . . . 35
   Amelia Zafra, Mykola Pechenizkiy, and Sebastián Ventura

Graphical Exploratory Analysis of Educational Knowledge Surveys with Missing and Conflictive Answers Using Evolutionary Techniques . . . . . 45
   Luciano Sánchez, Inés Couso, and José Otero

Data Mining for Grammatical Inference with Bioinformatics Criteria . . . . . 53
   Vivian F. López, Ramiro Aguilar, Luis Alonso, María N. Moreno, and Juan M. Corchado

Hybrid Multiagent System for Automatic Object Learning Classification . . . . . 61
   Ana Gil, Fernando de la Prieta, and Vivian F. López

On the Use of a Hybrid Approach to Contrast Endmember Induction Algorithms . . . . . 69
   Miguel A. Veganzones and Carmen Hernández

Self-emergence of Lexicon Consensus in a Population of Autonomous Agents by Means of Evolutionary Strategies . . . . . 77
   Darío Maravall, Javier de Lope, and Raúl Domínguez

Enhanced Self Organized Dynamic Tree Neural Network . . . . . 85
   Juan F. De Paz, Sara Rodríguez, Ana Gil, Juan M. Corchado, and Pastora Vega
Agents and Computer Vision for Processing Stereoscopic Images . . . . . 93
   Sara Rodríguez, Fernando de la Prieta, Dante I. Tapia, and Juan M. Corchado

Incorporating Temporal Constraints in the Planning Task of a Hybrid Intelligent IDS . . . . . 101
   Álvaro Herrero, Martí Navarro, Vicente Julián, and Emilio Corchado

HERA: A New Platform for Embedding Agents in Heterogeneous Wireless Sensor Networks . . . . . 111
   Ricardo S. Alonso, Juan F. De Paz, Óscar García, Óscar Gil, and Angélica González

A Genetic Algorithm for Solving the Generalized Vehicle Routing Problem . . . . . 119
   P.C. Pop, O. Matei, C. Pop Sitar, and C. Chira
Using Cultural Algorithms to Improve Intelligent Logistics . . . . . 127
   Alberto Ochoa, Yazmani García, Javier Yañez, and Yaddik Teymanoglu

A Cultural Algorithm for the Urban Public Transportation . . . . . 135
   Laura Cruz Reyes, Carlos Alberto Ochoa Ortíz Zezzatti, Claudia Gómez Santillán, Paula Hernández Hernández, and Mercedes Villa Fuerte

Scalability of a Methodology for Generating Technical Trading Rules with GAPs Based on Risk-Return Adjustment and Incremental Training . . . . . 143
   E.A. de la Cal, E.M. Fernández, R. Quiroga, J.R. Villar, and J. Sedano

Hybrid Approach for the Public Transportation Time Dependent Orienteering Problem with Time Windows . . . . . 151
   Ander Garcia, Olatz Arbelaitz, Pieter Vansteenwegen, Wouter Souffriau, and Maria Teresa Linaza
A Functional Taxonomy for Artifacts . . . . . 159
   Sergio Esparcia and Estefanía Argente

A Case-Based Reasoning Approach for Norm Adaptation . . . . . 168
   Jordi Campos, Maite López-Sánchez, and Marc Esteva

An Abstract Argumentation Framework for Supporting Agreements in Agent Societies . . . . . 177
   Stella Heras, Vicente Botti, and Vicente Julián

Reaching a Common Agreement Discourse Universe on Multi-Agent Planning . . . . . 185
   Alejandro Torreño, Eva Onaindia, and Oscar Sapena
Integrating Information Extraction Agents into a Tourism Recommender System . . . . . 193
   Sergio Esparcia, Víctor Sánchez-Anguix, Estefanía Argente, Ana García-Fornes, and Vicente Julián

Adaptive Hybrid Immune Detector Maturation Algorithm . . . . . 201
   Jungan Chen, Wenxin Chen, and Feng Liang

Interactive Visualization Applets for Modular Exponentiation Using Addition Chains . . . . . 209
   Hatem M. Bahig and Yasser Kotb

Multimedia Elements in a Hybrid Multi-Agent System for the Analysis of Web Usability . . . . . 217
   E. Mosqueira-Rey, B. Baldonedo del Río, D. Alonso-Ríos, E. Rodríguez-Poch, and D. Prado-Gesto

An Approach for an AVC to SVC Transcoder with Temporal Scalability . . . . . 225
   Rosario Garrido-Cantos, José Luis Martínez, Pedro Cuenca, and Antonio Garrido
A GPU-Based DVC to H.264/AVC Transcoder . . . . . 233
   Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver, and José Luis Sánchez

Hybrid Color Space Transformation to Visualize Color Constancy . . . . . 241
   Ramón Moreno, José Manuel López-Guede, and Alicia d'Anjou

A Novel Hybrid Approach to Improve Performance of Frequency Division Duplex Systems with Linear Precoding . . . . . 248
   Paula M. Castro, José A. García-Naya, Daniel Iglesia, and Adriana Dapena

Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW) . . . . . 256
   Otoniel López, Miguel Martínez-Rach, Pablo Piñol, Manuel P. Malumbres, and José Oliver

Color Video Segmentation by Dissimilarity Based on Edges . . . . . 264
   Lucía Ramos, Jorge Novo, José Rouco, Antonio Mosquera, and Manuel G. Penedo

Label Dependent Evolutionary Feature Weighting for Remote Sensing Data . . . . . 272
   Daniel Mateos-García, Jorge García-Gutiérrez, and José C. Riquelme-Santos
Evolutionary q-Gaussian Radial Basis Functions for Binary-Classification . . . . . 280
   F. Fernández-Navarro, C. Hervás-Martínez, P.A. Gutiérrez, M. Cruz-Ramírez, and M. Carbonero-Ruz

Evolutionary Learning Using a Sensitivity-Accuracy Approach for Classification . . . . . 288
   Javier Sánchez-Monedero, C. Hervás-Martínez, F.J. Martínez-Estudillo, Mariano Carbonero Ruz, M.C. Ramírez Moreno, and M. Cruz-Ramírez

An Hybrid System for Continuous Learning . . . . . 296
   Aldo Franco Dragoni, Germano Vallesi, Paola Baldassarri, and Mauro Mazzieri

Support Vector Regression Algorithms in the Forecasting of Daily Maximums of Tropospheric Ozone Concentration in Madrid . . . . . 304
   E.G. Ortiz-García, S. Salcedo-Sanz, A.M. Pérez-Bellido, J. Gascón-Moreno, and A. Portilla-Figueras

Neuronal Implementation of Predictive Controllers . . . . . 312
   José Manuel López-Guede, Ekaitz Zulueta, and Borja Fernández-Gauna

α-Satisfiability and α-Lock Resolution for a Lattice-Valued Logic LP(X) . . . . . 320
   Xingxing He, Yang Xu, Yingfang Li, Jun Liu, Luis Martinez, and Da Ruan

On Compactness and Consistency in Finite Lattice-Valued Propositional Logic . . . . . 328
   Xiaodong Pan, Yang Xu, Luis Martinez, Da Ruan, and Jun Liu

Lattice Independent Component Analysis for Mobile Robot Localization . . . . . 335
   Ivan Villaverde, Borja Fernandez-Gauna, and Ekaitz Zulueta
An Introduction to the Kosko Subsethood FAM . . . . . 343
   Peter Sussner and Estevão Esmi

An Increasing Hybrid Morphological-Linear Perceptron with Evolutionary Learning and Phase Correction for Financial Time Series Forecasting . . . . . 351
   Ricardo de A. Araújo and Peter Sussner

Lattice Associative Memories for Segmenting Color Images in Different Color Spaces . . . . . 359
   Gonzalo Urcid, Juan Carlos Valdiviezo-N., and Gerhard X. Ritter
Lattice Neural Networks with Spike Trains . . . . . 367
   Gerhard X. Ritter and Gonzalo Urcid

Detecting Features from Confusion Matrices Using Generalized Formal Concept Analysis . . . . . 375
   Carmen Peláez-Moreno and Francisco J. Valverde-Albacete

Reconciling Knowledge in Social Tagging Web Services . . . . . 383
   Gonzalo A. Aranda-Corral and Joaquín Borrego-Díaz

2-D Shape Representation and Recognition by Lattice Computing Techniques . . . . . 391
   V.G. Kaburlasos, A. Amanatiadis, and S.E. Papadakis

Order Metrics for Semantic Knowledge Systems . . . . . 399
   Cliff Joslyn and Emilie Hogan

Granular Fuzzy Inference System (FIS) Design by Lattice Computing . . . . . 410
   Vassilis G. Kaburlasos

Median Hetero-Associative Memories Applied to the Categorization of True-Color Patterns . . . . . 418
   Roberto A. Vázquez and Humberto Sossa
A Comparison of VBM Results by SPM, ICA and LICA . . . . . 429
   Darya Chyzyk, Maite Termenon, and Alexandre Savio

Fusion of Single View Soft k-NN Classifiers for Multicamera Human Action Recognition . . . . . 436
   Rodrigo Cilla, Miguel A. Patricio, Antonio Berlanga, and Jose M. Molina

Self-adaptive Coordination for Organizations of Agents in Information Fusion Environments . . . . . 444
   Sara Rodríguez, Belén Pérez-Lancho, Javier Bajo, Carolina Zato, and Juan M. Corchado

Sensor Management: A New Paradigm for Automatic Video Surveillance . . . . . 452
   Lauro Snidaro, Ingrid Visentini, and Gian Luca Foresti

A Simulation Framework for UAV Sensor Fusion . . . . . 460
   Enrique Martí, Jesús García, and Jose Manuel Molina

An Embeddable Fusion Framework to Manage Context Information in Mobile Devices . . . . . 468
   Ana M. Bernardos, Eva Madrazo, and José R. Casar
Embodied Moving-Target Seeking with Prediction and Planning . . . . . 478
   Noelia Oses, Matej Hoffmann, and Randal A. Koene

Using Self-Organizing Maps for Intelligent Camera-Based User Interfaces . . . . . 486
   Zorana Banković, Elena Romero, Javier Blesa, José M. Moya, David Fraga, Juan Carlos Vallejo, Álvaro Araujo, Pedro Malagón, Juan-Mariano de Goyeneche, Daniel Villanueva, and Octavio Nieto-Taladriz

A SVM and k-NN Restricted Stacking to Improve Land Use and Land Cover Classification . . . . . 493
   Jorge Garcia-Gutierrez, Daniel Mateos-Garcia, and Jose C. Riquelme-Santos

A Bio-inspired Fusion Method for Data Visualization . . . . . 501
   Bruno Baruque and Emilio Corchado
CBRid4SQL: A CBR Intrusion Detector for SQL Injection Attacks . . . . . 510
   Cristian Pinzón, Álvaro Herrero, Juan F. De Paz, Emilio Corchado, and Javier Bajo
Author Index . . . . . 521
Table of Contents – Part I
Y-Means: An Autonomous Clustering Algorithm (Invited Paper) . . . . . 1
   Ali A. Ghorbani and Iosif-Viorel Onut

A Survey and Analysis of Frameworks and Framework Issues for Information Fusion Applications (Invited Paper) . . . . . 14
   James Llinas

A Regular Tetrahedron Formation Strategy for Swarm Robots in Three-Dimensional Environment . . . . . 24
   M. Fikret Ercan, Xiang Li, and Ximing Liang

Markovian Ants in a Queuing System . . . . . 32
   Ilija Tanackov, Dragan Simić, Siniša Sremac, Jovan Tepić, and Sunčica Kocić-Tanackov

A Parametric Method Applied to Phase Recovery from a Fringe Pattern Based on a Particle Swarm Optimization . . . . . 40
   J.F. Jimenez, F.J. Cuevas, J.H. Sossa, and L.E. Gomez

Automatic PSO-Based Deformable Structures Markerless Tracking in Laparoscopic Cholecystectomy . . . . . 48
   Haroun Djaghloul, Mohammed Batouche, and Jean-Pierre Jessel

A Framework for Optimization of Genetic Programming Evolved Classifier Expressions Using Particle Swarm Optimization . . . . . 56
   Hajira Jabeen and Abdul Rauf Baig

Developing an Intelligent Parking Management Application Based on Multi-agent Systems and Semantic Web Technologies . . . . . 64
   Andrés Muñoz and Juan A. Botía

Linked Multicomponent Robotic Systems: Basic Assessment of Linking Element Dynamical Effect . . . . . 73
   Borja Fernandez-Gauna, Jose Manuel Lopez-Guede, and Ekaitz Zulueta

Social Simulation for AmI Systems Engineering . . . . . 80
   Teresa Garcia-Valverde, Emilio Serrano, and Juan A. Botia

Automatic Behavior Pattern Classification for Social Robots . . . . . 88
   Abraham Prieto, Francisco Bellas, Pilar Caamaño, and Richard J. Duro

Healthcare Information Fusion Using Context-Aware Agents . . . . . 96
   Dante I. Tapia, Juan A. Fraile, Ana de Luis, and Javier Bajo
Multivariate Discretization for Associative Classification in a Sparse Data Application Domain . . . . . 104
   María N. Moreno García, Joel Pinho Lucas, Vivian F. López Batista, and M. José Polo Martín

Recognition of Turkish Vowels by Probabilistic Neural Networks Using Yule-Walker AR Method . . . . . 112
   Erdem Yavuz and Vedat Topuz

A Dynamic Bayesian Network Based Structural Learning towards Automated Handwritten Digit Recognition . . . . . 120
   Olivier Pauplin and Jianmin Jiang

A Dual Network Adaptive Learning Algorithm for Supervised Neural Network with Contour Preserving Classification for Soft Real Time Applications . . . . . 128
   Piyabute Fuangkhon and Thitipong Tanprasert

The Abnormal vs. Normal ECG Classification Based on Key Features and Statistical Learning . . . . . 136
   Jun Dong, Jia-fei Tong, and Xia Liu
Classification of Wood Pulp Fibre Cross-Sectional Shapes . . . . . 144
   Asuka Yamakawa and Gary Chinga-Carrasco

A Hybrid Cluster-Lift Method for the Analysis of Research Activities . . . . . 152
   Boris Mirkin, Susana Nascimento, Trevor Fenner, and Luís Moniz Pereira

Protein Fold Recognition with Combined SVM-RDA Classifier . . . . . 162
   Wieslaw Chmielnicki and Katarzyna Stąpor

Data Processing on Database Management Systems with Fuzzy Query . . . . . 170
   İrfan Şimşek and Vedat Topuz

A Hybrid Approach for Process Mining: Using From-to Chart Arranged by Genetic Algorithms . . . . . 178
   Eren Esgin, Pinar Senkul, and Cem Cimenbicer

Continuous Pattern Mining Using the FCPGrowth Algorithm in Trajectory Data Warehouses . . . . . 187
   Marcin Gorawski and Pawel Jureczek

Hybrid Approach for Language Identification Oriented to Multilingual Speech Recognition in the Basque Context . . . . . 196
   N. Barroso, K. López de Ipiña, A. Ezeiza, O. Barroso, and U. Susperregi
An Approach of Bio-inspired Hybrid Model for Financial Markets . . . . . 205
   Dragan Simić, Vladeta Gajić, and Svetlana Simić

Interactive and Stereoscopic Hybrid 3D Viewer of Radar Data with Gesture Recognition . . . . . 213
   Jon Goenetxea, Aitor Moreno, Luis Unzueta, Andoni Galdós, and Álvaro Segura

Recognition of Manual Actions Using Vector Quantization and Dynamic Time Warping . . . . . 221
   Marcel Martin, Jonathan Maycock, Florian Paul Schmidt, and Oliver Kramer

Protecting Web Services against DoS Attacks: A Case-Based Reasoning Approach . . . . . 229
   Cristian Pinzón, Juan F. De Paz, Carolina Zato, and Javier Pérez
Ranked Tag Recommendation Systems Based on Logistic Regression . . . . . 237
   J.R. Quevedo, E. Montañés, J. Ranilla, and I. Díaz

A Hybrid Robotic Control System Using Neuroblastoma Cultures . . . . . 245
   J.M. Ferrández, V. Lorente, J.M. Cuadra, F. de la Paz, José Ramón Álvarez-Sánchez, and E. Fernández

Image Segmentation with a Hybrid Ensemble of One-Class Support Vector Machines . . . . . 254
   Boguslaw Cyganek

Power Prediction in Smart Grids with Evolutionary Local Kernel Regression . . . . . 262
   Oliver Kramer, Benjamin Satzger, and Jörg Lässig

Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques . . . . . 270
   A. Tellaeche, R. Arana, A. Ibarguren, and J.M. Martínez-Otzeta

Speaker Verification and Identification Using Principal Component Analysis Based on Global Eigenvector Matrix . . . . . 278
   Minkyung Kim, Eunyoung Kim, Changwoo Seo, and Sungchae Jeon

Hybrid Approach for Automatic Evaluation of Emotion Elicitation Oriented to People with Intellectual Disabilities . . . . . 286
   R. Martínez, K. López de Ipiña, E. Irigoyen, and N. Asla

Fusion of Fuzzy Spatial Relations . . . . . 294
   Nadeem Salamat and El-hadi Zahzah

Reducing Artifacts in TMS-Evoked EEG . . . . . 302
   Juan José Fuertes, Carlos M. Travieso, A. Álvarez, M.A. Ferrer, and J.B. Alonso
Model Driven Image Segmentation Using a Genetic Algorithm for Structured Data . . . . . 311
   Romain Raveaux and Guillaume Hillairet

Stamping Line Optimization Using Genetic Algorithms and Virtual 3D Line Simulation . . . . . 319
   Javier A. García-Sedano, Jon Alzola Bernardo, Asier González González, Óscar Berasategui Ruiz de Gauna, and Rafael Yuguero González de Mendivil

Evolutionary Industrial Physical Model Generation . . . . . 327
   Alberto Carrascal and Amaia Alberdi
Evolving Neural Networks with Maximum AUC for Imbalanced Data Classification . . . . . 335
   Xiaofen Lu, Ke Tang, and Xin Yao

A Neuro-genetic Control Scheme Application for Industrial R3 Workspaces . . . . . 343
   E. Irigoyen, M. Larrea, J. Valera, V. Gómez, and F. Artaza

Memetic Feature Selection: Benchmarking Hybridization Schemata . . . . . 351
   M.A. Esseghir, Gilles Goncalves, and Yahya Slimani

A Hybrid Cellular Genetic Algorithm for Multi-objective Crew Scheduling Problem . . . . . 359
   Fariborz Jolai and Ghazal Assadipour

GENNET-Toolbox: An Evolving Genetic Algorithm for Neural Network Training . . . . . 368
   Vicente Gómez-Garay, Eloy Irigoyen, and Fernando Artaza

An Evolutionary Feature-Based Visual Attention Model Applied to Face Recognition . . . . . 376
   Roberto A. Vázquez, Humberto Sossa, and Beatriz A. Garro

Efficient Plant Supervision Strategy Using NN Based Techniques . . . . . 385
   Ramon Ferreiro Garcia, Jose Luis Calvo Rolle, and Francisco Javier Perez Castelo

FDI and Accommodation Using NN Based Techniques . . . . . 395
   Ramon Ferreiro Garcia, Alberto De Miguel Catoira, and Beatriz Ferreiro Sanz

A Hybrid ACO Approach to the Matrix Bandwidth Minimization Problem . . . . . 405
   Camelia-M. Pintea, Gloria-Cerasela Crişan, and Camelia Chira
Machine-Learning Based Co-adaptive Calibration: A Perspective to Fight BCI Illiteracy . . . . . 413
   Carmen Vidaurre, Claudia Sannelli, Klaus-Robert Müller, and Benjamin Blankertz

Analysing the Low Quality of the Data in Lighting Control Systems . . . . . 421
   Jose R. Villar, Enrique de la Cal, Javier Sedano, and Marco García-Tamargo

Type-1 Non-singleton Type-2 Takagi-Sugeno-Kang Fuzzy Logic Systems Using the Hybrid Mechanism Composed by a Kalman Type Filter and Back Propagation Methods . . . . . 429
   Gerardo M. Mendez, Angeles Hernández, Alberto Cavazos, and Marco-Tulio Mata-Jiménez

An Hybrid Architecture Integrating Forward Rules with Fuzzy Ontological Reasoning . . . . . 438
   Stefano Bragaglia, Federico Chesani, Anna Ciampolini, Paola Mello, Marco Montali, and Davide Sottara

Selecting Regions of Interest in SPECT Images Using Wilcoxon Test for the Diagnosis of Alzheimer's Disease . . . . . 446
   D. Salas-Gonzalez, J.M. Górriz, J. Ramírez, Fermin Segovia, Rosa Chaves, Miriam López, I.A. Illán, and Pablo Padilla

Effective Diagnosis of Alzheimer's Disease by Means of Association Rules . . . . . 452
   Rosa Chaves, Javier Ramírez, J.M. Górriz, Miriam López, D. Salas-Gonzalez, I.A. Illán, Fermin Segovia, and Pablo Padilla

Exploratory Matrix Factorization for PET Image Analysis . . . . . 460
   A. Kodewitz, I.R. Keck, A.M. Tomé, J.M. Górriz, and Elmar W. Lang

NMF-Based Analysis of SPECT Brain Images for the Diagnosis of Alzheimer's Disease . . . . . 468
   Pablo Padilla, Juan-Manuel Górriz, Javier Ramírez, Elmar Lang, Rosa Chaves, Fermin Segovia, Ignacio Álvarez, Diego Salas-González, and Miriam López

Partial Least Squares for Feature Extraction of SPECT Images . . . . . 476
   Fermin Segovia, Javier Ramírez, J.M. Górriz, Rosa Chaves, D. Salas-Gonzalez, Miriam López, Ignacio Álvarez, Pablo Padilla, and C.G. Puntonet

Sensor Fusion Adaptive Filtering for Position Monitoring in Intense Activities . . . . . 484
   Alberto Olivares, J.M. Górriz, Javier Ramírez, and Gonzalo Olivares
Prediction of Bladder Cancer Recurrences Using Artificial Neural Networks . . . . . 492
   Ekaitz Zulueta Guerrero, Naiara Telleria Garay, Jose Manuel Lopez-Guede, Borja Ayerdi Vilches, Eider Egilegor Iragorri, David Lecumberri Castaños, Ana Belén de la Hoz Rastrollo, and Carlos Pertusa Peña

Hybrid Decision Support System for Endovascular Aortic Aneurysm Repair Follow-Up . . . . . 500
   Jon Haitz Legarreta, Fernando Boto, Iván Macía, Josu Maiora, Guillermo García, Céline Paloc, Manuel Graña, and Mariano de Blas
On the Design of a CADS for Shoulder Pain Pathology . . . . . 508
   K. López de Ipiña, M.C. Hernández, E. Martínez, and C. Vaquero

Exploring Symmetry to Assist Alzheimer's Disease Diagnosis . . . . . 516
   I.A. Illán, J.M. Górriz, Javier Ramírez, D. Salas-Gonzalez, Miriam López, Pablo Padilla, Rosa Chaves, Fermin Segovia, and C.G. Puntonet

Thrombus Volume Change Visualization after Endovascular Abdominal Aortic Aneurysm Repair . . . . . 524
   Josu Maiora, Guillermo García, Iván Macía, Jon Haitz Legarreta, Fernando Boto, Céline Paloc, Manuel Graña, and Javier Sanchez Abuín
Randomness and Fuzziness in Bayes Multistage Classifier . . . . . 532
   Robert Burduk

Multiple Classifier System with Radial Basis Weight Function . . . . . 540
   Konrad Jackowski

Mixture of Random Prototype-Based Local Experts . . . . . 548
   Giuliano Armano and Nima Hatami

Graph-Based Model-Selection Framework for Large Ensembles . . . . . 557
   Krisztian Buza, Alexandros Nanopoulos, and Lars Schmidt-Thieme

Rough Set-Based Analysis of Characteristic Features for ANN Classifier . . . . . 565
   Urszula Stańczyk

Boosting Algorithm with Sequence-Loss Cost Function for Structured Prediction . . . . . 573
   Tomasz Kajdanowicz, Przemyslaw Kazienko, and Jan Kraszewski
Application of Mixture of Experts to Construct Real Estate Appraisal Models . . . . . 581
   Magdalena Graczyk, Tadeusz Lasota, Zbigniew Telec, and Bogdan Trawiński

Designing Fusers on the Basis of Discriminants – Evolutionary and Neural Methods of Training . . . . . 590
   Michal Wozniak and Marcin Zmyslony
Author Index . . . . . 599
SIFT-SS: An Advanced Steady-State Multi-Objective Genetic Fuzzy System

Michel González (1), Jorge Casillas (2), and Carlos Morell (1)

(1) Universidad Central "Marta Abreu" de Las Villas, Cuba
    {michelgb,cmorellp}@uclv.edu.cu
(2) Dept. Computer Science and Artificial Intelligence, University of Granada, Spain
[email protected] Abstract. Nowadays, automatic learning of fuzzy rule-based systems is being addressed as a multi-objective optimization problem. A new research area of multi-objective genetic fuzzy systems (MOGFS) has capture the attention of the fuzzy community. Despite the good results obtained, most of existent MOGFS are based on a gross usage of the classic multi-objective algorithms. This paper takes an existent MOGFS and improves its convergence by modifying the underlying genetic algorithm. The new algorithm is tested in a set of real-world regression problems with successful results.
1 Introduction
In the last few years, the number of publications in which automatic learning of fuzzy rule-based systems (FRBSs) is defined as a multi-objective optimization problem has grown [1,2]. In this approach, several interpretability and accuracy metrics are optimized during the learning process, and higher quality models are obtained. Multi-objective genetic fuzzy systems (MOGFSs) are playing a fundamental role in this quest. They take the best of two research fields. On the one hand, genetic fuzzy systems (GFSs) have a solid base of data structures and coding schemes that can be used to simultaneously learn many features of the FRBS. On the other hand, multi-objective evolutionary algorithms (MOEAs) are among the best and most versatile techniques for multi-objective optimization. Despite the advances obtained, there is still much work to be done. Most existing MOGFSs are based on an off-the-shelf usage of standard MOEAs like NSGA-II [3] and SPEA2. However, there are recent results about the limitations of current MOEAs and new multi-objective techniques that require attention from the MOGFS community [4]. Besides, there are specific problem requirements in fuzzy modeling that can be taken into account to enhance the search process [5]. This paper is a case study in which the specificities of fuzzy modeling are incorporated into an existing MOGFS. For the study we focus on the algorithm SIFT [6] (Section 2) due to its high number of optimization objectives, its good performance, and its standard MOEA core based on NSGA-II.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 1–8, 2010.
© Springer-Verlag Berlin Heidelberg 2010
The proposed SIFT-SS (Section 3) introduces the following modifications in the original algorithm:

– The generational scheme is changed to an iterational scheme.
– A new Objective Scaled Crowding Distance is introduced (Section 3.1).
– A new Crowding-Based Mating heuristic is introduced (Section 3.2).
– The population size is dynamically adjusted (Section 3.3).
– The phenotypical copies are removed (Section 3.4).
These modifications do not interfere with the algorithm's specific components, so they can be easily implemented in other existing algorithms.
2 SIFT
Simplification of Fuzzy Models by Tuning (SIFT) [6] is a multi-objective genetic algorithm with a generational evolutionary scheme based on NSGA-II [3]. It tunes the whole database definition (fuzzy variables, number and type of linguistic terms, and membership function parameters), while the fuzzy rule base is adapted to the tuned database by a greedy approach. The individuals are optimized according to three objectives: the mean square error (MSE) over the training dataset, the total number of linguistic terms (NL), and the number of fuzzy rules (NR). The output is a Pareto set of optimal fuzzy models.

SIFT presents a set of advantages. It is highly efficient for large-scale regression problems. It generates very legible fuzzy partitions thanks to its interpretability constraints and the fact of tuning the complete database definition. Besides, the obtained models are small and very accurate because of the multi-objective approach.

Although SIFT is a good algorithm, its evolutionary process can be improved. As explained by Gacto [5], there are specific issues that need to be considered in the integration of MOEAs and GFSs. In the case of SIFT, a careful analysis shows that two of the objectives (NL and NR) are correlated by definition and converge faster than the third objective (MSE), pulling the search toward local minima. This leads to many solutions with a very low number of rules and poor fitness that are not very useful.

Another disadvantage of SIFT is its inability to work with large population sizes. It takes a long time for a generational algorithm like NSGA-II to reach a sufficient number of iterations with a large population. Although the generational nature of SIFT allows the use of parallelism, there are not always sufficient computational resources to do it.
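The three objectives above are all minimized, so the Pareto set mentioned in the text is defined by the usual dominance relation. As a minimal illustration (the concrete numbers are invented for the example, not taken from the paper):

```python
def dominates(a, b):
    """True if solution `a` Pareto-dominates `b` (all objectives minimized):
    `a` is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Each solution is a tuple (MSE, number of linguistic terms, number of rules).
s1 = (0.12, 15, 9)
s2 = (0.20, 18, 12)   # worse on every objective, so s1 dominates it
s3 = (0.05, 21, 14)   # more accurate but more complex: incomparable with s1

assert dominates(s1, s2)
assert not dominates(s1, s3) and not dominates(s3, s1)
```

Solutions like s1 and s3 end up in the same non-dominated front, which is exactly the situation where a secondary criterion such as the crowding distance (Section 3.1) decides between them.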
3 SIFT-SS: An Improved Steady-State Version of SIFT
The proposed steady-state SIFT (SIFT-SS) reuses the codification, genetic operators and evaluation of SIFT, and modifies only the underlying genetic algorithm. The changes include the substitution of the generational scheme by an iterational scheme. The iterational scheme improves convergence because each newborn individual is instantly introduced into the population. As a result, a faster advance is expected for the same number of evaluations. The SIFT-SS core consists of the following steps:
1. Generate an initial population P and evaluate P.
2. Build a dominance rank and calculate the crowding distance for every front (see Section 3.1 for the objective scaled crowding distance).
3. While the maximum number of iterations has not been reached:
   (a) Select two parents from P (see Section 3.2 for mating heuristics).
   (b) Cross and mutate the selected parents to produce two new individuals.
   (c) Evaluate the new individuals.
   (d) Insert the new individuals into P, rebuild the rank and update the crowding distance values (see Section 3.4 for the copies check).
   (e) Remove the worst individuals and adjust the population size (see Section 3.3 for the variable population size).
4. Output P.
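The steps above can be expressed as a generic steady-state loop. The following Python sketch is illustrative only; every callback (evaluate, select_parents, crossover, mutate, rank_and_crowd, trim) is a hypothetical placeholder standing in for the corresponding SIFT-SS component, not the authors' implementation.

```python
def steady_state_loop(init_pop, evaluate, select_parents, crossover, mutate,
                      rank_and_crowd, trim, max_iterations):
    """Minimal steady-state (iterational) MOEA skeleton: each newborn pair
    enters the population immediately instead of waiting for a generation."""
    population = [(ind, evaluate(ind)) for ind in init_pop]  # step 1
    rank_and_crowd(population)                    # step 2: fronts + crowding
    for _ in range(max_iterations):               # step 3
        p1, p2 = select_parents(population)       # (a) mating (Sect. 3.2)
        c1, c2 = crossover(p1, p2)                # (b) variation
        c1, c2 = mutate(c1), mutate(c2)
        children = [(c, evaluate(c)) for c in (c1, c2)]  # (c) evaluation
        population.extend(children)               # (d) instant insertion
        rank_and_crowd(population)
        trim(population)                          # (e) drop worst, adjust size
    return population                             # step 4: output P
```

The only structural difference from a generational scheme is that ranking and trimming happen after every pair of offspring, so selection pressure reflects each new individual immediately.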
3.1 Objective Scaled Crowding Distance (OSCD)
All multi-objective evolutionary algorithms based on dominance selection sooner or later get stuck when all the solutions in the population are non-dominated; the higher the number of objectives, the sooner this happens [4]. NSGA-II is a victim of this issue: its first selection criterion becomes unable to distinguish the best individuals. Of course, there is no such thing as best individuals when they are all non-dominated. The second selection criterion in NSGA-II is the Crowding Distance (CD) [3]. This measure acts as an estimation of the density of solutions surrounding a particular solution. When the solution ranking is built, the most isolated solutions are preferred. The CD preserves the diversity of the population and also leads to an equally spread solution set. But is an equally spread solution set always the most representative or desired one? Although this may be true in other optimization problems, in the case of a rule-based system learning algorithm like SIFT the researcher may be more interested in obtaining accurate solutions with a higher number of rules than very inaccurate solutions with few rules [5]. In this section we present a new Objective Scaled Crowding Distance (OSCD) that extends the traditional definition in order to take the objective values into account. The OSCD is equal to the product of the traditional CD and an Expansion Factor K, given by expression (1):

K(s) = b_s^i (e_i - 1) + 1    (1)

where b_s^i ∈ [0, 1] indicates the relative "goodness" of the solution s for the objective i among the rest of solutions in the same Pareto front, and the parameter e_i ∈ [1, ∞) establishes the level of strength desired for objective i (e.g. 1x, 2x, ...).
M. González, J. Casillas, and C. Morell
There are many ways to measure the relative goodness of a solution for an objective. In the particular case of SIFT, in which solutions with better MSEs are preferred, b_s^MSE is defined as the relative position of the solution s in the list of solutions ordered by MSE. Therefore, the simplified expression for OSCD in SIFT is as follows:

OSCD(s) = CD(s) · [ (p_mse(s) / n) (e_mse - 1) + 1 ]    (2)

with p_mse(s) being the inverted ranking of the solution s with respect to the MSE objective (n being the value of the most accurate solution and 1 that of the worst). The expansion of the crowding distance allows an increased solution density in the zone with better MSE values. This allows more activity in the evolution of accuracy than in the rest of objectives but, unlike Gacto's approach [5], the OSCD is based on the control of crowding rather than on SPEA2_Acc.
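Equation (2) can be sketched in Python as follows; the function name and the list-based interface are illustrative assumptions, not the authors' code.

```python
def objective_scaled_cd(crowding, mse, e_mse=2.0):
    """Objective Scaled Crowding Distance (Eq. 2), sketched for one front.
    crowding: standard crowding distances; mse: matching MSE values.
    The best (lowest-MSE) solution gets inverted rank n, the worst gets 1."""
    n = len(mse)
    order = sorted(range(n), key=lambda i: mse[i])   # best MSE first
    p = [0] * n
    for pos, i in enumerate(order):
        p[i] = n - pos                               # inverted ranking p_mse
    # OSCD(s) = CD(s) * [ (p_mse(s)/n)(e_mse - 1) + 1 ]
    return [crowding[i] * ((p[i] / n) * (e_mse - 1) + 1) for i in range(n)]
```

With e_mse = 2 ("2x"), the crowding distance of the most accurate solution is doubled, while the worst one is expanded only marginally.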
3.2 Crowding-Based Mating (CBM)
The Crowding-Based Mating (CBM) considers the crowding distance in order to exploit the most promising solutions. It works in the following way:
1. Select the first parent p1 at random from the first front, using a discrete probability distribution proportional to each individual's crowding distance.
2. Select the second parent p2 by binary tournament selection.
The use of CBM in combination with OSCD guarantees that one of the two parents is likely to be an accurate and isolated solution. This presses toward recombining the most accurate solutions in order to obtain a lower number of rules [5], but also preserves diversity. To avoid over-fitting, CBM is applied with probability P_cbm; otherwise, both parents are selected by binary tournament, in the same way as in the original SIFT.
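The mating scheme can be sketched as below. The helper names are hypothetical, `fitness_of` stands for whatever scalar the binary tournament compares, and boundary solutions with infinite crowding distance would need to be capped before being used as weights.

```python
import random

def cbm_select_parents(front, crowding, population, fitness_of, p_cbm=0.5):
    """Crowding-Based Mating sketch. With probability p_cbm the first parent
    is drawn from the first front proportionally to crowding distance; the
    second parent always comes from a binary tournament."""
    def binary_tournament(pool):
        a, b = random.sample(pool, 2)
        return a if fitness_of(a) <= fitness_of(b) else b  # lower = better
    if random.random() < p_cbm:
        p1 = random.choices(front, weights=crowding, k=1)[0]
    else:
        p1 = binary_tournament(population)
    p2 = binary_tournament(population)
    return p1, p2
```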
3.3 Variable Population Size (VPS)
The variable population size (VPS) is one of the fundamental strengths of the iterational scheme, since it can flexibly include a larger number of solutions without compromising the overall performance too much. The population size in SIFT-SS can dynamically grow when all the individuals are non-dominated and dynamically shrink when dominated individuals appear. The variation is controlled between a minimum and a maximum number of individuals defined by the user. The VPS gives a certain degree of flexibility to keep optimal solutions that otherwise would have been eliminated. In problems that tend to produce many non-dominated solutions (either because of a large search space or the use of many objectives), this type of population size management can be useful.
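The paper only specifies that the size varies between user-defined bounds; one plausible policy is sketched below. The exact grow/shrink rule is an assumption for illustration, not taken from SIFT-SS.

```python
def adjust_population_size(population, n_dominated, size_min=30, size_max=60):
    """Variable Population Size sketch: keep the whole population (up to
    size_max) while every individual is non-dominated, and shrink toward
    size_min as dominated individuals appear. Returns the target size;
    the caller trims the worst individuals accordingly."""
    if n_dominated == 0:                       # all non-dominated: allow growth
        return min(size_max, len(population))
    return max(size_min, len(population) - n_dominated)
```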
3.4 Copies Check (CC)
SIFT-SS also implements a copies check routine that prevents the insertion of phenotypical copies (i.e., solutions with the same objective values) in the population. The decision to remove copies in SIFT-SS rests on two considerations. On the one hand, a copy consumes space in the population with redundant phenotypical information that does not help the selection process. On the other hand, in the iterational scheme, copies are not as important for the survival and reproduction of elite individuals as they are in a generational scheme.
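A minimal sketch of the check is given below; the tolerance parameter is an assumption for numerical objectives, while the paper compares objective values directly.

```python
def is_phenotypical_copy(candidate_objectives, population_objectives, tol=0.0):
    """Copies Check sketch: a candidate is a phenotypical copy if some member
    of the population already has the same objective values (MSE, NL, NR)."""
    return any(all(abs(a - b) <= tol for a, b in zip(candidate_objectives, o))
               for o in population_objectives)
```

The new individual from step 3(d) would only be inserted when this test returns False.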
4 Experimental Analysis
This section compares the original SIFT against nine different configurations of the proposed SIFT-SS. Configuration 1 was chosen to observe whether the new iterational approach improves on the generational approach. Configurations 2, 3 and 4 study the effect of the CC, the VPS and the OSCD heuristics separately. Configurations 5 and 6 test the OSCD reinforced with CBM in half and in all of the crossovers, respectively. Finally, configurations 7, 8 and 9 are analogous to 4, 5 and 6 with the inclusion of CC. The detailed parameter specifications for the SIFT-SS configurations are listed in Table 1 (parameters that are not used are marked with a dash). The experimentation has been performed with a 5-fold cross validation. Each algorithm was executed with six different random seeds, so a total of 30 experiments per problem and algorithm were done. The comparison covered twelve real-world regression problems of increasing complexity (from 2 to 40 input variables, from 43 to 16,599 instances). All experiments were initialized with a population of 30 individuals; the crossover probability was set to 0.7 and the mutation probability to 0.2. The stop condition was set to 50,000 evaluations. As regards the initial maximum
Table 1. Algorithm configurations (a dash means the parameter is not used)

Algorithm   P_cbm  E_mse  OSCD  VPS    CC   Description
sift-ss.1   -      -      no    -      no   sift-ss
sift-ss.2   -      -      no    -      yes  sift-ss + cc
sift-ss.3   -      -      no    30-60  no   sift-ss + vps 30-60
sift-ss.4   -      2x     yes   -      no   sift-ss + oscd 2x
sift-ss.5   0.5    2x     yes   -      no   sift-ss + cbm 50% + oscd 2x
sift-ss.6   1.0    2x     yes   -      no   sift-ss + cbm 100% + oscd 2x
sift-ss.7   -      2x     yes   -      yes  sift-ss + oscd 2x + cc
sift-ss.8   0.5    2x     yes   -      yes  sift-ss + cbm 50% + oscd 2x + cc
sift-ss.9   1.0    2x     yes   -      yes  sift-ss + cbm 100% + oscd 2x + cc
number of linguistic terms, in the output variable it was set to seven, while in the input variables it was set to seven for problems with 2 input variables (diabetes and ele1), five for problems with 4 to 9 input variables (laser, ele2, dee, concrete, and ankara), and three for problems with 15 to 40 input variables (mortgage, treasury, elevators, compactiv, and ailerons). To measure the convergence improvement of each algorithm we used the Generational Distance (GD) proposed by Van Veldhuizen [7], expressed as follows:

GD(S) = (1/|S|) Σ_{x∈S} min { ||f(x) - f(y)|| : y ∈ S* }    (3)
where S is the Pareto solution set of the algorithm, S* is the true Pareto-optimal solution set for the problem, and ||f(x) - f(y)|| is the Euclidean distance between two solutions in the objective space (objective values were scaled to [0, 1]). Since S* is unknown in the considered real-world problems, we use as S* the set of non-dominated solutions among all solutions examined in our computational experiments [8], i.e., the joined Pareto set obtained by all the analyzed algorithms. The GD measures the proximity of a solution set S to the "best known" solution set S* for a problem. Considering that all algorithms produce the same number of individuals with the same genetic operators, a significant reduction in the average GD means that the proposed modification provides a better orientation of the search process. The average values of GD for each problem are reported in Table 2. At first glance, the iterational scheme on its own (configuration 1) did not show the expected improvement in GD compared to SIFT. Nevertheless, its major advantage (i.e., dealing with larger population sizes) was not evaluated in these experiments. As recommended by Demšar [9], the average values of GD were statistically processed using a Friedman test. The test detected highly significant differences (p < 0.05) among the algorithms. The mean ranks (Table 2) confirm the better performance of configurations 8 and 9. Next, a Wilcoxon signed-ranks test was applied between each pair of algorithms; Table 3 shows the summarized results. Above the diagonal, there is a "+" when the row is significantly better than the column (p < 0.05), a "-" when the column is significantly better than the row, and a "=" when there is no significant difference between them. Below the diagonal appear the p-values of the test. The results of Table 3 show that the full combination of OSCD, CBM and CC (configurations 8 and 9) achieves the best results. The test supports a highly significant improvement of the GD compared to the rest, except for configuration 7. The OSCD performs really well, whether in combination with CC or alone (configurations 4 and 7). The increased density in the zone of accurate solutions makes a substantial reduction of the number of rules possible. The effect of OSCD can be observed in Figure 1, where some of the best solutions have been plotted.
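For reference, the GD of equation (3) follows directly from its definition; this sketch assumes the objective vectors have already been scaled to [0, 1].

```python
import math

def generational_distance(S, S_star):
    """Generational Distance (Eq. 3): mean Euclidean distance, in objective
    space, from each solution in S to its nearest member of S*."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return sum(min(dist(x, y) for y in S_star) for x in S) / len(S)
```

A GD of zero means every solution of S lies on the best-known front S*.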
Table 2. Average Generational Distance (columns 1-9 are SIFT-SS configurations)

Dataset     SIFT    1       2       3       4       5       6       7       8       9
diabetes    0.0769  0.0789  0.0746  0.0895  0.0921  0.0840  0.0862  0.0783  0.0774  0.0888
ele1        0.0619  0.0557  0.0596  0.0595  0.0576  0.0609  0.0623  0.0619  0.0530  0.0589
laser       0.0906  0.0911  0.0865  0.0758  0.0791  0.0816  0.0801  0.0712  0.0767  0.0790
ele2        0.0867  0.0682  0.0770  0.0863  0.0706  0.0671  0.0691  0.0645  0.0599  0.0540
dee         0.0440  0.0393  0.0370  0.0354  0.0361  0.0364  0.0373  0.0358  0.0361  0.0343
concrete    0.0531  0.0560  0.0587  0.0587  0.0510  0.0528  0.0574  0.0535  0.0514  0.0509
ankara      0.1176  0.1314  0.1119  0.1462  0.0717  0.0912  0.0771  0.0857  0.0589  0.0613
mortgage    0.1546  0.1921  0.1379  0.1824  0.1533  0.1171  0.1387  0.1601  0.1333  0.1178
treasury    0.0922  0.1091  0.1146  0.1660  0.0914  0.0862  0.0817  0.0751  0.0896  0.0625
elevator    0.0863  0.0808  0.0522  0.0782  0.0506  0.0718  0.0747  0.0668  0.0585  0.0587
compactiv   0.0471  0.0498  0.0486  0.0505  0.0507  0.0515  0.0470  0.0473  0.0424  0.0405
ailerons    0.0653  0.0453  0.0526  0.0562  0.0481  0.0569  0.0489  0.0429  0.0436  0.0473
Mean Ranks  7.50    7.00    6.17    7.50    5.17    5.83    6.08    4.33    2.67    2.75
Table 3. Wilcoxon signed-rank test. Above the diagonal: significance symbols for row vs. column; below the diagonal: p-values. S stands for the original SIFT; digits stand for SIFT-SS configurations.

      8      9      7      4      5      6      2      1      S      3
8     -      =      =      +      +      +      +      +      +      +
9     0.937  -      =      +      +      +      +      +      +      +
7     0.136  0.099  -      =      =      =      =      +      =      +
4     0.019  0.028  0.695  -      =      =      =      =      +      +
5     0.050  0.012  0.084  0.638  -      =      =      =      +      =
6     0.012  0.005  0.209  0.937  1.000  -      =      =      +      +
2     0.012  0.034  0.272  0.209  0.272  0.347  -      =      =      +
1     0.002  0.019  0.012  0.099  0.182  0.209  0.084  -      =      =
S     0.003  0.010  0.060  0.034  0.015  0.530  1.000  0.034  -      =
3     0.005  0.006  0.005  0.034  0.071  0.023  0.034  0.209  0.480  -

Fig. 1. Average Pareto solutions. [Figure: three panels (Ele2, Ankara, Ailerons) plotting O1 (RMSE) against O2 (#R), comparing sift, selected sift-ss configurations (9, 8 and 4) and the joined optimal front.]
The CBM reinforced the effect of the OSCD, but this only led to better performance when copies were avoided by the CC approach. In general, the removal of copies (CC) was found to be important when using OSCD or CBM, as can be observed by comparing configurations 7, 8 and 9 with 4, 5 and 6. The VPS approach did not make any difference: due to the interpretability constraints of SIFT, the number of non-dominated solutions was not high and the modification was not effective.
5 Conclusion and Further Work
We have proposed an improved MOGFS by modifying the underlying genetic algorithm to consider the specific needs of fuzzy modeling. The proposed SIFT-SS implements heuristics such as OSCD and CBM that allow a better trade-off between the accuracy and interpretability objectives. In the future, these heuristics will be analyzed in the generational approach of SIFT. Further experimentation with larger search spaces and more objectives will also be carried out to test the true capacities of the iterational approach and the VPS.
References

1. Ishibuchi, H.: Multiobjective genetic fuzzy systems: review and future research directions. In: Proc. of 2007 IEEE International Conference on Fuzzy Systems, London, UK, July 23-26, pp. 913-918 (2007)
2. Ishibuchi, H.: Evolutionary multiobjective design of fuzzy rule-based systems. In: Proc. of 2007 IEEE Symposium on Foundations of Computational Intelligence, Honolulu, USA, April 1-5, pp. 9-16 (2007)
3. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182-197 (2002)
4. Ishibuchi, H., Tsukamoto, N., Nojima, Y.: Evolutionary many-objective optimization. In: Proc. of 3rd International Workshop on Genetic and Evolving Fuzzy Systems, Witten-Bommerholz, Germany, pp. 47-52 (2008)
5. Gacto, M.J., Alcalá, R., Herrera, F.: Adaptation and application of multi-objective evolutionary algorithms for rule reduction and parameter tuning of fuzzy rule-based systems. Soft Computing 13(5), 419-436 (2008)
6. Casillas, J.: Efficient multi-objective genetic tuning of fuzzy models for large-scale regression problems. In: Proc. of 2009 IEEE International Conference on Fuzzy Systems, Jeju, Republic of Korea, pp. 1712-1717 (2009)
7. Van Veldhuizen, D.A.: Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. Ph.D. dissertation, Air Force Institute of Technology, Dayton (1999)
8. Ishibuchi, H., Narukawa, K., Tsukamoto, N., Nojima, Y.: An empirical study on similarity-based mating for evolutionary multiobjective combinatorial optimization. European Journal of Operational Research 188(1), 57-75 (2008)
9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1-30 (2006)
Evolving Multi-label Classification Rules with Gene Expression Programming: A Preliminary Study

José Luis Ávila-Jiménez, Eva Gibaja, and Sebastián Ventura

Department of Computer Sciences and Numerical Analysis, University of Córdoba
Abstract. The present work describes a preliminary study of a genetic programming algorithm to deal with multi-label classification problems. The algorithm uses Gene Expression Programming and codifies a classification rule into each individual. A niching technique assures diversity in the population. The final classifier is made up of a set of rules for each label that determines whether a pattern belongs to the label or not. The proposal has been tested on several domains and compared with other multi-label algorithms, and the results show that it is especially suitable for nominal data sets.
1 Introduction
Most classification problems associate only one label per pattern, l_i, from a set of disjoint labels, L. Nevertheless, this is not the only possible scenario: in many increasingly relevant problems, like text and sound categorization, protein and gene classification or semantic scene classification, a pattern can have associated not just one, but a set of class labels, Y ⊆ L. To deal with this kind of situation, an automatic learning paradigm called multi-label classification has emerged, which avoids the one-label-per-pattern restriction. Thus, the main characteristic of multi-label classification is that labels are not mutually exclusive, allowing patterns with more than one associated label. Multi-label classification problems have basically been dealt with from two points of view [1]. On the one hand, some studies describe transformation methods, also called preprocessing techniques, which transform an original multi-label problem into several single-label problems, allowing the use of a classical classification algorithm. On the other hand, many algorithm adaptation techniques have been developed in order to adapt a classical algorithm to work directly with multi-label data. For instance, some SVM approaches have been developed [2]. With reference to other techniques, an adaptation of the K-nearest neighbor algorithm, called ML-KNN, has been proposed in [3]. The use of syntactic trees for hierarchical multi-label classification has been studied in [4], and an adaptation of the C4.5 algorithm to multi-label classification is presented in [5]. In spite
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 9-16, 2010. © Springer-Verlag Berlin Heidelberg 2010
of their wide use in classical classification, bio-inspired approaches have rarely been applied to solve multi-label problems. Worth highlighting are the MULAM algorithm based on ant colonies [6] and the work of Vallim et al. [7], where a genetic algorithm is proposed. In addition, the authors have developed a genetic programming model that uses discriminant functions to learn multi-label classifiers [8]. However, there are few multi-label approaches that allow building understandable models the user can interpret as useful knowledge. That is why this research proposes a multi-label classification algorithm designed for finding rules, which are more useful in certain domains. The proposed technique is based on GEP [9], a genetic programming paradigm successfully applied to classification problems. The global results point out that our approach is able to obtain results that are better than or comparable to other classical multi-label approaches. The paper is organized as follows: first we describe our proposal, the experiments carried out and the metrics used to measure the performance of the multi-label algorithms studied. Finally, the results are shown, together with conclusions about our study and future research lines.
2 GC-ML Algorithm
This section describes the most relevant features of the proposed algorithm, called GC-ML (GEPCLASS Multi-label). It is a genetic programming algorithm that maintains a population of individuals, each of which will potentially be part of a multi-label classifier. The population evolves toward better classifiers through the application of genetic operators like selection, mutation and crossover. First, the individual representation and evaluation are described; after that, the overall operation of the algorithm is presented.
2.1 Individual Representation
The proposed algorithm tries to learn a set of rules for each label in the problem in order to build a multi-label classifier. Each rule has the form if A then L, where A is the clause that must be satisfied by the pattern to be associated with a label L; it is composed of a set of conditions joined by logical operators (and, or, not). Each condition is a combination of an attribute and a constant value joined by a relational operator (=, ≠, > or <).
Fig. 1. Example of a GC-ML individual. [Figure: the genotype (head and tail sections of the genes) and the corresponding phenotype (expression tree), which encodes the classification rule IF (((a=4) OR (b>3)) AND ((NOT (b>4)) AND (c>3))) THEN Label.]
The final classifier is composed of a set of rules for each class. Thus, whenever any of these rules determines that the pattern belongs to the class, the classifier considers the pattern as belonging to the class, regardless of the results yielded by the rest.
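This "any rule fires" combination can be sketched as follows; representing each rule as a Python predicate over a pattern is an illustrative assumption.

```python
def classify(pattern, rules_per_label):
    """Sketch of the final GC-ML classifier: a label is assigned whenever ANY
    of its rules fires, regardless of the remaining rules.
    rules_per_label maps label -> list of predicates over a pattern."""
    return {label for label, rules in rules_per_label.items()
            if any(rule(pattern) for rule in rules)}
```

Because the per-label rule sets are independent, a pattern may receive several labels, or none.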
2.2 Individual Evaluation
GC-ML has a population of individuals with the above-mentioned features. Due to the multi-label nature of the problem, during the evaluation stage each of these individuals is evaluated for each label in the training set by means of a fitness function. Thus, instead of a unique fitness value, a fitness vector is stored for each individual, holding one fitness value per label. It is necessary to point out that during the application of the selection operators, only the greatest fitness value in the vector is considered. The fitness function used is the harmonic mean of precision and recall, also known as the F-score. Precision for a label is the number of items correctly labeled as belonging to the class divided by the total number of patterns that the classifier has considered as belonging to the label, and recall is the number of items correctly labeled divided by the total number of patterns that actually belong to the label. The expressions of precision, recall and F-score are the following:

precision = tp / (tp + fp)    (1)

recall = tp / (tp + fn)    (2)

F-score = (2 × precision × recall) / (precision + recall)    (3)
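The per-label fitness can be computed from the standard counts of true positives (tp), false positives (fp) and false negatives (fn); the zero-division guards below are an assumption for degenerate cases, not specified in the paper.

```python
def f_score(tp, fp, fn):
    """Per-label fitness used by GC-ML: harmonic mean of precision and recall
    (precision = tp/(tp+fp), recall = tp/(tp+fn))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```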
2.3 Evolutionary Algorithm
The algorithm developed is similar to that proposed for GEP [9], but it has to calculate n fitness values for each individual, and it must also perform the multi-label token competition to correct the fitness of the individuals after evaluation. The algorithm starts by generating the initial population and, while the maximum number of generations has not been reached, the following actions are performed in each generation:
1. For each label present in the problem, all individuals are evaluated and the fitness array is filled.
2. After the evaluation of the individuals, the algorithm applies the Token Competition [11] technique to correct the calculated fitness values. This approach has been widely used in other genetic algorithms applied to classification problems [12]. GC-ML carries out a token competition for each label, where a token is played for each positive pattern associated with the label. The token is won by the individual with the highest fitness that correctly classifies it. After token distribution, the algorithm corrects the label fitness of each individual using the following expression:

new_fitness = original_fitness × (tokens_won / total_tokens)    (4)
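The per-label token competition of equation (4) can be sketched as below; the `covers` predicate and the list-based interface are illustrative assumptions.

```python
def token_competition(individuals, fitness, covers, positives):
    """Token Competition sketch for one label: each positive pattern is a
    token won by the fittest individual that classifies it correctly;
    each fitness is then rescaled by the fraction of tokens won (Eq. 4)."""
    tokens = {i: 0 for i in range(len(individuals))}
    for pattern in positives:
        winners = [i for i in range(len(individuals))
                   if covers(individuals[i], pattern)]
        if winners:
            best = max(winners, key=lambda i: fitness[i])
            tokens[best] += 1
    total = len(positives)
    return [fitness[i] * tokens[i] / total for i in range(len(individuals))]
```

Individuals that win no tokens end up with zero fitness, which is what later allows them to be discarded from the final classifier.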
3. After the token competition, the best individuals are selected to constitute the next generation. This selection uses the best fitness value of each individual. The genetic operators are then applied over the population.
When the algorithm finishes, it is easy to find the individuals that must be part of the learned classifier: only those individuals that have won some tokens are relevant for the classifier, and the rest can be rejected.
3 Experimental Section
The implementation of the algorithm was carried out using the JCLEC library [13]. JCLEC is a framework, implemented in Java, for developing evolutionary computation applications; it provides a set of generic classes that abstract the main features present in evolutionary algorithms. The objective of the experiments is to determine the performance of the proposed algorithm and to compare it with other multi-label proposals, both transformation methods and pure multi-label methods. Our algorithm has been compared to three other methods, namely Binary Relevance (BR), Label Powerset (LP) and ML-KNN. Both BR and LP are problem transformation methods [14], while ML-KNN is an implementation of the k-nearest neighbor method specifically designed for multi-label data. The configuration parameters of the algorithm were obtained by testing prior to the main experiments. A population of 5000 individuals has been used, each of them with 6 genes with a head length of 35. The maximum number of generations is 60 and the crossover probability is 0.8. In addition, the mutation and transposition probability is 0.2. The selection method is a tournament of size 2. The four data sets used to perform the experiments are scene [15], yeast [16], genbase [17] and medical [18]. These data sets belong to a wide variety of application domains and have been used in other multi-label studies covering several paradigms. Table 1 shows the main characteristics of these data sets, including label cardinality, the average number of labels per example, and label density, which is the same number divided by the number of labels in the problem, |L|.

Table 1. Characteristics of datasets

Dataset   #Patterns  #Labels  Cardinality  Density
Scene     2407       6        1.061        0.176
Genbase   662        27       1.252        0.046
Yeast     2417       14       4.228        0.302
Medical   978        45       1.245        0.028
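The two transformation baselines, Binary Relevance and Label Powerset, can be sketched on a toy dataset of (pattern, label-set) pairs; the interfaces are illustrative assumptions.

```python
def binary_relevance(dataset, labels):
    """Binary Relevance sketch: build one binary dataset per label, so any
    classical single-label classifier can be trained per label."""
    return {l: [(x, l in y) for x, y in dataset] for l in labels}

def label_powerset(dataset):
    """Label Powerset sketch: treat each distinct label set as one class
    of an ordinary multi-class problem."""
    return [(x, frozenset(y)) for x, y in dataset]
```

BR ignores label correlations but scales linearly in |L|, while LP preserves correlations at the cost of a class per observed label combination.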
4 Results and Discussion
The measures of accuracy, precision and recall have been used in order to compare GC-ML with the BR, LP and ML-KNN methods. Accuracy is the number of correct answers of the classifier divided by the total number of answers (Equation 5), while precision and recall are the same measures used to calculate the fitness function, previously shown in Equations 1 and 2. A 10-fold cross validation has been performed for each data set and algorithm. Table 2 shows the average metrics.

Accuracy = (tp + tn) / (tp + tn + fp + fn)    (5)
Fig. 2. Example of a discovered rule:
IF (PARAM_48 > 0.5) OR (((PARAM_19 = 0.4) AND (PARAM_71 != 0.1)) OR (PARAM_13 > 0.0)) OR ((NOT (PARAM_60 0.8) OR ((PARAM_36 = 0.1) AND (PARAM_11 = 0.4)) OR (PARAM_46 != -6.0)) OR ((NOT (PARAM_54 > 0.3) OR (PARAM_11 = 0.2)) AND PARAM_14 = 1.6)) OR ((PARAM_42 > 3.1) AND ((PARAM_16 = 0.3) OR (PARAM_11 = 0.4)) OR (PARAM_12 < -1.0)) THEN LABEL_1
Table 2. Experimental results

Accuracy values
Dataset   Bin. Rel.  Label Pow.  ML-KNN  GC-ML
Scene     0.434      0.577       0.629   0.571
Genbase   0.273      0.684       0.638   0.778
Yeast     0.421      0.398       0.492   0.432
Medical   0.592      0.617       0.560   0.649

Precision values
Scene     0.443      0.602       0.660   0.554
Genbase   0.276      0.694       0.674   0.753
Yeast     0.527      0.528       0.732   0.672
Medical   0.651      0.678       0.573   0.702

Recall values
Scene     0.815      0.591       0.678   0.690
Genbase   0.273      0.654       0.638   0.683
Yeast     0.619      0.528       0.549   0.572
Medical   0.619      0.650       0.568   0.699
In general, the proposed algorithm obtains results that are better than or comparable to those of the other algorithms for each data set and measure. Notably, our algorithm obtains better results with categorical datasets. For example, on scene and yeast, which are numeric data sets, GC-ML obtains good accuracy results, similar to those obtained by the rest. However, on medical and genbase, the proposed algorithm obtains better accuracy. This behavior can be observed with the other metrics as well, and it is reasonable because classification rules are more suitable for dealing with nominal attributes than with numerical ones. It is also worth noting that, regardless of the kind of data set, our algorithm obtains in most cases better values than the transformation methods (LP and BR).
5 Conclusions
The present work has introduced GC-ML, an algorithm for multi-label classification. It is an evolutionary proposal, based on GEP, where individuals encode classification rules that determine whether a pattern belongs to a particular label in a multi-label context. The final classifier is built as the combination of multiple rules present in the population. The algorithm implements a token competition technique specifically designed to deal with multi-label patterns, whose aim is to ensure that there are individuals in the population associated with all the classes present in the problem. Studies carried out to verify the performance of the GC-ML algorithm with respect to other alternatives indicate that it obtains better results than problem transformation proposals like BR and LP. Besides, it obtains results similar to, or in most cases better than, the multi-label implementation of the KNN algorithm. The experiments also show that the proposed algorithm is well suited to nominal data sets. Regarding future research, the algorithm is being tested with other datasets and compared with other approaches to multi-label classification.
Acknowledgements. This work has been financed in part by the TIN2008-06681-C06-03 project of the Spanish Inter-Ministerial Commission of Science and Technology (CICYT), the P08-TIC-3720 project of the Andalusian Science and Technology Department, and FEDER funds.
References

1. Tsoumakas, G., Katakis, I., Vlahavas, I.: A review of multi-label classification methods. In: Proceedings of the 2nd ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD 2006), pp. 99-109 (2006)
2. Wan, S.P., Xu, J.H.: A multi-label classification algorithm based on triple class support vector machine. In: Proc. 2007 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2007), Beijing, China (November 2007)
3. Zhang, M.-L., Zhou, Z.-H.: A k-nearest neighbor based algorithm for multi-label classification, vol. 2, pp. 718-721. The IEEE Computational Intelligence Society (2005)
4. Blockeel, H., Schietgat, L., Struyf, J., Dzeroski, S., Clare, A.: Decision trees for hierarchical multilabel classification: a case study in functional genomics. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 18-29. Springer, Heidelberg (2006)
5. Clare, A., King, R.D.: Knowledge discovery in multi-label phenotype data. In: Siebes, A., De Raedt, L. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, p. 42. Springer, Heidelberg (2001)
6. Chan, A., Freitas, A.A.: A new ant colony algorithm for multi-label classification with applications in bioinformatics. In: GECCO 2006: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 27-34. ACM Press, New York (2006)
7. Vallim, R., et al.: A new approach for multi-label classification based on default hierarchies and organizational learning (2008)
8. Ávila, J.L., Galindo, E.L.G., Zafra, A., Ventura, S.: A niching algorithm to learn discriminant functions with multi-label patterns. In: Corchado, E., Yin, H. (eds.) IDEAL 2009. LNCS, vol. 5788, pp. 570-577. Springer, Heidelberg (2009)
9. Ferreira, C.: Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems 13(2), 87-129 (2001)
10. Weinert, W.R., Lopes, H.S.: GEPCLASS: a classification rule discovery tool using gene expression programming. In: Li, X., Zaïane, O.R., Li, Z.-h. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 871-880. Springer, Heidelberg (2006)
11. Wong, M.L., Leung, K.S.: Data Mining Using Grammar-Based Genetic Programming and Applications. Kluwer Academic Publishers, Norwell (2000)
12. Lu, W., Traore, I.: Detecting new forms of network intrusion using genetic programming. In: The 2003 Congress on Evolutionary Computation, CEC 2003, vol. 3, pp. 2165-2172 (2003)
13. Ventura, S., Romero, C., Zafra, A., Delgado, J.A., Hervás, C.: JCLEC: a Java framework for evolutionary computation. Soft Computing 12(4), 381-392 (2008)
14. Tsoumakas, G., Vlahavas, I.: Random k-labelsets: an ensemble method for multilabel classification. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 406-417. Springer, Heidelberg (2007)
15. Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern Recognition 37(9), 1757-1771 (2004)
16. Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems, vol. 14 (2001)
17. Diplaris, S., Tsoumakas, G., Mitkas, P., Vlahavas, I.: Protein classification with multiple algorithms. In: Advances in Informatics, pp. 448-456 (2005)
18. Diesner, J., Frantz, T.L., Carley, K.M.: Communication networks from the Enron email corpus "It's always about the people. Enron is no different". Comput. Math. Organ. Theory 11(3), 201-228 (2005)
Solving Classification Problems Using Genetic Programming Algorithms on GPUs

Alberto Cano, Amelia Zafra, and Sebastián Ventura

Department of Computing and Numerical Analysis, University of Córdoba, 14071 Córdoba, Spain
{i52caroa,azafra,sventura}@uco.es
Abstract. Genetic Programming is very efficient at problem solving compared to other proposals, but its performance degrades quickly as the size of the data increases. This paper proposes a model for multithreaded evaluation in Genetic Programming classification using the NVIDIA CUDA GPU programming model to parallelize the evaluation phase and reduce computational time. Three well-known Genetic Programming classification algorithms are evaluated using the proposed parallel evaluation model. Experimental results using UCI Machine Learning data sets compare the performance of the three classification algorithms in single-threaded and multithreaded Java, C and CUDA GPU code. Results show that our proposal is much more efficient.
1 Introduction
Evolutionary Algorithms (EA) are a family of methods, inspired by natural evolution, for finding reasonable solutions in data mining and knowledge discovery [1], but they can converge slowly on complex, high-dimensional problems. Their parallelization has been an object of intensive study. Concretely, we focus on the Genetic Programming (GP) paradigm. GP has been parallelized in many ways to take advantage both of different types of parallel hardware and of different features in particular problem domains. Most parallel algorithms during the last two decades deal with implementations on clusters or Massively Parallel Processing architectures. More recently, studies on parallelization have focused on graphics processing units (GPUs), which provide fast parallel hardware for a fraction of the cost of a traditional parallel system. The purpose of this paper is to improve the efficiency of GP classification models in solving classification problems. Once it is demonstrated that the evaluation phase is the one that requires the most computational time, the proposal is to parallelize this phase generically, so that it can be used by different algorithms. An evaluator is designed using GPUs to speed up performance, receiving a classifier and returning that classifier's confusion matrix. Three of the most popular GP algorithms are tested using the proposed parallel evaluator. Results show speedups of up to 1000 times with respect to a non-parallel version executed sequentially.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 17–26, 2010.
© Springer-Verlag Berlin Heidelberg 2010
The remainder of this paper is organized as follows. Section 2 provides an overview of GPU architecture and related experiences with Evolutionary Computation in GPUs. Section 3 analyzes GP classification algorithms to find out which parts of the algorithms are more susceptible to performance improvement, and discusses their parallelization and implementation. Section 4 describes the experimental setup and presents computational results for the models benchmarked. Conclusions of our investigation and future research tasks are summarized in Section 5.
2 Overview
In this section, first, the CUDA programming model on the GPU is specified. Then, the most relevant previous studies of GP on GPUs are described.
2.1 CUDA Programming Model on GPU
The CUDA programming model [13] executes kernels as batches of parallel threads in a Single Instruction Multiple Data (SIMD) programming style. These kernels comprise thousands to millions of lightweight GPU threads per kernel invocation. CUDA threads are organized into the two-level hierarchy represented in Figure 1. At the higher level, all the threads in a data-parallel execution phase form a grid. Each call to a kernel initiates a grid composed of many thread groupings, called thread blocks. All the blocks in a grid have the same number of threads, with a maximum of 512. The maximum number of thread blocks is 65535 x 65535, so each device can run up to 65535 x 65535 x 512 ≈ 2 · 10^12 threads. To properly identify threads, each thread in a thread block has a unique ID in the form of a three-dimensional coordinate, and each block in a grid also has a unique two-dimensional coordinate. Thread blocks are executed on streaming multiprocessors (SM). An SM can perform zero-overhead scheduling to interleave warps and hide the latency of long-latency arithmetic and memory operations. There are four main memory spaces: global, constant, shared and local. GPU memories are specialized and have different access times, lifetimes and output limitations.
2.2 Related GP Works with GPUs
Several studies have been done on GPUs and Massively Parallel Processing architectures within the framework of evolutionary computation. Concretely, we can cite some studies on Genetic Programming on GPUs [7]. D. Chitty [4] describes the technique of general purpose computing using graphics cards and how to extend this technique to Genetic Programming. The
Fig. 1. CUDA threads and blocks
improvement in the performance of Genetic Programming on single processor architectures is also demonstrated. S. Harding [8] goes on to report on how the evaluation of individuals in GP can be accelerated. Previous research on GPUs has shown evaluation phase speedups for large training case sets like the one in our proposal. Furthermore, D. Robilliard [14] proposes a parallelization scheme to exploit the power of the GPU on small training sets. To optimize with a modest-sized training set, instead of sequentially evaluating the GP solutions and parallelizing only over the training cases, the parallel capacity of the GPU is shared by the GP programs and data. Thus, different GP programs are evaluated in parallel, and a cluster of elementary processors is assigned to each of them to treat the training cases in parallel. A similar technique, but using an implementation based on the Single Program Multiple Data (SPMD) model, is proposed by W. Langdon and A. Harrison [12]. The use of SPMD instead of Single Instruction Multiple Data (SIMD) affords the opportunity to achieve increased speedups since, for example, one cluster can interpret the if branch of a test while another cluster treats the else branch independently. Performing the same computation inside a cluster is also possible, but then the two branches are processed sequentially in order to respect the SIMD constraint: this is called divergence and is, of course, less efficient. These reports have helped us design and optimize our proposal to achieve maximum performance on data sets with different dimensions and population sizes.
3 Genetic Programming Algorithms on GPU
In this section, the computation time of the different phases of the GP generational algorithm is evaluated to determine the most expensive part of the algorithm. Then, we specify how the evaluation phase of a GP algorithm is parallelized on the GPU.
3.1 GP Classification Algorithm
Genetic Programming, introduced by Koza [10], is a learning methodology belonging to the family of EA [17]. Among successful EA implementations, GP retains a significant position due to such valuable characteristics as: its flexible variable-length solution representation; the fact that no a priori knowledge is needed about the statistical distribution of the data (it is distribution-free); the ability to operate directly on data in their original form; the ability to detect unknown relationships among data and express them as mathematical expressions; and, finally, the ability to discover the most important discriminative features of a class. These characteristics make these algorithms a paradigm of growing interest both for obtaining classification rules [2],[18] and for other tasks related to prediction, such as feature selection and the generation of discriminant functions. The evolutionary process of Genetic Programming algorithms [6], similar to other EA paradigms, is represented in Figure 2 and consists of the following steps. The initial population is created randomly and evaluated. For each generation, the algorithm performs a selection of the best parents. This subset of individuals is recombined and mutated by applying different genetic operators,
Table 1. CPU GP classification time test

Phase            Time (ms)  Percentage
Initialization        8647       8.96%
  Creation             382       0.39%
  Evaluation          8265       8.57%
Generation           87793      91.04%
  Selection             11       0.01%
  Crossover             13       0.01%
  Mutation              26       0.03%
  Evaluation         82282      85.32%
  Replacement           26       0.03%
  Control             5435       5.64%
Total                96440     100.00%

Fig. 2. GP Evolution Model
thus obtaining offspring. These new individuals must now be evaluated using the fitness function. Different replacement strategies can be employed on parents and offspring so that the next generation's population size remains constant. The algorithm performs a control stage in which it terminates if it finds acceptable solutions or the generation count reaches a limit. The consecutive processes of selection of parents, crossover, mutation, evaluation, replacement and control constitute a generation of the algorithm. Experiments have been conducted to evaluate the computation time of the different phases of the generational algorithm. The experiment using GP algorithms, summarized in Table 1, proves that around 94% of the time is taken up by the evaluation phase. This percentage is mainly linked to the population size and the number of patterns, increasing up to 98% in large problems. Thus, the most significant improvement is obtained by accelerating the evaluation phase, and this is what we do in our proposal. Evaluation is the most computationally expensive phase since it involves testing all generated individuals over all the patterns. Each individual's expression must be interpreted or translated into an executable format, which is then evaluated for each training pattern. The result of the individual's evaluation over all the patterns is used to build the confusion matrix. The confusion matrix allows us to apply different quality indexes to get the individual's fitness. The evaluation of the individuals within the population consists of two nested loops, in which each individual iterates over each pattern and checks whether the rule covers that pattern. These two loops make the algorithm really slow when the population or pattern size increases.
3.2 Implementation on GPU
To parallelize the evaluation phase, the designed implementation is meant to be as generic as possible so that it can be employed by different classification algorithms. The implementation receives a classifier and returns the confusion matrix that most algorithms use to calculate the fitness function. To take advantage of the GPU architecture [15], all individuals are evaluated over all the patterns simultaneously. An easy way to do that is to create a grid of thread blocks sized as follows: one dimension is the number of individuals and the other is the number of patterns in the data set. This organization means that one thread performs the evaluation of one individual on one pattern. To achieve full performance, we have to maximize multiprocessor occupancy, so each block represents the evaluation of one individual on 128 patterns. This way, each thread within the block computes one single evaluation, and the size of the second dimension of the grid is the number of patterns divided by 128. This configuration allows up to 65536 individuals and 8,388,608 patterns per GPU and kernel call, large enough for all the data sets tested. Larger populations can be evaluated using several GPUs, up to 4 devices per host, or by iterating kernel calls. Before running the evaluator on the GPU, the individuals' rules must be copied to the device memory over the PCI-E bus. Full performance can be
obtained by copying the rules into constant memory and pointing all the threads in a warp toward the same rule, resulting in a single memory access instruction. Constant memory provides a 64 KB cache written by the host and read by GPU threads. The evaluation takes place in two steps. In the first kernel, each thread checks whether the antecedent of the rule covers the pattern and the consequent matches the pattern class, and stores a value, generally a hit or a miss. This implementation allows reusability for different classification models by changing the value stored depending on whether the antecedent does or does not cover the pattern, and on whether the pattern class matches. The kernel function must analyze the expression working with Polish notation, also known as prefix notation. Its distinguishing feature is that it places operators to the left of their operands. If the arity of the operators is fixed, the result is a syntax lacking parentheses or other brackets. While tokens remain to be analyzed in the individual's expression, the kernel determines what to do next, using a stack to store intermediate values. Finally, we check the top value of the stack. If this value is true, the antecedent was true, so, depending on the algorithm used, we compare this value to the known class given for the pattern. The second kernel performs a reduction [5] and counts the results in subsets of 128 to get the total number of hits and misses. Using these totals, the proposed kernel computes the confusion matrix on the GPU, which could be used by any implementation of GP classification algorithms or even any genetic algorithm. This way, the kernel calculates the fitness of individuals in parallel using the confusion matrix and the quality metrics required by the classification model. Finally, results are copied back to the host memory and set on the individuals for the next generation.
4 Experimental Results
The experiments carried out compare the performance of three different GP algorithms in single-threaded and multithreaded Java, C and CUDA GPU code. This section gives several experimental details related to the data sets and the algorithms, and then compares the speedup of the different implementations.
4.1 Experimental Setup
This paper presents an implementation of a GPU GP evaluator for data classification using JCLEC [16]. JCLEC is a software system for Evolutionary Computation (EC) research, developed in the Java programming language. It provides a high-level software environment for any kind of Evolutionary Algorithm, with support for Genetic Algorithms, Genetic Programming and Evolutionary Programming. We have selected two databases from the UCI repository for benchmarks: shuttle and poker hand inverse. The shuttle data set contains 9 attributes, 58000 instances and 7 classes. The poker hand inverse data set contains 11 attributes, 10^6 instances and 10 classes.
Experiments were run on a PC equipped with an Intel Core i7 processor running at 2.66 GHz and two NVIDIA GeForce 285 GTX video cards, each with 2 GB of GDDR3 video RAM. No overclocking was done on any of the hardware. Three different Grammar-Guided Genetic Programming classification algorithms proposed in the literature are tested using our evaluator proposal. The evaluation used by each algorithm is detailed for parallelization purposes.

De Falco, Della Cioppa and Tarantino [9] propose a method that computes rule fitness by evaluating the antecedent over all the patterns in the data set. The fitness function weighs the number of examples for which the rule correctly predicts class membership against the number for which the prediction is wrong. Specifically, it measures the number of true positives tp, false positives fp, true negatives tn and false negatives fn. Finally, fitness is expressed as:

    fitness = I − ((tp + tn) − (fp + fn)) + α · N    (1)
where I is the total number of training examples, α is a value between 0 and 1 and N is the number of nodes.

Tan, Tay, Lee and Heng [11] propose a fitness function that combines two indicators that are commonplace in the domain, namely sensitivity (Se) and specificity (Sp), defined as follows:

    Se = tp / (tp + fn),    Sp = tn / (fp + tn)    (2)

Thus, fitness is defined by the product of these two parameters:

    fitness = Se · Sp    (3)
The proposal by Bojarczuk, Lopes and Freitas [3] presents a method in which each rule is simultaneously evaluated for all the classes in a pattern. The classifier is formed by taking the best individual for each class generated during the evolutionary process. GP does not produce simple solutions, and the comprehensibility of a rule is inversely proportional to its size. Therefore, Bojarczuk defines the simplicity (Sy) and then the fitness of a rule:

    Sy = (maxnodes − 0.5 · numnodes − 0.5) / (maxnodes − 1),    fitness = Se · Sp · Sy    (4)

4.2 Comparing the Performance of GPU and Other Proposals
The results of the three GP classification algorithms benchmarked on the two UCI data sets are shown in Tables 2 and 3. The rows represent the speedup with respect to the Java execution time for each execution configuration: Java single CPU thread, C with one, two, four and eight CPU threads (C1, C2, C4, C8), one GPU device (GPU) and two GPU devices (GPUs). The columns give the population size (100 to 800) for each of the three models.

Table 2. Shuttle generation speedup results

            Tan Model             Falco Model            Bojarczuk Model
Pop    100   200   400   800   100   200   400   800   100   200   400   800
Java     1     1     1     1     1     1     1     1     1     1     1     1
C1     2.8   3.1   3.2   2.9   5.7   8.1   5.2   5.0  18.8  12.5  11.2   9.5
C2     5.5   5.4   6.1   6.3  10.6  15.9  10.5  10.1  35.9  24.6  22.1  18.1
C4    10.1  11.5  12.5  10.7  19.7  30.3  20.5  19.8  65.2  47.3  40.1  33.7
C8    11.1  12.4  13.4  10.3  19.9  30.1  21.2  20.6  65.7  46.8  40.5  34.6
GPU    218   267   293   253   487   660   460   453   614   408   312   269
GPUs   436   534   587   506   785  1187   899   867  1060   795   621   533

Table 3. Poker-I generation speedup results

            Tan Model             Falco Model            Bojarczuk Model
Pop    100   200   400   800   100   200   400   800   100   200   400   800
Java     1     1     1     1     1     1     1     1     1     1     1     1
C1     2.7   3.2   3.1   3.0   4.6   5.0   5.6   4.9   5.5   5.7   5.8   4.7
C2     5.5   6.5   6.7   5.5   9.0   9.8  11.1   9.7  10.6  11.0  11.6  11.3
C4    10.3  11.1  12.8  10.5  16.8  18.9  21.6  18.9  20.3  20.7  22.5  22.3
C8    11.2  12.9  14.0  10.2  18.5  20.5  23.3  26.4  21.8  22.0  24.2  23.8
GPU    155   174   234   221   688   623   648   611   142   147   148   142
GPUs   288   336   500   439  1275  1200  1287  1197   267   297   288   283

Benchmark results prove the ability of GPUs to speed up GP evaluation. The Intel i7 quad-core scales linearly from 1 to 2 and 4 threads, but not any further; beyond that point, GPUs perform much better. The parallelized model allows the time spent on a classification problem to drop from one month to only one hour. Real classification training usually needs dozens of evaluations to obtain an accurate result, so the absolute time saved is considerable. The greatest speedup is obtained with the Falco model, which increases performance by a factor of up to 1200 over the Java solution and up to 150 over the single-threaded C CPU implementation. The speedup obtained is similar across different population sizes; a population of 200 individuals is selected to represent the figures. Fig. 3 and Fig. 4 display the speedup obtained by the different proposals with respect to the sequential Java version on the Shuttle and Poker data sets respectively. Note the progressive
Fig. 3. Shuttle data set speed up

Fig. 4. Poker-I data set speed up

(Both figures plot, on a logarithmic scale from 1 to 2048, the speedup of the Falco, Tan and Bojarczuk models for the Java, CPU-1, CPU-2, CPU-4, CPU-8, GPU-1 and GPU-2 configurations.)
improvement obtained by threading the C implementation and the significant increase that occurs when using GPUs.
5 Conclusions
Massive parallelization using the NVIDIA CUDA framework provides a hundredfold to thousandfold speedup over the Java and C implementations. GPUs are best for massive multithreaded tasks where each thread does its own job but all of them collaborate in the execution of the program. The CPU solution is linear and complex, whereas the GPU groups threads into blocks and executes a grid of blocks on the streaming multiprocessors (SM), so linearity is approximated by the number of 30-block grids. This implementation allows future scalability to GPUs with more processors: the next NVIDIA GPU, code-named Fermi, doubles the number of cores to 512 and performs up to 1.5 TFLOPS in single precision. It is noteworthy that the i7 CPU scores are 2.5 times faster than a 3.0 GHz PIV in our benchmarks. Further work will implement the whole algorithm inside the GPU, so that the selection, crossover, mutation, replacement and control phases are also parallelized, reducing data memory transfers between CPU and GPU devices.
Acknowledgments

The authors gratefully acknowledge the financial support provided by the Spanish Department of Research under projects TIN2008-06681-C06-03 and P08-TIC-3720, and FEDER funds.
References 1. Freitas, A.A.: Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer, Heidelberg (2002) 2. Tsakonas, A.: A comparison of classification accuracy of four Genetic Programming-evolved intelligent structures. Information Sciences 176(6), 691–724 (2006)
3. Bojarczuk, C.C., Lopes, H.S., Freitas, A.A., Michalkiewicz, E.L.: A constrainedsyntax Genetic Programming system for discovering classification rules: application to medical data sets. Artificial Intelligence in Medicine 30(1), 27–48 (2004) 4. Chitty, D.: A data parallel approach to Genetic Programming using programmable graphics hardware. In: GECCO 2007: Proceedings of the Conference on Genetic and Evolutionary Computing, pp. 1566–1573 (2007) 5. Kirk, D., Hwu, W.-m.W., Stratton, J.: Reductions and Their Implementation. University of Illinois, Urbana-Champaign (2009) 6. Deb, K.: A population-based algorithm-generator for real-parameter optimization. Soft Computing 9(4), 236–253 (2005) 7. Genetic Programming on General Purpose Graphics Processing Units, GP GP GPU, http://www.gpgpgpu.com 8. Harding, S., Banzhaf, W.: Fast Genetic Programming and artificial developmental systems on GPUS. In: HPCS 2007: Proceedings of the Conference on High Performance Computing and Simulation (2007) 9. De Falco, I., Della Cioppa, A., Tarantino, E.: Discovering interesting classification rules with Genetic Programming. Applied Soft Computing Journal 1(4), 257–269 (2002) 10. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992) 11. Tan, K.C., Tay, A., Lee, T.H., Heng, C.M.: Mining multiple comprehensible classification rules using Genetic Programming. In: CEC 2002: Proceedings of the Evolutionary Computation on 2002, pp. 1302–1307 (2002) 12. Langdon, W., Harrison, A.: GP on SPMD parallel graphics hardware for mega bioinformatics data mining. Soft Computing. A Fusion of Foundations, Methodologies and Applications 12(12), 1169–1183 (2008) 13. NVIDIA Programming and Best Practices Guide 2.3, NVIDIA CUDA Zone, http://www.nvidia.com/object/cuda_home.html 14. Robilliard, D., Marion-Poty, V., Fonlupt, C.: Genetic programming on graphics processing units. Genetic Programming and Evolvable Machines 10(4), 447–471 (2009) 15. 
Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.-m.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 73–82 (2008) 16. Ventura, S., Romero, C., Zafra, A., Delgado, J.A., Hervás, C.: JCLEC: A Java framework for evolutionary computation. Soft Computing 12(4), 381–392 (2008) 17. Bäck, T., Fogel, D., Michalewicz, Z.: Handbook of Evolutionary Computation. Oxford University Press, Oxford (1997) 18. Lensberg, T., Eilifsen, A., McKee, T.E.: Bankruptcy theory development and classification via Genetic Programming. European Journal of Operational Research 169(2), 677–697 (2006)
Analysis of the Effectiveness of G3PARM Algorithm

J.M. Luna, J.R. Romero, and S. Ventura

Dept. of Computer Science and Numerical Analysis, University of Córdoba, Rabanales Campus, Albert Einstein building, 14071 Córdoba, Spain
{i32luarj,jrromero,sventura}@uco.es
Abstract. This paper presents an evolutionary algorithm using G3P (Grammar Guided Genetic Programming) for mining association rules in different real-world databases. This algorithm, called G3PARM, uses an auxiliary population made up of its best individuals, which then act as parents for the next generation. Individuals are defined through a context-free grammar, which allows us to obtain datatype-generic, valid individuals. We compare our approach to the Apriori and FP-Growth algorithms and demonstrate that our proposal obtains rules with better support, confidence and coverage of the dataset instances. Finally, a preliminary study is also introduced to compare the scalability of our algorithm. Our experimental studies illustrate that this approach is highly promising for discovering association rules in databases. Keywords: Genetic Programming, Association Rules, G3P.
1 Introduction
Association mining is used to obtain useful (or interesting) rules from which new knowledge can be mined. Systems of this kind try to facilitate pattern discovery and to produce rules or inferences for subsequent interpretation by the end user. A leading algorithm [1] based on the discovery of frequent patterns has served as the starting point for many related research studies [2,6]. This algorithm, known as Apriori, was first used successfully in extracting association rules. In short, it permits the extraction of frequent itemsets and uses this knowledge to obtain association rules. To reduce its computational cost, the Apriori algorithm establishes that if any pattern of length k is not frequent in the database, no super-pattern of length (k + 1) can be frequent. Han et al. [5] proposed a novel frequent-pattern tree (FP-tree) structure, an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and developed an efficient FP-tree-based mining method, FP-Growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques. First, a large database is compressed into a condensed, smaller data structure, the FP-tree; then, the FP-tree-based mining adopts a pattern-fragment growth method

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 27–34, 2010.
© Springer-Verlag Berlin Heidelberg 2010
to avoid the costly generation of a large number of candidate sets; and, last, a divide-and-conquer method is used to decompose the mining task into a set of smaller tasks, which dramatically reduces the search space. This algorithm has served as the basis for other related research studies [3,8]. Many studies have already proposed Evolutionary Algorithms (EAs) [4] for rule extraction from databases, considering this kind of algorithm, and especially Genetic Algorithms (GAs), as one of the most successful search techniques applied to complex problems, since they have proved to be an important technique for learning and mining knowledge [7]. GAs are robust and flexible search methods: the same GA can be executed using different representations, and feasible solutions can be obtained within specified time limits. This is why data mining experts have shown an increasing interest in both EAs and GAs. Many association rule algorithms, among them Apriori and FP-Growth, are brute-force algorithms whose computational time is too high. In this paper, we present the G3PARM algorithm, which uses G3P (Grammar Guided Genetic Programming) for mining association rules [9]. G3P is a Genetic Programming extension that allows correct programs (in this case, rules) to be obtained by defining individuals through context-free grammars. Therefore, each individual generated by G3P is a derivation tree that generates and represents a solution using the language defined by the grammar. This makes it possible to use G3PARM with different types of data by simply changing the grammar. The G3P algorithm proposed here makes use of an auxiliary population of individuals that exceed a certain quality threshold. We present the results of an empirical comparison between the G3PARM, Apriori and FP-Growth¹ algorithms and demonstrate that our proposal obtains rules with high support, high confidence and high coverage of the dataset instances.
Moreover, we present preliminary results reflecting the relative scalability of these algorithms. This paper is structured as follows: Section 2 describes the model conceived and its main characteristics; Section 3 describes the datasets used in the experiments; Section 4 describes both the execution and the results; finally, some concluding remarks are drawn.
2 G3PARM Algorithm
This section presents our model along with its major characteristics: how individuals are represented, the genetic operators used, the evaluation process and the algorithm itself.
2.1 Individual Representation
Each individual is composed of two distinct components: (a) a genotype, encoded using G3P as a tree structure with limited depth to avoid infinite derivations, and (b) a phenotype, which represents the complete rule, consisting of an antecedent

¹ Coenen, F. (2003): The LUCS-KDD FP-Growth Association Rule Mining Algorithm, http://www.cxc.liv.ac.uk/~frans/KDD/Software/FPgrowth/fpGrowth.html, Department of Computer Science, The University of Liverpool, UK.
G = (ΣN, ΣT, Rule, P) with:
ΣN = {Rule, Antecedent, Consequent, Comparison, Categorical Comparator, Categorical Attribute Comparison}
ΣT = {AND, “!=”, “=”, “name”, “value”}
P = {Rule = Antecedent, Consequent ;
Antecedent = Comparison | AND, Comparison, Antecedent ;
Consequent = Comparison ;
Comparison = Categorical Comparator, Categorical Attribute Comparison ;
Categorical Comparator = “!=” | “=” ;
Categorical Attribute Comparison = “name”, “value” ;}

Fig. 1. Context-free grammar expressed in Extended BNF
and a consequent. The antecedent of each rule is formed by a series of conditions on the values of certain attributes, all of which must be satisfied, and the consequent is composed of a single condition. Figure 1 shows the context-free grammar through which the population individuals are encoded, where the terminal grammar symbol “name” is determined by the dataset attributes used each time. Moreover, for each attribute, the value assigned is determined by the range of values of that attribute in the dataset. Each individual is generated from the initial grammar symbol Rule through the random application of production rules P until a valid derivation chain is reached. The number of derivations is bounded by the maximum derivation size provided in the algorithm configuration parameters. To carry out the derivation of the symbols that appear in the grammar, we use the cardinality concept, defined as the number of elements in the generated set. The cardinality of each nonterminal symbol is based on the set generated in d derivations. If a nonterminal symbol can be derived in several ways, its cardinality is the sum of the cardinalities of each of its possible derivations. If a derivation has more than one nonterminal symbol, the cardinality of the set comprised by those symbols is the product of the cardinalities of each nonterminal symbol present in the derivation. 2.2
Genetic Operators
We use two genetic operators to generate new individuals in a given generation of the evolutionary algorithm: Crossover: this operator creates new individuals by exchanging two parent derivation subtrees from two randomly selected compatible nodes, one in each parent. Two nodes are compatible if they belong to the same nonterminal symbol, thus avoiding the production of an individual that does not fit the defined grammar. Mutation: this operator randomly selects a tree node and acts based on the symbol type. If the selected node is a nonterminal symbol, a new derivation
30
J.M. Luna, J.R. Romero, and S. Ventura
is performed from that node. If, however, the selected node is a terminal symbol, it changes the value of the terminal symbol at random. 2.3
Evaluation
Firstly, we must decode each individual by finding the association rule that corresponds to its genotype. This process builds the expression through an in-depth traversal of the syntax tree, removing the nonterminal symbols and verifying that individuals do not use the same attribute in both the antecedent and the consequent. The evaluation of individuals is performed by computing the fitness function value, which is the support, defined as the ratio of records that contain A and C to the total number of records in the database. Here, A is called the antecedent, and C the consequent. D refers to the dataset records. Another measure we use is the rule confidence, defined as the ratio of the number of records that contain A and C to the number of records that contain A. Both measures are expressed on a per unit basis. 2.4
Algorithm
The algorithm, represented by the pseudocode in Algorithms 1 and 2, starts by producing the population, randomly generating individuals from the context-free grammar defined in Figure 1 while respecting the maximum number of derivations. In the initial generation, the auxiliary population is empty. Individuals are then selected via binary tournament from the union of the current population and the auxiliary population. This selector randomly picks two individuals from this union and, after comparing them, keeps the better one. The selected individuals act as parents for the crossover. The next step is to perform the mutation of the selected individuals. Once the new population has been obtained by crossover and mutation, we update the auxiliary population by combining the previous auxiliary population and the current population. Then, the individuals are ranked according to support, and those with the same genotype are eliminated. The G3PARM algorithm considers two individuals to be equal if, despite having different genotypes, they comprise the same attribute conditions. For example, the rules A AND B → C and B AND A → C are equal. From the resulting set, it selects the individuals that exceed certain support and confidence thresholds. The algorithm terminates once all the instances of the dataset are properly covered, or when it reaches a certain number of generations, returning the auxiliary population individuals.
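The support and confidence measures described above, and the threshold filter applied to the auxiliary population, can be sketched as follows. The record and rule structures are hypothetical illustrations, not the authors' code, and only equality conditions are modeled:

```python
# A rule is a pair (antecedent, consequent) of {attribute: value} dicts;
# a record is a plain {attribute: value} dict.
def matches(conditions, record):
    """True when the record satisfies every condition of the rule part."""
    return all(record.get(attr) == val for attr, val in conditions.items())

def support(rule, records):
    """Ratio of records containing both antecedent and consequent."""
    ante, cons = rule
    both = sum(1 for r in records if matches(ante, r) and matches(cons, r))
    return both / len(records)

def confidence(rule, records):
    """Ratio of records containing the consequent among those covering the antecedent."""
    ante, cons = rule
    covered = [r for r in records if matches(ante, r)]
    if not covered:
        return 0.0
    return sum(1 for r in covered if matches(cons, r)) / len(covered)

records = [
    {"outlook": "sunny", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "true",  "play": "no"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "false", "play": "yes"},
]
rule = ({"outlook": "sunny", "windy": "false"}, {"play": "yes"})
print(support(rule, records))     # 2 of 4 records -> 0.5
print(confidence(rule, records))  # both covering records have play=yes -> 1.0
```

Both measures are per-unit values, matching the 70% support and 90% confidence thresholds used for the auxiliary population later in the paper.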
3
Experimentation
To evaluate the usefulness of the G3PARM algorithm, several experiments have been carried out on different datasets, using an Intel Core i7 with 12 GB of memory running CentOS 5.4. The datasets used are: Credit-g (1000 instances, 21 attributes), HH (22784 instances, 17 attributes), Mushroom (8124
Algorithm 1. G3PARM algorithm
Require: max_generations, N
Ensure: A
1: P0 ← random(N)
2: A0 ← ∅
3: while num_generations < max_generations do
4:   Select parents (Pt ∪ At)
5:   Crossover (P)
6:   Mutation (P)
7:   P ← P
8:   Update auxiliary population (At)
9:   num_generations++
10: end while
11: return A
Algorithm 2. Update auxiliary population
Require: A
Ensure: A
1: A ← P + At
2: Order (A)
3: Eliminate duplicates (A)
4: At ← Threshold(A)
5: return A
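Algorithms 1 and 2 can be rendered as an executable sketch. The genetic operators, fitness function and canonical-form key are passed in as placeholders, since the actual operators are the grammar-based ones described in Section 2.2; none of this is the authors' code:

```python
import random

def g3parm(random_individual, crossover, mutate, fitness, key,
           n=50, aux_size=20, threshold=0.7, max_generations=100, seed=0):
    """Sketch of the G3PARM generational loop with an auxiliary population."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(n)]
    aux = []                                            # A0 <- empty
    for _ in range(max_generations):
        pool = population + aux                         # Pt union At
        def tournament():                               # binary tournament
            a, b = rng.choice(pool), rng.choice(pool)
            return max(a, b, key=fitness)
        parents = [tournament() for _ in range(n)]
        population = [mutate(crossover(parents[i], parents[(i + 1) % n], rng), rng)
                      for i in range(n)]
        # Algorithm 2: merge, rank by fitness (support), deduplicate by
        # canonical form, keep only individuals over the quality threshold.
        merged = sorted(population + aux, key=fitness, reverse=True)
        seen, unique = set(), []
        for ind in merged:
            k = key(ind)
            if k not in seen:
                seen.add(k)
                unique.append(ind)
        aux = [ind for ind in unique if fitness(ind) >= threshold][:aux_size]
    return aux
```

Deduplicating by a canonical `key` mirrors the rule that A AND B → C and B AND A → C count as the same individual.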
instances, 23 attributes), Segment (1500 instances, 20 attributes), Sonar (208 instances, 36 attributes), Soybean (683 instances, 36 attributes) and the Wisconsin Breast Cancer data source (683 instances, 11 attributes). In order to compare with the Apriori and FP-Growth algorithms, numerical data were preprocessed using equal-width binning and equal-frequency binning 2 discretization techniques with five and ten intervals. However, G3PARM allows numerical datasets to be used without any discretization, obtaining very promising results. G3PARM behaves differently depending on the configuration parameters used, so a series of tests was performed to obtain the best values for the population size, crossover probability, mutation probability, etc. The configuration parameters obtained from these tests are 50 individuals, 70% crossover probability, 10% mutation probability, a maximum derivation number of 24, an external population of size 20, a 90% external confidence threshold, a 70% external support threshold, and a limit of 1000 generations. For the Apriori and FP-Growth algorithms, we use the same support and confidence thresholds as those used for the external population in the G3PARM algorithm. The results reported for the proposed algorithm are the averages over ten runs with different seeds, each seed driving the random generation of individuals. 2
These methods divide the range of values into intervals of constant size and of equal frequency, respectively.
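The two discretization schemes in the footnote can be sketched as follows; a minimal illustration, not the preprocessing code actually used in the experiments:

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of constant size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # bin index in [0, k-1]; the maximum value falls into the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bins so each interval holds (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // len(values), k - 1)
    return bins

data = [1, 2, 3, 4, 100]
print(equal_width_bins(data, 5))      # [0, 0, 0, 0, 4]: outlier gets its own bin
print(equal_frequency_bins(data, 5))  # [0, 1, 2, 3, 4]: one value per bin
```

The contrast on the outlier illustrates why the two schemes can yield different rule sets for the same numerical dataset.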
Table 1. Obtained results, where (1) is Apriori, (2) is FP-Growth and (3) is G3PARM

Dataset          Average support        Average confidence     %Instances
                 (1)    (2)    (3)      (1)    (2)    (3)      (1)    (2)    (3)
CreditEqFre10    0.780  0.709  0.850    0.941  0.855  0.939    0.987  0.987  1.000
CreditEqFre5     0.780  0.709  0.850    0.941  0.855  0.953    0.987  0.987  1.000
CreditEqWid10    0.780  0.709  0.892    0.941  0.855  0.965    0.987  0.987  1.000
CreditEqWid5     0.773  0.709  0.858    0.942  0.863  0.961    0.989  0.989  1.000
HHEqFre10        None   None   0.803    None   None   0.913    None   None   1.000
HHEqFre5         None   None   0.740    None   None   0.909    None   None   0.997
HHEqWid10        0.761  0.761  0.922    0.950  0.950  0.986    1.000  1.000  1.000
HHEqWid5         0.765  0.765  0.902    0.955  0.955  0.979    1.000  1.000  1.000
Mushroom         0.824  0.817  0.890    0.968  0.960  0.978    1.000  1.000  1.000
SegmentEqFre10   0.876  0.876  0.813    0.975  0.975  0.926    0.996  0.996  1.000
SegmentEqFre5    0.876  0.876  0.817    0.975  0.975  0.974    0.996  0.996  1.000
SegmentEqWid10   0.815  0.815  0.884    0.968  0.968  0.979    1.000  1.000  1.000
SegmentEqWid5    0.860  0.860  0.882    0.964  0.964  0.969    1.000  1.000  1.000
SonarEqFre10     None   None   0.782    None   None   0.909    None   None   1.000
SonarEqFre5      None   None   0.583    None   None   0.731    None   None   0.626
SonarEqWid10     None   None   0.958    None   None   0.887    None   None   1.000
SonarEqWid5      0.747  None   0.835    0.942  None   0.947    0.846  None   1.000
Soybean          0.778  0.722  0.822    0.950  0.953  0.957    1.000  1.000  1.000
WBCEqFre10       None   None   0.875    None   None   0.958    None   None   1.000
WBCEqFre5        None   None   0.806    None   None   0.928    None   None   1.000
WBCEqWid10       0.821  0.821  0.900    0.996  0.996  0.971    0.821  0.821  1.000
WBCEqWid5        0.872  0.872  0.864    0.996  0.996  0.956    0.872  0.872  1.000
Ranking          2.204  2.522  1.272    2.159  2.431  1.409    2.340  2.386  1.272

4
Results
The results obtained with the (1) Apriori, (2) FP-Growth and (3) G3PARM algorithms for each dataset are shown in Table 1, where Average support is the average support of the rule set; Average confidence refers to the average confidence of the rule set; and %Instances stands for the percentage (expressed on a per unit basis) of instances covered by the rules out of the total instances in the database. For each dataset, the preprocessing type (EqFre for equal-frequency binning, EqWid for equal-width binning) and the number of intervals used are indicated. Analyzing the results presented in Table 1, notice that the G3PARM algorithm obtains rules with better support and confidence than the Apriori and FP-Growth algorithms. Furthermore, G3PARM obtains rules that cover 100% of the dataset instances in all but two experiments: HH (with a coverage of 99.75%) and Sonar (with a coverage of 62.69%). Furthermore, we must bear in mind that the maximum number of rules obtained is delimited by the auxiliary population size, so the results presented in this paper have been obtained with few rules (a maximum of 20 rules).
To compare the obtained results and to analyze precisely whether there are significant differences between the three algorithms, we use the Friedman test. If the Friedman test rejects the null hypothesis, indicating that there are significant differences, we then perform a Bonferroni-Dunn test to reveal those differences. We evaluate the performance of G3PARM by comparing it to the other algorithms in terms of average support, average confidence and percentage of instances covered. The average ranking for every algorithm is also shown in Table 1, where the control algorithm (i.e., the algorithm with the lowest ranking) is our proposal, as can be noted. The Friedman average ranking statistic, distributed according to FF with k − 1 and (k − 1)(N − 1) degrees of freedom, is 15.332 for the average support measure, 8.185 for the average confidence measure, and 13.838 for the percentage of instances covered. None of them belongs to the critical interval [0, 3.219]. Thus, we reject the null hypothesis that all algorithms perform equally well on these three measures. Given the significant differences between the three algorithms, we use the Bonferroni-Dunn test to reveal the differences in performance; the Critical Difference (CD) value is 0.846 considering p = 0.01. The results indicate that, at a significance level of p = 0.01 (i.e., with a confidence of 99%), there are significant differences between the three algorithms: the performance of G3PARM is statistically better than that of the others for the %Instances and Average support measures. Concerning the average confidence measure, the results indicate that, at a significance level of p = 0.01, there are significant differences between G3PARM and FP-Growth, the performance of G3PARM being statistically better than that of FP-Growth. G3PARM is also quite competitive with Apriori in terms of average confidence.
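The ranking and critical-difference computations used above can be sketched as follows. The critical value q_alpha = 2.806 is an assumed table value for k = 3 comparisons at p = 0.01; with N = 22 dataset rows it reproduces the CD of 0.846 reported above. This is an illustration, not the authors' statistical code:

```python
import math

def average_ranks(scores):
    """scores: one tuple of per-algorithm scores per dataset.
    Higher score = better = lower rank (ties broken by position)."""
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda i: row[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            totals[i] += rank
    return [t / len(scores) for t in totals]

def critical_difference(k, n_datasets, q_alpha):
    """Bonferroni-Dunn critical difference between average ranks."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# toy example: three algorithms over four datasets (values are illustrative)
scores = [(0.78, 0.71, 0.85), (0.78, 0.71, 0.85),
          (0.78, 0.71, 0.89), (0.77, 0.71, 0.86)]
ranks = average_ranks(scores)
print(ranks)               # third algorithm ranks best: [2.0, 3.0, 1.0]
cd = critical_difference(k=3, n_datasets=22, q_alpha=2.806)
print(round(cd, 3))        # ~0.846, the CD value used in the paper
```

Two algorithms are then declared significantly different whenever their average ranks differ by more than the computed CD.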
Several experiments have also been carried out to analyse the computation time of the three algorithms using the HH dataset. Figure 2(a) shows the relationship between the runtime and the percentage of instances: the Y axis represents time in seconds, whereas the X axis represents the percentage of instances, using all attributes. In the same way, Figure 2(b) shows the relationship between the runtime and the number of attributes: the Y axis represents time in seconds and the X axis the number of attributes, using 100% of the instances.

Fig. 2. Relationship between the runtime, the number of attributes and the percentage of instances

G3PARM scales better to a larger number of attributes than Apriori and FP-Growth, since the greater the number of attributes, the greater the combinatorial explosion incurred to obtain the frequent itemsets.
5
Concluding Remarks
In this paper we presented G3PARM, a novel G3P-based algorithm for mining association rules from large data sources. The results obtained during the experimentation phase outline some conclusions concerning the effectiveness of our proposal: (a) the mined association rules maintain high support, high confidence and high coverage of the dataset instances, providing the user with highly representative rules; and (b) the runtime of our approach scales quite linearly as the dataset size and the number of attributes increase. Acknowledgments. This work has been supported by the Regional Government of Andalucia and the Ministry of Science and Technology projects P08-TIC-3720 and TIN2008-06681-C06-03, respectively, and FEDER funds.
References
1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB 1994, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, September 1994, pp. 487–499 (1994)
2. Borgelt, C.: Efficient implementations of Apriori and Eclat. In: FIMI 2003, 1st Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, USA (December 2003)
3. Coenen, F., Goulbourne, G., Leng, P.: Tree structures for mining association rules. Data Mining and Knowledge Discovery 8(1), 25–51 (2003)
4. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, New York (2003)
5. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1–12 (2000)
6. Papè, N.F., Alcalá-Fdez, J., Bonarini, A., Herrera, F.: Evolutionary extraction of association rules: A preliminary study on their effectiveness. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds.) HAIS 2009. LNCS, vol. 5572, pp. 646–653. Springer, Heidelberg (2009)
7. Tsymbal, A., Pechenizkiy, M., Cunningham, P.: Sequential genetic search for ensemble feature selection. In: 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, August 2005, pp. 877–882 (2005)
8. Wei, Z., Hongzhi, L., Na, Z.: Research on the FP-growth algorithm about association rule mining. In: ICFCC 2009, International Conference on Future Computer and Communication, Kuala Lumpur, Malaysia, April 2009, pp. 572–576 (2009)
9. Yang, G., Shimada, K., Mabu, S., Hirasawa, K.: A nonlinear model to rank association rules based on semantic similarity and genetic network programming, vol. 4, pp. 248–256. Institute of Electrical Engineers of Japan (2009)
Reducing Dimensionality in Multiple Instance Learning with a Filter Method

Amelia Zafra 1, Mykola Pechenizkiy 2, and Sebastián Ventura 1

1 Department of Computer Science and Numerical Analysis, University of Cordoba
2 Department of Computer Science, Eindhoven University of Technology
Abstract. In this article, we describe a feature selection algorithm which can automatically find relevant features for multiple instance learning. Multiple instance learning is considered an extension of traditional supervised learning where each example is made up of several instances and there is no specific information about particular instance labels. In this scenario, traditional supervised learning cannot be applied directly and it is necessary to design new techniques. Our approach is based on the principles of the well-known ReliefF algorithm, which is extended to select features in this new learning paradigm by modifying the distance, the difference function and the computation of the weight of the features. Four different variants of this algorithm are proposed to evaluate their performance in this new learning framework. Experimental results using a representative number of different algorithms show that predictive accuracy improves significantly when a multiple instance learning classifier is learnt on the reduced data set.
1
Introduction
Theoretically, having more features to carry out machine learning tasks should give us more discriminating power. However, the real world provides many reasons why this is not generally the case. Thus, if we reduce the set of features considered by the algorithm, we can considerably decrease the running time of the induction algorithms while increasing the accuracy of the resulting model. In light of this, a considerable amount of research has addressed the issue of feature subset selection in machine learning. Most studies focus on traditional supervised learning, and very few have dealt with a multiple instance learning framework. Multiple Instance Learning (MIL), introduced by Dietterich et al. [1], consists of representing each example in a data set as a collection or bag composed of one or more instances. In machine learning, MIL extends traditional supervised learning to problems with incomplete knowledge about the labels of training examples. In supervised learning, every training instance is assigned a discrete or real-valued label. In contrast, in MIL the labels are assigned only to bags of instances. In the binary case, a bag is labeled positive if at least one instance in that bag is positive, and negative if all the instances in it are negative, but there are no labels for individual instances. The goal of MIL is to classify unseen bags by using the labeled bags as training data.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 35–44, 2010.
© Springer-Verlag Berlin Heidelberg 2010

From the formulation of MIL, it is easy to see
that a positive bag may contain some negative instances in addition to one or more positive instances. Hence, the true labels for the instances in a positive bag may or may not be the same as the bag label and, consequently, the instance labels are inherently ambiguous. There have been very few feature selection studies on MIL [2,3,4,5,6], and most of them deal with wrapper approaches. These approaches directly incorporate the bias of a particular MIL algorithm, aiming to improve that algorithm over versions that do not use dimensionality reduction. However, there is no proposal for a filter method that can be used as a preprocessing step for any MIL algorithm, nor a general empirical study in which a representative number of MIL algorithms work on the same reduced data set to show the relevance of feature selection in the MIL framework. In this paper, we address the problem of selecting a subset of important features for MIL from a filter perspective. In the filter model, feature selection is performed as a preprocessing step to induction and can therefore be applied before any MIL algorithm. We propose an effective feature selection approach to MIL called ReliefF-MI, which extends the ReliefF algorithm [7] to multiple instance learning. The proposed method assigns a real-valued weight to each feature to indicate its relevance to the problem. First, the features are ranked according to their importance, and then a subset of important features is selected. Relevant characteristics of this method are that it can be applied to continuous and discrete multiple instance classification problems, is aware of contextual information and can correctly estimate the quality of attributes in problems with strong dependencies between attributes.
To check the effectiveness of the proposed model, different versions of ReliefF-MI are considered, modifying the similarity function used to calculate the distance between patterns and the weights of the different features. Seventeen algorithms covering different machine learning paradigms, such as decision rules, support vector machines, naive Bayes, decision trees, logistic regression, diverse density and distance-based methods, are tested to establish the effectiveness of filter methods in MIL. Experimental results show, on the one hand, that the newly designed similarity function allows ReliefF-MI to achieve the best results and, on the other hand, that the algorithms always improve on their original results when the dimensionality reduction provided by ReliefF-MI is applied as a preprocessing step. Thus, it is demonstrated over a substantial number of different methods that knowing the most relevant features makes classification with MIL more efficient. The paper is structured as follows. Section 2 describes the proposals developed to reduce features in MIL. Section 3 evaluates and compares the proposed methods against using the whole feature set. Finally, section 4 ends with conclusions and raises several issues for future work.
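The bag-level ReliefF principle that the next section builds on can be sketched as follows: sample bags, find the nearest bag of the same class (hit) and of the other class (miss), and move each feature weight toward features that separate the classes. This is a minimal single-nearest-neighbour illustration with a minimal-Hausdorff-style bag distance; the data and helper names are hypothetical, not the paper's implementation:

```python
def bag_distance(a, b):
    """Minimal distance between any two instances of two bags (Euclidean)."""
    return min(sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
               for p in a for q in b)

def relieff_mi(bags, labels, n_features):
    """Bag-level ReliefF sketch: weight features by hit/miss separation."""
    w = [0.0] * n_features
    for i, bag in enumerate(bags):
        hits   = [j for j, y in enumerate(labels) if y == labels[i] and j != i]
        misses = [j for j, y in enumerate(labels) if y != labels[i]]
        if not hits or not misses:
            continue
        h = min(hits,   key=lambda j: bag_distance(bag, bags[j]))
        m = min(misses, key=lambda j: bag_distance(bag, bags[j]))
        for a in range(n_features):
            # per-feature difference: closest-instance gap on that feature
            diff_hit  = min(abs(p[a] - q[a]) for p in bag for q in bags[h])
            diff_miss = min(abs(p[a] - q[a]) for p in bag for q in bags[m])
            w[a] += (diff_miss - diff_hit) / len(bags)
    return w

# toy bags: feature 0 separates the classes, feature 1 is noise
bags = [[(0.0, 0.3)], [(0.1, 0.9)], [(1.0, 0.2)], [(0.9, 0.8)]]
labels = [0, 0, 1, 1]
weights = relieff_mi(bags, labels, n_features=2)
print(weights)  # the discriminative feature 0 receives a clearly higher weight
```

Features near misses differ on but near hits agree on gain weight; the actual ReliefF-MI variants differ in how the bag distance and difference function are defined, as described next.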
2
A Filter Approach - ReliefF-MI
The procedure proposed is based on the principles of the ReliefF algorithm [7]. The original ReliefF algorithm estimates the quality of attributes by how well
their values distinguish between instances that are near each other. In MIL, the distance between two patterns has to be calculated taking into account that each pattern contains one or more instances. Therefore, a new concept has to be introduced concerning the similarity function and the calculation of the difference between attribute A in patterns R and H, diff(A, R, H). The literature proposes different distance-based approaches to solve MI problems [8,9,10]. The most extensively used metric is the Hausdorff distance [11], which measures the distance between two sets. Three adaptations of this measurement have been proposed in the MIL literature: maximal Hausdorff, minimal Hausdorff and average Hausdorff. Here, a new metric based on the previous ones is also designed, which has been called the Adapted Hausdorff. Each metric gives rise to a different version of ReliefF-MI. The metric and difference function of each version are described below. In all cases, Ri is the bag selected in the current iteration, containing three instances (Ri1, Ri2, Ri3); Hj is the j-th bag of the k nearest hits selected in the current iteration, containing four instances (Hj1, Hj2, Hj3, Hj4); and Mj is the j-th bag of the k nearest misses selected in the current iteration, containing six instances (Mj1, Mj2, Mj3, Mj4, Mj5, Mj6). ReliefF-MI with maximal Hausdorff distance. This extension of ReliefF for MIL uses the Maximal Hausdorff Distance [11]. This is the classical Hausdorff distance and can be specified as: Hmax(Ri, Hj) = max{hmax(Ri, Hj), hmax(Hj, Ri)}, where hmax(Ri, Hj) = max_{r∈Ri} min_{h∈Hj} ||r − h||. To calculate the difference between the attributes of two patterns, the instances selected are those at the maximal distance among the minimal distances between the instances of one bag and the other: diff_bag−max(A, Ri, Hj) = diff_instance(A, Ri3, Hj4), Ri3 and Hj4 being the instances that satisfy this condition.
Similarly, diff_bag−max(A, Ri, Mj) is computed, but with the instances of Ri and Mj. ReliefF-MI with minimal Hausdorff distance. This extension of ReliefF for MIL uses the Minimal Hausdorff Distance [8]. Instead of choosing the maximum distance, the distances are ranked and the lowest value is selected. Formally, it modifies the Hausdorff distance definition as follows: Hmin(A, B) = min_{a∈A} min_{b∈B} ||a − b||. To calculate the difference between the attributes of two patterns, the instances selected are those at the minimal distance among all the instances of the two bags: diff_bag−min(A, Ri, Hj) = diff_instance(A, Ri1, Hj3), Ri1 and Hj3 being the instances that satisfy this condition. Similarly, diff_bag−min(A, Ri, Mj) is computed, but with the instances of Ri and Mj. ReliefF-MI with average Hausdorff distance. This extension of ReliefF for MIL uses the Average Hausdorff distance [10], proposed by Zhang and Zhou to measure the distance between two bags. It is defined as follows:

Havg(Ri, Hj) = ( Σ_{r∈Ri} min_{h∈Hj} ||r − h|| + Σ_{h∈Hj} min_{r∈Ri} ||h − r|| ) / (|Ri| + |Hj|)
where |·| measures the cardinality of a set. Havg(A, B) averages the distances between each instance in one bag and its nearest instance in the other bag. Conceptually, the average Hausdorff distance takes more geometric relationships between two bags of instances into account than the maximal and minimal variants. To calculate the difference between the attributes of two patterns, several instances are involved in updating the weights of the features. Suppose that
– d(Ri1, Hj2), d(Ri2, Hj1) and d(Ri3, Hj4) are the minimal distances between each instance r ∈ Ri and the instances h ∈ Hj;
– d(Hj1, Ri1), d(Hj2, Ri1), d(Hj3, Ri2) and d(Hj4, Ri3) are the minimal distances between each instance h ∈ Hj and the instances r ∈ Ri.
Then the diff function is specified as follows:

diff_bag−avg(A, Ri, Hj) = 1/(|Ri| + |Hj|) × [diff_instance(A, Ri1, Hj2) + diff_instance(A, Ri2, Hj1) + diff_instance(A, Ri3, Hj4) + diff_instance(A, Hj1, Ri1) + diff_instance(A, Hj2, Ri1) + diff_instance(A, Hj3, Ri2) + diff_instance(A, Hj4, Ri3)]
The same process is used to calculate diff_bag−avg(A, Ri, Mj), but considering the pattern Mj. ReliefF-MI with adapted Hausdorff distance. This extension of ReliefF for MIL uses the Adapted Hausdorff distance. Due to the particularities of this learning setting, we propose this new distance, which combines the previous ones to measure the distance between two bags. The metric performs a different calculation depending on the class of the patterns, because the information available about the instances in each pattern depends on the class it belongs to. Thus, the metric differs when evaluating the distance between two positive patterns, two negative patterns, or one positive and one negative pattern.
– If both patterns are negative, we can be sure that no instance in either pattern represents the concept we want to learn. Therefore, the average distance is used to measure the distance between these bags, because all instances are guaranteed to be negative: Hadapted(Ri, Hj) = Havg(Ri, Hj).
– If both patterns are positive, the only certain information is that at least one instance in each of them represents the concept we want to learn, but there is no information about which particular instance or set represents the concept. Therefore, we use the minimal distance, because the positive instances are more likely to be near each other: Hadapted(Ri, Hj) = Hmin(Ri, Hj).
– Finally, if we evaluate the distance between a positive bag and a negative one, the maximal Hausdorff distance is used, because the instances of the different classes are likely to be far apart: Hadapted(Ri, Mj) = Hmax(Ri, Mj).
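The four bag distances described above can be sketched as follows. Euclidean distance between instances is assumed, and the code is an illustration under that assumption, not the authors' implementation:

```python
def d(p, q):
    """Euclidean distance between two instances (tuples of features)."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def h_max(a, b):
    """Classical (maximal) Hausdorff distance between bags a and b."""
    directed = lambda s, t: max(min(d(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))

def h_min(a, b):
    """Minimal Hausdorff distance: closest pair across the two bags."""
    return min(d(p, q) for p in a for q in b)

def h_avg(a, b):
    """Average Hausdorff distance: mean of nearest-instance gaps."""
    total = sum(min(d(p, q) for q in b) for p in a) \
          + sum(min(d(q, p) for p in a) for q in b)
    return total / (len(a) + len(b))

def h_adapted(a, label_a, b, label_b):
    """Adapted Hausdorff: pick the variant according to the bag labels."""
    if label_a and label_b:            # both positive: nearest instances
        return h_min(a, b)
    if not label_a and not label_b:    # both negative: average
        return h_avg(a, b)
    return h_max(a, b)                 # mixed classes: maximal

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (3.0, 0.0)]
print(h_min(A, B))   # nearest pair (0,0)-(0,1) -> 1.0
print(h_max(A, B))   # largest of the nearest-neighbour gaps -> 2.0
```

On the same pair of bags the three base metrics can disagree considerably, which is exactly what the adapted variant exploits by choosing per class.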
In this case, the calculation of the diff function also depends on the pattern class. Therefore, the pattern labels determine how the diff function is evaluated.
– If Ri is positive and Hj is positive, diff_bag−adapted(A, Ri, Hj) = diff_bag−min(A, Ri, Hj)
– If Ri is negative and Hj is negative, diff_bag−adapted(A, Ri, Hj) = diff_bag−avg(A, Ri, Hj)
– Finally, if Ri is positive and Mj is negative, or vice versa, diff_bag−adapted(A, Ri, Mj) = diff_bag−max(A, Ri, Mj)
The function diff_instance used in the previous calculations is the difference between two particular instances for a given attribute. The total distance is simply the sum of the distances over all attributes (Manhattan distance [12]). For nominal attributes, the function is defined as:

diff_instance(A, Ix, Iy) = 0 if value(A, Ix) = value(A, Iy); 1 otherwise

and for numerical attributes as:

diff_instance(A, Ix, Iy) = |value(A, Ix) − value(A, Iy)| / (max(A) − min(A))

3

Experimental Results
The experimental study aims to evaluate the performance of ReliefF-MI. The evaluation is broken down into two parts: a comparative study of the results obtained by the different versions designed, and a comparison of algorithm performance with and without dimensionality reduction to show the relevance of this filter method in the MIL framework. In the experiments, seventeen of the most popular proposals in MIL are considered on three content-based image categorization applications [13,14], whose names and characteristics are given in Table 1. All the experiments were executed using 10-fold cross validation, and a statistical test was adopted to analyze the experimental results.

Table 1. General Information about Data Sets

Dataset   Positive Bags  Negative Bags  Total Bags  Attributes  Instances  Average Bag Size
Elephant  100            100            200         230         1391       6.96
Tiger     100            100            200         230         1220       6.10
Fox       100            100            200         230         1320       6.60
3.1
Comparison of Different Metrics for ReliefF-MI
To determine which metric is more interesting, representative paradigms in MIL have been considered. The different paradigms consider seventeen algorithms including: methods based on Diverse Density: MIDD, MIEMDD and MDD; methods based on Logistic Regression: MILR; methods based on Support Vector Machines: SMO and MISMO; distance-based Approaches: CitationKNN and MIOptimalBall; methods based on Rules: such as PART, Bagging with PART and AddaBoost with PART using MIWrapper and MISimple approach (they are different adaptations for working with MIL); method based on decision trees: MIBoost and methods based on Naive Bayes. More information about the algorithms considered could be consulted at the WEKA workbench [15] where these techniques have been designed. Table 2. Results with Reduced Feature Data Set Algorithms
Algorithm                 | Maximal             | Minimal             | Average             | Adapted
                          | Eleph Tiger Fox     | Eleph Tiger Fox     | Eleph Tiger Fox     | Eleph Tiger Fox
CitationKNN               | 0.750 0.830 0.615   | 0.745 0.850 0.630   | 0.745 0.840 0.610   | 0.745 0.815 0.615
MDD                       | 0.725 0.810 0.620   | 0.710 0.800 0.600   | 0.710 0.790 0.605   | 0.705 0.805 0.660
MIBoost (RepTree)         | 0.825 0.870 0.655   | 0.840 0.845 0.665   | 0.840 0.865 0.700   | 0.840 0.855 0.710
MIBoost (DecisionStump)   | 0.825 0.800 0.655   | 0.820 0.785 0.695   | 0.820 0.800 0.660   | 0.830 0.805 0.700
MIDD                      | 0.755 0.780 0.600   | 0.750 0.780 0.645   | 0.750 0.780 0.595   | 0.755 0.770 0.695
MIEMDD                    | 0.725 0.775 0.530   | 0.685 0.720 0.605   | 0.685 0.745 0.530   | 0.715 0.770 0.615
MILR                      | 0.815 0.855 0.600   | 0.840 0.825 0.630   | 0.840 0.840 0.615   | 0.835 0.875 0.635
MIOptimalBall             | 0.795 0.740 0.575   | 0.765 0.715 0.495   | 0.765 0.735 0.525   | 0.775 0.740 0.535
MISMO (RBF Kernel)        | 0.765 0.835 0.615   | 0.800 0.865 0.655   | 0.800 0.830 0.655   | 0.785 0.855 0.650
MISMO (Polynomial Kernel) | 0.765 0.825 0.620   | 0.780 0.825 0.685   | 0.780 0.830 0.665   | 0.770 0.820 0.655
MIWrapper (AdaBoost&PART) | 0.830 0.840 0.615   | 0.830 0.825 0.745   | 0.830 0.820 0.620   | 0.840 0.860 0.665
MIWrapper (Bagging&PART)  | 0.830 0.850 0.585   | 0.810 0.865 0.595   | 0.810 0.860 0.610   | 0.830 0.865 0.605
MIWrapper (PART)          | 0.830 0.815 0.580   | 0.815 0.830 0.615   | 0.815 0.810 0.570   | 0.835 0.840 0.620
MIWrapper (SMO)           | 0.705 0.815 0.660   | 0.715 0.835 0.675   | 0.715 0.830 0.655   | 0.705 0.820 0.690
MIWrapper (Naive Bayes)   | 0.655 0.820 0.590   | 0.675 0.815 0.650   | 0.675 0.825 0.585   | 0.660 0.820 0.680
MISimple (AdaBoost&PART)  | 0.800 0.855 0.570   | 0.840 0.830 0.600   | 0.840 0.840 0.560   | 0.830 0.845 0.650
MISimple (PART)           | 0.770 0.795 0.620   | 0.765 0.730 0.670   | 0.765 0.740 0.660   | 0.775 0.780 0.665
RANKING                   | 2.676               | 2.520               | 2.794               | 2.010
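As an illustration, the Friedman average ranks used in this comparison can be computed with a short pure-Python sketch (the accuracy block below is hypothetical, and the statistic omits the tie correction):

```python
def ranks_desc(row):
    """Rank values descending (1 = highest accuracy); ties share the average rank."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average of tied positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def friedman(table):
    """Friedman chi-square over blocks (rows) x treatments (columns),
    without tie correction."""
    n, k = len(table), len(table[0])
    rank_rows = [ranks_desc(row) for row in table]
    mean_ranks = [sum(r[c] for r in rank_rows) / n for c in range(k)]
    stat = 12 * n / (k * (k + 1)) * sum((m - (k + 1) / 2) ** 2 for m in mean_ranks)
    return stat, mean_ranks

# Hypothetical block of accuracies: columns = maximal, minimal, average, adapted.
table = [
    [0.750, 0.745, 0.745, 0.745],
    [0.830, 0.850, 0.840, 0.815],
    [0.615, 0.630, 0.610, 0.615],
    [0.825, 0.840, 0.840, 0.840],
]
stat, mean_ranks = friedman(table)
print(stat, mean_ranks)
```

With four metrics the mean ranks always sum to 10 (= 4·5/2), which is a useful sanity check when reproducing the RANKING row.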
The average accuracy results on the tiger, fox and elephant data sets obtained by the algorithms using the different proposals for ReliefF-MI are reported in Table 2. To check which metric obtains the most relevant features for the different algorithms, a statistical test (the Friedman test) is carried out. This is a non-parametric test that compares the average ranks of the proposals considered: for each algorithm and data set, the metric that achieves the highest accuracy is given a rank of 1, the metric with the next highest accuracy has a rank of 2, and so on. These ranks let us know which metric obtains the best results over all the algorithms and data sets. In this way, the metric with the value closest to 1 is the one with which the algorithms considered in this study generally obtain better results using the
Reducing Dimensionality in Multiple Instance Learning with a Filter Method
reduced feature set it provided. The ranks obtained by each metric can be seen in Table 2. At a 95% confidence level, the Friedman test determines that there are significant differences in the results when different metrics are used: its value is 10.965, while χ²(n = 3) is 6.251; therefore the null hypothesis is rejected and there are differences between the metrics. The Bonferroni test is then carried out to determine which metric selects the most relevant features. Its results show that methods whose ranking exceeds the threshold of 2.638 (at 95% confidence) are considered worse proposals than the control algorithm. In this case, the control algorithm is ReliefF-MI with the newly designed metric, because it obtains the lowest ranking value and is therefore the best option. Statistically, the worst proposals are the average Hausdorff distance and the maximal Hausdorff distance, both of which have rankings above the threshold set by this test.

3.2 Effectiveness of ReliefF-MI
The results obtained by the algorithms when they use the reduced data set provided by ReliefF-MI with the adapted Hausdorff distance are compared to the results obtained with the full data set, in order to show the effectiveness of ReliefF-MI. Table 3 shows the results of the seventeen algorithms on the different data sets in both cases. A statistical study is carried out to check whether the use of ReliefF-MI improves the results of the algorithms with respect to the results obtained when no dimensionality reduction is done. The Wilcoxon rank-sum test is used to find out whether or not there are differences between the accuracy values obtained by the different algorithms using the feature set provided by ReliefF-MI. This is a non-parametric test recommended in Demsar's study [16]. The null hypothesis of this test maintains that there are no significant differences between the accuracy values obtained by the algorithms when they use the different feature sets, while the alternative hypothesis asserts that there are. Table 4 shows the mean ranks and the sum of ranks for

Table 3. Results with Full Feature Data Set

Algorithm                 | Reduced Set         | Full Set
                          | Eleph Tiger Fox     | Eleph Tiger Fox
citationKNN               | 0.745 0.815 0.615   | 0.500 0.500 0.500
MDD                       | 0.705 0.805 0.660   | 0.800 0.755 0.700
MIBoost (RepTree)         | 0.840 0.855 0.710   | 0.815 0.825 0.670
MIBoost (DecisionStump)   | 0.830 0.805 0.700   | 0.815 0.780 0.650
MIDD                      | 0.755 0.770 0.695   | 0.825 0.740 0.655
MIEMDD                    | 0.715 0.770 0.615   | 0.730 0.745 0.600
MILR                      | 0.835 0.875 0.635   | 0.780 0.840 0.510
MIOptimalBall             | 0.775 0.740 0.535   | 0.730 0.625 0.530
MISMO (RBF Kernel)        | 0.785 0.855 0.650   | 0.800 0.795 0.590
MISMO (Polynomial Kernel) | 0.770 0.820 0.655   | 0.790 0.785 0.580
MIWrapper (AdaBoost&PART) | 0.840 0.860 0.665   | 0.840 0.790 0.685
MIWrapper (Bagging&PART)  | 0.830 0.865 0.605   | 0.845 0.810 0.600
MIWrapper (PART)          | 0.835 0.840 0.620   | 0.790 0.780 0.550
MIWrapper (SMO)           | 0.705 0.820 0.690   | 0.715 0.800 0.635
MIWrapper (Naive Bayes)   | 0.660 0.820 0.680   | 0.680 0.760 0.590
MISimple (AdaBoost&PART)  | 0.830 0.845 0.650   | 0.840 0.795 0.625
MISimple (PART)           | 0.775 0.780 0.665   | 0.765 0.765 0.635
Table 4. Sum of Ranks and Mean Rank of the two proposals

Method                  | Mean Rank | Sum of Ranks
ReliefF-MI Method (1)   | 57.03     | 2908.50
Not Reducing Features   | 45.97     | 2344.50

(1) Adapted Hausdorff Distance.
each of the two options. The scores are ranked from lowest to highest; therefore, the algorithms that do not use feature selection have a lower mean rank than the algorithms that use the ReliefF-MI method, which suggests a priori that ReliefF-MI is the better proposal. The Wilcoxon statistic is 2344.50 and the corresponding z-score is -1.888. According to these values, the results are significant at the 90% confidence level (p-value = 0.059 < 0.1), so we reject the null hypothesis and conclude that there are significant differences between the results obtained by the algorithms when they use the feature selection method. Consequently, ReliefF-MI yields significantly higher accuracy values than the option without feature reduction. This conclusion follows from the fact that the mean rank is higher for the algorithms that use feature selection as a pre-processing step (57.03) than for the other option (45.97). In general, we can conclude that the use of this method benefits the results achieved by the algorithms; that is, the usefulness of this feature selection method in MIL is demonstrated because it improves the results obtained by the different algorithms. Thus, the results with a lower number of features minimize the classification error, and the efficiency of this method is shown for the MIL case. The study considers seventeen different algorithms and three different data sets, so it is representative enough to support this conclusion.
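The rank-sum computation described above can be sketched as follows, here using the elephant accuracies from Table 3; note that this plain normal approximation ignores the tie correction, so its z-score only approximates the kind of value reported in the paper:

```python
import math

def rank_sum_z(a, b):
    """Wilcoxon rank-sum: pool both samples, rank ascending (ties averaged),
    and compare a's rank sum with its expectation under the null hypothesis."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run i..j (1-based)
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return w, (w - mu) / sigma

# Elephant accuracies from Table 3 (reduced vs. full feature set).
reduced = [0.745, 0.705, 0.840, 0.830, 0.755, 0.715, 0.835, 0.775, 0.785,
           0.770, 0.840, 0.830, 0.835, 0.705, 0.660, 0.830, 0.775]
full = [0.500, 0.800, 0.815, 0.815, 0.825, 0.730, 0.780, 0.730, 0.800,
        0.790, 0.840, 0.845, 0.790, 0.715, 0.680, 0.840, 0.765]
w_full, z = rank_sum_z(full, reduced)
print(w_full, z)
```

A quick self-check: the two rank sums must add up to n(n+1)/2 over the pooled sample, and swapping the groups must flip the sign of the z-score.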
4 Conclusions
The task of finding relevant features for multiple instance data is largely untouched. In this learning framework, the process is more complex because the class labels of the particular instances in the patterns are not available. Thus, the classic feature selection methods of traditional supervised learning with single-instance data do not work well in this scenario. For multiple instance data, where the information shows uncertainty, only methods tied to optimizing a particular algorithm have been proposed (most of them wrapper approaches). This paper addresses the problem of feature selection to reduce the dimensionality of data in MIL in a way that is general for any algorithm, using a filter method. Thus, we describe a new efficient algorithm based on ReliefF principles [7] that can be applied to continuous and discrete problems, is faster than wrapper methods, and can be applied to any previously designed MIL algorithm because it is applied as a preprocessing step. Experimental results show the effectiveness of
our approach using three different applications and seventeen algorithms on the reduced data. First, the different metrics are compared to evaluate their effect on the algorithm developed. The results show that the newly proposed metric statistically achieves the best results. This metric is designed to adapt to the variable information of each pattern according to that pattern's class. Second, the Wilcoxon test shows the benefits of applying data reduction in MIL: in general, all the algorithms obtain better results when they work only with the most relevant features. Thus, the relevance of using feature selection in this scenario for improving the performance of algorithms on high-dimensional data is established. Future work in this area includes the design of other metrics to measure the distance between bags in order to optimize the performance of this method, as well as the design of other filter-based feature selection methods for the MIL scenario, in order to study which methods work best in this learning context.
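For reference, the classic bag-to-bag Hausdorff variants compared in Section 3.1 can be sketched as follows (a minimal illustration over bags of feature vectors; the class-adapted metric contributed by the paper is not reproduced here):

```python
import math

def euclid(a, b):
    """Euclidean distance between two instances (equal-length tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hausdorff_variants(bag_a, bag_b):
    """Bag-to-bag distances built from nearest-instance distances:
    maximal (classic Hausdorff), minimal (closest pair) and average."""
    d_ab = [min(euclid(a, b) for b in bag_b) for a in bag_a]
    d_ba = [min(euclid(b, a) for a in bag_a) for b in bag_b]
    maximal = max(max(d_ab), max(d_ba))
    minimal = min(min(d_ab), min(d_ba))
    average = (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
    return maximal, minimal, average

# Two toy bags of 2-D instances.
bag_a = [(0.0, 0.0), (1.0, 0.0)]
bag_b = [(0.0, 1.0), (5.0, 0.0)]
print(hausdorff_variants(bag_a, bag_b))
```

All three variants are symmetric, and by construction minimal <= average <= maximal for any pair of bags.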
Acknowledgements. The authors gratefully acknowledge the financial support provided by the Spanish Department of Research under the TIN2008-06681-C06-03 and P08-TIC-3720 projects and FEDER funds.
References 1. Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89(1-2), 31–71 (1997) 2. Zhang, M.L., Zhou, Z.H.: Improve multi-instance neural networks through feature selection. Neural Processing Letters 19(1), 1–10 (2004) 3. Chen, Y., Bi, J., Wang, J.: Miles: Multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12), 1931–1947 (2006) 4. Yuan, X., Hua, X.S., Wang, M., Qi, G.J., Wu, X.Q.: A novel multiple instance learning approach for image retrieval based on adaboost feature selection. In: ICME 2007: Proceedings of the IEEE International Conference on Multimedia and Expo., Beijing, China, pp. 1491–1494. IEEE, Los Alamitos (2007) 5. Raykar, V.C., Krishnapuram, B., Bi, J., Dundar, M., Rao, R.B.: Bayesian multiple instance learning: automatic feature selection and inductive transfer. In: ICML 2008: Proceedings of the 25th International Conference on Machine Learning, pp. 808–815. ACM, New York (2008) 6. Herman, G., Ye, G., Xu, J., Zhang, B.: Region-based image categorization with reduced feature set. In: Proceedings of the 10th IEEE Workshop on Multimedia Signal Processing, Cairns, Qld, pp. 586–591 (2008) 7. Kononenko, I.: Estimating attributes: analysis and extension of relief. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)
8. Chevaleyre, Y.Z., Zucker, J.D.: Solving multiple-instance and multiple-part learning problems with decision trees and decision rules. Application to the mutagenesis problem. In: Stroulia, E., Matwin, S. (eds.) Canadian AI 2001. LNCS (LNAI), vol. 2056, pp. 204–214. Springer, Heidelberg (2001) 9. Zhang, D., Wang, F., Si, L., Li, T.: M3IC: Maximum margin multiple instance clustering, pp. 1339–1344 (2009) 10. Zhang, M.L., Zhou, Z.H.: Multi-instance clustering with applications to multi-instance prediction. Applied Intelligence 31, 47–68 (2009) 11. Edgar, G.: Measure, topology, and fractal geometry, 3rd edn. Springer, Heidelberg (1995) 12. Cohen, H.: Image restoration via n-nearest neighbour classification. In: ICIP 1996: Proceedings of the International Conference on Image Processing, pp. 1005–1007 (1996) 13. Yang, C., Lozano-Perez, T.: Image database retrieval with multiple-instance learning techniques. In: ICDE 2000: Proceedings of the 16th International Conference on Data Engineering, Washington, DC, USA, pp. 233–243. IEEE Computer Society, Los Alamitos (2000) 14. Pao, H.T., Chuang, S.C., Xu, Y.Y., Fu, H.: An EM based multiple instance learning method for image classification. Expert Systems with Applications 35(3), 1468–1472 (2008) 15. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005) 16. Demsar, J.: Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, 1–30 (2006)
Graphical Exploratory Analysis of Educational Knowledge Surveys with Missing and Conflictive Answers Using Evolutionary Techniques Luciano Sánchez, Inés Couso, and José Otero Computer Science and Statistics Departments, Universidad de Oviedo, Campus de Viesques s/n, Gijón (Spain) {luciano,couso,jotero}@uniovi.es
Abstract. Analyzing the data collected in a knowledge survey helps the teacher determine the students' learning needs at the beginning of the course and find a relationship between these needs and the capacities acquired during the course. In this paper we propose using graphical exploratory analysis to project all the data onto a map, where each student is placed depending on his/her knowledge profile, allowing the teacher to identify groups with similar background problems, segment heterogeneous groups and perceive the evolution of the abilities acquired during the course. The main innovation of our approach consists in regarding the answers to the tests as imprecise data. We consider that a missing or unknown answer, or a set of conflictive answers to a survey, is best represented by an interval or a fuzzy set. With this representation, each individual in the map is no longer a point but a figure, whose shape and size determine the coherence of the answers and whose position with respect to its neighbors determines the similarities and differences between the students. Keywords: Knowledge Surveys, Graphical Exploratory Analysis, Multidimensional Scaling, Fuzzy Fitness-based Genetic Algorithms.
1 Introduction
Knowledge surveys comprise short questions that students can answer by writing a single line, or by choosing between several alternatives in a printed or web-based questionnaire [5]. These surveys can be used for assessing the quality of learning, and they are also meaningful from a didactical point of view. On the one hand, they allow students to perceive the whole content of the course [6]. On the other hand, teachers can use these surveys to decide the best starting level for the lectures, especially in Master or pre-doctoral lectures [10], where the
This work was funded by the Spanish M. of Education, under the grant TIN2008-06681-C06-04.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 45–52, 2010. © Springer-Verlag Berlin Heidelberg 2010
L. Sánchez, I. Couso, and J. Otero
profiles of the students attending the same course are very different. Recently this has also been applied to teacher education and certification [11]. When the survey is done at the end of the course, the effectiveness of the teaching methodology, along with the attitude and dedication of the students, is measured. There is a certain consensus in the literature that the relationship between methodology/dedication and scoring is weak [2]. Because of this, a survey (as opposed to an exam, which is designed to score the students) is needed. Finally, elaborating the survey serves by itself to establish the course contents, timeline and teaching methodology [9]. In this context, this paper is about graphically analyzing the data collected in a knowledge survey. We intend to determine the students' learning needs at the beginning of the course and also to find a relationship between these needs and the capacities acquired during the course. To this end, we propose projecting the data onto a map, where each student is placed according to his/her knowledge profile, allowing the teacher to identify groups with similar background problems, segment heterogeneous groups and show the evolution of the abilities acquired during the course. This is not a new technique by itself, since such statistical methods (and, generally speaking, intelligent techniques) for analyzing questionnaires and surveys are part of common knowledge [9]. Moreover, the proliferation of free data mining software (see, for instance, [1]) has driven many advances in the application of Artificial Intelligence in educational contexts [7]. Indeed, there exist tools that can generate views of the aforementioned data for easily drawing conclusions and making predictions about the course effectiveness. The innovation of our approach lies not in the use of graphical techniques but in extending them to data that is possibly incomplete or imprecise.
This is rarely done when analyzing surveys, and as a matter of fact the extension of graphical exploratory analysis to problems based on low-quality data is very recent [3,4]. As far as we know, these last techniques have not yet been applied in an educational context. Notwithstanding, we believe that their use will make it possible to better solve two frequent problems: the situation where the student does not answer some questions of the survey, and the cases where there are incompatible answers that might have been given carelessly. Within our approach, we consider that a missing or unknown answer in the survey is best represented by an interval. For instance, if the answer is a number between 0 and 10, an unanswered question will be associated with the interval [0,10]. We will not try to make up a coherent answer for the incomplete test, but we will carry the imprecision through all the calculations. In turn, an incoherent set of answers will also be represented by an interval. For instance, assume that the same question is formulated in three different ways (this can be done for detecting random answers to the tests) and the student gives incoherent answers. Let {6, 2, 4} be the different answers to the question. With our methodology, instead of replacing this triplet by its mean, we say that the answer is an unknown number in the range [2, 6] (the minimum and the maximum of the answers).
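This intervalization rule is simple enough to state in a few lines of code (a sketch; the 0-10 scale follows the example above and is not a fixed choice):

```python
def answer_interval(answers, scale=(0, 10)):
    """Represent the answers to one (possibly replicated) question as an
    interval: no answers -> the full scale; otherwise [min, max]."""
    vals = [a for a in answers if a is not None]
    if not vals:
        return scale
    return (min(vals), max(vals))

print(answer_interval([6, 2, 4]))  # incoherent triplet -> (2, 6)
print(answer_interval([]))         # unanswered -> (0, 10)
print(answer_interval([7]))        # coherent single answer -> (7, 7)
```

A precise answer thus becomes a degenerate interval, so precise and imprecise surveys can be processed uniformly downstream.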
Graphical Analysis of Surveys with Conflictive Answers
Using intervals for representing unknown values means that each individual in the map is no longer a point but a figure, whose shape and size determine the coherence of the answers and whose relative position determines the similarities between it and the other students. In this paper we explain how this map can be generated with the help of genetic algorithms driven by interval- (or fuzzy-) valued fitness functions. We also show the results of this new analysis on three actual surveys, answered by Spanish engineering and pre-doctorate students. The structure of this paper is as follows: in Section 2 we introduce Graphical Exploratory Analysis for vague data and its relation with knowledge surveys. In the same section we explain an evolutionary algorithm for computing these maps, and in Section 3 we show the results of this method in three real-world cases. The paper concludes in Section 4.
2 Graphical Exploratory Statistics
There are many different techniques for performing graphical exploratory analysis of data: Sammon maps, Principal Component Analysis (PCA), Multidimensional Scaling (MDS), self-organizing maps (SOM), etc. [3]. These methods project the instances as points in a low-dimensional Euclidean space so that their proximity reflects the similarity of their variables. However, we have mentioned that the surveys can be incomplete or may contain conflicting answers, and also that an incomplete survey can be taken as the set of all surveys with any valid value in place of the missing answer. Observe that, in that case, the projection will not be a point but a shape, whose size will be larger the more incomplete or imprecise the survey is. This extension from a map of points to a map of shapes has already been made for some of the techniques mentioned before. For instance, Fuzzy MDS, as described in [3,4], extends MDS to the case where the distance matrix comprises intervals or fuzzy numbers, as happens in our problem. Crisp MDS consists in finding a low-dimensional cloud of points that minimizes a stress function. That function measures the difference between the matrix of distances between the data and the matrix of distances within this cloud. The interval (or fuzzy) extension of this algorithm defines an interval- (fuzzy-) valued stress function that bounds the difference between the imprecisely known matrix of distances between the objects and the interval- (fuzzy-) valued distance matrix between a set of shapes in the low-dimensional projection.

Let us assume for the time being that the distance between two surveys is an interval. For two imprecisely measured multivariate values $x_i = [x_{i1}^-, x_{i1}^+] \times \ldots \times [x_{if}^-, x_{if}^+]$ and $x_j = [x_{j1}^-, x_{j1}^+] \times \ldots \times [x_{jf}^-, x_{jf}^+]$, with $f$ features each, the set of distances between their possible values is the interval

$$D_{ij} = \left\{ \sqrt{\sum_{k=1}^{f} (x_{ik} - x_{jk})^2} \;\middle|\; x_{ik} \in [x_{ik}^-, x_{ik}^+],\ x_{jk} \in [x_{jk}^-, x_{jk}^+],\ 1 \le k \le f \right\}. \quad (1)$$
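Tight bounds of the interval $D_{ij}$ can be computed coordinate-wise; the following sketch (our own naming, not the authors' code) returns the minimum and maximum possible Euclidean distances between two interval-valued vectors:

```python
import math

def interval_sq_diff(a, b):
    """Bounds of (x - y)^2 for x in a = (a_lo, a_hi) and y in b = (b_lo, b_hi)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    hi = max(abs(a_hi - b_lo), abs(b_hi - a_lo)) ** 2
    if a_lo <= b_hi and b_lo <= a_hi:  # intervals overlap: distance 0 is possible
        lo = 0.0
    else:
        lo = min(abs(a_lo - b_hi), abs(b_lo - a_hi)) ** 2
    return lo, hi

def interval_distance(x, y):
    """D_ij of eq. (1): interval of Euclidean distances between two
    interval-valued feature vectors (lists of (lo, hi) pairs)."""
    lo = sum(interval_sq_diff(a, b)[0] for a, b in zip(x, y))
    hi = sum(interval_sq_diff(a, b)[1] for a, b in zip(x, y))
    return math.sqrt(lo), math.sqrt(hi)

# One imprecise feature (an unanswered 0-10 question) and one precise feature.
print(interval_distance([(0, 10), (3, 3)], [(2, 2), (5, 5)]))
```

The first feature contributes nothing to the lower bound (the intervals overlap) but dominates the upper bound, which is exactly the behavior eq. (1) describes.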
Some authors have used a similar distance before [4], and further assumed that the shape of the projection of an imprecise case is a circle. We have found that,
Fig. 1. The projected data are polygons defined by the distances $R_{ij}$ in the directions that pairwise join the examples
in our problem, this last is too restrictive a hypothesis. Instead, we propose to approximate the shape of the projections by a polygon (see Figure 1) whose radii $R_{ij}^+$ and $R_{ij}^-$ are not free variables, but depend on the distances between the cases. For a multivariate set of imprecise data $\{x_1, \ldots, x_N\}$, let $\overline{x}_i$ be the crisp centerpoint of the imprecise value $x_i$ (the center of gravity, if an interval, or the modal point, if fuzzy), and let $\{(z_{11}, \ldots, z_{1r}), \ldots, (z_{N1}, \ldots, z_{Nr})\}$ be a crisp projection, with dimension $r$, of that set. We propose that the radii $R_{ij}^+$ and $R_{ij}^-$ depend on the distance between $\overline{x}_i$ and $\overline{x}_j$ (see Figure 2 for a graphical explanation) as follows:

$$R_{ij}^+ = d_{ij} \left( \frac{\delta_{ij}^+}{\delta_{ij}} - 1 \right), \qquad R_{ij}^- = d_{ij} \left( 1 - \frac{\delta_{ij}^-}{\delta_{ij}} \right) \quad (2)$$

where $d_{ij} = \sqrt{\sum_{k=1}^{r} (z_{ik} - z_{jk})^2}$, $\delta_{ij} = D(\overline{x}_i, \overline{x}_j)$, $\delta_{ij}^+ = \max\{D(x_i, x_j)\}$, and $\delta_{ij}^- = \min\{D(x_i, x_j)\}$. We also propose that the value of the stress function our map has to minimize is

$$\sum_{i=1}^{N} \sum_{j=i+1}^{N} d_H\!\left( D_{ij},\ [\, d_{ij} - R_{ij}^- - R_{ji}^-,\ d_{ij} + R_{ij}^+ + R_{ji}^+ \,] \right)^2 \quad (3)$$
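A per-pair stress term following eqs. (2)-(3) can be sketched as follows (variable names are ours; this is an illustration of the formulas, not the authors' implementation):

```python
def radii(d_ij, delta, delta_lo, delta_hi):
    """Eq. (2): outer/inner radii of the projected polygon for the pair (i, j).
    delta is the distance between centerpoints; [delta_lo, delta_hi] is the
    interval of possible distances between the imprecise values."""
    r_plus = d_ij * (delta_hi / delta - 1.0)
    r_minus = d_ij * (1.0 - delta_lo / delta)
    return r_plus, r_minus

def hausdorff(a, b):
    """Hausdorff distance between two closed intervals a and b."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def stress_term(d_ij, D_ij, r_ij, r_ji):
    """One summand of eq. (3): squared Hausdorff distance between the data
    interval D_ij and the interval spanned by the two projected polygons."""
    (rp_ij, rm_ij), (rp_ji, rm_ji) = r_ij, r_ji
    span = (d_ij - rm_ij - rm_ji, d_ij + rp_ij + rp_ji)
    return hausdorff(D_ij, span) ** 2

# Example: centers projected at distance 5.0, true distances known to lie in [4, 7].
r = radii(5.0, 5.0, 4.0, 7.0)
print(r, stress_term(5.0, (4.0, 7.0), r, r))
```

Note that when the projected span reproduces the data interval exactly, the term vanishes, which is what the minimization drives toward.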
where $d_H$ is the Hausdorff distance between intervals.

2.1 Characteristic Points
We also propose adding several prototypical surveys (we call them "characteristic points") corresponding to a survey without mistakes, a completely wrong survey, one section well answered but the remaining ones wrong, etc. With the help of these points, the map can be used for evaluating the capacities of a student by comparing his/her position with the closest characteristic point.
Fig. 2. The distance between the projections of $x_i$ and $x_j$ lies between $d_{ij} - R_{ij}^- - R_{ji}^-$ and $d_{ij} + R_{ij}^+ + R_{ji}^+$
2.2 Evolutionary Algorithm
An evolutionary algorithm is used for optimizing the stress function and searching for the best map. In previous works we have shown that interval and fuzzy fitness functions can be optimized with extensions of multiobjective genetic algorithms. In this paper we have used the extended NSGA-II defined in [8], whose main components are summarized in the following paragraphs. Coding scheme. Each individual of the population represents a set of coordinates in the plane; thus each chromosome consists of the concatenation of as many pairs of numbers as there are students, plus one pair for each characteristic point (i.e. "Everything", "Nothing", "Only Subject X", "Every Subject but X", etc.). The chromosome is fixed-length, and real coding is used. Objective Function. The fitness function was defined in eq. (3). Evolutionary Scheme. A generational approach with the multiobjective NSGA-II replacement strategy is considered. Binary tournament selection based on the crowding distance in the objective function space is used. The precedence operator derives from Bayesian coherent inference with an imprecise prior, the non-dominated sorting is based on the product of the lower probabilities of precedence, and the crowding is based on the Hausdorff distance, as described in [8].
Genetic Operators. Arithmetic crossover is used for combining two chromosomes. The mutation operator consists in performing crossover with a randomly generated chromosome.
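The two operators admit a very small sketch (the coordinate bounds used for generating the random chromosome are our assumption, not given in the paper):

```python
import random

def arithmetic_crossover(parent_a, parent_b, rng=random):
    """Blend two real-coded chromosomes: child = t*a + (1-t)*b, t in [0, 1]."""
    t = rng.random()
    return [t * a + (1 - t) * b for a, b in zip(parent_a, parent_b)]

def mutate(chromosome, bounds, rng=random):
    """Mutation as crossover with a randomly generated chromosome."""
    random_chrom = [rng.uniform(lo, hi) for lo, hi in bounds]
    return arithmetic_crossover(chromosome, random_chrom, rng)

# Two individuals: (x, y) coordinates for 2 students + 1 characteristic point.
p1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
p2 = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
child = arithmetic_crossover(p1, p2)
print(child)
```

Each child gene lies between the corresponding parent genes, so crossover never leaves the convex hull of the population and mutation is what re-introduces diversity.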
3 Results
In this section we illustrate, with the help of three real-world datasets, how to identify groups of students and how to superimpose two maps of the same individuals at different times in order to show the temporal evolution of learning.

3.1 Variation of Individual Capacities in the Same Group and between Groups
In the left part of Figure 3, a diagram is shown for 30 students of the subject "Statistics" in Ingenieria Telematica at Oviedo University, taken at the beginning of the 2009-2010 course. This survey is related to the students' previous knowledge of other subjects. In particular, it evaluates previous knowledge in Algebra (A), Logic (B), Electronics (C), Numerical Analysis (D), Probability (E) and Physics (F). The positions of the characteristic points have been marked with labels. Those points are of the type "A" (all the questions about subject "A" are correct, the others are erroneous), "NO A" (all the questions except the "A" ones are correct, the opposite situation), etc. In the right part of Figure 3 we have plotted together the results of three different groups attending lectures by the same teacher. Each intensification
Fig. 3. Left part: Differences in knowledge of Statistics for students in Ingenieria Telematica. Right part: Differences in knowledge about Computer Science between the students of Ingenieria Tecnica Industrial specialized in Chemistry, Electricity and Mechanics.
has been coded with a distinctive colour. This teacher evaluated, as before, the initial knowledge of the students in subjects that are a prerequisite. The most relevant fact in the graphic is that the students of the intensification coded in red (Ingenieria Industrial) consider themselves better prepared than those coded in blue (Ingenieria Tecnica Industrial Electrica), with the green group (Ingenieria Tecnica Industrial Quimica) in an intermediate position, closer to red. All the students of all the groups have a neutral orientation towards math subjects, and some students in the blue group think that their background is adequate only in subjects C (Operating Systems) and D (Internet).

3.2 Evaluation of Learning Results
Ten pre-doctoral students in Computer Science, Physics and Mathematics attending a research master were analyzed. The background of these students is heterogeneous. In the survey the students were asked about 36 subjects classified into "Control Algorithms" (A), "Statistical Data Analysis" (B), "Numerical Algorithms" (C) and "Linear Models" (D). At the top of Figure 4 we can see that there is a large dispersion in the initial knowledge. Since the subject had strong theoretical foundations, students from technical degrees like Computer Science evaluated themselves with the lowest scores (shapes in the right part of each figure).
Fig. 4. Evolution of the learning of pre-doctoral students. Left part: Initial survey. Center: superposition of initial and final maps. Right part: The displacement has been shown by arrows.
The same survey, taken at the end of the course, shows that all the students moved to the left, closer to the characteristic point "EVERYTHING". Additionally, the displacement has been larger for the students in the group on the right. This displacement can be seen clearly in the right part of the same figure, where the shapes obtained from the final survey were replaced by arrows that begin at the initial position and end at the final center. The length of the arrows is related to the progress of the student during the course.
4 Conclusions
In this work we have extended Multidimensional Scaling to imprecise data with the help of a genetic algorithm driven by a fuzzy fitness function, and exploited the new capabilities of the algorithm to produce a method able to process incomplete or carelessly filled surveys that include conflictive answers. The map of a group of students consists of several shapes, whose volume measures the degree to which a survey lacks consistency. We have shown that these maps can help detect heterogeneous groups and can also be used for assessing the results of a course.
References 1. Alcala-Fdez, L., et al.: KEEL: A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems. Soft Computing 13(3), 307–318 (2009) 2. Cohen, P.A.: Student ratings of instruction and student achievement: a metaanalysis of multisection validity studies. Review of Educational Research 51(3), 281–309 (1981) 3. Denoeux, T., Masson, M.-H.: Multidimensional scaling of interval-valued dissimilarity data. Pattern Recognition Lett. 21, 83–92 (2000) 4. Hebert, P.A., Masson, M.H., Denoeux, T.: Fuzzy multidimensional scaling. Computational Statistics and Data Analysis 51, 335–359 (2006) 5. Knipp, D.: Knowledge surveys: What do students bring to and take from a class? United States Air Force Academy Educator (Spring 2001) 6. Nuhfer, E.: Bottom-Line Disclosure and Assessment. Teaching Professor 7(7), 8–16 (1993) 7. Romero, C., Ventura, S., García, E.: Data mining in course management systems: Moodle case study and tutorial. Computers & Education 51(1), 368–384 (2008) 8. Sanchez, L., Couso, I., Casillas, J.: Modeling vague data with genetic fuzzy systems under a combination of crisp and imprecise criteria. In: MCDM 2007, Honolulu, Hawaii, USA (2007) 9. Wirth, K., Perkins, D.: Knowledge Surveys: The ultimate course design and assessment tool for faculty and students. In: Proceedings: Innovations in the Scholarship of Teaching and Learning Conference, April 1-3, p. 19. St. Olaf College/Carleton College (2005) 10. Nagel, L., Kotz, T.: Supersizing e-learning: What a CoI survey reveals about teaching presence in a large online class. The Internet and Higher Education (2009) 11. Zeki Saka, A.: Hitting two birds with a stone: Assessment of an effective approach in science teaching and improving professional skills of student teachers. Social and Behavioral Sciences 1(1), 1533–1544 (2009)
Data Mining for Grammatical Inference with Bioinformatics Criteria Vivian F. López, Ramiro Aguilar, Luis Alonso, María N. Moreno, and Juan M. Corchado Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced S/N, 37008 Salamanca {vivian,ramiro,luis,maria,corchado}@usal.es
Abstract. In this paper we describe both theoretical and practical results of a novel data mining process that combines hybrid techniques of association analysis with classical genomic sequencing algorithms to generate grammatical structures of a specific language. We used an application of a compiler generator system that allows the development of a practical application within the area of grammarware, where the concepts of language analysis are applied to other disciplines, such as Bioinformatics. The tool allows the complexity of the obtained grammar to be measured automatically from textual data. A technique of incremental discovery of sequential patterns is presented to obtain simplified production rules, which are compacted with bioinformatics criteria to make up a grammar. Keywords: Grammatical Inference, Bioinformatics, Context-Free Grammar, DNA, sequential patterns.
1 Introduction
In recent years many approaches have been introduced as data mining methods for pattern recognition in biological databases. Bioinformatics employs computational and data processing technologies to develop methods, strategies and programs that allow the immense quantity of biological data that has been, and is currently being, generated to be handled, ordered and studied. To this aim, computational linguistics has received considerable attention in bioinformatics. The study in [18] indicated that a relation exists between formal language theory and DNA, the linguistic view of DNA sequences being a rich source of ideas for modeling strings with correlated symbols. Most of the work [7,8] has involved examinations of the occurrences of "words" in DNA. Searls et al. [17] found that such a linguistic approach proves useful not only in the theoretical characterization of certain structural phenomena in sequences, but also in generalized pattern recognition in this domain, via parsing. The information represented in sequences involves grammatical inference for pattern recognition. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 53–60, 2010. © Springer-Verlag Berlin Heidelberg 2010
In this work a novel data mining process is described that combines hybrid techniques of association analysis with classical sequentiation algorithms from genomics to generate grammatical structures of a specific language. Subsequently, these structures are converted into Context-Free Grammars (CFGs). Initially the method applies to context-free languages, with the possibility of being applied to other languages: structured programming languages, the language of the book of life expressed in the genome and proteome, and even natural languages. We used the compiler generator GAS 1.0 [11], an Integrated Development Environment (IDE) that allows practical applications to be developed for the automatic generation of language-based tools; it starts from the traditional solutions and facilitates the use of formal language theory in other disciplines: Grammar-Based Systems (GBSs) [12]. The tool allows the complexity of the obtained grammar to be measured automatically from textual data.

1.1 Problem of Grammatical Inference
Grammatical Inference (GI) crosses a number of fields, including machine learning, formal language theory, syntactic and structural pattern recognition, computational biology, speech recognition, etc. [6]. The problem of GI is that of learning a language description from language data. Context-free language inference involves both practical and theoretical questions. Practical aspects include pattern recognition; one pattern recognition approach is CFG inference, which builds a set of patterns [4].
2 Data Mining Procedure for Grammatical Inference
The idea builds on previously acquired experience [1], the literature and existing theories [13,9,14], in order to process data that are not structured in relations or tables with differentiated attributes but are codified as a finite succession of sentences. The data mining procedure has the following phases:

1. Language generation by means of a CFG. This language will be the data source.
2. Codification of the strings of the language according to their syntactic categories.
3. Dispensing with the initial grammar, Discovery of Sequential Patterns (DSP) [3] in the codified language. This discovery, called incremental, combines the DSP operation with a search for identical sequences. Patterns of sequences are found and replaced by an identifier symbol.
4. Replacement of the discovered sequences by their identifiers. The identifier is stored together with the sequence as a production rule.
5. Repetition of the two previous steps until all the sentences of the language are replaced by identifiers.
2.1 Language Generation
We consider the CFG Gæ proposed in [9] for the generation of arithmetic expressions; the majority of programming languages are generated by grammars of this type. We can modify the formalism of this CFG as follows (Table 1) to add a new syntactic construct to specify and search for DNA patterns in data. A DNA molecule can then be represented as a finite string of symbols from this alphabet; a language, formally, is any set of such strings [17]. The new grammar G′æ does not change in essence the character of the original grammar.

Table 1. Modification of the Gæ Grammar

Gæ Grammar:
  Gæ = (N, T, P, S)
  N = {Exp, Num, Dig, Op}
  T = {0, 1, +, ∗}
  P: Exp → Exp Op Exp | (Exp) | Num
     Num → Dig+
     Dig → 0 | 1
     Op → + | ∗
  S = Exp

G′æ Grammar (modified):
  G′æ = (N′, T′, P′, S′)
  N′ = {E, d, b, o, a, c}
  T′ = {0, 1, +, ∗, (, )}
  P′: E → E o E | aEc | d
      d → b+
      b → 0 | 1
      o → + | ∗
      a → (
      c → )
  S′ = E
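To make the modified grammar concrete, the following toy generator (our own sketch, not part of the paper's tooling; the derivation probabilities are arbitrary assumptions) randomly derives sentences with rules equivalent to G′æ, producing one arithmetic expression per call as in Figure 1, point (i):

```python
import random

# Rules of G'ae from Table 1:
#   E -> E o E | aEc | d,   d -> b+,   b -> 0|1,   o -> +|*,   a -> (,   c -> )
def gen_expr(depth=0, max_depth=4):
    """Randomly derive one sentence of the arithmetic-expression language."""
    r = random.random()
    if depth >= max_depth or r < 0.5:
        # E -> d, d -> b+ : a non-empty string of binary digits
        return "".join(random.choice("01") for _ in range(random.randint(1, 3)))
    if r < 0.75:
        # E -> E o E
        return (gen_expr(depth + 1, max_depth) + random.choice("+*")
                + gen_expr(depth + 1, max_depth))
    # E -> a E c
    return "(" + gen_expr(depth + 1, max_depth) + ")"

sample = [gen_expr() for _ in range(5)]  # one sentence per line, as in Fig. 1 (i)
```

Every generated string uses only the terminals of T′ and has balanced parentheses, since each `(` is emitted together with its matching `)`.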
With the previous criteria, a sample of the language generated by G′æ can be seen in Figure 1, point (i). Note that each line corresponds to a sentence accepted by the grammar.
Fig. 1. Language of arithmetic expressions on which its grammar is inferred
2.2 Language Codification
Considering the language generated by G′æ, all the symbols of T can be codified with symbols of N. In this particular case the symbols used as syntactic categories are {b, o, a, c}; see Figure 1, point (ii).

2.3 Incremental Discovery of Sequential Patterns and Associations
The hybrid discovery of sequential patterns applied to codified languages seeks key subsequences in the sentences of the language. Each subsequence q has a length Wq that indicates its number of symbols; in this particular case 1 ≤ Wq ≤ 5, and Q is defined as a string of length WQ. The codified language contains many sentences, which make up the population of the language. The idea consists of finding subsequences, identifying each with a symbol, and replacing the occurrences of that subsequence in the sentences of the population with the symbol, repeating the procedure until each sentence is identified by a single symbol (see Figure 2). The detailed steps of the general algorithm can be consulted in [2]. This procedure generates production rules that recognize the sentences of the language. The number of production rules can be considerable, so we apply a particular simplification method.
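The replace-and-iterate step can be illustrated with the following simplified sketch (ours; the actual incremental algorithm is detailed in [2]). It takes codified sentences such as "bob" (the codification of 1+0), repeatedly names the most frequent subsequence with a fresh uppercase identifier, records it as a production rule, and substitutes it everywhere until each sentence is a single symbol:

```python
from collections import Counter
from itertools import count

def infer_rules(sentences, max_len=5):
    """Toy incremental DSP: repeatedly take the most frequent subsequence
    (length 2..max_len), name it with a fresh symbol, record it as a
    production rule, and substitute it in every sentence."""
    rules = {}                                  # identifier -> right-hand side
    fresh = (chr(c) for c in count(ord("A")))   # A, B, C, ... identifiers
    sents = list(sentences)
    while any(len(s) > 1 for s in sents):
        counts = Counter()
        for s in sents:
            for w in range(2, max_len + 1):
                for i in range(len(s) - w + 1):
                    counts[s[i:i + w]] += 1     # count every window once
        pattern, _ = counts.most_common(1)[0]
        ident = next(fresh)
        rules[ident] = pattern                  # e.g.  A -> bob
        sents = [s.replace(pattern, ident) for s in sents]
    return rules, sents
```

Each iteration strictly shortens at least one sentence (the chosen pattern has length ≥ 2), so the loop terminates; expanding the final identifiers through the recorded rules recovers the original sentences.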
Fig. 2. Hybrid discovery of sequential patterns for the context-free languages
3 Experiments

3.1 Rule Similarity
Considering the language Læ of arithmetic expressions, applying the hybrid DSP algorithm yielded the production rules of Figure 3. From the right-hand sides of the rules, which form the sequential patterns of the language, a substitution
Fig. 3. Production rules generated and some iterations in their simplification
matrix is computed, which shows the similarity values between terminal symbols. The similarity between a pair of consecutive symbols is related to the frequency of the symbols in the language (as in the BLOSUM matrix) [5]. Subsequently, it is possible to make alignments among those sequences, compacting them. In the substitution matrix m(i, j), each row i and each column j corresponds to a non-terminal symbol of the generated production rules. The symbols are ordered by frequency, that is, first d, A, C, o and so on. For Læ, 19 symbols A, B, ..., S were generated, which, together with the codification symbols d, o, c and a, make up 23 non-terminal symbols (in the bioinformatics context, the symbols would correspond to amino acids). The values of the matrix denote the importance of the alignment among the non-terminal symbols; for example, m(d, d) = 23 denotes a high degree of similarity between both symbols, while m(d, A) = −1 denotes a similarity of −1.

3.2 Rules Simplification and Compaction
Using the right-hand sides of the production rules (where the rules generated first have greater importance) we search for similar sequences in order to compact them. The detailed steps of the algorithm can be consulted in [2]. For example, for the language Læ the rules dod and aAc are not similar, so f(dod, aAc) = 0, since

m(dod, aAc) / m(dod, dod) = (−10 − 2 − 11) / (23 + 20 + 23) = −0.35

is not greater than 0.40. Nevertheless, the rules dod and doF are similar, since

m(dod, doF) / m(dod, dod) = (23 + 20 − 6) / (23 + 20 + 23) = 37/66 = 0.56.
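The similarity test can be reproduced directly from the matrix entries quoted in the example (our sketch; only the entries actually used above are listed, whereas the full 23×23 matrix is computed from the language):

```python
# Substitution-matrix entries m(i, j) taken from the worked example.
M = {("d", "d"): 23, ("o", "o"): 20, ("d", "F"): -6,
     ("d", "a"): -10, ("o", "A"): -2, ("d", "c"): -11}

def score(q1, q2):
    """Alignment score of two equal-length right-hand sides: the sum of
    position-wise substitution-matrix entries (symmetric lookup)."""
    return sum(M.get((x, y), M.get((y, x), 0)) for x, y in zip(q1, q2))

def similar(q1, q2, threshold=0.40):
    """Two rules are merged when m(q1, q2) / m(q1, q1) exceeds 0.40."""
    return score(q1, q2) / score(q1, q1) > threshold

print(similar("dod", "aAc"))  # False: -23/66 = -0.35
print(similar("dod", "doF"))  # True:   37/66 =  0.56
```

The two calls reproduce the values in the paper's example: 66 for m(dod, dod), −23 for m(dod, aAc) and 37 for m(dod, doF).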
Fig. 4. Simplification and compaction of the production rules generated
In this way, the generated rules are simplified and compacted iteratively (Figures 3 and 4) until a grammar G′′æ is built. The grammar G′′æ is described in Table 2.
4 Practical Results Using GAS 1.0
GAS 1.0 [11] provides the basis to create new components in application fields possibly different from the traditional ones, more precisely in data mining for the discovery of biological data. Following this aim, languages like Læ created with the grammar G′′æ were considered for automatic design methods that generate analyzers and/or language translators to facilitate the task of parsing a string. In this respect, we used the compiler generator GAS 1.0 to automatically generate a scanner and a parser from the language specification. Taking as input the grammar specification G′′æ, the syntactic analysis tables are created, giving as a result a Decorated Abstract Syntax Tree (DAST), or a syntactic error. The DAST reflects the grammar rules applied and gives a kind of structural description of grammatical features in the input string, exactly the kind of output that is desired in describing certain biological sequence data [17]. We applied a measure of complexity to the structure of the grammars, which provides a method to evaluate the syntactic framework of the grammar G′′æ. The definition of the measure comes from the concepts described in [10], which are used in the objective evaluation of the quality of the grammars: – Number of non-terminals: this allows the size of a CFG to be measured, applying a fine-grained metric whose use in the evaluation of the complexity of programs focuses on the number of procedures.
Table 2. The generated Grammar G′′æ

G′′æ = (N′′, T′′, P′′, S′′)
N′′ = {S, R, E, D, B, A, d, b, o, a, c}
T′′ = {0, 1, +, ∗, (, )}
P′′: S → R | E | D | B | A | d
     R → DoA
     E → Cd | CB | CE | CA | CD
     D → BoB | BoE | BoD
     C → Ao
     B → aAC | adc
     A → dod | doB
     d → b+
     b → 0 | 1
     o → + | ∗
     a → (
     c → )
S′′ = S
– Cyclomatic complexity: in [15] and [16] the McCabe complexity, or cyclomatic complexity V of a flow graph G, is defined as follows: V(G) = A − N + 2, where A is the number of edges of the flow graph and N the number of nodes. These metrics have been implemented in the tool. The measurement of the number of non-terminal elements is trivial. With regard to the cyclomatic complexity, the associated graph is constructed in order to ease its understanding. Our approach confirms the idea that grammar complexity can be applied successfully. For example, for the grammar G′′æ the complexity is on the order of 9, which represents the minimum of all the complexities of the sequences computed. Its high value confirms that the obtained grammars are good, and that they can offer the best results for the analysis of biosequences, providing sufficient discrimination.
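McCabe's formula above can be computed mechanically from the edge list of the flow graph associated with the grammar. A minimal sketch (ours, not the tool's implementation):

```python
def cyclomatic_complexity(edges):
    """McCabe cyclomatic complexity V(G) = A - N + 2 of a flow graph,
    given as a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}   # N: distinct endpoints
    return len(edges) - len(nodes) + 2            # A - N + 2

# A diamond-shaped graph with one decision point: A = 5, N = 5, V = 2.
diamond = [("start", "test"), ("test", "then"), ("test", "else"),
           ("then", "end"), ("else", "end")]
```

A straight-line chain gives V = 1 (no decisions), and each additional branching path raises V by one, which is why higher values indicate richer syntactic structure.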
5 Conclusions
In the experiments, a language Læ generated by a predetermined CFG Gæ was considered, but none of the properties of that grammar were later used to generate the set of production rules that then made up the grammar G′′æ. We have proposed a new method for the automatic generation of syntactic categories on a codified language. The approach can be extended to the processing of data that are believed to have a grammatical structure that could be automatically generated. The algorithm can be applied in different fields, and we can imagine finding something similar for the analysis of biosequences or for natural languages. The IDE attenuates the complexity of the design of the grammar specification, improves the quality of the obtained product and sensibly diminishes development time and cost. We tried to reduce the learning time for non-expert users
in the area of compiler generation. The tool allows us to measure the complexity of the obtained grammar automatically from textual data.
References

1. Aguilar, R.: Minería de datos. Fundamentos, técnicas y aplicaciones. Salamanca University, Salamanca (2003)
2. Aguilar, R.: Descubrimiento incremental y alineación de patrones secuenciales en inferencia gramatical. Thesis for the Degree of Doctor in Computer Science, Salamanca University, Spain (2005)
3. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge (1996)
4. Fu, K.S.: Syntactic Methods in Pattern Recognition. Academic Press, London (1974)
5. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. National Academy of Sciences 89, 10915–10919 (1992)
6. Higuera, C.: A bibliographical study of grammatical inference. Pattern Recognition (2004)
7. Jiménez-Montaño, M.A., Feistel, R., Diez-Martínez, O.: On the information hidden in signals and macromolecules I. Symbolic time-series analysis (2003)
8. Jiménez-Montaño, M.A., Ortiz, R., Ramos, A.: Alfabetos reducidos para la compactación de secuencias de proteínas empleando métodos de minería de datos (2003)
9. Louden, K.C.: Compiler Construction. Principles and Practice. International Thomson Publishing Inc. (1997)
10. López, V., Alonso, L., Moreno, M., Aguilar, R.: Aplicación de las métricas de calidad del software en la evaluación objetiva de gramáticas independientes de contexto inferidas. In: Moreno, M.N., García, F.J. (eds.) Actas del I Simposio Avances en Gestión de Proyectos y Calidad del Software, Salamanca, pp. 209–220 (2004)
11. López, V., Sánchez, A., Alonso, L., Moreno, M.N.: A tool to create grammar based systems. In: Corchado, J.M., et al. (eds.) DCAI 2008. ASC, vol. 50, pp. 338–346. Springer, Heidelberg (2009)
12. Mernik, M., Crepinsek, M., Kosar, T., Rebernak, D., Žumer, V.: Grammar-Based systems: definition and examples. University of Maribor (2004)
13. Mitra, S., Acharya, T.: Data Mining. Multimedia, Soft Computing and Bioinformatics. John Wiley and Sons, Chichester (2003)
14. Moreno, A.: Lingüística computacional. Editorial Síntesis, Madrid (1998)
15. Piattini, M., Calvo-Manzano, J., Cervera, J., Fernández, L.: Análisis y diseño detallado de aplicaciones informáticas de gestión: una perspectiva de Ingeniería del Software. Ra-Ma, Madrid (2004)
16. Pressman, R.S.: Ingeniería del software, un enfoque práctico, 5th edn. McGraw-Hill, Madrid (2002)
17. Searls, D.B., Dong, S.: A syntactic pattern recognition system for DNA sequences. In: Proc. 2nd Intl. Conf. on Bioinformatics, Supercomputing, and Complex Genome Analysis (1993)
18. Searls, D.B., et al.: Formal language theory and biological macromolecules (1999)
Hybrid Multiagent System for Automatic Object Learning Classification Ana Gil, Fernando de la Prieta, and Vivian F. López University of Salamanca, Computer Science Dpt., Plaza de la Merced s/n, 37007 Salamanca, Spain {abg,fer,vivian}@usal.es
Abstract. The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives. Keywords: learning object metadata, learning object repositories, federated search, e-learning, emerging e-learning technologies, neural networks.
1 Introduction

One of the most widely accepted approaches within the context of e-learning is based on fragmenting the content into modular and self-contained units that can be reused in different environments and by different applications. The term learning object (LO) is used to refer to these units. A LO can be considered a digital resource that is particularly apt for forming part of a course or other type of learning experience. One of the characteristics of LOs is that, by adding metadata to resources, they can be more easily managed. This means that metadata are created independently from the resource to which they are joined, in order to turn them into LOs. LOs are placed inside repositories so that they can be more easily stored and retrieved. LO Repositories (LORs) are software systems that can store metadata either alone or together with educational resources. Generally, LORs provide some type of interface that allows for the recovery of LOs. Any interaction involved in the recovery of LOs can be carried out manually or automated across different software, such as an agent architecture, or even by treating the LOs as Semantic Web Services. LORs have a high degree of heterogeneity in their characterizations, with the coexistence of different standards and definitions. This implies a need to formalize the common repository architecture while making it more flexible. Additionally, LOs can be stored with different metadata formats addressing different types of conceptualizations. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 61–68, 2010. © Springer-Verlag Berlin Heidelberg 2010
This paper is structured as follows: Section 2 explains the main concepts and characteristics that establish learning objects as the fundamental base within the current context of web-based e-learning. Section 3 introduces the proposed multi-agent architecture and the mechanism applied to retrieve LOs from different repositories using a Service Oriented Architecture (SOA). These LOs are processed according to certain personalized classification criteria considered most appropriate for the user. We conclude with Section 4, which explains some of the more relevant aspects and work in progress.
2 The Current Context of E-Learning

2.1 Learning Objects

The concept of learning objects has evolved into a central component within the current context of e-learning with web-based learning technology. The Learning Technology Standards Committee (LTSC) of the Institute of Electrical and Electronics Engineers defines a learning object as "any entity, digital or non-digital, which can be used, re-used or referenced during technology supported learning". The generality of this definition means that practically any educational resource can be considered a LO. As a result, the IEEE's definition has been severely criticized, since it does not clearly distinguish or identify what an LO actually is. In the absence of a universally accepted definition for a learning object, there has been a proliferation of ideas to define and delimit the boundaries of the concept [1, 2, 3, 4]. Chiappe et al. [5] recently described a learning object as a digital, self-contained and reusable entity with clearly instructional content, containing at least three internal and editable components: content, learning activities, and elements of context. Additionally, learning objects should have an external information structure, the metadata, which facilitates their identification, storage and retrieval. Given all of these possible definitions, it is possible to arrive at a certain consensus regarding LOs: they must be a minimal content unit (self-contained) that intends to teach something (instructional purpose) and can be reused (reusability) on different platforms without compatibility problems. It is essential that LOs contain information that allows them to be searched for and identified through automatic recovery techniques that facilitate the task for which they were created, enabling a single object to be used at a low cost at different levels and in different educational disciplines.
Existing standards and specifications for learning objects focus on facilitating the search, evaluation, acquisition, and reuse of learning objects so that they can be shared and exchanged across different learning systems. The most notable standards used for tagging LOs with metadata are Dublin Core [6], MPEG-7 [7] and, most importantly, Learning Object Metadata (LOM) [8]. Since 2002, IEEE LOM has been the standard for specifying the syntax and semantics of learning object metadata. It uses a hierarchical structure that is commonly coded in XML, and includes element names, definitions, data types, taxonomies, vocabularies, and field lengths. LOM is focused on the minimal set of attributes needed to allow these learning objects to be managed, located and evaluated. LOM metadata descriptions support version management and maintenance,
resource storage and recovery (searching, location, instantiation, packaging, editing, etc.) and resource sharing.

2.2 The Learning Object Repositories

In an attempt to facilitate their reusability, LOs are stored in public and private LORs. As previously mentioned, LORs are highly heterogeneous, each with a different storage system, access to objects, query methods, etc. The heterogeneity is not in and of itself a problem, since there are currently different systems that are interoperable [9]. One of the most important systems that has been increasingly used as an interface is SQI (Simple Query Interface) [10], which was normalized by CEN in 2005 [11] and is defined by three APIs: Learning Objects Interoperability Framework, Authentication and Session Management, and Simple Query Interface Specification.
Fig. 1. Learning Objects Interoperability Framework
SQI is an abstraction level between the internal logic of a repository and the different external clients; it is a middleware defined generically enough to be used in different fields, independent of technology and protocol. This definition is generic not only on a technical level, but also on a conceptual level, allowing different types of queries (synchronous and asynchronous) and user requests. Even more important is the fact that it does not define any specific query language or LO packaging. The basic functioning of an SQI interface is simple; it is based on web services through which a client queries a LOR, usually in Very Simple Query Language (VSQL) [10] or ProLearn Query Language (PLQL) [12]. The LOR then returns the LOs, usually packaged according to the LOM standard. This simple concept gave way to the birth of new types of applications dedicated to the federated search for learning objects in repositories. This software is used to perform simultaneous queries in different repositories, allowing greater interoperability and, as a result, better reusability of the stored resources. As a result of these search applications, the topology of LO search systems has changed drastically. Figure 2 provides a graphical representation of the following classification of search systems:
• Autonomous repositories: those that do not have a system allowing external searches and, as a result, require manual searches.
• Repositories: those that have an external search interface and can be included in an automatic search system.
• Repositories with a federated search system: those that, in addition to performing internal searches, can also perform automatic searches in other repositories.
• Federated search systems: systems that can perform federated searches in different repositories; they have the advantage of being able to perform filtering, cataloguing, etc. [13].
Fig. 2. Topology of Learning Object Repository
Thanks to continual research in search systems, the ability to create standardized and interoperable processes for recovering LOs has made it possible to formalize search and retrieval processes for LOs in different repositories. One clear-cut example is the SILO search engine (the current Indexation and Query tool for learning objects) of the ARIADNE infrastructure [14]. Nevertheless, there are still many shortcomings to overcome, among which the most important are:
• Excessive response times of the repositories.
• A limited number of results.
• An elevated percentage of errors when accessing the repositories, primarily due to the numerous occasions in which the repositories either do not respond to the queries or are simply not functioning, as shown in Figure 3.
Beyond the functioning of the repositories, which is partly due to the fact that they are young systems in constant evolution, is the fact that they lack other basic characteristics expected of any general search engine, such as classification
Fig. 3. Faults vs. correct answers in LOR retrieval
tasks, sorting results, the use of different filtering techniques (such as collaborative filtering), the automated management of repositories, and the extraction of statistics that serve to improve the global query process.
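A federated query layer that tolerates the slow and failing repositories described above can be sketched as follows (our schematic stand-in: `sqi_query` is a placeholder for the real SQI web-service exchange, and the stub repositories are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def sqi_query(repository, query):
    """Placeholder for one SQI exchange: a real client would open a
    session and submit the query (e.g. in VSQL or PLQL) through the
    repository's web-service interface, returning LOM records."""
    return repository.get("stub_results", [])

def federated_search(repositories, query, timeout=5.0):
    """Query every repository concurrently and merge whatever arrives in
    time; slow or failing repositories simply contribute nothing."""
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(repositories))) as pool:
        futures = [pool.submit(sqi_query, r, query) for r in repositories]
        for f in futures:
            try:
                results.extend(f.result(timeout=timeout))
            except Exception:
                continue  # repository down or too slow: skip it
    return results
```

Bounding each repository's contribution with a timeout is what keeps the application's overall response time low even when individual LORs are unresponsive.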
3 Hybrid Multiagent System for Automatic Classification

Tools used to search for and find learning objects in different systems do not provide a meaningful and scalable way to compare, rank and recommend learning material. As a solution to the problems observed with LOR search systems, we propose a hybrid system that integrates an agent-based architecture, which solves the issue of federated search in repositories storing learning objects, with a neural network that sorts the obtained results. The system is designed with the primary goal of performing simultaneous searches in various LORs, with a subsequent filtering and sorting process based on criteria related to the quality of the recovered elements. The design also includes secondary objectives, such as a series of characteristics that can be found in any search system; the idea is to homogenize the heterogeneous environment previously presented in this article. The system establishes uniformity in the automated management of repositories, incorporating a search history and a statistical system. Each of these functionalities complements the search tool and facilitates its use in the educational sector. The system architecture is composed of two basic blocks: the interface and the search system. These two blocks comprise the foundation of the system's functionality. The primary interface is used for the communication between users and the search tool. A critical function in this block is the ability to take statistical data and use it to subsequently sort the results. The search system constitutes the core of the application and simultaneously performs tasks involving communication, metadata extraction, quality control, and sorting the LOs.
66
A. Gil, F. de la Prieta, and V.F. López
Both blocks are designed using a hybrid system comprised of agents and a neural network that sorts the results. The agents are responsible for the communication, task flow, quality control, extrapolating statistical data, etc. The agents can modify their behavior to find the best solution for a problem, adjusting their behavior according to the knowledge they have acquired, a series of statistical data that they gather during each interaction with the repository containing learning objects, and what the end-user does with the results that have been provided. The following list explains the pre-defined agents that provide the basic functionalities of the architecture, as illustrated in Figure 4: • Repository agent. This agent is responsible for performing searches with the various repositories, extracting metadata, quality control for the LE received, and optimizing the search system. There will be one agent for each of the repositories so that multiple searches can be performed simultaneously. • Sort agent. Responsible for verifying, controlling and coordinating the results from the neural network, and classifying and cataloguing the results. • Statistical agent. This agent is responsible for gathering the statistical data obtained from the repositories and the interaction between the users and the search tool. It also provides the supervisor agent with the appropriate statistical data needed to effectively coordinate the tasks. • Supervisor agent. Responsible for supervising the other agents, and for coordinating tasks. It obtains data from the statistical agent and adapts the tasks to the system according to different variables, such as the state of communication, the system load, etc. A neural network is used for the sorting process, since it is specially designed for classification tasks such as those involved in the present study. 
In order to carry out the sorting process, it is necessary to establish a ranking system for the LOs, indicating the rank of each LO. According to each position, the sort agent is responsible for presenting the results to the end-user working with the application. The learning process for
Fig. 4. Agent-based architecture (client, neural network, and repositories 1…n)
Hybrid Multiagent System for Automatic Object Learning Classification
67
the network is supervised, since the data used are gathered from the interaction with the users to evaluate the proposed ranking, and the weights are adjusted at each iteration.
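The feedback loop described above can be illustrated with a deliberately simplified stand-in (ours; the system uses a neural network, whereas this sketch uses a linear scorer, and the feature names are hypothetical):

```python
def score(lo_features, weights):
    """Linear relevance score over numeric features extracted from a LO's
    metadata (e.g. metadata completeness, repository reliability)."""
    return sum(w * lo_features.get(name, 0.0) for name, w in weights.items())

def update(weights, lo_features, user_liked, lr=0.1):
    """Nudge the weights after each user interaction: reinforce the
    features of results the user accepted, penalize rejected ones."""
    sign = 1.0 if user_liked else -1.0
    for name in weights:
        weights[name] += lr * sign * lo_features.get(name, 0.0)
    return weights
```

After positive feedback a LO with the same feature profile scores higher, and after repeated negative feedback it scores lower, which is the supervised adjustment the sort agent relies on.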
4 Results and Conclusions

The search and location services for educational content, and specifically for LOs, presented in this paper constitute the core of the development of distributed, open computer-based educational systems. For this reason, research in this area has been very active in recent years.
Fig. 5. Query duration and query results
The design of the agent-based architecture that we have constructed is well suited to solving the problems noted in LORs. It allows the system to adapt according to the workload and the state of the repositories with respect to the statistical data. One of the most significant advances achieved is the reduction in the response time for the final results of the application, despite the high response times of the LORs; Figure 5 clearly demonstrates these results. The proposed sorting system is also very convenient, given that the LOM standard does not define a minimal set of fields that a LO must have, which makes it difficult to evaluate whether a LO is of sufficient quality. Using the feedback provided by the users from the daily use of the application, the network goes through a learning process, which allows it to continually improve its results.
68
A. Gil, F. de la Prieta, and V.F. López
Considering the results obtained with the proposed system, we are confident that the adaptation and learning features of an agent-based architecture make it ideal for solving federated search problems in heterogeneous repositories. Acknowledgements. This work has been supported by the MICINN TIN 2009-13839-C03-03 project and funded by FEDER.
References

1. Friesen, N.: Three Objections to Learning Objects. In: McGreal, R. (ed.) Online Education Using Learning Objects (2004), http://learningspaces.org/n/papers/objections.html (retrieved February 15, 2010)
2. Sosteric, M., Hesemeier, S.: When is a Learning Object not an Object: A First Step towards a Theory of Learning Objects. International Review of Research in Open and Distance Learning 3(2) (October 2002)
3. Wiley, D.A.: Connecting learning objects to instructional design theory: A definition, a metaphor, and a taxonomy. In: Wiley, D.A. (ed.) The Instructional Use of Learning Objects. Association for Educational Communications and Technology, Bloomington (2001), http://www.reusability.org/read/ (retrieved March 23, 2009)
4. Polsani, P.: Use and Abuse of Reusable Learning Objects. Journal of Digital Information 3(4), Article No. 164 (2003)
5. Chiappe, A., Segovia, Y., Rincon, H.Y.: Toward an instructional design model based on learning objects. Educational Technology Research and Development 55, 671–681 (2007)
6. DCMI Specifications, http://dublincore.org/specifications/ (retrieved February 18, 2010)
7. MPEG-7, MPEG Home Page, http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm (retrieved March 29)
8. IEEE 1484.12.1-2002, Final Draft Standard for Learning Object Metadata (LOM). The Institute of Electrical and Electronics Engineers, Inc., http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf (retrieved February 23, 2010)
9. Dagger, D., O'Connor, A., Lawless, S., Walsh, E., Wade, V.P.: Service-Oriented E-Learning Platforms: From Monolithic Systems to Flexible Services. IEEE Internet Computing 11(3), 28–35 (2007)
10. Simon, B., Massart, D., van Assche, F., Ternier, S., Duval, E., Brantner, S., Olmedilla, D.: A Simple Query Interface for Interoperable Learning Repositories. In: Workshop on Interoperability of Web-Based Educational Systems, in conjunction with the 14th International World Wide Web Conference (WWW 2005), Chiba, Japan (May 2005)
11. European Committee for Standardization: A Simple Query Interface Specification for Learning Repositories (November 2005)
12. Ternier, S., Massart, D., Campi, A., Guinea, S., Ceri, S., Duval, E.: Interoperability for Searching Learning Object Repositories. The ProLearn Query Language. D-Lib Magazine 14(1/2) (January/February 2008)
13. De la Prieta, F., Gil, A.: A Multi-agent System that Searches for Learning Objects in Heterogeneous Repositories. In: Demazeau, Y., et al. (eds.) Trends in Practical Applications of Agents and Multiagent Systems (PAAMS 2010). Advances in Intelligent and Soft Computing, pp. 355–362. Springer, Heidelberg (April 2010)
14. Ternier, S., Verbert, K., Parra, G., Vandeputte, B., Klerkx, J., Duval, E., Ordóñez, V., Ochoa, X.: The Ariadne Infrastructure for Managing and Storing Metadata. IEEE Internet Computing 13(4), 18–25 (2009)
On the Use of a Hybrid Approach to Contrast Endmember Induction Algorithms

Miguel A. Veganzones and Carmen Hernández
Computational Intelligence Group, UPV/EHU
Facultad Informatica, Paseo Manuel de Lardizabal, San Sebastian, Spain
www.ehu.es/ccwintco
Abstract. In remote sensing hyperspectral image processing, identifying the constituent spectra (endmembers) of the materials in the image is a key procedure for further analysis. The contrast between Endmember Induction Algorithms (EIAs) is a delicate issue, because there is a shortage of validation images with accurate ground truth information, and the induced endmembers may not correspond to any known material because of illumination and atmospheric effects. In this paper we propose a hybrid validation method, composed of a simulation module which generates the validation images from stochastic models, and an evaluation of the EIAs through Content Based Image Retrieval (CBIR) on the database of simulated hyperspectral images. We demonstrate the approach with two EIAs selected from the literature.
1 Introduction

The high spectral resolution provided by current hyperspectral imaging devices facilitates identification of the fundamental materials that make up a remotely sensed scene [1,7]. In the field of hyperspectral image processing, identifying the constituent spectra (endmembers) of the materials in the image is a key procedure for further analysis, i.e., unmixing, thematic map building, target detection, unsupervised segmentation. A library of known pure ground image spectra or laboratory sample spectra could be used. However, this poses several problems, such as the effects of the illumination on the observed spectra, the difference in sensor intrinsic parameters and the need for a priori knowledge about the material composition of the scene. Besides these methodological questions, this approach is not feasible when trying to process large quantities of image data. Current approaches try to automatically induce the endmembers from the image data itself; these are the so-called Endmember Induction Algorithms (EIAs). They try either to select some image pixel spectra as the best approximation to the endmembers in the image (e.g. [5]), or to compute estimations of the endmembers on the basis of transformations of the image data (e.g. [6,11]). The comparison among the relative performances of these algorithms is a delicate issue. In essence, these algorithms are unsupervised: they explore the data or transformations of the unlabeled data. Therefore, validation approaches based on the quality of some classification performance measure may be inaccurate. Besides, there are considerable difficulties in obtaining good quality labeled hyperspectral test images. In this work
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 69–76, 2010. c Springer-Verlag Berlin Heidelberg 2010
we propose a hybrid approach for validation. The first part of the approach is a hyperspectral image simulation module based on random field generation approaches. This module is used to generate the test images, with known endmembers and realistic abundance spatial distribution ground truth, that will be used for the comparison between algorithms. The second part of the approach consists of a Content Based Image Retrieval (CBIR) [10,3] scheme based on a distance defined on the set of endmembers induced from the image. We do not impose classification-like schemes as the performance measures; instead, we evaluate the ability of the algorithms to uncover the underlying mixtures. We apply this methodology to compare the Endmember Induction Heuristic Algorithm (EIHA) [5] with the well-known geometrical algorithm N-FINDR [11]. The structure of the paper is as follows: In section 2 we detail the proposed EIA contrast methodology based on CBIR systems. Section 3 gives a short review of the algorithms compared in this demonstration of the approach. In section 4 we define the experiments and present the results. Finally, we give some conclusions in section 5.
2 Contrast of EIAs Based on CBIR

In this section we first describe the details of the simulation module that provides the test images, then we present the similarity measure between hyperspectral images, and finally we describe the comparison methodology as a whole.

2.1 Synthetic Hyperspectral Image Module

The hyperspectral images used for the algorithm contrast are generated as linear mixtures of a set of spectra (the ground-truth endmembers) with synthesized abundance images. The ground-truth endmembers were randomly selected from a subset of the USGS spectral library. The synthetic ground-truth abundance images were generated in a two-step procedure. First, we simulate each abundance as a Gaussian random field with a Matérn correlation function of parameters θ1 = 10 and θ2 = 1. We applied the procedure proposed by [8] for the efficient generation of large-domain Gaussian random fields. Second, to ensure that there are regions of almost pure endmembers, we selected for each pixel the abundance coefficient with the greatest value and normalized the remaining coefficients so that the abundance coefficients sum up to one. It can be appreciated on the abundance images that each endmember has several regions of almost pure pixels, viewed as brighter regions in the images. We have synthesized a total of 6000 hyperspectral images divided into three datasets of 2000 images each. Each dataset is defined by the number of endmembers in the repository of ground-truth endmembers. We defined three repositories of ground-truth endmembers with 5, 10 and 20 endmembers each, representing an increasing diversity in the materials present in the dataset. The size of the images is 256×256 pixels with 269 spectral bands each. For each dataset we have generated collections of 500 images by the following procedure:
– First, we randomly decide the number of endmembers in the image, between 2 and 5.
– Second, we select the image ground-truth endmembers from the corresponding repository of ground-truth endmembers.
– Third, we generate the synthetic abundance images corresponding to each endmember, applying the corrections commented before.

Figure 1 shows a subset of the collection of ground-truth endmembers. Figure 2 shows an example of the selected endmembers and the generated abundance images used to synthesize a hyperspectral image.
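The synthesis procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the block-constant noise upsampled with np.kron merely stands in for the Matérn Gaussian random fields of [8], the per-pixel normalization is simplified to a plain sum-to-one scaling, and the function name and parameters are our own.

```python
import numpy as np

def synthesize_image(endmembers, n_rows=256, n_cols=256, block=32, seed=0):
    """Linear-mixture synthesis of one hyperspectral image.

    endmembers: (p, L) array of ground-truth spectra (p endmembers, L bands).
    Block-constant random fields stand in for the Matern Gaussian random
    fields used in the paper (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    p, L = endmembers.shape
    # One spatially correlated (block-constant) random field per endmember.
    coarse = rng.random((p, n_rows // block, n_cols // block))
    fields = np.kron(coarse, np.ones((block, block)))
    # Normalize so the abundance coefficients sum to one at each pixel.
    abundances = fields / fields.sum(axis=0, keepdims=True)
    # Linear mixing model: each pixel is the abundance-weighted sum of spectra.
    image = np.einsum('prc,pl->rcl', abundances, endmembers)
    return image, abundances
```

Any spatially correlated field generator can be dropped in place of the block-constant one without changing the mixing step.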
[Figure: reflectance (y-axis, 0 to 0.7) versus spectral band (x-axis, 0 to 250) plots of the spectra asphalt gds367, brick gds350, cardboard gds371, cedar gds360, dust debris wtc01, fabric gds436, fiberglass gds374, nylon gds432, particleboard gds364 and plasticfilm gds402.]

Fig. 1. Subset of endmembers selected from the USGS library to synthesize the hyperspectral images datasets
2.2 Dissimilarity between Hyperspectral Images

A CBIR system is based on the definition of a similarity measure between the images. For hyperspectral images, two kinds of information can be used to build such a dissimilarity measure: spectral and spatial information. Because we are interested in exploiting the spectral information, each hyperspectral image H is characterized by a set of induced endmembers E. A dissimilarity measure between two hyperspectral images, S(H_ξ, H_γ), is defined in terms of the distances between their corresponding sets of endmembers. Let E_ξ = {e_ξ1, e_ξ2, ..., e_ξp_ξ} be the set of endmembers induced from the image H_ξ in the database, where p_ξ is the number of endmembers induced from the ξ-th image. Given two images, H_ξ and H_γ, we compute the following matrix whose elements are the
[Figure: a reflectance-versus-band plot of the three selected endmember spectra, and three 256×256 grey-level abundance images, one per endmember.]

Fig. 2. Example of the endmembers and abundance images used to generate a synthetic hyperspectral image. This example corresponds to a 10-endmembers dataset image using 3 endmembers.
distances between the pairs of endmembers built as all the possible combinations of endmembers from each image:

D_ξ,γ = [d_i,j ; i = 1, ..., p_ξ ; j = 1, ..., p_γ],   (1)

where d_i,j is any defined distance between the endmembers e_ξi and e_γj, e.g. the Euclidean distance, d_i,j = ||e_ξi − e_γj||_2. Then the dissimilarity between the images is given as a function of the distance matrix (1) by the following equation:

S(H_ξ, H_γ) = (m_r + m_c)(|p_ξ − p_γ| + 1),   (2)

where m_r and m_c are the vectors built of the minimal values of the distance matrix D_ξ,γ, computed across rows and columns respectively. That is, the elements of the row vector of minima are computed as follows:

m_r,i = min_j {d_ij} ; i = 1, ..., p_ξ.
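Equations (1) and (2) can be sketched as follows. This is an illustrative reading, not the authors' code: we take the Euclidean distance for d_i,j and read (m_r + m_c) as the sum of all the elements of both minima vectors, so that S is a scalar; both choices are assumptions of this sketch.

```python
import numpy as np

def dissimilarity(E_xi, E_gamma):
    """Dissimilarity S(H_xi, H_gamma) between two images, each represented
    by its set of induced endmembers (one endmember spectrum per row)."""
    p_xi, p_gamma = len(E_xi), len(E_gamma)
    # Distance matrix of eq. (1): all endmember pairs, Euclidean distance.
    D = np.linalg.norm(E_xi[:, None, :] - E_gamma[None, :, :], axis=2)
    m_r = D.min(axis=1)  # minima across rows
    m_c = D.min(axis=0)  # minima across columns
    # Eq. (2): asymmetric endmember counts inflate the dissimilarity.
    return (m_r.sum() + m_c.sum()) * (abs(p_xi - p_gamma) + 1)
```

Two images with identical endmember sets get dissimilarity 0, and no explicit matching of endmembers between the two sets is needed.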
Note that the endmember induction algorithms can give a different number of endmembers for each image. The proposed dissimilarity function copes with this asymmetry while avoiding the combinatorial problem of deciding which endmembers should be matched, and of what to do when the number of endmembers differs from one image to the other.

2.3 Methodology for the Contrast of EIAs

We propose here the summary methodology for the comparison among EIAs, which hybridizes hyperspectral image simulation and CBIR performance measurements. The
CBIR approach is based on the dissimilarity measure (2) presented in the previous section. The comparison methodology consists of the following steps:

1. Build a database of synthetic hyperspectral images using a set of ground truth endmembers and simulated abundance images.
2. Compute the dissimilarity between each image pair in the database on the basis of the image ground truth endmembers. For each image, rank the remaining images in the database with respect to their ground truth dissimilarity to it.
3. For each image in the dataset, compute its endmembers using the EIAs. We will obtain as many endmember sets per image as EIAs to be compared.
4. Compute the dissimilarity between each image pair in the database on the basis of the induced endmembers. For each image, rank the remaining images in the database with respect to their induced dissimilarity to it.
5. Compare the rankings obtained by the use of ground truth endmembers and induced endmembers.

The first step involves the generation of in-lab controlled hyperspectral datasets. Although the ground-truth endmembers used to generate the synthetic images are going to be used to validate the EIA performance, we are not interested in comparing them directly with the induced endmembers. The induced endmembers may differ greatly from the real ones, but they can still retain enough discriminative information for the problem we are trying to solve, and thus be highly relevant. The second step computes the dissimilarity measures between the images in the dataset using the ground-truth endmembers. This provides us with the expected results for a given query, and thus the point of reference to define the performance measures. The third step makes use of an EIA to induce the endmembers from each image in the dataset. In the fourth step, those induced endmembers are used to obtain the dissimilarities between the images in the dataset, using the same dissimilarity function as in step two.
In figure 3 we illustrate how the dissimilarity ranking can vary when computed on the ground truth endmembers (blue line) or the induced endmembers (red line). An error measure of the induced endmembers could be the area between both lines; however, we are not interested in such measures. We prefer a more qualitative evaluation in terms of the recalling power of the CBIR system built over the above dissimilarity measure. From a CBIR point of view, the objective is to retrieve the K most similar images from a dataset given a query image. This methodology compares the results of a set of queries using the induced endmembers to the results using the ground-truth endmembers. This indicates the ability of the EIA to retrieve spectral information relevant for CBIR purposes. The fifth step uses precision and recall measures to compare the EIAs on the basis of CBIR performance. Precision is defined as the fraction of the retrieved images that are relevant to the query, and recall as the fraction of the total number of relevant images (contained in the archive) that are retrieved [2]:

precision_K = |R ∩ T| / |T|  and  recall_K = |R ∩ T| / |R|,

where T is the set of returned images and R is the set of images relevant to the query of size K.
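The precision and recall of step five can be computed per query as below; a small pure-Python sketch with hypothetical argument names (the two rankings would come from the induced-endmember and ground-truth dissimilarities, respectively).

```python
def precision_recall_at_k(retrieved_ranking, groundtruth_ranking, k):
    """Precision and recall of a single query of size k.

    T: the k images returned using the induced endmembers.
    R: the k relevant images according to the ground-truth ranking.
    """
    T = set(retrieved_ranking[:k])
    R = set(groundtruth_ranking[:k])
    hits = len(R & T)
    return hits / len(T), hits / len(R)
```

Averaging these two values over all query images in the database, for each K, yields curves like those of figure 4.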
[Figure: dissimilarity (y-axis, 0 to 140) versus ranking position (x-axis, 0 to 2000) for ground truth endmembers and induced endmembers.]

Fig. 3. Dissimilarity with respect to one image in the database, based on ground truth endmembers (blue line) and based on induced endmembers (red line). Images are ordered according to increasing ground truth dissimilarity.
3 Endmember Induction Algorithms

Following the definition of the linear mixing model [7], the hyperspectral images are defined as the result of the linear combination of the pure spectral signatures of ground components, the so-called endmembers. Let E = [e_1, ..., e_p] be the pure endmember signatures (normally corresponding to macroscopic objects in the scene, such as water, soil, vegetation, ...), where each e_i ∈ R^L is an L-dimensional vector. Then, the hyperspectral signature r at each pixel of the image is defined by the expression:

r = Σ_{i=1}^{p} e_i φ_i + n,

where the hyperspectral signature r is formed by the sum of the fractional contributions of each endmember and an independent additive noise component n, and φ is the p-dimensional vector of fractional abundances at the given pixel. This equation can be extended to the full image as follows: H = EΦ + n, where H is the hyperspectral image and Φ is a matrix of fractional abundances. Therefore, the linear mixing model assumes that the endmembers are the vertices of a convex set that covers the image data. Because the distribution of the data in the hyperspace is usually tear-shaped, most of the geometrical EIAs look for the minimum simplex that covers all the data. N-FINDR [11] is one of the algorithms following this approach. It works by inflating a simplex inside the data, beginning with a random set of pixels. Previously, the data dimensionality has to be reduced to n − 1 dimensions, n being the number of endmembers searched for. The algorithm starts by selecting an initial random set of pixels as endmembers. Then, for each pixel and each endmember, the endmember is replaced with the spectrum of the pixel and the volume is recalculated. If the volume increases, the endmember is replaced by the spectrum of the pixel. The procedure ends when no more replacements are done. The algorithm needs several random initializations to avoid local maxima.
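The replacement loop just described can be sketched as follows; a minimal illustration assuming the dimensionality reduction to n − 1 components has already been done, and showing a single random initialization (whereas, as noted above, several are needed in practice).

```python
import numpy as np

def simplex_volume(E):
    """Quantity proportional to the volume of the simplex whose vertices
    are the rows of E, given in (n-1)-dimensional reduced coordinates."""
    # |det([1 ... 1 ; e_1 ... e_n])| is proportional to the simplex volume.
    A = np.vstack([np.ones(len(E)), E.T])
    return abs(np.linalg.det(A))

def n_findr(pixels, n, seed=0):
    """Minimal N-FINDR sketch: `pixels` is a (num_pixels, n-1) array of
    dimensionality-reduced pixel spectra."""
    rng = np.random.default_rng(seed)
    E = pixels[rng.choice(len(pixels), n, replace=False)].copy()
    improved = True
    while improved:  # stop when no replacement grows the volume
        improved = False
        vol = simplex_volume(E)
        for i in range(n):
            for x in pixels:
                trial = E.copy()
                trial[i] = x  # tentatively replace endmember i by this pixel
                v = simplex_volume(trial)
                if v > vol:
                    E, vol, improved = trial, v, True
    return E
```

The loop terminates because the volume strictly increases at every replacement and there are finitely many vertex subsets.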
The second algorithm tested, the Endmember Induction Heuristic Algorithm (EIHA), was fully described in [5], so here we will only recall some of its main features. The algorithm
is based on the equivalence between Strong Lattice Independence and Affine Independence [9]. Strong Lattice Independence is a concept born in the field of Morphological Associative Memories, which became the field of Lattice Associative Memories. A set of vectors is said to be Lattice Independent if none of them is a linear minimax combination of the remaining ones. It is Strong Lattice Independent if, moreover, there is min or max dominance defined on the set. One way to find sets of Strong Lattice Independent vectors is to progressively build Lattice Auto-Associative Memories (LAAMs) with the detected endmembers. Because of the convergence properties of the Lattice Auto-Associative Memories, lattice dependent vectors will be recall-invariant, so lattice independent vectors can be detected as non-recall-invariant vectors. The EIHA proposed in [5] includes a noise filter that discards candidate vectors which are too close to the already detected endmembers.
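The recall-invariance test that underlies this detection can be sketched as follows. This is a minimal illustration based on the lattice associative memory literature [9], not the EIHA itself: we build the erosive memory W_XX and use the max-plus recall product, under which stored patterns are always fixed points.

```python
import numpy as np

def laam_w(X):
    """Erosive lattice auto-associative memory: w_ij = min over the
    stored patterns (rows of X) of (x_i - x_j)."""
    diffs = X[:, :, None] - X[:, None, :]  # (num_patterns, L, L)
    return diffs.min(axis=0)

def recall(W, x):
    """Max-plus recall product: y_i = max_j (w_ij + x_j)."""
    return (W + x[None, :]).max(axis=1)

def is_recall_invariant(W, x, tol=1e-9):
    """Lattice-dependent vectors are recall-invariant; a vector that is
    not invariant is a candidate new endmember in the EIHA sense."""
    return np.allclose(recall(W, x), x, atol=tol)
```

Stored patterns always pass the test, while a pixel spectrum that fails it is lattice independent of the current endmember set.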
4 Experimental Results

Figure 4 shows the precision_K(H) and recall_K(H) results of the N-FINDR and EIHA (denoted LAM in the figures) algorithms on the three synthetic hyperspectral image databases, generated from the collections of 5, 10 and 20 basic endmembers selected from the USGS library of spectral signatures, for all possible values of the size of the response K, using the dissimilarity function (2). It can be appreciated that the behavior of both algorithms is quite similar. The recall is very low when the size of the ground truth repository is 5, and increases with the repository size, meaning that a greater variety of ground truth endmembers improves the probability of recalling relevant images. Contrary to that, the precision is greater for the smaller repository, and increasing the repository size decreases the precision of the responses. The precision of the EIHA is always better for small query sizes and for very big query sizes. There is some intermediate query size region where the precision of N-FINDR improves on that of EIHA. Overall, both
[Figure: three precision/recall-versus-K plots (K from 1 to 2000, logarithmic axis), each showing Euclidean Precision, Euclidean Recall, SAM Precision and SAM Recall curves.]

Fig. 4. Precision and recall results for each dataset: (a) 5 endmembers dataset (b) 10 endmembers dataset (c) 20 endmembers dataset
algorithms' performance is comparable, and the selection of the most appropriate one depends on the application setting. The query size may be the criterion for the selection.
5 Conclusions

We propose a hybrid approach for the evaluation and comparison of Endmember Induction Algorithms (EIAs). First, a simulation module generates tailored databases of realistic hyperspectral images. Instead of conventional classification performance measures, we propose the use of CBIR-based performance measures, where the CBIR is based on the spectral information of the images; that is, the dissimilarity between images is computed from the distances between the sets of endmembers that spectrally characterize the images. We have shown some results comparing two EIAs from the literature. This comparison allows us to identify some problem-dependent parameters that would justify the selection of one algorithm over the other: query size and ground truth endmember variety. Further work may be addressed to testing new EIAs in this framework, and to the consideration of hierarchical issues [4].
References

1. Clark, R.N., Roush, T.L.: Reflectance spectroscopy: Quantitative analysis techniques for remote sensing applications. Journal of Geophysics Research 89(B7), 6329–6340 (1984)
2. Daschiel, H., Datcu, M.: Information mining in remote sensing image archives: system evaluation. IEEE Transactions on Geoscience and Remote Sensing 43(1), 188–199 (2005)
3. Datcu, M., Seidel, K.: Human centered concepts for exploration and understanding of satellite images. In: IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, pp. 52–59 (2003)
4. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
5. Graña, M., Villaverde, I., Maldonado, J.O., Hernández, C.: Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72, 2111–2120 (2009)
6. Ifarraguerri, A., Chang, C.-I.: Multispectral and hyperspectral image analysis with convex cones. IEEE Transactions on Geoscience and Remote Sensing 37(2), 756–770 (1999)
7. Keshava, N., Mustard, J.F.: Spectral unmixing. IEEE Signal Processing Magazine 19(1), 44–57 (2002)
8. Kozintsev, B.: Computations with Gaussian Random Fields. PhD Thesis, University of Maryland (1999)
9. Ritter, G.X., Gader, P.: Fixed points of lattice transforms and lattice associative memories. In: Advances in Imaging and Electron Physics, vol. 143, p. 264. Academic Press, London (2006)
10. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12), 1349–1380 (2000)
11. Winter, M.E., Descour, M.R., Shen, S.S.: N-FINDR: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. In: Proc. SPIE, vol. 3753, pp. 266–275. Denver, CO, USA (October 1999)
Self-emergence of Lexicon Consensus in a Population of Autonomous Agents by Means of Evolutionary Strategies

Darío Maravall (1,2), Javier de Lope (2), and Raúl Domínguez (2)

(1) Dept. of Artificial Intelligence, Faculty of Computer Science, Universidad Politécnica de Madrid
(2) Centro de Automática y Robótica (UPM – CSIC), Universidad Politécnica de Madrid
[email protected],
[email protected],
[email protected]

Abstract. In multi-agent systems, the study of language and communication is an active field of research. In this paper we present the application of evolutionary strategies to the self-emergence of a common lexicon in a population of agents. Modeling the vocabulary or lexicon of each agent as an association matrix or look-up table that maps meanings (i.e. the objects encountered by the agents or the states of the environment itself) into symbols or signals, we check whether it is possible for the population to converge in an autonomous, decentralized way to a common lexicon, such that the communication efficiency of the entire population is optimal. We have conducted several experiments, from the simplest case of a 2 × 2 association matrix (i.e. two meanings and two symbols) to a 3 × 3 lexicon case, and in both cases we have attained convergence to the optimal communication system by means of evolutionary strategies. To analyze the convergence of the population of agents we define the population's consensus as the state in which all the agents (i.e. 100% of the population) share the same association matrix or lexicon. As a general conclusion, we have shown that evolutionary strategies are powerful enough optimizers to guarantee the convergence to lexicon consensus in a population of autonomous agents.

Keywords: Multi-agent systems; Evolution of artificial languages; Computational semiotics; Evolutionary strategies; Self-collective coordination; Evolutionary language games; Signaling games.
1 Introduction
“La langue n’existe qu’en vertu d’une sorte de contrat passé entre les membres de la communauté.” (“Language exists only by virtue of a sort of contract entered into by the members of the community.”) F. de Saussure (C.L.G., 1916)

In a multi-agent system (e.g. a multi-robot team), obtaining a common lexicon or vocabulary is a basic step towards an efficient performance of the whole system. In this paper we present the application of evolutionary strategies to the emergence of a common lexicon in a population of autonomous agents. We
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 77–84, 2010. c Springer-Verlag Berlin Heidelberg 2010
model the vocabulary or lexicon of each agent as an association matrix or look-up table that maps the meanings (i.e. the objects and states of the environment) into symbols or signals. According to a long and well-established line of thought, culminating in the work of the Swiss linguist Ferdinand de Saussure [1] and the American philosopher Charles S. Peirce [2], the pioneer of semiotics, the associations of the symbols of a language to their meanings are (1) arbitrary and (2) conventional. In this paper we use arbitrarily (in fact, randomly) initialized association matrices for each agent, which evolve through an evolutionary process based on communicative or linguistic interactions. (Some authors, inspired by the ideas of the Austrian philosopher Ludwig Wittgenstein, have called these language games [3,4,5], although we believe a better-founded denomination would be communication or signaling games, as defined by David K. Lewis [6].)
2 Formal Definitions

2.1 Multi-agent Communication System

We define a Communication System, CS, in a population of agents as the triple:

CS = ⟨M, Σ, A_i⟩,   (1)

where M = {m_1, ..., m_p} is the set of meanings (i.e. the objects or states in the environment that can be of relevance for communication in the population of agents), Σ = {s_1, ..., s_n} is the set of symbols or signals used by the agents in their communication acts and which represent the actual meanings, and A_i (i = 1, ..., N) are the association matrices of the agents, defining their specific associations between meanings and symbols:

A_i = (a_rj)_i ;  i = 1, ..., N agents,   (2)

in which the entries a_rj of the matrix A_i are nonnegative real numbers such that 0 ≤ a_rj ≤ 1 (r = 1, ..., p; j = 1, ..., n). These entries give the strength of the association of meaning m_r to symbol s_j, such that a_rj = 0 indicates no association at all and a_rj = 1 indicates total association. Note that these quantitative associations have a deterministic, nonprobabilistic nature, so that the associations between meanings and symbols are based on the maximum principle, which means that the maximum value of the entries in a row (column) gives the valid association. An ideal, optimum association matrix is purely binary (the entries are either 0 or 1) and also has the additional restriction of having only one 1 in each row (i.e. no synonyms are allowed) and a unique 1 in each column (no homonyms are allowed).
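The maximum principle just stated can be sketched as follows; a minimal illustration with meanings and symbols represented by row and column indices (the function names are ours).

```python
import numpy as np

def encode(A, meaning):
    """Speaker side: a meaning (row index) maps to the symbol (column)
    with the strongest association, following the maximum principle."""
    return int(A[meaning].argmax())

def decode(A, symbol):
    """Listener side: a symbol (column index) maps back to the meaning
    (row) with the strongest association."""
    return int(A[:, symbol].argmax())
```

For an ideal binary matrix, decode(A, encode(A, m)) == m for every meaning m, i.e. there are neither synonyms nor homonyms.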
3 Optimal Saussurean Communication System
As commented above and according to the semiotic tradition, the associations of the symbols of a lexicon to their meanings are arbitrary and conventional.
The arbitrariness of the association of a symbol (the signifier) to its meaning (the signified) means that the entries a_rj of each agent's association matrix A_i are arbitrarily assigned. The conventional nature of these associations means that all the agents of the population must share the same ideal, optimum association matrix in order to attain optimal communication performance (we call such an optimal communication system Saussurean). This situation is called lexicon or vocabulary consensus, and achieving it is a hard multi-agent coordination problem.
4 Experimental Results
We have conducted several experiments, from the simplest case of a 2 × 2 association matrix to a 3 × 3 association matrix. The aim in all these experiments is to check whether it is possible to obtain an optimal Saussurean communication system with evolutionary strategies. We have also investigated the effect of the population size on the convergence results. Concerning the influence of what can be called the agents' communication connectivity structure, we have focused in our simulations on the particular case in which each agent communicates with the rest of the population without any restriction. As commented in the introduction, in this paper we use an arbitrarily (in fact, randomly) initialized association matrix for each agent, and through an evolutionary dynamic process (implemented by means of evolutionary strategies) of communicative interactions in the population of agents we obtain a final lexicon consensus, or perfect coordination, such that all the agents converge to the same vocabulary. Once all the agents' vocabularies or association matrices have been randomly initialized, the interactive communicative process starts, in which each agent communicates with the rest of the population. For each communicative act taking place between two agents there are two possible outcomes: (1) success (i.e. both agents are able to understand each other) and (2) failure (i.e. the two agents are unable to understand each other). Whenever a successful communicative act has occurred, the fitness of both successful communicators is increased by one. After all the agents have communicated with all the remaining agents, the current generation ends and a new generation is created (see the flow-chart in Fig. 1). Obviously, the higher an agent's fitness, the higher its probability to pass to the next generation.
Each communicative act between two agents is performed as follows: one of the two agents acts as the sender (or speaker) and the other one acts as the receiver (or listener). The speaker, using its own association matrix, sends all the existing meanings, and the listener, in turn, decodes the symbols sent by the speaker, so that if the meaning decoded by the listener coincides with the meaning sent by the speaker, then a success has occurred. Which agent assumes the role of speaker is irrelevant, as the possibilities of success do not change and the reward is the same for both agents.
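The communicative act can be sketched as follows, reusing the maximum-principle coding of section 2.1; a minimal illustration, with our own function name, that returns the number of successfully transmitted meanings (the quantity added to both agents' fitness).

```python
import numpy as np

def communicative_act(A_speaker, A_listener):
    """Number of meanings understood when the speaker sends every
    meaning and the listener decodes each received symbol."""
    successes = 0
    for meaning in range(A_speaker.shape[0]):
        symbol = int(A_speaker[meaning].argmax())       # speaker encodes
        decoded = int(A_listener[:, symbol].argmax())   # listener decodes
        successes += int(decoded == meaning)
    return successes
```

Two agents sharing the same ideal matrix understand every meaning, while agents holding the two different 2 × 2 ideal matrices understand none, which is why consensus on a single matrix is required.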
Fig. 1. The flow-chart describes the communicative act between agents
Finally, another key point to understand the working of the evolutionary strategies in our experiments is the coding of the association matrix in each individual and how its genes mutate between generations. Each gene corresponds to one value of the communication matrix; during the mutation process, one of the genes mutates with a certain probability, i.e. one component of the corresponding communication matrix changes its value (e.g. from 1 to 0).

4.1 The Simplest Case of 2 × 2 Association Matrices
We have started our experimentation with a simple case of two meanings and two symbols. We consider ideal those matrices which, being the same for each communicator, cannot lead to misunderstanding (i.e. no homonyms exist). For this particular case, there are two ideal association matrices:

M_1^2 = [0 1; 1 0]   M_2^2 = [1 0; 0 1]   (3)

Thus, our aim is to check whether, by applying evolutionary strategies, it is possible for a population of N agents to converge to the optimal Saussurean communication system in which all the agents share one of the optimum association matrices given above. To analyze the convergence of the population it is necessary, first, to define the lexicon consensus concept numerically. Thus, we consider that the population has attained lexicon consensus when all the agents (i.e. 100% of the population) have converged to the same association matrix. If this common lexicon or vocabulary of the population coincides with one of the optimum association matrices given above, then we can say that the population has converged to the optimum Saussurean communication system. Figs. 2(a-c) show the average fitness of the population versus the number of generations for different population sizes. It can be noticed that in all the cases the average fitness increases progressively towards the maximum fitness for every population size. The lexicon consensus has been achieved in all the cases.
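Lexicon consensus, as defined above, can be checked by reducing every agent's matrix to its maximum-principle associations; a minimal sketch (assuming no ties within a row).

```python
import numpy as np

def lexicon_consensus(matrices):
    """True when 100% of the population shares the same lexicon, where
    each agent's lexicon is the meaning-to-symbol map given by the
    argmax of each row of its association matrix."""
    lexicons = {tuple(int(row.argmax()) for row in A) for A in matrices}
    return len(lexicons) == 1
```

Comparing binarized lexicons rather than raw matrices means that two agents with slightly different real-valued entries but identical maximum-principle behavior count as agreeing.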
Self-emergence of Lexicon Consensus in a Population of Autonomous Agents
Fig. 2(d) shows the evolution of each individual fitness versus the number of generations for a population size of N = 200. Each agent's fitness is represented by a dot in the figure, and when several agents share the same fitness their dots overlap. The figure is interesting because it describes how the consensus is achieved. There is an almost random behavior in the initial generations (until approximately the 20th generation). At that moment, the majority of agents begin to use the same association matrix. As a consequence, the average population fitness starts to grow (see Fig. 2(c)). The evolutionary process goes on, and the agents that used incorrect or different association matrices evolve to the same matrix. This is achieved by all the agents around the 120th generation.
Fig. 2. Population average fitness for (a) N = 10, (b) N = 50, and (c) N = 200. (d) Agents fitness evolution for N = 200.
Table 1 presents numerical results concerning the number of generations needed to attain the lexicon consensus. We have run the simulation several times for each population size. The "maximum generation" column indicates the stop condition for the evolutionary process: when that generation is reached, it is considered that the lexicon consensus has not been achieved. The "final generation" column shows the average generation in which the lexicon consensus was achieved.
D. Maravall, J. de Lope, and R. Domínguez

Table 1. Summary of the simulation results for 2 × 2 matrices

Meanings  Symbols  Agents  Max. generation  Final generation  Std. deviation
2         2        10      2000             78.416            65.128
2         2        50      20000            43.427            33.536
2         2        200     20000            408.834           310.952

4.2 A 3 × 3 Association Matrix
The ideal association matrices in the case of three meanings and three symbols turn out to be:

$$M_1^3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad M_2^3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad M_3^3 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$M_4^3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad M_5^3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad M_6^3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \tag{4}$$

Figs. 3(a-c) show the average fitness of the population versus the number of generations for different population sizes. It can be noticed in all the cases that the average fitness increases progressively towards the maximum fitness for every population size. As in the simulations for two meanings and two symbols, the lexicon consensus has been achieved in all the cases. Fig. 3(d) shows the evolution of each individual fitness versus the number of generations for a population of N = 50. It can be observed that initially the agents use incorrect or different communication matrices: both the average population fitness (see Fig. 3(b)) and the individual agent fitness (see Fig. 3(d)) take a value around 80, which is approximately half of the maximum fitness value. Around the 200th generation, the average population fitness and the number of agents sharing the same optimum communication matrix grow, and the consensus is achieved before the 250th generation.

Table 2 presents numerical results concerning the number of generations needed to attain the lexicon consensus. The columns show the same features as in the previous 2 × 2 case.

Table 2. Summary of the simulation results for 3 × 3 matrices

Meanings  Symbols  Agents  Max. generation  Final generation  Std. deviation
3         3        10      2000             105.472           644.261
3         3        50      20000            683.624           1071.923
3         3        200     20000            5168.819          5516.594
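The six ideal matrices of Eq. (4) are exactly the six 3 × 3 permutation matrices: one symbol per meaning, with no homonyms or synonyms. A quick illustrative enumeration:

```python
from itertools import permutations

# Build all 3x3 permutation matrices: entry (i, j) is 1 iff meaning i
# is associated with symbol p[i] = j.
ideal = [[[1 if j == p[i] else 0 for j in range(3)] for i in range(3)]
         for p in permutations(range(3))]

print(len(ideal))  # 3! = 6 ideal association matrices, as in Eq. (4)
```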
Fig. 3. Population average fitness for (a) N = 10, (b) N = 50, and (c) N = 200. (d) Agents fitness evolution for N = 50.
5 Conclusions and Further Research Work
As a general conclusion, we have shown that evolutionary strategies are powerful enough to solve the hard optimization problem of converging to an optimal decentralized communication system in a population of agents.

5.1 Application of Genetic Algorithms
Evolutionary strategies are mainly based on the mutation operator; for that reason we are interested in investigating the potential of other search operators such as crossover. We therefore plan to apply genetic algorithms to the lexicon consensus problem and to compare their results with those obtained with evolutionary strategies.

5.2 On-Line Learning Methods: The Cultural Transmission of Language
We have focused our attention on the evolutionary approach to language and communication development (or, more precisely, to lexicon consensus). A different approach to lexicon consensus is based on the application of on-line learning algorithms, also known as the cultural transmission of language (as opposed to evolutionary inheritance) [4,7,8,9].
Different algorithms have been proposed so far: e.g., Willshaw associative neural networks [8], as well as heuristic algorithms for updating the entries of the association matrices [5,10]. We plan to compare these on-line learning techniques, including stochastic learning automata, with the evolutionary techniques used in this paper.

5.3 Implementation on Physical Robots Using Machine Vision and Sound Synthesizers
The research work described in this paper and the future research lines suggested above are the theoretical prolegomena of a future applied project aimed at building a working multi-robot system. This system will be based on machine vision for the cognitive part (i.e., for the acquisition of the sensory information related to the meanings of the agents' language) and on sound synthesizers for the implementation of the symbols and signals used by the agents as words (i.e., our robots will communicate by singing instead of by speaking).
References

1. de Saussure, F.: Cours de Linguistique Générale. Payot, Paris (1916); Course in General Linguistics. English edn. McGraw-Hill, New York (1969)
2. Peirce, C.S.: Selected Writings. Dover, New York (1966)
3. Nowak, M.A.: The evolutionary language game. J. Theor. Biol. 200, 147–162 (1999)
4. Steels, L., Kaplan, F.: Bootstrapping grounded word semantics. In: Briscoe, T. (ed.) Linguistic Evolution through Language Acquisition, pp. 53–73. Cambridge University Press, Cambridge (2002)
5. Lenaerts, T., Jansen, B., Tuyls, K., et al.: The evolutionary language game: An orthogonal approach. J. Theor. Biol. 235, 566–582 (2005)
6. Lewis, D.K.: Convention. Harvard University Press, Cambridge (1969)
7. Duro, R.J., Graña, M., de Lope, J.: On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development. Information Sciences (in press) (2010)
8. Oliphant, M.: Formal Approaches to Innate and Learned Communication: Laying the Foundation for Language. Ph.D. thesis, University of California, San Diego (1997)
9. Kaplan, F.: Simple models of distributed co-ordination. Connection Science 17(3-4), 249–270 (2005)
10. Divina, F., Vogt, P.: A hybrid model for learning word-meaning mappings. In: Vogt, P., Sugita, Y., Tuci, E., Nehaniv, C.L. (eds.) EELC 2006. LNCS (LNAI), vol. 4211, pp. 1–15. Springer, Heidelberg (2006)
Enhanced Self Organized Dynamic Tree Neural Network

Juan F. De Paz, Sara Rodríguez, Ana Gil, Juan M. Corchado, and Pastora Vega

Department of Computer Science and Automation, University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
{fcofds,srg,abg,corchado,pvega}@usal.es
Abstract. Cluster analysis is a technique used in a variety of fields. There are currently various algorithms used for grouping elements, based on different methods: partitional, hierarchical, density-based, probabilistic, etc. This article presents the ESODTNN neural network, an evolution of the SODTNN network, which facilitates the revision process by merging its operational process with dendrogram techniques, and enables the automatic detection of clusters in an increased number of situations.

Keywords: Clustering, SOM, hierarchical clustering, PAM, dendrogram.
1 Introduction

Cluster analysis is a branch of multivariate statistical analysis that is used for detecting patterns in the classification of elements. It is used in a wide variety of fields including bioinformatics [9] [18] and surveillance [13] [14]. The methods used for clustering differ considerably according to the type of data and the amount of available information. Clustering techniques are typically broken down into the following categories [17] [18]: hierarchical methods, which include dendrograms [6], AGNES [8], DIANA [8] and CLARA [8]; neural networks [15] [16] such as Self-Organized Maps [2] [20], GCS [3] and ESOINN [1] [5]; methods based on minimizing objective functions, such as k-means [10] and PAM [8] (Partitioning Around Medoids); and probabilistic models such as EM [7] (Expectation-Maximization) and fanny [8].

Traditionally, the different methods try to minimize the distance that exists between the individuals and the groups. For certain algorithms, this assumes the need to either establish the number of clusters beforehand, or set the number once the algorithm has been completed. In certain cases, neural networks allow the number of clusters to be selected automatically based on the existing elements. The networks typically require a previous adaptation phase for the neurons and the initial data that generates the connections among the neurons. Some neural networks may also require establishing the level of connectivity for the neurons beforehand.

This research presents an evolution of the Self Organized Dynamic Tree neural network (SODTNN) [19], called the Enhanced SODTNN (ESODTNN), which allows data to be grouped automatically, without having to specify the number of existing clusters. Additionally, the SODTNN eliminates the expansion phase before dividing and interconnecting the neurons, thus avoiding one of the most costly phases of the algorithm. The SODTNN uses algorithms to detect low density zones and graph theory procedures in order to establish connections between elements.

The SODTNN network presented certain deficiencies in creating groups with data that exhibited particular characteristics, such as elements distributed in parallel form. Furthermore, it did not incorporate a mechanism to facilitate the revision of the results. For these reasons it was necessary to further develop the network. The SODTNN integrates techniques from hierarchical and density-based models that allow the grouping and division of clusters according to the changes in the densities that are detected. The hierarchical process is based on the Kruskal algorithm, which creates a minimum spanning tree containing the data for the problem at hand. Based on the information obtained from the minimum spanning tree, low density areas are detected by using a distance matrix for each cluster. The low density areas allow the clusters to be separated iteratively. Furthermore, the minimum spanning tree determines the network structure and connections so that learning can take place according to the tree's distribution. In addition to these features, the ESODTNN network incorporates modifications that enable distributed elements to be grouped under certain conditions that previous versions were unable to classify. To accomplish this, the definition of the density functions initially proposed in the SODTNN network was modified. Additionally, a global inheritance hierarchy, incorporated in a way similar to how a dendrogram functions, enables a revision of the newly created groups.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 85–92, 2010. © Springer-Verlag Berlin Heidelberg 2010
This article is organized as follows: Section 2 describes the SODTNN neural network, Section 3 describes the ESODTNN, and Section 4 presents the results and conclusions.
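Both SODTNN and its enhanced version organize learning around a minimum spanning tree built with Kruskal's algorithm, as described in the following section. As a reference point, here is a generic Kruskal sketch with a union-find structure; function and variable names are our own, not the paper's.

```python
def kruskal(n, edges):
    """Minimum spanning tree of an n-node graph.

    edges: iterable of (weight, i, j) tuples; returns the MST edge list.
    """
    parent = list(range(n))

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:            # adding the edge creates no cycle
            parent[ri] = rj
            mst.append((w, i, j))
            if len(mst) == n - 1:
                break
    return mst

# Small example: 4 nodes, 4 weighted edges
print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
```

In the network, the nodes would be the neurons and the weights their pairwise distances; the resulting tree then defines both the neighbourhood relationships and the candidate links for cluster division.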
2 SODTNN

The SODTNN neural network [19] can detect the number of existing groups or classes and, by using the Kruskal algorithm [11], create clusters based on the connections taken from the minimum spanning tree. As opposed to the ESOINN or GCS networks, the SODTNN does not distinguish between the original data and the neurons; during the initial training phase, the latter correspond to the position of each element. This makes it possible to eliminate the expansion phase in which a Neural Gas (NG) [4] adjusts to the surface. However, this step can be applied in situations where the number of elements for carrying out the clustering process needs to be reduced. The SODTNN neural network is based on an initial state in which each piece of data has an associated neuron.

In general terms, the SODTNN neural network described in [19] bases its functionality on Self Organized Map (SOM) networks, but it incorporates a new behavior in all aspects related to the neighborhood relationships that define the weight updates. Additionally, it provides an algorithm that enables the automatic detection of clusters. The neighborhood relationships are defined dynamically so that for every weight update, a minimum spanning tree, which defines the relationships within the
neighborhood, is established for each cluster. Based on these relationships, the neurons are updated according to the definition of SOMs, which use the following definitions for neighborhood functions and learning rate:
$$g(i,t) = \exp\left[-\frac{i}{N}\,\frac{(x_{j1}-x_{s1})^2+\cdots+(x_{jn}-x_{sn})^2}{\max_{i,j}\{d_{ij}\}} - \lambda\,\frac{i\cdot t}{\beta N}\right] \tag{1}$$

$$\eta(t) = \exp\left[-\frac{t}{\beta N^4}\right] \tag{2}$$
where $g(i,t)$ represents the neighborhood function, $\eta(t)$ the learning rate [12], $i$ the distance in number of neighboring neurons, $N$ the total number of neurons, $x_{ij}$ the coordinates, $t$ the number of iterations, and $\beta$ a constant.
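Equations (1)-(2) can be transcribed directly. The parameter values below are illustrative, and we read the learning-rate denominator as $\beta N^4$, which may not match the original typesetting exactly:

```python
import math

def g(i, t, dist_js, d_max, n, beta=1.0, lam=1.0):
    # Eq. (1): dist_js is the Euclidean distance term between neuron j
    # and the winner s; d_max is the largest pairwise distance max d_ij
    return math.exp(-(i / n) * (dist_js / d_max) - lam * (i * t) / (beta * n))

def eta(t, n, beta=1.0):
    # Eq. (2): learning rate decaying with the iteration count t
    return math.exp(-t / (beta * n ** 4))

# Both factors shrink as the neighbourhood distance i or the time t grows
print(g(1, 3, 1.0, 2.0, 10), eta(3, 10))
```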
3 ESODTNN

This study proposes the ESODTNN, an evolution of the SODTNN neural network [19]. The evolution of the network enables the detection of much more complex geometric forms than the initial version could handle. Furthermore, it provides the necessary information for generating final trees using the information from the created clusters. The generation of the trees merges the concepts of self-organizing maps and dendrograms. The division block was modified to correspond to the division used for generating the final network. It was also necessary to modify the part corresponding to the representation of the internal network information, which made it possible to gather the information associated with the clusters created in successive iterations.

The nomenclature defines $T$ as the set of neurons to be classified, $A$ as the minimum spanning tree that contains all of the nodes from $T$, $C$ as the matrix that defines the connections between the nodes, where element $c_{ij}=1$ if node $i \in T$ is connected with node $j \in T$, and $D$ as the distance matrix for $T$. For the network specification, it is necessary to introduce the function $f^p$, which makes it possible to navigate through the cluster tree that is created:

1. Given $a_i$, the neuron of the tree for which the average distance needs to be calculated, with $i \in T$, $f^p(a_i)$ is the function that determines the parent node for $a_i$, defined by
$$f^p: A \to A, \quad a_i \mapsto f^p(a_i) = a_s, \text{ where } c_{si} = 1 \text{ and } c_{si} \in C$$
The following sub-sections provide a general description of how the network functions with the newly applied modifications.
3.1 Density: Block 3

One of the main problems when assigning individuals to groups is knowing which divisions cause a significant rise in the density of the resulting clusters. ANNs such as SOINN or ESOINN study the length of the links in order to determine whether the length differs within the subgroup of each individual. This process requires the creation of subclasses within each cluster, which is done by using a set of functions that determines the threshold on which the creation of the subclasses is based. The ESODTNN searches for cut-off points in areas that produce a significant rise in density. It does so by using the relationship between the total distance calculated from the distance matrix and the distance from the minimum spanning tree. In the original version of the algorithm, the function $f^T$ was defined according to real distances, which made it difficult to create a cluster when dealing with data containing elements distributed in parallel form. The distance matrix $D$ is calculated with a chosen distance measure; in this case, the Euclidean distance was selected so that it would coincide with the measure used in the other techniques.

1. Distance from the tree:
$$f^A(C, D) = \sum_{i,j} d_{ij} \quad \text{where } c_{ij} = 1,\ c_{ij} \in C,\ d_{ij} \in D$$

2. Distance between the neurons in the tree:
$$f^T(C, D) = \sum_{s \in s} d(f^p(a_s), a_s) + \sum_{t \in t} d(f^p(a_t), a_t)$$
where
$$s = \{\overbrace{(f^p \circ \cdots \circ f^p)}^{m}(a_s),\ \overbrace{(f^p \circ \cdots \circ f^p)}^{m-1}(a_s), \ldots, f^p(a_s),\ a_s\} \quad \text{with } \#s = m$$
and
$$t = \{\overbrace{(f^p \circ \cdots \circ f^p)}^{n}(a_t),\ \overbrace{(f^p \circ \cdots \circ f^p)}^{n-1}(a_t), \ldots, f^p(a_t),\ a_t\} \quad \text{with } \#t = n,$$
and $n$ and $m$ are selected so that there cannot exist any value of $n$ or $m$ for which
$$\overbrace{(f^p \circ \cdots \circ f^p)}^{n}(a_s) = \overbrace{(f^p \circ \cdots \circ f^p)}^{m}(a_s)$$

3. Calculate the final density:
$$f^D(C, D) = f^T(C, D) / f^A(C, D)$$
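A simplified reading of the three definitions above can be sketched as follows. The data structures are our own: `parent[k]` encodes $f^p$, `C` marks each tree link once, and `D` is the full distance matrix; the per-node parent distances stand in for the ancestor-chain sums of the exact definition.

```python
def f_A(C, D):
    # 1. distance from the tree: total length of the spanning-tree links
    n = len(D)
    return sum(D[i][j] for i in range(n) for j in range(n) if C[i][j] == 1)

def f_T(parent, D):
    # 2. distance between neurons in the tree: each neuron's distance
    # to its parent, accumulated over the tree
    return sum(D[parent[k]][k] for k in range(len(D)) if parent[k] is not None)

def f_D(C, D, parent):
    # 3. final density: ratio of the two distance totals
    return f_T(parent, D) / f_A(C, D)

# Tiny 3-neuron chain 0 - 1 - 2
C = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
parent = [None, 0, 1]
print(f_A(C, D), f_T(parent, D), f_D(C, D, parent))
```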
3.2 Division Algorithm

The ESODTNN neural network incorporates the knowledge associated with the successive divisions created by the neural network. To do so, the hierarchy of clusters is stored together with the information associated with the cut-off point obtained in the division block. In this way, the information from the successive divisions that are created is stored in parallel. The distance between two neurons is taken from the distance matrix, $d(a_k, a_s) = d_{ks}$ with $d_{ks} \in D$. The initial tree node, when there is only one cluster, is denoted as $n_{root}$. The final leaf nodes will contain the clusters created, while the rest of the tree nodes will contain the information associated with the divisions. The final algorithm functions as follows:

1. If node $n_{root}$ has not been initialized, create a new node in the tree and assign all of the elements: $n_{root} = T$.
2. Determine the cut-off point for the elements, $\alpha$, and the cut-off point for the distance, $\beta$. Both values are constant.
3. Recover the node associated with the explored cluster, which represents the cluster in the global hierarchy tree. This node will be represented as $n_c$.
4. Initialize $i = 1$.
5. Select the $i$-th greatest distance $d_{jk} \in D$ with $c_{jk} = 1$ and remove the node $a_k \in A$ from the tree.
6. Given $A_1$, $A_2$, the remaining trees after eliminating $a_k$, let $T_1 = \{s \in T / a_s \in A_1\}$ and $T_2 = \{s \in T / a_s \in A_2\}$ with $T = T_1 \cup T_2$, $T_1 \cap T_2 = \phi$, and $C_1$, $C_2$, $D_1$, $D_2$ the corresponding link and distance matrices.
7. If $\#T_1/\#T$ or $\#T_2/\#T$ is less than $\alpha$, go to step 18.
8. Calculate the average distance for the tree node $a_k$ following the connection with the parent node $f^p(a_k)$, where the average distance is given by the average-distance algorithm: $d^m_{a_k} = f^m(A, D)$.
9. If the distance between the tree node $a_k$ and its parent is less than the average distance, $d_{sk} \leq d^m_{a_k} \cdot \beta$ where $s \in T$ and $a_s = f^p(a_k)$, go to step 18.
10. Calculate the density for $T$, $T_1$ and $T_2$ following the density algorithm: $f^D(C, D)$, $f^D(C_1, D_1)$, $f^D(C_2, D_2)$.
11. Calculate the new density threshold $\delta(t+1) = f^D(C_1, D_1) + f^D(C_2, D_2)$ and the previous one $\delta(t) = f^D(C, D)$.
12. If the value $\delta(t)/\delta(t+1) < 1/(\delta(0)/\delta(1) \cdot \rho)$, where $\rho$ is a constant, go to step 17.
13. Separate $n_c$ into $n_{c\_left}$ and $n_{c\_right}$.
14. Assign the set of elements to each of the nodes: $n_{c\_left} = T_1$ and $n_{c\_right} = T_2$.
15. Store the distance $n_c.length = d_{jk}$.
16. Finish.
17. Re-establish the connection of $a_k$ with its parent node.
18. If $i$
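A rough skeleton of the structural core of the division block (steps 5-7 and 17) is given below. The helper names and the constant `ALPHA` are ours; the average-distance test of steps 8-9 and the density-threshold test of steps 10-12 are omitted for brevity, so this only illustrates how a candidate cut splits the spanning tree and how undersized splits are rejected.

```python
ALPHA = 0.1   # assumed minimum cluster-size fraction (the paper's alpha)

def components(n, edges):
    # connected components of an n-node graph given as (i, j, weight) edges
    adj = {i: [] for i in range(n)}
    for i, j, _ in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v])
        comps.append(comp)
    return comps

def try_split(n, tree_edges):
    # step 5: remove the longest link of the spanning tree
    cut = max(tree_edges, key=lambda e: e[2])
    rest = [e for e in tree_edges if e is not cut]
    t1, t2 = components(n, rest)   # a tree minus one edge -> two subtrees
    # step 7: reject splits that produce a too-small cluster
    if min(len(t1), len(t2)) / n < ALPHA:
        return None                # step 17: re-establish the connection
    return t1, t2

# 4-node path with one long link in the middle
print(try_split(4, [(0, 1, 1.0), (1, 2, 5.0), (2, 3, 1.0)]))
```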