Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
6040
Stasinos Konstantopoulos Stavros Perantonis Vangelis Karkaletsis Constantine D. Spyropoulos George Vouros (Eds.)
Artificial Intelligence: Theories, Models and Applications
6th Hellenic Conference on AI, SETN 2010
Athens, Greece, May 4-7, 2010
Proceedings
Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany Volume Editors Stasinos Konstantopoulos Stavros Perantonis Vangelis Karkaletsis Constantine D. Spyropoulos Institute of Informatics and Telecommunications NCSR Demokritos Ag. Paraskevi 15310, Athens, Greece E-mail: {konstant, sper, vangelis, costass}@iit.demokritos.gr George Vouros Department of Information and Communication Systems Engineering University of the Aegean Karlovassi, Samos 83200, Greece E-mail:
[email protected]
Library of Congress Control Number: 2010925798
CR Subject Classification (1998): I.2, H.3, H.4, F.1, H.5, H.2.8
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-12841-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-12841-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
Artificial intelligence (AI) is a dynamic field that is constantly expanding into new application areas, discovering new research challenges, and facilitating the development of innovative products. Today’s AI tools might not pass the Turing test, but they are invaluable aids in organizing and sorting the ever-increasing volume, complexity, and heterogeneity of knowledge available to us in our rapidly changing technological, economic, cultural, and social environment.

This volume aims at bringing to the reader all the latest developments in this exciting and challenging field, and contains papers selected for presentation at the 6th Hellenic Conference on Artificial Intelligence (SETN 2010), the official meeting of the Hellenic Society for Artificial Intelligence (EETN). SETN 2010 was organized by the Hellenic Society for Artificial Intelligence and the Institute of Informatics and Telecommunications, NCSR ‘Demokritos’, and took place in Athens during May 4–7. Previous conferences were held at the University of Piraeus (1996), at the Aristotle University of Thessaloniki (2002), at the University of the Aegean (Samos, 2004, and Syros, 2008), and jointly at the Foundation for Research and Technology–Hellas (FORTH) and the University of Crete (2006).

SETN conferences play an important role in disseminating innovative and high-quality scientific results by AI researchers, attracting not only EETN members but also scientists advancing and applying AI in many and diverse domains and from various Greek and international institutes. However, the most important aspect of SETN conferences is that they provide the context in which AI researchers meet and discuss their work, as well as an excellent opportunity for students to attend high-quality tutorials and get closer to AI results.

SETN 2010 continued this tradition of excellence, attracting submissions not only from Greece but also from numerous European countries, Asia, and the Americas; all submissions underwent a thorough reviewing process on the basis of their relevance to AI, originality, significance, technical soundness, and presentation. The selection process was hard, with only 28 papers out of the 83 submitted being accepted as full papers and an additional 22 submissions accepted as short papers.

This proceedings volume also includes the abstracts of the invited talks presented at SETN 2010 by four internationally distinguished keynote speakers: Panos Constantopoulos, Michail Lagoudakis, Nikolaos Mavridis, and Demetri Terzopoulos. As yet another indication of the growing international influence and importance of the conference, the EVENTS international workshop on event recognition and tracking chose to be co-located with SETN 2010. And, finally, SETN 2010 hosted the first ever RoboCup event organized in Greece, with the participation of two teams from abroad and one from Greece.

The Area Chairs and members of the SETN 2010 Programme Committee and the additional reviewers did an enormous amount of work and deserve the
special gratitude of all participants. Our sincere thanks go to our sponsors for their generous financial support and to the Steering Committee for its assistance and support. The conference operations were supported in an excellent way by the ConfMaster conference management system; many thanks to Thomas Preuss for his prompt responses to all questions and requests. Special thanks go to Konstantinos Stamatakis for the design of the conference poster and the design and maintenance of the conference website. We also wish to thank the Organizing Committee and Be to Be Travel, the conference travel and organization agent, for implementing the conference schedule in a timely and flawless manner. Last but not least, we also thank Alfred Hofmann, Anna Kramer, Leonie Kunz, and the Springer team for their continuous help and support.

March 2010
Stasinos Konstantopoulos Stavros Perantonis Vangelis Karkaletsis Constantine D. Spyropoulos George Vouros
Organization
SETN 2010 was organized by the Institute of Informatics and Telecommunications, NCSR ‘Demokritos’, and EETN, the Hellenic Society for Artificial Intelligence.
Conference Chairs
Constantine D. Spyropoulos - NCSR ‘Demokritos’, Greece
Vangelis Karkaletsis - NCSR ‘Demokritos’, Greece
George Vouros - University of the Aegean, Greece
Steering Committee
Grigoris Antoniou - FORTH and University of Crete
John Darzentas - University of the Aegean
Nikos Fakotakis - University of Patras
Themistoklis Panayiotopoulos - University of Piraeus
Ioannis Vlahavas - Aristotle University
Organizing Committee
Alexandros Artikis, Vassilis Gatos, Pythagoras Karampiperis, Anastasios Kesidis, Anastasia Krithara, Georgios Petasis, Sergios Petridis, Ioannis Pratikakis, Konstantinos Stamatakis, Dimitrios Vogiatzis
Programme Committee Chairs
Stasinos Konstantopoulos - NCSR ‘Demokritos’
Stavros Perantonis - NCSR ‘Demokritos’
Programme Committee Area Chairs
Ion Androutsopoulos - Athens University of Economics and Business
Nick Bassiliades - Aristotle University of Thessaloniki
Ioannis Hatzilygeroudis - University of Patras
Ilias Maglogiannis - University of Central Greece
Georgios Paliouras - NCSR ‘Demokritos’
Ioannis Refanidis - University of Macedonia
Efstathios Stamatatos - University of the Aegean
Kostas Stergiou - University of the Aegean
Panos Trahanias - FORTH and University of Crete
Programme Committee Members
Dimitris Apostolou - University of Piraeus
Argyris Arnellos - University of the Aegean
Alexander Artikis - NCSR ‘Demokritos’
Grigorios Beligiannis - University of Ioannina
Basilis Boutsinas - University of Patras
Theodore Dalamagas - IMIS Institute/‘Athena’ Research Center
Yannis Dimopoulos - University of Cyprus
Christos Douligeris - University of Piraeus
George Dounias - University of the Aegean
Eleni Galiotou - TEI Athens
Todor Ganchev - University of Patras
Vassilis Gatos - NCSR ‘Demokritos’
Efstratios Georgopoulos - TEI Kalamata
Manolis Gergatsoulis - Ionian University
Nikos Hatziargyriou - National Technical University of Athens
Katerina Kabassi - TEI Ionian
Dimitris Kalles - Hellenic Open University
Kostas Karatzas - Aristotle University of Thessaloniki
Dimitrios Karras - TEI Chalkis
Petros Kefalas - City Liberal Studies
Stefanos Kollias - National Technical University of Athens
Yiannis Kompatsaris - CERTH
Dimitris Kosmopoulos - NCSR ‘Demokritos’
Constantine Kotropoulos - Aristotle University of Thessaloniki
Manolis Koubarakis - National and Kapodistrian University of Athens
Konstantinos Koutroumbas - National Observatory of Athens
Michail Lagoudakis - Technical University of Crete
Aristidis Likas - University of Ioannina
George Magoulas - Birkbeck College, University of London (UK)
Filia Makedon - University of Texas at Arlington (USA)
Manolis Maragoudakis - University of the Aegean
Vassilis Moustakis - Technical University of Crete
Christos Papatheodorou - Ionian University
Pavlos Peppas - University of Patras
Sergios Petridis - NCSR ‘Demokritos’
Stelios Piperidis - ILSP-Athena RC
Vassilis Plagianakos - University of Central Greece
Dimitris Plexousakis - FORTH and University of Crete
George Potamias - FORTH
Ioannis Pratikakis - NCSR ‘Demokritos’
Jim Prentzas - Democritus University of Thrace
Ilias Sakellariou - University of Macedonia
Kyriakos Sgarbas - University of Patras
John Soldatos - AIT
Panagiotis Stamatopoulos - National and Kapodistrian University of Athens
Giorgos Stoilos - Oxford University (UK)
Ioannis Tsamardinos - University of Crete and FORTH
George Tsichrintzis - University of Piraeus
Nikos Vasilas - TEI Athens
Michalis Vazirgia - Athens University of Economics and Business
Maria Virvou - University of Piraeus
Spyros Vosinakis - University of the Aegean
Dimitris Vrakas - Aristotle University of Thessaloniki
Additional Reviewers
Charalampos Doukas - University of the Aegean
Anastasios Doulamis - Technical University of Crete
Giorgos Flouris - FORTH
Theodoros Giannakopoulos - NCSR ‘Demokritos’
Katia Kermanidis - Ionian University
Otilia Kocsis - University of Patras
Eleytherios Koumakis - Technical University of Crete
Anastasia Krithara - NCSR ‘Demokritos’
Pavlos Moraitis - Paris Descartes University (France)
Nikolaos Pothitos - National and Kapodistrian University of Athens
Spyros Raptis - ILSP-Athena RC
Vassiliki Rentoumi - NCSR ‘Demokritos’
Evangelos Sakkopoulos - University of Patras
Themos Stafylakis - ILSP-Athena RC
Sophia Stamou - University of Patras
Andreas Symeonidis - Aristotle University of Thessaloniki
Vassilios Vassiliadis - University of the Aegean
Dimitrios Vogiatzis - NCSR ‘Demokritos’
Table of Contents
Invited Talks

Digital Curation and Digital Cultural Memory ...... 1
Panos Constantopoulos

RoboCup: A Challenge Problem for Artificial Intelligence ...... 3
Michail G. Lagoudakis

Robots, Natural Language, Social Networks, and Art ...... 5
Nikolaos Mavridis

Artificial Life Simulation of Humans and Lower Animals: From Biomechanics to Intelligence ...... 7
Demetri Terzopoulos
Full Papers

Prediction of Aircraft Aluminum Alloys Tensile Mechanical Properties Degradation Using Support Vector Machines ...... 9
Nikolaos Ampazis and Nikolaos D. Alexopoulos

Mutual Information Measures for Subclass Error-Correcting Output Codes Classification ...... 19
Nikolaos Arvanitopoulos, Dimitrios Bouzas, and Anastasios Tefas

Conflict Directed Variable Selection Strategies for Constraint Satisfaction Problems ...... 29
Thanasis Balafoutis and Kostas Stergiou

A Feasibility Study on Low Level Techniques for Improving Parsing Accuracy for Spanish Using Maltparser ...... 39
Miguel Ballesteros, Jesús Herrera, Virginia Francisco, and Pablo Gervás

A Hybrid Ant Colony Optimization Algorithm for Solving the Ring Arc-Loading Problem ...... 49
Anabela Moreira Bernardino, Eugénia Moreira Bernardino, Juan Manuel Sánchez-Pérez, Juan Antonio Gómez-Pulido, and Miguel Angel Vega-Rodríguez

Trends and Issues in Description Logics Frameworks for Image Interpretation ...... 61
Stamatia Dasiopoulou and Ioannis Kompatsiaris
Unsupervised Recognition of ADLs ...... 71
Todor Dimitrov, Josef Pauli, and Edwin Naroska

Audio Features Selection for Automatic Height Estimation from Speech ...... 81
Todor Ganchev, Iosif Mporas, and Nikos Fakotakis

Audio-Visual Fusion for Detecting Violent Scenes in Videos ...... 91
Theodoros Giannakopoulos, Alexandros Makris, Dimitrios Kosmopoulos, Stavros Perantonis, and Sergios Theodoridis

Experimental Study on a Hybrid Nature-Inspired Algorithm for Financial Portfolio Optimization ...... 101
Giorgos Giannakouris, Vassilios Vassiliadis, and George Dounias

Associations between Constructive Models for Set Contraction ...... 113
Vasilis Giannopoulos and Pavlos Peppas

Semantic Awareness in Automated Web Service Composition through Planning ...... 123
Ourania Hatzi, Dimitris Vrakas, Nick Bassiliades, Dimosthenis Anagnostopoulos, and Ioannis Vlahavas

Unsupervised Web Name Disambiguation Using Semantic Similarity and Single-Pass Clustering ...... 133
Elias Iosif

Time Does Not Always Buy Quality in Co-evolutionary Learning ...... 143
Dimitris Kalles and Ilias Fykouras

Visual Tracking by Adaptive Kalman Filtering and Mean Shift ...... 153
Vasileios Karavasilis, Christophoros Nikou, and Aristidis Likas

On the Approximation Capabilities of Hard Limiter Feedforward Neural Networks ...... 163
Konstantinos Koutroumbas and Yannis Bakopoulos

EMERALD: A Multi-Agent System for Knowledge-Based Reasoning Interoperability in the Semantic Web ...... 173
Kalliopi Kravari, Efstratios Kontopoulos, and Nick Bassiliades

An Extension of the Aspect PLSA Model to Active and Semi-Supervised Learning for Text Classification ...... 183
Anastasia Krithara, Massih-Reza Amini, Cyril Goutte, and Jean-Michel Renders

A Market-Affected Sealed-Bid Auction Protocol ...... 193
Claudia Lindner
A Sparse Spatial Linear Regression Model for fMRI Data Analysis ...... 203
Vangelis P. Oikonomou and Konstantinos Blekas

A Reasoning Framework for Ambient Intelligence ...... 213
Theodore Patkos, Ioannis Chrysakis, Antonis Bikakis, Dimitris Plexousakis, and Grigoris Antoniou

The Large Scale Artificial Intelligence Applications – An Analysis of AI-Supported Estimation of OS Software Projects ...... 223
Wieslaw Pietruszkiewicz and Dorota Dzega

Towards the Discovery of Reliable Biomarkers from Gene-Expression Profiles: An Iterative Constraint Satisfaction Learning Approach ...... 233
George Potamias, Lefteris Koumakis, Alexandros Kanterakis, and Vassilis Moustakis

Skin Lesions Characterisation Utilising Clustering Algorithms ...... 243
Sotiris K. Tasoulis, Charalampos N. Doukas, Ilias Maglogiannis, and Vassilis P. Plagianakos

Mining for Mutually Exclusive Gene Expressions ...... 255
George Tzanis and Ioannis Vlahavas

Task-Based Dependency Management for the Preservation of Digital Objects Using Rules ...... 265
Yannis Tzitzikas, Yannis Marketakis, and Grigoris Antoniou

Designing Trading Agents for Real-World Auctions ...... 275
Ioannis A. Vetsikas and Nicholas R. Jennings

Scalable Semantic Annotation of Text Using Lexical and Web Resources ...... 287
Elias Zavitsanos, George Tsatsaronis, Iraklis Varlamis, and Georgios Paliouras
Short Papers

A Gene Expression Programming Environment for Fatigue Modeling of Composite Materials ...... 297
Maria A. Antoniou, Efstratios F. Georgopoulos, Konstantinos A. Theofilatos, Anastasios P. Vassilopoulos, and Spiridon D. Likothanassis

A Hybrid DE Algorithm with a Multiple Strategy for Solving the Terminal Assignment Problem ...... 303
Eugénia Moreira Bernardino, Anabela Moreira Bernardino, Juan Manuel Sánchez-Pérez, Juan Antonio Gómez-Pulido, and Miguel Angel Vega-Rodríguez
Event Detection and Classification in Video Surveillance Sequences ...... 309
Vasileios Chasanis and Aristidis Likas

The Support of e-Learning Platform Management by the Extraction of Activity Features and Clustering Based Observation of Users ...... 315
Dorota Dzega and Wieslaw Pietruszkiewicz

Mapping Cultural Metadata Schemas to CIDOC Conceptual Reference Model ...... 321
Manolis Gergatsoulis, Lina Bountouri, Panorea Gaitanou, and Christos Papatheodorou

Genetic Algorithm Solution to Optimal Sizing Problem of Small Autonomous Hybrid Power Systems ...... 327
Yiannis A. Katsigiannis, Pavlos S. Georgilakis, and Emmanuel S. Karapidakis

A WSDL Structure Based Approach for Semantic Categorization of Web Service Elements ...... 333
Dionisis D. Kehagias, Efthimia Mavridou, Konstantinos M. Giannoutakis, and Dimitrios Tzovaras

Heuristic Rule Induction for Decision Making in Near-Deterministic Domains ...... 339
Stavros Korokithakis and Michail G. Lagoudakis

Behavior Recognition from Multiple Views Using Fused Hidden Markov Models ...... 345
Dimitrios I. Kosmopoulos, Athanasios S. Voulodimos, and Theodora A. Varvarigou

A Machine Learning-Based Evaluation Method for Machine Translation ...... 351
Katsunori Kotani and Takehiko Yoshimi

Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech ...... 357
Alexandros Lazaridis, Todor Ganchev, Iosif Mporas, Theodoros Kostoulas, and Nikos Fakotakis

A Stochastic Greek-to-Greeklish Transcriber Modeled by Real User Data ...... 363
Dimitrios P. Lyras, Ilias Kotinas, Kyriakos Sgarbas, and Nikos Fakotakis

Face Detection Using Particle Swarm Optimization and Support Vector Machines ...... 369
Ermioni Marami and Anastasios Tefas
Reducing Impact of Conflicting Data in DDFS by Using Second Order Knowledge ...... 375
Luca Marchetti and Luca Iocchi

Towards Intelligent Management of a Student's Time ...... 383
Evangelia Moka and Ioannis Refanidis

Virtual Simulation of Cultural Heritage Works Using Haptic Interaction ...... 389
Konstantinos Moustakas and Dimitrios Tzovaras

Ethnicity as a Factor for the Estimation of the Risk for Preeclampsia: A Neural Network Approach ...... 395
Costas Neocleous, Kypros Nicolaides, Kleanthis Neokleous, and Christos Schizas

A Multi-class Method for Detecting Audio Events in News Broadcasts ...... 399
Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis

Flexible Management of Large-Scale Integer Domains in CSPs ...... 405
Nikolaos Pothitos and Panagiotis Stamatopoulos

A Collaborative System for Sentiment Analysis ...... 411
Vassiliki Rentoumi, Stefanos Petrakis, Vangelis Karkaletsis, Manfred Klenner, and George A. Vouros

Minimax Search and Reinforcement Learning for Adversarial Tetris ...... 417
Maria Rovatsou and Michail G. Lagoudakis

A Multi-agent Simulation Framework for Emergency Evacuations Incorporating Personality and Emotions ...... 423
Alexia Zoumpoulaki, Nikos Avradinis, and Spyros Vosinakis

Author Index ...... 429
Digital Curation and Digital Cultural Memory

Panos Constantopoulos
Department of Informatics, Athens University of Economics and Business, Athens 10434, Greece, and Digital Curation Unit, IMIS – ‘Athena’ Research Centre, Athens 11524, Greece
Abstract. The last two decades have witnessed an ever increasing penetration of digital media initially in the management and, subsequently, in the study of culture. From collections management, object documentation, domain knowledge representation and reasoning, to supporting the creative synthesis and re-interpretation of data in the framework of digital productions, significant progress has been achieved in the development of relevant knowledge and software tools. Developing a standard ontology for the cultural domain stands out as the most prominent such development. As a consequence of this progress, digital repositories are created that aim at serving as digital cultural memories, while a process of convergence has started among the different kinds of memory institutions, i.e., museums, archives, and libraries, in what concerns their information functions. The success of digital cultural memories will be decided against rivals with centuries-long tradition. The advantages offered by technology, mass storage, copying, and the ease of searching and quantitative analysis, will not suffice unless reliability, long-term preservation, and the ability to re-use, re-combine and re-interpret digital content are ensured. To this end digital curation is exercised. In this talk we will examine the development of digital cultural memories using digital curation. More specifically, we will discuss issues of knowledge representation and reasoning, we will present some examples of interesting research and development efforts, and will refer to certain current trends.
RoboCup: A Challenge Problem for Artificial Intelligence

Michail G. Lagoudakis
Intelligent Systems Laboratory, Department of Electronic and Computer Engineering, Technical University of Crete, Chania 73100, Greece
Abstract. The RoboCup competition is the international robotic soccer world cup organized annually since 1997. The initial conception by Hiroaki Kitano in 1993 led to the formation of the RoboCup Federation with a bold vision: By the year 2050, to develop a team of fully autonomous humanoid robots that can win against the human world soccer champions! RoboCup poses a real-world challenge for Artificial Intelligence, which requires addressing simultaneously the core problems of perception, cognition, action, and coordination under real-time constraints. In this talk, I will outline the vision, the challenges, and the contribution of the RoboCup competition in its short history. I will also offer an overview of the research efforts of team Kouretes, the RoboCup team of the Technical University of Crete, on topics ranging from complex motion design, efficient visual recognition, and self-localization to robotic software engineering, distributed communication, skill learning, and coordinated game play. My motivation is to inspire researchers and students to form teams with the goal of participating in the various leagues of this exciting and challenging benchmark competition and ultimately contributing to the advancement of the state-of-the-art in Artificial Intelligence and Robotics.
Robots, Natural Language, Social Networks, and Art

Nikolaos Mavridis
Interactive Robots and Media Lab, United Arab Emirates University, Al Ain 17551, U.A.E.
Abstract. Creating robots that can fluidly converse in natural language, and cooperate and socialize with their human partners, is a goal that has always captured human imagination. Furthermore, it is a goal that requires truly interdisciplinary research: engineering, computer science, as well as the cognitive sciences are crucial towards its realization. Challenges and current progress towards this goal will be illustrated through two real-world robot examples: the conversational robot Ripley, and the FaceBots social robots which utilize and publish social information on the FaceBook website. Finally, a quick glimpse towards novel educational and artistic avenues opened by such robots will be provided, through the Interactive Theatre installation of the Ibn Sina robot.
Artificial Life Simulation of Humans and Lower Animals: From Biomechanics to Intelligence

Demetri Terzopoulos
Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095-1596, U.S.A.
Abstract. The confluence of virtual reality and artificial life, an emerging discipline that spans the computational and biological sciences, has yielded synthetic worlds inhabited by realistic artificial flora and fauna. The latter are complex synthetic organisms with functional, biomechanically simulated bodies, sensors, and brains with locomotion, perception, behavior, learning, and cognition centers. These biomimetic autonomous agents in their realistic virtual worlds foster deeper computationally oriented insights into natural living systems. Virtual humans and lower animals are of great interest in computer graphics because they are self-animating graphical characters poised to dramatically advance the motion picture and interactive game industries. Furthermore, they engender interesting new applications in computer vision, medical imaging, sensor networks, archaeology, and many other domains.
Prediction of Aircraft Aluminum Alloys Tensile Mechanical Properties Degradation Using Support Vector Machines

Nikolaos Ampazis and Nikolaos D. Alexopoulos
Department of Financial and Management Engineering, University of the Aegean, 82100 Chios, Greece
[email protected], [email protected]

Abstract. In this paper we utilize Support Vector Machines to predict the degradation of the mechanical properties, due to surface corrosion, of the Al 2024-T3 aluminum alloy used in the aircraft industry. Pre-corroded surfaces from Al 2024-T3 tensile specimens for various exposure times to EXCO solution were scanned and analyzed using image processing techniques. The generated pitting morphology and individual characteristics were measured and quantified for the different exposure times of the alloy. The pre-corroded specimens were then tensile tested and the residual mechanical properties were evaluated. Several pitting characteristics were directly correlated to the degree of degradation of the tensile mechanical properties. The support vector machine models were trained by taking as inputs all the pitting characteristics of each corroded surface to predict the residual mechanical properties of the 2024-T3 alloy. The results indicate that the proposed approach constitutes a robust methodology for accurately predicting the degradation of the mechanical properties of the material.

Keywords: Material Science, Corrosion Prediction, Machine Learning, Support Vector Machines.
1 Introduction
The widely used aluminum alloy in the aircraft industry is the damage-tolerant Al 2024-T3 alloy, currently used in the skin and the wings of many civil aircraft. The main problems of the design and inspection engineers are the fatigue, corrosion and impact damage that the fuselage and wing skins are subjected to. Corrosion damage of the material is also very essential to the structural integrity of the aircraft. It has been calculated that, for aircraft in service for more than a decade, about 40% of the repairs carried out during maintenance were associated with corrosion damage. Figure 1 shows a typical surface corrosion damage produced at the wings of an in-service aircraft. Since the material of a component is subjected to corrosion, it is expected that its critical mechanical properties
Fig. 1. Photograph showing corrosion products formed at the lower surface of an in-service aircraft wing. Source: Hellenic Aerospace Industry S.A.
might vary with increasing service time and thus, must be taken into account for the structural integrity calculation of the component. The effect of corrosion damage on the reference alloy has been studied in various works. The exposure of the alloy 2024-T3 on various accelerated, laboratory environments, e.g. [1,2,3,4], resulted in the formation of large pits and micro-cracks on the sub-surface of the specimens, that lead to exfoliation of the alloy with increasing exposure time. This has a deleterious impact on the residual mechanical properties, especially in the tensile ductility. Alexopoulos and Papanikos [3] noticed that after the exposure for only 2h (hours), the ductility of the 2024-T3 decreased by almost 20%. The decrease of all mechanical properties and for all the spectra of exposure to the corrosive solution was attributed to the pitting that was formed on the surface of the specimens and their induced cracks to the cross-section of the specimen. In a number of publications, e.g. [5,6] it was shown that machine learning methods can be used in the wider field of materials science and, more specifically, to predict mechanical properties of aluminium alloys. In [5] it was demonstrated that Least Squares Support Vector Machines (LSSVM) are quite applicable for simulation and monitoring of the ageing process optimization of AlZnMgCu series alloys. In [6] Artificial Neural Networks (ANNs) were used for the estimation of flow stress of AA5083 with regard to dynamic strain ageing that occurs in certain deformation conditions. The input variables were selected to be strain rate, temperature and strain, and the prediction variable was the flow stress. However the use of ANNs in coupled fields of corrosion / material science and mechanics is still limited. Some literature publications can be found for the exploitation of ANNs to the corrosion of steels and Ti alloys, e.g. [7,8,9]. In these cases, different chloride concentrations, pH and temperature were used to model and predict the surface pitting corrosion behaviour. Additionally in [9] various polarized corrosion data were used to predict the future maximum pit depth with good agreements between estimation/prediction and experimental data.
The prediction of surface corrosion of aluminium alloys with the exploitation of ANNs has also been attempted in the literature. Leifer [10] attempted to predict via neural networks the pit depth of aluminium alloy 1100 when subjected to natural water corrosion. The trained model was found capable of predicting the expected pit depth as a function of water pH, the concentrations of carbonate (CO3^2-), copper (Cu^2+) and chloride (Cl^-) ions, as well as storage time. Pidaparti et al. [11] trained an ANN on 2024-T3 to predict the degradation of chemical elements obtained from Energy Dispersive X-ray Spectrometry (EDS) on corroded specimens. Input parameters to the ANN model were the alloy composition, electrochemical parameters and corrosion time. Though the trained models worked in all the above cases, there is no information regarding the residual mechanical properties of the corroded materials, which is needed in order to calculate the structural health of such a structure. This was first attempted in the case of aluminium alloys in [12,13,14], where neural network models were trained to predict the fatigue performance of pre-corroded specimens. The inputs of the models were maximum corrosion depth, fatigue performance, corrosion temperature and corrosion time. The models were trained with the back propagation learning algorithm, in order to predict the maximum corrosion depth and fatigue performance of prior-corroded aluminium alloys. All existing models in the case of corrosion of aluminium alloys take different parameters as inputs, such as the composition of the alloy, the maximum pit depth and the pitting density of the surface. In order to train an ANN model to predict the residual tensile mechanical behaviour of pre-corroded aluminium alloys, the input parameters in many cases are too few and the available training patterns do not usually exceed more than one hundred data points. In addition, in all cases only the value of the maximum pit depth generated by the surface corrosion of an alloy has been taken into account within the ANN models. However, this is not always the critical parameter to be utilized, and usually other pit characteristics are neglected. Jones and Hoeppner [15] demonstrated that the shape and size of a pit are major factors affecting the fatigue life of pre-corroded 2024-T3 specimens. The microcracking network is a precursor of the nucleating fatigue crack that seriously degrades the fatigue life of the specimen. Van der Walde and Hillberry [16] also showed that fatigue crack initiation in the same alloy occurs at the maximum-depth pit in approximately 60% of the cases. Hence, it is of imperative importance to characterize the whole corroded surface area of the alloy and correlate the findings on the corrosion-induced pits with the residual mechanical properties. In the present work, the corroded surfaces were analyzed by employing image analysis techniques in order to extract meaningful training features. Specific areas from the tensile specimens' gauge length locations were scanned before tensile testing. Any formation of corrosion-induced pits was characterized and quantified as a function of the material's exposure time to the corrosive environment. In each different case study, the number and the morphology of the corrosion-induced pits were correlated with the residual tensile mechanical properties of specimens of the 2024-T3 alloy. Support vector machines were then trained as regressors with the
resulting features in order to predict the degradation of a number of mechanical properties for different exposure times.
2 Support Vector Machines
Support Vector Machines (SVM) were first introduced as a new class of machine learning techniques by Vapnik [17] and are based on the structural risk minimization principle. An SVM seeks a decision surface to separate the training data points into two classes and makes decisions based on the support vectors that are selected as the only effective elements from the training set. The goal of SVM learning is to find the optimal separating hyper-plane (OSH) that has the maximal margin to both sides of the data classes. This can be formulated as:

\[
\min_{w,b}\ \tfrac{1}{2}\,w^{T}w \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \tag{1}
\]

where yi ∈ {−1, +1} is the decision of the SVM for pattern xi and b is the bias of the separating hyperplane. After the OSH has been determined, the SVM makes decisions based on the globally optimized separating hyper-plane by finding out on which side of the OSH the pattern is located. This property makes SVM highly competitive with other traditional pattern recognition methods in terms of predictive accuracy and efficiency. Support Vector Machines may also be used for regression problems with the following simple modification:

\[
\min_{w,b,\xi,\hat{\xi}}\ \tfrac{1}{2}\,w^{T}w + C\sum_{i=1}^{n}(\xi_i + \hat{\xi}_i) \quad \text{subject to} \quad (w \cdot x_i + b) - y_i \le \varepsilon + \xi_i, \qquad y_i - (w \cdot x_i + b) \le \varepsilon + \hat{\xi}_i \tag{2}
\]
where ξi is a slack variable introduced for exceeding the target value by more than ε, and ξ̂i a slack variable for being more than ε below the target value [18]. The idea of the Support Vector Machine is to find a model which guarantees the lowest classification or regression error by controlling the model complexity (VC-dimension) based on the structural risk minimization principle. This avoids over-fitting, which is the main problem for other learning algorithms.
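For illustration, the minimal sketch below fits an ε-insensitive support vector regressor of the form of (2) on synthetic data; the use of scikit-learn, the RBF kernel and the values of C and ε are assumptions made for the example and are not the settings used later in this paper.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (placeholder for real feature vectors).
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon-insensitive SVR: C weights the slack terms (xi, xi_hat) in (2),
# epsilon is the width of the zero-loss tube around the target values.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

y_pred = model.predict(X)
print("number of support vectors:", model.support_vectors_.shape[0])
```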
3 Material Data and Experimental Procedure
The material used was a wrought aluminum alloy 2024-T3 which was received in sheet form with nominal thickness of 3.2 mm. The surfaces of the tensile specimens were cleaned with acetone and then they were exposed to the laboratory exfoliation corrosion environment (hereafter called EXCO solution) according to specification ASTM G34. The specimens were exposed to the EXCO solution for a number of different exposure times. More details regarding the corrosion procedure can be found in the respective specification as well as in [3,4]. After the exposure, the corroded specimens were cleaned with running water to remove any surface corrosion products, e.g. salt deposits. The reduced cross-section area (gauge length) of the specimens was scanned in individual images and in grayscale format. Only this part of the tensile specimen was examined for surface corrosion pits, as it can be directly correlated with the relative mechanical properties of the same specimen. Image analysis was performed using the ImagePro® image processing, enhancement, and analysis software [19]. The same surface area of approximately 500 mm2 was analyzed for each testing specimen and for various corrosion exposure durations, namely for 2h, 12h, 24h, 48h, and 96h. Individual characterization of each formed corrosion-induced surface pit was made and statistical values of the generated pits were calculated. The selected parameters for the quantification of the corrosion-induced surface pits as well as their physical interpretation are summarized in Table 1.

Table 1. Image analysis measuring parameters and their physical interpretation in the corrosion-induced pitting problem

Feature          Measurements / physical interpretation
Area             Area of each individual object (pit); does not include hole areas that have the same color as the matrix
Density (mean)   Average optical density (or intensity) of the object; an indication of the mean depth of each pit
Axis (major)     Length of the major axis of an ellipse; maximum length of a pit in one axis
Axis (minor)     Length of the minor axis of an ellipse; maximum length of a pit in the transverse axis
Diameter (max)   Length of the longest line joining two points of the object's outline and passing through the centroid; the maximum diameter of each pit
Per-Area         Ratio of the area of the object to the total investigated area
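The paper relies on the ImagePro software for this step; purely as a hedged sketch of how features of the kind listed in Table 1 could be extracted from a thresholded grayscale scan, one could use scikit-image region properties as below. The file name, the Otsu thresholding and the mapping from pixel intensity to pit depth are illustrative assumptions, not the authors' workflow.

```python
import numpy as np
from skimage import io, measure
from skimage.filters import threshold_otsu

img = io.imread("gauge_section_24h.png", as_gray=True)  # hypothetical scanned gauge section
pits = img < threshold_otsu(img)                        # dark pits on a light matrix
labels = measure.label(pits)
total_area = pits.size

rows = []
for r in measure.regionprops(labels, intensity_image=img):
    rows.append({
        "area": r.area,                          # Table 1: Area
        "mean_density": 1.0 - r.mean_intensity,  # darker pixels taken as deeper pits (assumption)
        "major_axis": r.major_axis_length,       # Axis (major)
        "minor_axis": r.minor_axis_length,       # Axis (minor)
        "max_diameter": r.feret_diameter_max,    # Diameter (max), approximated by the max Feret diameter
        "per_area": r.area / total_area,         # Per-Area
    })
print(len(rows), "pits measured")
```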
A number of different parameters were chosen to quantify the geometry of the pits, e.g. major and minor axis, aspect ratio and diameter of the pits. In addition, the number, area and perimeter of the pits were measured and used to calculate the pitting coverage area of the total investigated area. After the corrosion exposure for each of the aforementioned durations, the testing specimens were subjected to mechanical testing. Details regarding the mechanical testing part can be found elsewhere [3,4]. Evaluated properties which were later predicted by the SVMs were: yield strength Rp (0.2% proof stress), tensile strength Rm , elongation to fracture Af , and strain energy density W .
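The mechanical testing details are deferred to [3,4]; assuming the usual convention that the strain energy density W is the area under the engineering stress-strain curve up to fracture, a toy computation of the evaluated quantities looks as follows (the stress-strain values are invented for illustration).

```python
import numpy as np

# Hypothetical engineering stress-strain record from a tensile test.
strain = np.array([0.000, 0.002, 0.010, 0.050, 0.100, 0.150])   # [-]
stress = np.array([0.0,   140.0, 330.0, 420.0, 450.0, 460.0])   # [MPa]

Rm = stress.max()              # tensile strength [MPa]
Af = strain[-1]                # elongation to fracture [-]
W = np.trapz(stress, strain)   # strain energy density: area under the curve [MJ/m^3]
print(f"Rm = {Rm:.0f} MPa, Af = {Af * 100:.1f} %, W = {W:.1f} MJ/m^3")
```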
4 Results and Discussion

4.1 Corrosion Analysis
Figure 2 shows the scanned images of the surfaces of four different 2024-T3 specimens after their exposure to the EXCO solution for different times, (a) to (d), respectively. As can be seen in the figure, after short exposure the pits appear as small gray/black dots on the white surface of the specimen. With increasing exposure time, the pits seen on the specimen's surface cover an increasing fraction of the total investigated area. To the best of our knowledge, their increase does not seem to follow any known rule.
Fig. 2. Pit surfaces after exposure for: (a) 2 hours, (b) 12 hours, (c) 48 hours, (d) 96 hours to EXCO solution
Quantitative data on the corrosion-induced surface pits, obtained by image analysis, can be seen in Figure 3. The number of pits increases continuously with increasing exposure time to the solution; their total number almost reaches 15,000, and an exponential-type fitting curve is used to describe this increase. The number of recorded pits for each exposure duration is shown in Table 2. Since the number of pits alone is not informative enough, a more representative parameter to denote the effect of corrosion is the pitting coverage area; it is calculated as the percentage fraction of the total area of the pits to the investigated area of the specimen.
Table 2. Number of corrosion-induced surface pits at different exposure durations

Exposure duration (hours)   Number of pits
2                           2199
12                          3696
24                          11205
48                          12363
96                          14699
Fig. 3. Statistical quantitative analysis of (a) the number of pits and the pitting coverage area [%] and (b) the aspect ratio (mean major and minor axis values) of the formed pits, both plotted against the alloy exposure time to the EXCO solution [h]
The results for the pitting coverage area can also be seen in Figure 3, where an exponential decrease curve fitting is also proposed to simulate this phenomenon. Besides, it seems that up to 24h of exposure the increase is almost linear with continuously increasing exposure.
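To make the curve-fitting remark concrete, the sketch below fits a saturating exponential to the pit counts of Table 2 with SciPy; the functional form N(t) = a(1 - exp(-t/tau)) and the initial guesses are assumptions, since the text only states that an exponential-type fit was used.

```python
import numpy as np
from scipy.optimize import curve_fit

# Exposure time [h] and number of pits, taken from Table 2.
t = np.array([2, 12, 24, 48, 96], dtype=float)
n = np.array([2199, 3696, 11205, 12363, 14699], dtype=float)

def saturating_exp(t, a, tau):
    # Assumed form: the pit count rises and saturates at level a.
    return a * (1.0 - np.exp(-t / tau))

params, _ = curve_fit(saturating_exp, t, n, p0=[15000.0, 20.0])
a_hat, tau_hat = params
print(f"fitted saturation level a = {a_hat:.0f} pits, time constant tau = {tau_hat:.1f} h")
```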
4.2 SVM Prediction Results
We trained four different SVMs for predicting four tensile mechanical properties (namely yield strength Rp, tensile strength Rm, elongation to fracture Af and strain energy density W) of the pre-corroded specimens, taking into account their initial values for the reference (uncorroded) specimens. As training patterns we used the various pit features corresponding to pits formed at 2h, 12h, 48h, and 96h exposure durations. This resulted in a total of 32957 training points for each SVM. The performance of each SVM was evaluated on the prediction of the mechanical property residuals for the set of pits appearing at the 24h exposure (11205 testing points). This particular exposure time was selected since in [3] it was shown that at 24h the hydrogen embrittlement degradation mechanism of the mechanical properties is saturated. As a performance measure for the accuracy of the SVMs we used the Root Mean Square Error (RMSE) criterion between the actual and predicted values. For training the SVMs we used the SVMlight package [20] compiled with the Intel C/C++ Compiler Professional Edition for Linux. Training of the SVMs was run on a 2.5GHz Quad Core Pentium CPU with 4GB RAM running the Ubuntu 9.10 desktop x86 64 (Karmic Koala) operating system. The total running time of each SVM training was approximately 5 to 10 seconds. In our experiments we linearly scaled each feature to the range [-1, +1]. Scaling training data before applying SVM is very important. The main advantage is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems [21]. With the same method, testing data features were scaled to the training data ranges before testing. The training target outputs were also scaled to [0, +1] and the output of each SVM was then transformed back from the [0, +1] range to its original target value in order to calculate the RMSE for each mechanical property residual.

The prediction accuracy of the trained SVMs is summarized in Table 3. Standard deviation values of the mechanical properties of pre-corroded specimens appearing in the table have been previously calculated based on three different experiments in order to get reliable statistical values.

Table 3. RMSE of trained SVMs and standard deviation (calculated from real measurements)

Mechanical property   RMSE   Std. dev. (real measurements)
Rp                    1.5    2.5
Rm                    0.5    2.0
Af                    0.45   0.23
W                     1.35   1.16

As it can be seen, all predicted mechanical properties for pre-corrosion of 2024-T3 for 24 hours to EXCO solution are very close to the actually measured properties. The RMSE values are of the same order of magnitude or even lower than the standard deviation values of the experiments. Hence, it is evident that the calculation of the residual mechanical properties of corroded specimens can be performed by quantitative analysis of the corroded surface and trained SVMs. This is of imperative importance according to the damage tolerance philosophy in aircraft structures, as the corroded part may still carry mechanical loads and shall not be replaced.
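The following sketch mirrors the evaluation protocol described above (features scaled to [-1, +1] using the training ranges, targets scaled to [0, +1], training on the 2h/12h/48h/96h pits and testing on the 24h pits, RMSE computed after rescaling); it uses scikit-learn rather than SVMlight, and the arrays as well as the hyper-parameters are placeholders rather than the actual experimental setup.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

def train_and_evaluate(X_train, y_train, X_test, y_test):
    # Scale features to [-1, +1] using the training ranges only.
    x_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
    # Scale the target property to [0, +1], as described in the text.
    y_scaler = MinMaxScaler(feature_range=(0, 1)).fit(y_train.reshape(-1, 1))

    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)  # assumed hyper-parameters
    model.fit(x_scaler.transform(X_train),
              y_scaler.transform(y_train.reshape(-1, 1)).ravel())

    # Predict on the held-out 24h pits and map back to the original units.
    y_pred_scaled = model.predict(x_scaler.transform(X_test))
    y_pred = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

    return np.sqrt(np.mean((y_pred - y_test) ** 2))  # RMSE

# One call per mechanical property (Rp, Rm, Af, W), e.g. (placeholder arrays):
# rmse_rp = train_and_evaluate(X_2_12_48_96h, rp_train, X_24h, rp_test)
```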
5 Conclusions
Support Vector Machines were used to predict the effect of existing corrosion damage on the residual tensile properties of the Al 2024-T3 aluminum alloy. An extensive experimental preprocessing was performed in order to scan and analyze (with image processing techniques) different pre-corroded surfaces from tensile specimens to extract training features. The pre-corroded tensile specimens were then tensile tested and the residuals of four mechanical properties (yield strength, tensile strength, elongation to fracture, and strain energy density) were evaluated. Several pitting characteristics were directly correlated to the degree of decrease of the tensile mechanical properties. The results achieved by the SVMs show that the predicted values of the mechanical properties are in very good agreement with the experimental data, and this can prove valuable for optimizing service time and inspection operational procedures. Finally, the prediction accuracy achieved is encouraging for the exploitation of the SVM models also for other alloys in use in engineering structural applications.
References

1. Pantelakis, S., Daglaras, P., Apostolopoulos, C.: Tensile and energy density properties of 2024, 6013, 8090 and 2091 aircraft aluminum alloy after corrosion exposure. Theoretical and Applied Fracture Mechanics 33, 117–134 (2000)
2. Kamoutsi, H., Haidemenopoulos, G., Bontozoglou, V., Pantelakis, S.: Corrosion-induced hydrogen embrittlement in aluminum alloy 2024. Corrosion Science 48, 1209–1224 (2006)
3. Alexopoulos, N., Papanikos, P.: Experimental and theoretical studies of corrosion-induced mechanical properties degradation of aircraft 2024 aluminum alloy. Materials Science and Engineering A 498, 248–257 (2008)
4. Alexopoulos, N.: On the corrosion-induced mechanical degradation for different artificial aging conditions of 2024 aluminum alloy. Materials Science and Engineering A 520, 40–48 (2009)
5. Fang, S., Wanga, M., Song, M.: An approach for the aging process optimization of Al-Zn-Mg-Cu series alloys. Materials and Design 30, 2460–2467 (2009)
6. Sheikh, H., Serajzadeh, S.: Estimation of flow stress behavior of AA5083 using artificial neural networks with regard to dynamic strain ageing effect. Journal of Materials Processing Technology 196, 115–119 (2008)
7. Ramana, K., Anita, T., Mandal, S., Kaliappan, S., Shaikh, H.: Effect of different environmental parameters on pitting behavior of AISI type 316L stainless steel: Experimental studies and neural network modeling. Materials and Design 30, 3770–3775 (2009)
8. Wang, H.T., Han, E.H., Ke, W.: Artificial neural network modeling for atmospheric corrosion of carbon steel and low alloy steel. Corrosion Science and Protection Technology 18, 144–147 (2006)
9. Kamrunnahar, M., Urquidi-Macdonald, M.: Prediction of corrosion behavior using neural network as a data mining tool. Corrosion Science (2009) (in press)
10. Leifer, J.: Prediction of aluminum pitting in natural waters via artificial neural network analysis. Corrosion 56, 563–571 (2000)
11. Pidaparti, R., Neblett, E.: Neural network mapping of corrosion induced chemical elements degradation in aircraft aluminum. Computers, Materials and Continua 5, 1–9 (2007)
12. Liu, Y., Zhong, Q., Zhang, Z.: Predictive model based on artificial neural network for fatigue performance of prior-corroded aluminum alloys. Acta Aeronautica et Astronautica Sinica 22, 135–139 (2001)
13. Fan, C., He, Y., Zhang, H., Li, H., Li, F.: Predictive model based on genetic algorithm-neural network for fatigue performances of pre-corroded aluminum alloys. Key Engineering Materials 353-358, 1029–1032 (2007)
14. Fan, C., He, Y., Li, H., Li, F.: Performance prediction of pre-corroded aluminum alloy using genetic algorithm-neural network and fuzzy neural network. Advanced Materials Research 33-37, 1283–1288 (2008)
15. Jones, K., Hoeppner, D.: Prior corrosion and fatigue of 2024-T3 aluminum alloy. Corrosion Science 48, 3109–3122 (2006)
16. Van der Walde, K., Hillberry, B.: Initiation and shape development of corrosion-nucleated fatigue cracking. International Journal of Fatigue 29, 1269–1281 (2007)
17. Vapnik, V.: The Nature of Statistical Learning Theory. Wiley, New York (1998)
18. Webb, A.R.: Statistical Pattern Recognition, 2nd edn. Wiley, Chichester (2002)
19. MediaCybernetics: Image-Pro web page, http://www.mediacy.com/index.aspx?page=IPP
20. Joachims, T.: SVMlight (2002), http://svmlight.joachims.org
21. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
Mutual Information Measures for Subclass Error-Correcting Output Codes Classification

Nikolaos Arvanitopoulos, Dimitrios Bouzas, and Anastasios Tefas
Aristotle University of Thessaloniki, Department of Informatics, Artificial Intelligence & Information Analysis Laboratory
{niarvani,dmpouzas}@csd.auth.gr, [email protected]

Abstract. Error-Correcting Output Codes (ECOCs) provide a common way to model multi-class classification problems. According to this state-of-the-art technique, a multi-class problem is decomposed into several binary ones. Additionally, on the ECOC framework we can apply the subclass technique (sub-ECOC), where by splitting the initial classes of the problem we aim at the creation of larger but easier-to-solve ECOC configurations. The multi-class problem's decomposition is achieved via a searching procedure known as sequential forward floating search (SFFS). In each step, the SFFS algorithm searches for the optimum binary separation of the classes that compose the multi-class problem. The separation decision is based on the maximization or minimization of a criterion function. The standard criterion used is the maximization of the mutual information (MI) between the bi-partitions created in each step of the SFFS. The materialization of the MI measure is achieved by a method called fast quadratic mutual information (FQMI). Although FQMI is quite accurate in modelling the MI, its computation is of high algorithmic complexity, which as a consequence makes the ECOC and sub-ECOC techniques applicable only on small datasets. In this paper we present some alternative separation criteria of reduced computational complexity that can be used in the SFFS algorithm. Furthermore, we compare the performance of these criteria over several multi-class classification problems.

Keywords: Multi-class classification, Subclasses, Error-Correcting Output Codes, Support Vector Machines, Sequential Forward Floating Search, Mutual Information.
1 Introduction
In the literature one can find various binary classification techniques. However, in the real world the problems to be addressed are usually multi-class. In dealing with multi-class problems we must use the binary techniques as leverage. This can be achieved by defining a method that decomposes the multi-class problem into several binary ones, and combines their solutions to solve the initial multi-class problem [1]. In this context, the Error-Correcting Output Codes (ECOCs)
emerged. Based on the error correcting principles [2] and on its ability to correct the bias and variance errors of the base classifiers [3], this state-of-the-art technique has proved valuable in solving multi-class classification problems over a number of fields and applications. As proposed by Escalera et al. [4], on the ECOC framework we can apply the subclass technique. According to this technique, we use a guided problem-dependent procedure to group the classes and split them into subsets with respect to the improvement we obtain in the training performance. Both the ECOC and sub-ECOC techniques can be applied independently to different types of classifiers. In our work we applied both of these techniques on Linear and RBF (Radial Basis Function) SVM (Support Vector Machine) classifiers with various configurations. SVMs are very powerful classifiers capable of materializing optimum classification surfaces that give improved results in the test domain. As mentioned earlier, the ECOC as well as the sub-ECOC techniques use the SFFS algorithm in order to decompose a multi-class problem into smaller binary ones. The problem's decomposition is based on a criterion function that maximizes or minimizes a certain quantity according to the nature of the criterion used. The common way is to maximize the MI (mutual information) in both the bi-partitions created by SFFS. As proposed by Torkkola [5], we can model the MI in the bi-partitions through the FQMI (Fast Quadratic Mutual Information) method. However, although the FQMI procedure is quite accurate in modelling the MI of a set of classes, it turns out to be computationally costly. In this paper we propose some novel MI measures of reduced computational complexity, which in certain classification problems yield better performance results than FQMI. Furthermore, we compare these MI measures over a number of multi-class classification problems from the UCI machine learning repository [6].
1.1 Error Correcting Output Codes (ECOC)
Error Correcting Output Codes is a general framework to solve multi-class problems by decomposing them into several binary ones. This technique consists of two separate steps: a) the encoding and b) the decoding step [7]. a) In the encoding step, given a set of N classes, we assign a unique binary string called codeword to each class. (The codeword is a sequence of bits of a code representing each class, where each bit identifies the membership of the class for a given binary classifier.) The length n of each codeword represents the number of bi-partitions (groups of classes) that are formed and, consequently, the number of binary problems to be trained. Each bit of the codeword represents the response of the corresponding binary classifier and it is coded by +1 or -1, according to its class membership. The next step is to arrange all these codewords as rows of a matrix obtaining the so-called coding matrix M, where M ∈ {−1, +1}^{N×n}. Each column of this matrix defines a partition of classes, while each row defines the membership of the corresponding class in the specific binary problem.
An extension of this standard ECOC approach was proposed by Allwein et al. [1] by adding a third symbol in the coding process. The new coding matrix M is now M ∈ {−1, 0, +1}^{N×n}. In this approach, the zero symbol means that a certain class is not considered by a specific binary classifier. As a result, this symbol increases the number of bi-partitions to be created in the ternary ECOC framework. b) The decoding step of the ECOC approach consists of applying the n different binary classifiers to each data sample in the test set, in order to obtain a code for this sample. This code is then compared to all the codewords of the classes defined in the coding matrix M (each row in M defines a codeword) and the sample is assigned to the class with the closest codeword. The most frequently used decoding methods are the Hamming and the Euclidean decoding distances.
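As a concrete illustration of the encoding and decoding steps just described, the following minimal sketch (our own, not code from the paper) builds a one-vs-all binary coding matrix for four classes and applies Hamming decoding to a vector of binary-classifier responses; the one-vs-all design and all names are assumptions made only for this example.

import numpy as np

N = 4                                  # number of classes
M = -np.ones((N, N), dtype=int)        # coding matrix, one column per binary problem
np.fill_diagonal(M, +1)                # one-vs-all: class i is +1 only in column i

def hamming_decode(outputs, M):
    """Assign the sample to the class whose codeword is closest in Hamming distance.
    `outputs` is the vector of {-1,+1} responses of the n binary classifiers."""
    distances = np.sum(M != np.asarray(outputs), axis=1)
    return int(np.argmin(distances))

# Example: the classifiers answer (+1, -1, -1, -1), i.e. they agree with class 0.
print(hamming_decode([+1, -1, -1, -1], M))   # -> 0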
1.2 Sub-ECOC
Escalera et al. [4] proposed that from an initial set of classes C of a given multi-class problem, we can define a new set of classes C′, where the cardinality of C′ is greater than that of C, that is |C′| > |C|. The new set of binary problems created in this way improves the training performance of the resulting classifiers. In addition to the ECOC framework, Pujol [8] proposed a ternary problem-dependent design of ECOC, called discriminant ECOC (DECOC), where, given a number of N classes, a high classification performance can be achieved by training only N − 1 binary classifiers. The combination of the above mentioned methods results in a new classification procedure called sub-ECOC. The procedure is based on the creation of discriminant tree structures which depend on the problem domain. These binary trees are built by choosing the problem partitioning that maximizes the MI between the samples and their respective class labels. The structure as a whole describes the decomposition of the initial multi-class problem into an assembly of smaller binary sub-problems. Each node of the tree represents a pair consisting of a specific binary sub-problem and its respective classifier. The construction of the tree's nodes is achieved through an evaluation procedure described in Escalera et al. [4]. According to this procedure, we can split the bi-partitions that constitute the currently examined sub-problem. Splitting can be achieved using K-means or some other clustering method. After splitting we form two new problems that can be examined separately. On each of the newly created problems we repeat the SFFS procedure independently, in order to form two new separate sub-problem domains that are easier to solve. Next, we evaluate the two new problem configurations against three user-defined thresholds {θp, θs, θi}, described below; a minimal sketch of this acceptance test is given after the list. If the thresholds are satisfied, the newly created pair of sub-problems is accepted along with their newly created binary classifiers; otherwise they are rejected and we keep the initial configuration with its respective binary classifier.
– θp : Performance of the created classifier for the newly created problem (after splitting).
– θs : Minimum cluster size.
– θi : Performance improvement of the current classifier for the newly created problem over the previous classifier (before splitting).
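The sketch referred to above is one possible reading of this acceptance test (our own illustration, not the authors' implementation); the function and argument names, and the exact interpretation of the three thresholds, are assumptions.

def accept_split(perf_before, perf_after, cluster_sizes,
                 theta_p=0.0, theta_s=2, theta_i=0.05):
    """Decide whether the sub-problems created by a split replace the original one."""
    if min(cluster_sizes) < theta_s:              # theta_s: minimum cluster size
        return False
    if perf_after < theta_p:                      # theta_p: required performance after splitting
        return False
    return (perf_after - perf_before) >= theta_i  # theta_i: required improvement

# Example: training accuracy rises from 0.80 to 0.88 with clusters of 40 and 35 samples.
print(accept_split(0.80, 0.88, [40, 35]))         # -> True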
1.3 Loss Weighted Decoding Algorithm
In the decoding process of the sub-ECOC approach we use the Loss Weighted Decoding algorithm [7]. As already mentioned, the 0 symbol in the decoding matrix makes it possible to increase the number of binary problems created and, as a result, the number of different binary classifiers to be trained. Standard decoding techniques, such as the Euclidean or the Hamming distance, do not consider this third symbol and often produce non-robust results. So, in order to solve the problems produced by the standard decoding algorithms, loss weighted decoding was proposed. The main objective is to define a weighting matrix MW that weights a loss function to adjust the decision of the classifiers. In order to obtain the matrix MW, a hypothesis matrix H is constructed first. The elements H(i, j) of this matrix are continuous values that correspond to the accuracy of the binary classifier hj in classifying the samples of class i. The matrix H has zero values in the positions which correspond to unconsidered classes, since these positions do not contain any representative information. The next step is the normalization of the rows of matrix H. This is done so that the matrix MW can be considered as a discrete probability density function, which is important since we assume that the probability of considering each class for the final classification is the same. Finally, we decode by computing the loss between the outputs of the binary classifiers and each codeword of the coding matrix M, weighted by the corresponding row of MW, and assign the test sample to the class that attains the minimum decoding value.
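The following minimal sketch is our reading of this decoding rule (not the authors' code), using a linear loss; the ternary coding matrix, the hypothesis matrix and the classifier outputs below are toy values chosen only for illustration.

import numpy as np

def loss_weighted_decode(outputs, M, MW, loss=lambda z: -z):
    """d(i) = sum_j MW[i,j] * loss(M[i,j] * f_j(x)); return the arg-min class."""
    outputs = np.asarray(outputs, dtype=float)
    decoding = (MW * loss(M * outputs)).sum(axis=1)
    return int(np.argmin(decoding))

# Toy ternary coding matrix for 3 classes and 3 binary problems (0 = class not considered).
M = np.array([[+1, +1,  0],
              [-1,  0, +1],
              [ 0, -1, -1]], dtype=float)
# Assumed accuracy-based hypothesis matrix, row-normalized to obtain MW.
H = np.array([[0.9, 0.8, 0.0],
              [0.7, 0.0, 0.9],
              [0.0, 0.6, 0.8]])
MW = H / H.sum(axis=1, keepdims=True)
print(loss_weighted_decode([+0.8, +0.5, -0.9], M, MW))   # -> 0 (most compatible class)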
1.4 Sequential Forward Floating Search
Floating search methods are a family of suboptimal sequential search methods that were developed as an alternative to the more computationally costly exhaustive search methods. These methods allow the search criterion to be non-monotonic. They are also able to counteract the nesting effect by considering conditional inclusion and exclusion of features, controlled by the value of the criterion itself. In our approach we use a variation of the Sequential Forward Floating Search (SFFS) [9] algorithm. We modified the algorithm so that it can handle criterion functions evaluated on subsets of classes. We apply a number of backward steps after each forward step, as long as the resulting subsets are better than the previously evaluated ones at that level. Consequently, there are no backward steps at all if the performance cannot be improved. Thus, backtracking in this algorithm is controlled dynamically and, as a consequence, no parameter setting is needed. The SFFS method is described in Algorithm 1.
Algorithm 1. SFFS for Classes
1: Input: Y = {yj | j = 1, . . . , Nc} // available classes
2: Output: disjoint subsets with maximum MI between the features and their class labels:
3:   Xk = {xj | j = 1, . . . , k, xj ∈ Y}, k = 0, 1, . . . , Nc
4:   X′k′ = {xj | j = 1, . . . , k′, xj ∈ Y}, k′ = 0, 1, . . . , Nc
5: Initialization:
6:   X0 := ∅, X′Nc := Y
7:   k := 0, k′ := Nc // k and k′ denote the number of classes in each subset
8: Termination:
9:   Stop when k = Nc and k′ = 0
10: Step 1 (Inclusion)
11:   x+ := arg max x∈Y−Xk J(Xk + x, X′k′ − x) // the most significant class with respect to the group {Xk, X′k′}
12:   Xk+1 := Xk + x+ ; X′k′−1 := X′k′ − x+ ; k := k + 1, k′ := k′ − 1
13: Step 2 (Conditional exclusion)
14:   x− := arg max x∈Xk J(Xk − x, X′k′ + x) // the least significant class with respect to the group {Xk, X′k′}
15:   if J(Xk − x−, X′k′ + x−) > J(Xk−1, X′k′+1) then
16:     Xk−1 := Xk − x− ; X′k′+1 := X′k′ + x− ; k := k − 1, k′ := k′ + 1
17:     go to Step 2
18:   else
19:     go to Step 1
20:   end if
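A compact Python rendering of this floating search over class subsets is sketched below (our own simplification, not the authors' code). The criterion J is a placeholder, and the bookkeeping of the best subsets per size is an assumption made to mirror the comparison in line 15 of Algorithm 1.

def sffs_classes(classes, J):
    classes = list(classes)
    Nc = len(classes)
    X, X_bar = set(), set(classes)
    best = {0: (J(X, X_bar), set(X), set(X_bar))}   # best criterion value per |X|
    k = 0
    while k < Nc:
        # Step 1 (inclusion): move the most significant class into X
        x_plus = max(X_bar, key=lambda c: J(X | {c}, X_bar - {c}))
        X, X_bar, k = X | {x_plus}, X_bar - {x_plus}, k + 1
        if k not in best or J(X, X_bar) > best[k][0]:
            best[k] = (J(X, X_bar), set(X), set(X_bar))
        # Step 2 (conditional exclusion): move a class back while it beats the
        # best previously evaluated subset of that size
        while k > 1:
            x_minus = max(X, key=lambda c: J(X - {c}, X_bar | {c}))
            value = J(X - {x_minus}, X_bar | {x_minus})
            if value > best[k - 1][0]:
                X, X_bar, k = X - {x_minus}, X_bar | {x_minus}, k - 1
                best[k] = (value, set(X), set(X_bar))
            else:
                break
    return best

# Dummy criterion that prefers balanced bi-partitions (illustration only).
dummy_J = lambda A, B: -abs(len(A) - len(B))
partitions = sffs_classes(range(4), dummy_J)
print(partitions[2][1:])      # best bi-partition with |X| = 2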
1.5 Fast Quadratic Mutual Information (FQMI)
Consider two random vectors x1 and x2 and let p(x1) and p(x2) be their probability density functions respectively. Then the MI of x1 and x2 can be regarded as a measure of the dependence between them and is defined as follows:

I(x_1, x_2) = \int\int p(x_1, x_2) \log \frac{p(x_1, x_2)}{p(x_1)\, p(x_2)} \, dx_1 \, dx_2    (1)

Note that when the random vectors x1 and x2 are stochastically independent, it holds that p(x1, x2) = p(x1)p(x2). It is of great importance to mention that (1) can be interpreted as a Kullback–Leibler divergence, defined as follows:

K(f_1, f_2) = \int f_1(x) \log \frac{f_1(x)}{f_2(x)} \, dx    (2)

where f1(x) = p(x1, x2) and f2(x) = p(x1)p(x2). According to Kapur and Kesavan [10], if we seek the distribution that maximizes or alternatively minimizes the divergence, several axioms can be relaxed and it can be proven that K(f1, f2) is analogically related to D(f_1, f_2) = \int (f_1(x) - f_2(x))^2 \, dx. Consequently, maximization of K(f1, f2) leads to maximization of D(f1, f2) and vice versa. Considering the above we can define the quadratic mutual information as follows:

I_Q(x_1, x_2) = \int\int \left( p(x_1, x_2) - p(x_1)\, p(x_2) \right)^2 \, dx_1 \, dx_2    (3)
Using Parzen window estimators we can estimate the probability density functions in (3), and in combination with Gaussian kernels the following property is applicable. Let N(x, Σ) be a d-dimensional Gaussian function; it can be shown that

\int N(x - a_1, \Sigma_1)\, N(x - a_2, \Sigma_2) \, dx = N(a_1 - a_2, \Sigma_1 + \Sigma_2)    (4)

and by the use of this property we avoid one integration. In our case, we calculate the amount of mutual information between the random vector x of the features and the discrete random variable y associated with the class labels created for a given partition. The practical implementation of this computation is defined as follows. Let N be the number of pattern samples in the entire data set, Nc the number of classes in the entire data set, Jp the number of samples of class p, xj the j-th feature vector of the data set, and xpj the j-th feature vector of the set in class p. Consequently, p(y = yp) and p(x | y = yp), where 1 ≤ p ≤ Nc, can be written as:

p(y = y_p) = \frac{J_p}{N}, \qquad
p(x \mid y = y_p) = \frac{1}{J_p} \sum_{j=1}^{J_p} N(x - x_{pj}, \sigma^2 I), \qquad
p(x) = \frac{1}{N} \sum_{j=1}^{N} N(x - x_j, \sigma^2 I).

By expanding (3) while using a Parzen estimator with a symmetrical kernel of width σ, we get the following equation:

I_Q(x, y) = V_{IN} + V_{ALL} - 2 V_{BTW},    (5)

where

V_{IN} = \sum_y \int_x p(x, y)^2 \, dx = \frac{1}{N^2} \sum_{p=1}^{N_c} \sum_{l=1}^{J_p} \sum_{k=1}^{J_p} N(x_{pl} - x_{pk}, 2\sigma^2 I),    (6)

V_{ALL} = \sum_y \int_x p(x)^2 p(y)^2 \, dx = \frac{1}{N^2} \sum_{p=1}^{N_c} \left( \frac{J_p}{N} \right)^2 \sum_{l=1}^{N} \sum_{k=1}^{N} N(x_l - x_k, 2\sigma^2 I),    (7)

V_{BTW} = \sum_y \int_x p(x, y)\, p(x)\, p(y) \, dx = \frac{1}{N^2} \sum_{p=1}^{N_c} \frac{J_p}{N} \sum_{l=1}^{N} \sum_{k=1}^{J_p} N(x_l - x_{pk}, 2\sigma^2 I).    (8)
The computational complexity of (5) is determined by the computational complexities of (6)–(8) and is given in Table 1. Furthermore, it is known that FQMI requires many samples to be accurately computed by Parzen window estimation. Thus, when the number of samples N is much greater than their dimensionality d (N ≫ d), the complexity of V_ALL, which is quadratic with respect to N, dominates (5).
Table 1. Computational complexity of the terms V_IN, V_ALL, V_BTW [Nc = number of classes, N = number of samples, Jp = number of samples in class p, d = sample dimension]

FQMI term    Computational complexity
V_IN         O(Nc Jp^2 d^2)
V_ALL        O(Nc N^2 d^2)
V_BTW        O(Nc N Jp^2 d^2)
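The following NumPy sketch (our own illustration, not the authors' code) estimates the three terms of equations (6)–(8) and the resulting I_Q on a toy two-dimensional dataset; the variable names and the data are assumptions made only for this example.

import numpy as np

def gauss(d, var):
    """Isotropic Gaussian N(d, var*I) evaluated at the difference vector(s) d."""
    dim = d.shape[-1]
    return (2.0 * np.pi * var) ** (-dim / 2.0) * np.exp(-np.sum(d * d, axis=-1) / (2.0 * var))

def fqmi(X, y, sigma=1.0):
    """I_Q(x, y) = V_IN + V_ALL - 2*V_BTW, estimated with Parzen windows (eqs. (5)-(8))."""
    X, y = np.asarray(X, float), np.asarray(y)
    N, var = len(X), 2.0 * sigma ** 2                 # kernels have covariance 2*sigma^2*I
    D = gauss(X[:, None, :] - X[None, :, :], var)     # all pairwise kernel values
    classes, counts = np.unique(y, return_counts=True)
    v_in = sum(D[np.ix_(y == c, y == c)].sum() for c in classes) / N ** 2
    v_all = np.sum((counts / N) ** 2) * D.sum() / N ** 2
    v_btw = sum((Jp / N) * D[:, y == c].sum() for c, Jp in zip(classes, counts)) / N ** 2
    return v_in + v_all - 2.0 * v_btw

X = [[0.0, 0.0], [0.1, -0.1], [1.0, 1.1], [0.9, 1.0]]
y = [0, 0, 1, 1]
print(fqmi(X, y, sigma=0.5))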
2 Separation Criteria
The standard separation criterion for use in the SFFS algorithm, as proposed by Escalera et al. [4], is the maximization of the mutual information between the two created bi-partitions of classes and their respective class labels. That is, in each iteration of the SFFS algorithm two partitions of classes are constructed, with labels {−1, +1} respectively. As already mentioned, the above procedure is computationally costly because the FQMI computation in each step of SFFS is applied on all the samples of the considered bi-partitions. We can reduce the computational cost if we avoid computing FQMI for both of the bi-partitions and apply it only to one of them in each step of SFFS. As can be seen in Table 1, another possibility is to avoid computing the term V_ALL, which is of quadratic complexity with respect to the number of samples N. By discarding the computation of the V_ALL term in the FQMI procedure and considering a Fisher-like ratio with the available terms V_IN and V_BTW, which are of lower complexity, we can reduce the running time significantly. Finally, we can further reduce the running time if, in the Fisher-like ratio mentioned, we consider only a representative subset of the classes' samples. Based on these ideas we propose three different variations of the standard criterion, {C1, C2, C3}, which are outlined below (a minimal sketch of C2 and C3 is given after the list):
– Criterion C1 : In criterion C1 we apply the standard FQMI computation only to the current subset of classes examined by SFFS in each iteration step. That is, we do not consider in the computation the remaining set of classes that do not belong to the current subset. In this case our goal is to minimize the above measure. In particular, the criterion J(X, X′) in lines 11, 14 and 15 of the SFFS algorithm reduces to the criterion J(X). Here, FQMI is evaluated between the subset X and the original class labels of the samples that constitute it. The computational complexity of this variation remains quadratic with respect to the number of samples of the group in which the FQMI is evaluated. The evaluation, though, is done using far fewer data samples and consequently the running time is lower than in the original approach.
– Criterion C2 : In criterion C2 we consider the maximization of the ratio C2 = V_IN / V_BTW,
where V_IN and V_BTW are computed as in equations (6) and (8). Here we omit the costly computation of the quantity V_ALL. The resulting computational complexity, as can be seen from Table 1, is quadratic in the number of samples Jp of each binary group, that is p ∈ {−1, +1}.
– Criterion C3 : The computational cost of FQMI is mostly attributed to the number of samples N. Thus, if we reduce the number of samples we can achieve a drastic reduction of the computational complexity. To this end we can represent each class by only one sample. This sample can be a location estimator such as the mean or the median. We propose the use of the mean vector as the only representative of each class, and the criterion C2 then reduces to the minimization of V_BTW, where in this case V_BTW is given by:

V_{BTW} = \frac{1}{N_c^2} \sum_{i=1}^{N_c} \sum_{j=1}^{N_c} N(\tilde{x}_i - \tilde{x}_j, 2\sigma^2 I),

where x̃_i is the mean vector of class i. The new variation has quadratic complexity with respect to the number of classes Nc of the bi-partition, since the computation of the mean vectors takes linear time with respect to the number of samples in each class Jp.
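As announced above, the following sketch (our own illustration, not the authors' code) computes criteria C2 and C3 on a toy dataset; it omits V_ALL exactly as described, and all names and data values are assumptions.

import numpy as np

def gauss(d, var):
    """Isotropic Gaussian N(d, var*I) evaluated at the difference vector(s) d."""
    dim = d.shape[-1]
    return (2.0 * np.pi * var) ** (-dim / 2.0) * np.exp(-np.sum(d * d, axis=-1) / (2.0 * var))

def criterion_c2(X, y, sigma=1.0):
    """C2 = V_IN / V_BTW, omitting the quadratic-in-N term V_ALL (to be maximized)."""
    X, y = np.asarray(X, float), np.asarray(y)
    N, var = len(X), 2.0 * sigma ** 2
    D = gauss(X[:, None, :] - X[None, :, :], var)
    classes, counts = np.unique(y, return_counts=True)
    v_in = sum(D[np.ix_(y == c, y == c)].sum() for c in classes) / N ** 2
    v_btw = sum((Jp / N) * D[:, y == c].sum() for c, Jp in zip(classes, counts)) / N ** 2
    return v_in / v_btw

def criterion_c3(X, y, sigma=1.0):
    """C3: V_BTW evaluated on the class mean vectors only (to be minimized)."""
    X, y = np.asarray(X, float), np.asarray(y)
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    Nc, var = len(means), 2.0 * sigma ** 2
    return gauss(means[:, None, :] - means[None, :, :], var).sum() / Nc ** 2

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 1, 1]
print(criterion_c2(X, y), criterion_c3(X, y))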
3 Experimental Results
Datasets. We compared the proposed criteria using eight datasets of the UCI Machine Learning Repository. The characteristics of each dataset can be seen in Table 2. All the features of each dataset were scaled to the interval [−1, +1]. To evaluate the test error in the different experiments, we used 10-fold cross validation.
Sub-class ECOC configuration. The set of parameters θ = {θp, θs, θi} in the subclass approach was fixed in each dataset to the following values:
– θp = 0%: split the classes if the classifier does not attain zero training error.
– θs = |J|/50: minimum number of samples in each constructed cluster, where |J| is the number of features in each dataset.
– θi = 5%: the required improvement of the newly constructed binary problems after splitting.
Furthermore, as a clustering method we used the K-means algorithm with the number of clusters K = 2. As stated by Escalera et al. [4], the K-means algorithm obtains results similar to those of more sophisticated clustering algorithms, such as hierarchical and graph-cut clustering, but with much lower computational cost. In Tables 3 and 4 we present the results of our experiments on the UCI datasets using the DECOC and sub-ECOC approaches. Each column shows the corresponding 10-fold cross-validation performance and, in the case of the sub-ECOC method, the (mean number of rows × mean number of columns) of the encoding matrices formed in each fold.
Table 2. UCI Machine Learning Repository data sets characteristics

Database   Samples   Attributes   Classes
Iris       150       4            3
Ecoli      336       8            8
Wine       178       13           3
Glass      214       9            7
Thyroid    215       5            3
Vowel      990       10           11
Balance    625       4            3
Yeast      1484      8            10
Table 3. UCI Repository experiments for linear SVM, C = 100

           FQMI                          Criterion 1                   Criterion 2                   Criterion 3
Database   ECOC     sub-ECOC             ECOC     sub-ECOC             ECOC     sub-ECOC             ECOC     sub-ECOC
Iris       97.33%   97.33% (3.3 × 2.3)   97.33%   97.33% (3.3 × 2.3)   97.33%   97.33% (3.3 × 2.3)   97.33%   97.33% (3.3 × 2.3)
Ecoli      82.98%   80.71% (10.2 × 10.6) 84.85%   84.85% (8.2 × 7.2)   78.21%   78.21% (8 × 7)       83.01%   80.63% (8.4 × 7.6)
Wine       96.07%   96.07% (3 × 2)       96.07%   96.07% (3 × 2)       96.73%   96.73% (3 × 2)       96.07%   96.07% (3 × 2)
Glass      63.16%   66.01% (13 × 14.3)   60.58%   63.64% (7.1 × 6.1)   61.07%   59.78% (7 × 6)       60.97%   62.85% (9.4 × 8.8)
Thyroid    96.77%   96.77% (3.3 × 2.6)   96.77%   96.77% (6 × 7.1)     90.26%   94.89% (5.9 × 7.6)   96.77%   96.77% (3 × 2)
Vowel      73.94%   77.47% (27.2 × 29)   50.91%   52.73% (18.1 × 16.9) 46.26%   45.35% (15.1 × 14)   72.73%   86.57% (23.1 × 22)
Balance    91.7%    83.56% (54.3 × 64.6) 91.7%    89.31% (26.4 × 27)   91.7%    75.71% (416 × 508)   91.7%    88.65% (9.5 × 8.4)
Yeast      56.6%    53.49% (29.5 × 36.7) 39.36%   39.36% (10 × 9)      42.37%   42.63% (10.2 × 9.2)  47.18%   36.23% (15.7 × 17)
SVM configuration. As a standard classifier for our experiments we used the libsvm [11] implementation of the Support Vector Machine with linear and RBF kernels. For both the linear and the RBF SVM we fixed the cost parameter C to 100, and for the RBF SVM we fixed the σ parameter to 1.

Table 4. UCI Repository experiments for RBF SVM, C = 100, σ = 1

           FQMI                          Criterion 1                  Criterion 2                  Criterion 3
Database   ECOC     sub-ECOC             ECOC     sub-ECOC            ECOC     sub-ECOC            ECOC     sub-ECOC
Iris       96%      96% (3 × 2)          96%      96% (3 × 2)         96%      96% (3 × 2)         96%      96% (3 × 2)
Ecoli      82.83%   82.56% (13.1 × 16)   85.10%   85.13% (8.6 × 7.6)  84.08%   84.08% (8.1 × 7.1)  85.04%   85.04% (8.1 × 7.1)
Wine       97.74%   97.74% (3 × 2)       97.74%   97.74% (3 × 2)      97.18%   97.18% (3 × 2)      97.74%   97.74% (3 × 2)
Glass      69.39%   70.78% (7.9 × 7.6)   69.39%   69.39% (6 × 5)      64.77%   64.77% (6 × 5)      68.48%   68.48% (6 × 5)
Thyroid    95.35%   95.35% (3.2 × 2.4)   95.35%   95.82% (3.8 × 3.4)  97.21%   95.32% (5 × 5.4)    95.35%   95.35% (3 × 2)
Vowel      99.09%   99.09% (11 × 10)     99.09%   99.09% (11 × 10)    98.59%   98.59% (11 × 10)    98.99%   98.99% (11 × 10)
Balance    95.04%   95.04% (3 × 2)       95.04%   95.04% (3 × 2)      95.51%   95.51% (3 × 2)      95.04%   95.04% (3 × 2)
Yeast      58.6%    55.44% (27.3 × 33.4) 56.66%   56.66% (10 × 9)     54.95%   52.75% (10.5 × 9.5) 56.18%   52.04% (20.7 × 22.1)
From the experiments it is obvious that the proposed criteria attain, in most cases, performance similar to that of the FQMI criterion, whereas in terms of computational speed we found that, for the tested databases, C1 and C2 run approximately 4 times faster and criterion C3 runs approximately 100 times faster. Moreover, FQMI cannot be applied to databases having a great number of
samples. However, the proposed criterion C3 can be used in very large databases arising in applications such as Data Mining.
4 Conclusion
Although FQMI is a quite accurate method for modeling the MI between classes, its computational complexity makes it impractical for real-life classification problems. FQMI's inability to address large datasets also makes the ECOC and sub-ECOC methods impractical. As illustrated in this paper, we can substitute FQMI with other MI measures of lower computational complexity and attain similar, or in quite a few cases better, classification results. The proposed novel MI measures make the ECOC and sub-ECOC methods applicable to large real-life datasets.
References
1. Allwein, E.L., Schapire, R.E., Singer, Y.: Reducing multi-class to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research 1, 113–141 (2002)
2. Dietterich, T.G., Bakiri, G.: Solving multi-class learning problems via error-correcting output codes. Journal of Machine Learning Research 2, 263–282 (1995)
3. Kong, E., Dietterich, T.: Error-correcting output coding corrects bias and variance. In: Proc. 12th Intl Conf. Machine Learning, pp. 313–321 (1995)
4. Escalera, S., Tax, D.M., Pujol, O., Radeva, P., Duin, R.P.: Subclass problem-dependent design for error-correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(6), 1041–1054 (2008)
5. Torkkola, K.: Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research 3, 1415–1438 (2003)
6. Asuncion, A., Newman, D.: UCI Machine Learning Repository (2007)
7. Escalera, S., Pujol, O., Radeva, P.: Loss-weighted decoding for error-correcting output coding. In: Proc. Int'l Conf. Computer Vision Theory and Applications, June 2008, vol. 2, pp. 117–122 (2008)
8. Pujol, O., Radeva, P., Vitria, J.: Discriminant ECOC: A heuristic method for application dependent design of error correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 1001–1007 (2006)
9. Pudil, P., Ferri, F., Novovicova, J., Kittler, J.: Floating search methods for feature selection with non-monotonic criterion functions. In: Proc. Int'l Conf. Pattern Recognition, March 1994, vol. 3, pp. 279–283 (1994)
10. Kapur, J., Kesavan, H.: Entropy Optimization Principles with Applications (1992)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001)
Conflict Directed Variable Selection Strategies for Constraint Satisfaction Problems

Thanasis Balafoutis and Kostas Stergiou

Department of Information and Communication Systems Engineering, University of the Aegean, Samos, Greece
{abalafoutis,konsterg}@aegean.gr
Abstract. It is well known that the order in which variables are instantiated by a backtracking search algorithm can make an enormous difference to the search effort in solving CSPs. Among the plethora of heuristics that have been proposed in the literature to efficiently order variables during search, a significant recently proposed class uses the learning-from-failure approach. Prime examples of such heuristics are the wdeg and dom/wdeg heuristics of Boussemart et al., which store and exploit information about failures in the form of constraint weights. The efficiency of all the proposed conflict-directed heuristics is due to their ability to learn through conflicts encountered during search. As a result, they can guide search towards hard parts of the problem and identify contentious constraints. Such heuristics are now considered the most efficient general purpose variable ordering heuristics for CSPs. In this paper we show how information about constraint weights can be used in order to create several new variants of the wdeg and dom/wdeg heuristics. The proposed conflict-driven variable ordering heuristics have been tested over a wide range of benchmarks. Experimental results show that they are quite competitive compared to existing ones and in some cases they can increase efficiency. Keywords: Constraint Satisfaction, Variable Ordering Heuristics, Search.
1 Introduction
Constraint satisfaction problems (CSPs) and propositional satisfiability (SAT) are two automated reasoning technologies that have a lot in common regarding the approaches and algorithms they use for solving combinatorial problems. Most complete algorithms from both paradigms use constraint propagation methods together with variable ordering heuristics to improve search efficiency. Learning from failure has become a key component in solving combinatorial problems in the SAT community, through literal learning and weighting, e.g. as implemented in the Chaff solver [7]. This approach is based on learning new literals through conflict analysis and assigning weights to literals based on the number of times they cause a failure during search. This information can then be exploited by
the variable ordering heuristic to efficiently choose the variable to assign at each choice point. In the CSP community, learning from failure has followed a similar direction in recent years, in particular with respect to novel variable ordering heuristics. Boussemart et al. were the first to introduce SAT-influenced heuristics that learn from conflicts encountered during search [3]. In their approach, constraint weights are used as a metric to guide the variable ordering heuristic towards hard parts of the problem. Constraint weights are continuously updated during search using information learned from failures. The advantage of these heuristics is that they use previous search states as guidance, while most formerly proposed heuristics use either the initial or the current state. The heuristics of [3], called wdeg and dom/wdeg, are now probably the most efficient general purpose variable ordering heuristics for CSPs. Subsequently, a number of alternative heuristics based on learning during search were proposed [8,4,6]. As discussed by Grimes and Wallace, heuristics based on constraint weights can be conceived in terms of an overall strategy that, in addition to the standard Fail-First Principle, also obeys the Contention Principle, which states that variables directly related to conflicts are more likely to cause a failure if they are chosen instead of other variables [6]. In this paper we focus on conflict-driven variable ordering heuristics based on constraint weights. We concentrate on an investigation of new general purpose variants of conflict-driven heuristics. These variants differ from wdeg and dom/wdeg in the way they assign weights to constraints. First we propose three new variants of the wdeg and dom/wdeg heuristics that record the constraint that is responsible for any value deletion during search. These heuristics then exploit this information to update constraint weights upon detection of failure. We also examine a SAT-influenced weight aging strategy that gives greater importance to recent conflicts. Finally, we propose a new heuristic that tries to better identify contentious constraints by detecting all the possible conflicts after a failure. Experimental results from various random, academic and real world problems show that some of the proposed heuristics are quite competitive compared to existing ones and in some cases they can increase efficiency. The rest of the paper is organized as follows. Section 2 gives the necessary background material and an overview of the existing conflict-driven variable ordering heuristics. In Section 3 we propose several new general purpose variants of conflict-driven variable ordering heuristics. In Section 4 we experimentally compare the proposed heuristics to dom/wdeg on a variety of real, academic and random problems. Finally, conclusions are presented in Section 5.
2 Background
A Constraint Satisfaction Problem (CSP) is a tuple (X, D, C ), where X is a set containing n variables {x1 , x2 , ..., xn }; D is a set of domains {D(x1 ), D(x2 ),..., D(xn )} for those variables, with each D(xi ) consisting of the possible values which xi may take; and C is a set of constraints {c1 , c2 , ..., ck } between variables
in subsets of X. Each ci ∈ C expresses a relation defining which variable assignment combinations are allowed for the variables vars(ci) in the scope of the constraint. Two variables are said to be neighbors if they share a constraint. The arity of a constraint is the number of variables in the scope of the constraint. The degree of a variable xi, denoted by Γ(xi), is the number of constraints in which xi participates. A binary constraint between variables xi and xj will be denoted by cij. A partial assignment is a set of tuple pairs, each tuple consisting of an instantiated variable and the value that is assigned to it in the current search state. A full assignment is one containing all n variables. A solution to a CSP is a full assignment such that no constraint is violated. An arc is a pair (c, xi) where xi ∈ vars(c). Any arc (cij, xi) will be alternatively denoted by the pair of variables (xi, xj), where xj ∈ vars(cij). That is, xj is the other variable involved in cij. An arc (xi, xj) is arc consistent (AC) iff for every value a ∈ D(xi) there exists at least one value b ∈ D(xj) such that the pair (a, b) satisfies cij. In this case we say that b is a support of a on arc (xi, xj). Accordingly, a is a support of b on arc (xj, xi). A problem is AC iff there are no empty domains and all arcs are AC. The application of AC on a problem results in the removal of all non-supported values from the domains of the variables. The definition of arc consistency for non-binary constraints, usually called generalized arc consistency (GAC), is a direct extension of the definition of AC. A support check (consistency check) is a test to find out if two values support each other. The revision of an arc (xi, xj) using AC verifies whether all values in D(xi) have supports in D(xj). A domain wipeout (DWO) revision is one that causes a DWO, i.e. it results in an empty domain. In the following we will use MAC (maintaining arc consistency) [9,1] as our search algorithm. In MAC a problem is made arc consistent after every assignment, i.e. all values which are arc inconsistent given that assignment are removed from the current domains of their variables. If during this process a DWO occurs, then the last value selected is removed from the current domain of its variable and a new value is assigned to the variable. If no new value exists then the algorithm backtracks.
2.1 Overview of Existing Conflict-Driven Variable Ordering Heuristics
The order in which variables are assigned by a backtracking search algorithm has been understood for a long time to be of primary importance. A variable ordering can be either static, where the ordering is fixed and determined prior to search, or dynamic, where the ordering is determined as the search progresses. Dynamic variable orderings are considerably more efficient and have thus received much attention in the literature. One common dynamic variable ordering strategy, known as “fail-first”, is to select as the next variable the one likely to fail as quickly as possible. All other factors being equal, the variable with the smallest number of viable values in its (current) domain will have the fewest subtrees
rooted at those values, and therefore, if none of these contain a solution, the search can quickly return to a path that leads to a solution. Recent years have seen the emergence of numerous modern heuristics for choosing variables during CSP search. The so-called conflict-driven heuristics exploit information about failures gathered throughout search and recorded in the form of constraint weights. Boussemart et al. [3] proposed the first conflict-directed variable ordering heuristics. In these heuristics, every time a constraint causes a failure (i.e. a domain wipeout) during search, its weight is incremented by one. Each variable has a weighted degree, which is the sum of the weights over all constraints in which this variable participates. The weighted degree heuristic (wdeg) selects the variable with the largest weighted degree. The current domain of the variable can also be incorporated to give the domain-over-weighted-degree heuristic (dom/wdeg), which selects the variable with the minimum ratio between current domain size and weighted degree (a small sketch of this bookkeeping is given after the list below). Both of these heuristics (especially dom/wdeg) have been shown to be extremely effective on a wide range of problems. Grimes and Wallace [6] proposed alternative conflict-driven heuristics that consider value deletions as the basic propagation events associated with constraint weights. That is, the weight of a constraint is incremented each time the constraint causes one or more value deletions. They also used a sampling technique called random probing with which they can uncover cases of global contention, i.e. contention that holds across the entire search space. The three heuristics of [6] work as follows:
1. constraint weights are increased by the size of the domain reduction leading to a DWO.
2. whenever a domain is reduced in size during constraint propagation, the weight of the constraint involved is incremented by 1.
3. whenever a domain is reduced in size, the constraint weights are increased by the size of the domain reduction (allDel heuristic).
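The sketch announced above illustrates the wdeg / dom/wdeg bookkeeping (our own illustration; the data structures are assumptions, not code from [3]). For simplicity it ignores refinements such as restricting the weighted degree to constraints with at least two unassigned variables.

class WeightedDegree:
    def __init__(self, constraints):
        # constraints: dict mapping a constraint id to the set of variables in its scope
        self.constraints = constraints
        self.weight = {c: 1 for c in constraints}     # weights start at 1

    def on_domain_wipeout(self, constraint):
        """Called whenever `constraint` causes a DWO during propagation."""
        self.weight[constraint] += 1

    def wdeg(self, var):
        return sum(w for c, w in self.weight.items() if var in self.constraints[c])

    def select(self, unassigned, domains):
        """dom/wdeg: pick the variable minimizing |current domain| / wdeg."""
        return min(unassigned, key=lambda v: len(domains[v]) / self.wdeg(v))

# Toy usage: two constraints; c2 has failed twice, so x3 becomes more attractive.
h = WeightedDegree({"c1": {"x1", "x2"}, "c2": {"x2", "x3"}})
h.on_domain_wipeout("c2"); h.on_domain_wipeout("c2")
print(h.select({"x1", "x3"}, {"x1": {1, 2}, "x3": {1, 2}}))   # -> x3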
3 Heuristics Based on Weighting Constraints
As stated in the previous section, the wdeg and dom/wdeg heuristics associate a counter, called weight, with each constraint of a problem. These counters are updated during search whenever a DWO occurs. Although experimentally it has been shown that these heuristics are extremely effective on a wide range of problems, in theory it seems quite plausible that they may not always assign weights to constraints in an accurate way. To better illustrate our conjecture about the accuracy in assigning weights to constraints, we give the following example. Example 1. Assume we are using MAC-3 (i.e. MAC with AC-3) to solve a CSP (X, D, C) where X includes, among others, the three variables {xi , xj , xk }, all having the same domain {a, b, c, d, e}, and C includes, among others, the two binary constraints cij , cik . Also assume that a conflict-driven variable ordering heuristic (e.g. dom/wdeg) is used, and that at some point during search AC tries
to revise variable xi. That is, it tries to find supports for the values in D(xi) in the constraints where xi participates. Suppose that when xi is revised against cij, values {a, b, c, d} are removed from D(xi) (i.e. they do not have a support in D(xj)). Also suppose that when xi is revised against cik, value {e} is removed from D(xi) and hence a DWO occurs. Then, the dom/wdeg heuristic will increase the weight of constraint cik by one but it will not change the weight of cij. It is obvious from this example that although constraint cij removes more values from D(xi) than cik, its important indirect contribution to the DWO is ignored by the heuristic. A second point regarding potential inefficiencies of wdeg and dom/wdeg has to do with the order in which revisions are made by the AC algorithm used. Coarse-grained AC algorithms, like AC-3, use a revision list to propagate the effects of variable assignments. It has been shown that the order in which the elements of the list are selected for revision affects the overall cost of search. Hence a number of revision ordering heuristics have been proposed [10,2]. In general, revision ordering and variable ordering heuristics have different tasks to perform when used in a search algorithm like MAC. Before the appearance of conflict-driven heuristics there was no way for them to interact with each other, i.e. the order in which the revision list was organized during the application of AC could not affect the decision of which variable to select next (and vice versa). The contribution of revision ordering heuristics to the solver's efficiency was limited to the reduction of list operations and constraint checks. However, when a conflict-driven variable ordering heuristic like dom/wdeg is used, then there are cases where the decision of which arc (or variable) to revise first can affect the variable selection. To better illustrate this interaction we give the following example. Example 2. Assume that we want to solve a CSP (X, D, C) using a conflict-driven variable ordering heuristic (e.g. dom/wdeg), and that at some point during search the following AC revision list is formed: Q = {(x1), (x3), (x5)}. Suppose that revising x1 against constraint c12 leads to the DWO of D(x1), i.e. the remaining values of x1 have no support in D(x2). Suppose also that the revision of x5 against constraint c56 leads to the DWO of D(x5), i.e. the remaining values of x5 have no support in D(x6). Depending on the order in which revisions are performed, one or the other of the two possible DWOs will be detected. If a revision ordering heuristic R1 selects x1 first then the DWO of D(x1) will be detected and the weight of constraint c12 will be increased by 1. If some other revision ordering heuristic R2 selects x5 first then the DWO of D(x5) will be detected, but this time the weight of a different constraint (c56) will be increased by 1. Although the revision list includes two variables (x1, x5) that can cause a DWO, and consequently two constraint weights can be increased (c12, c56), dom/wdeg will increase the weight of only one constraint, depending on the choice of the revision heuristic. Since constraint weights affect the choices of the variable ordering heuristic, R1 and R2 can lead to different future decisions for variable instantiation. Thus, R1 and R2 may guide search to different parts of the search space.
From the above example it becomes clear that known heuristics based on constraint weights are quite sensitive to revision orderings and their performance can be affected by them. In order to overcome the above described weaknesses of the weighted degree heuristics, we next describe a number of new variable ordering heuristics which can be seen as variants of wdeg and dom/wdeg. All the proposed heuristics are lightweight as they affect the overall complexity only by a constant factor.
3.1 Constraints Responsible for Value Deletions
The first enhancement to wdeg and dom/wdeg tries to alleviate the problem illustrated in Example 1. To achieve this, we propose to record the constraint which is responsible for each value deletion from any variable in the problem. In this way, once a DWO occurs during search we know which constraints have, not only directly but also indirectly, contributed to the DWO. Based on this idea, when a DWO occurs in a variable xi, constraint weights can be updated in the following three alternative ways (a small sketch of these update rules follows Example 3 below):
– Heuristic H1: for every constraint that is responsible for any value deletion from D(xi), we increase its weight by one.
– Heuristic H2: for every constraint that is responsible for any value deletion from D(xi), we increase its weight by the number of value deletions.
– Heuristic H3: for every constraint that is responsible for any value deletion from D(xi), we increase its weight by the normalized number of value deletions, that is, by the ratio between the number of value deletions and the size of D(xi).
The way in which the new heuristics update constraint weights is displayed in the following example.
Example 3. Assume that when solving a CSP (X, D, C), the domain of some variable, e.g. x1, is wiped out. Suppose that D(x1) initially was {a, b, c, d, e} and each of the values was deleted because of constraints {c12, c12, c13, c12, c13} respectively. The proposed heuristics will assign constraint weights as follows: H1 (weightH1[c12] = weightH1[c13] = 1), H2 (weightH2[c12] = 3, weightH2[c13] = 2) and H3 (weightH3[c12] = 3/5, weightH3[c13] = 2/5).
Heuristics H1, H2, H3 are closely related to the three heuristics proposed by Grimes and Wallace [6]. The last two heuristics in [6] record constraints responsible for value deletions and use this information to increase weights. However, the weights are increased during constraint propagation on each value deletion, for all variables. Our proposed heuristics differ by increasing constraint weights only when a DWO occurs. As discussed in [6], DWOs seem to be particularly important events in helping identify hard parts of the problem. Hence we focus on information derived from DWOs and not just any value deletion.
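The sketch referred to above shows how the three update rules could be applied when a DWO is detected (our own illustration; the containers and names are assumptions). Running it on the deletions of Example 3 with variant H2 reproduces the weights given there.

from collections import Counter

def update_on_dwo(weights, deletion_log, wiped_domain_size, variant="H2"):
    """deletion_log: list of constraints blamed for each value deleted from the
    wiped-out variable, e.g. ['c12', 'c12', 'c13', 'c12', 'c13']."""
    per_constraint = Counter(deletion_log)
    for c, deletions in per_constraint.items():
        if variant == "H1":
            weights[c] = weights.get(c, 0) + 1
        elif variant == "H2":
            weights[c] = weights.get(c, 0) + deletions
        elif variant == "H3":
            weights[c] = weights.get(c, 0) + deletions / wiped_domain_size
    return weights

log = ["c12", "c12", "c13", "c12", "c13"]        # the deletions of Example 3
print(update_on_dwo({}, log, wiped_domain_size=5, variant="H2"))
# -> {'c12': 3, 'c13': 2}, matching the H2 weights in Example 3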
3.2 Constraint Weight Aging
Most clause-learning SAT solvers, like BerkMin [5] and Chaff [7], use the strategy of weight "aging". In such solvers, each variable is assigned a counter that stores the number of clauses responsible for at least one conflict. The value of this counter is updated during search. As soon as a new clause responsible for the current conflict is derived, the counters of the variables whose literals are in this clause are incremented by one. The values of all counters are periodically divided by a small constant greater than 1. This constant is equal to 2 for Chaff and 4 for BerkMin. In this way, the influence of "aged" clauses is decreased and preference is given to recently deduced clauses. Inspired by SAT solvers, we propose here the use of "aging" to periodically age constraint weights. As in SAT, constraint weights can be "aged" by periodically dividing their current value by a constant greater than 1. The period of divisions can be set according to a specified number of backtracks during search. With such a strategy we give greater importance to recently discovered conflicts. The following example illustrates the improvement that weight "aging" can contribute to the solver's performance.
Example 4. Assume that in a CSP (X, D, C) with D = {0, 1, 2}, we have a ternary constraint c123 ∈ C for variables x1, x2, x3 with disallowed tuples {(0,0,0), (0,0,1), (0,1,1), (0,2,2)}. When variable x1 is set to a value different from 0 during search, constraint c123 is not involved in a conflict and hence its weight will not increase. However, in a branch that includes the assignment x1 = 0, constraint c123 becomes highly "active" and a possible DWO in variable x2 or x3 should increase the importance of constraint c123 (more than a simple increment of its weight by one). We need a mechanism to quickly adapt to changes in the problem caused by a value assignment. This can be done by "aging" the weights of the other, previously active, constraints.
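A minimal sketch of such an aging policy is given below (our own illustration; the period of 20 backtracks and the factor of 2 match the experimental setting used later in Section 4, but the code itself is an assumption).

def age_weights(weights, backtracks, period=20, factor=2.0):
    """Call after every backtrack; ages all constraint weights when the period is reached."""
    if backtracks > 0 and backtracks % period == 0:
        for c in weights:
            weights[c] /= factor
    return weights

weights = {"c12": 8.0, "c56": 2.0}
for bt in range(1, 41):                      # simulate 40 backtracks
    weights = age_weights(weights, bt)
print(weights)                               # divided by 2 twice -> {'c12': 2.0, 'c56': 0.5}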
3.3 Fully Assigned Weights
When arc consistency is maintained during search using a coarse-grained algorithm like AC-3, a revision list is created after each variable assignment. The variables that have been inserted into the list are removed and revised in turn. We observed that, for the same revision list, different revision ordering heuristics can lead to the DWOs of different variables. To better illustrate this, we give the following example.
Example 5. Assume that we use two different revision ordering heuristics R1, R2 to solve a CSP (X, D, C), and that at some point during search the following AC revision lists are formed for R1 and R2: R1: {X1, X2}, R2: {X2, X1}. We also assume the following: a) The revision of X1 deletes some values from the domain of X1 and causes the addition of the variable X3 to the revision list. b) The revision of X2 deletes some values from the domain of X2 and causes the addition of the variable X4 to the revision list. c) The revision of X3 deletes some values
from the domain of X1. d) The revision of X4 deletes some values from the domain of X2. e) A DWO occurs after a sequential revision of X3 and X1. f) A DWO occurs after a sequential revision of X4 and X2. Considering the R1 list, the revision of X1 is fruitful and adds X3 to the list (R1: {X3, X1}). The sequential revision of X3 and X1 leads to the DWO of X1. Considering the R2 list, the revision of X2 is fruitful and adds X4 to the list (R2: {X4, X2}). The sequential revision of X4 and X2 leads to the DWO of X2. From the above example it is clear that although only one DWO is identified in a revision list, both X1 and X2 can be responsible for it. In R1, where X1 is the DWO variable, we can say that X2 is also a "potential" DWO variable, i.e. it would be a DWO variable if the R2 revision ordering were used. The question that arises here is: how can we identify the "potential" DWO variables that exist in a revision list? A first observation that can be helpful in answering this question is that "potential" DWO variables are among the variables that participate in fruitful revisions. Based on this observation, we propose here a new conflict-driven variable ordering heuristic that takes into account the "potential" DWO variables. This heuristic increases by one the weights of constraints that are responsible for a DWO (as the wdeg heuristic does) and also, only for revision lists that lead to a DWO, increases by one the weights of constraints that participate in fruitful revisions. Hence, to implement this heuristic we record all variables that delete at least one value during the application of AC. If a DWO is detected, we increase the weights of all these variables. An interesting direction for future work is a more selective identification of "potential" DWO variables.
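The following minimal sketch (our own illustration; the tuple representation of a propagation run is an assumption) shows the intended weighting: every constraint whose revision was fruitful in a run that ends with a DWO has its weight increased by one.

def propagate_and_weight(revisions, weights):
    """revisions: ordered list of (constraint, values_deleted, caused_dwo) tuples
    describing one run of arc consistency after an assignment."""
    fruitful = set()
    for constraint, values_deleted, caused_dwo in revisions:
        if values_deleted > 0:
            fruitful.add(constraint)
        if caused_dwo:
            for c in fruitful:                       # includes the DWO constraint itself
                weights[c] = weights.get(c, 0) + 1
            return weights, True                     # propagation failed
    return weights, False

# Example 5-style run: c1 and c2 prune values, then c3 wipes out a domain.
run = [("c1", 2, False), ("c2", 1, False), ("c3", 3, True)]
print(propagate_and_weight(run, {}))                 # -> ({'c1': 1, 'c2': 1, 'c3': 1}, True)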
4 Experiments and Results
In this section we experimentally investigate the behavior of the newly proposed variable ordering heuristics on several classes of real, academic and random problems. All benchmarks are taken from C. Lecoutre's web page1, where the reader can find additional details about the description and the formulation of all the tested benchmarks. We compare the new proposed heuristics with dom/wdeg and allDel. Regarding the heuristics of Section 3.1, we only show results for dom/wdegH1, dom/wdegH2 and dom/wdegH3, denoted as H1, H2 and H3 for simplicity, which are more efficient than the corresponding versions that do not take the domain size into account. In our tests we have used the following measures of performance: cpu time in seconds (t) and number of visited nodes (n). The solver we used applies lexicographic value ordering and employs restarts. Concerning the restart policy, the initial number of allowed backtracks for the first run has been set to 10 and at each new run the number of allowed backtracks increases by a factor of 1.5. Regarding the aging heuristic, we have chosen to periodically decrease all constraint weights by a factor of 2, with the period set to 20 backtracks.
http://www.cril.univ-artois.fr/∼lecoutre/benchmarks.html
Table 1. Averaged values of cpu times (t) and nodes (n) for the seven problem classes. Best cpu time is in bold.

Problem Class                       dom/wdeg   H1      H2      H3      aged dom/wdeg   fullyAssigned   allDel
RLFAP scensMod (13 instances)   t   1.9        2       2.2     2.3     1.7             2.2             2.2
                                n   734        768     824     873     646             738             809
RLFAP graphMod (12 instances)   t   9.1        5.2     6.1     5.5     12.9            13.4            11.1
                                n   6168       3448    4111    3295    8478            11108           9346
Driver (11 instances)           t   22.4       7       7.8     11.6    6.4             18.8            20
                                n   10866      2986    3604    5829    1654            4746            4568
Interval Series (10 instances)  t   34         19.4    23.4    13.3    6.5             66.4            17.4
                                n   32091      18751   23644   13334   5860            74310           26127
Golomb Ruler (6 instances)      t   274.9      321.4   173.1   143.4   342.1           208.3           154.4
                                n   7728       10337   4480    3782    7863            6815            3841
geo50-20-d4-75 (10 instances)   t   62.8       174.1   72.1    95      69              57.6            76
                                n   15087      36949   16970   23562   15031           12508           18094
frb30-15 (10 instances)         t   37.3       35.1    45.8    57.2    42.3            32.9            26.1
                                n   20176      18672   24326   30027   21759           17717           14608
Our search algorithm is MGAC-3, denoting MAC with GAC-3. Experiments were run on an Intel T4200 @ 2.00 GHz with 3GB RAM. Table 1 shows results from seven different problem classes. The first two classes are from the real-world Radio Link Frequency Assignment Problem (RLFAP). For the scensMod class we ran 13 instances, and in the table we present the averaged values for cpu time and nodes visited. Since these instances are quite easy to solve, all the heuristics have almost the same behavior. The aged version of the dom/wdeg heuristic has a slightly better performance. For the graphMod class we ran 12 instances. Here the heuristics H1, H2, H3, which record the constraint responsible for each value deletion, display better performance. The third problem class is from another real-world problem, called Driver. On these 11 instances the aged dom/wdeg heuristic has on average the best behavior. The next 10 instances are from the non-binary academic problem "All Interval Series" and have a maximum constraint arity of 3. We must note here that the aged dom/wdeg heuristic, which has the best performance, is five times faster than dom/wdeg. This good performance of the aged dom/wdeg heuristic is not generic across problem classes. This can be seen in the next academic problem class (the well known Golomb Ruler problem), where the aged dom/wdeg heuristic has the worst performance. The last two classes are the "geo" quasi-random instances (random problems which contain some structure) and the "frb" pure random instances that are forced to be satisfiable. Here, although on average the fullyAssigned and allDel heuristics have the best performance, within each class we observed a big variation in cpu time among all the tested heuristics. A possible explanation for this diversity is the lack of structure of random instances. Finally, we must also comment that, interestingly, the dom/wdeg heuristic does not achieve any win in all the tested experiments. As a general comment we can say that, experimentally, all the proposed heuristics are competitive with dom/wdeg and in many benchmarks a notable improvement is observed.
5 Conclusions
In this paper several new general purpose variable ordering heuristics have been proposed. These heuristics follow the learning-from-failure approach, in which information regarding failures is stored in the form of constraint weights. By recording the constraints that are responsible for any value deletion, we derive three new heuristics that use this information to spread constraint weights in a different way compared to the heuristics of Boussemart et al. We also explore a SAT-inspired constraint weight aging strategy that gives greater importance to recent conflicts. Finally, we propose a new heuristic that tries to better identify contentious constraints by recording all the potential conflicts upon detection of failure. The proposed conflict-driven variable ordering heuristics have been tested over a wide range of benchmarks. Experimental results show that they are quite competitive compared to existing ones and in some cases they can increase efficiency.
References
1. Bessière, C., Régin, J.C.: MAC and combined heuristics: two reasons to forsake FC (and CBJ?). In: Freuder, E.C. (ed.) CP 1996. LNCS, vol. 1118, pp. 61–75. Springer, Heidelberg (1996)
2. Boussemart, F., Hemery, F., Lecoutre, C.: Revision ordering heuristics for the Constraint Satisfaction Problem. In: Proceedings of the CP 2004 Workshop on Constraint Propagation and Implementation, Toronto, Canada, pp. 29–43 (2004)
3. Boussemart, F., Hemery, F., Lecoutre, C., Sais, L.: Boosting systematic search by weighting constraints. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain, pp. 146–150 (2004)
4. Cambazard, H., Jussien, N.: Identifying and Exploiting Problem Structures Using Explanation-based Constraint Programming. Constraints 11, 295–313 (2006)
5. Goldberg, E., Novikov, Y.: BerkMin: a Fast and Robust Sat-Solver. In: Proceedings of DATE 2002, pp. 142–149 (2002)
6. Grimes, D., Wallace, R.J.: Sampling strategies and variable selection in weighted degree heuristics. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741, pp. 831–838. Springer, Heidelberg (2007)
7. Moskewicz, M., Madigan, C., Malik, S.: Chaff: Engineering an efficient SAT solver. In: Proceedings of the Design Automation Conference, pp. 530–535 (2001)
8. Refalo, P.: Impact-based search strategies for constraint programming. In: Wallace, M. (ed.) CP 2004. LNCS, vol. 3258, pp. 556–571. Springer, Heidelberg (2004)
9. Sabin, D., Freuder, E.C.: Contradicting conventional wisdom in constraint satisfaction. In: Proceedings of the 2nd Workshop on Principles and Practice of Constraint Programming (CP 1994), pp. 10–20 (1994)
10. Wallace, R., Freuder, E.: Ordering heuristics for arc consistency algorithms. In: AI/GI/VI, Vancouver, British Columbia, Canada, pp. 163–169 (1992)
A Feasibility Study on Low Level Techniques for Improving Parsing Accuracy for Spanish Using Maltparser

Miguel Ballesteros1, Jesús Herrera1, Virginia Francisco2, and Pablo Gervás2

1 Departamento de Ingeniería del Software e Inteligencia Artificial
2 Instituto de Tecnología del Conocimiento
Universidad Complutense de Madrid, C/ Profesor José García Santesmases, s/n, E–28040 Madrid, Spain
{miballes,jesus.herrera,virginia}@fdi.ucm.es, [email protected]

Abstract. In recent years dependency parsing has been accomplished by machine learning–based systems showing great accuracy, but usually under 90% Labelled Attachment Score (LAS). Maltparser is one such system. Machine learning makes it possible to obtain parsers for every language that has an adequate training corpus. Since such systems generally cannot be modified, the following question arises: can we beat this 90% LAS by using better training corpora? Some previous work indicates that high level techniques are not sufficient for building more accurate training corpora. Thus, by analyzing the words that are most frequently incorrectly attached or labelled, we study the feasibility of some low level techniques, based on n–version parsing models, in order to obtain better parsing accuracy.
1 Introduction
In the 10th edition of the Conference on Computational Natural Language Learning (CoNLL) a first Shared Task on Multilingual Dependency Parsing was carried out [1]. Thirteen different languages, including Spanish, were involved. Participants had to implement a parsing system that could be trained for all these languages. Maltparser achieved great results in this task, in which Spanish was one of the languages proposed for parsing. The goal of the present work is to study the feasibility of low level techniques to obtain better parsing performance when the parsing system (based on machine learning) cannot be modified. A 90% Labelled Attachment Score seems to be a de facto limit for contemporary dependency parsers. Some previous works [2] have studied how to improve dependency parsing by applying high level techniques to obtain better training corpora. The conclusions of these works are that overall accuracy cannot be enhanced by modifying the training corpus' size or its sentences' lengths. In addition, local accuracy is important too, but this issue has not been solved yet. N–version parsers could be the way to obtain better overall
accuracies by obtaining better local accuracies. N–version parsers consist of n specifically trained models, each one able to parse one kind, or a small range of kinds, of sentences. Thus, an n–version parser should select the specific model that would best parse the sentence given as input. Each specific model would improve the parsing accuracy of the sentences for which it is specialized, producing a better overall parsing accuracy. After selecting a small number of words that are most frequently incorrectly attached or labelled, we started a thorough analysis of the parses that contained those words. We selected the two most frequently incorrectly attached or labelled words, i.e., the conjunction and ("y" or "e" in Spanish) and the preposition to ("a" in Spanish). These words led us to develop preliminary work on low level techniques useful for reaching better parsing accuracy by improving attachment and labelling. Maltparser 0.4 is the publicly available software of the system presented by Nivre's group at the CoNLL–X Shared Task. Since Spanish is the language for which we decided to develop the present work, and since we have already developed some previous work on dependency parsing using Maltparser [3,4,5], we used Maltparser 0.4 to carry out our experiments. The paper is organized as follows: Section 2 describes the CoNLL–X Shared Task focusing on the Spanish participation; we also show our results when replicating the participation of Nivre's group. Section 3 presents our considerations about local parsing accuracy. Section 4 shows two case studies in which the conjunction and the preposition "a" are used to evaluate the feasibility of low level techniques oriented towards obtaining better local parsing results. Finally, Section 5 presents the conclusions of this work and suggests some future work.
2 The CoNLL–X Shared Task
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task; the 10th CoNLL Shared Task was Multilingual Dependency Parsing [1]. The goal of this Shared Task was to label dependency structures by means of a fully automatic dependency parser. The task provided a benchmark for evaluating the parsers presented to it across 13 languages, among which is Spanish. Systems were scored by computing their Labelled Attachment Score (LAS), i.e. the percentage of "scoring" tokens for which the system had predicted the correct head and dependency label [6]. Unlabelled Attachment Score (UAS) and Label Accuracy (LA) were also computed. UAS is the percentage of "scoring" tokens for which the system had predicted the correct head [7]. LA is the percentage of "scoring" tokens for which the system had predicted the correct dependency label [8]. Our research is focused on Spanish parsing. For this language, results across the 19 participants ranged from 47.0% to 82.3% LAS, with an average of 73.5%. The Spanish treebank used was AnCora [9], [10], a 95,028 wordform corpus containing open–domain texts annotated with their dependency analyses. AnCora
was developed by the Clic group at Barcelona University. The results for Spanish parsing were around the average. The two participant groups with the highest total score for Spanish were McDonald’s group [11] (82.3% LAS) and Nivre’s group [12] (81.3% LAS). We are especially interested in Nivre’s group research because we used their system (Maltparser 0.4) for the experiments presented in this paper. Other participants that used the Nivre algorithm in the CoNLL–X Shared Task were Johansson’s group [13] and Wu’s group [14]. Their scores on Spanish parsing were 78.2% (7th place) and 73.2% (13th place), respectively. The evaluation shows that the approach given by Nivre yields competitive parsing accuracy for the languages studied. More specifically, Spanish parsing scored 81.3% LAS, only 1 point below the best result [11], which did not use the Nivre algorithm but Eisner’s bottom–up span algorithm in order to compute maximum spanning trees. In our work, the first step was to replicate the participation of Nivre’s group in the CoNLL–X Shared Task for Spanish. We trained Maltparser 0.4 with the section of AnCora that was provided as training corpus in the CoNLL–X Shared Task (89,334 wordforms) and the system was set up as described by Nivre’s group in [12]. Once a model was obtained, we used it to parse the section of AnCora that was provided as test set in the CoNLL–X Shared Task (5,694 wordforms). We obtained the same results as Nivre’s group in the Shared Task, i.e., LAS = 81.30%, UAS = 84.67% and LA = 90.06%. These results serve as a baseline for our work, which is presented in the following sections.
3 Local Parsing Accuracy
Considering the baseline experiment described in Section 2, despite a high overall parsing accuracy, only 358 wordforms of the test corpus obtain 100% LAS, UAS and LA in all parsed sentences, i.e., only 6.3% of the wordforms. If we consider sentences, only 38 sentences of the test corpus (18.4% of them) were parsed without errors. An end user would usually expect a high local parsing accuracy (at the sentence level) rather than a high overall parsing accuracy. But nowadays a remarkable percentage of sentences in Spanish shows at least one error when parsed by Maltparser. Our hypothesis is that by enhancing local accuracy, not only should overall accuracy be enhanced, but end user satisfaction should also be increased. We found that there is a small set of words that show an incorrect attachment, labelling or both. These words are the prepositions “a” (to), “de” (of), “en” (in), “con” (with), “por” (for), the conjunction and, which has two wordings: “y” or “e”, and the nexus “que” (that). All these words sometimes cause errors in the dependency, in the head tag, or in both tags. For instance, there are only 20 sentences (340 wordforms) in the test corpus presented in Section 2 with only one error after parsing. That is 9.7% of the corpus’ sentences and 5.98% of its wordforms. We found that in 10 of these 20 sentences the only failure is caused by one of the words listed above.
4 Case Studies: The Conjunction and the Preposition “a”
The conjunction and the preposition “a” are the words that caused parsing errors most frequently. This is why we selected them as cases to study in order to determine whether low level techniques are feasible for increasing parsing accuracy. We started experimenting with these techniques on the conjunction. The study of the errors obtained when parsing conjunctions began with a manual analysis of AnCora. Thus, we extracted from AnCora every sentence containing a conjunction (“y” or “e”). There are 1,586 sentences with at least one conjunction in the whole AnCora. We inspected these sentences to find labelling patterns and in doing so we obtained a list of patterns that depend on the conjunction’s function. For instance, one pattern occurs when the conjunction acts as the nexus in a coordinated copulative sentence and another pattern occurs when it acts as the last nexus in a list of nouns. For example, the following sentence matches these two patterns: Los activos en divisas en poder del Banco Central y el Ministerio de Finanzas se calculan en dólares estadounidenses y su valor depende del cambio oficial rublo–dólar que establece el Banco Central (The foreign exchange assets held by the Central Bank and the Ministry of Finance are calculated in U.S. dollars and their value depends on the ruble–dollar official exchange rate established by the Central Bank). In this example the first y is a nexus between the proper nouns Banco Central and Ministerio de Finanzas and the second y acts as a coordinated copulative nexus. These patterns guided the experiments described below.

4.1 The Conjunction
In this subsection we present two different approaches that we applied to the conjunction. First Approach to the Conjunction. The first approach that we studied was an n–version parsing model. Our idea was to determine whether some kinds of “difficult” sentences could be successfully parsed by specific parsers while a general parser would parse the non–troubled sentences. The first specific parser that we tried to obtain was supposed to accurately parse quoted sentence sections containing conjunctions. This situation occurs quite commonly and corresponds to one of the labelling patterns that we identified as problematic. Thus, we trained a parsing model with Maltparser 0.4 for sentences that contain conjunctions. The system was set up as in Nivre’s group participation in the CoNLL–X Shared Task. The training corpus consisted only of quoted sentence sections containing conjunctions. These sentence sections were obtained from the section of AnCora provided as training corpus for Spanish in the CoNLL–X Shared Task. It consisted of 22 sentence sections starting and finishing with a quotation mark and containing conjunctions. The test corpus was obtained in a similar way from the section of AnCora provided as test corpus for Spanish in the CoNLL–X Shared Task. This test corpus contained 7 sentences. To analyse this approach, we incrementally built a training corpus and we evaluated the
parsing performance for every trained model. The method we followed to build this corpus is described below (a sketch of the loop is given after Figure 1):
– First of all, we selected the longest sentence of the training subcorpus of quoted sentence sections and this was the first sentence section added to the incremental training corpus.
– Then we iterated until every sentence section was added to the incremental training corpus. In each iteration we did the following:
  • Maltparser 0.4 was trained with the incremental corpus.
  • The trained model was tested by parsing the test subcorpus with it.
  • The longest remaining sentence section was added to the incremental corpus.
The results of this experiment are shown in Figure 1, in which we plotted LAS, UAS and LA for every iteration. The x axis represents the number of sentences contained in the incremental training corpus in every iteration and the y axis the values of LAS, UAS and LA.
Fig. 1. LAS, UAS and LA when training a parsing model incrementally with quoted sentence sections containing conjunctions from AnCora
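The incremental procedure can be summarised by the following Python sketch. MaltParser 0.4 is a stand-alone tool, so train_model, parse and attachment_scores are hypothetical wrapper functions rather than real MaltParser calls; only the loop structure reflects the method described in the list above.

  def incremental_training(sections, test_corpus):
      """Grow the training corpus one quoted sentence section at a time
      (longest first) and record LAS/UAS/LA after each retraining.

      train_model, parse and attachment_scores are hypothetical wrappers
      around MaltParser 0.4 and the CoNLL-X evaluation."""
      remaining = sorted(sections, key=len, reverse=True)   # longest section first
      incremental_corpus, scores = [], []
      while remaining:
          incremental_corpus.append(remaining.pop(0))       # add longest remaining section
          model = train_model(incremental_corpus)           # retrain on the grown corpus
          predicted = parse(model, test_corpus)
          scores.append(attachment_scores(test_corpus, predicted))
      return scores                                         # one (LAS, UAS, LA) per iteration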
If we take only conjunction parsing into consideration, the results are quite good. In the first iteration 3 conjunctions were incorrectly parsed, but in the second and subsequent iterations only 1 conjunction was incorrectly parsed. However, as seen in Figure 1, the overall results were worse than those obtained by the general parser. Therefore, despite the improvement in local accuracy, this approach does not seem to be realistic. This is because the number of available samples is not sufficient to train a specific model. Such a model should be able to obtain good results not only for parsing conjunctions but also for all the words of the whole quoted sentence. This led us to investigate another approach, which is explained in the next section. A More Complex Approach. In this section we study the feasibility of a more complex n–version parsing model. As seen in Section 4.1, specific models can be
trained to obtain highly accurate parsings for a specific word, but these models cannot deal with the whole sentence in which the specific word is contained. This is what inspired this new approach. The idea is to obtain several specific models, each one able to accurately parse a single word in a specific context. Thus, the word would be one of the words that are most frequently incorrectly parsed and the context would be one of the labelling patterns referred to at the beginning of Section 4. For instance, one of these words is the conjunction “y” and one of the contexts in which it can be found is the one presented in Subsection 4.1, i.e., quoted sentence sections. This way, after parsing a sentence with a general model (such as the one presented in Section 2), a program should decide whether the parsed sentence contains a word that must be parsed by a specific model. In that case the program should choose the appropriate specific model for this word in the context in which it appears. Once the sentence is parsed with the specific model, the result for the “problematic” word replaces the one obtained by the general model. This way the best of both parsings can be combined. In the case of the conjunction, the labelling given to it by the specific parser is cut from this parsing and pasted into the parsing given by the general model, replacing the labelling given to the conjunction by the general parser. This easy solution is possible because the conjunction is always a leaf of the parsing tree and its labellings can be changed without affecting the rest of the parsing. To study whether this n–version parsing model could be useful to get more accurate parsings, we developed the experiments described below. For the first experiment we trained a specific model for coordinated copulative sentences. We built a specific training corpus with the set of unambiguous coordinated copulative sentences contained in the section of AnCora that was provided as training corpus in the CoNLL–X Shared Task. This specific training corpus contains 361 sentences (10,561 wordforms). Then we parsed all the coordinated copulative sentences contained in the section of AnCora that was provided as test corpus in the CoNLL–X Shared Task (16 sentences, 549 wordforms). MaltParser uses history–based feature models for predicting the next action in the deterministic derivation of a dependency structure, which means that it uses features of the partially built dependency structure together with features of the (tagged) input string. More precisely, features are defined in terms of the wordform (LEX), part–of–speech (POS) or dependency type (DEP) of a token defined relative to one of the data structures STACK, INPUT and CONTEXT. A feature model is defined in an external feature specification1. We set up the experiments described above with the same feature model that Nivre’s group used in its participation in the CoNLL–X Shared Task. We also used this feature model in the present experiment and we found that the conjunction was incorrectly parsed 8 times (in a test set containing 16 conjunctions). This fact led us to investigate other feature models. After a few failed attempts we found a feature model in which 12 of the 16 conjunctions were parsed correctly. This feature model is shown in Figure 2.
1 An in–depth description of these feature models can be found in http://w3.msi.vxu.se/∼nivre/research/MaltParser.htmlfeatures
Fig. 2. Feature model for coordinated copulative sentences
Although the results were enhanced by using the new feature model, the general parsing model (obtained in Section 2) correctly parses 13 of these 16 conjunctions. This could mean that specific models are not feasible for our objectives. Since the accuracies reached by both models were very similar, we developed some other experiments to confirm or reject this hypothesis. Thus, we tried new specific parsers for other conjunction–context combinations. For the second experiment we developed a specific parser for conjunctions acting as a nexus in lists of proper nouns. We built a specific training corpus with the set of unambiguous sentences containing conjunctions acting as a nexus in lists of proper nouns, from the section of AnCora that was provided as training corpus in the CoNLL–X Shared Task. This specific training corpus contains 59 sentences (1,741 wordforms). After the training we parsed all the sentences containing conjunctions acting as a nexus in lists of proper nouns, from the section of AnCora that was provided as test corpus in the CoNLL–X Shared Task (5 sentences, 121 wordforms). We set up this training with the same feature model that Nivre’s group used in its participation in the CoNLL–X Shared Task. This specific model parsed all 5 conjunctions of the test set successfully, while the general model parsed only 4 of these conjunctions successfully. We developed a third experiment to evaluate a specific model for parsing conjunctions acting as a nexus in lists of common nouns. We built a specific training corpus with the set of unambiguous sentences containing conjunctions acting as a nexus in lists of common nouns, from the section of AnCora that was provided as training corpus in the CoNLL–X Shared Task. This specific training corpus contains 266 sentences (8,327 wordforms). After the training we parsed all the sentences containing conjunctions acting as a nexus in lists of common nouns, from the section of AnCora that was provided as test corpus in the CoNLL–X Shared Task (15 sentences, 480 wordforms). Once again the best feature model was the one that Nivre’s group used in its participation in the CoNLL–X Shared Task. This specific model parsed 12 of the 15 conjunctions of the test set successfully, while the general model parsed only 10 of these conjunctions successfully. A last experiment was carried out to find more evidence for the feasibility of this n–version parsing model. For this, we developed a specific model for parsing conjunctions acting as a nexus in lists of adjectives or constructions acting as adjectives. We built a specific training corpus with the set of unambiguous
sentences containing conjunctions acting as a nexus in lists of adjectives or constructions acting as adjectives, from the section of AnCora that was provided as training corpus in the CoNLL–X Shared Task. This specific training corpus contains 59 sentences (3,155 wordforms). After the training we parsed all the sentences containing conjunctions acting as a nexus in lists of adjectives, from the section of AnCora that was provided as test corpus in the CoNLL–X Shared Task (5 sentences, 113 wordforms). The feature model that Nivre’s group used in its participation in the CoNLL–X Shared Task gave the best results again. This specific model parsed all 5 conjunctions of the test set successfully, while the general model parsed 4 of these conjunctions successfully. The parsings given by the general parsing model to the conjunctions involved in the previous four experiments were replaced by the parsings given by the specific models. This way we combined both parsings as described above in this section. Then, we recomputed LAS, UAS and LA for this combined parsing, obtaining the following values: LAS = 81.92%, UAS = 85.31% and LA = 90.06%. The results show a slight enhancement with respect to the results given by the general parsing model presented in Section 2. In addition, in the combined parsing the conjunction no longer belongs to the set of words that are most frequently incorrectly parsed. This improvement seems to indicate that this n–version parsing model is feasible and that overall accuracy could be improved via local accuracy improvement.
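A minimal Python sketch of this cut-and-paste combination over CoNLL-style token rows is given below. The column indices and the way target tokens are detected are illustrative assumptions; the essential operation is copying only the head and dependency label of the “problematic” token from the specific parse into the general one.

  HEAD_COL, DEPREL_COL = 6, 7   # head and label columns of a CoNLL-X row (assumed layout)

  def combine_parses(general_rows, specific_rows, target_forms=("y", "e")):
      """Copy the head and dependency label of target tokens (here the
      conjunction) from the specific parse into the general parse; all other
      columns of the general parse are kept untouched."""
      combined = [row[:] for row in general_rows]
      for i, row in enumerate(combined):
          if row[1].lower() in target_forms:                # column 1 holds the wordform
              combined[i][HEAD_COL] = specific_rows[i][HEAD_COL]
              combined[i][DEPREL_COL] = specific_rows[i][DEPREL_COL]
      return combined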
4.2 The Preposition “a”
Once we found the promising approach presented in Section 4.1, we applied it to the next word in the list of most frequently incorrectly parsed words, following the steps stated previously. We started by looking for the different ways in which the preposition “a” is attached and labelled. Six cases were found, as shown in Table 1. A specific parser was trained for each case using Maltparser 0.4 set up as in the CoNLL-X Shared Task. We used the feature model proposed in the Shared Task, except for case number 1, for which we used an m3.par model. This model was chosen empirically because the one proposed in the Shared Task was not suitable for tackling case number 1. In all the cases, except for case number 5, the quality of the labelling and the attachment of the word “a” were clearly improved, as shown in Table 1.

Table 1. Attachment and labelling of the preposition “a” in AnCora. Found cases and LAS only for the preposition “a”, before and after the application of our method.

Case          #1      #2      #3      #4      #5      #6
Label         CD      CI      CC      CREG
Attached to   a verb                                  a noun
LASa before   62.5%   42.9%   60.0%   25.0%   0.0%    50.0%
LASa after    87.5%   100%    100%    75.0%   0.0%    100%

Case number 5 is very challenging because we had only 8
sentences containing it in the training set and 1 sentence in the test set. Perhaps the problem lies in the small number of sentences used for training. Since case number 5 does not occur frequently, we did not make any particular effort to solve it in such a preliminary work. Nevertheless, it remains a very interesting case study for future work. Once again the improvement in local accuracy is beneficial to the overall accuracy. When applying the labellings and attachments given by all the specific parsers presented in Sections 4.1 and 4.2, we obtain the following new overall values for the test set: LAS = 82.17%, UAS = 85.51% and LA = 90.32%.
5 Conclusions and Future Work
Previous work shows that high level techniques, such as controlling the training corpus size or its sentences’ lengths, are not sufficient for improving parsing accuracy when using machine learning–based systems that cannot be modified. This led us to investigate low level techniques, based on the detailed study of the words that are most frequently incorrectly parsed. In this work we study the feasibility of these low level techniques for reaching better parsing accuracy. The idea presented in this paper is to develop n–version parsing models. Each parsing model is trained to accurately parse a specific kind of word in a specific context. This way, the errors made by general parsers are avoided, i.e., local accuracy is enhanced. Therefore, if a sentence contains one of the words that are most frequently incorrectly parsed by general parsers, it is simultaneously sent to a specific parser and to a general parser. After this, both parsings are combined in order to make the best of them. This work relies on two case studies: the conjunction and the preposition “a”, because these are the parts of speech most frequently incorrectly parsed. These preliminary experiments show that this kind of low level technique is promising for improving parsing accuracy under the circumstances described in this paper. The present work encourages a considerable amount of promising future work. This future work includes not only similar studies on the rest of the words that are most frequently incorrectly parsed, but also the development of programs that accurately send each sentence to the adequate specific parsers, when necessary. Also, some effects that may arise in this kind of work, such as overfitting, should be studied. This work focused on Maltparser 0.4 and Spanish, but similar analyses could be accomplished to study other languages and/or parsers, complementing the present one.
Acknowledgments. This work has been partially funded by Banco Santander Central Hispano and Universidad Complutense de Madrid under the Creación y Consolidación de Grupos de Investigación program, Ref. 921332–953.
References
1. Buchholz, S., Marsi, E.: CoNLL–X Shared Task on Multilingual Dependency Parsing. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL–X), pp. 149–164 (2006)
2. Ballesteros, M., Herrera, J., Francisco, V., Gervás, P.: Improving Parsing Accuracy for Spanish using Maltparser. Journal of the Spanish Society for Natural Language Processing (SEPLN) 44 (in press, 2010)
3. Herrera, J., Gervás, P.: Towards a Dependency Parser for Greek Using a Small Training Data Set. Journal of the Spanish Society for Natural Language Processing (SEPLN) 41, 29–36 (2008)
4. Herrera, J., Gervás, P., Moriano, P.J., Moreno, A., Romero, L.: Building Corpora for the Development of a Dependency Parser for Spanish Using Maltparser. Journal of the Spanish Society for Natural Language Processing (SEPLN) 39, 181–186 (2007)
5. Herrera, J., Gervás, P., Moriano, P.J., Moreno, A., Romero, L.: JBeaver: un Analizador de Dependencias para el Español Basado en Aprendizaje. In: Borrajo, D., Castillo, L., Corchado, J.M. (eds.) CAEPIA 2007. LNCS (LNAI), vol. 4788, pp. 211–220. Springer, Heidelberg (2007)
6. Nivre, J., Hall, J., Nilsson, J.: Memory–based Dependency Parsing. In: Proceedings of CoNLL–2004, Boston, MA, USA, pp. 49–56 (2004)
7. Eisner, J.: Three New Probabilistic Models for Dependency Parsing: An Exploration. In: Proceedings of the 16th International Conference on Computational Linguistics (COLING 1996), Copenhagen, pp. 340–345 (1996)
8. Yamada, H., Matsumoto, Y.: Statistical Dependency Analysis with Support Vector Machines. In: Proceedings of the International Workshop on Parsing Technologies (IWPT 2003), pp. 195–206 (2003)
9. Palomar, M., Civit, M., Díaz, A., Moreno, L., Bisbal, E., Aranzabe, M., Ageno, A., Martí, M.A., Navarro, B.: 3LB: Construcción de una Base de Datos de Árboles Sintáctico–Semánticos para el Catalán, Euskera y Español. In: Proceedings of the XX Conference of the Spanish Society for Natural Language Processing (SEPLN), Sociedad Española para el Procesamiento del Lenguaje Natural, pp. 81–88 (2004)
10. Taulé, M., Martí, M., Recasens, M.: AnCora: Multilevel Annotated Corpora for Catalan and Spanish. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (2008)
11. McDonald, R., Lerman, K., Pereira, F.: Multilingual Dependency Analysis with a Two-Stage Discriminative Parser. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL–X), pp. 216–220 (2006)
12. Nivre, J., Hall, J., Nilsson, J., Eryiğit, G., Marinov, S.: Labeled Pseudo–Projective Dependency Parsing with Support Vector Machines. In: Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL–X), pp. 221–225 (2006)
13. Johansson, R., Nugues, P.: Investigating Multilingual Dependency Parsing. In: Proceedings of the Conference on Computational Natural Language Learning (CoNLL–X) (2006)
14. Wu, Y., Lee, Y., Yang, J.: The Exploration of Deterministic and Efficient Dependency Parsing. In: Proceedings of the Conference on Computational Natural Language Learning (CoNLL–X) (2006)
A Hybrid Ant Colony Optimization Algorithm for Solving the Ring Arc-Loading Problem
Anabela Moreira Bernardino1, Eugénia Moreira Bernardino1, Juan Manuel Sánchez-Pérez2, Juan Antonio Gómez-Pulido2, and Miguel Angel Vega-Rodríguez2
1 Research Center for Informatics and Communications, Department of Computer Science, School of Technology and Management, Polytechnic Institute of Leiria, 2411 Leiria, Portugal
{anabela.bernardino,eugenia.bernardino}@ipleiria.pt
2 Department of Technologies of Computers and Communications, Polytechnic School, University of Extremadura, 10071 Cáceres, Spain
{sanperez,jangomez,mavega}@unex.es
Abstract. The past two decades have witnessed tremendous research activity in optimization methods for communication networks. One important problem in communication networks is the Weighted Ring Arc-Loading Problem (a combinatorial optimization NP-complete problem). This problem arises in the engineering and planning of Resilient Packet Ring (RPR) systems. Specifically, for a given set of non-split and uni-directional point-to-point demands (weights), the objective is to find the routing for each demand (i.e., the assignment of the demand to either the clockwise or the counter-clockwise ring) so that the maximum arc load is minimised. In this paper, we propose a Hybrid Ant Colony Optimization Algorithm to solve this problem. We compare our results with the results obtained by the standard Genetic Algorithm and Particle Swarm Optimization used in the literature.
Keywords: Communication Networks, Optimization Algorithms, Ant Colony Optimization Algorithm, Weighted Ring Arc-Loading Problem.
1 Introduction
Resilient Packet Ring (RPR), also known as IEEE 802.17, is a standard designed to optimise the transport of data traffic over optical fiber ring networks [1]. The RPR aims to combine the appealing functionalities of Synchronous Optical Network/Synchronous Digital Hierarchy (SONET/SDH) networks with the advantages of Ethernet networks. It is a ring-based architecture that consists of two counter-directional optical fiber rings. The bandwidth utilisation in RPR is further increased by means of spatial reuse. Spatial reuse is achieved in RPR through so-called destination stripping, which means that the destination node takes a transmitted packet off the fiber ring. Thus, a given transmission traverses only the ring segment from the source node to the destination node, allowing other nodes on the ring segment between the destination node and the source node to exchange transmissions at the same
time on the same fiber ring. Furthermore, the RPR provides fairness and allows the full ring bandwidth to be used under normal operation conditions. To effectively use the RPR’s potential, namely the spatial reuse, the statistical multiplexing and the bi-directionality, it is necessary to route the demands efficiently. Given a network and a set D of communication requests, a fundamental problem is to design a transmission route (direct path) for each request, to avoid high load on the arcs, where an arc is an edge endowed with a direction (clockwise or counter-clockwise). The load of an arc is defined as the total weight of those requests that are routed through the arc in its direction. In general each request is associated with a non-negative integer weight. Practically, the weight of a request can be interpreted as a traffic demand or as the size of the data to be transmitted. The Weighted Ring Arc-Loading Problem (WRALP) can be classified into two formulations: with demand splitting (WRALP) or without demand splitting (non-split WRALP). Split loading allows splitting a demand into two portions to be carried in both directions, while in non-split loading each demand must be entirely carried in either the clockwise or the counter-clockwise direction. For research on the non-split formulation, Cosares and Saniee [2], and Dell’Amico et al. [3] studied the problem on SONET rings. Cosares and Saniee [2] proved that this formulation is NP-complete. This means that we cannot guarantee finding the best solution in a reasonable amount of time. Recent studies apply evolutionary algorithms to solve the non-split formulation [4][5]. For the split problem, various approaches are summarised by Schrijver et al. [6], and their algorithms are compared in Myung and Kim [7] and Wang [8]. The non-split WRALP considered in this paper is identical to the one described by Kubat and Smith [9] (non-split WRALP), Cho et al. [10] (non-split WRALP and WRALP) and Yuan and Zhou [11] (WRALP). They try to find approximate solutions in a reduced amount of time. Our purpose is different: we want to compare the performance of our algorithm with others in the achievement of the best-known solution. Using the same principle, Bernardino et al. [12] presented four hybrid Particle Swarm Optimization (PSO) algorithms to solve the non-split WRALP. An Ant Colony Optimization (ACO) algorithm is essentially a system based on agents which simulate the natural behaviour of ants, including mechanisms of cooperation and adaptation. This metaheuristic has been shown to be both robust and versatile. The ACO algorithm has been successfully applied to a range of different combinatorial optimization problems [13]. In this paper we present an ACO algorithm coupled with a local search (HACO), applied to the WRALP. Our algorithm is based on the algorithm proposed by Gambardella et al. [14] to solve the quadratic assignment problem. HACO uses pheromone trail information to perform modifications on WRALP solutions, unlike the more traditional ant systems that use pheromone trail information to construct complete solutions. We compare the performance of HACO with the standard Genetic Algorithm (GA) and the Local Search - Probability Binary PSO (LS-PBPSO) used in the literature. The paper is structured as follows. In Section 2 we describe the WRALP; in Section 3 we describe the implemented HACO algorithm; in Section 4 we present the studied examples and discuss the computational results obtained; and in Section 5 we report the conclusions.
2 Problem Definition
Let Rn be an n-node bidirectional ring with nodes {n1, n2, …, nn} labelled clockwise. Each edge {ek, ek+1} of Rn, 1 ≤ k ≤ n, is taken as two arcs with opposite directions, in which the data streams can be transmitted in either direction: ak+ = (ek, ek+1), ak− = (ek+1, ek). A communication request on Rn is an ordered pair (s,t) of distinct nodes, where s is the source and t is the destination. We assume that data can be transmitted clockwise or counter-clockwise on the ring, without splitting. We use P+(s,t) to indicate the directed (s,t) path clockwise around Rn, and P−(s,t) to indicate the directed (s,t) path counter-clockwise around Rn. A request (s,t) is often associated with an integer weight w >= 0; we denote this weighted request by (s,t; w). Let D = {(s1,t1; w1), (s2,t2; w2), ..., (sm,tm; wm)} be a set of integrally weighted requests on Rn. For each request/pair (si,ti) we need to design a directed path Pi of Rn from si to ti. A set P = {Pi: i = 1, 2, ..., m} of such directed paths is called a routing for D.

Table 1. Solution representation
Pair (s,t)    Demand    C–clockwise / CC–counterclockwise    Representation (V)
1: (1, 2)     15        C                                    Pair1: 1
2: (1, 3)     3         CC                                   Pair2: 0
3: (1, 4)     6         CC                                   Pair3: 0
4: (2, 3)     15        C                                    Pair4: 1
5: (2, 4)     6         CC                                   Pair5: 0
6: (3, 4)     14        C                                    Pair6: 1
In this work, the solutions are represented using binary vectors (Table 1). For Vi = 1, 1 ≤ i ≤ m, the total amount of data is transmitted along P+(si,ti); for Vi = 0, the total amount of data is transmitted along P−(si,ti). The vector V = (V1, V2, …, Vm) determines a routing scheme for D.
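The binary representation can be decoded into the arcs that each request loads. The Python helper below is our own illustration (not part of the compared implementations); it enumerates the arcs of P+(s,t) or P−(s,t) on an n-node ring, following the arc indexing introduced above.

  def path_arcs(s, t, n, clockwise):
      """Arcs traversed by request (s, t) on an n-node ring with nodes 1..n
      labelled clockwise; arc k corresponds to edge {e_k, e_k+1}, used as
      a_k+ for clockwise traffic and a_k- for counter-clockwise traffic."""
      arcs, node = [], s
      while node != t:
          nxt = node % n + 1 if clockwise else (node - 2) % n + 1
          arcs.append(node if clockwise else nxt)   # index of the arc between node and nxt
          node = nxt
      return arcs

  # Pair 2 of Table 1: request (1, 3) routed counter-clockwise (V_2 = 0) on a 4-node ring
  print(path_arcs(1, 3, n=4, clockwise=False))   # -> [4, 3]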
3 Proposed Hybrid Ant Colony Optimization
ACO is a population-based optimization method for solving hard combinatorial optimization problems. The first ACO algorithm was presented by Dorigo, Maniezzo and Colorni [15][16] and Dorigo [17]; since then, many diverse variants of the basic principle have been reported in the literature [13]. In real life, ants indirectly communicate with each other by depositing pheromone trails on the ground, influencing the decision processes of other ants. This simple form of communication between individual ants causes complex behaviours and capabilities of the colony as a whole. The real ants’ behaviour is transposed into an algorithm by making an analogy between: the real ants’ search and the set of feasible solutions to the problem; the amount of food in a source and the fitness function; and the pheromone trail and an adaptive memory [14].
The pheromone trails in ACO serve as distributed, numerical information which the ants use to probabilistically construct solutions to the problem to be solved and which they adapt during the algorithm execution to reflect their search experience. Gambardella et al. [14] present a Hybrid Ant Colony System coupled with a local search (HAS_QAP) that uses pheromone trail information to perform modifications on QAP solutions. The simplest way to exploit the ants’ search experience is to make the pheromone updating process a function of the solution quality achieved by each particular ant. In HACO, only the best solution found during the search process contributes to pheromone trail updating. This makes the search more aggressive and requires less time to reach good solutions [14]. Moreover, this has been strengthened by an intensification mechanism that allows it to return to previous best solutions [14]. The algorithm proposed by Gambardella et al. [14] also performs a diversification mechanism after a predefined number of S iterations without improving the best solution found so far. We verified that in our algorithm the diversification mechanism does not produce better solutions, mainly due to the LS method used. The main steps of the HACO algorithm are given below:

  Initialize Parameters
  Initialize Solutions (ants)
  Evaluate Solutions
  Apply Local Search Procedure
  Evaluate Solutions
  Initialize Pheromone Trails
  WHILE TerminationCriterion()
    FOR each Solution in Population
      Modify Solution using Pheromone Trails
      Evaluate Solution
      Apply Local Search Procedure
      Evaluate Solution
      Apply Intensification Mechanism
    Update Pheromone Trails
Initialisation of parameters. The following parameters must be defined by the user: number of ants (NA); maximum number of iterations (MI); value used to initialise the pheromone trails (Q); probability of exploration/exploitation (q); pheromone evaporation rate (x1); pheromone influence rate (x2); and number of modifications (R).
Initialisation of solutions. The initial solutions can be created randomly or in a deterministic form. The deterministic form is based on a Shortest-Path Algorithm (SPA). The SPA is a simple traffic demand assignment rule in which the demand will traverse the smallest number of segments.
Evaluation of solutions. The fitness function is responsible for performing the evaluation and returning a positive number (fitness value) that reflects how optimal the solution is. To evaluate the solutions, we use the following fitness function:
w1, …, wm → demands of the pairs (s1,t1), …, (sm,tm)            (1a)
V1, …, Vm → Vi = 0 means P−(si,ti); Vi = 1 means P+(si,ti)      (1b)

Load on arcs, ∀k = 1,…,n; ∀i = 1,…,m:
L(V, ak+) = Σ_{i: ak+ ∈ P+(si,ti)} wi                           (2a)
L(V, ak−) = Σ_{i: ak− ∈ P−(si,ti)} wi                           (2b)

Fitness function:
max{ max_k L(V, ak+), max_k L(V, ak−) }                         (3)
The fitness function is based on the following constraints: (1) between each node pair (si,ti) there is a demand value >= 0 and each positive demand value is routed in either the clockwise (C) or the counter-clockwise (CC) direction; (2) for an arc, the load is the sum of the weights routed in its direction between nodes ek and ek+1. The purpose is to minimise the maximum load on the arcs of the ring (3).
Initialisation of pheromone trails. For the WRALP, the set of pheromone trails is maintained in a matrix T of size 2*m, where each Tij measures the desirability of assigning direction i to pair j. All pheromone trails Tij are set to the same value T0 = 1/(Q*Fitness(G)) [14], where G is the best solution found so far and Q a parameter.
Modification of solutions. The algorithm performs R modifications. A modification consists of assigning a direction d to a pair p. First a pair p is randomly chosen (between 1 and m) and afterwards a direction d is chosen (clockwise or counter-clockwise). A random number x is generated between 0 and 1. If x is smaller than q (parameter), the best direction d is chosen such that Tdp is maximal. This policy consists in exploiting the pheromone trail. If x is higher than q, the direction d is chosen with a probability proportional to the values contained in the pheromone trail. This consists in exploring the solution space.
Local Search. The LS algorithm applies a partial neighbourhood examination. Some pairs of the solution are selected and their directions are exchanged (partial search). This method can be summarised in the following pseudo-code steps [12]:

  For t = 0 to numberNodesRing/4
    P1 = random(number of pairs)
    P2 = random(number of pairs)
    N = neighbourhoods of ACTUAL-SOLUTION (each neighbour results from
        interchanging the direction of P1 and/or P2)
    SOLUTION = FindBest(N)
    If ACTUAL-SOLUTION is worse than SOLUTION
      ACTUAL-SOLUTION = SOLUTION
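The Python sketch below illustrates the fitness computation of Eqs. (2a)–(3) and the pheromone-guided modification step. It assumes the pheromone matrix is indexed as T[direction][pair] with direction 1 for clockwise and 0 for counter-clockwise; the function names and the use of Python's random module are our own choices, not part of the original implementation.

  import random

  def fitness(V, pairs, n):
      """Maximum arc load of routing V (Eq. 3); pairs = [(s, t, w), ...],
      V[i] = 1 routes pair i clockwise, V[i] = 0 counter-clockwise."""
      load_cw, load_ccw = [0] * (n + 1), [0] * (n + 1)   # arc indices 1..n
      for (s, t, w), direction in zip(pairs, V):
          clockwise = direction == 1
          loads, node = (load_cw if clockwise else load_ccw), s
          while node != t:                    # same ring traversal as the path_arcs sketch above
              nxt = node % n + 1 if clockwise else (node - 2) % n + 1
              loads[node if clockwise else nxt] += w
              node = nxt
      return max(max(load_cw), max(load_ccw))

  def modify(V, T, R, q):
      """Apply R pheromone-guided modifications to V; T[d][p] is the trail for
      assigning direction d (0 = CC, 1 = C) to pair p, q the exploitation probability."""
      m = len(V)
      for _ in range(R):
          p = random.randrange(m)                          # choose a pair at random
          if random.random() < q:                          # exploitation: best direction
              d = 0 if T[0][p] >= T[1][p] else 1
          else:                                            # exploration: proportional choice
              d = 1 if random.random() < T[1][p] / (T[0][p] + T[1][p]) else 0
          V[p] = d
      return V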
Intensification mechanism. The intensification mechanism allows exploring the neighbourhood more thoroughly and allows the algorithm to return to previous best solutions. If the intensification is active and the solution V at the beginning of the iteration is better, the ant comes back to the initial solution V. The intensification is activated when the best solution found
so far has been improved and remains active while at least one ant succeeds in improving its solution during the iteration.
Pheromone trails update. To speed up convergence, the pheromone trails are updated by taking into account only the best solution found so far [14]. The pheromone trails are updated by setting Tij = (1 − x1)*Tij, where x1 is the pheromone evaporation rate.
5 Trust
Trust has been recognized as a key issue in SW MAS, where agents have to interact under uncertain and risky situations. Thus, a number of researchers have proposed, from different perspectives, models and metrics of trust, some involving past experience or using only a single agent’s previous experience. Five such metrics are described in [11]; among them, Sporas seems to be the most widely used metric, although CR (Certified Reputation) is one of the most recently proposed methodologies. The overall goal for EMERALD is to adopt a variety of trust models, both proposed in the literature and original. Currently, EMERALD adopts two reputation mechanisms, a decentralized and a centralized one. The decentralized mechanism is a combination of Sporas and CR, and was presented in [8]. In the centralized approach, presented here, AYPS keeps the references given by agents interacting with Reasoners or other agents in EMERALD. Each reference is of the form Refi = (a, b, cr, cm, flx, rs), where a is the trustee, b is the trustor and cr (Correctness), cm (Completeness), flx (Flexibility) and rs (Response time) are the evaluation criteria. Ratings vary from -1 (terrible) to 1 (perfect), r ∈ [-1,1], while newcomers start with a reputation equal to 0 (neutral). The final reputation value (Rb) is based on the weighted sum of the relevant references stored in AYPS and is calculated according to the formula Rb = w1*cr + w2*cm + w3*flx + w4*rs, where w1+w2+w3+w4 = 1. AYPS supports two options for Rb: a default one, where the weights are equal, namely wk = 0.25 for k = 1,…,4, and a user-defined one, where the weights vary from 0 to 1 depending on user priorities. The simple evaluation formula of this approach, compared to the decentralized one, leads to time savings as it needs less evaluation and calculation time. Moreover, it provides more guaranteed and reliable results (Rb) as it is centralized (in AYPS), overcoming the difficulty of locating references in a distributed mechanism. Agents can use either one of the above mechanisms or even both, complementarily,
namely they can use the centralized mechanism provided by AYPS in order to find the most trusted service provider and/or they can use the decentralized approach for the rest of the EMERALD agents.
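A minimal Python sketch of the centralized reputation computation described above. Only the per-reference weighted sum and the constraint on the weights are given in the text, so the aggregation over the stored references (a simple average here) and the function signature are our own assumptions.

  def reputation(references, weights=(0.25, 0.25, 0.25, 0.25)):
      """Compute Rb for a trustee from references (cr, cm, flx, rs), each in [-1, 1].

      weights = (w1, w2, w3, w4) must sum to 1; the default corresponds to the
      equal-weight option of AYPS. Averaging over the references is our own
      assumption about how the stored references are aggregated."""
      assert abs(sum(weights) - 1.0) < 1e-9
      if not references:
          return 0.0                      # newcomers start with a neutral reputation
      scores = [sum(w * c for w, c in zip(weights, ref)) for ref in references]
      return sum(scores) / len(scores)

  # Two references about the same trustee, user-defined weights favouring correctness
  refs = [(0.8, 0.6, 0.4, 0.9), (0.5, 0.7, 0.2, 1.0)]
  print(reputation(refs, weights=(0.4, 0.3, 0.2, 0.1)))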
6 Use Case
Reasoning is widely used in various applications. This section presents an apartment renting use case paradigm that applies both deductive and defeasible logic. The scenario aims at demonstrating the overall functionality of the framework and, more specifically, the usability of the Reasoners and the modularity of the KC-Agents prototype and its ability to easily adapt to various applications. The scenario, adopted from [12], involves two independent parties, represented by IAs, and one of the four Reasoners provided in EMERALD. The first party is the customer, a potential renter that uses defeasible logic and wishes to rent an apartment based on his requirements (e.g. size, location) and personal preferences. The other party is the broker, who uses deductive logic and possesses a database of available apartments. His role is to match the customer’s requirements with the apartment specifications and eventually propose suitable apartments to the potential renter. The R-Reasoner and the DR-Reasoner are the two reasoners involved in the paradigm. The scenario is carried out in ten distinct steps (shown in Fig. 4). A similar but more simplistic brokering scenario was presented in [8], where only one type of logic (defeasible) was applied and the broker did not possess any private interaction strategy, expressed in any kind of logic, but was just mediating between the customer and the Reasoner. Initially, the customer finds a broker by asking the AYPS (step 1). The AYPS returns a number of potential brokers accompanied by their reputation ratings (step 2). The customer selects the most trusted broker and sends his requirements to him, in order to get back all the available apartments with the proper specifications (step 3). The broker has a list of all available apartments which cannot be communicated to the customer, because they belong to the broker’s private assets. However, since the broker cannot process the customer’s requirements using defeasible logic, he finds a defeasible logic reasoner (DR-Reasoner) (step 4), by using the AYPS (this step is not shown). The DR-Reasoner returns the apartments that fulfill all requirements (step 5); however, the broker checks the results in order to exclude unavailable apartments or apartments reserved for a special private-to-the-broker reason. Thus, the broker agent requests from the R-Reasoner to process the results with his own private interaction strategy expressed as a deductive logic rulebase (step 6). When he receives the remaining apartments (step 7), he sends them to the customer’s agent (step 8). Eventually, the customer receives the appropriate list and has to decide which apartment he prefers. However, his agent does not want to send the customer’s preferences to the broker, in order not to be exploited; thus, the customer’s agent selects his most preferred apartment by sending to the DR-Reasoner his preferences, as a defeasible logic rulebase, along with the list of acceptable apartments (step 9). The Reasoner replies and proposes the best apartment to rent (step 10). The apartment selection procedure ends. Now the customer has to negotiate with the owner for the renting contract. This process is carried out in two basic steps, as shown in Fig. 5: first
Fig. 4. Apartment renting scenario steps
Fig. 5. Negotiation scenario steps
the customer’s agent has to find out the apartment owner’s name and then negotiate with him for the rent. The customer sends a REQUEST message to the broker containing the chosen apartment and waits for its owner’s name. The broker sends back his reply via an INFORM message. Afterwards, the customer starts a negotiation process with the owner, negotiating, among others, the price and the rental duration. Following the generic, abstract specification for agents, the customer agent’s description contains a fact, ruleml_path, which is part of its internal knowledge and represents the rulebase URL. Moreover, due to the dynamic environment (AYPS is constantly updating the environment), a new fact with the agent name (agent_name) is added to the working memory. Agent behavior is represented by rules; one of these is the ‘read’ rule that calls the BJL’s fileToString method. It has only a single precondition (actually a fact), the ruleml_path, as shown below.
F_u^cust ≡ {ruleml_path},  F_e^cust ≡ {agent_name}
J^cust ≡ {rule_base_content ← (bind ((new Basic) fileToString ruleml_path))}
Fubrok ≡ {url}, Febrok ≡ {reasoner_name} C brok ≡ {(ACLMessage (communicative-act REQUEST) (sender Broker) (receiver reasoner_name) (content “request”)) ← request (reasoner_name)} brok ≡ {rule_base_content ←(bind ((new Basic) fileToString url))} J
7 Related Work
DR-BROKERING, a system for brokering and matchmaking, is presented in [13]. The system applies RDF for representing offerings and a deductive logical language for expressing requirements and preferences. Three agent types are featured (Buyer, Seller and Broker) and a DF agent plays the role of the yellow pages service. Also, DR-NEGOTIATE [14], another system by the same authors, implements a negotiation scenario using JADE and DR-DEVICE. Similarly, our approach applies the same technologies and identifies roles such as Broker and Buyer. Conversely, we provide a number of independent reasoning services, offering both deductive and defeasible logic. Moreover, our approach takes into account trust issues, providing two reputation approaches in order to guarantee the safety of interactions. The Rule Responder [15] project builds a service-oriented methodology and a rule-based middleware for interchanging rules in virtual organizations, as well as negotiating about their meaning. Rule Responder demonstrates the interoperation of distributed platform-specific rule execution environments, with Reaction RuleML as a platform-independent rule interchange format. We have a similar view of reasoning services for agents and of the usage of RuleML. Also, both approaches allow utilizing a variety of rule engines. However, contrary to Rule Responder, EMERALD is based on FIPA specifications, achieving a fully FIPA-compliant model, and deals with trust issues. Finally, and most importantly, our framework does not rely on a single rule interchange language, but allows each agent to follow its own rule formalism and still be able to exchange its rule base with other agents, which will use trusted third-party reasoning services to infer knowledge based on the received ruleset.
8 Conclusions and Future Work
This paper argued that agents are vital in realizing the Semantic Web vision and presented EMERALD, a knowledge-based multi-agent framework that provides reasoning interoperability, designed for the SW. EMERALD, developed on top of JADE, is fully FIPA-compliant and features trusted, third-party reasoning services and a generic, reusable agent prototype for knowledge-customizable agents, consisting of an agent model, a yellow pages service and several external Java methods. Also, since the notion of trust is vital here, a reputation mechanism was integrated for ensuring trust in the framework. Finally, the paper presented a use case scenario that illustrates the usability of the framework and the integration of all the technologies involved. As for future directions, it would be interesting to verify our model’s capability to adapt to an even wider variety of Reasoners and KC modules, in order to form a generic environment for cooperating agents in the SW. Another direction would be towards developing and integrating more trust mechanisms. As pointed out, trust is essential, since each agent will have to make subjective trust judgements about the services provided by other agents. Considering the parameter of trust would certainly lead to more realistic and efficient applications. A final direction could be towards equipping the KC-Agents prototype with user-friendly GUI editors for each KC module, such as the KCj module.
References
[1] Hendler, J.: Agents and the Semantic Web. IEEE Intelligent Systems 16(2), 30–37 (2001)
[2] JADE, http://jade.tilab.com/
[3] Bassiliades, N., Vlahavas, I.: R-DEVICE: An Object-Oriented Knowledge Base System for RDF Metadata. Int. J. on Semantic Web and Information Systems 2(2), 24–90 (2006)
[4] Prova, http://www.prova.ws
[5] Nute, D.: Defeasible Reasoning. In: 20th Int. Conference on Systems Science, pp. 470–477. IEEE Press, Los Alamitos (1987)
[6] Bassiliades, N., Antoniou, G., Vlahavas, I.: A Defeasible Logic Reasoner for the Semantic Web. Int. J. on Semantic Web and Information Systems 2(1), 1–41 (2006)
[7] Lam, H., Governatori, G.: The Making of SPINdle. In: RuleML 2009, Int. Symp. on Rule Interchange and Applications, pp. 315–322 (2009)
[8] Kravari, K., Kontopoulos, E., Bassiliades, N.: A Trusted Defeasible Reasoning Service for Brokering Agents in the Semantic Web. In: 3rd Int. Symp. on Intelligent Distributed Computing (IDC 2009), Cyprus, vol. 237, pp. 243–248. Springer, Heidelberg (2009)
[9] Kravari, K., Kontopoulos, E., Bassiliades, N.: Towards a Knowledge-based Framework for Agents Interacting in the Semantic Web. In: IEEE/WIC/ACM Int. Conf. on Intelligent Agent Technology (IAT 2009), Italy, vol. 2, pp. 482–485 (2009)
[10] JESS, http://www.jessrules.com/
[11] Macarthur, K.: Trust and Reputation in Multi-Agent Systems. AAMAS, Portugal (2008)
[12] Antoniou, G., Harmelen, F.: A Semantic Web Primer. MIT Press, Cambridge (2004)
[13] Antoniou, G., Skylogiannis, T., Bikakis, A., Bassiliades, N.: DR-BROKERING – A Defeasible Logic-Based System for Semantic Brokering. In: IEEE Int. Conf. on E-Technology, E-Commerce and E-Service, pp. 414–417 (2005)
[14] Skylogiannis, T., Antoniou, G., Bassiliades, N., Governatori, G., Bikakis, A.: DR-NEGOTIATE – A System for Automated Agent Negotiation with Defeasible Logic-based Strategies. Data & Knowledge Engineering 63(2), 362–380 (2007)
[15] Paschke, A., Boley, H., Kozlenkov, A., Craig, B.: Rule Responder: RuleML-based Agents for Distributed Collaboration on the Pragmatic Web. In: 2nd Int. Conf. on Pragmatic Web, vol. 280, pp. 17–28. ACM, New York (2007)
An Extension of the Aspect PLSA Model to Active and Semi-Supervised Learning for Text Classification
Anastasia Krithara1, Massih-Reza Amini2, Cyril Goutte2, and Jean-Michel Renders3
1 National Center for Scientific Research (NCSR) ’Demokritos’, Athens, Greece
2 National Research Council Canada, Gatineau, Canada
3 Xerox Research Centre Europe, Grenoble, France
Abstract. In this paper, we address the problem of learning aspect models with partially labeled examples. We propose a method which benefits from both semi-supervised and active learning frameworks. In particular, we combine a semi-supervised extension of the PLSA algorithm [11] with two active learning techniques. We perform experiments over four different datasets and show the effectiveness of the combination of the two frameworks.
1 Introduction
The explosion of available information during the last years has increased the interest of the Machine Learning (ML) community in the different learning problems that have been raised in most information access applications. In this paper we are interested in the study of two of these problems, namely the handling of partially labeled data and the modeling of the generation of textual observations. Semi-Supervised Learning (SSL) emerged in the Machine Learning community in the late 90s. Under this framework, the aim is to establish a decision rule based on both labeled and unlabeled training examples. To achieve this goal, the decision rule is learned by simultaneously optimizing a supervised empirical learner on the labeled set, while respecting the underlying structure of the unlabeled training data in the input space. In the same vein, Active Learning also addresses the issue of the annotation burden, but from a different perspective. Instead of using all the unlabeled data together with the labeled ones, it tries to minimize the annotation cost by labeling as few examples as possible and focusing on the most useful examples. Different types of active learning methods have been introduced in the literature, such as uncertainty-based methods ([13,21,3]), expected error minimization methods ([10,19,7]) and query by committee methods ([20,8,17,6]). By combining semi-supervised and active learning, an attempt is made to benefit from both frameworks to address the annotation burden problem. The semi-supervised learning component improves the classification rule and the measure
of its confidence, while the active learning component queries for the labelling of the most relevant and potentially useful examples. On the other hand, new generative aspect models have recently been proposed which aim to take into account data with multiple facets. In this class of models, observations are generated by a mixture of aspects, or topics, each of which is a distribution over the basic features of the observations. Aspect models ([9,1]) have been successfully used for various textual information access and image analysis tasks such as document clustering and categorization or scene segmentation. In many of these tasks, acquiring the annotated data necessary to apply supervised learning techniques is a major challenge, especially in very large data sets. These annotations require humans who can understand the scene or the text, and are therefore very costly, especially in technical domains. In this paper, we explore the possibility of learning such models with the help of unlabeled examples, by combining SSL and active learning. This work is the continuation of, and builds on, earlier work on SSL for PLSA [11]. In particular, we combine the SSL variant of PLSA with two active learning techniques.
2 Combining SSL and Active Learning
The idea of combining active and semi-supervised learning was first introduced by [15]. The idea is to integrate an EM algorithm with unlabeled data into an active learning method, and more particularly into a query by committee (QBC) method. The committee members are generated by sampling classifiers according to the distribution of classifier parameters specified by the training data. In [16], Co-EMT is proposed. This algorithm combines Co-Testing and Co-EM. As opposed to the Co-Testing algorithm, which learns hypotheses h1 and h2 based only on the labeled examples, Co-EMT learns the two hypotheses by running Co-EM on both labeled and unlabeled examples. Then, in the active learning step, it annotates the example on which the predictions of h1 and h2 are the most divergent, that is, the example for which h1 and h2 have an equally strong confidence in predicting a different label. [24] also presents a combination of semi-supervised and active learning using Gaussian fields and harmonic functions. [23] presented the so-called Semi-Supervised Active Image Retrieval (SSAIR) method for the different task of relevance feedback. The method was inspired by co-training [2] and co-testing [17], but instead of using two sufficient but redundant views of the dataset, it employs two different learners on the same data. In the context of multi-view active learning, [18] proposed a method which combines semi-supervised and active learning. The first step uses co-EM with naive Bayes as the semi-supervised algorithm. They present an approximation to co-EM with naive Bayes that can incorporate user feedback almost instantly and can use any sample-selection strategy for active learning. Why should the combination work? The combination of semi-supervised and active learning appears to be particularly beneficial in reducing the annotation burden for the following reasons:
1. It constitutes an efficient way of solving the exploitation/exploration problem: semi-supervised learning is more focused on exploitation, while active learning is more dedicated to exploration. Semi-supervised learning alone may lead to poor performance in the case of very scarce initial annotation. It strongly suffers from poorly represented classes, while being very sensitive to noise and potential instability. On the other hand, active learning alone may spend too much time querying useless examples, as it cannot exploit the information given by the unlabeled data.
2. In the same vein, it may alleviate the data imbalance problems caused by each method when used separately. Semi-supervised learning tends to over-weight easy-to-classify examples, which will dominate the process, while active learning has the opposite strategy, resulting in exploring more deeply the hard-to-classify examples [22].
3. Semi-supervised learning is able to provide a better motivated estimation of the confidence score associated with the class prediction for each example, taking into account the whole data set, including the unlabeled data. As a consequence, active learning based on these better confidence scores is expected to be more efficient.
3
Semi-Supervised PLSA with a Mislabeling Error Model
In this section we present the semi-supervised variant of the Probabilistic Latent Semantic Analysis (PLSA) model which is used in combination with active learning. This method incorporate a misclassification error model (namely the ssPLSA-mem) [11]. We assume that the labeling errors made by the generative model for unlabeled data come from a stochastic process and that these errors are inherent to semi-supervised learning algorithms. The idea here is to characterize this stochastic process in order to reduce the labeling errors computed by the classifier for unlabeled data in the training set. We assume that for each unlabeled example x ∈ Xu , there exists a perfect, true label y, and an imperfect label y˜, estimated by the classifier. Assuming also that the estimated label is dependent on the true one, we can model these labels by the following probabilities: y = k|y = h) ∀(k, h) ∈ C × C, βkh = P (˜
(1)
subject to the constraint that ∀h, k βkh = 1. The underlying generation process associated to this latent variable model for unlabeled data is: – Pick an example x with probability P (x), – Choose a latent variable α according to its conditional probability P (α | x) – Generate a feature w with probability P (w | α)
186
A. Krithara et al.
– Generate the latent class y according to the probability P (y | α) – The imperfect class label y˜ is generated with probability βy˜|y = P (˜ y | y) The values of P (y|α) depend on the value of latent topic variable α. The cardinal of α is given. The number of latent topics α per class is also known for both labeled and unlabeled examples. We initialize by forcing to zero the P (y|α) for the latent topic variables α which do not belong to the particular class y. These values remain fixed. In other words, we perform hard clustering. We have to note that the hard clustering is done for each class separately, since in each class (y) the corresponing feature examples may aggregate to several clusters. In algorithm 1 the estimation of model parameters Φ = {P (α | x), P (w | α), βy˜|y : x ∈ X , w ∈ W, α ∈ A, y ∈ C, y˜ ∈ C} is described. This algorithm is an EM-like algorithm. With n(x, w) we denote the frequency of the feature w in the example x. For more information about this model, please refer to [11]. Algorithm 1. ssPLSA-mem Input : – A set of partially labeled data X = Xl ∪ Xu , – Random initial model parameters Φ(0) . – j←0 – Run a simple PLSA algorithm for the estimation of the initial y˜ for each example repeat E-step: Estimate the latent class posteriors P (α|x)P (w|α)P (y|α) πα (w, x, y) = , if x ∈ Xl α P (α|x)P (w|α)P (y|α) P (α|x)P (w|α) y P (y|α)βy|y ˜ π ˜α (w, x, y˜) = , if x ∈ Xu P (α|x)P (w|α) P (y|α)β y|y ˜ α y M-step: Estimate the new model parameters Φ(j+1) by maximizing the complete-data log-likelihood P (j+1) (w|α) ∝ n(w, x)πα(j) (w, x, y(x)) + n(w, x)˜ πα(j) (w, x, y˜(x)) x∈Xl
P (j+1) (α|x) ∝
n(w, x) ×
w
(j+1) βy|y ˜
∝
w x∈Xu
x∈Xu (j) πα (w, x, y(x)), (j) π ˜α (w, x, y˜(x)),
n(w, x)
for x ∈ Xl for x ∈ Xu
π ˜α(j) (w, x, y˜)
α|α∈y
j ←j+1 until convergence of the complete-data log-likelihood ; Output : A generative classifier with parameters Φ
An Extension of the Aspect PLSA Model to Active and SSL
4
187
Active Learning
In this section, we extend the presented semi-supervised model, by combining it with two active learning methods. The motivation is to try to take advantage of the characteristics of both frameworks. In both models, we choose to annotate the less confident example. Their difference lies on the measure of confidence they use. Margin Based Method. The first active learning method (the so-called margin based method) chooses to annotate the example which is closer to the classes’ boundaries [12]. The latter gives us a notion of confidence the classifier has on the classification of these examples. In order to measure this confidence we use the following class-entropy measure for each unlabeled example: B(x) = − P (y|x) log P (y|x), where x ∈ Xu (2) y
The bigger the B is, the less confident the classifier is about the labeling of the example. After having selected an example, we annotate it and we add it to the initial labeled set Xl . More than one examples can be selected at each iteration. The reason is that, especially for classification problems with a big amount of examples and many classes, the annotation of only one example at a time, can be proved time-consuming, as a respectful amount of labeled examples will be needed in order to achieve a good performance. If we choose to do the latter, it is not wise to choose examples that are next to each other, as they cannot give us significantly more information than each of them does. As a result, it is better to choose, for instance, examples with big class-entropy which have been given different labels. That way the classifier can get information about different classes and not only for a single one. Entropy Based Method. Based on the method presented in [5], we calculate the entropy of the annotation of the unlabeled data, during the iterations of the Algorithm 2. Combining ssPLSA and Active Learning Input : A set of partially labeled examples X = Xl ∪ Xu repeat – Run the ssPLSA algorithm (and calculate the P (y|x)) – Estimate the confidence of the classifier on the unlabeled examples – Choose the example(s) with low confidence (if we choose more than one example to label, we choose examples with have been classified into different classes), annotate them and add them in the labeled dataset Xl until a certain number of queries or a certain performance ; Output : A generative classifier
188
A. Krithara et al.
model. This method can be seen as a query by committee approach, where, in contrast to the method of [5], the committees here are the different iterations of the same model. In contrast to the margin based method presented previously, the current one does not use the probabilities P (y|x) of an example x to be assigned the label y, but instead, is uses the deterministic votes of the classifier during the different iterations. We denote by V (y, x) the number of times that the label y was assigned in the example x during the previous iterations. Then, we denote as Vote Entropy of an example x as: V E(x) = −
V (y, x) y
iters
log
V (y, x) iters
(3)
where iters refers to the number of iterations. The examples to be labeled are chosen using equation (3), that is, examples with higher entropies are selected. As long as we add new examples during the iterations, the labeling of some examples will change as, new information will be given to the classifier. The strategy chooses the examples for which the classifier changes its decision more often during the iterations. We have to note, that during the first 2-3 iterations, we do not have enough information in order to choose the best examples to label, but very quickly the active learner manage to identify these examples. The intuition behind this model is that examples which tend to change labels are those for which the classifier seems more undecided. Algorithm 2 gives us the general framework under which the above active learning methods can be combined with the semi-supervised variant of the PLSA model.
5
Experiments
In our experiments we used four different datasets: two collections from the CMU World Wide Knowledge Base project - WebKB1 [4] and 20Newsgroups2 , the widely used text collection of Reuters (Reuters − 21578)3 and a real-world dataset from Xerox. As mentioned before, we are concentrated in document classification; nevertheless, the algorithms described in the previous sections can be also used for different applications in which there is a relation of co-occrence between objects and variables such as image classification. These three datasets were pre-processed by removing the email tags and other numeric terms, discarding the tokens which appear in less than 5 documents and removing a total of 608 stopwords from the CACM stoplist4 . No other form of preprocessing (stemming, multi-word recognition etc.) was used on the documents. Table 1 summarizes the characteristics of these datasets. 1 2 3 4
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/ http://people.csail.mit.edu/jrennie/20Newsgroups/ http://www.daviddlewis.com/resources/testcollections/reuters21578/ http://ir.dcs.gla.ac.uk/resources/test collections/cacm/
An Extension of the Aspect PLSA Model to Active and SSL
189
Table 1. Characteristics of the datasets 20Newsgroups WebKB Reuters
Dataset Collection size # of classes, K Vocabulary size, |W| Training set size, |Xl ∪ Xu | Test set size
20000 20 38300 16000 4000
4196 4 9400 3257 839
4381 7 4749 3504 876
Except the previous datasets which are widely used for evaluation of different classification algorithms in the Machine Learning community, we used a real world dataset (called XLS) which comes from a Xerox Business Group (XLS). This dataset is constituted of 20000 documents in the training set and 34770 in the test set. The documents consist of approximately 40% emails, 20% Microsoft Word documents; 20% Microsoft Excel documents, 10% Microsoft Power point documents and 10% PDF and miscellaneous documents. We want to classify the documents as Responsive and Non-Responsive to a particular given case. Evaluation Measures. In order to evaluate the performance of the models, we used the microaverage F-score measure. For each classifier, Gf , we first compute its microaverage precision P and recall R by summing over all the individual decisions it made on the test set: R(Gf ) = K
K
k=1
θ(k, Gf )
k=1 (θ(k, Gf )
P (Gf ) = K
K
k=1
+ ψ(k, Gf ))
θ(k, Gf )
k=1 (θ(k, Gf )
+ φ(k, Gf ))
Where, θ(k, Gf ), φ(k, Gf ) and ψ(k, Gf ) respectively denote the true positive, false positive and false negative documents in class k found by Gf , and K denotes the number of classes. The F-score measure is then defined as [14]: F (Gf ) = 5.1
2P (Gf )R(Gf ) P (Gf ) + R(Gf )
Results
We run experiments for all semi-supervised variants, for both active learning techniques, and for all four datasets. In our experiments, we label one example in each iteration and 100 iterations are performed for WebKB, Reuters and 150 for 20Newsgroups dataset. For the XLS dataset we label 2 examples in each iteration, and we perform 100 iterations (as the dataset is bigger than the other three we need more data for achieving a good performance). For the Margin Method, it is not wise to choose 2 examples that are next to each other, as they cannot gives us more information that each of them does. As a result, we chose
190
A. Krithara et al. Reuters
WebKB 0.75
0.81
0.7
0.78
0.65
F−score
F−score
0.84
0.75 0.72 0.69
0.5
Entropy method Margin Method Random method
0.66 0.63
0.6 0.55
0.4 10
20
30
40
50
60
70
80
90
Entropy method Margin method Random method
0.45
100
10
20
30
40
50
60
70
80
90
100
# of labeled examples
# of labeled examples
XLS
20 Newsgroups 0.8
0.74
0.75
0.71
0.7 0.68
F−score
F−score
0.65 0.6 0.55 0.5 0.45 0.4
0.65 0.62 0.59 0.56
0.35
Entropy method Margin method Random method
0.3 15
30
45
60
75
90
105
# of labeled examples
120
135
Entropy method Margin method Random method
0.53
150
20
40
60
80
100
120
140
160
180
200
# of labeled examples
Fig. 1. F-Score (y-axis) versus, the number of labeled examples in the training set |Dl |, (x-axis) graphs for the combination of the two ssPLSA algorithms with active learning on Reuters, WebKB and 20Newsgroups datasets
the two examples with the biggest class-entropy but, in addition, with different assigned labels. In order to evaluate the performance of the active learning methods, we also run experiments for the combination of the semi-supervised algorithms with a random selection method, where in each iteration the documents to be labeled are chosen randomly. As we can notice from the figure 1 the use of active learning helps, in comparison with the random query for all four datasets. The performance of the two different active learning techniques are comparable, and their difference is not statistically significant. Nevertheless, they clearly outperfom the random method, especially when very few labeled data are available. For the XLS dataset in particular, as we can notice, active learning helps, comparing to the random method, although the gain is less than the other three datasets. As in the previous case, the two active learning methods give similar results.
6
Conclusions
In this work, a variant of the semi-supervised PLSA algorithm has been combined with two active learning techniques. Experiments on four different datasets validate a consistent significant increase inperformance. The evaluation we performed has shown that this combination can further increase classifier’s
An Extension of the Aspect PLSA Model to Active and SSL
191
performance. Using active learning we manage to chose our training labeled set carefully, using the most informative examples. Working this way, we can achieve a better performance using less labeled examples. This work was focused on the PLSA model. Nevertheless, this does not mean that the developed models can exclusively used with it. On the contrary, the proposed techniques are very easily applicable to different aspect models. Another possible extension is the use of different active learning techniques. Also, the combination of more than one active learning technique could be considered.
Acknowledgment This work was supported in part by the IST Program of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.
References 1. Blei, D., Ng, A., Jordan, M.: Latent dirichlet allocation. Journal of Machine Learning Research (2003) 2. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Conference on Computational Learning Theory (COLT), pp. 92–100 (1998) 3. Campbell, C., Cristianini, N., Smola, A.J.: Query learning with large margin classifiers. In: Proceedings of the 17th International Conference on Machine Learning (ICML), San Francisco, CA, USA, pp. 111–118 (2000) 4. Craven, M., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., Slattery, S.: Learning to extract symbolic knowledge from the World Wide Web. In: Proceedings of the 15th Conference of the American Association for Artificial Intelligence, Madison, US, pp. 509–516 (1998) 5. Dagan, I., Engelson, S.P.: Committee-based sampling for training probabilistic classifiers. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 150–157 (1995) 6. Davy, M., Luz, S.: Active learning with history-based query selection for text categorisation. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECiR 2007. LNCS, vol. 4425, pp. 695–698. Springer, Heidelberg (2007) 7. D¨ onmez, P., Carbonell, J.G., Bennett, P.N.: Dual strategy active learning. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladeniˇc, D., Skowron, A. (eds.) ECML 2007. LNCS (LNAI), vol. 4701, pp. 116–127. Springer, Heidelberg (2007) 8. Freund, Y., Seung, H., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28(2-3), 133–168 (1997) 9. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42(1-2), 177–196 (2001) 10. Iyengar, V., Apte, C., Zhang, T.: Active learning using adaptive resampling. In: Proceedings of the 6th International Confenrence on Knowledge Discovery and Data Mining, pp. 92–98 (2000)
192
A. Krithara et al.
11. Krithara, A., Amini, M.-R., Renders, J.-M., Goutte, C.: Semi-supervised document classification with a mislabeling error model. In: European Conference on Information Retrieval (ECIR), Glasgow, Scotland (2008) 12. Krithara, A., Goutte, C., Amini, M.-R., Renders, J.-M.: Reducing the annotation burden in text classification. In: 1st International Conference on Multidisciplinary Information Sciences and Technologies (InSCiT), Merida, Spain (2006) 13. Lewis, D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Proceedings of the 17th International Conference on Research and Development in Information Retrieval (SIGIR), Dublin, pp. 3–12 (1994) 14. Lewis, D., Ringuette, M.: A comparison of two learning algorithms for text categorization. In: Proceedings of the Symposium on Document Analysis and Information Retrieval (SDAIR), pp. 81–93 (1994) 15. McCallum, A., Nigam, K.: Employing EM and pool-based active learning for text classification. In: Proceedings of the 15th International Conference on Machine Learning (ICML), pp. 350–358 (1998) 16. Muslea, I., Minton, S., Knoblock, C.: Active + Semi-supervised Learning = Robust Multi-View Learning. In: Proceedings of the 19th International Conference on Machine Learning (ICML), pp. 435–442 (2002) 17. Muslea, I., Minton, S., Knoblock, C.A.: Selective sampling with redundant views. In: Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), pp. 621–626 (2000) 18. Probst, K., Ghani, R.: Towards ’interactive’ active learning in multi-view feature sets for information extraction. In: Proceedings of European Conference on Machine Learning (ECML), pp. 683–690 (2007) 19. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the 18th International Conference on Machine Learning (ICML), San Francisco, CA, pp. 441–448 (2001) 20. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. Computational Learning Theory, 287–294 (1992) 21. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of 17th International Conference on Machine Learning (ICML), Stanford, US, pp. 999–1006 (2000) 22. T¨ ur, G., Hakkani-T¨ ur, D., Schapire, R.: Combining active and semi-supervised learning for spoken language understanding. Speech Communication 45(2), 171– 186 (2005) 23. Zhou, Z.-H., Chen, K.-J., Dai, H.-B.: Enhancing relevance feedback in image retrieval using unlabeled data. ACM Transactions on Information Systems 24(2), 219–244 (2006) 24. Zhu, X., Lafferty, J., Ghahramani, Z.: Combining active learning and semisupervised learning using gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning (2003)
A Market-Affected Sealed-Bid Auction Protocol Claudia Lindner Institut f¨ur Informatik, Universit¨at D¨usseldorf, 40225 D¨usseldorf, Germany
Abstract. Multiagent resource allocation defines the issue of having to distribute a set of resources among a set of agents, aiming at a fair and efficient allocation. Resource allocation procedures can be evaluated with regard to properties such as budget balance and strategy-proofness. Designing a budget-balanced and strategy-proof allocation procedure that always provides a fair (namely, envyfree) and efficient (namely, Pareto-optimal) allocation poses a true challenge. To the best of our knowledge, none of the existing procedures combines all four properties. Moreover, in previous literature no attention is given to the allocation of unwanted resources (i.e., resources that seem to be of no use for all agents) in a way as to maximize social welfare. Yet, dealing inappropriately with unwanted resources may decrease each agent’s benefit. Therefore, we extend the scope of sealed-bid auctions by means of involving market prices so as to always provide an optimal solution under consideration of each agent’s preferences. We present a new market-affected sealed-bid auction protocol (MSAP) where agents submit sealed bids on indivisible resources, and we allow monetary side-payments. We show this protocol to be budget-balanced and weakly strategy-proof, and to always provide an allocation that maximizes both utilitarian and egalitarian social welfare, and is envy-free and Pareto-optimal. Keywords: multiagent systems, multiagent resource allocation, auctions.
1 Introduction In multiagent resource allocation, agents participate in an allocation procedure to obtain a fair and efficient allocation of a set of resources.1 There are two types of procedures: centralized and decentralized. Auction protocols give a good example for centralized procedures: All agents are asked to state their preferences (utility values) for the resources given, and based on these the protocol makes a decision on the final allocation (see, e.g., [11,6]). In contrast, in a decentralized environment the final allocation is the result of a sequence of conducted negotiations between single agents (see, e.g., [7]). Many resource allocation approaches are about allocating each of the goods to any of the agents (see, e.g., [14,1,17]), and thus an important aspect is missed out: Among the goods to be allocated there may be “unwanted goods”, i.e., goods none of the agents is interested in. Though some procedures consider that a particular agent may not be interested in each good available (see, e.g., [17]), the issue of unwanted goods is no attention given to. The market-affected sealed-bid auction protocol (MSAP) to be presented 1
Supported in part by the DFG under grant RO 1202/12-1 (within the ESF’s EUROCORES program LogICCC: “Computational Foundations of Social Choice”). We will use “resource” and “good”, and “procedure” and “protocol” interchangeably.
S. Konstantopoulos et al. (Eds.): SETN 2010, LNAI 6040, pp. 193–202, 2010. c Springer-Verlag Berlin Heidelberg 2010
194
C. Lindner
in Section 4 fills this gap by involving market prices. This new auction protocol has highly relevant properties such as budget balance, weak strategy-proofness, and always providing an allocation that maximizes both utilitarian and egalitarian social welfare, and that is envy-free and Pareto-optimal (see Section 5). Utilitarian and egalitarian social welfare are the most common notions of social welfare (see, e.g., [5]). Informally speaking, for a given allocation, utilitarian social welfare states the sum of all agents’ utilities, whereas egalitarian social welfare states the utility of the worst off agent. Both notions make meaningful statements about the quality of an allocation, utilitarian social welfare measures the overall benefit for society, and egalitarian social welfare measures the level of fairness as to satisfying minimum needs. Another substantial concept of fairness is envy-freeness: An allocation is said to be envy-free if none of the agents has an incentive to swap his or her share with any other agent’s share. Regarding efficiency, the most fundamental concept is the notion of Pareto-optimality: An allocation is Paretooptimal if any change to make an agent better off results in making another agent worse off. With regard to resource allocation protocols that involve monetary side-payments, budget balance states that all payments sum up to zero, i.e., the application of the protocol causes neither a profit nor a loss. These notions and the general framework are specified in Section 2. Moreover, the MSAP is proven to be “weakly strategy-proof”, a notion to be motivated and introduced in Section 3. In short, weakly strategy-proof is a somewhat milder notion of strategy-proofness implying that a cheating attempt may be successful but is always at the risk of an overall loss. Pioneering work in the field of auction procedures for indivisible goods was done by Knaster. His auction protocol of sealed bids2 is about agents that submit sealed bids for single goods, and the agent whose bid is the highest is assigned the good; monetary side-payments are used for compensation. Knaster’s procedure always provides an efficient allocation but does not guarantee envy-freeness (see, e.g. [3]). Just as in Knaster’s procedure of sealed bids, the MSAP asks agents to submit sealed bids on single goods reflecting their individual welfare from receiving these goods. This has the advantage of making winner determination easy (as opposed to the hardness of winnerdetermination problems in combinatorial auctions, see [11,6]). In this regard, in order to create a mutual bid-basis and to account for unwanted goods, the option to sell goods on the open market is included, i.e., the MSAP combines the actual value of a good with each agent’s preferences. Furthermore, we allow monetary side-payments, and a central authority manages the allocation procedure. Put simply, the MSAP extends the scope of sealed-bid auctions by means of involving market prices in order to always provide a fair and efficient allocation—even when taking unwanted goods into account. The New Economy, which is rooted in continuous information-technological progress, provides the basis for the MSAP to act on the global market. The internet and related technologies overcome geographical borders and increase market transparency, and hence create some kind of perfect information environment [4]. 
Being generated and used by internet users worldwide, information (e.g., with reference to demand and supply) turns out to be the crucial factor when performing global market activities. The global market provides a shared platform for market research, giving each agent the same chances to sell unwanted goods, and thus to make bids on a comparable basis. 2
This protocol has been proposed by Knaster and was first presented in Steinhaus [13].
A Market-Affected Sealed-Bid Auction Protocol
195
Application areas could be the allocation of inheritance items or collective raffle prices, or in the case of two agents even the allocation of household goods within a divorce settlement. In general, the MSAP can be applied to a mixture of insignificant replaceable goods (e.g., a car that is of no (considerable) personal significance to a particular agent) and significant replaceable goods, i.e., goods that are sort of irreplaceable to an agent, but a certain monetary compensation would be accepted (e.g., a car that is of considerable personal significance to a particular agent for some reason—such as this agent has been born in this car—but the agent would be willing to accept some money to set the car aside). The MSAP can also be applied to a set of either solely significant or solely insignificant replaceable goods.
2 Preliminaries and Notation Let A = {a1 , a2 , . . . , an } be a set of n agents, and let G = {g1 , g2 , . . . , gm } be a set of m indivisible and nonshareable goods (i.e., each good is to be allocated in its entirety and to one agent only). If some amount of money is among the goods to be allocated, this money is excluded from G and its value is split equally among all agents in A. Moreover, the number of agents as well as the number of goods are not restricted, and there is no limitation on how many goods are to be allocated per agent. While in previous literature the focus is mostly on scenarios where only one single good is to be assigned per agent (see e.g., [14,15,9]), we do not need this restriction (for related work, see, e.g., [2,17]). Let U = {u1 , u2 , . . . , un } be a set of n utility functions representing each agent’s preferences (i.e., bids), where u j : G → R specifies agent a j ’s utility of each single good in G. An allocation of G is a mapping X : A → G with X(a j ) ∩ X(ak ) = 0/ for any two agents a j and ak , j = k. At this, u j (X(a j )) gives agent a j ’s additive utility of the subset (bundle) of goods allocated to him or her by allocation X; to simplify notation, we will write u j (X) instead of u j (X(a j )). Note that agents do not have any knowledge about the utility values of other agents. Let C = {c1 , c2 , . . . , cn } be a set of n side-payments that agents a j in A either make (i.e., c j ∈ R− ) or receive (i.e., c j ∈ R+ ) in conjunction with an allocation. At this, the value of money is supposed to be the same for all agents. The MSAP asks agents to bid on single goods only. Hence, direct synergetic effects caused by the allocation of bundles of goods are disregarded. However, in Section 5 it is shown that the final allocation involves bundles when accounting for side-payments, and that statements regarding social welfare, fairness, and efficiency can be made. Various criteria have been introduced to measure the quality of an allocation such as the concepts of social welfare, envy-freeness, and Pareto-optimality. Concerning a given society of agents, concepts of social welfare measure the benefit an allocation yields. Transferring this measure to individual agents gives the notion of individual welfare. Definitions 1, 2, 3, and 4 each are given with reference to a resource allocation setting where monetary side-payments are allowed. Definition 1. Consider an allocation X of a set G of goods to a set A of agents, where the agents’ preferences are represented by utility functions U. Let C be the set of sidepayments agents in A either make or receive, as appropriate. The individual welfare of agent a j obtained through allocation X and side-payment c j is defined as iw j (X(a j )) = u j (X(a j )) + c j . We will write iw j (X) instead of iw j (X(a j )).
196
C. Lindner
Note that for an allocation X and any two agents a j and ak in A, a j = ak , the individual welfare agent a j would obtain through the assignment of agent ak ’s share is defined as iw j (X(ak )) = u j (X(ak )) + ck . In terms of social welfare, in this paper, we focus on the following two types (see, e.g., [5]). Definition 2. Consider an allocation X of a set G of goods to a set A of agents, where iw j (X) is the individual welfare agent a j in A obtains through allocation X. (1) The utilitarian social welfare is defined as swu (X) = ∑a j ∈A iw j (X). (2) The egalitarian social welfare is defined as swe (X) = min{iw j (X) | a j ∈ A}. We now define envy-freeness and Pareto-optimality. Informally speaking, an allocation is envy-free if every agent is at least as happy with his or her share as he or she would be with any other agent’s share. An allocation is Pareto-optimal if no agent can be made better off without making another agent worse off. Definition 3. Let a set G of goods and a set A of agents be given, and let X and Y be two allocations of G to A. Let iw j (X) and iw j (Y ) be the individual welfares agent a j in A obtains through allocations X and Y . (1) X is said to be envy-free if for any two agents a j and ak in A, we have iw j (X(a j )) ≥ iw j (X(ak )). (2) Y is said to be Pareto-dominated by X if for each agent a j in A, we have iw j (X) ≥ iw j (Y ), and there exists some agent ak in A such that iwk (X) > iwk (Y ). An allocation is said to be Pareto-optimal (or, Paretoefficient) if it is not Pareto-dominated by any other allocation. The notion of budget balance makes a statement on the quality of a resource allocation protocol that involves monetary side-payments. Definition 4. A resource allocation protocol is said to be budget-balanced if for every allocation obtained it holds that all side-payments sum up to zero, i.e., ∑c j ∈C c j = 0.
3 Motivation Considering the design of a multiagent resource allocation procedure, the common goal is to guarantee a fair (namely, envy-free) and efficient (namely, Pareto-optimal) allocation. In this context, two more desirable properties for a resource allocation protocol to possess are budget balance, and strategy-proofness (i.e., none of the agents has an incentive to bid dishonestly). While the well-known Groves mechanisms [8] satisfy both Pareto-optimality and strategy-proofness, they in general do not guarantee to provide an envy-free allocation and are not budget-balanced (see, e.g., [10]). In fact, Tadenuma and Thomson [16] showed that envy-freeness and strategy-proofness are mutually exclusive. Thus, in this paper, we aim at guaranteeing envy-freeness, Pareto-optimality and budget balance while weakening the requirement of strategy-proofness. In literature it is common to define strategy-proofness by incentive compatibility which states that truthfulness is the dominant strategy (see, e.g., [12]). In order to deal with the impossibility result given, we focus on a somewhat weaker notion of strategy-proofness. Definition 5. Given that all agents have no knowledge about the utility functions of other agents, a resource allocation protocol is said to be weakly strategy-proof if a cheating agent is always risking a loss and is never guaranteed to cheat successfully.
A Market-Affected Sealed-Bid Auction Protocol
197
In the context of additive valuation functions, Knaster’s procedure of sealed bids satisfies Pareto-optimality and budget balance, but lacks of envy-freeness and strategyproofness—though it is weakly strategy-proof according to Definition 5 (see, e.g. [3]). Willson [17] presented a procedure that indeed is envy-free, Pareto-optimal, budgetbalanced and weakly strategy-proof (according to Definition 5). However, this procedure does not give consideration to welfare maximization when having unwanted goods. According to Willson’s procedure, agents are expected to state negative values for those goods they do not want (i.e., to specify some monetary compensation to be paid to the agents in order to persuade them to accept those goods nonetheless), but there are no restrictions in terms of some sort of value limit. Thus, regarding an unwanted good (i.e., a good that is a burden on each of the agents), it is possible that the absolute equivalent of each single negative value is higher than the overall value of all other goods to be allocated. This allows agents to receive compensations that, in the end, may cause a moneylosing allocation. As an example, consider the setting that we have two agents a1 and a2 , three goods g1 , g2 and g3 , and the following utility values u1 (g1 ) = 100, u1 (g2 ) = 50, u1 (g3 ) = −300, u2 (g1 ) = 80, u2 (g2 ) = 60 and u2 (g3 ) = −250. According to the procedure given in [17], each good is assigned to the highest bidder. Thus, good g1 is assigned to agent a1 , and goods g2 and g3 are assigned to agent a2 . The overall benefit sums up to 100 + 60 + (−250) = −90, and hence agent a1 has to pay side-payments worth −145 to agent a2 , resulting in a negative share of −45 for each of the agents. Just one unwanted good, here g3 , can smash a whole allocation. But, if an agent does not want a good for personal use, this does not necessarily mean that this good is worth nothing to the agent as he or she may have good selling opportunities. For example, let us assume good g3 is a car and both agents have no use for it, hence they only see the cost involved such as the cost for scrapping or insurance. By missing the option to sell unwanted goods and to distribute the related profit, agents may end up paying rather than benefiting. Moreover, there may be goods that, though being wanted by some agents, do not make up a high personal significance. Having no common basis for the specification of utility values, agents with a similar preference for one particular good (e.g., considering the good to be of no personal significance) may state significantly different utilities. Without a common basis the values stated may neither be related to the actual value of the good nor to one another, and thus a lower overall benefit may be caused. To address the issues mentioned above, we present and analyze a new marketaffected sealed-bid auction protocol that is proven to be budget-balanced and weakly strategy-proof, and to always provide an allocation that is envy-free, Pareto-optimal, and that always maximizes both utilitarian and egalitarian social welfare.
4 A Market-Affected Sealed-Bid Auction Protocol The MSAP is about allocating goods that are to give away in as fair a way as possible. However, agents may have diverse preferences for the goods in G, and thus some allocations may result in an advantage for one agent and in a disadvantage for another—which is unfair as every agent is to be treated equally. For the purpose of achieving not only an efficient but also a fair allocation, the aim is to assign all goods in G in such a way that the individual welfares of all agents are equalized according to how valuable the goods
198
C. Lindner
are to them. Regarding a good that is significant to an agent, the utility value reflects the level of personal significance of this good, whereas, regarding an insignificant good, the utility value states the profit the agent could make by selling this good.3 Note that the central authority (CA) managing the allocation procedure is not one of the agents. If the CA needs to be paid for its job, this is done proportionally by all agents once the protocol is finished. It is assumed though that the CA generally does not have to be paid for organizing the allocation. Moreover, the MSAP may involve sidepayments, but, as opposed to other approaches (see, e.g., [16,17]), none of the goods in G needs to be infinitely divisible. Furthermore, neither any agent nor the CA will lose any value by the application of the MSAP, because all side-payments are included in the overall value of the goods in G. Let X Σ denote the final allocation obtained by the MSAP. We write swΣ instead of sw(X Σ ), and iwΣj instead of iw j (X Σ ). The MSAP is a multi-stage resource allocation protocol and consists of three phases: the bidding phase, the assignment phase and the compensation phase. In the course of the bidding phase agents are asked to specify a utility value for each of the goods. During the assignment phase each good is assigned to the agent whose benefit from receiving this good is the highest, where ties can be broken arbitrarily. Finally, the compensation phase is about equalizing the individual welfares of all agents by means of monetary side-payments. The MSAP is presented in detail in Figure 1. Remark 1. Some remarks on the steps of the protocol in Figure 1 are in order: 1. B5. Agent a j states u j (gi ) (i.e., the individual welfare agent a j would obtain from receiving good gi ) according to the following rules. (a) If a j is not interested in gi , a j would not keep gi but would sell it and thus states a utility value fulfilling u j (gi ) ≤ M(gi ). Agent a j would receive revenue M(gi ) for selling gi but he or she may also have some expenses S j (gi ) caused by selling gi , i.e., u j (gi ) = M(gi ) − S j (gi ). If agent a j wants to make sure to receive good gi by no means, he or she states a utility value of zero.4 (b) If a j would like to have gi for him- or herself, a j states a utility value fulfilling u j (gi ) ≥ M(gi ). At this, the value difference between u j (gi ) and M(gi ) expresses the degree of significance of gi to a j , i.e., the higher the difference the more significant gi is to a j . 2. A3. If uk (gi ) < PCA (gi ) holds true, the CA has the lowest selling cost for gi and none of the agents in A is interested in keeping gi for him- or herself, i.e., gi is an unwanted good. m M(g ). 3. C1. If none of the goods in G needs to be sold, it holds that swΣ ≥ M Σ = Σi=1 i After the assignment phase is finished, all agents concerned and the CA go into the matter of selling those goods that are to be sold, since in the compensation phase all related profits are involved. However, if agents concerned have sufficient cash at hand, the selling could be done later on, though at the risk of financial loss and the chance of additional profit as market prices may change over time. 3 4
There is no need to consider selling opportunities for significant goods, since agents are intersted in keeping those goods due to their significance. Note that stating “0” would simplify the process for agent a j (i.e., no selling activity would be required), but this may cause a lower overall benefit compared to when a j would sell gi .
A Market-Affected Sealed-Bid Auction Protocol
199
Bidding Phase: For each good gi in G perform steps B1 to B5. B1. Based on market research, the CA determines market price M(gi ), i.e., the revenue the CA or an agent would receive when selling good gi on the market. B2. The CA determines selling cost SCA (gi ), i.e., the cost caused by the CA selling gi on the market (e.g., the cost for meeting a potential buyer, or the cost for shipping the good). B3. The CA calculates profit PCA (gi ) := M(gi ) − SCA (gi ). B4. The CA discloses market price M(gi ), but conceals selling cost SCA (gi ) and profit PCA (gi ). B5. Each agent a j in A specifies utility value u j (gi ) ≥ 0 and submits this one to the CA. Assignment Phase: For each good gi in G perform steps A1 to A3. A1. Find agent ak in A such that there is no agent a j in A with u j (gi ) > uk (gi ), i.e., find a highest bidder for gi . (Ties can be broken arbitrarily.) A2. If uk (gi ) ≥ PCA (gi ) and uk (gi ) > 0, good gi is allocated to agent ak and the highest bid for good gi is recorded by setting u (gi ) := uk (gi ). A3. If uk (gi ) < PCA (gi ) or uk (gi ) = PCA (gi ) = 0, the CA is going to keep gi for the time being and the highest bid for gi is recorded by setting u (gi ) := PCA (gi ). Compensation Phase: m u (g ). C1. For final allocation X Σ calculate the overall social welfare by swΣ := Σi=1 i C2. In compliance with values u j (X Σ ), divide set A into three disjoint sets, R, S, and T with A = R ∪ S ∪ T , such that: (1) ur (X Σ ) > (1/n) · swΣ for all ar in R; (2) us (X Σ ) < (1/n) · swΣ for all as in S; (3) ut (X Σ ) = (1/n) · swΣ for all at in T . C3. All agents ar in R (i.e., all advantaged agents) have to make side-payments cr ∈ R− such that iwΣr = ur (X Σ ) + cr = (1/n) · swΣ . C4. For goods g in G with u (g ) = PCA (g ) that had to be sold by the CA, the CA has to make side-payments cCA ∈ R− such that cCA = −Σλ ∈{} PCA (gλ ). C5. All agents as in S (i.e., all disadvantaged agents) receive side-payments cs ∈ R+ such that iwΣs = us (X Σ ) + cs = (1/n) · swΣ . Note that −Σσ ∈{s} cσ = cCA + Σρ ∈{r} cρ . The CA discloses social welfare swΣ , the assignment of goods gi according to X Σ and sidepayments c j for each agent a j in A. Fig. 1. A market-affected sealed-bid auction protocol for any number of goods and agents
4. C5. Side-payments cr and cs can be made to or received from several agents, and agents either make side-payments or receive side-payments. Agents at in T have been in possession of a proportional share of social welfare swΣ after the assignment phase already, i.e., ct = 0 and iwtΣ = ut (X Σ ) = (1/n) · swΣ for all at in T .
5 Results and Discussion The easiest way of equitably allocating all goods in G would be if the CA itself would sell all goods on the market, and distribute the profit made in a proportional manner among all agents. In this case, the overall social welfare swΣ would equal m PΣ = Σi=1 PCA (gi ). Thus, (1/n) · PΣ specifies the minimal individual welfare each agent in A is guaranteed to obtain through the allocation of all goods in G by the MSAP. However, taking each agent’s preferences into consideration may increase all individual welfares, and the overall social welfare accordingly, up to any amount. Concerning
200
C. Lindner
unwanted goods, the MSAP includes the option to sell those goods with the best possible profit, which gives an opportunity to increase the overall social welfare, and which guarantees that the overall social welfare is not devaluated by some “out-of-favor” good. Furthermore, the MSAP takes into account that agents may have exceptionally low selling costs for one or the other good, and by this keeps all selling costs as low as possible. After the application of the MSAP, every agent a j in A possesses a bundle of goods (which may be empty) and some side-payments (which may be positive, negative, or zero). Note that each agent’s individual welfare iwΣj is at least as high as a proportional share of the overall social welfare that could have been achieved if all goods m u (g ). Consequently, in G would have been allocated to this agent, i.e., iwΣj ≥ 1/n · Σi=1 j i by including all agents’ preferences each agent experiences an increase of what he or she actually receives over what he or she anticipated to receive according to his or her measure. Combining utility values and side-payments, individual welfare iwΣj can be interpreted as the bundle (consisting of goods and/or money) agent a j received by final allocation X Σ . Analogously, social welfare swΣ can be linked to the concept of utilitarian social welfare, i.e., swu (X Σ ) = swΣ , and to the concept of egalitarian social welfare, i.e., swe (X Σ ) = 1/n · swΣ , again in consideration of monetary side-payments. Theorem 1. Every allocation obtained by the MSAP maximizes both utilitarian social welfare and egalitarian social welfare according to the agents’ valuations. Proof. The allocation of all goods gi in G is conducted in a way such that each good is assigned to a highest bidder, or to the CA, respectively. By this, the overall social welfare to be distributed, which turns out to correspond to the utilitarian social welfare of final allocation X Σ (including side-payments), is maximized on the basis of all agents’ valuations of the goods. Given each agent’s utility that would result from receiving good gi , every other allocation that assigns at least one good to another agent than one of the highest bidders would result in a lower utilitarian social welfare. Maximization of egalitarian social welfare follows immediately from maximization of utilitarian social welfare, since the MSAP makes each agent to receive a proportional share of utilitarian social welfare swu (X Σ ) according to his or her measure.
From an inter-agent perspective, each agent values every other agent’s bundle at most as much as his or her own, and thus an envy-free allocation is guaranteed. Moreover, no agent can be made better off without making any other agent worse off. Theorem 2. Every allocation obtained by the MSAP is Pareto-optimal and envy-free. Proof. The notion of efficiency implies that there is no better overall outcome for the set of agents involved [3]. Taking this statement into account, Pareto-optimality follows immediately from Theorem 1. Envy-freeness is easy to see when considering that each good gi in G is allocated to one of the agents that bid the highest value u (gi ), and that the very same agent has to make side-payments (each valued (1/n) · u (gi )) to the n − 1 other agents.5 Since each of the n − 1 other agents values good gi at most u (gi ), this guarantees that none of those n − 1 agents envies this agent for having received good gi . Concerning goods that are sold by the CA, each agent is receiving the same proportional monetary share of the profit made, and hence, in this case too, no envy is created.
5
For the sake of convenience, all side-payments of agent a j sum up to one final side-payment c j .
A Market-Affected Sealed-Bid Auction Protocol
201
Budget balance of the MSAP (in the sense of Definition 4) follows immediately from steps C3, C4, and C5 as given in Figure 1. Corollary 1. The MSAP is budget-balanced. Our last result shows that the market-affected sealed-bid auction protocol presented in Figure 1 is weakly strategy-proof (in the sense of Definition 5). Theorem 3. The MSAP is weakly strategy-proof. Proof. With reference to any good gi in G, a “cheater” (i.e., an agent in A not telling the truth) could cheat by either stating a higher or a lower than his or her true utility value. In the former case, if the cheater wins the bid for gi , he or she, just as all other agents, will obtain a higher individual welfare, but at the expense of the cheater as he or she has to compensate for the difference—a difference which in fact is only fictitious. That is, the cheater has to pay compensations out of a fund that does not exist and which is based on untruthful values only. Consequently, all agents but the cheater would benefit from this type of cheating. A cheating attempt would be reasonable only if a utility value lower than the true one is stated. Referring to this, if an agent’s true utility value is higher than the market price M(gi ) (i.e., if this agent wants good gi for him- or herself due to its high significance), he or she is motivated not to cheat by stating a lower utility value, since in this case he or she may end up not getting gi at all. On the other hand, if an agent that is not interested in keeping good gi would cheat by stating a lower than the true utility value, this cheating attempt would succeed if, firstly, the cheater has the highest bid, and secondly, the cheater’s bid is at least as high as profit PCA (gi ). In contrast, if the second condition is not fulfilled, good gi would be sold by the CA causing a decreased individual welfare for all agents, including the cheater. This is the reason why for all gi in G selling cost SCA (gi ) and profit PCA (gi ) are not disclosed, aiming to motivate all agents to state true utility values as otherwise they would risk their share. To sum up, trying to cheat by stating a higher than the true utility value is of no advantage to the cheater, and trying to cheat by stating a lower than the true utility value always bears the risk of ending up with even less. Thus, the MSAP is weakly strategy-proof.
6 Conclusions We have proposed a new market-affected sealed-bid auction protocol that can be applied to a set of any number of agents and a set of any number of goods. We have shown this budget-balanced and weakly strategy-proof protocol to possess nice properties such as always providing an envy-free and Pareto-optimal allocation with maximal utilitarian and egalitarian social welfare. In addition, this protocol guarantees each agent to receive a bundle that is worth at least as much as a proportional share of all goods, according to his or her measure. These advantages notwithstanding, we mention the following limitations of this protocol. Depending on the scenario, agents not selling those goods received may need to have some cash at hand or to hold sufficient liquid assets in order to make side-payments. In this regard, poorer agents not holding sufficient liquid assets could, to play safe, accept each good for selling only, and by this avoid any trouble.
202
C. Lindner
However, the total of side-payments to be made by an agent never exceeds the value of all to this agent assigned goods, and thus each agent is guaranteed to gain in individual welfare by the application of the MSAP. Moreover, having huge amounts of the same good to be sold may have an impact on the market price of this good. Note also that weak strategy-proofness ( in the sense of Definition 5) is a quite softer concept than the common notion of strategy-proofness. In terms of future work, one direction to go could be the involvement of each agent’s wealth by using weighted utility values for all significant goods. In this way, poorer agents could be motivated to make bids that reflect each good’s true significance; rather than, fearing side-payments, to accept each good for selling only.
References 1. Aragones, E.: A solution to the envy-free selection problem in economies with indivisible goods. Technical Report 984, Northwestern University, Center for Mathematical Studies in Economics and Management Science (April 1992) 2. Bevi´a, C.: Fair allocation in a general model with indivisible goods. Review of Economic Design 3(3), 195–213 (1998) 3. Brams, S., Taylor, A.: Fair Division: From Cake-Cutting to Dispute Resolution. Cambridge University Press, Cambridge (1996) 4. Cassiman, B., Sieber, S.: The impact of the internet on market structure. Technical Report D/467, IESE Business School (July 2002) 5. Chevaleyre, Y., Dunne, P., Endriss, U., Lang, J., Lemaˆıtre, M., Maudet, N., Padget, J., Phelps, S., Rodr´ıguez-Aguilar, J., Sousa, P.: Issues in multiagent resource allocation. Informatica 30, 3–31 (2006) 6. Conitzer, V., Sandholm, T., Santi, P.: Combinatorial auctions with k-wise dependent valuations. In: Proceedings of the 20th National Conference on Artificial Intelligence, pp. 248– 254. AAAI Press, Menlo Park (2005) 7. Dunne, P., Wooldridge, M., Laurence, M.: The complexity of contract negotiation. Artificial Intelligence 164(1–2), 23–46 (2005) 8. Groves, T.: Incentives in teams. Econometrica 41(4), 617–631 (1973) 9. Ohseto, S.: Implementing egalitarian-equivalent allocation of indivisible goods on restricted domains. Journal of Economic Theory 23(3), 659–670 (2004) 10. P´apai, S.: Groves sealed bid auctions of heterogeneous objects with fair prices. Social Choice and Welfare 20(3), 371–385 (2003) 11. Sandholm, T., Suri, S., Gilpin, A., Levine, D.: Winner determination in combinatorial auction generalizations. In: Proceedings of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 69–76. ACM Press, New York (2002) 12. Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, New York (2009) 13. Steinhaus, H.: The problem of fair division. Econometrica 16, 101–104 (1948) 14. Svensson, L.: Large indivisibles: An analysis with respect to price equilibrium and fairness. Econometrica 51(4), 939–954 (1983) 15. Tadenuma, K., Thomson, W.: No-envy and consistency in economies with indivisible goods. Econometrica 59(6), 1755–1767 (1991) 16. Tadenuma, K., Thomson, W.: Games of fair division. Games and Economic Behavior 9(2), 191–204 (1995) 17. Willson, S.: Money-egalitarian-equivalent and gain-maximin allocations of indivisible items with monetary compensation. Social Choice and Welfare 20(2), 247–259 (2003)
A Sparse Spatial Linear Regression Model for fMRI Data Analysis Vangelis P. Oikonomou and Konstantinos Blekas Department of Computer Science, University of Ioannina P.O. Box 1186, Ioannina 45110 - GREECE {voikonom,kblekas}@cs.uoi.gr
Abstract. In this study we present an advanced Bayesian framework for the analysis of functional Magnetic Resonance Imaging (fMRI) data that simultaneously employs both spatial and sparse properties. The basic building block of our method is the general linear model (GML) that constitute a well-known probabilistic approach for regression. By treating regression coefficients as random variables, we can apply an appropriate Gibbs distribution function in order to capture spatial constraints of fMRI time series. In the same time, sparse properties are also embedded through a RVM-based sparse prior over coefficients. The proposed scheme is described as a maximum a posteriori (MAP) approach, where the known Expectation Maximization (EM) algorithm is applied offering closed form update equations. We have demonstrated that our method produces improved performance and enhanced functional activation detection in both simulated data and real applications.
1
Introduction
Functional magnetic resonance imaging (fMRI) measures the tiny metabolic changes that take place in an active part of the brain. It is becoming a common diagnostic method of the behavior of a normal, diseased or injured brain, as well as for assessing the potential risks of surgery or other invasive treatments of the brain. Functional MRI is based on the increase in blood flow to the local vasculature that accompanies neural activity of the brain [1]. When neurons are activated, the resulting increased need for oxygen is overcompensated by a large increase in perfusion. As a result, the venous oxyhemoglobin concentration increases and the deoxyhemoglobin concentration decreases. The latter has paramagnetic properties and the intensity of the fMRI images increases in the activated areas. The signal in the activated voxels increases and decreases according to the paradigm. fMRI detects changes of deoxyhemoglobin levels and generates blood oxygen level dependent (BOLD) signals related to the activation of the neurons [1]. The fMRI data analysis consists of two basic stages: preprocessing and statistical analysis. The first stage is usually carried out in four steps: slice timing, motion correction, spatial normalization and spatial smoothing [1]. Statistical analysis can be done using the parametric general linear regression model S. Konstantopoulos et al. (Eds.): SETN 2010, LNAI 6040, pp. 203–212, 2010. c Springer-Verlag Berlin Heidelberg 2010
204
V.P. Oikonomou and K. Blekas
(GLM) [2] under a Maximum Likelihood (ML) framework for parameter estimation. Sequentially, the t or F statistic is used on order to form a so-called statistical parametric map (SPM) that maps the desired active areas. A significant drawback of the basic GLM approach is that spatial and temporal properties of fMRI data are not taken into account. However, it is well known that the BOLD signal is constrained spatially due to its physiological nature and preprocessing steps such as realignment and spatial normalization [1]. Within the literature there are several methods that incorporate spatial and temporal correlations into the estimation procedure. A common approach is to apply Gaussian filter smoothing or adaptive thresholding techniques that adjust statistical significance of active regions, according to their size. Alternatively, spatial characteristics of fMRI can be naturally described in a Bayesian framework through the use of Markov Random Fields (MRF) priors [3, 4] and autoregressive (AR) spatio-temporal models [5, 6]. The estimation process of most of these works is achieved by either Markov Chain Monte Carlo (MCMC), or Variational Bayes framework. An alternative methodology has been presented in [7], where the image of the regression coefficient is first spatially decomposed using wavelets, and secondly a sparse prior is applied over the wavelet coefficients. Apart from spatial another desired property of analysis is to embody a mechanism that automatically selects the model order. This is a very important issue in many model based applications including regression. If the order of the regressor model is too large it may overfit the observations and does not generalize well. On the other hand, if it is too small it might miss trends in the data. Sparse Bayesian regression offers a solution to the above problem [8, 9] by introducing sparse priors on the model parameters. In this paper we propose a model-based framework that simultaneously employs both spatial and sparse properties in a more systematic way. The basic regression model GLM can be spatially constrained by considering that the regression coefficients follow a Gibbs distribution [10]. By using then a modification of the clique potential function, we can allow the incorporation of sparse properties based on the notion of Relevance Vector Machine (RVM) [8]. A maximum a posteriori expectation maximization algorithm (MAP-EM) [11] is applied next to train this model. This is very efficient since it leads to update rules of model parameters in closed form during the M -step and improves data fitting. The performance of the proposed methodology is evaluated using a variety of simulated and real datasets. Comparison has been made using the typical maximum likelihood (ML) and the spatially variant alone regression model. As the experimental study has showed, the proposed method is more flexible and robust providing with quantitatively and qualitatively superior results. In section 2 we briefly describe the general linear model and its spatially variant version by setting a Gibbs prior. The proposed simultaneous spatial and sparse regression model is then presented in section 3 and the MAP-based learning procedure. To assess the performance of the proposed methodology we present in section 4 numerical experiments with artificial and real fMRI datasets. Finally, in section 5 we give conclusions and suggestions for future research.
2  A Spatially Variant Generalized Linear Regression Model
Suppose we are given a set of N fMRI time series Y = {y_1, ..., y_N}, where each observation y_n is a sequence of M values over time, i.e. y_n = {y_{nm}}_{m=1}^M. The Generalized Linear Model (GLM) assumes that the fMRI time series y_n are described in the following manner:

y_n = \Phi w_n + e_n ,    (1)
where Φ is the design matrix of size M × D and w_n is the vector of the D regression coefficients, which are unknown and must be estimated. Moreover, the last term e_n in Eq. 1 is an M-dimensional vector determining the error term, which is assumed to be Gaussian with zero mean, independent over time, with precision (inverse variance) λ_n, i.e. e_n ∼ N(0, λ_n^{-1} I). The design matrix Φ contains explanatory variables that describe various experimental factors. In block design experiments it usually has one regressor for the BOLD response plus a constant mean term, i.e. it is a two-column matrix. However, we can expand it with regressors related to other components of the fMRI time series, such as drift and movement effects [6]. In fMRI data analysis the goal is to find the involvement of the experimental factors in the generation process of the time series, which is achieved through the estimation of the coefficients w_n. Since Φw_n is deterministic, we can model the probability density of the sequence y_n with the normal distribution p(y_n | w_n, λ_n) = N(Φw_n, λ_n^{-1} I). Thus, the problem becomes a maximum likelihood (ML) estimation problem for the regression parameters Θ = {w_n, λ_n}_{n=1}^N. The maximization of the log-likelihood function

L_{ML}(\Theta) = \sum_{n=1}^{N} \log p(y_n | w_n, \lambda_n) = \sum_{n=1}^{N} \left[ \frac{M}{2} \log \lambda_n - \frac{\lambda_n}{2} \| y_n - \Phi w_n \|^2 \right]    (2)
leads to the following rules:

\hat{w}_n = (\Phi^T \Phi)^{-1} \Phi^T y_n , \qquad \hat{\lambda}_n = \frac{M}{\| y_n - \Phi \hat{w}_n \|^2} .    (3)
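For illustration only, a minimal NumPy sketch of the per-voxel ML estimates in Eq. (3) could look as follows; the array shapes and the way the design matrix is built are assumptions, not part of the original experiments.

```python
import numpy as np

def ml_glm(Y, Phi):
    """Per-voxel ML estimates for the GLM y_n = Phi w_n + e_n.

    Y   : (N, M) array, one fMRI time series of length M per voxel
    Phi : (M, D) design matrix shared by all voxels
    Returns (W, lam): regression coefficients (N, D) and noise precisions (N,).
    """
    M = Phi.shape[0]
    # w_n = (Phi^T Phi)^{-1} Phi^T y_n, computed for all voxels at once
    W = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y.T).T        # (N, D)
    residual = Y - W @ Phi.T                                # (N, M)
    lam = M / np.sum(residual ** 2, axis=1)                 # lambda_n = M / ||y_n - Phi w_n||^2
    return W, lam
```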
After the estimation procedure, we calculate the t-statistic for each voxel in order to draw the statistical map and identify the activation regions. The fMRI data are biologically generated by structures that involve spatial properties, since adjacent voxels tend to have similar activation levels [12]. Moreover, the produced ML-based activation maps contain many small activation islands, so there is a need for spatial regularization. The Bayesian formulation offers a natural platform for automatically incorporating these ideas. We assume that the vector of coefficients w_n follows a Gibbs density function of the following form:

p(w_n | \beta_n) \propto \beta_n^{|N_n|} \exp\left( - \frac{\beta_n}{2} \sum_{k \in N_n} \| w_n - w_k \|^2 \right) ,    (4)
where β_n is the regularization parameter. The summation term denotes the clique potential function within the neighborhood N_n of the n-th voxel, i.e. the horizontally, vertically or diagonally adjacent voxels, while the first term β_n^{|N_n|} acts as a normalizing factor. In addition, a Gamma prior is imposed on the regularization parameter β_n as well as on the noise precision parameter λ_n, with Gamma parameters {c_β, b_β} and {c_λ, b_λ}, respectively. The estimation problem can now be formulated as a maximum a posteriori (MAP) approach, in the sense of maximizing the posterior of Θ = {w_n, β_n, λ_n}_{n=1}^N:

L_{MAP}(\Theta) = \sum_{n=1}^{N} \left[ \log p(y_n | w_n, \lambda_n) + \log p(w_n | \beta_n) + \log p(\beta_n) + \log p(\lambda_n) \right] .    (5)
It can easily be found that the maximization problem leads to the following update rules:

\hat{w}_n = (\lambda_n \Phi^T \Phi + B_n)^{-1} (\lambda_n \Phi^T y_n + BW_n) ,    (6)

\hat{\beta}_n = \frac{|N_n| + c_\beta}{\frac{1}{2} \sum_{k \in N_n} \| \hat{w}_n - \hat{w}_k \|^2 + b_\beta} ,    (7)

\hat{\lambda}_n = \frac{M + c_\lambda}{\frac{1}{2} \| y_n - \Phi \hat{w}_n \|^2 + b_\lambda} ,    (8)

where B_n = \sum_{k \in N_n} (\beta_n + \beta_k) I and BW_n = \sum_{k \in N_n} (\beta_n + \beta_k) w_k determine the contribution of the neighbors inside the clique. Equations 6-8 are applied iteratively until the convergence of the MAP log-likelihood function. The above scheme can also be described within an Expectation-Maximization (EM) framework [11], where the E-step computes the expectation of the hidden variables (w_n), which are then used for updating the model parameters during the M-step. This approach will be referred to in the following as SVGLM.
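A single-voxel sketch of the iterative SVGLM updates of Eqs. (6)-(8) is given below; the data structures (neighbour lists, per-voxel arrays) and the default hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def svglm_update(y_n, Phi, w, beta, lam, neighbors, n,
                 c_beta=1.0, b_beta=1.0, c_lam=1e-8, b_lam=1e-8):
    """One MAP-EM update of (w_n, beta_n, lambda_n) following Eqs. (6)-(8).

    y_n       : (M,) time series of voxel n
    Phi       : (M, D) design matrix
    w         : (N, D) current regression coefficients of all voxels
    beta, lam : (N,) current regularization and noise-precision parameters
    neighbors : dict mapping voxel index -> list of neighbouring voxel indices
    """
    M, D = Phi.shape
    Nn = neighbors[n]
    # B_n = sum_k (beta_n + beta_k) I,  BW_n = sum_k (beta_n + beta_k) w_k
    coeffs = beta[n] + beta[Nn]                               # (|Nn|,)
    B_n = np.sum(coeffs) * np.eye(D)
    BW_n = coeffs @ w[Nn]                                     # (D,)
    w_n = np.linalg.solve(lam[n] * Phi.T @ Phi + B_n,
                          lam[n] * Phi.T @ y_n + BW_n)        # Eq. (6)
    beta_n = (len(Nn) + c_beta) / (
        0.5 * np.sum((w_n - w[Nn]) ** 2) + b_beta)            # Eq. (7)
    lam_n = (M + c_lam) / (
        0.5 * np.sum((y_n - Phi @ w_n) ** 2) + b_lam)         # Eq. (8)
    return w_n, beta_n, lam_n
```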
3  Simultaneous Sparse and Spatial Regression
A desired property of the linear regression model is to offer an automatic mechanism that zeroes out the coefficients that are not significant and maintains only the large coefficients that are considered significant by the model. Moreover, an important issue when using the regression model is how to define its order D. The problem can be tackled using the Bayesian regularization method that has been successfully employed in the Relevance Vector Machine (RVM) model [8]. In order to capture both spatial and sparse properties over the regression coefficients, the Gibbs distribution function needs to be reformulated. This can be accomplished by using the following Gibbs density function:

p(w_n | \beta_n, z_n, \alpha_n) \propto \beta_n^{|N_n|} \prod_{k \in N_n} z_{nk} \prod_{d=1}^{D} \alpha_{nd}^{1/2} \exp\left( - \frac{1}{2} \left[ V^{(1)}_{N_n}(w_n) + V^{(2)}_{N_n}(w_n) \right] \right) .    (9)
The first term in the exponential part of this function is the sparse term, used for describing local relationships of the n-th voxel coefficients. This is given by:

V^{(1)}_{N_n}(w_n) = w_n^T A_n w_n ,    (10)
where A_n is a diagonal matrix containing the D elements of the hyperparameter vector α_n = (α_{n1}, ..., α_{nD})^T. By imposing a Gamma prior over the hyperparameters, a two-stage hierarchical prior is achieved, which is actually a Student-t distribution with heavy tails [8]. This scheme enforces most α_{nd} to be large, so that the corresponding coefficients w_{nd} are set to zero and finally eliminated. The second term of the exponential part (Eq. 9) captures the spatial property and is responsible for the clique potential of the n-th voxel:

V^{(2)}_{N_n}(w_n) = \beta_n \sum_{k \in N_n} z_{nk} \| w_n - w_k \|^2 .    (11)
In comparison with the potential function of the SVGLM method (Eq. 4), here each neighbor contributes with a different weight, denoted by the parameters z_{nk}, to the computation of the clique energy value. The introduction of these weights can increase the flexibility of spatial modeling. As shown experimentally, this can prove advantageous in cases around the borders of activation regions (edges). Finally, the first part of Eq. 9 acts as a normalization factor. We also assume that the regularization parameter β_n, the noise precision λ_n and the weights z_{nk} follow Gamma distributions. Training of the proposed model is therefore converted into a MAP estimation problem for the set of model parameters Θ = {θ_n}_{n=1}^N = {w_n, β_n, λ_n, z_n, α_n}_{n=1}^N:

L_{MAP}(\Theta) = \sum_{n=1}^{N} \left[ \log p(y_n | \theta_n) + \log\{ p(w_n | \beta_n, z_n, \alpha_n) p(\beta_n) p(\lambda_n) p(z_n) p(\alpha_n) \} \right] .    (12)
By setting the partial derivatives equal to zero, the following closed-form update rule for the regression coefficients can be obtained:

\hat{w}_n = (\lambda_n \Phi^T \Phi + BZ_n + A_n)^{-1} (\lambda_n \Phi^T y_n + BZW_n) ,    (13)

where the matrices BZ_n and BZW_n are BZ_n = \beta_n \sum_{k \in N_n} (z_{nk} + z_{kn}) I and BZW_n = \beta_n \sum_{k \in N_n} (z_{nk} + z_{kn}) w_k. For the other model parameters we have:

\hat{\beta}_n = \frac{|N_n| + c_\beta}{\frac{1}{2} \sum_{k \in N_n} z_{nk} \| w_n - w_k \|^2 + b_\beta} ,    (14)

\hat{z}_{nk} = \frac{1 + c_z}{\frac{1}{2} \hat{\beta}_n \| \hat{w}_n - \hat{w}_k \|^2 + b_z} ,    (15)

\hat{\alpha}_{nd} = \frac{1 + 2 c_\alpha}{\hat{w}_{nd}^2 + 2 b_\alpha} ,    (16)
while the noise precision λ_n has the same form as previously defined for SVGLM (Eq. 8). The whole procedure can be integrated into an EM framework, where the
expectations of the regression coefficients are computed in the E-step (Eq. 13), and the maximization of the complete-data log-likelihood is performed during the M-step (Eqs. 14-16), giving the update equations for the model parameters. The above scheme is applied iteratively until the convergence of the MAP function. Notice that in the above equations we took into consideration that the weights of the n-th voxel occur more than once in the summation terms: once as the central voxel and |N_n| times as a neighbor of other voxels. We call this method SSGLM.
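The additional SSGLM hyperparameter updates of Eqs. (14)-(16) can be sketched in the same spirit; again this is a simplified, single-voxel view under assumed data structures (dict-based neighbour weights, NumPy arrays), not the authors' code.

```python
import numpy as np

def ssglm_hyperparam_update(w, beta, z, n, neighbors,
                            c_beta=1.0, b_beta=1.0,
                            c_z=1.0, b_z=1.0, c_a=1e-8, b_a=1e-8):
    """Update beta_n, z_nk and alpha_nd following Eqs. (14)-(16).

    w    : (N, D) regression coefficients
    beta : (N,) spatial regularization parameters
    z    : dict (n, k) -> weight of neighbour k in the clique of voxel n
    """
    Nn = neighbors[n]
    diff2 = {k: np.sum((w[n] - w[k]) ** 2) for k in Nn}
    # Eq. (14): spatial regularization parameter
    beta_n = (len(Nn) + c_beta) / (
        0.5 * sum(z[(n, k)] * diff2[k] for k in Nn) + b_beta)
    # Eq. (15): per-neighbour weights
    z_new = {(n, k): (1.0 + c_z) / (0.5 * beta_n * diff2[k] + b_z) for k in Nn}
    # Eq. (16): sparsity hyperparameters, one per regression coefficient
    alpha_n = (1.0 + 2.0 * c_a) / (w[n] ** 2 + 2.0 * b_a)
    return beta_n, z_new, alpha_n
```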
4  Experimental Results
We have tested the proposed method, SSGLM, using various simulated and real datasets. Comparison has been made with the simple ML method and with the SVGLM presented in Section 2. SVGLM and SSGLM were initialized in the same manner: first, the ML estimates of the regression coefficients w_n are obtained, and they are then used to initialize the remaining model parameters β_n, λ_n, z_{nk} and α_{nd}, according to Eqs. (14)-(16), respectively. During the experiments the parameters of the Gamma prior distributions were set to c_β = b_β = c_z = b_z = 1, c_λ = b_λ = 10^{-8} and b_α = c_α = 10^{-8} (making them non-informative, as suggested by the RVM methodology [8]).

4.1  Experiments with Simulated Data
The simulated datasets used in our experiments were created using the following generation mechanism. We applied a design matrix Φ of size M × 2 with two pre-specified regressors: the first one captures the BOLD signal (Fig. 1(a)), and the second one is a constant column of ones. Then, we constructed an image with the activation regions, which corresponds to the value of the first coefficient (w_{n1}), while the second coefficient w_{n2} had a constant value equal to 100. In our study we have used two such images of size 80 × 80 with different shapes of activation areas, rectangular (Fig. 1(b)) and circular (Fig. 1(c)), respectively. The time series data y_n were finally produced by using the generative equation of the GLM (Eq. 1) with additive white Gaussian noise at various signal-to-noise ratio (SNR) levels, where we performed 50 runs and computed their mean performance.
Fig. 1. Simulated data generative features: (a) Bold signal, (b) rectangular and (c) circular shape image of true activated areas
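A hedged sketch of this generation mechanism is shown below. The exact BOLD regressor, activation masks and SNR definition used by the authors are not reproduced; a crude boxcar regressor, a rectangular mask and a simple dB-based noise scaling are used as stand-ins.

```python
import numpy as np

def simulate_fmri(M=84, size=80, w_active=1.0, w_const=100.0,
                  snr_db=-20.0, seed=0):
    """Generate synthetic time series following Eq. (1): y_n = Phi w_n + e_n."""
    rng = np.random.default_rng(seed)
    # Stand-in BOLD regressor: a simple on/off paradigm, crudely smoothed
    bold = np.zeros(M)
    bold[::14] = 1.0
    bold = np.convolve(bold, np.hanning(8), mode="same")
    Phi = np.column_stack([bold, np.ones(M)])                 # (M, 2) design matrix
    # Activation image: rectangular active region, constant second coefficient
    w1 = np.zeros((size, size)); w1[20:40, 20:40] = w_active
    w2 = np.full((size, size), w_const)
    W = np.stack([w1.ravel(), w2.ravel()], axis=1)            # (N, 2)
    signal = W @ Phi.T                                        # (N, M)
    # Additive white Gaussian noise at the requested SNR (assumed dB definition)
    sig_power = np.mean((w1.ravel()[:, None] * bold) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10.0))
    Y = signal + rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return Y, Phi, w1
```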
Evaluation was done using two criteria: 1) the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC), based on t-statistic calculations, and 2) the normalized mean square error (NMSE) between the estimated and the true coefficients responsible for the BOLD signal. We present in Table 1 the comparative performance results in terms of the above two criteria for several SNR values, in the case of rectangular and circular activation regions, respectively. As can be seen, the proposed spatial sparse model (SSGLM) improves the quality of functional activation detection, especially for the lower SNR values examined. In all cases both MAP-based approaches perform significantly better than the simple ML method. Figure 2 presents the mapping results of a typical run in the case of SNR = -20 dB. As can be seen, the proposed SSGLM approach manages to construct much smoother maps of brain activity than the spatial SVGLM model.
Table 1. Comparative results for simulated data in various noisy environments

                  circular areas                             rectangular areas
                  AUC                     NMSE               AUC                     NMSE
SNR     SSGLM  SVGLM  ML       SSGLM  SVGLM  ML        SSGLM  SVGLM  ML       SSGLM  SVGLM  ML
  0     0.999  0.999  0.999    0.118  0.177  0.294     0.998  0.995  0.980    0.129  0.170  0.255
 -5     0.998  0.999  0.929    0.551  0.464  0.933     0.998  0.992  0.819    0.415  0.318  0.812
-10     0.998  0.998  0.795    0.704  0.633  1.642     0.995  0.991  0.712    0.541  0.478  1.439
-15     0.986  0.988  0.674    0.807  0.802  2.948     0.978  0.972  0.624    0.641  0.665  2.554
-20     0.920  0.914  0.600    0.993  1.084  5.214     0.898  0.883  0.570    0.855  0.971  4.579
-30     0.763  0.724  0.558    1.748  1.854  9.257     0.747  0.716  0.536    1.437  1.641  8.074

Fig. 2. An example of the statistical maps produced by the three comparative methods (SSGLM, SVGLM, ML) for two kinds of activity: (a) rectangular and (b) circular. The SNR value is -20 dB. (Panel annotations for the run shown, in the order SSGLM, SVGLM, ML: rectangular – AUC = 0.9969, NMSE = 0.70787; AUC = 0.99759, NMSE = 0.73026; AUC = 0.70245, NMSE = 3.0523; circular – AUC = 0.9604, NMSE = 0.57853; AUC = 0.95256, NMSE = 0.59935; AUC = 0.63168, NMSE = 2.6525.)
Fig. 3. Maps of the estimated BOLD signal (w_{n1}) obtained by the three methods (SSGLM, SVGLM, ML)
What is interesting to observe is that the SVGLM method has the tendency to overestimate the activation areas and to discover larger regions than their true size. The proposed SSGLM exhibits very clean edges between activated and non-activated areas, and thus a clear visual improvement. Finally, the ML approach completely fails to discover any activation pattern in this experiment.

4.2  Experiments with Real fMRI Data
The proposed approach was also evaluated on real applications. Experiments were made using a block-design real fMRI dataset, downloaded from the SPM web page (http://www.fil.ion.ucl.ac.uk/spm/), which was designed for an auditory processing task on a healthy volunteer. In our study, we followed the standard preprocessing steps of the statistical parametric mapping (SPM) package manual, which are realignment, segmentation and spatial normalization, without performing the spatial smoothing step. We selected slice 29 of this dataset for the experiments. Figure 3 presents the maps of the BOLD signal (regression coefficients w_{n1}) as estimated by the three comparative approaches SSGLM, SVGLM and ML. As can be seen, the proposed SSGLM approach achieves significantly smoother results, where brain activity is found in the auditory cortex, as expected. In addition, the produced activation areas are less noisy and very clean in comparison with those produced by the SVGLM, which overestimates the brain activity, thus making the decision harder. On the other hand, the resulting map of the ML method is confused, without showing any significant distinction between the activated and non-activated areas. Moreover, we find it useful to visually inspect the resulting activation maps obtained by the t-test. In Figure 4 the SPMs of each method are shown, calculated without setting a threshold (Figure 4a) or by using a threshold (t_0 = 1.6) on the t-value (Figure 4b). Notice that the activation maps of the SSGLM approach are similar in both cases, which makes our approach less sensitive to the threshold value. The latter becomes more apparent by plotting in Figure 5(a) the estimated size (number of voxels) of the activation areas for each method in terms of the threshold value t_0. This behavior can prove very useful, since there is no need to resort to multiple comparisons between t-tests.
Fig. 4. Statistical parametric maps from the t-statistics (a) without and (b) with a threshold value t_0 = 1.6 (panels: SSGLM, SVGLM, ML)
Fig. 5. (a) Plots of the estimated number of activated voxels in terms of threshold value used for producing the SPMs. (b) Plots of the t-values as computed by comparative methods SSGLM (thick line) and SVGLM (thin line).
This can also be seen in Figure 5(b), where we plot the calculated t-values of the SSGLM and SVGLM methods. The distinction between the activated and non-activated areas is much more apparent in the SSGLM plot.
5  Conclusions
In this work we present an advanced regression model for fMRI time series analysis by incorporating both spatial correlations and sparse capabilities. This is done by using an appropriate prior over the regression coefficients based on the MRF and the RVM schemes. Training is achieved through a maximum a posteriori (MAP) framework that allows the EM algorithm to be effectively used for
estimating the model parameters. This has the advantage of establishing update rules in closed form during the M-step, and thus data fitting is computationally efficient. Experiments on artificial and real datasets have demonstrated the ability of the proposed approach to improve detection performance by providing cleaner and more accurate estimates. We are planning to experiment with an extended kernel design matrix and to improve its specification by an adaptation mechanism, as well as to examine the appropriateness of other types of sparse priors [9].
References

1. Frackowiak, R.S.J., Ashburner, J.T., Penny, W.D., Zeki, S., Friston, K.J., Frith, C.D., Dolan, R.J., Price, C.J.: Human Brain Function, 2nd edn. Elsevier Science, USA (2004)
2. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2007)
3. Descombes, X., Kruggel, F., von Cramon, D.Y.: fMRI signal restoration using a spatio-temporal Markov Random Field preserving transitions. NeuroImage 8, 340–349 (1998)
4. Gossl, C., Auer, D.P., Fahrmeir, L.: Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics 57, 554–562 (2001)
5. Woolrich, M.W., Jenkinson, M., Brady, J.M., Smith, S.M.: Fully Bayesian spatio-temporal modeling of fMRI data. IEEE Transactions on Medical Imaging 23(2), 213–231 (2004)
6. Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J.: Bayesian fMRI time series analysis with spatial priors. NeuroImage 24, 350–362 (2005)
7. Flandin, G., Penny, W.: Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage 34, 1108–1125 (2007)
8. Tipping, M.E.: Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research 1, 211–244 (2001)
9. Seeger, M.: Bayesian Inference and Optimal Design for the Sparse Linear Model. Journal of Machine Learning Research 9, 759–813 (2008)
10. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence 6, 721–741 (1984)
11. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38 (1977)
12. Harrison, L.M., Penny, W., Daunizeau, J., Friston, K.J.: Diffusion-based spatial priors for functional magnetic resonance images. NeuroImage 41(2), 408–423 (2008)
A Reasoning Framework for Ambient Intelligence

Theodore Patkos, Ioannis Chrysakis, Antonis Bikakis, Dimitris Plexousakis, and Grigoris Antoniou

Institute of Computer Science, FO.R.T.H.
{patkos,hrysakis,bikakis,dp,antoniou}@ics.forth.gr
Abstract. Ambient Intelligence is an emerging discipline that requires the integration of expertise from a multitude of scientific fields. The role of Artificial Intelligence is crucial not only for bringing intelligence to everyday environments, but also for providing the means for the different disciplines to collaborate. In this paper we describe the design of a reasoning framework, applied to an operational Ambient Intelligence infrastructure, that combines rule-based reasoning with reasoning about actions and causality on top of ontology-based context models. The emphasis is on identifying the limitations of the rule-based approach and the way action theories can be employed to fill the gaps.
1  Introduction
The Ambient Intelligence (AmI) paradigm has generated an enabling multidisciplinary research field that envisages bringing intelligence to everyday environments and facilitating human interaction with devices and the surroundings. Artificial Intelligence has a decisive role to play in the realization of this vision, promising commonsense reasoning and better decision making in dynamic and highly complex conditions, as advocated by recent studies [1]. Within AmI environments human users do not experience the functionalities of smart spaces passively; instead, they participate actively in them by performing actions that change their state in different ways. At the same time, the smart space itself and its devices are expected to perform actions and generate plans, either in response to changes in the context or to predict user desires and adapt to user needs. In this paper we describe the design of a reasoning framework intended for use in an AmI infrastructure that is being implemented in our institute. The framework integrates Semantic Web technologies for representing contextual knowledge with rule-based and causality-based reasoning methodologies for supporting a multitude of general-purpose and domain-specific reasoning tasks imposed by the AmI system. Given that during the first phases of this project we have fully implemented the rule-based features of the reasoner and applied them in practice,
This work has been supported by the FORTH-ICS internal RTD Programme “Ambient Intelligence and Smart Environments”.
the present study concentrates primarily on the identified limitations concerning certain challenging issues of this new field, and on the way action theories can be employed to offer efficient solutions. Action theories, a fundamental field of research within KR&R, are formal tools for performing commonsense reasoning in dynamic domains. In this paper we present how they can contribute to the AmI vision and what types of problems they can resolve. We report on our experiences in developing the proposed solutions as distinct functionalities, but also describe how we plan to integrate them in the overall framework in order to provide a powerful hybrid reasoning tool for application in real-world situations. Our objective is to illustrate the impact of combining logic-based AI methods for addressing a broad range of practical issues. The paper is organized as follows. We first describe the overall architecture of the framework and continue with the tasks assigned to the rule-based reasoning component. In section 4 we elaborate on the contribution of causality-based approaches to AmI, and we conclude with a discussion of related work.
2  Event-Based Architecture
Fig. 1. Event management framework architecture

The design goals of the reasoning framework have been the efficient representation, monitoring and dissemination of any low- or high-level contextual information in our AmI infrastructure, as well as the support for a number of general-purpose and domain-specific inferencing tasks. For that purpose we deploy the hybrid event-based reasoning architecture shown in Fig. 1, that comprises four main components: the Event Manager that receives and processes
incoming events from the ambient infrastructure, the Reasoner that can perform both rule-based and causality-based reasoning, the Knowledge Base that stores semantic information represented using ontology-based languages, and the Communication Module that forwards Reasoner requests for action execution to the appropriate services. A middleware layer undertakes the role of connecting applications and services implemented by different research groups and with different technologies. Services denote standalone entities that implement specific functionalities about world aspects, such as voice recognition, localization, light management etc., whereas applications group together service instances to provide an AmI experience in smart rooms. The semantic distinction between events and actions is essential for coordinating the behavior of AmI systems. Events are generated by services and declare changes in the state of context aspects. In case these aspects can be modified on demand, e.g., CloseDoor(doorID), the allocated service also provides the appropriate interface for performing the change; otherwise the service plays the role of a gateway for monitoring environmental context acquired from sensors or devices. Actions, on the other hand, reflect the desire of an application for an event to occur. In a sense, events express atomic, transient facts that have occurred, while actions can either be requests for atomic events or complex combinations of event occurrences according to certain operators that form compound events, i.e., sets of event patterns. It is the responsibility of the reasoner to examine individual actions before allowing their execution and to guarantee that the state of the system remains consistent at all times. Implementation: Our framework is part of a large-scale AmI facility that is being implemented at FORTH and has completed its first year of life. It spans a three-room setup, where a multitude of hardware and software technologies contribute services, such as camera network support for 2D person localization and 3D head pose estimation, RFID, iris and audio sensors for person identification and speech recognition, multi-protocol wireless communications etc. The middleware responsible for creating, connecting and consuming the services is CORBA-based and provides support for the C++, Java, .NET, Python, and Flash/ActionScript languages. The rule-based component of the reasoner module uses Jess (http://www.jessrules.com/) as its reasoning engine, while the Validator component uses both Jess and the DEC Reasoner (http://decreasoner.sourceforge.net/), a SAT-based Event Calculus reasoner. The former is responsible for run-time action validation, while the latter performs more powerful reasoning tasks, such as specification analysis and planning. All available knowledge is encoded in OWL ontologies, using the Protégé platform for editing and the Protégé-OWL API for browsing and querying the corresponding models.
3  Context Information Modeling and Reasoning
The task of context management in an AmI environment requires an open framework to support seamless interoperability and mutual understanding of
the meaning of concepts among different devices and services. Ontology-based models are arguably the most enabling approach for modeling contextual information, satisfying the representation requirements set by many studies in terms of type and level of formality, expressiveness, flexibility and extensibility, generality, granularity and valid context constraining [2,3]. In our framework we design ontologies that capture the meaning and relations of concepts regarding low-level context acquired from sensors, high-level context inferred through reasoning, user and device profiling information, spatial features and resource characteristics (Fig. 1). An aspect of the framework that acknowledges the benefit of ontologies is the derivation of high-level context knowledge. Complex context, inferred by means of rule-based reasoning tasks on the basis of raw sensor data or other high-level knowledge, is based on ontology representations and may concern a user's emotional state, identity, intentions, location etc. For instance, the following rule specifies that a user is assumed to have left the main room and entered the adjacent warm-up room only if she was standing near the open main-room door and is no longer tracked by the localizer, which is installed only in this room:

(user (id ?u) (location DOORMAIN)) ∧ (door (id DOORMAIN) (state OPEN)) ∧ (event (type USERLOST) (user ?u)) ⇒ (user (id ?u) (location WARMUP))

Under a different context the same event will trigger a different set of rules, capturing for example the case where the user stands behind an obstacle. Rule-based reasoning is a commonly adopted solution for high-level context inference in AmI [4]. Within our system it contributes to the design of enhanced applications and also provides feedback to services for sensor-fusion purposes, in order to resolve conflicts of ambiguous or imprecise context and to detect erroneous context. Furthermore, rule-based reasoning is also employed to coordinate the overall system behavior and offer explanations; the operation is partitioned into distinct modes that invoke appropriate rulesets, according to the relevant context and the functionality that we wish to implement. Finally, this component also provides procedures for domain-specific reasoning tasks and for search and optimization problems driven by application demands, such as determining the best camera viewpoint to record a user interacting inside a smart room.
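For illustration only, the logic of the rule above can be emulated over simple fact records; in the deployed system such rules are Jess rules firing over ontology-backed facts, and the dictionary-based fact structure below is purely a hypothetical stand-in.

```python
# Simplified emulation of the "user entered the warm-up room" rule.

def infer_location(facts, event):
    """facts: current user/door state; event: a service-generated event record."""
    user = facts.get("user", {})
    door = facts.get("door", {})
    if (event.get("type") == "USERLOST"
            and event.get("user") == user.get("id")
            and user.get("location") == "DOORMAIN"
            and door.get("id") == "DOORMAIN"
            and door.get("state") == "OPEN"):
        user["location"] = "WARMUP"          # derived high-level context
    return facts

facts = {"user": {"id": "u1", "location": "DOORMAIN"},
         "door": {"id": "DOORMAIN", "state": "OPEN"}}
facts = infer_location(facts, {"type": "USERLOST", "user": "u1"})
print(facts["user"]["location"])             # -> WARMUP
```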
4  Causality-Based Reasoning in Ambient Intelligence
Event-based architectures offer opportunities for flexible processing of the information flow and of knowledge evolution over time. Rule-based languages provide only limited expressiveness for describing certain complex features, such as compound actions; therefore they do not fully exploit the potential of the event-based style to solve the challenging event processing tasks raised by ubiquitous computing domains. The Event-Condition-Action (ECA) paradigm that is most frequently applied can be used for reacting to detected events, viewing them as transient atomic instances and consuming them upon detection. Such rules do not consider the duration of events, how far into the past or future their effects extend, nor do they investigate causality issues originating from the fact that other events are known
to have occurred or are planned to happen. Paschke [5] has already shown that the ECA treatment may even result in unintended semantics for some compositions of event patterns. However, real-life systems demand formal semantics for verification and traceability purposes. To this end, we apply techniques for reasoning about actions and causality. Action theories are formal tools, based on the predicate calculus, particularly designed for addressing key issues of reasoning in dynamically changing worlds by axiomatizing the specifications of their dynamics and exploiting logic-based techniques in order to draw conclusions. Different formalisms have been developed that model action preconditions and effects and solve both deduction and abduction problems about a multitude of commonsense reasoning phenomena, such as effect ramifications, non-deterministic actions, qualifications and others. For our purposes we apply the Event Calculus formalism [6,7], which establishes a linear time structure enabling temporal reasoning in order to infer the intervals in which certain world aspects hold. The notion of time, inherently modeled within the Event Calculus, as opposed to other action theories, is a crucial leverage for event-based systems, e.g. to express partial ordering of events or timestamps. In the remaining subsections we describe our approach to integrating action theories with our Ambient Intelligence semantic infrastructure and present the types of reasoning problems that have been assigned to them.
4.1  Event Ontology and Complex Actions
For any large-scale event-based system it is important to identify event patterns that describe the structure of complex events built from atomic or other complex event instances. Their definition and processing must follow formal rules with well-defined operators, in order for their meaning to be understood by all system entities. In an Ambient Intelligence infrastructure in particular, this need is even more critical due to the multidisciplinary nature of the field and the demand for collaborative actions by entities with significantly different backgrounds. In order to promote a high-level description of the specifications of applications and to achieve a high degree of interoperability among services, we design an event ontology that captures the notions of atomic and compound events and defines operators among event sets, such as sequence, concurrency, disjunction and conjunction (Fig. 2). An event is further characterized by its initiation and termination occurrence times (for atomic events they coincide), the effect that it causes to context resources, the physical location at which it occurred, the service that detected or triggered it etc. Our intention is to focus only on generic event attributes that satisfy the objectives of our system, based on previous studies that define common top ontologies for events (e.g. [8]), rather than to reproduce a complete domain-independent top event ontology, which would be in large part out of focus and result in a less scalable and efficient implementation. To define formal semantics for the operators, we implement them as container elements that collect resources and translate them into Event Calculus axioms. Notice, for instance, the compound event e1 in Fig. 2, which expresses the partially ordered event type [[TurnOnLight; StartLocalizer] ∧ StartMapService]
Fig. 2. Event ontology sample. Event operators, such as sequenceSetOf and concurrentSetOf, are implemented as rdf:Seq, rdf:Bag and rdf:Alt container elements.
where (;) represents the sequence operator. In order to utilize e1 for reasoning tasks we axiomatize its temporal properties in the Event Calculus:

Happens(Start(e1), t) ≡ Happens(TurnOnLight(l), t1) ∧ Happens(StartLocalizer(), t2) ∧ Happens(StartMapService(), t3) ∧ (t1 < t2) ∧ (t = min(t1, t3))

(respectively for Happens(Stop(e1), t)), as well as its causal properties:

Initiates(Start(e1), LightOn(l), t) ∧ Terminates(Stop(e1), TrainingMode(), t)

We may formalize the effects of compound events to act cumulatively or to cancel the effects of their atomic components, and we can also specify whether certain effects hold at beginning times or at ending times. We currently model the duration of compound events in terms of their Start and Stop times, but we also investigate the potential of other approaches, such as the interval-based Event Calculus [5] or the three-argument Happens Event Calculus axiomatization [9]. The main advantages of the Event Calculus, in comparison to rule-based approaches, are its inherent ability to perform temporal reasoning considering both relative and absolute times of event occurrences, and the fact that it can reevaluate different variations of event patterns as time progresses. With ECA-style reactive rules, events are consumed as they are detected and cannot contribute to the detection of other complex events afterwards. Finally, the combination of semantic event representation and causality-based event processing makes the process of describing the specifications of applications much more convenient for the non-AI-expert developers in our system, without undermining the system's reasoning capabilities. As we show next, these descriptions are expressive enough to enable inferences about future world states during application execution, as well as the identification of potential system restriction violations.
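As a small, hedged illustration, the Happens(Start(e1), t) axiom above can be checked over a log of timestamped atomic event occurrences; the log format (a simple name-to-time mapping) is an assumption made only for this sketch.

```python
def start_of_e1(occurrences):
    """Return the start time of e1 = [[TurnOnLight; StartLocalizer] AND StartMapService],
    or None if the compound pattern has not (yet) occurred.

    occurrences: dict mapping atomic event name -> occurrence time
    """
    t1 = occurrences.get("TurnOnLight")
    t2 = occurrences.get("StartLocalizer")
    t3 = occurrences.get("StartMapService")
    if t1 is None or t2 is None or t3 is None:
        return None
    if not t1 < t2:              # sequence operator: TurnOnLight before StartLocalizer
        return None
    return min(t1, t3)           # t = min(t1, t3), per the axiom

print(start_of_e1({"TurnOnLight": 3, "StartLocalizer": 5, "StartMapService": 2}))  # -> 2
```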
4.2  Design-Time Application Verification
It is of particular significance to the management of an AmI system to separate the rules that govern its behavior from the domain-specific functionalities, in order to enable efficient and dynamic adaptation to changes during development. We implement a modular approach that distinguishes the rules expressing system policies and restrictions, which guarantee a consistent and error-free overall execution at all times, from service specifications, which change frequently and are edited by a multitude of users, as well as from application specifications, which are usually under the responsibility of non-experts who only possess partial knowledge about the system restrictions. Table 1 shows samples of the type of information that these specifications contain, expressed in Event Calculus axioms. The specifications of services, for instance, retrieved from the different ontologies, describe the domains and express inheritance relations, instantiations of entities, and potentially context-dependent effect properties. Application descriptions express primarily the intended behavior of a developed application as a narrative of context-dependent action occurrences. Finally, system restrictions capture assertions about attributes of system states that must hold for every possible system execution (sometimes also called safety properties [10]). A core task of our framework is to verify that the specifications of AmI applications are in compliance with the overall system restrictions and to detect errors early in the development phase. This a priori analysis is performed at design-time and can formally be defined as the abductive reasoning process of finding a set P of permissible actions that lead a consistent system to a state where some of its constraints are violated, given a domain description D, an application description AP_i for application i and a set of system constraints C:

D ∧ AP_i ∧ P ⊨ ∃t ¬C(t),  where D ∧ AP_i ∧ P is consistent.

Table 1. Defined specification axioms for application verification
In fact, if such a plan is found, it acts as a counterexample providing diagnostic information about the violated safety properties. Apparently, such inferences are computationally expensive and, most importantly, semidecidable. Nevertheless, Russo et al. [10] proved that a reduction considering only two timepoints, current (t_c) and next (t_n), can transform such an abductive framework into a fully decidable and tractable one under certain conditions (no nested temporal quantifiers):

D(T) ∧ AP_i(T) ∧ C(t_c) ∧ P ⊨ ¬C(t_n),  given a 2-timepoint structure T.

This way, we do not need to fully specify the state at time t_c; the generated plan P is a mixture of HoldsAt and Happens predicates, without requiring a complete description of the initial system state, in contrast to similar model-checking techniques. We plan to further expand the types of system specifications to allow for optimizing the process of application design, capturing for instance inefficient action executions that may seem harmless or unimportant to the developer.

Example. A developer uploads an application description file to the system containing, among others, the two axioms shown in Table 1. The new application must first be examined for consistency with respect to the set of restrictions already stored in the system by service engineers. The developer executes the ApplicationCheck functionality of the Validator, accessible through the middleware, which identifies a potential restriction violation whenever a user sits on a chair; the event causes the TurnOffLight action to occur, which conflicts with the Localization being in a Running state (any substantial change in lighting destabilizes the localization process). As a result, the developer needs to review the application, for instance pausing the Localizer before turning off the lights.
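A toy sketch of the two-timepoint check is given below. It is not the SAT-based abduction performed by the DEC Reasoner; it merely projects candidate action sets from the current state to the next and looks for a constraint violation, and the fluent names and effect encodings are invented for illustration.

```python
from itertools import combinations

# Illustrative encodings only -- not the system's actual axioms.
EFFECTS = {                      # action -> (initiated fluents, terminated fluents)
    "TurnOffLight": ({"LightOff"}, {"LightOn"}),
    "PauseLocalizer": ({"LocalizerPaused"}, {"LocalizerRunning"}),
}

def violates(state):             # restriction C: never LightOff while the localizer runs
    return "LightOff" in state and "LocalizerRunning" in state

def find_counterexample(current_state, candidate_actions, max_len=2):
    """Search small action sets P whose projected next state violates C."""
    for r in range(1, max_len + 1):
        for plan in combinations(candidate_actions, r):
            nxt = set(current_state)
            for a in plan:
                init, term = EFFECTS[a]
                nxt = (nxt - term) | init
            if violates(nxt):
                return plan, nxt
    return None

print(find_counterexample({"LightOn", "LocalizerRunning"},
                          ["TurnOffLight", "PauseLocalizer"]))
```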
4.3  Run-Time Validation
Although application analysis can be accomplished at design-time and in isolation, action validation must be performed at run-time, considering the current state of the system as well as potential conflicts with other applications that might share the same resources. For that purpose, action validation is not implemented as an abduction process as before. Instead, a projection of the current state is performed to determine potentially abnormal resulting states, which is a more efficient approach for the needs of run-time reasoning. We have identified the following situations where action theory reasoning can contribute solutions:

Resource management. An AmI reasoner needs to resolve conflicts raised by applications that request access to the same resource (e.g., speakers). We introduce axioms to capture integrity constraints, as below:

∀app1, app2, t  HoldsAt(InUseBy(Speaker01, app1), t) ∧ HoldsAt(InUseBy(Speaker01, app2), t) ⇒ (app1 = app2)

We also intend to expand the Validator's resource allocation policies with short-term planning based on known demands.

Ramifications and priorities. Apart from direct conflicts between applications, certain actions may cause indirect side-effects to the execution of others. Terminating a service may affect applications that do not use it directly but instead
invoke services that depend on it. Since the reasoner is the only module that is aware of the current state of the system as a whole, it can detect unsafe effect ramifications and take measures to prevent unintended situations from emerging, either by denying the initial actions or by reconfiguring certain system aspects. Towards this direction, actions are executed according to prioritization policies.

Uncertainty handling. Imagine a system constraint requiring the main-room door to be in a locked state iff no user is located inside. The multi-camera localization component may lose track of users under certain circumstances, e.g. if they stand behind an obstacle. In cases where the reasoner is not aware of whether the user has left the room or not, it needs to perform reasoning based on partial knowledge. As a result, the state constraint must be formulated as:

∀user, t  HoldsAt(Knows(¬UserInRoom(user)), t) ⇔ HoldsAt(Knows(DoorLocked(DOORMAIN)), t)

Situations of ambiguous knowledge are very common in AmI systems. For that purpose, we plan to integrate a recent extension of the Event Calculus axiomatization that accounts for knowledge-producing actions in partially observable domains, enabling knowledge update based on sense actions and context [11].
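By way of illustration, the speaker-exclusivity constraint above reduces at run-time to a simple check over the projected allocation state; the allocation bookkeeping used here is an assumption made for the sketch, not the Validator's actual mechanism.

```python
def allocation_consistent(in_use_by):
    """in_use_by: dict resource -> set of applications currently holding it.
    Enforces: HoldsAt(InUseBy(r, app1), t) and HoldsAt(InUseBy(r, app2), t) => app1 == app2."""
    return all(len(apps) <= 1 for apps in in_use_by.values())

def request_resource(in_use_by, resource, app):
    """Grant the request only if the projected state still satisfies the constraint."""
    projected = {r: set(a) for r, a in in_use_by.items()}
    projected.setdefault(resource, set()).add(app)
    if allocation_consistent(projected):
        return projected, True
    return in_use_by, False                   # deny the action, state unchanged

state = {"Speaker01": {"musicApp"}}
state, granted = request_resource(state, "Speaker01", "alarmApp")
print(granted)                                # -> False
```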
5  Discussion and Conclusions
Responding to the need for event processing, rule-based approaches have contributed in manifold ways to the development of event-driven IT systems over the last years, from the field of active databases to distributed event-notification systems [12]. Within the field of ubiquitous computing, this paradigm is most frequently applied to the design of inference techniques for recognizing high-level context, such as user activity in smart spaces. For instance, FOL rules for managing and deriving high-level context objects and resolving conflicts are applied in SOCAM [13], whereas [14] describes a system that executes rules for the in-home detection of daily living activities for elderly health monitoring. Our framework applies rule-based techniques in a style similar to these systems but, as previously argued, a full-scale AmI system requires much more potent reasoning, both for context modeling and for regulating the overall system operation. The need for hybrid approaches is highlighted in many recent studies [3,4]. Towards this direction, the COSAR system [15] combines ontological with statistical inferencing techniques, but concentrates on the topic of activity recognition. In [16] a combination of rule-based reasoning, Bayesian networks and ontologies is applied to context inference. Our approach, on the other hand, combines two logic-based approaches, namely rule- and causality-based reasoning, and achieves a general-purpose reasoning framework for AmI, able to address a broad range of aspects that arise in a ubiquitous domain. The proposed integration of technologies is a novel and enabling direction for the implementation of the Ambient Intelligence vision. It is our intention, while developing the run-time functionalities of the framework, to contribute to action theories research as well. A Jess-based Event Calculus reasoner, offering efficient online inferencing, is already under implementation.
References

1. Ramos, C., Augusto, J.C., Shapiro, D.: Ambient Intelligence – the Next Step for Artificial Intelligence. IEEE Intelligent Systems 23(2), 15–18 (2008)
2. Strang, T., Linnhoff-Popien, C.: A Context Modeling Survey. In: 1st International Workshop on Advanced Context Modelling, Reasoning and Management (2004)
3. Bettini, C., Brdiczka, O., Henricksen, K., Indulska, J., Nicklas, D., Ranganathan, A., Riboni, D.: A Survey of Context Modelling and Reasoning Techniques. Pervasive and Mobile Computing (2009)
4. Bikakis, A., Patkos, T., Antoniou, G., Plexousakis, D.: A Survey of Semantics-based Approaches for Context Reasoning in Ambient Intelligence. In: Proceedings of the Workshop Artificial Intelligence Methods for Ambient Intelligence, pp. 15–24 (2007)
5. Paschke, A.: ECA-RuleML: An Approach combining ECA Rules with temporal interval-based KR Event/Action Logics and Transactional Update Logics. CoRR abs/cs/0610167 (2006)
6. Kowalski, R., Sergot, M.: A Logic-based Calculus of Events. Foundations of Knowledge Base Management, 23–51 (1989)
7. Miller, R., Shanahan, M.: Some Alternative Formulations of the Event Calculus. In: Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part II, pp. 452–490. Springer, London (2002)
8. Kharbili, M.E., Stojanovic, N.: Semantic Event-Based Decision Management in Compliance Management for Business Processes. In: Intelligent Event Processing – AAAI Spring Symposium 2009, pp. 35–40 (2009)
9. Shanahan, M.: The Event Calculus Explained. In: Veloso, M.M., Wooldridge, M.J. (eds.) Artificial Intelligence Today. LNCS (LNAI), vol. 1600, pp. 409–431. Springer, Heidelberg (1999)
10. Russo, A., Miller, R., Nuseibeh, B., Kramer, J.: An Abductive Approach for Analysing Event-Based Requirements Specifications. In: Stuckey, P.J. (ed.) ICLP 2002. LNCS, vol. 2401, pp. 22–37. Springer, Heidelberg (2002)
11. Patkos, T., Plexousakis, D.: Reasoning with Knowledge, Action and Time in Dynamic and Uncertain Domains. In: 21st International Joint Conference on Artificial Intelligence, pp. 885–890 (2009)
12. Paschke, A., Kozlenkov, A.: Rule-Based Event Processing and Reaction Rules. In: Governatori, G., Hall, J., Paschke, A. (eds.) RuleML 2009. LNCS, vol. 5858, pp. 53–66. Springer, Heidelberg (2009)
13. Gu, T., Pung, H.K., Zhang, D.Q.: A Service-oriented Middleware for Building Context-aware Services. Journal of Network and Computer Applications 28(1), 1–18 (2005)
14. Cao, Y., Tao, L., Xu, G.: An Event-driven Context Model in Elderly Health Monitoring. In: Ubiquitous, Autonomic and Trusted Computing, pp. 120–124 (2009)
15. Riboni, D., Bettini, C.: Context-Aware Activity Recognition through a Combination of Ontological and Statistical Reasoning. In: 6th International Conference on Ubiquitous Intelligence and Computing, pp. 39–53 (2009)
16. Bulfoni, A., Coppola, P., Della Mea, V., Di Gaspero, L., Mischis, D., Mizzaro, S., Scagnetto, I., Vassena, L.: AI on the Move: Exploiting AI Techniques for Context Inference on Mobile Devices. In: 18th European Conference on Artificial Intelligence, pp. 668–672 (2008)
The Large Scale Artificial Intelligence Applications – An Analysis of AI-Supported Estimation of OS Software Projects

Wieslaw Pietruszkiewicz¹ and Dorota Dzega²

¹ West Pomeranian University of Technology, Faculty of Computer Science and Information Technology, ul. Zolnierska 49, 71-210 Szczecin, Poland
[email protected]
² West Pomeranian Business School, Faculty of Economics and Computer Science, ul. Zolnierska 53, 71-210 Szczecin, Poland
[email protected]

Abstract. We present the practical aspects of large scale AI-based solutions by analysing an application of Artificial Intelligence to the estimation of Open Source projects hosted on the leading platform for Open Source – Sourceforge.net. We start by introducing the steps of the data extraction task, which transformed tens of tables and hundreds of fields, originally designed to be used by a web-based project collaboration system, into four datasets–dimensions important to project management, i.e. skills, time, costs and effectiveness. Later, we present the structure and results of experiments that were performed using various algorithms, i.e. decision trees (C4.5, RandomTree and CART), Neural Networks and Bayesian Belief Networks. We then describe how metaclassification algorithms improved the prediction quality and influenced the generalization ability and prediction accuracy. In the final part we evaluate the deployed algorithms from a practical point of view, presenting their characteristics beyond a purely scientific perspective.

Keywords: Classification, Metaclassification, Decision trees, Software estimation, AI usage factors.
1  Introduction
Currently, many popular software applications are being developed as Free/Libre Open Source Software (OS later in this article). The results achieved by their project teams, usually cooperating via web systems, often outperform proprietary software, e.g. in the case of web servers as well as Artificial Intelligence and Data Mining software packages. Hence, OS projects must be considered strong competitors to classic proprietary software products with closed source. To effectively manage these projects we must develop methods of software management specially tailored to the characteristics of OS.
Concerning the basic assumption of project management, we must notice that it assumes that the experience and knowledge acquired during previously managed projects will help in effective project management. This supposition is consistent with the basic purpose of AI, which focuses on the extraction of knowledge from past observations and its conversion into forms easily applicable in the future. In this paper we present a large scale Artificial Intelligence application that was aimed in two directions. The first was the creation of sets of models supporting OS project management via the prognosis of important features relating to the projects. The second was the usage of the prepared datasets to examine the practical usefulness of AI methods in a large scale application. The data source was SourceForge.net, the leading OS hosting platform. The complexity of the data tables, their various interconnections, as well as the number of stored records, were a serious test of the capabilities of the AI methods used, which, being applied to this real life problem, had to prove their practical usefulness, in contrast to the purely scientific usage common to most research papers.
2  Datasets
The data source we used in the experiments was "A Repository of Free/Libre/Open Source Software Research Data", which is a copy of the internal databases used by the SourceForge.net web-based platform [1]. In the research presented herein the data source contained a large number of features, and its form of storage was not designed to be used later in a knowledge extraction process. It was built to store all the data necessary to run web-based project management services, e.g. web forums, subversion control or task assignments. For this reason the examined problem was a model example of a real life AI application (oriented on data mining), that is, a situation where all meaningful attributes have to be extracted from data repositories and the scale causes some machine learning algorithms to fail. This is caused by high demands on memory, low speed of learning and simulation, or by too many adjustable parameters influencing the ease of use. From the technical perspective the data source contained almost 100 tables and the monthly increase of data was approx. 25 GB. The mentioned data repository was a subject of previous research: e.g. [2] examined how machine learning algorithms like logistic regression, decision trees and neural networks could be used to analyse the success factors for Open Source Software; in another paper [3] similarity measures for OSS projects were presented and clustering was performed; other research [4] explained how to predict whether OS project abandonment is likely. Comparing the research presented herein with previous work, which focuses only on selected aspects of AI applied to OSS, we conducted a complete analysis of 4 project dimensions. This was an outcome of the basic premise that for successful project management it is more important to forecast a few factors than to predict whether a project will become successful or not (in contrast to the assumption made in some papers).
Fig. 1. The datasets (Scope, Time, Costs, Effects) relating to the project's success
We extracted the data from the databases (some attributes had to be calculated from others) and divided them into four groups, being the most important dimensions of OS projects and relating to the project's success (see Figure 1) [5]:
– project scope Zt – the duration of the project from the moment of project initialization (project registration) until the last published presentation of the project effects; containing 39 attributes, including 8 attributes pertaining to the project field (dp), 28 attributes pertaining to the project resources (zp) and 3 attributes pertaining to project communication (kp),
– project time Ct – the time of task completion expressed in working hours spent on completing a particular task; containing 12 attributes, including 7 attributes pertaining to the general conditions of task completion (wt) and 5 attributes pertaining to the resources of the persons completing the task (zt),
– project cost Kt – the average number of working hours spent by a particular project contractor on task completion; containing 18 attributes, including 8 attributes pertaining to participant competence (zu) and 10 attributes pertaining to participant activity (au),
– project effects Et – the number of completed tasks as of the date of diagnosis; containing 21 attributes, including 16 attributes pertaining to project execution activity (ra) and 5 attributes pertaining to communication activity related to project execution (ka).
The numeric characteristics of the created datasets are presented in Table 1. The column Reduced records denotes how many records passed through the filters, e.g. we excluded empty projects or projects with an empty development team. To select the most important attributes we used the Information Gain Ratio. For each dataset we examined various models, where the number of inputs varied from 1 to Di, where Di is the number of attributes of dataset i. Figure 2 presents an example of this step of the experiments for the Time dataset. The selected sets of information attributes for the best prediction models (desired attributes) were:
Table 1. Details for the Scope, Time, Costs and Effects datasets

Dataset   Unique records   Reduced records   Objects   Attributes
Scope     167698           167698            2881      39
Time      233139           104912            77592     12
Costs     127208           20353             10889     18
Effects   96830            15492             64960     21
Fig. 2. Prediction accuracy vs used inputs for Time dataset
Zt = {z1, d8, d1, z4, d7, z8, z2, d5, d3}, Ct = {w1, w7, w5, w2, z5, w4, z4, z3, z2}, Kt = {z1, a1, z6, z8, a2, a4}, Et = {r2, r13, r11, r14, r10, r9, r8, r12, k5, k4, r7, k1}. We found that the number of features could be reduced further, to smaller subsets, without a significant decrease in accuracy. The selected smaller subsets of information attributes required for the prediction of the project features were: Zt = {z1, d8, d1, z4}, Ct = {w1, w7, w5, w2}, Kt = {z1, a1}, Et = {r2, r13, r11}. These attributes were used in the next stage of experiments.
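A minimal sketch of gain-ratio-based attribute ranking on a discrete dataset is given below; the toy data and column names are placeholders and are not taken from the repository.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information Gain Ratio of a discrete feature with respect to the class labels."""
    H_y = entropy(labels)
    n = len(labels)
    cond, split_info = 0.0, 0.0
    for v in set(feature):
        subset = [y for y, f in zip(labels, feature) if f == v]
        frac = len(subset) / n
        cond += frac * entropy(subset)
        split_info -= frac * np.log2(frac)
    gain = H_y - cond
    return gain / split_info if split_info > 0 else 0.0

# Example: rank placeholder attributes of a toy sample
data = {"w1": ["a", "a", "b", "b"], "w2": ["x", "y", "x", "y"], "w3": ["p", "p", "p", "q"]}
labels = ["hi", "hi", "lo", "lo"]
print(sorted(data, key=lambda c: gain_ratio(data[c], labels), reverse=True))
```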
3  Experiments with Classifiers
Practical applications of software estimation often describe software risk or complexity in the form of a label, e.g. high, mid or low. Thus, we decided to use classification as the method of prediction instead of regression, which often gives large errors for software. To ensure an unbiased environment, each dataset was filtered to form a uniform distribution of its classes. It must be kept in mind that, for 5 uniform classes, the accuracy of a blind choice equals 20%. Table 2 contains a comparison of the accuracy ratios for the C4.5, RandomTree (RT in abbrv.) and Classification and Regression Tree (CART in abbrv.) classifiers.
Table 2. Comparison of prediction models for the 4 datasets – Scope, Time, Costs and Effects

Classifier   Scope   Time   Cost   Effects
C4.5         97%     68%    78%    55%
RT           99%     72%    92%    77%
CART         96%     65%    75%    70%
Detailed information about these methods may be found in [6], [7], [8] and [9]. The values presented in Table 2 are the best accuracies achieved by each classifier. As can be noticed, the most accurate method for each dataset was RandomTree. There were other important practical issues concerning these algorithms, which will be presented in Section 5.
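This experimental setup can be approximated with off-the-shelf implementations. The sketch below uses scikit-learn decision trees as stand-ins for the C4.5, RandomTree and CART implementations actually used; the feature matrix X and label vector y are random placeholders just to make the snippet runnable, so the reported accuracies are at chance level and carry no meaning.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))           # placeholder feature matrix (e.g. 12 attributes)
y = rng.integers(0, 5, size=500)         # placeholder labels for five balanced classes

# Stand-ins: an entropy-based tree approximates C4.5, a random-split tree approximates
# Weka's RandomTree, and the default gini tree corresponds to CART.
models = {
    "C4.5-like": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "RandomTree-like": DecisionTreeClassifier(splitter="random", max_features="sqrt",
                                              random_state=0),
    "CART": DecisionTreeClassifier(criterion="gini", random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```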
4  Experiments with Metaclassifiers
In the second stage of experiments, we tested metaclassifiers to check whether they were able to increase the performance of the previously examined classifiers. During this stage we used two boosting methods, i.e. AdaBoost [10] and LogitBoost [11], as well as the Bagging metaclassifier [12], [13]. Adaptive Boosting (AdaBoost) is a process of iterative classifier learning, where training sets are resampled according to the weighted classification error: the more errors occur in a class, the bigger the weight this class receives. LogitBoost is another variant of boosting that uses a binomial log-likelihood weighting function. Bagging is an acronym for Bootstrap AGGregatING. Its idea is to create an ensemble of classifiers built using bootstrapping of the training dataset; the output of this ensemble is the result of a plurality vote. The creation of the internal classifiers may be performed in parallel and therefore their outputs are independent. Tables 3 and 4 show the accuracy of the AdaBoost and Bagging methods with different numbers of internal iterations. The decision trees described in the previous section were used as the core classifiers in each of the metaclassifiers. This part of the experiment was performed using the Effects dataset, because it was the most difficult of all four datasets. The results of the C4.5 and CART decision trees improved after the usage of metaclassifiers. RandomTree, due to its construction and random internal loops, was not affected by boosting.

Table 3. Prediction accuracy vs. number of iterations for the AdaBoost metaclassifier (Effects dataset)

Iterations  2    5    10
C4.5        55%  62%  64%
RT          77%  77%  77%
CART        69%  76%  76%
Table 4. Prediction accuracy vs. number of iterations for the Bagging metaclassifier (Effects dataset)

Iterations  2    5    10
C4.5        58%  63%  65%
RT          72%  77%  78%
CART        53%  57%  58%
Fig. 3. Classification accuracy for LogitBoost (Effects dataset): accuracy ratio [%] vs. metaclassifier internal iterations, for the resampling and reweighting variants
Fig. 4. ROC values for each class vs. no. of iterations for MultiBoost (Effects dataset)
Therefore, we claim that this algorithm is a robust member of the decision tree family. Bagging also caused an accuracy increase for all analysed classifiers, but a low number of iterations for this metaclassifier resulted in a slight decrease of its accuracy. For another boosting method, LogitBoost, we compared the accuracy of two different variants: this metaclassifier can use resampling or reweighting during the boosting procedure. Figure 3 presents plots with data series for both variants of LogitBoost.
Fig. 5. Classification accuracy vs no. of iterations for MultiBoost (Effects dataset)
The core classifier used in LogitBoost was REPTree, a "fast decision tree learner". It must be noted that the value corresponding to 0 iterations is the accuracy of REPTree alone, not of LogitBoost; thus, an increase of accuracy after LogitBoosting can be noticed. During the evaluation of the results another important characteristic, apart from the accuracy ratio, was examined: the Receiver Operating Characteristic (ROC) [14]. As can be noticed in Figure 4, the ROC value for each class increased with the number of MultiBoost iterations (ROC equals 1 for an ideal classifier). Similarly to the previous experiments, the metaclassifiers were built over REPTree. Figure 5 presents how the classification accuracy increases for MultiBoost. It should be noticed that this metaclassifier also managed to achieve better prediction results than its core classifier. Therefore, it is possible to claim that metaclassifiers offer the potential to significantly increase the estimation accuracy.
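For illustration, a comparable experiment can be sketched with off-the-shelf tools. The snippet below is a minimal sketch only, not the authors' Weka-based setup: it assumes scikit-learn >= 1.2 (for the estimator keyword) and uses randomly generated 5-class data as a stand-in for the Effects dataset, boosting and bagging a decision tree while varying the number of internal iterations, in the spirit of Tables 3 and 4.

# Illustrative sketch (assumptions: scikit-learn >= 1.2, synthetic stand-in data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with 5 uniformly distributed classes, mimicking the filtered datasets.
X, y = make_classification(n_samples=500, n_features=21, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

base = DecisionTreeClassifier(random_state=0)   # core classifier for both metaclassifiers
for n_iter in (2, 5, 10):
    boosted = AdaBoostClassifier(estimator=base, n_estimators=n_iter, random_state=0)
    bagged = BaggingClassifier(estimator=base, n_estimators=n_iter, random_state=0)
    acc_boost = cross_val_score(boosted, X, y, cv=5).mean()
    acc_bag = cross_val_score(bagged, X, y, cv=5).mean()
    print(f"iterations={n_iter}: AdaBoost={acc_boost:.1%}, Bagging={acc_bag:.1%}")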
5 Practical Observations
During the presented experiments we observed different characteristics of all the algorithms that influence their practical usefulness. We decided to report them because many researchers examine AI methods by evaluating them on academic problems, neglecting their overall usage factors. In our opinion, the usage factors of AI methods can be divided into two groups (see Figure 6), i.e. scientific and practical factors. The factors important in a purely scientific application of AI are:
– the quality – the most common factor presented in research papers, which usually evaluate different AI methods from the perspective of the quality of their results,
– the stability – meaning that the results should be repeatable or very coherent during multiple runs of an AI method (with the same adjustments).
Fig. 6. Scientific and practical usage of AI methods (practical usage factors: adjustable parameters, speed of learning, memory requirements, speed of simulation; scientific usage factors: quality of results, stability of results, understandable results)
Fig. 7. The time of learning for different classifiers (Effects dataset): average time vs. number of attributes (1–21) for RandomTree, C4.5, CART and BBN
We would like to point out the second group of usage factors, which is commonly omitted. The practical usage of AI involves factors like:
– the speed/time of learning – as knowledge induction is not a one-step process, time-consuming methods increase the length of experiments, which usually contain multiple run–test–adjust steps,
– the number of adjustable parameters – this influences the ease with which a method can be used; a higher number of parameters increases the space in which an optimal set of parameters must be found; it is important that some parameters only fine-tune a method (e.g. the confidence factor for C4.5), while others change its behaviour deeply (e.g. the evaluator for Bayesian Networks),
– the speed/time of simulation – the time taken by a method to find an answer for the asked question (the output for the input data),
– the ease of understanding the results – for many applications it is desirable for a method to return results in a form easily understandable by humans, which could be straightforwardly implemented later.
The measured learning times of four methods, i.e. RandomTree, C4.5, CART and Bayesian Belief Network, run for a number of attributes changing from 1 to 21, are presented in Figure 7.
Table 5. The speed and easiness of application for different classifiers

Method                    Speed    Easiness     Memory  Understandable
RandomTree                Fast     Easy         Low     Easy
C4.5                      Average  Average      Low     Easy
CART                      Slow     Easy         Low     Easy
Bayesian Belief Networks  Fast     Challenging  High    Easy
To reduce possible bias, each value is an average over a 5-times repeated learning process. Our overall opinion about the methods is presented in Table 5. We decided not to include the quality factor, as it may vary in other experiments. Summarising, RandomTree did not require careful and sophisticated adjustment of parameters like the other methods, and it was the fastest learning algorithm. It must be mentioned that the most popular decision tree classifier, i.e. C4.5, had the largest number of adjustable parameters, which increases the search space in which researchers must look for an optimal configuration.
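A rough sketch of how such a learning-time measurement can be reproduced (an illustration under assumptions, not the authors' benchmark: synthetic data, scikit-learn classifiers as stand-ins for the Weka implementations, and GaussianNB standing in for a Bayesian Belief Network).

# Average training time vs. number of attributes, each point repeated 5 times.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X_full, y = make_classification(n_samples=1000, n_features=21, n_informative=12,
                                n_classes=5, n_clusters_per_class=1, random_state=0)
models = {"decision tree": DecisionTreeClassifier(random_state=0),
          "naive Bayes": GaussianNB()}
for name, model in models.items():
    for n_attr in range(1, 22):
        times = []
        for _ in range(5):                       # 5-times repeated learning process
            start = time.perf_counter()
            model.fit(X_full[:, :n_attr], y)
            times.append(time.perf_counter() - start)
        print(f"{name}, {n_attr} attributes: {np.mean(times):.4f} s")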
6 Conclusions
The research presented herein proves that the management of Open Source projects may be supported by data mining methods. As all four datasets presented herein were extracted or calculated from database fields designed to store the data required by a web-based project management platform, it is possible to predict important factors of a project without the necessity of using any other (external) data sources. The examined classifiers and metaclassifiers showed large differences in performance, e.g. Neural Networks and Support Vector Machines were rejected at early stages due to their low accuracy for each dataset, caused by the large number of unordered, labelled attributes. The decision trees showed high accuracy for the examined data, and we claim that this makes them suitable for problems with similar datasets. The next stage of experiments, incorporating metaclassifiers, allowed the prediction accuracy to be increased significantly. It must be noted that some classification methods were practically impossible to use due to their low speed or high memory requirements. In our opinion, researchers focus their attention too much on developing new data processing methods that achieve only a slight increase in quality, forgetting to check whether these methods work or fail when facing complicated tasks. In future research, we plan to analyse the dynamics of the modelled process, which could lead to better understanding and more effective decision support for OS projects. We also plan to further investigate the computational complexity, computer resource requirements and the other usage factors of popular machine learning algorithms. These comparisons could be used to establish a comprehensive guideline for AI methods.
References
1. Madey, G.: The SourceForge Research Data Archive (SRDA) (2008), http://zerlot.cse.nd.edu
2. Raja, U., Tretter, M.J.: Experiments with a new boosting algorithm. In: Proceedings of the Thirty-first Annual SAS Users Group International Conference. SAS (2006)
3. Gao, Y., Huang, Y., Madey, G.: Data mining project history in open source software communities. In: North American Association for Computational Social and Organization Sciences (2004)
4. English, R., Schweik, C.M.: Identifying success and abandonment of FLOSS commons: A classification of SourceForge.net projects. Upgrade: The European Journal for the Informatics Professional VIII(6) (2007)
5. Dzega, D.: The method of software project risk assessment. PhD thesis, Szczecin University of Technology (June 2008)
6. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
7. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
8. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
9. McColm, G.L.: An introduction to random trees. Research on Language and Computation 1, 203–227 (2004)
10. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann, San Francisco (1996)
11. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28, 337–407 (2000)
12. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)
13. Kuncheva, L.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-IEEE, Hoboken (2004)
14. Gonen, M.: Analyzing Receiver Operating Characteristic Curves Using SAS. SAS Press (2007)
Towards the Discovery of Reliable Biomarkers from Gene-Expression Profiles: An Iterative Constraint Satisfaction Learning Approach
George Potamias1, Lefteris Koumakis1,2, Alexandros Kanterakis1,2, and Vassilis Moustakis1,2
1 Institute of Computer Science, Foundation for Research & Technology – Hellas (FORTH), N. Plastira 100, Vassilika Vouton, 700 13 Heraklion, Crete, Greece
2 Technical University of Crete, Department of Production and Management, Management Systems Laboratory, Chania 73100, Greece
{potamias,koumakis,kantale,moustaki}@ics.forth.gr
Abstract. The article demonstrates the use of Multiple Iterative Constraint Satisfaction Learning (MICSL) process in inducing gene-markers from microarray gene-expression profiles. MICSL adopts a supervised learning from examples framework and proceeds by optimizing an evolving zero-one optimization model with constraints. After a data discretization pre-processing step, each example sample is transformed into a corresponding constraint. Extra constraints are added to guarantee mutual-exclusiveness between gene (feature) and assigned phenotype (class) values. The objective function corresponds to the learning outcome and strives to minimize use of genes by following an iterative constraint-satisfaction mode that finds solutions of increasing complexity. Standard (c4.5-like) pruning and rule-simplification processes are also incorporated. MICSL is applied on several well-known microarray datasets and exhibits very good performance that outperforms other established algorithms, providing evidence that the approach is suited for the discovery of biomarkers from microarray experiments. Implications of the approach in the biomedical informatics domain are also discussed. Keywords: Bioinformatics, constraint satisfaction, data mining, knowledge discovery, microarrays.
1 Introduction
The completion of the human genome poses new scientific and technological challenges. In the rising post-genomic era the principal item in the respective research agenda concerns the functional annotation of the human genome. Respective activities signal a major shift toward trans-disciplinary team science and translational research [1], and the cornerstone task concerns the management and the analysis of heterogeneous clinico-genomic data and information sources. The vision is to fight major diseases, such as cancer, in an individualized diagnostic, prognostic and treatment manner. This requires not only an understanding of the genetic background
of the disease but also the correlation of genomic data with clinical data, information and knowledge [2]. The advent of genomic and proteomic high-throughput technologies, such as transcriptomics realized by DNA microarray technology [3], enabled a 'systems level analysis' by offering the ability to measure the expression status of thousands of genes in parallel, even if the heterogeneity of the produced data sources makes interpretation especially challenging. The high volume of data being produced by the numerous studies worldwide creates the need for a long-term initiative on bio-data analysis [4,5] in the context of 'translational bioinformatics' research [6]. The target is the customization and application, but also the invention, of new data mining methodologies and techniques suitable for the development of disease classification systems and the detection of new disease classes. In a number of gene-expression studies, associations between genomic and phenotype profiles have proved feasible, especially for several types of cancer such as leukemia [7], breast cancer [8], colon cancer [9], central nervous system tumours [10], lung cancer [11], lymphoma [12], ovarian cancer using mass spectrometry sample profiles [13], and other malignancies. This article demonstrates the application of Multiple Iterative Constraint Satisfaction Learning, or MICSL for short [14], on microarray gene-expression data. MICSL integrates iterative constraint satisfaction with zero-one integer programming and is cast in the realm of concept learning from examples [15,16]. The article is organized into separate sections. Section 2 presents the data pre-processing discretization step. Section 3 provides a concise overview of MICSL; Section 4 presents experiments on indicative gene-expression domains and datasets, and overviews the results. Finally, Section 5 incorporates concluding remarks, potential and limitations, and points to future R&D plans.
2 Discretization of Gene Expression Values
For microarrays the binary constraint is not unreasonable. Gene expression values are originally numeric, yet their physical meaning corresponds to either low or high, which is of course binary, or expressed vs. not expressed, which has an identical meaning and is binary too. Of course, the process makes it necessary to transform the original numeric values into binary equivalents; however, this is common practice in microarray data analysis and exploration. We employ an entropy-based binary discretization process in order to transform gene expression values into binary equivalents: high (expressed / up-regulated) or low (not-expressed / down-regulated). The transformation of gene expression values into high/low values is rationalized by the two-class gene expression modeling problems we cope with. Our method resembles the Fayyad-Irani approach [17], as well as a similar approach presented in [18]. Both methods employ entropy-based statistics; however, they do not incorporate an explicit parameter to force a binary split, which may result in uncontrolled numbers of discretization intervals (more than two) that would be difficult to interpret in the presence of two classes for our sample cases. Assume a gene expression matrix with M genes (rows) and S samples (columns), where each sample is assigned to one of two (mutually exclusive) classes, P (positive) and N (negative); e.g., for the leukemia domain P may denote the ALL and N the AML leukemia sub-type, respectively (see below). For each gene g, 1 ≤ g ≤ M,
consider the descending ordered vector of its values, Vg = ⟨ng;1, ng;2, …, ng;S⟩, with ng;i ≥ ng;i+1, 1 ≤ i ≤ S−1. Each ng;i is associated with one of the classes. We seek a point estimate μg that splits the range of values of Vg in two parts so that μg discriminates between classes P and N in the best possible way; μg is used to split the elements of Vg into high (h) and low (l) values. The binary transformation of Vg into h and l value intervals proceeds via two distinct steps. Step 1: Calculation of the midpoint values μg;i between consecutive elements of Vg, i.e. μg;i = (ng;i + ng;i+1)/2, and formation of the descending ordered vector of midpoint values Mg = ⟨μg;1, …, μg;S−1⟩, μg;i ≥ μg;i+1. Step 2: Assessment of the point estimate μg. For each midpoint μg;i we assess its information gain IG(S, μg;i) with respect to the set of samples S, using the information-theoretic model reflected in formula (1) below (utilized as well in the standard C4.5 decision tree induction system [19]).

IG(S, μg;i) = E(S) − E(S, μg;i)    (1)
E(S) corresponds to the class-related entropy of the original set of samples S, calculated by formula (2) using an information-theoretic model [20].

E(S) = −(|SP|/|S|)·log2(|SP|/|S|) − (|SN|/|S|)·log2(|SN|/|S|)    (2)
where SP and SN denote the samples from S that belong to class P and N, respectively; |S(.)| denotes set cardinality. For each gene g, E(S, μg;i) corresponds to the conditional entropy when μg;i is used as a split point for the target gene g, and it is computed by formula (3).

E(S, μg;i) = −(|Sh|/|S|)·[(|SP,h|/|Sh|)·log2(|SP,h|/|Sh|) + (|SN,h|/|Sh|)·log2(|SN,h|/|Sh|)]
            −(|Sl|/|S|)·[(|SP,l|/|Sl|)·log2(|SP,l|/|Sl|) + (|SN,l|/|Sl|)·log2(|SN,l|/|Sl|)]    (3)
where SP,h, SP,l ⊆ SP denote the class P samples with high (ng;i ≥ μg;i) and low (ng;i < μg;i) values for the target gene, respectively; SN,h and SN,l are defined in an analogous way. The estimate μg;i that maximizes IG(S, μg;i) is selected as the value of μg. This point is used to transform each gene-expression value gi to its binary equivalent v(gi) ∈ {0, 1} (0 and 1 represent low (l) and high (h) gene-expression values, respectively). The binary transformation proceeds independently across genes. When all gene expression values have been transformed, a matrix that includes the binary equivalent values replaces the original gene expression matrix. In addition, the optimal midpoint values are stored in order to be used for the binary transformation of the corresponding test (unseen) cases. The C4.5/Rel8 decision tree induction algorithm [19] follows a discretization approach similar to ours, aiming to improve the use of continuous attributes during decision tree induction. Reported results slightly favor a 'local' (i.e., during the growth of the tree) discretization process in contrast to a 'global' one. However, running Weka's C4.5/Rel8 implementation (called J48 [21]) on the gene-expression
datasets used in this paper, performance figures were not satisfactory (data not shown). This could be attributed to the intrinsic difficulty of decision tree induction approaches to cope with many irrelevant attributes.
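The discretization procedure of formulas (1)–(3) can be sketched as follows (an illustration, not the authors' implementation; it assumes NumPy arrays holding one gene's expression values and the binary class labels).

# Entropy-based binary split for one gene: pick the midpoint mu that maximizes
# the information gain IG(S, mu) = E(S) - E(S, mu), then binarize (1 = high, 0 = low).
import numpy as np

def entropy(labels):
    # Class-related entropy E(S) of formula (2); returns 0 for an empty subset.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_midpoint(values, labels):
    order = np.argsort(values)[::-1]            # descending ordered vector Vg
    v, y = values[order], labels[order]
    midpoints = (v[:-1] + v[1:]) / 2.0          # mu_g;i = (n_g;i + n_g;i+1) / 2
    base = entropy(y)
    best_mu, best_gain = None, -np.inf
    for mu in np.unique(midpoints):
        high, low = y[v >= mu], y[v < mu]
        cond = (len(high) / len(y)) * entropy(high) + (len(low) / len(y)) * entropy(low)
        if base - cond > best_gain:             # formulas (1) and (3)
            best_mu, best_gain = mu, base - cond
    return best_mu

expression = np.array([2.1, 0.3, 1.8, 0.2, 2.5, 0.4])
classes = np.array([1, 0, 1, 0, 1, 0])          # P = 1, N = 0
mu = best_midpoint(expression, classes)
binary = (expression >= mu).astype(int)         # the stored mu is reused for test samples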
3 Multiple Iterative Constraint Satisfaction Based Learning
MICSL proceeds by viewing samples as constraints that should be satisfied in the context of a mathematical programming model in which the objective function maps to the rules generated. The specific optimization model is 0-1 linear, which means that all values are binary (valued either 0 or 1). This is achieved by transforming the original learning problem to a binary equivalent, which in turn implies that all attributes (or features) are valued over nominal scales. To explain MICSL we develop a notation system. We use gi to represent a gene. Then v(gik) represents the binary value equivalent of gene gi with respect to sample k, which means that v(gik) = 0 or v(gik) = 1, where a value of 0 means that gik is under-expressed (or 'low') and a value of 1 means that gik is over-expressed (or 'high'). A sample is represented as an implication between the conjunction of the v(gik) and the class of interest. Microarray samples are assigned into two classes, often tagged as positive or negative, P or N, respectively. Thus a positive sample sk is represented as:

sk ≡ P ← ∧i v(gik)

The representation of a negative sample is identical; only the class assignment changes. Given I example samples, and given that v(gik) and P, N are binary, the binary sk representation leads to the formation of a non-linear constraint, namely:

P ≥ ∏i=1..I v(gik)
The constraint has a linear equivalent [22], which is:

∑i=1..I v(gik) − P ≤ I − 1
Additional constraints are introduced, namely P + N = 1, which means that a derived solution, or rule, can point (exclusively) either to the P or to the N class, along with a series of constraints that guarantee that any v(gik) can be either 0 or 1, that is, a series of constraints of the form [v(gik) = 0] + [v(gik) = 1] ≤ 1, ∀ i ≤ I. The objective function is the summation across all v(gik). In order to learn minimal description rules a control parameter R is introduced – it is linked to the available samples I, and helps to form the necessary number of constraints in the following form:

∑i=1..I v(gik) − (P, N) ≤ R − 1, 1 ≤ R ≤ I
The notation (P, N) means that P is used for positive samples and N for negative samples.
To summarize the model formulation we present as an example the model for the breast cancer microarray – we restrict to the training set of 78 samples and to the selected 70 gene-markers reported in the original publication [8], namely:

min { ∑i=1..70 ∑k∈{0,1} v(gik) }    (4)

Subject to:

∑i=1..70 v(gik) − (P, N) ≤ R − 1, R ≥ 1    (5)
(78 such constraints are formed, one per training sample)

[v(gi) = 0] + [v(gi) = 1] ≤ 1, ∀ i ≤ 70    (6)

P + N = 1    (7)

v(gik) ∈ {0, 1}, ∀ i ≤ I    (8)
Constraint (5) works iteratively until all constraints are satisfied. Optimization starts by setting R = 1 and stops when all constraints are satisfied. At each iteration the model produces a solution of the form:

∧i [v(gi) ∈ {0,1}] ==> Class

with Class taking the value P or N. In general, at each iteration R (1 ≤ R ≤ I), the process terminates when all solutions are found. Then the value of R increases, the corresponding constraints are formed, and the process iterates. The system can be parameterized in order to stop when special criteria are met, e.g., when all examples are covered by the rules formed so far. The union of solutions, derived by optimizing the objective function, forms the set of knowledge learned, i.e., the induced set of rules. Constraints (6) and (7) are the additional constraints added in order to guarantee the mutual exclusiveness between binary gene-expression values, and the mutual exclusiveness between class values with the extra requirement that at least one class is true (note the equality in constraint (7)). Constraint (8) declares all variables (gene values) as binary. Potamias [14] demonstrates that the optimization model (4)–(8) yields the R minimal consistent models that explain the domain. In addition, there is a limit on the value of R at which the process stops, which means that the optimization converges. Model complexity is linear and relates to the complexity of the above presented binary integer programming model.
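For illustration only, the zero-one formulation can be written down directly with a generic MILP library. The sketch below uses PuLP on a tiny made-up binary matrix, encodes constraints (5)–(7) for a single value of R, and adds one extra constraint (ruling out the empty rule) so that the solver returns a non-trivial literal set; the actual MICSL iteration over R and the enumeration of all solutions are omitted, so this is a schematic of the formulation rather than the authors' implementation.

# Schematic encoding of the 0-1 model (4)-(8) with PuLP for one value of R.
import pulp

samples = [([1, 0, 1], "P"), ([0, 1, 1], "N"), ([1, 1, 0], "P")]   # toy binarized samples
n_genes, R = 3, 1

prob = pulp.LpProblem("micsl_step", pulp.LpMinimize)
v = {(i, k): pulp.LpVariable(f"v_{i}_{k}", cat="Binary")
     for i in range(n_genes) for k in (0, 1)}
P = pulp.LpVariable("P", cat="Binary")
N = pulp.LpVariable("N", cat="Binary")

prob += pulp.lpSum(v.values())                                     # objective (4)
for bits, cls in samples:                                          # constraints (5)
    target = P if cls == "P" else N
    prob += pulp.lpSum(v[i, bits[i]] for i in range(n_genes)) - target <= R - 1
for i in range(n_genes):                                           # constraints (6)
    prob += v[i, 0] + v[i, 1] <= 1
prob += P + N == 1                                                 # constraint (7)
prob += pulp.lpSum(v.values()) >= 1     # extra: exclude the empty rule (illustration only)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
literals = [(i, k) for (i, k), var in v.items() if var.value() == 1]
print("rule literals:", literals, "-> class", "P" if P.value() == 1 else "N")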
4 Experimental Results
Experimentation follows a 2x2 scheme. Different learning approaches are used on a variety of publicly available samples. Nine learning methods were used and results compared with MICSL. Learning methods used are: (1) J48: A variation of the classic C4.5 for generating a pruned or un-pruned C4.5 decision tree [19]; (2) Jrip: A propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by Cohen [23]; (3) PART: Algorithm for generating a
PART decision list [24] – it utilizes a separate-and-conquer strategy, builds a partial C4.5 decision tree in each iteration, and makes the "best" leaf into a rule; (4) Conjunctive Rule: this algorithm implements a single conjunctive rule learner that can predict numeric and nominal class labels [25]; (5) Decision Table: an algorithm for building and using a simple decision table majority classifier [26]; (6) DTNB: an algorithm for building and using a decision table / naive Bayes hybrid classifier [27] – at each point in the search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets: one for the decision table, the other for naive Bayes; (7) NNge: a nearest-neighbour-like algorithm using non-nested generalized exemplars, which are hyper-rectangles that can be viewed as if-then rules [28]; (8) OneR: an algorithm for building and using a 1R classifier that utilizes the minimum-error (discretised) attribute for prediction [29]; and (9) RiDor: an implementation of a Ripple-Down Rule learner [30] – it generates a default rule first and then the exceptions to the default rule with the least (weighted) error rate. The nine methods plus MICSL were assessed over seven public domain datasets with the following characteristics:
i. Veer: Veer et al [8] proposed a signature for breast cancer with 70 genes (gene-signature) using supervised classification from a training dataset of 78 samples and a test dataset of 19 samples;
ii. West: West et al [33] used DNA microarray expression data from a series of primary breast cancer samples to discriminate and predict the estrogen receptor status of these tumors as well as the lymph node status of the patient at the time the tumor was surgically removed. The dataset has 38 samples in the training and 9 in the test set. The proposed gene-signature contains 102 genes;
iii. Golub: The Golub dataset [7] contains data for acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The proposed gene-signature consists of 50 genes; the dataset contains 38 samples in the training and 34 in the test set;
iv. Gordon: The Gordon dataset [11] consists of 32 samples in the training set and 149 samples in the test set for the distinction between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) of the lung. The reported gene-signature contains 8 genes;
v. Singh: Singh et al [31] provide microarray data for prostate tumors. A set of genes was identified that is strongly correlated with the state of tumor differentiation as measured by the Gleason score. This set consists of 26 genes; the training dataset contains 102 samples and the test dataset 34 samples;
vi. Sorace: Sorace et al [32] proposed a gene profile with 22 genes for the early detection of ovarian cancer. The experimental data collection comes from the Clinical Proteomics Data Bank and contains 125 samples in the training set and 128 samples in the test set;
vii. Alizadeh: Based on the work of Alizadeh et al [12], containing data for distinct types of diffuse large B-cell lymphoma. The proposed gene profile contains 380 genes; the training dataset contains 34 samples and the test set 13 samples.
Statistical performance assessment was done using predictive accuracy (PA), sensitivity (SE) and specificity (SP) figures across all published domain test samples. Results are summarized in Table 1. Experimentation was done on the reduced original samples; for instance, the breast-cancer dataset included about 25,000 genes, yet experimentation across all learning methods was done using the 70-gene molecular signature as suggested and reported in [8]. Thus, by using the genes selected in the respective original study we avoid getting into the medical specifics and focus just on the statistical performance of each learning method, including MICSL.
Table 1. Statistical performance results. Bold entries are used to indicate superior performance; the last column to the right averages results across domains with respect to learning method. ANOVA analysis revealed significant differences across methods and domains (p = 0.001).
Method  Metric  Veer    West    Golub   Gordon  Singh   Sorace   Alizadeh  Average
MICSL   PA      78.9%   77.8%   97.1%   98.7%   88.2%   99.2%    92.3%     90.3%
MICSL   SE      78.9%   77.8%   97.1%   97.5%   88.2%   98.4%    92.3%     90.0%
MICSL   SP      81.8%   82.2%   95.8%   86.7%   95.8%   98.8%    95.2%     90.9%
J48     PA      63.2%   66.7%   94.1%   98.7%   61.8%   100.0%   92.3%     82.4%
J48     SE      63.2%   66.7%   94.1%   98.7%   61.8%   100.0%   92.3%     82.4%
J48     SP      72.6%   68.3%   93.7%   93.9%   86.2%   100.0%   87.7%     86.1%
Jrip    PA      63.2%   66.7%   94.1%   98.7%   76.5%   100.0%   92.3%     84.5%
Jrip    SE      63.2%   66.7%   94.1%   98.7%   76.5%   100.0%   92.3%     84.5%
Jrip    SP      60.7%   68.3%   93.7%   93.9%   91.5%   100.0%   87.7%     85.1%
PART    PA      73.7%   66.7%   94.1%   98.7%   85.3%   98.4%    92.3%     87.0%
PART    SE      73.7%   66.7%   94.1%   98.7%   85.3%   98.4%    92.3%     87.0%
PART    SP      66.8%   68.3%   93.7%   98.7%   94.7%   97.2%    87.7%     86.7%
CR      PA      52.6%   66.7%   94.1%   98.7%   97.1%   99.2%    92.3%     85.8%
CR      SE      52.6%   66.7%   94.1%   98.7%   97.1%   99.2%    92.3%     85.8%
CR      SP      48.6%   68.3%   93.7%   88.0%   98.9%   98.6%    95.2%     84.5%
DT      PA      73.7%   66.7%   94.1%   59.0%   97.1%   97.7%    61.5%     78.5%
DT      SE      73.7%   66.7%   94.1%   98.7%   97.1%   97.7%    61.5%     84.2%
DT      SP      78.7%   68.3%   93.7%   93.9%   98.9%   98.7%    76.0%     86.9%
DTNB    PA      68.4%   66.7%   94.1%   98.7%   94.1%   97.7%    92.3%     87.4%
DTNB    SE      68.4%   66.7%   94.1%   98.7%   94.1%   97.7%    92.3%     87.4%
DTNB    SP      75.6%   68.3%   93.7%   88.0%   97.9%   98.7%    87.7%     87.1%
NNge    PA      52.6%   77.8%   94.1%   98.7%   79.4%   100.0%   100.0%    86.1%
NNge    SE      52.6%   77.8%   94.1%   98.7%   79.4%   100.0%   100.0%    86.1%
NNge    SP      66.4%   82.2%   91.6%   88.0%   92.6%   100.0%   100.0%    88.7%
OneR    PA      42.1%   66.7%   94.1%   98.7%   70.6%   99.2%    76.9%     78.3%
OneR    SE      42.1%   66.7%   94.1%   98.7%   70.6%   99.2%    76.9%     78.3%
OneR    SP      60.3%   68.3%   93.7%   93.9%   75.2%   98.6%    85.6%     82.2%
Ridor   PA      73.7%   44.4%   94.1%   98.7%   97.1%   100.0%   38.5%     78.1%
Ridor   SE      73.7%   44.4%   94.1%   98.7%   97.1%   100.0%   38.5%     78.1%
Ridor   SP      78.7%   50.6%   93.7%   93.9%   98.9%   100.0%   61.5%     82.5%
Based on average across domains, MICSL outperformed all comparison learning methods. In addition, our method held exceptional performance across all selected domains, with the exception of one domain, in which MICSL trailed in performance.
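For reference, the three statistics can be computed from binary predictions as follows (a generic helper, not tied to the specific Weka or MICSL outputs).

# Predictive accuracy (PA), sensitivity (SE) and specificity (SP); positive class = 1.
def pa_se_sp(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    pa = (tp + tn) / len(y_true)
    se = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return pa, se, sp

print(pa_se_sp([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))   # (0.8, 0.666..., 1.0)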
5 Discussion and Concluding Remarks
The overall performance of MICSL is at least as good as the performance of standard learning methods. It is noteworthy that MICSL performed very well in domains with rather few training samples, while it trailed in the domain with a rather larger sample space. The better performance in smaller sample spaces can be attributed to the fact that MICSL strives to maximize the usage of domain samples with zero assumptions, excluding of course the gene-value discretization, which however is common across methods. For instance, the entropy metric used by J48 brings the assumption that, in Veer, the underlying population is split between 44 (or 54%) patients with good prognosis and 34 (or 44%) with bad prognosis. The assumption implies an almost 50/50 split between bad and good prognosis patients, an assumption which may not always hold. Instead, MICSL needs no such limiting assumption. MICSL did very well in all domains except, slightly, the Singh domain (a fact that cannot be readily explained). In general, as MICSL does very well in domains with few or very few samples, it may provide a useful tool for the investigation of novel domains in which the number of samples is small – a common situation in research clinico-genomic trials. MICSL couples learning with optimization. Such coupling places the method alongside support vector methodology, and future research should focus on a hybrid architecture combining the two method families. In addition, in our R&D plans is to port MICSL to a more efficient environment, e.g., ILOG CP (http://ilog.com.sg/products/cp/).
Acknowledgments. Work presented herein was partially supported by the ACTION-Grid EU project (FP7 ICT 224176), as well as by the ACGT (FP6-IST-2005-026996) and GEN2PHEN (European Commission, Health theme, project 200754) projects. The authors hold full responsibility for the opinions, results and views expressed in the text.
References
1. Sander, C.: Genomic Medicine and the Future of Health Care. Science 287(5460), 1977–1978 (2000)
2. Sanchez, F.M., Iakovidis, I., et al.: Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future health care. Journal of Biomedical Informatics 37(1), 30–42 (2004)
3. McConnell, P., Johnson, K., Lockhart, D.J.: An introduction to DNA microarrays. In: 2nd Conference on Critical Assessment of Microarray Data Analysis (CAMDA 2001) - Methods of Microarray Data Analysis II, pp. 9–21 (2002)
4. Dopazo, J.: Microarray data processing and analysis. In: 2nd Conference on Critical Assessment of Microarray Data Analysis (CAMDA 2001) - Methods of Microarray Data Analysis II, pp. 43–63 (2002)
5. Piatetsky-Shapiro, G., Tamayo, P.: Microarray Data Mining: Facing the Challenges. ACM SIGKDD Explorations 5(5), 1–5 (2003)
6. Butte, A.J.: Translational Bioinformatics: Coming of Age. J. Am. Med. Inform. Assoc. 15(6), 709–714 (2008)
7. Golub, T.R., Slonim, D.K., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
8. Van't Veer, L.J., Dai, H., et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530–536 (2002)
9. Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissue probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96(12), 6745–6750 (1999)
10. Pomeroy, S.L., Tamayo, P., et al.: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–442 (2002)
11. Gordon, G.J., Jensen, R.V., et al.: Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma. Cancer Research 62, 4963–4967 (2002)
12. Alizadeh, A.A., Eisen, M.B., et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503–511 (2000)
13. Petricoin, E.F., Ardekani, A.M., et al.: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306), 572–577 (2002)
14. Potamias, G.: MICSL: Multiple Iterative Constraint Satisfaction based Learning. Intell. Data Anal. 3(4), 245–265 (1999)
15. Hunt, E.B., Marin, J., Stone, P.J.: Experiments in Induction. Academic Press, New York (1966)
16. Michalski, R.C.: Concept Learning. Encyclopedia of Artificial Intelligence 1, 185–194 (1986)
17. Fayyad, U., Irani, K.: Multi-interval discretization of continuous-valued attributes for classification learning. In: 13th International Joint Conference on Artificial Intelligence, pp. 1022–1027 (1993)
18. Li, J., Wong, L.: Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics 18(5), 725–734 (2002)
19. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Mateo (1993)
20. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656 (1948)
21. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1), 10–18 (2009)
22. Bell, C., Nerode, A., Raymond, T.N., Subrahmanian, V.S.: Implementing deductive databases by mixed integer programming. ACM Transactions on Database Systems 21(2), 238–269 (1996)
23. Cohen, W.W.: Fast Effective Rule Induction. In: 12th International Conference on Machine Learning, pp. 115–123 (1995)
24. Frank, E., Witten, I.H.: Generating Accurate Rule Sets Without Global Optimization. In: 15th International Conference on Machine Learning, pp. 144–151 (1998)
25. Pazzani, M.J., Sarrett, W.: A framework for the average case analysis of conjunctive learning algorithms. Machine Learning 9, 349–372 (1992)
26. Kohavi, R.: The Power of Decision Tables. In: 8th European Conference on Machine Learning, pp. 174–189 (1995)
27. Hall, M., Frank, E.: Combining Naive Bayes and Decision Tables. In: 21st Florida Artificial Intelligence Research Society Conference, pp. 15–17 (2008)
28. Martin, B.: Instance-based learning: nearest neighbor with generalization. Master Thesis, University of Waikato, Hamilton, New Zealand (1995)
29. Holte, R.C.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–91 (1993)
30. Gaines, B.R., Compton, P.: Induction of Ripple-Down Rules. In: 5th Australian Joint Conference on Artificial Intelligence, pp. 349–354 (1992)
31. Singh, D., Febbo, P.G., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)
32. Sorace, J.M., Zhan, M.: A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 4, 24 (2003)
33. West, M., Blanchette, C., et al.: Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. 98(20), 11462–11467 (2001)
Skin Lesions Characterisation Utilising Clustering Algorithms
Sotiris K. Tasoulis1, Charalampos N. Doukas2, Ilias Maglogiannis1, and Vassilis P. Plagianakos1
1 Department of Computer Science and Biomedical Informatics, University of Central Greece, Papassiopoulou 2–4, Lamia, 35100, Greece {stas,imaglo,vpp}@ucg.gr
2 Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83200, Samos, Greece [email protected]
Abstract. In this paper we propose a clustering technique for the recognition of pigmented skin lesions in dermatological images. It is known that computer vision-based diagnosis systems have been used aiming mostly at the early detection of skin cancer and more specifically the recognition of malignant melanoma tumour. The feature extraction is performed utilising digital image processing methods, i.e. segmentation, border detection, colour and texture processing. The proposed method belongs to a class of clustering algorithms which are very successful in dealing with high dimensional data, utilising information driven by the Principal Component Analysis. Experimental results show the high performance of the algorithm against other methods of the same class.
Keywords: Pigmented Skin Lesion, Image Analysis, Feature Extraction, Classification, Unsupervised clustering, Cluster analysis, Principal Component Analysis, Kernel density estimation.
1 Introduction
Several studies found in the literature have proven that the analysis of dermatological images and the quantification of tissue lesion features may be of essential importance in dermatology [9,13]. The main goal is the early detection of the malignant melanoma tumour, which is among the most frequent types of skin cancer, versus other types of non-malignant cutaneous diseases. The interest in melanoma is due to the fact that its incidence has increased faster than that of almost all other cancers, and the annual incidence rates have increased on the order of 3−7% in fair-skinned populations in recent decades [10]. Advanced cutaneous melanoma is still incurable, but when diagnosed at early stages it can be cured without complications. However, the differentiation of early melanoma from other non-malignant pigmented skin lesions is not trivial
even for experienced dermatologists. In several cases, primary care physicians underestimate melanoma in its early stage [13]. A promising technique to deal with such a problem seems to be the use of data mining methods. In particular, using clustering could be the key step in understanding the differences between the types and subtypes of skin lesions. In this work we focus on a powerful class of algorithms that reduces the dimensionality of the data without significant loss of information. In this class of algorithms the Principal Direction Divisive Partitioning (PDDP) algorithm is of particular value [1]. PDDP is a "divisive" hierarchical clustering algorithm. Any divisive clustering algorithm can be characterised by the way it chooses to provide answers to the following three questions:
Q1: Which cluster to split further?
Q2: How to split the selected cluster?
Q3: When should the iteration terminate?
The PDDP-based algorithms, in particular, use information from the Principal Component Analysis (PCA) of the corresponding data matrix to provide answers in a computationally efficient manner. This is achieved by incorporating information from only the first singular vector and not a full-rank decomposition of the data matrix. In this paper, we will show the strength of an enhanced version of the PDDP algorithm that uses features extracted from dermatological images, aiming at the recognition of malignant melanoma versus dysplastic nevi and non-dysplastic skin lesions. The paper is organised as follows: in Section 2 we present the image dataset, as well as the preprocessing, segmentation and feature extraction techniques applied. In Section 3, the proposed clustering algorithm is presented. In Section 4 we illustrate the experimental results and discuss the potential of our approach. The paper ends with concluding remarks and pointers for future work.
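To make the role of the first principal direction concrete, the following is a minimal sketch of a single PDDP-style split (an illustration of the general idea only, not the enhanced algorithm proposed in this paper): the cluster is centred, projected onto the leading singular vector, and divided at the sign of the projection.

# One PDDP-style split: divide a cluster along its first principal direction.
import numpy as np

def pddp_split(X):
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]          # scores on the first principal direction
    return X[projections <= 0], X[projections > 0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 10)), rng.normal(2.0, 1.0, (50, 10))])
left, right = pddp_split(X)                 # recurse on the cluster selected to split next
print(left.shape, right.shape)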
2 The Skin Lesions Images Dataset
The image data set used in this study is an extraction of the skin database that exists at the Vienna Hospital, kindly provided by Dr. Ganster. The whole data set consists of 3631 images: 972 of them display nevi (dysplastic skin lesions), 2590 feature non-dysplastic lesions, and the remaining 69 images contain malignant melanoma cases. The number of melanoma images is not small, considering the fact that malignant melanoma cases in a primordial state are very rare. It is very common that many patients arrive at specialised hospitals with partially removed lesions. A standard protocol was used for the acquisition of the skin lesion images, ensuring the reliability and reproducibility of the collected images. Reproducibility is considered quite essential for the image characterisation and recognition attempted in this study, since only standardised images may produce comparable results.
2.1 Image Pre-processing and Segmentation
The segmentation of an image containing a cutaneous disease involves the separation of the skin lesion from the healthy skin. For the special problem of skin lesion segmentation, mainly region-based segmentation methods are applied [2,21]. A simple approach is thresholding, which is based on the fact that the values of pixels that belong to a skin lesion differ from the values of the background. By choosing an upper and a lower value it is possible to isolate those pixels that have values within this range. The information for the upper and the lower limits can be extracted from the image histogram, where the different objects are represented as peaks. The bounds of the peaks are good estimates of these limits. It should be noted, though, that simple thresholding as described here cannot be used in all cases, because the image histograms of skin lesions are not always multi-modal [7]. In this study, a more sophisticated local/adaptive thresholding technique was adopted, where the window size, the threshold value and the degree of overlap between successive moving windows were the procedure parameters. This algorithm is presented in the flowcharts of Figure 1. The parameters of the proposed thresholding algorithm were tuned so that the separation of skin lesions was performed satisfactorily.
Fig. 1. Flowcharts of the adaptive thresholding algorithm (steps include: choose a new pixel in the image; compute the average intensity of all pixels in the pixel window; compare the pixel intensity with the average intensity minus the threshold; characterize the pixel as scanned; create an array of adjacent "dark" unscanned pixels)
After K iterations we inspect whether the particles converge or not. If for R+1 successive iterations gbest is at the same position, the swarm seems to converge and we assume that this location probably contains a face. We check whether the value of SVM output at this point is above a predetermined threshold. If the value of gbest is larger than this threshold, we terminate the detection procedure. If gbest ’s value is below the threshold, we initialize the particles again at random positions and we repeat the above procedure. After fine-tuning the algorithm, we selected for our experiments the values 5 and 3 for parameters K and R respectively.
4 Experimental Results
Frame Detection Accuracy (FDA) is an evaluation metric used to measure any object detection algorithm's performance. This measure calculates the spatial overlap between the ground-truth and the algorithm's output [5]. If the face detection algorithm aims to detect multiple faces, the sum of all the overlaps is normalized over the average number of ground-truth and detected objects. If NG is the number of ground-truth objects and ND the number of detected objects, FDA is defined as:

FDA = Overlap Ratio / ((NG + ND) / 2)    (3)

where

Overlap Ratio = ∑i=1..Nmapped |Gi ∩ Di| / |Gi ∪ Di|    (4)
Nmapped is the number of mapped object pairs in the image, Gi is the i-th ground-truth object image region and Di is the i-th detected object image region. The speed performance of the presented face detector is directly related to various parameters, such as the swarm size, the image's initial size and the repeat cycles. The average number of positions examined using PSO was a small percentage of all the possible combinations of the 2D coordinates, ranging from 5 to 6%. This demonstrates the significant reduction in the number of possible solutions to which we have to apply the classifier each time. That is, using PSO for searching we are able to reduce the time needed for a detection by a factor of 20 for any given face detection algorithm. Moreover, using linear SVMs instead of nonlinear ones gives another ×10³ boost in the detection speed. We applied the presented algorithm to the BioID Face Database (available at http://www.bioid.com/support/downloads/software/bioid-face-database.html), consisting of 1521 gray level images with an initial resolution of 384 × 286 pixels. To detect faces of various sizes, prior to applying the algorithm, we scaled the initial image using different scaling factors. Emphasis has been placed on real world conditions and, therefore, the test set features a large variety of illumination, background and face sizes. Figure 1 shows the output of our face detector on some images from the BioID Face Database along with the overlap and the detector output value. Black rectangles represent the ground-truths of every image, while green (lighter gray) windows represent the proposed algorithm's output. Table 1 lists the detection rate for the presented algorithm in comparison with the Viola-Jones state-of-the-art algorithm [6], using the value 25.00 as the threshold for the overlap. For initial scaling factors 0.19 and 0.24, the images are too small for the Viola-Jones algorithm to detect faces, while our algorithm gives a very good detection rate. So, we apply the algorithms to larger images (scaling factors 0.25
Fig. 1. Output of our face detector on a number of test images from the BioID Database
Table 1. Detection rates for the presented algorithm (SVM-PSO) in comparison with the Viola-Jones algorithm (OpenCV) for the BioID Face Database. The initial scaling factors for the tested images are given in the first row.

Detector  Scale 0.19  Scale 0.24  Scale 0.25  Scale 0.3
SVM-PSO   93.95%      93.23%      93.82%      94.08%
OpenCV    0%          0%          83.30%      94.21%
and 0.3), and the detection rates for our algorithm remain very good. The Viola-Jones algorithm gives a relatively good detection rate for scaling factor 0.25, whilst for scaling factor 0.3 it gives a detection rate similar to our algorithm. We should also mention that the classifier used in the Viola-Jones algorithm is trained using many more training samples than our classifier.
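A small helper showing how equations (3) and (4) can be computed for axis-aligned boxes (a generic illustration; the (x1, y1, x2, y2) box format and the one-to-one mapping between detections and ground-truth boxes are assumptions).

# FDA for boxes given as (x1, y1, x2, y2), with detections already mapped to ground-truths.
def overlap(g, d):
    ix1, iy1 = max(g[0], d[0]), max(g[1], d[1])
    ix2, iy2 = min(g[2], d[2]), min(g[3], d[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((g[2] - g[0]) * (g[3] - g[1]) + (d[2] - d[0]) * (d[3] - d[1]) - inter)
    return inter / union if union else 0.0

def fda(ground_truths, detections):
    overlap_ratio = sum(overlap(g, d) for g, d in zip(ground_truths, detections))
    return overlap_ratio / ((len(ground_truths) + len(detections)) / 2.0)

print(fda([(10, 10, 50, 50)], [(12, 12, 48, 52)]))   # about 0.82 for this good detection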
5 Conclusions
We presented a fast and accurate face detection system searching for frontal faces in the image plane. To avoid an exhaustive search over all possible combinations of coordinates in the 2D space, we used a PSO algorithm. What is more, in order to save time and decrease the computational complexity, we used a linear SVM as the classifier. Experimental results demonstrated the algorithm's good performance on a dataset with images recorded under real world conditions and proved its efficiency. The proposed method can be combined with any face detector, e.g., the one used in OpenCV, to reduce its execution time.
References
1. Goldmann, L., Mönich, U., Sikora, T.: Components and their topology for robust face detection in the presence of partial occlusions. IEEE Transactions on Information Forensics and Security 2(3) (September 2007)
2. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the 1995 IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Service Center, Piscataway (1995)
3. Reyes-Sierra, M., Coello, C.C.: Multi-objective particle swarm optimizers: A survey of the state-of-the-art. International Journal of Computational Intelligence Research 2(3), 287–308 (2006)
4. Burges, C.: A tutorial on support vector machines for pattern recognition. Kluwer Academic Publishers, Boston (1998)
5. Kasturi, R., Goldgof, D., Soundararajan, P., Manohar, V., Garofolo, J., Bowers, R., Boonstra, M., Korzhova, V., Zhang, J.: Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2), 319–336 (2009)
6. Viola, P., Jones, M.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
Reducing Impact of Conflicting Data in DDFS by Using Second Order Knowledge
Luca Marchetti and Luca Iocchi
Dpt. of Computer and System Sciences, Sapienza University of Rome
{lmarchetti,iocchi}@dis.uniroma1.it
Abstract. Fusing estimation information in a Distributed Data Fusion System (DDFS) is a challenging problem. One of the main issues is how to detect and handle conflicting data coming from multiple sources. In fact, a key to the success of a Data Fusion System is the ability to detect wrong information. In this paper, we propose the inclusion of reliability assessment of information sources in the fusion process. The evaluated reliability imposes constraints on the use of the information data. We applied our proposal in the challenging scenario of Multi-Agent Multi-Object Tracking.
1 Introduction
Most of the work on Distributed Data Fusion Systems (DDFS) investigates how to optimize or improve the fusion process by optimistically assuming the correctness of the uncertainty models. The impact of using poor-quality information is not well addressed. In fact, most of the literature focuses its attention on establishing the reliability of the belief computed within the framework of the selected model. This approach fails to produce good estimations in particular situations, in which it is not possible to detect errors from inside the fusion process. For example, a fusion process cannot detect ambiguous features, unpredictable systematic errors or conflicting data. In this paper, we address the problem of conflicting data by using second-order knowledge and introducing the concept of Reliability associated with each source, as the uncertainty of the evaluation of uncertainty [1][2]. Before accepting or rejecting the measurements from a source, the system checks the quality of the source itself. Given a priori information about the environment, the relations among objects and agents, and the introspective analysis of the filtering process, it is possible to measure the reliability of the sources. Therefore, a smarter integration of information can be adopted, and the resulting estimation can be improved.
2 Related Work
The main problems we have investigated are how to fuse data when they contain conflicting information, and how to assess the quality of information sources.
One possibility aims at reducing the drawback of including bad information by weighting the belief among sources. Another way is to throw away bad information and select the most consistent sources. All these approaches use information gathered by the filtering process, and the conflicts are evaluated by using metrics from inside the filter. A more interesting point of view is detecting and filtering conflicting data by using information gathered externally to the filter. This leads to a Data Fusion framework that takes into account contextual information [3], relations among objects in the environment [4], and consistency measurements of beliefs [5]. This information introduces second-order knowledge, which represents a level of knowledge that cannot be evaluated by using the fused measurements. Using the reliability as a degree of trustworthiness of a source, the measurements can be modified to reflect the estimated quality. As pointed out in [1], two strategies can be adopted to handle reliability information: Discount and Pruning. In this paper, we present a general framework for DDFS that explicitly adds an additional step to a recursive filtering algorithm, providing assessment and handling of this higher-level knowledge.
3 Introducing Reliability in DDFS
In Figure 1, the global view of this proposal is depicted. We implemented a Multiple Hypotheses Kalman Filter for multi-object tracking. However, this approach could be easily extended to any filtering algorithm [6].
Fig. 1. Data flow in MAMOT-R Algorithm
First, a Kalman Filter (KF) predict step is performed. At the same time, in the World Knowledge block, we use the previous state estimation and the current measurements to update the world model representation. The perceptions are expressed in terms of <ρ, θ>, where ρ is the distance of the tracked object relative to the agent's coordinates, and θ is the relative angle. The block labelled Evaluation assesses the quality of each information source, as explained in Section 3.1. The reliability coefficients Rs, one for each information source s, are evaluated here and then passed to the Handling block, where the measurements
are modified to reflect the quality of the generating source itself. The complete operation is illustrated in Section 3.2. After the KF update step, the new state of the system <xt, Σt> is estimated. In the case of a Multi-Agent System, we propagate the information to the other teammates. In particular, we send the hypotheses (considering <x, Σ> as the mean and covariance of each KF hypothesis), the positions of the agents and the updated world representation.
3.1 Evaluating Reliability
We identified four classes of features used to assign a quality value: Filter Introspection (meta-information from reasoning about the characteristics of the fusion results); Relations among Objects (inter-relationships among objects); A Priori Knowledge (contextual information about the environment); and Consensus (agreement among the agents' perceptions). The parameters and a description of the implemented features are given in the following table:

Class                         Feature implemented                                            Decay constant
Filter Introspection (FI)     distance between the observations and the associated track:   λfi = 1 m
                              δ(fi, zt) = √(ρo² + ρt² − 2·ρo·ρt·cos(θo − θt))
Filter Introspection (FI)     time from last observation-to-track association:              λfi = 2 sec
                              δ(fi, zt) = to − tt
Relations among Objects (RO)  occupancy map for occluded object detection (camera FOV:      λfi = 80% (∼15 m²)
                              range = [0.3 m, 6 m], angle = [−30°, 30°], areamap = 18.8 m²):
                              δ(fi, zt) = areamax − areaoccupied
A Priori Knowledge (AK)       areas with different light conditions (distance between        λfi = 1.5 m
                              agent pose and center of light area):
                              δ(fi, zt) = √((xa − xl)² + (ya − yl)²)
A Priori Knowledge (AK)       areas with colored patterns (distance between agent pose       λfi = 1.5 m
                              and center of "colored" area):
                              δ(fi, zt) = √((xa − xl)² + (ya − yl)²)
Consensus (CO)                percentage of agreeing agents:                                 λfi = 80%
                              δ(fi, zt) = #agreeing agents / #agents
The reliability coefficients R1, …, Rs for information sources 1, …, s are evaluated as follows: for each class feature, we estimate the reliability as a distance, δ(fi, zt), between the "best" correspondence on a given feature fi and the current measurements zt [7]. In order to compute the probability of reliability (p) of an information source, we need a function that maps the distance to a numerical value. This is expressed by

Rfi(zt) ≡ p(δ(fi, zt), λfi) = exp(−δ(fi, zt) / λfi).
The parameter λfi represents the rate parameter (or decay constant ) characterizing the exponential function. It indicates the time after that the reliability
is considered meaningless: the distance δ(f_i, z_t) represents how close the observation is to the feature. The overall reliability of the source is thus given by R_s = ∏_{i=1}^{M} R_{f_i}.
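As a concrete illustration, the sketch below computes per-feature reliabilities with the exponential mapping above and combines them into an overall source reliability. The feature distances and decay constants used in the example are hypothetical placeholders, not values taken from the system.

```python
import math

def feature_reliability(distance, decay_constant):
    """Map a feature distance delta(f_i, z_t) to a reliability in (0, 1]."""
    return math.exp(-distance / decay_constant)

def source_reliability(distances, decay_constants):
    """Overall reliability R_s as the product of the per-feature reliabilities."""
    r = 1.0
    for delta, lam in zip(distances, decay_constants):
        r *= feature_reliability(delta, lam)
    return r

# Hypothetical example with two features: observation-to-track distance (metres)
# and time since the last association (seconds), using the decay constants above.
print(source_reliability([0.4, 1.0], [1.0, 2.0]))
```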
3.2 Reliability Handling
Discount Strategy. This policy class embraces most of the methodologies that assign a weight to measurements in relation to their quality ([8]). Let Z = {Z_1, · · · , Z_S} be the union of the measurement sets coming from S sources, and let x_s represent the statistics of each source s to be combined. The fusion operator F_R is expressed by the Bayesian rule, which under the assumption of source independence reduces to a product [1]:

F_R(x_1, · · · , x_S, R_1, · · · , R_S)|Z ≡ p(x_s) ∏_{s=1}^{S} p(x_s|Z_s)^{R_s} / p(x_s), ∀s ∈ S,

where x_s = p(x_s|Z_s).

Pruning Strategy. The pruning policy selects the reliable sources using a thresholding mechanism. Using probability measures to represent the reliability coefficients, the overall probability distribution is influenced only by the sources that survive a validation threshold. Thus, the fusion uses the selected sources as

F_R(x_1, · · · , x_S, R_1, · · · , R_S)|Z ≡ p(x_s) ∏_{s=1}^{S} p(x_s|Z_s) / p(x_s), ∀s ∈ S = {x_s | R_s > threshold}.
For the purposes of this paper, we used a fixed threshold. However, as stated in [9], variable thresholds could outperform a fixed one.
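To make the two policies concrete, the sketch below fuses per-source posteriors under the independence assumption above, either discounting each source by its reliability or pruning sources below a threshold. The prior, posteriors and reliability values are hypothetical, and the unnormalized products would be renormalized in a full implementation.

```python
def fuse_discount(prior, posteriors, reliabilities):
    """Discount: weight each source's contribution by its reliability R_s."""
    fused = prior
    for p_s, r_s in zip(posteriors, reliabilities):
        fused *= (p_s ** r_s) / prior
    return fused

def fuse_prune(prior, posteriors, reliabilities, threshold):
    """Pruning: keep only sources whose reliability exceeds the threshold."""
    fused = prior
    for p_s, r_s in zip(posteriors, reliabilities):
        if r_s > threshold:
            fused *= p_s / prior
    return fused

# Hypothetical single-hypothesis example with three sources.
prior = 0.5
posteriors = [0.9, 0.6, 0.2]      # p(x | Z_s) for each source
reliabilities = [0.95, 0.7, 0.3]  # R_s as evaluated in Section 3.1
print(fuse_discount(prior, posteriors, reliabilities))
print(fuse_prune(prior, posteriors, reliabilities, threshold=0.6))
```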
4 Experimental Results
The experiments described in this paper have been conducted in the scenario of the Cooperative Robots Laser Tag game (http://www.nerdvest.com/robotics/RoboTag/). We modelled the problem of Object Tracking within a simulated world (http://playerstage.sourceforge.net/) for the Laser Tag game. Within this scenario, we developed a Multi-Agent Multi-Object Tracking algorithm [10], based on Nearest-Neighbour Multiple Hypotheses Object Tracking. In this scenario, two teams of agents look for the opponents in order to "tag" them, using a simulated fiducial sensor. A global controller is in charge of detecting and propagating such information to the agents themselves.
4.1 Assumptions
A Priori Errors. To evaluate the performance in the presented scenario, we added artificial errors to the environment.
Fig. 2. Area with a priori errors, and an example of the influence of different light conditions
As shown in Figure 2a, in the (A) areas there are different light conditions. We model these spots as lamps with different luminosities: the observations are distorted with systematic errors, depending on the position of the agent within such areas. Figure 2b presents a real-world example, in which the tracked objects are represented by colored balls. In the areas labelled with (B), instead, we simulate the presence of objects in the environment that are wrongly detected as targets.
Error Patterns. We define three error profiles: Reliable, Faulty1 and Faulty2. Each agent has a common zero-mean Gaussian noise on its perceptions, described, as stated before, by a tuple ⟨ρ, θ⟩. The Reliable profile does not add any artificial error: it models an agent operating in normal conditions. The Faulty profiles add, respectively, false positives and systematically wrong perceptions when an agent passes through the (A) or (B) areas. We ran the experiments dividing the agents into three sets, corresponding to each error profile.
4.2 Multi-Agent Multi-Object Tracking in the Laser Tag Game
We ran the simulated world for about 30 minutes. The simulated robots explore the arena collecting information about the opponents, simulating a "search and tag" match. The frame rate of the readings was 10 Hz (resulting in ∼3000 measurements).
Reliability Assessment. We measured the ability to correctly assess the agents' quality. In Table 1, we present the percentage of correct assessments (Reliable/Non-reliable) for each profile. It represents the fraction of correct reliable/unreliable assessments over the provided ground truth. In brackets, we also indicate the standard deviation. It is interesting to note that the Reliable agent has better performance. This suggests that it is able to detect faulty agents better because it can use better local information. Towards a more effective Multi-Agent tracking, the faulty agents should be able to detect themselves as faulty and treat their own observations accordingly.
Table 1. Accuracy of reliability assessment

           FI            RO            AK            CO            All
Reliable   75% (±4.6%)   72% (±4.2%)   95% (±2.1%)   87% (±3.3%)   86% (±3.6%)
Faulty1    55% (±5.3%)   68% (±5.0%)   93% (±4.1%)   85% (±4.2%)   83% (±4.8%)
Faulty2    63% (±6.1%)   63% (±5.6%)   92% (±4.5%)   81% (±4.7%)   72% (±5.2%)
Table 2. Least Mean Square Error on policy rules in Laser Tag game experiments

           None               Discount           Pruning
Reliable   1.364m (±0.512m)   1.043m (±0.324m)   0.732m (±0.311m)
Faulty1    2.391m (±0.883m)   1.541m (±0.760m)   1.699m (±0.796m)
Faulty2    1.563m (±0.739m)   1.198m (±0.698m)   1.252m (±0.715m)
Policy Rules. The effectiveness of the different policy rules was evaluated considering all the previously introduced feature classes. We compared the estimation error obtained with the different policy rules. The results in Table 2 show the improvement given by the Second Order knowledge approach. The values indicate the Least Mean Square Error of the trajectory estimation with respect to the ground truth. The Discount rule performs better than the Pruning one for the Faulty profiles. This can be explained by considering how the rule works: weighting bad information is better than removing it completely. If an agent assesses a source as "reliable" while it is not, the tracking still considers the bad perceptions, only with a greater variance than in the normal behaviour. By pruning them, the tracking cannot use any information at all, reducing the possibility of tracking dynamic objects (and, thus, reducing the overall tracking accuracy).
5 Conclusions and Future Work
The executed experiments show the importance of using "Second Order knowledge" to handle multiple information sources. We have confirmed experimentally this reasonable assumption: if we know that a source is giving wrong information, it is better to exclude it or, at least, discount it. The results suggest that Pruning badly affects the estimation when the number of information sources is small. In such situations, the Discount rule is preferable, because it prevents excluding potentially good information. Despite the promising results, more work has to be done: better methodologies to evaluate the reliability coefficients have to be developed. More interesting is the problem of understanding the reliability model using information from the environment, such as contextual high-level reasoning. In this paper we introduced this concept by modelling a priori knowledge as a feature for the evaluation of the Reliability Coefficients. However, a more extensive use of contextual information can contribute to significantly improving the results of a data fusion process.
References
1. Rogova, G., Boss, L.: Information quality effects on information fusion. Technical report, Defense and Research Development Canada (2008)
2. Wang, P.: Confidence as higher level of uncertainty. In: Proc. of Int. Symp. on Imprecise Probabilities and Their Applications (2001)
3. Elmenreich, W.: A review on system architectures for sensor fusion applications. LNCS. Springer, Heidelberg (2007)
4. Guibas, L.J.: Sensing, tracking and reasoning with relations. IEEE Signal Processing Magazine (March 2002)
5. Roli, F., Fumera, G.: Analysis of linear and order statistics combiners for fusion of imbalanced classifiers. In: Roli, F., Kittler, J. (eds.) MCS 2002. LNCS, vol. 2364, p. 252. Springer, Heidelberg (2002)
6. Marchetti, L., Nobili, D., Iocchi, L.: Improving tracking by integrating reliability of multiple sources. In: Proceedings of the 11th International Conference on Information Fusion (2008)
7. Yan, W.: Fusion in multi-criterion feature ranking. In: 10th International Conference on Information Fusion, July 9-12 (2007)
8. Appriou, A.: Uncertain Data Aggregation in Classification and Tracking Processes. Physica-Verlag, Heidelberg (1998)
9. Tumer, K., Ghosh, J.: Analysis of decision boundaries in linearly combined neural classifiers. Pattern Recognition (1996)
10. Marchetti, L.: To believe or not to believe: Improving distributed data fusion with second order knowledge. PhD thesis, "Sapienza" University of Rome, Department of Computer and System Sciences (2009)
Towards Intelligent Management of a Student's Time
Evangelia Moka and Ioannis Refanidis
University of Macedonia, Department of Applied Informatics, Egnatia str. 156, 54006 Thessaloniki, Greece
{emoka,yrefanid}@uom.gr
Abstract. In parallel with their studies, a lot of extra activities need to be fitted into a student's schedule. Frequently, excessive workload results in poor performance or in failing to finish the studies. The problem is more severe in lifelong learning, where students are professionals with family duties. Thus, being able to make an informed decision as to whether taking a specific course fits into a student's schedule is of great importance. This paper illustrates a system called EDUPLAN, currently under development, which aims at helping the student manage her time intelligently. EDUPLAN aims at informing the student as to which learning objects fit her schedule, as well as at organizing her time. This is achieved using scheduling algorithms and a description of the user's tasks and events. In the paper we also extend the LOM 1484.12.3™-2005 ontology with classes that can be used to describe the temporal distribution of the workload of any learning object. Finally, we present EDUPLAN's architecture, which is built around the existing SELFPLANNER intelligent calendar application.
Keywords: Intelligent systems, scheduling, calendar applications.
1 Introduction
Nowadays, the rapid rhythm of development of societies has led to the growing importance of education. Students are required to fulfill more obligatory activities and studies. Moreover, adults considering lifelong learning have to fit their studies around their professional and family commitments. Therefore, making informed decisions is important, in order to avoid, whenever possible, anxiety and tension, and to avert missed opportunities or deadlines. This paper illustrates a system under development, called EDUPLAN, that helps the prospective student avoid bad decisions as to whether to attend a course (more generally, a learning object) or not, and better organize her time. Our proposal is based on our experience with intelligent calendar applications. In particular, SELFPLANNER (http://selfplanner.uom.gr) is a web-based intelligent calendar application [6], which allows the user to specify her commitments by employing a powerful scheduler to put the tasks within the user's calendar [5]. In the paper we propose an ontology for describing the
workload of learning objects. The proposed ontology can be considered an extension of the IEEE LOM 1484.12.3™-2005 model (http://ltsc.ieee.org/wg12/) that characterizes learning objects. The rest of the paper is structured as follows: Section 2 illustrates a typical use case. Section 3 gives a brief presentation of the SELFPLANNER application. Section 4 presents the EDUPLAN Ontology, whereas Section 5 presents the system's architecture. Finally, Section 6 concludes the paper and identifies further work to be done.
2 A Typical Use Case
Perhaps the students of open universities constitute the best example of people who would greatly benefit from EDUPLAN. They are mainly professionals with families and children. At the beginning of each academic year they have to decide whether to undertake two thematic units, which results in full-time studies, or a single thematic unit, which results in part-time studies. Apparently, bad decisions could be avoided if the students were well informed about the workload each thematic unit incurs. At a very fine-grained level, the student could be informed about the detailed daily program of each unit, such as:
• On March 14th you have to submit the 5th project of the unit. The estimated workload is 12 hours. Having read ch. 23 from [2] is a prerequisite.
• On February 16th, 7 to 8 pm, there is a synchronous tele-lecture. Attending the tele-lecture is optional. If you attend it, the estimated workload for reading ch. 22 of [2] drops to 3 hours.
• You can attend the tele-lecture of February 16th offline afterwards, provided that you haven't attended it online. It is preferable to attend it online.
• On May 29th, 9 to 12 am, are the final exams. Participation is obligatory.
All this information, in addition to the student's schedule, is necessary for informed decision making. Manual arrangement of this information is impractical, so an automated scheduler is necessary to solve the computational problem. Our approach applies to all kinds of learning objects, synchronous or asynchronous, simple or composite, covering entities ranging from tutorials to a whole university program, as in the scenario presented above.
3 The SELFPLANNER Application
SELFPLANNER is a web-based intelligent calendar application that helps the user schedule her personal tasks [6]. With the term 'personal task' we mean any activity that has to be performed by the user and requires some of her time. Each task is characterized by its duration and its temporal domain [1]. A domain consists of a set of intervals where the task can be scheduled. A task might be interruptible and/or periodic. A location or a set of locations is attached to each task; in order to execute a task or a part of it, the user has to be in one of these locations. Travelling time between pairs of locations is taken into account when the system
schedules adjacent tasks. Ordering constraints and unary preferences, denoting when the user prefers the task to be scheduled, are also supported by the system. SELFPLANNER utilizes Google Calendar for presenting the calendar to the user, and a Google Maps application to define locations and compute the time the user needs to go from one location to another.
4 The Ontology
This section introduces the EDUPLAN ontology, following a brief introduction of the IEEE LOM 1484.12.3™-2005 model.
4.1 The IEEE LOM 1484.12.3™-2005 Ontology
The IEEE working group that developed the IEEE 1484.12.1™-2002 Standard defined learning objects, for the purposes of the standard, as being "any entity, digital or non-digital, that may be used for learning, education or training". The IEEE LOM 1484.12.3™-2005 Standard defines an XML Schema binding of the LOM Data Model defined in IEEE Std 1484.12.1™-2002. The purpose of this standard is to allow the creation of LOM instances in XML, which enables interoperability and the exchange of LOM XML instances between various systems. The IEEE LOM data model comprises a hierarchy of elements. At the first level there are nine categories, each of which contains sub-elements; these sub-elements may either be simple elements that hold data, or be themselves aggregate elements, which contain further sub-elements. All LOM data elements are optional. The data model also specifies the value space and datatype of each simple data element. The value space defines the restrictions, if any, on the data that can be entered for that element. Fig. 1 depicts the structure of the IEEE LOM ontology.
4.2 The EDUPLAN Ontology
The IEEE LOM 1484.12.3™-2005 model emphasizes the type and the content of each learning object. The EDUPLAN ontology, on the other hand, focuses on the time demands and restrictions of each learning object. The two ontologies are complementary; EDUPLAN individuals may refer to IEEE LOM 1484 objects for content information. The two basic classes of the proposed ontology are learningObject and course. Any individual of learningObject refers to a well-defined learning activity. The class includes a pointer to the IEEE LOM metadata (sourceLO) and carries properties such as title, description, type and expected duration. An additional property of the learningObject class specifies whether an individual is interruptible (e.g., reading a book) or not (e.g., attending a lecture in real time). Furthermore, learning objects might have associated locations, serving the estimation of travelling time. A learningObject individual is not associated with a specific (in time) learning activity. Indeed, reading a book chapter might be optional in one course and obligatory in another, and the two courses may impose different deadlines. The class course serves exactly this purpose and has four subclasses: courseAsynchronous, courseSynchronous,
Fig. 1. A schematic representation of the hierarchy of elements in the LOM data model
courseComposite and coursePeriodic, with the first three being disjoint from each other. Any individual of coursePeriodic should also belong to one of the other three subclasses. The class course adopts all the properties of learningObject. In addition, the object property refersTo links non-composite course individuals to learning objects. A data property optional is also defined. A courseAsynchronous individual is characterized by its deadline, i.e., a dateTime. On the other hand, a courseSynchronous individual is characterized by its startTime and endTime. A coursePeriodic individual is characterized by its periodType, which takes the literals daily, weekly and monthly as values. There are several ways to define the number of occurrences: specifying both a firstPeriod and a lastPeriod dateTime value, or specifying only a firstPeriod dateTime value accompanied by the number of occurrences. A periodic activity might have exceptions. The object property exceptions, ranging over the course class, is employed to accommodate deviations from the base definition. In this case, an integer data property named occurrence is associated with the course class in order to discriminate between the various occurrences. Moreover, the property missingPeriods, containing a collection of positive integers, is used to indicate the missing occurrences. A course individual might be defined recursively from other, simpler course individuals, as shown in Section 2. The class courseComposite serves exactly this purpose. The main property of this class is bagOfCourses, ranging over the entire course class. Several constraints might hold between the various simpler individuals comprising a courseComposite one. The before constraint is a binary constraint with two object properties, firstCourse and secondCourse, ranging over the course individuals; it implies the ordering of the two courses. Another constraint is atLeastOne, which applies over optional courses. The property bagOfCourses is used again to designate the involved course individuals. Other types of constraints can be defined as well.
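As an illustration of how these classes might be represented programmatically, the sketch below models a composite course with a before constraint using plain Python dataclasses. The actual EDUPLAN ontology is defined as an OWL extension of IEEE LOM, so the class and property names here merely mirror those described above, and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class LearningObject:
    title: str
    expected_duration_hours: float
    interruptible: bool
    source_lo: Optional[str] = None  # pointer to the IEEE LOM metadata

@dataclass
class Course:
    refers_to: Optional[LearningObject] = None
    optional: bool = False

@dataclass
class CourseAsynchronous(Course):
    deadline: Optional[datetime] = None

@dataclass
class CourseSynchronous(Course):
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None

@dataclass
class CourseComposite(Course):
    bag_of_courses: List[Course] = field(default_factory=list)
    before: List[Tuple[Course, Course]] = field(default_factory=list)  # (firstCourse, secondCourse)

# Hypothetical thematic unit: a reading assignment followed by an optional tele-lecture.
reading = CourseAsynchronous(
    refers_to=LearningObject("Read ch. 22", 5.0, interruptible=True),
    deadline=datetime(2010, 2, 16, 19, 0))
lecture = CourseSynchronous(
    refers_to=LearningObject("Tele-lecture", 1.0, interruptible=False),
    optional=True,
    start_time=datetime(2010, 2, 16, 19, 0),
    end_time=datetime(2010, 2, 16, 20, 0))
unit = CourseComposite(bag_of_courses=[reading, lecture], before=[(reading, lecture)])
```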
Fig. 2. A schematic representation of the EduPlan ontology. Solid lines denote subclass relationships, whereas dashed lines denote object properties.
5 The Overall Architecture
Making an informed decision about whether to undertake a learning activity has three requirements: a precise estimate of the workload imposed by the learning activity, a precise estimate of the other duties of the prospective student, and an efficient scheduler. The SELFPLANNER system described in Section 3 covers the last two requirements, provided that users keep their calendars as up to date as possible. Concerning the first requirement, we need an information system that manages information about a variety of learning objects and offered courses. The ontology presented in the previous section can serve as a basis for this purpose. Taking into account that the independence of the intelligent calendar application increases its utility, we consider the two parts of our architecture as separate systems that communicate through web-service invocation. Finally, taking into account SELFPLANNER's architecture, with the intelligent component being distinct from the calendar application (i.e., Google Calendar), a third part of our architecture comprises the user's calendar. Fig. 3 depicts the EDUPLAN architecture.
Fig. 3. EDUPLAN overall architecture
Focusing on the information system, we aim for a more personalized experience. The information system should be able to make personalized estimates of the workload, based on the student's profile. A user profile can be created both explicitly, with direct encoding by the student, and implicitly, by receiving feedback from the user concerning the actual workload. Lazy learning methods, such as k-nearest neighbors [3], can be used to obtain these estimates. A more elaborate student profile might also retain her already achieved skills. Finally, user preferences as to how to schedule asynchronous learning activities should also be provided manually or learnt using reinforcement learning techniques [4].
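A minimal sketch of such a lazy estimate is given below: the workload of a new learning object for a given student is predicted as the average of the actual workloads reported for the k most similar past (student, learning-object) cases. The feature encoding and the data are hypothetical and only illustrate the idea.

```python
import math

def knn_workload_estimate(query, past_cases, k=3):
    """past_cases: list of (feature_vector, reported_workload_hours) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(past_cases, key=lambda case: dist(case[0], query))[:k]
    return sum(workload for _, workload in nearest) / len(nearest)

# Hypothetical features: [nominal duration, student's prior skill level, difficulty]
history = [([10, 0.8, 2], 8.0), ([10, 0.3, 2], 14.0),
           ([5, 0.5, 1], 6.0), ([12, 0.6, 3], 13.0)]
print(knn_workload_estimate([10, 0.5, 2], history, k=3))
```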
6 Conclusions and Further Work
This paper presented a system under development called EDUPLAN, which aims at allowing students to make informed decisions about whether they can afford to attend another learning object or not; furthermore, the system will allow them to schedule all their educational activities within their calendar. Several parts of the overall architecture, mainly the intelligent calendar module, have already been implemented. Apart from the system's architecture, in this paper we presented an ontology that can be used to describe the workload aspects of learning objects. The major next step concerns the development of the information system. Finally, the information system has to be integrated with the SELFPLANNER application by exposing suitable interfaces.
References
1. Alexiadis, A., Refanidis, I.: Defining a Task's Temporal Domain for Intelligent Calendar Applications. In: 5th IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI 2009), Thessaloniki, pp. 399–406. Springer, Heidelberg (2009)
2. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)
3. Cover, T.M., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1), 21–27 (1967)
4. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
5. Refanidis, I.: Managing Personal Tasks with Time Constraints and Preferences. In: Proc. of ICAPS 2007, RI, US, pp. 272–279. AAAI Press, Menlo Park (2007)
6. Refanidis, I., Alexiadis, A.: Deployment and Evaluation of SelfPlanner, an Automated Individual Task Management System. Computational Intelligence (2010) (to be published)
Virtual Simulation of Cultural Heritage Works Using Haptic Interaction
Konstantinos Moustakas and Dimitrios Tzovaras
Informatics and Telematics Institute / Centre for Research and Technology Hellas, 57001 Thessaloniki, Greece
{moustak,tzovaras}@iti.gr
Abstract. This paper presents a virtual reality framework for the modeling and interactive simulation of cultural heritage works with the use of advanced human computer interaction technologies. A novel algorithm is introduced for realistic real-time haptic rendering that is based on an efficient collision detection scheme. Smart software agents assist the user in manipulating the smart objects in the environment, while haptic devices are utilized to simulate the sense of touch. Moreover, the virtual hand that simulates the user’s hand is modeled using analytical implicit surfaces so as to further increase the speed of the simulation and the fidelity of the force feedback. The framework has been tested with several ancient technology works and has been evaluated with visitors of the Science Center and Technology Museum of Thessaloniki. Keywords: virtual reality, cultural heritage, simulation, haptic rendering.
1 Introduction
A recent trend of museums and exhibitions of Ancient Greek Technology is the use of advanced multimedia and virtual reality technologies for improving the educational potential of their exhibitions [1]. In [2] the authors utilize augmented reality technology to present an archaeological site. Another attempt was to visually enhance archaeological walkthroughs through the use of visualization techniques [3]. Even if the acceptance of these applications by the museum visitors is considered to be high, there is a clear need for more realistic presentations, including haptic interaction that should be able to offer to the user the capability of interacting with the simulation, achieving in this way enhanced educational and pedagogical benefits. However, these interactive applications involve several time-consuming processes that in most of the cases inhibit any attempt of real-time simulation, like collision detection [4], i.e. the identification of colliding parts of the simulated objects, and the haptic rendering [5], i.e. the calculation of the force that should be fed back to the user via a haptic device. Moreover, the proper modeling of the scene objects so as to increase the ease of representation and simulation is of high importance. In this paper a highly efficient simulator is presented that is based on a robust haptic rendering scheme and interaction agents so as to provide the necessary force feedback to the user for haptic interaction with the virtual environment in real time.
2 System Overview
The main goal is to enhance the realistic simulation and demonstration of the technology works and to present their educational/pedagogical characteristics. The user is allowed to interact with the mechanisms in the virtual environment either by constructing or by using them via the proposed haptic interface. The application aims to contribute to the development of a new perception of the modern era needs, by making reference to the technology evolution, efficiently demonstrating Ancient Greek Technology works, presenting their evolution in time and linking this evolution with corresponding individual and social needs.
Fig. 1. Architecture of the proposed framework
Figure 1 illustrates the general architecture of the proposed framework. A simulation scenario is initially designed using the authoring tool and the available 3D content of the multimedia database. During the interaction with the user the core simulation unit takes as input the user’s actions and the simulation scenario. The software agents perform a high level interpretation of the user’s actions and decide upon the next simulation steps. In parallel, the collision detector checks for possible collision during each simulation step and whenever collision is detected the advanced haptic rendering engine provides the appropriate force feedback to the user that is displayed using either the Phantom or the CyberGrasp haptic device. All aforementioned processes are described in the following sections.
3 Simulation Engine
3.1 Smart Object-Scene Modeling
In order to simplify all underlying simulation and interaction processing, a smart object modeling tool was created that provides an environment to the expert user for manipulating all the necessary data so as to create an educational scenario. The tool provides: a) functionalities for the composition of 3D simulations, b) connection with the VR haptic devices, c) parameterization of the intelligent software agents that simulate
the functionality of parts of Ancient Greek mechanisms, d) composing, processing and storing scenarios, e) integrating various scenarios, and f) modifying the simulation, interaction and haptic parameters of the objects. The expected increased complexity of the scenario files led to the adoption of the X3D standard as the scenario format, in order to be able to create more realistic applications. Information that cannot be supported directly by the X3D format is stored as a meta tag of the X3D scenario file. The tool allows the user to select virtual reality agents, associate them with objects in the scene, insert and modify their parameters and provide constraints. The objects may have different characteristics and associations in each step of the scenario according to its needs. The author can control the flow of a scenario using simple arithmetic rules (e.g., comparisons) in order to trigger the next step in the scenario depending on the actions of the user. Moreover, in the context of the current framework the virtual hand is modeled using superquadrics. All other objects are also modeled using superquadrics augmented with distance maps [7], so as to preserve their accurate geometry.
3.2 Haptic Rendering
A very efficient collision detection scheme, presented in [7], is utilized in the proposed framework to resolve collisions and perform realistic simulations. Moreover, a simple and very efficient haptic rendering scheme has been developed that utilizes the superquadric representation of the virtual hand to rapidly estimate the force feedback. Consider that point P is a point of the penetrating object and is detected to lie inside a segment of the virtual hand (Figure 2).
Fig. 2. Force feedback evaluation

Let also S_{SQ}^{P} represent the distance of point P from the superquadric segment, which corresponds to point P_{SQ} on the superquadric surface, i.e., P_{SQ} is the projection of P onto the superquadric. The amplitude of the force fed onto the haptic devices is obtained using a simple spring model, as illustrated in Figure 2. In particular:

F = k · S_{SQ}^{P}    (1)
where k is the stiffness of the spring. The rest length of the spring is set to zero so that it tends to bring point P onto the superquadric surface. In the present framework, the already obtained superquadric approximation is used in order to rapidly evaluate the
force direction. More precisely, the direction of the force feedback is set to be perpendicular to the superquadric surface at point P_{SQ}. In particular, using the parametric representation of the superquadric [6], the normal vector at a point r(η, ω) is defined as the cross product of the tangent vectors along the coordinate curves:

n(η, ω) = t_η(η, ω) × t_ω(η, ω)
        = s(η, ω) · [ (1/a_1) cos^{2−ε_1}(η) cos^{2−ε_2}(ω),  (1/a_2) cos^{2−ε_1}(η) sin^{2−ε_2}(ω),  (1/a_3) sin^{2−ε_1}(η) ]^T    (2)

where

s(η, ω) = −a_1 a_2 a_3 ε_1 ε_2 sin^{ε_1−1}(η) cos^{2ε_1−1}(η) sin^{ε_2−1}(ω) cos^{ε_2−1}(ω)    (3)
Thus, the resulting force is estimated from the following equation:

F = k · S_{SQ}^{P} · n(η, ω) / ‖n(η, ω)‖    (4)
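The sketch below illustrates equations (2)–(4): it evaluates the (unnormalized) superquadric normal at given surface parameters and scales it by the spring term to obtain the reaction force. The stiffness, shape parameters and penetration depth are hypothetical values, not those of the actual system.

```python
import numpy as np

def superquadric_normal(eta, omega, a1, a2, a3, e1, e2):
    """Unnormalized normal n(eta, omega) of a superquadric, as in Eqs. (2)-(3)."""
    s = -a1 * a2 * a3 * e1 * e2 * (np.sin(eta) ** (e1 - 1)) * (np.cos(eta) ** (2 * e1 - 1)) \
        * (np.sin(omega) ** (e2 - 1)) * (np.cos(omega) ** (e2 - 1))
    return s * np.array([
        (1.0 / a1) * np.cos(eta) ** (2 - e1) * np.cos(omega) ** (2 - e2),
        (1.0 / a2) * np.cos(eta) ** (2 - e1) * np.sin(omega) ** (2 - e2),
        (1.0 / a3) * np.sin(eta) ** (2 - e1),
    ])

def reaction_force(penetration_depth, eta, omega, stiffness, shape):
    """Spring-model force of Eq. (4), directed along the surface normal."""
    n = superquadric_normal(eta, omega, *shape)
    return stiffness * penetration_depth * n / np.linalg.norm(n)

# Hypothetical finger segment: unit-sphere-like superquadric, 5 mm penetration depth.
print(reaction_force(0.005, eta=0.4, omega=0.8, stiffness=200.0, shape=(1, 1, 1, 1, 1)))
```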
A significant advantage of the proposed haptic rendering scheme is that friction and haptic texture can be analytically modeled by modifying the above equation through the addition of a friction component. In particular,

F_friction = −f_C · (1 + k_f · S_{SQ}^{P}) · n_f(η, ω) / ‖n_f(η, ω)‖    (5)
where f_C is the friction coefficient and n_f the direction of motion of the processed point. The term in the parentheses is proportional to the penetration distance, in order to increase the magnitude of the friction force when the penetration depth of the processed point increases. The factor k_f controls the contribution of the penetration depth to the calculated friction force. Finally, the force fed onto the haptic device results from the addition of the reaction and the friction forces. Following a similar procedure, a force component related to haptic texture can also be modeled.
3.3 Geometry Construction and Haptic Interaction Agents
The geometry construction agent (GCA) is responsible for the construction and assembly of the geometrical objects in the scene. The agent allows the user to insert a variety of objects. Default-sized geometrical objects can be used in order to construct an environment rapidly. The properties of inserted objects can be modified using one or more control points. The GCA is responsible for appropriately checking and modifying the user's actions in order to allow only admissible modifications to the objects. The Haptic Interaction Agent (HIA) is responsible for returning force feedback to the user, providing sufficient data to the Geometry Construction Agent and triggering the appropriate actions according to the user input. The environment supports different layers in order to provide an easier way of interaction to the user. The user can select one layer as active and multiple layers as visible. The HIA returns feedback only when the hand is in contact with objects of visible layers, and actions of the user modify the active layer. The HIA receives collision information from the collision detection sub-component and is responsible for triggering actions in the haptic environment and sending haptic feedback to the user. Feedback is sent to the fingers that
touch any visible geometry in the scene. The HIA decides when geometries in the scene are grasped or released by the user's hand. To grasp an object the user must touch the object with the thumb and index fingertips. To release an object the index and thumb fingers should refrain from touching the object.
4 Evaluation and Experimental Results
The proposed framework has been evaluated with simulations on the functionality of ancient technology works and war machines. The evaluation included both the assembly and the functional simulation of the virtual prototypes by several users, including visitors of the Science Center and Technology Museum of Thessaloniki. The virtual prototypes include the Archimedes screw-pump, the Ktisivios pump, single and double pulley cranes, catapults, cross-bows, the sphere of Eolos, the odometer and other ancient machines. Illustrations of some of the aforementioned virtual prototypes are depicted in Figure 3.
Fig. 3. Ancient technology works. Starting from the top left image: Archimedes screw pump, catapult, double pulley crane, odometer, Eolos’ sphere.
The simulation fidelity and efficiency of the ancient technology works were tested in the context of the performed scenarios, and a haptic rendering update rate of 1 kHz can be achieved even for large and detailed virtual environments, while this was not possible with state-of-the-art mesh-based approaches. Moreover, the force feedback obtained from the proposed scheme does not suffer from the force discontinuities at the edges of the mesh triangles, contrary to approaches that generate the force feedback directly from the meshes of the colliding objects, and does not produce the over-rounded effect of the force shading method [8]. The system has been evaluated in tests with visitors of the Science Center and Technology Museum of Thessaloniki, in Greece. The test procedure consisted of two phases: In the first phase, the users were introduced to the system and they were asked
to use it. During this phase, they were asked questions that focused on usability issues and on their interest in participating in each test. The questionnaire also contained questions for the test observers, e.g., whether the user performed the task correctly, how long it took him/her to perform the task, etc. The second phase was carried out immediately after the tests, using an after-tests questionnaire. Specifically, after finishing all the tests, the users were questioned about general issues such as: (a) the benefits and limitations that they foresee in this technology, (b) the usability of the system in a museum environment, (c) other tests, applications or technologies that they would like to experiment with in the application, if any, etc. The system evaluation results have shown that users consider it very innovative and satisfactory in terms of providing a presentation environment in a real museum. The percentage of satisfied users was over 90%.
5 Conclusions
In this paper a novel framework for the simulation of ancient technology works was presented. Novel virtual reality technologies for object modeling and haptic rendering have been proposed that provide realistic interactive simulation using haptic devices. Moreover, a number of simulation scenarios have been developed and evaluated by visitors of the Science Center and Technology Museum in Thessaloniki, Greece. Specifically, the analysis of the basic characteristics of Ancient Greek Technologies is presented using virtual reality environments, so that they can become easily perceptible even to those who are not familiar with the technology. In this way, the platform contributes substantially to the general effort to promote the knowledge of Ancient Technologies.
References
1. Iliadis, N.: Learning Technology Through the Internet. Kastaniotis Publisher, Athens (2002)
2. Ledermann, F., Schmalstieg, D.: Presenting an archaeological site in the virtual showcase. In: Proceedings of the 2003 Conference on Virtual Reality, Archeology, and Cultural Heritage. ACM Press, New York (2003)
3. Papaioannou, G., Christopoulos, D.: Enhancing virtual reality walkthroughs of archaeological sites. In: Proceedings of the 2003 Conference on Virtual Reality, Archeology, and Cultural Heritage. ACM Press, New York (2003)
4. Gottschalk, S., Lin, M.C., Manocha, D.: OBBTree: A Hierarchical Structure for Rapid Interference Detection. In: Proc. ACM SIGGRAPH, pp. 171–180 (1996)
5. McNeely, W.A., Puterbaugh, K.D., Troy, J.J.: Six Degree-of-Freedom Haptic Rendering Using Voxel Sampling. In: Computer Graphics and Interactive Techniques, pp. 401–408 (1999)
6. Solina, F., Bajcsy, R.: Recovery of parametric models from range images: The case for superquadrics with global deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(2), 131–147 (1990)
7. Moustakas, K., Tzovaras, D., Strintzis, M.G.: SQ-Map: Efficient Layered Collision Detection and Haptic Rendering. IEEE Transactions on Visualization and Computer Graphics 13(1), 80–93 (2007)
8. Ruspini, D.C., Kolarov, K., Khatib, O.: The Haptic Display of Complex Graphical Environments. In: Computer Graphics (SIGGRAPH 1997 Conference Proceedings), pp. 345–352 (1997)
Ethnicity as a Factor for the Estimation of the Risk for Preeclampsia: A Neural Network Approach
Costas Neocleous1, Kypros Nicolaides2, Kleanthis Neokleous3, and Christos Schizas3
1 Department of Mechanical Engineering, Cyprus University of Technology, Lemesos, Cyprus
[email protected]
2 Harris Birthright Research Centre for Fetal Medicine, King's College Hospital Medical School, Denmark Hill, SE5 8RX, London, United Kingdom
[email protected]
3 Department of Computer Science, University of Cyprus, 75 Kallipoleos, 1678, POBox 20537, Nicosia, Cyprus
[email protected], [email protected]

Abstract. A large number of feedforward neural structures, both standard multilayer and multi-slab schemes, have been applied to a large database of pregnant women, aiming at generating a predictor of the risk of preeclampsia occurrence at an early stage. In this study we have investigated the importance of ethnicity for the classification yield. The database was composed of 6838 cases of pregnant women in the UK, provided by the Harris Birthright Research Centre for Fetal Medicine in London. For each subject, 15 parameters were considered as the most influential in characterizing the risk of preeclampsia occurrence, including information on ethnicity. The same data were applied to the same neural architecture after excluding the information on ethnicity, in order to study its importance for the correct classification yield. It has been found that the inclusion of information on ethnicity deteriorates the prediction yield in the training and test (guidance) data sets but not in the totally unknown verification data set.
Keywords: preeclampsia, neural predictor, ethnicity, gestational age.
1 Introduction
Preeclampsia is a syndrome that may appear during pregnancy and can cause perinatal and maternal morbidity and mortality. It affects approximately 2% of pregnancies [1; 2]. It is characterized by hypertension and by a significant protein concentration in the urine (proteinuria). Such high blood pressure may result in damage to the maternal endothelium, kidneys and liver [3; 4]. Preeclampsia may occur during the late 2nd or 3rd trimester. It has also been observed that it is more common in women in their first pregnancy. The prevailing conditions that lead to preeclampsia are not well understood, hence its diagnosis depends on appropriate signs or suitable investigations [5]. The likelihood of developing
preeclampsia is thought to be increased by a number of factors in the maternal history: nulliparity, high body mass index (BMI), and previous personal or family history of preeclampsia. However, screening by maternal history alone will detect only 30% of those who will develop the condition, with a false positive rate of 10%. Thus, the early diagnosis of preeclampsia is a difficult task, and its prediction even more difficult. Attempts at preeclampsia prevention using prophylactic interventions have been rather unsuccessful [6; 7]. Thus, any tool that may improve its detection, as for instance a reliable predictor or a method for the effective and early identification of the high-risk group, would be of great help to obstetricians and of course to pregnant women. In recent years, neural networks and other computationally intelligent techniques have been used as medical diagnosis tools aiming at achieving effective medical decisions incorporated in appropriate medical support systems [8; 9]. Neural networks in particular have proved to be quite effective and have also resulted in some relevant patents [10; 11].
2 Data
The data were obtained from the greater London area and South-East England, from pregnant women who had singleton pregnancies and attended routine clinical and ultrasound assessment of the risk for chromosomal abnormalities. The database was composed of 6838 cases of pregnant women. For each woman, 24 parameters that were presumed to contribute to preeclampsia were recorded. Some of these parameters were socio-epidemiologic, others were records from ultrasound examination, and some came from appropriate laboratory measurements. Based on recommendations from medical experts, only 15 parameters were ultimately considered to be the most influential in characterizing the risk of preeclampsia occurrence, and those were used in building the neural predictor. These are: Mean arterial pressure (MAP), Uterine pulsatility index (UPI), Serum marker PAPPA, Ethnicity, Weight, Height, Smoking? (Y/N), Alcohol consumption? (Y/N), Previous preeclampsia, Conception (spontaneous, ovulation drug or IVF), Medical condition of the pregnant woman, Drugs taken by the pregnant woman, Gestation age (in days) when the crown rump length (CRL) was measured, Crown rump length, Mother had preeclampsia? (Y/N). The parameters were encoded in appropriate numerical scales that could make the neural processing most effective. A network guidance test set of 36 cases was extracted and used to test the progress of training. This data set included 16 cases (44%) of women who exhibited preeclampsia. Also, a verification data set having 9 cases, out of which 5 were with preeclampsia (56%), was extracted to be used as totally unknown to the neural network, and thus for checking the prediction capabilities of each attempted network.
3 Neural Predictor
A number of feedforward neural structures, both standard multilayer, with varying numbers of layers and neurons per layer, and multi-slab, with different structures, sizes,
and activation functions, were systematically tried for the prediction. The structure ultimately selected was a multi-slab neural structure having four slabs, connected as depicted in Figure 1. Based on extensive previous experience of some of the authors, all the weights were initialized to 0.3, while the learning rate was the same for all connections, with a value of 0.1. Similarly, the momentum rate was 0.2 for all links.

[Figure 1 content: an input slab with the 15 characteristics (MAP, UPI, PAPP_A, Ethnicity, Weight, Height, Smoking, Alcohol, Previous PET, Conception, Medical condition, Drugs, GA in days, CRL, Mother's previous PET) with linear activation; SLAB 1 with 100 neurons and Gaussian-complement activation; SLAB 2 with 100 neurons and Gaussian activation; an output slab for preeclampsia occurrence with logistic activation.]
Fig. 1. The neural structure that was selected and used for the prediction of preeclampsia
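A minimal NumPy sketch of a forward pass through such a multi-slab network is given below. It assumes, as the figure suggests, that the two hidden slabs are fed in parallel from the input slab and that their outputs jointly feed the logistic output unit, and it uses common definitions of the Gaussian and Gaussian-complement activations; the weight values are random placeholders, since the trained weights are not published.

```python
import numpy as np

def gaussian(x):
    return np.exp(-x ** 2)

def gaussian_complement(x):
    return 1.0 - np.exp(-x ** 2)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_slab_forward(x, w1, w2, w_out):
    """Forward pass: input slab -> two parallel 100-neuron slabs -> logistic output."""
    slab1 = gaussian_complement(x @ w1)      # slab 1: Gaussian-complement activation
    slab2 = gaussian(x @ w2)                 # slab 2: Gaussian activation
    hidden = np.concatenate([slab1, slab2])  # both slabs feed the output unit
    return logistic(hidden @ w_out)          # probability of preeclampsia occurrence

rng = np.random.default_rng(0)
x = rng.normal(size=15)                      # one encoded 15-parameter case (placeholder)
w1, w2 = rng.normal(size=(15, 100)), rng.normal(size=(15, 100))
w_out = rng.normal(size=200)
print(multi_slab_forward(x, w1, w2, w_out))
```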
4 Results
Table 1 shows the data characteristics and the overall prediction results when ethnicity was included among the input information.

Table 1. Preeclampsia prediction results when all characteristics were used

                                             TRAINING SET   TEST SET   VERIFICATION SET
No of subjects in the database               6793           36         9
No of preeclampsia cases                     116            16         5
Percentage of preeclampsia cases             1.7            44.4       55.6
Cases predicted                              3024           26         7
Percentage of cases predicted                44.5           72.2       77.8
Preeclampsia cases predicted                 97             15         5
Percentage of preeclampsia cases predicted   83.6           93.8       100
5 Conclusion and Future Work
Considering the importance of ethnicity, contrary to the conclusions of Chamberlain and Steer [6], it has been found that by including such information the prediction
Table 2. Preeclampsia prediction results when "ethnicity" information was not used

                                             TRAINING SET   TEST SET   VERIFICATION SET
No of preeclampsia cases                     116            16         5
Preeclampsia cases predicted                 99             16         5
Percentage of preeclampsia cases predicted   85.3           100        100
yield becomes worse, as can easily be observed from Table 2. In fact, it is noted that when the "ethnicity" information is withheld from the network, the training and test data set predictions improve, especially that of the test data set. Thus, it may be concluded that this information is not needed in order to assure a high prognosis yield. In future work, a sensitivity analysis on other important predictors will be done, in order to reach a trimmed network that may effectively predict preeclampsia using as little input information as possible.
Acknowledgments. The FMF foundation is a UK registered charity (No. 1037116). We would also like to kindly acknowledge Dr Leona C. Poon and Dr Panayiotis Anastasopoulos for their contribution to the initial organization of the parameters from the original database.
References
1. World Health Organization: Make Every Mother and Child Count. World Health Report, Geneva, Switzerland (2005)
2. Lewis, G. (ed.): Why Mothers Die 2000–2002: The Sixth Report of Confidential Enquiries Into Maternal Deaths in the United Kingdom, pp. 79–85. RCOG Press, London (2004)
3. Drife, J., Magowan, B. (eds.): Clinical Obst. and Gyn., ch. 39, pp. 367–370. Saunders, Philadelphia (2004)
4. Douglas, K., Redman, C.: Eclampsia in the United Kingdom. Br. Med. J. 309(6966), 1395–1400 (1994)
5. James, D., Steer, P., Weiner, C., Gonik, B. (eds.): High Risk Pregnancy, ch. 37, pp. 639–640. Saunders, Philadelphia (1999)
6. Chamberlain, G., Steer, P.: Turnbull's Obstetrics, ch. 21, pp. 336–337. Churchill Livingstone (2001)
7. Moffett, A., Hiby, S.: How does the maternal immune system contribute to the development of pre-eclampsia? Placenta (2007)
8. Yu, C., Smith, G., Papageorghiou, A., Cacho, A., Nicolaides, K.: An integrated model for the prediction of pre-eclampsia using maternal factors and uterine artery Doppler velocimetry in unselected low-risk women. Am. J. Obstet. Gynecol. 193, 429–436 (2005)
9. US Patent 5839438: Computer-based neural network system and method for medical diagnosis and interpretation
A Multi-class Method for Detecting Audio Events in News Broadcasts
Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis
Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center of Scientific Research Demokritos
{petridis,sper}@iit.demokritos.gr, [email protected]

Abstract. We propose a method for audio event detection in video streams from news. Apart from detecting speech, which is obviously the major class in such content, the proposed method detects five non-speech audio classes. The major difficulty of the particular task lies in the fact that most of the non-speech audio events are actually background sounds, with speech being the primary sound. We have adopted a set of 21 statistics computed on a mid-term basis over 7 audio features. A variation of the One-Vs-All classification architecture has been adopted, and each binary classification problem is modeled using a separate probabilistic Support Vector Machine. Experiments have shown that the proposed method can achieve high precision rates for most of the audio events of interest.
Keywords: Audio event detection, Support Vector Machines, Semiautomatic multimedia annotation.
1 Introduction
With the huge increase of multimedia content that is made available over the Internet, a number of methods have been proposed for the automatic characterization of this content. Especially for the case of multimedia files from news broadcasts, several methods have been proposed for automatic annotation, though only a few of those make extensive use of the audio domain ([1], [2], [3], [4]). In this work, we propose an audio-based algorithm for event detection in real broadcaster videos. This work is part of the CASAM European project (www.casamproject.eu), which aims at computer-aided semantic annotation of multimedia data. Our main goal is to detect (apart from speech) five non-speech sounds that were met in our datasets from real broadcasts. Most of these audio events were secondary sounds to the main event, which is obviously speech. The task of recognizing background audio events in news can help in extracting richer semantic information from such content.
2 Audio Class Description
Since the purpose of this work is to analyze audio streams from news, it is expected that the vast majority of the audio data is speech. Therefore, the first
of the audio classes we have selected to detect is speech. Speech tracking may be useful if its results are used by another audio analysis module, e.g., by a speech recognition task. However, the detection of speech as an event is not of major importance in a news audio stream. Therefore, the following, more semantically rich, audio classes have also been selected: music, sound of water, sound of air, engine sounds and applause. In a news audio stream the above events most of the time exist as background events, with speech being the major sound. Hence, the detection of such events is obviously a hard task. It has to be noted that an audio segment can at the same time be labeled as speech and as some other type of event, e.g., music.
3 Audio Feature Extraction
3.1 Short-Term and Mid-Term Processing for Feature Extraction
In order to calculate any audio feature of an audio signal, a short-term processing technique needs to be adopted. The audio signal is divided into (overlapping or non-overlapping) short-term windows (frames), and the feature value f is calculated for each frame. This yields a sequence of feature values F for the whole audio signal. We have selected a frame size equal to 40 msecs and a step of 20 msecs. The short-term windowing described above thus leads, for each audio signal, to a sequence F of feature values, which can be used for the processing / analysis of the audio data. A common technique, however, is the processing of the feature on a mid-term basis. According to this technique, the audio signal is first divided into mid-term windows (segments) and then the short-term process is executed for each segment. In the sequel, the sequence F, which has been extracted for each segment, is used for calculating a statistic, e.g., the average value. So, finally, each segment is represented by a single value, which is the statistic of the respective feature sequence. We have chosen to use a 2-second mid-term window, with a 1-second step.
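The sketch below illustrates this two-level windowing: short-term frames of 40 ms with a 20 ms step, grouped into 2 s mid-term segments with a 1 s step, and three statistics (mean, standard deviation, and their ratio) computed per segment. The frame-level feature here is signal energy only, as a stand-in for the full 7-feature set, and the input signal is a random placeholder.

```python
import numpy as np

def short_term_energy(signal, fs, frame_sec=0.040, step_sec=0.020):
    """Energy of overlapping short-term frames (see the Energy feature in Section 3.2)."""
    frame, step = int(frame_sec * fs), int(step_sec * fs)
    return np.array([np.mean(signal[i:i + frame] ** 2)
                     for i in range(0, len(signal) - frame + 1, step)])

def mid_term_statistics(feature_seq, frames_per_segment=100, frame_step=50):
    """Mean, std and std/mean ratio over mid-term segments (2 s window, 1 s step
    when the short-term step is 20 ms)."""
    stats = []
    for i in range(0, len(feature_seq) - frames_per_segment + 1, frame_step):
        seg = feature_seq[i:i + frames_per_segment]
        mu, sigma = np.mean(seg), np.std(seg)
        stats.append((mu, sigma, sigma / (mu + 1e-12)))
    return np.array(stats)

# Hypothetical 10-second mono signal sampled at 16 kHz.
fs = 16000
signal = np.random.randn(10 * fs)
print(mid_term_statistics(short_term_energy(signal, fs)).shape)
```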
3.2 Adopted Audio Features and Respective Statistics
We have implemented 7 audio features, while for each feature three statistics are computed on a mid-term basis: the mean value, the standard deviation and the std-by-mean ratio. Therefore, in total, each mid-term window is represented by 21 feature values. In the following, the 7 features are presented, along with some examples of their statistics for different audio classes. For more detailed descriptions of the adopted audio features the reader can refer to [5].
Energy. Let x_i(n), n = 1, . . . , N, be the audio samples of the i-th frame, of length N. Then, for each frame i the energy is calculated according to the equation: E(i) = (1/N) Σ_{n=1}^{N} |x_i(n)|². A statistic that has been used for the case of
discriminating signals with large energy variations (like speech, gunshots, etc.) is the standard deviation σ² of the energy sequence.
Zero Crossing Rate. The Zero Crossing Rate (ZCR) is the rate of sign changes of a signal, i.e., the number of times the signal changes from positive to negative or back, per time unit. It can be used for discriminating noisy environmental sounds, e.g., rain. In speech signals, the σ/μ ratio of the ZCR sequence is high, since speech contains unvoiced (noisy) and voiced parts and therefore the ZCR values have abrupt changes. On the other hand, music, being largely tonal in nature, does not show abrupt changes of the ZCR. The ZCR has been used for speech-music discrimination ([6]) and for musical genre classification ([7]).
Energy Entropy. This feature is a measure of abrupt changes in the energy level of an audio signal. It is computed by further dividing each frame into K sub-frames of fixed duration. For each sub-frame j, the normalized energy e_j² is calculated, i.e., the sub-frame's energy divided by the whole frame's energy. Afterwards, the entropy of this sequence is computed. The entropy of energy of an audio frame is lower if abrupt changes are present in that audio frame. Therefore, it can be used for the discrimination of abrupt energy changes.
Spectral Centroid. The spectral centroid C_i of the i-th frame is defined as the center of "gravity" of its spectrum. This feature is a measure of the spectral position, with high values corresponding to "brighter" sounds.
Position of the Maximum FFT Coefficient. This feature directly uses the FFT coefficients of the audio segment: the position of the maximum FFT coefficient is computed and then normalized by the sampling frequency. This feature is another measure of the spectral position.
Spectral Rolloff. The spectral rolloff is the frequency below which a certain percentage (usually around 90%) of the magnitude distribution of the spectrum is concentrated. It is a measure of the spectral shape of an audio signal and it can be used for discriminating between voiced and unvoiced speech ([8]).
Spectral Entropy. Spectral entropy ([9]) is computed by dividing the spectrum of the short-term frame into L sub-bands (bins). The energy E_f of the f-th sub-band, f = 0, . . . , L − 1, is then normalized by the total spectral energy, yielding n_f = E_f / Σ_{f'=0}^{L−1} E_{f'}, f = 0, . . . , L − 1. The entropy of the normalized spectral energy is then computed by the equation: H = − Σ_{f=0}^{L−1} n_f · log₂(n_f).
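For instance, the spectral entropy of a single frame could be computed as in the sketch below; the choice of L = 8 sub-bands and the random frame are arbitrary and serve only as an illustration.

```python
import numpy as np

def spectral_entropy(frame, num_subbands=8):
    """Entropy of the normalized sub-band energies of one short-term frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    subbands = np.array_split(spectrum, num_subbands)
    energies = np.array([band.sum() for band in subbands])
    n = energies / (energies.sum() + 1e-12)        # normalized sub-band energies n_f
    return -np.sum(n * np.log2(n + 1e-12))         # H = -sum_f n_f log2(n_f)

print(spectral_entropy(np.random.randn(640)))      # one 40 ms frame at 16 kHz
```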
4 Event Detection
As described in Section 3, the mid-term analysis procedure leads to a vector of 21 elements for each mid-term window. In order to classify each audio segment, we
have adopted Support Vector Machines (SVMs) and a variation of the One-Vs-All classification architecture. In particular, each binary classification task, e.g., 'Speech Vs Non-Speech', 'Music Vs Non-Music', etc., is modeled using a separate SVM. Each SVM has a soft output, which is an estimate of the probability that the input sample (i.e., audio segment) belongs to the respective class. Therefore, for each audio segment the following soft classification outputs are extracted: Pspeech, Pmusic, Pair, Pwater, Pengine, Papplause. Furthermore, six corresponding thresholds are defined, one for each binary classification task. In the training stage, apart from the training of the SVMs, a cross-validation procedure is executed for each of the binary classification sub-problems, in order to estimate the thresholds that maximize the respective binary precision rates. For each audio segment the following four classification decisions are possible: a) the label Speech can be given to the segment; b) any of the non-speech labels can be given to the segment; c) the label Speech and any of the other labels can be given to the segment; d) the segment can be left unlabeled. In the event detection testing stage, given the six soft decisions from the respective binary classification tasks, the following process is executed for each 1-sec audio segment:
– If Pspeech ≥ Tspeech, then the label 'Speech' is given to the segment.
– For each of the other labels i, i ∈ {music, air, water, engine, applause}: if Pi < Ti then Pi = 0.
– Find the maximum of the non-speech soft outputs and its label imax.
– If Pimax > Timax then label the segment as imax.
The above process is repeated for all mid-term segments of the audio stream. As a final step, successive audio segments that share the same label are merged. This leads to a sequence of audio events, each one of which is characterized by its label and its time limits.
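A sketch of this decision logic is given below. The probability and threshold values are placeholders, and the SVM soft outputs are assumed to be already available (e.g., from a probabilistic SVM implementation).

```python
NON_SPEECH = ["music", "air", "water", "engine", "applause"]

def classify_segment(probs, thresholds):
    """probs / thresholds: dicts over 'speech' and the five non-speech classes.
    Returns the set of labels assigned to one 1-second mid-term segment."""
    labels = set()
    if probs["speech"] >= thresholds["speech"]:
        labels.add("speech")
    # Zero out non-speech outputs that do not pass their class-specific threshold.
    surviving = {c: (probs[c] if probs[c] >= thresholds[c] else 0.0) for c in NON_SPEECH}
    best = max(surviving, key=surviving.get)
    if surviving[best] > 0.0:
        labels.add(best)
    return labels

# Hypothetical soft outputs for one segment: speech with background music.
probs = {"speech": 0.92, "music": 0.71, "air": 0.10,
         "water": 0.05, "engine": 0.20, "applause": 0.02}
thresholds = {"speech": 0.5, "music": 0.6, "air": 0.7,
              "water": 0.7, "engine": 0.7, "applause": 0.8}
print(classify_segment(probs, thresholds))
```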
5 5.1
Experimental Results Datasets and Manual Annotation
For training - testing purposes, two datasets have been populated in the CASAM project: one from the a German international broadcaster (DW- DeutscheWelle) and the second from the Portuguese broadcaster (Lusa - Agncia de Notcias de Portuga). Almost 100 multimedia streams (7 hours total duration) from the above datasets have been manually annotated, using the Transcriber Tool (http://trans.sourceforge.net/). The annotation on the audio stream is carried out in a segment basis. For each homogenous segment, two labels are defined: the primary label is binary and corresponds to the existence of speech, while the secondary label is related to the type of background sound. In Table 1, a representation for an example of an annotated audio file is shown.
A Multi-class Method for Detecting Audio Events in News Broadcasts
403
Table 1. Representation example for an annotated audio file Segment Start Segment End Primary Label (speech) Secondary Label 0 1.2 yes engine 1.2 3.3 no engine 3.3 9.8 no music ... ... ... ... Table 2. Detection performance measures Class names Recall(%) Precision(%) Speech 87 80 SoundofAir 20 82 CarEngine 42 87 Water 52 90 Music 56 85 Applause 59 99 Average (non-speech events) 45 86
5.2
Method Evaluation
Performance measures. The audio event detection performance measures should differ from the standard definitions used in the classification case. In order to proceed, let us first define an event, as the association of a segment s with an element c of a class set : e = {s → c}. Furthermore, let S be the set of all segments of events known to hold as ground truth and S be the set of all segments of events found by the system. For a particular class label c. Also, let S(c) = {s ∈ S : s → c} be the set of ground truth segments associated to class c, ¯ = {s ∈ S : s S(c) → c = c } the set of ground truth segments not associated to class c, S (c) = {s ∈ S : s → c} the set of system segments associated to class c and S¯ (c) = {s ∈ S : s → c = c} the set of system segments not associated to class c. In the sequel let, two segments and a threshold value t ∈ (0, 1). We define | the segment matching function g : S × S → {0, 1} as: gt (s, s ) = |s∩s |s∪s | > t. For defining the recall rate, let A(c) be the ground truth segments s → c for which there exist a matching segment s → c A(c) = {s ∈ S(c), ∃s ∈ S (c) : gt (s, s ) = 1}. Then, the recall of class c is defined as: Recall(c) = |A(c)| |S(c)| . In order to define the event detection precision, let A (c) be the system segments s → c for which there exist a matching segment s → c: A (c) = {s ∈ S(c), ∃s ∈ S(c) : gt (s, s ) = (c)| 1}. Then the precision of class c is defined as: P recision(c) = |A |S (c)| . Performance results. In Table 2, the results of the event detection process is presented. It can bee seen that for most of the audio event types the precision rate is at above 80%. Furthermore, the average performance measures for all non-speech events has been calculated. In particular, the recall rate was found
404
S. Petridis, T. Giannakopoulos, and S. Perantonis
equal to 45%, while precision was 86%. This actually means that almost half of the manually annotated audio events were successfully detected, while 86% of the detected events were correctly classified.
6
Conclusions
We have presented a method for automatic audio event detection in news videos. Apart from detecting speech, which is obviously the most dominant class in the particular content, we have trained classifiers for detecting five other types of sounds, which can provide important content information. Our major purpose was to achieve high precision rates. The experimental results, carried out over a large dataset from real news streams, indicate that the precision rates are always above 80%. Finally, the proposed method managed to detect almost 50% of all the manually annotated non-speech events, while from all the detected events 86% were correct. This is a rather high performance, if we take into consideration that most of these events exist as background sounds to speech in the given content. Acknowledgments. This paper has been supported by the CASAM project (www.casam-project.eu).
References 1. Mark, B., Jose, J.M.: Audio-based event detection for sports video. In: Bakker, E.M., Lew, M., Huang, T.S., Sebe, N., Zhou, X.S. (eds.) CIVR 2003. LNCS, vol. 2728, pp. 61–65. Springer, Heidelberg (2003) 2. Baillie, M., Jose, J.: An audio-based sports video segmentation and event detection algorithm. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 110–110 (2004) 3. Tzanetakis, G., Chen, M.: Building audio classifiers for broadcast news retrieval. In: 5th International Workshop on Image Analysis for Multimedia Interactive Services, Lisboa, Portugal, April 2004, pp. 21–23 (2004) 4. Huang, R., Hansen, J.: Advances in unsupervised audio segmentation for the broadcast news and ngsw corpora. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 1 (2004) 5. Giannakopoulos, T.: Study and application of acoustic information for the detection of harmful content, and fusion with visual information. PhD thesis, Dpt. of Informatics and Telecommunications, University of Athens, Greece (2009) 6. Panagiotakis, C., Tziritas, G.: A speech/music discriminator based on rms and zerocrossings 7(1), 155–166 (2005) 7. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5), 293–302 (2002) 8. Hyoung-Gook, K., Nicolas, M., Sikora, T.: MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval. John Wiley & Sons, Chichester (2005) 9. Misra, H., et al.: Spectral entropy based feature for robust asr. In: ICASSP, Montreal, Canada (2004)
Flexible Management of Large-Scale Integer Domains in CSPs Nikolaos Pothitos and Panagiotis Stamatopoulos Department of Informatics and Telecommunications, University of Athens, Panepistimiopolis, 157 84 Athens, Greece {pothitos,takis}@di.uoa.gr
Abstract. Most research on Constraint Programming concerns the (exponential) search space of Constraint Satisfaction Problems (CSPs) and intelligent algorithms that reduce and explore it. This work proposes a different way, not of solving a problem, but of storing the domains of its variables, an important—and less focused—issue especially when they are large. The new data structures that are used are proved theoretically and empirically to adapt better to large domains, than the commonly used ones. The experiments of this work display the contrast between the most popular Constraint Programming systems and a new system that uses the data structures proposed in order to solve CSP instances with wide domains, such as known Bioinformatics problems. Keywords: CSP domain, Bioinformatics, stem-loop detection.
1
Introduction
Constraint Programming is an Artificial Intelligence area that focuses on solving CSPs in an efficient way. A CSP is a triplet containing variables, their domains (i.e. set of values) and constraints between variables. The simplicity of this definition makes Constraint Programming attractive to many Computer Science fields, as it makes it easy to express a variety of problems. When it comes to solving a CSP, the main problem that we face is the exponential time needed, in the general case. The space complexity comes in second place, as it is polynomial in the size (usually denoted d) of the largest domain. But is O(d) the best space—and therefore time—complexity we can achieve when we have to store a domain? Is it possible to define a lower bound for this complexity? Memory management is a crucial factor determining a Constraint Programming system speed, especially when d is too big. Gent et al. have recently described data structures used to propagate the constraints of a CSP [3]. To the best of our knowledge, the representation of a domain itself has not yet been the primary sector of interest of a specific publication in the area. Nevertheless, Schulte and Carlsson in their Constraint Programming systems survey [7] defined formally the two most popular data structures that can represent a finite set of integers: S. Konstantopoulos et al. (Eds.): SETN 2010, LNAI 6040, pp. 405–410, 2010. c Springer-Verlag Berlin Heidelberg 2010
406
N. Pothitos and P. Stamatopoulos
Bit Vector. Without loss of generality, we suppose that a domain D contains only positive integer values. Let a be a bit array. Then the value v belongs to D, if and only if a[v] = 1. Bit vector variants are implemented in many Constraint Programming solvers [1,2]. Range Sequence. Another approach is to use a sequence of ranges. Formally, D is ‘decomposed’ into a set {[a1 , b1 ], . . . , [an , bn ]}, such that ∪i [ai , bi ] = D. A desired property for this sequence is to be ordered and the shortest possible, i.e. [ai , bi ] ∩ [aj , bj ] = ∅, ∀i = j. In this case δ denotes the number of ranges. A more simple data structure than the two above, stores only the bounds of D. E.g., for the domain [1..100000]1 we store only two numbers in memory: 1 and 100000. Obviously, this is an incomplete representation for the non-continuous domains (e.g. [1..3 5..9]). It is therefore incompatible with most algorithms designed for CSPs; only specific methodologies can handle it [11]. On the other hand, for the above domain [1..100000], a bit vector would allocate 100,000 bits of memory, although it could be represented by a range sequence using only two memory words. A range sequence can be implemented as a linked list, or as a binary tree, so it is costlier to search for a value in it. In this work we study the trade-off between memory allocation cost and time consuming operations on domains. A new way of memory management that seeks to reduce the redundant space is proposed. The new algorithms and data structures are shown to perform well, especially on problems which contain large domains. Such problems eminently occur in Bioinformatics, a science that aims at extracting information from large genetic data.
2
Efficient Domain Implementations
While attempting to reduce the space complexity, we should not neglect time complexity. Except for memory allocation, a constraint programming system is responsible for two other basic operations that are executed many times on a domain: 1. Search whether a range of values is included in it. 2. Removal of a range of values from a domain. Note that addition of values is unnecessary; the domain sizes only decrease due to constraint propagation or assignments. Search or removal of a range of w values costs O(w) time in a bit vector; if w = 1 this structure is ideal. The same operations in a range sequence that has been implemented as a linked list [7] require O(δ) steps, while the space complexity is much less (O(δ) too) than the bit vector’s one (O(d)). A wiser choice would be to implement the range sequence as a binary search tree, with an average search/removal complexity O(log δ), and the space complexity left unaffected. 1
[a..b] denotes the integer set {a, a + 1, . . . , b}.
Flexible Management of Large-Scale Integer Domains in CSPs
407
However, the subtraction of a range of values from the tree is complicated. (It roughly performs two traversals and then joins two subtrees.) This is undesirable, not only for the time it spends, but also for the many modifications that are done on the structure. The number of modifications is crucial because they are recorded in order to be undone when a Constraint Programming system backtracks, that is when it restores a previous (or the initial) state of the domains, in order to restart the process of finding a solution to a CSP (through other paths). 2.1
Gap Intervals Tree Representation
To make things simpler and more efficient, a binary search tree of gap ranges was implemented. The advantage of this choice is that the subtraction of a range of values is faster, as it affects only one tree node (i.e. it inserts or modifies only one node). For example the domain [9..17 44.. 101] is described by three gaps: [−∞..8], [18..43] and [102..+∞]. Figure 1 depicts the gaps of a domain that are arranged as a binary search tree. A node of the tree apparently contains the first and the last gap value, and pointers to the left and right ‘children.’ 2.2
[-∞..-17] [2001..+∞] [100..102] [10..10]
[999..1050]
[-5..0]
Fig. 1. A tree with the gaps of the domain [−16..−6 1..9 11..99 103..998 1051..2000]
Search/Delete Algorithm
Another advantage of this approach is that the two basic operations on a domain are performed by a single algorithm named SearchGap.2 This function accepts four arguments (gapN ode, newStartV al, newEndV al, removeInterval). – If removeInterval is 1, the range [newStartV al..newEndV al] is deleted from the domain, which is represented by a tree whose root is gapN ode. – If removeInterval is 0, the function returns a node of the tree that contains at least one element of [newStartV al..newEndV al]. If there does not exist such a node that meets this criterion, then the function returns an empty node. Thus, in case we want to check whether a range [a..b] belongs to D, we call SearchGap(root, a, b, 0): • If the returned node is empty, then [a..b] ⊆ D; • otherwise [a..b] D. The above procedures manipulate the data structure as a normal binary search tree; the insertions of gaps and the search for specific values is done in logarithmic time as we traverse a path from the root gapN ode to an internal node. While a Constraint Programming system tries to find a solution, it only adds gaps to the tree. During gap insertions the algorithm seeks to merge as many gap nodes as possible in order to keep the tree short. 2
Available at http://www.di.uoa.gr/~ pothitos/setn2010/algo.pdf
408
3
N. Pothitos and P. Stamatopoulos
Empirical Results
Although the above domain implementation is compatible with the ordinary CSP formulation, algorithms and constraint propagation methodologies [6], it is recommended especially when we have to solve problems with large non-continuous domains. Such problems naturally occur in Bioinformatics, so we are going to apply the memory management proposed to them. 3.1
A Sequence Problem
Each human cell contains 46 chromosomes; a chromosome is part of our genetic material, since it contains a sequence of DNA nucleotides. There are four types of nucleotides, namely A, T, G and C. (A = adenine, T = thymine, G = guanine, C = cytosine.) A chromosome may include approximately 247.2 million nucleotides. A Simple Problem Definition. Suppose that we want to ‘fit’ in a chromosome a sequence of four cytosines C1 , C2 , C3 , C4 and a sequence of four guanines G1 , G2 , G3 , G4 too. Ci and Gi designate the positions of the corresponding nucleotides in the DNA chain; the initial domain for a position is [1..247200000]. We assume the first sequence grows geometrically with Ci = Ci+1 /99 and the second sequence is the arithmetic progression Gi+1 = Gi + 99. Pitfalls While Solving. This naive CSP, which is limited to only eight constraint variables, may become. . . difficult, if we do not properly manage the domains that contain millions of values. So, we evolved the data structures of an existing Constraint Programming library and observed their behaviour in comparison with two popular systems.3 Naxos. At first, we integrated the gap intervals tree described into Naxos Solver [5]. Naxos is a library for an object-oriented programming environment; it is implemented in C++. It allows the statement of CSPs having constrained variables with finite domains containing integers. The solution4 for the naive problem described was found immediately, using 3 MB of memory. All the experiments were carried out on a Sun Blade computer with an 1.5 GHz SPARC processor and 1 GB of memory. ECLi PSe . On the same machine, however, it took three seconds for the constraint logic programming system ECLi PSe version 5.105 [2] to find the same solution, using 125 MB of memory, as it implements a bit vector variant to store the domains. If we add one more nucleotide to the problem (i.e. one more constraint variable) the program will be terminated due to stack overflow. This 3
4
5
The datasets and the experiments source code—for each Constraint Programming system we used—are available at http://www.di.uoa.gr/~ pothitos/setn2010 The first solution includes the assignments C1 = 1, C2 = 99, C3 = 9801, C4 = 970299, G1 = 2, G2 = 101, G3 = 200 and G4 = 299. We used the ECLi PSe library ‘ic’ that targets ‘Interval Constraints.’
Flexible Management of Large-Scale Integer Domains in CSPs
14
10000 ECLiPSe ILOG Naxos
10
1000 Space (MB)
Time (minutes)
12
409
8 6 4
ECLiPSe ILOG Naxos
100 10
2 0
1 4 24 44 64 84 194 394 594 794 994 Guanines
(a) Time needed to find a solution
4 24 44 64 84 194 394 594 794 994 Guanines
(b) Memory space allocated
Fig. 2. The resources used by Constraint Programming systems as the problem scales
happens because the default stack size is limited, so in order to continue with the following experiments, we increased it manually. Ilog. Ilog Solver version 4.4 [4], a well-known C++ Constraint Programming library, needs treble time (about ten seconds) to find the solution in comparison with ECLi PSe , but it consumes almost the same memory. Scaling the Problem. A simple way to scale the problem is to add more guanines in the corresponding sequence. Figure 2 illustrates the time and space that each system spends in order to reach a solution. Before even adding a hundred nucleotides, ECLi PSe and Ilog Solver ran out of resources, as they had already used all the available physical and virtual memory. On the other hand, Naxos scales normally, as it benefits from the proposed domain representation, and requires orders of magnitude less memory. The lower price of allocating space makes the difference. 3.2
RNA Motifs Detection Problem
In the previous problem we created a nucleotide sequence, but in Bioinformatics it is more important to search for specific nucleotide patterns/motifs inside genomes, i.e. the nucleotide chains of a specific organism. We can focus on a specific pattern that describes the way that an RNA molecule folds back on itself, thus formulating helices, also known as stemloops [10]. A stem-loop consists of a helix and a region with specific characters from the RNA alphabet [9]. In contrast to Ilog Solver, Naxos Solver extended with the proposed memory management is able to solve this problem for the bacterium Escherichia coli genome, which is available through the site of MilPat, a tool dedicated to searching molecular motifs [8].
410
4
N. Pothitos and P. Stamatopoulos
Conclusions and Further Work
In this work, it has been shown that we can achieve a much better lower memory bound for the representation of a domain, than the actual memory consumption of Constraint Programming systems. An improved way of storing a domain, through new data structures and algorithms was proposed. This methodology naturally applies to various problems with wide domains, e.g. Bioinformatics problems that come along with large genome databases. In future, hybrid data structures can contribute towards the same direction. For example, variable size bit vectors could be integrated into binary tree nodes. Everything should be designed to be as much generic as possible, in order to exploit at any case the plethora of known algorithms for generic CSPs. Acknowledgements. This work is funded by the Special Account Research Grants of the National and Kapodistrian University of Athens, in the context of the project ‘C++ Libraries for Constraint Programming’ (project no. 70/4/4639). We would also like to thank Stavros Anagnostopoulos, a Bioinformatics expert, for his valuable help in our understanding of various biological problems and data.
References 1. Codognet, P., Diaz, D.: Compiling constraints in clp(FD). The Journal of Logic Programming 27(3), 185–226 (1996) 2. ECLi PSe constraint programming system (2008), http://eclipse-clp.org 3. Gent, I., Jefferson, C., Miguel, I., Nightingale, P.: Data structures for generalised arc consistency for extensional constraints. In: AAAI 2007: 22nd National Conference on Artificial Intelligence, pp. 191–197. AAAI Press, Menlo Park (2007) 4. ILOG S.A.: ILOG Solver 4.4: User’s Manual (1999) 5. Pothitos, N.: Naxos Solver (2009), http://www.di.uoa.gr/~ pothitos/naxos 6. Sabin, D., Freuder, E.C.: Contradicting conventional wisdom in constraint satisfaction. In: Borning, A. (ed.) PPCP 1994. LNCS, vol. 874, pp. 125–129. Springer, Heidelberg (1994) 7. Schulte, C., Carlsson, M.: Finite domain constraint programming systems. In: Handbook of Constraint Programming, pp. 495–526. Elsevier Science, Amsterdam (2006) 8. Th´ebault, P.: MilPat’s user manual (2006), http://carlit.toulouse.inra.fr/MilPat 9. Th´ebault, P., de Givry, S., Schiex, T., Gaspin, C.: Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics 22(17), 2074– 2080 (2006) 10. Watson, J., Baker, T., Bell, S., Gann, A., Levine, M., Losick, R.: Molecular Biology of the Gene, ch. 6, 5th edn. Pearson/Benjamin Cummings (2004) 11. Zytnicki, M., Gaspin, C., Schiex, T.: A new local consistency for weighted CSP dedicated to long domains. In: SAC 2006: Proceedings of the 2006 ACM symposium on Applied computing, pp. 394–398. ACM, New York (2006)
A Collaborative System for Sentiment Analysis Vassiliki Rentoumi1,2 , Stefanos Petrakis3, Vangelis Karkaletsis1, Manfred Klenner3 , and George A. Vouros2 1 2
Inst. of Informatics and Telecommunications, NCSR “Demokritos”, Greece University of the Aegean, Artificial Intelligence Laboratory, Samos, Greece 3 Institute of Computational Linguistics, University of Zurich, Switzerland
[email protected],
[email protected],
[email protected],
[email protected],
[email protected] Abstract. In the past we have witnessed our machine learning method for sentiment analysis coping well with figurative language, but determining with uncertainty the polarity of mildly figurative cases. We have shown that for these uncertain cases, a rule-based system should be consulted. We evaluate this collaborative approach on the ”Rotten Tomatoes” movie reviews dataset and compare it with other state-of-the-art methods, providing further evidence in favor of this approach.
1
Introduction
In the past we have shown that figurative language conveys sentiment that can be efficiently detected by FigML[2], a machine learning (ML) approach trained on corpora manually annotated with strong figurative expressions1 . FigML was able to detect the polarity of sentences bearing highly figurative expressions, where disambiguation is considered mandatory, such as: (a)“credibility sinks into a mire of sentiments”. On the other hand, there exist cases for which FigML provided a classification decision based on a narrow margin between negative and positive polarity orientation, often resulting in erroneous polarity evaluation. It was observed that such cases bear mild figurativeness, which according to [4] are synchronically as literal as their primary sense, as a result of standardized usage, like: (b) “this 10th film in the series looks and feels tired”. Here, fatigue as a property of inanimate or abstract objects, although highly figurative, presents an obvious negative connotation, due to standardized usage of this particular sense, therefore sentiment disambiguation is not necessary. Such regular cases could be more efficiently treated by a rule-based system such as PolArt[1]. In fact, in this paper we extend the work presented in [8] where we have indeed shown that cases of mild figurative language are better treated by PolArt, while cases of strong figurative language are better handled by FigML. In [8], a novel collaborative system for sentiment analysis was proposed and managed 1
Subsets from the AffectiveText corpus (SemEval’07) and the MovieReviews sentence polarity dataset v1.0, annotated with metaphors and expanded senses: http://www.iit.demokritos.gr/~ vrentoumi/corpus.zip
S. Konstantopoulos et al. (Eds.): SETN 2010, LNAI 6040, pp. 411–416, 2010. c Springer-Verlag Berlin Heidelberg 2010
412
V. Rentoumi et al.
to outperform its two subcomponents, FigML and PolArt, tested on the AffectiveText corpus. Here, we try to verify the validity of this approach on a larger corpus and of a differenet domain and style. In addition and most importantly, another dimension of complementarity between a machine learning method and a rule-based one is explored: the rule-based approach handles the literal cases and the - already introduced - collaborative method treats the cases of figurative language. Results show that integrating a machine learning approach with a finer-grained linguistically-based one leads to a superior, best-of-breed system.
2
Methodology Description
The proposed collaborative method involves four consecutive steps: (a)Word sense disambiguation(WSD): We chose an algorithm which takes as input a sentence and a relatedness measure[6]. The algorithm supports several WordNet based similarity measures among which Gloss Vector (GV)[6] performs best for non-literal verbs and nouns [5]. Integrating GV in the WSD step is detailed in [2]. (b)Sense level polarity assignment(SLPA): We adopted a machine learning approach which exploits graphs based on character n-grams[7]. We compute models of positive and negative polarity from examples of positive and negative words and definitions provided by a enriched version of the Subjectivity Lexicon2,3 . The polarity class of each test sense, is determined by computing its similarity with the models as detailed in [2]. (c)HMMs training: HMMs serve two purposes. Computing the threshold which divides the sentences in marginal/non-marginal and judging the polarity(positive/ negative) of non-marginal sentences. We train one HMM model for each polarity class. The format of the training instances is detailed in [2]. For computing the threshold, the training data are also used as a testing set. Each test instance is tested against both models and the output is a pair of log probabilities of a test instance to belong to either the negative or the positive class. For each polarity class we compute the absolute difference of the log probabilities. We then sort these differences in ascending order and calculate the first Quartile (Q1) which separates the lower 25% of the sample population from the rest of the data. We set this to be the threshold and we apply it to the test instances. Marginal cases are the ones for which the absolute difference of log probability is below that threshold. In our experiments we use a 10-fold cross validation approach to evaluate our results. (d) Sentence-level polarity detection: The polarity of each sentence is determined by HMMs [2] for non-marginal cases and by PolArt[1] for marginal 2 3
http://www.cs.pitt.edu/mpqa/ For each positive or negative word entry contained in the Subjectivity Lexicon, we extracted the corresponding set of senses from WordNet, represented by their synsets and gloss examples; in this way we tried to reach a greater degree of consistency between the test and the training set.
A Collaborative System for Sentiment Analysis
413
ones. PolArt employs compositional rules and obtains word-level polarities from a polarity lexicon, as described in detail in [1]. The Collaborative system’s total performance is then given by adding up the performances of FigML and PolArt.
3
Experimental Setup
3.1
Resources
We ran our experiments on the MovieReviews corpus4 . This corpus was split into different subsets according to our experimental setup in two different ways: – Expanded Senses/Metaphors/Whole: The corpus was enhriched with manually-added annotations for metaphors and expanded senses inside sentences. We produced an expanded senses dataset and a metaphorical expressions one. Furthermore, we treated the entire corpus as a third dataset, ignoring the aforementioned annotations. The produced datasets are: • Expanded senses: 867 sentences, 450 negative and 417 positive ones. • Metaphors: 996 sentences, 505 negative and 491 positive ones. • Whole: 10649 sentences, 5326 negative and 5323 positive ones. – Literal/Non-literal: We group all figurative sentences (metaphors/expanded senses) as the non-literal set. The rest of the sentences we call the literal set. • Non-literal: 1862 sentences5 , 954 negative and 908 positive ones. • Literal: 8787 sentences, 4372 negative and 4415 positive ones. We run numerous variations of PolArt, modifying each time the polarity lexicon it consults: – SL+: This is the subjectivity lexicon6 with manually added valence operators. – Merged: The FigML system produces automatically sense-level polarity lexica (AutSPs), one for each dataset or subset. For the non-literal, metaphors and expanded senses, these lexica target non-literal expressions, metaphors and expanded senses accordingly. For the entire MovieReviews dataset (Whole), all word senses are targeted. Various Merged lexica are produced by combining and merging the SL+ lexicon with each of the AutSPs. 4 5 6
We used the sentence polarity dataset v1.0 from http://www.cs.cornell.edu/People/pabo/movie-review-data/ One sentence belonged to both the metaphors and expanded senses subsets, and was included only once here. http://www.cs.pitt.edu/mpqa/
414
3.2
V. Rentoumi et al.
Collaborative Method Tested on MovieReviews Dataset
We tested our Collaborative method originally presented and evaluated in [8], with the extended MovieReviews corpus, in order to test its validity. Table 1 presents scores for each polarity class, for both variants of our method, the CollaborativeSL+ (using the SL lexicon) and CollaborativeMerged (using the Merged Lexica), across all three datasets. For the majority of cases, CollaborativeSL+ has better performance than CollaborativeMerged. Comparing the performance of CollaborativeSL+ for the MovieReviews with that of CollaborativeSL+ for the AffectiveText corpus [8], for the Whole corpus (f-measure: neg: 0.62, pos: 0.59), we noticed that the performance remains approximately the same. This is evidence that the method is consistent across different datasets. Table 1. MovieReviews: Performance scores for full system runs
recall Whole precision f-measure recall Met precision f-measure recall Exp precision f-measure
3.3
CollaborativeSL+ neg pos 0.682 0.537 0.596 0.628 0.636 0.579 0.724 0.735 0.737 0.722 0.731 0.728 0.640 0.623 0.647 0.616 0.643 0.619
CollaborativeMerged neg pos 0.656 0.536 0.586 0.609 0.619 0.570 0.697 0.704 0.708 0.693 0.702 0.699 0.642 0.623 0.648 0.617 0.645 0.620
The Collaborative Approach Treats Non-literal Cases as a Whole: Complementarity on the Literal/Non-literal Axis
We have so far shown that our Collaborative method is performing quite well on the expanded senses and metaphors datasets. Although we consider them as distinct language phenomena, they both belong to the sphere of figurative connotation. To support this we tested our claim collectively, across non-literal expressions in general, by merging these two datasets into one labelled nonliterals. As a baseline system for assessing the performance of the collaborative method we use a clean version of PolArt (i.e. without added valence shifters). In Table 2, we compare BaselinePolart with CollaborativeSL+ (using the SL lexicon) and CollaborativeMerged (using the Merged Lexica), tested upon the non-literals dataset. We observe that our proposed method outperforms the baseline and proves quite capable of treating non-literal cases collectively. By assembling the non-literals into one dataset and treating it with our collaborative method we set aside its complementary dataset of literals. Since our method is more inclined to treat figurative language, we do not expect that it should treat literal cases optimally, or at least as efficiently as a system that is more inclined to treat literal language. Therefore, assigning the literals to PolArt and the nonliterals to Collaborative, would provide a more sane system architecture and result in better performance for the entire MovieReviews dataset. In Table 3 we present the performance of both variants of the new system architecture (PolartwithCollaborativeSL+, PolartwithCollaborativeMerged). In
A Collaborative System for Sentiment Analysis
415
Table 2. MovieReviews: Performance scores for the non-literals subset CollaborativeSL+ neg pos recall 0.710 0.646 Nonliterals precision 0.678 0.680 f-measure 0.694 0.662
CollaborativeMerged neg pos 0.681 0.644 0.668 0.658 0.674 0.651
BaselinePolart neg pos 0.614 0.667 0.659 0.622 0.636 0.644
Table 3. MovieReviews: Performance scores for full system runs
recall Literals/nonliterals precision f-measure
Whole
recall precision f-measure
PolartwithCollaborativeSL+ neg pos 0.608 0.659 0.641 0.627 0.624 0.642 CollaborativeSL+ neg pos 0.682 0.537 0.596 0.628 0.636 0.579
PolartwithCollaborativeMerged neg pos 0.603 0.659 0.638 0.624 0.620 0.641 CollaborativeMerged neg pos 0.656 0.536 0.586 0.609 0.619 0.570
both versions pure PolArt treats literal cases, while CollaborativeSL+ and CollaborativeMerged treat non literals cases. This new architecture is compared to the one concerning the treatment of the whole corpus (Whole) by both variants of the proposed method (CollaborativeSL+, CollaborativeMerged). It is observed that the performance of this modified system is better for the majority of cases. This fact leads us to the conclusion that a system which treats sentiments in a more language-sensitive way, can exhibit improved performance. We further compared our system with a state-of-the-art system by Andreevskaia and Bergler[3], tested on the MovieReviews corpus. Their system employs a Naive Bayes Classifier for polarity classification of sentences, trained with unigrams, bigrams or trigrams derived from the same corpus. This state-of-the-art system’s accuracy was reported to be 0.774, 0.739 and 0.654 for unigrams, bigrams and trigrams. Our two alternative system architectures, CollaborativeSL+ and PolartwithCollaborativeSL+, scored 0.609 and 0.633. The performances of both our alternatives are clearly lower than the state-ofthe-art system’s when the latter is trained with unigrams or bigrams, but they get closer when it is trained with trigrams. The main point is that the CollaborativeSL+ method performs quite well even for the case of a corpus containing mainly literal language. We expect CollaborativeSL+ to perform optimally when applied on a corpus consisting mainly of non-literal language. It is also worth noting that since PolArt deals with the majority of cases it is bound to heavily affect the overall system performance. Additionally PolArt’s dependency on its underlying resources and especially the prior polarity lexicon is also a crucial performance factor. Thus, the observed moderate performance of the system can be attributed to the moderate PolArt’s performance, probably due to the incompatibility of the Subjectivity Lexicon with the idiosyncratic/colloquial language of the Movie Reviews corpus.
416
V. Rentoumi et al.
All in all, the overall performance is still quite satisfactory. Consequently, if we provide PolArt with a more appropriate lexicon, we expect a further boost.
4
Conclusions and Future Work
In this paper we further extend and examine the idea of a sentiment analysis method which exploits complementarily two language specific subsystems, a rule-based (PolArt) for the mild figurative, and a machine learning system (FigML) for the strong figurative language phenomena[8]. By further examining the validity of such an approach in a larger (and of different domain) corpus (Movie Reviews corpus), in which strong figurative language co-exists with mild figurative language, we observed that this Collaborative method is consistent. We also explored another dimension of complementarity concerning literal/ non-literal cases of language, where PolArt is treating the literal cases and the Collaborative method the non-literal cases. We get empirical support from the performance obtained that utilizing the special virtues of the participating subsystems can be a corner-stone in the design and performance of the resulting system. We will test the collaborative method on a more extensive corpus bearing figurative language. We intend to dynamically produce sense-level polarity lexica exploiting additional machine learning approaches (e.g. SVMs).
References 1. Klenner, M., Petrakis, S., Fahrni, A.: Robust compositional polarity classification. In: Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria (2009) 2. Rentoumi, V., Giannakopoulos, G., Karkaletsis, V., Vouros, G.: Sentiment analysis of figurative language using a word sense disambiguation approach. In: Recent Advances in Natural Language Processing (RANLP), Borovets, Bulgaria (2009) 3. Andreevskaia, A., Bergler, S.: When specialists and generalists work together: overcoming domain dependence in sentiment tagging. In: Proceedings of ACL 2008: HLT, pp. 290–298 (2008) 4. Cruse, D.A.: Meaning in language. Oxford University Press, Oxford (2000) 5. Rentoumi, V., Karkaletsis, V., Vouros, G., Mozer, A.: Sentiment Analysis Exploring Metaphorical and Idiomatic Senses: A Word Sense Disambiguation Approach. In: International Workshop on Computational Aspects of Affectual and Emotional Interaction, CAFFEi 2008 (2008) 6. Pedersen, T., Banerjee, S., Patwardhan, S.: Maximizing Semantic Relatedness to Perform Word Sense Disambiguation. Supercomputing Institute Research Report UMSI, vol. 25 (2005) 7. Giannakopoulos, G., Karkaletsis, V., Vouros, G., Stamatopoulos, P.: Summarization system evaluation revisited: N-gram graphs. ACM Transactions on Speech and Language Processing (TSLP) 5 (2008) 8. Rentoumi, V., Petrakis, S., Klenner, M., Vouros, G., Karkaletsis, V.: A Hybrid System for Sentiment Analysis. To appear in LREC 2010 (2010)
Minimax Search and Reinforcement Learning for Adversarial Tetris Maria Rovatsou and Michail G. Lagoudakis Intelligent Systems Laboratory Department of Electronic and Computer Engineering Technical University of Crete Chania 73100, Crete, Greece
[email protected],
[email protected] Abstract. Game playing has always been considered an intellectual activity requiring a good level of intelligence. This paper focuses on Adversarial Tetris, a variation of the well-known Tetris game, introduced at the 3rd International Reinforcement Learning Competition in 2009. In Adversarial Tetris the mission of the player to complete as many lines as possible is actively hindered by an unknown adversary who selects the falling tetraminoes in ways that make the game harder for the player. In addition, there are boards of different sizes and learning ability is tested over a variety of boards and adversaries. This paper describes the design and implementation of an agent capable of learning to improve his strategy against any adversary and any board size. The agent employs MiniMax search enhanced with Alpha-Beta pruning for looking ahead within the game tree and a variation of the Least-Squares Temporal Difference Learning (LSTD) algorithm for learning an appropriate state evaluation function over a small set of features. The learned strategies exhibit good performance over a wide range of boards and adversaries.
1
Introduction
Skillful game playing has always been considered a token of intelligence, consequently Artificial Intelligence and Machine Learning exploit games in order to exhibit intelligent performance. A game that has become a benchmark, exactly because it involves a great deal of complexity along with very simple playing rules, is the game of Tetris. It consists of a grid board in which four-block tiles, chosen randomly, fall from the top and the goal of the player is to place them so that they form complete lines, which are eliminated from the board, lowering all blocks above. The game is over when a tile reaches the top of the board. The fact that the rules are simple should not give the impression that the task is simple. There are about 40 possible actions available to the player for placing a tile and about 1064 possible states that these actions could lead to. These magnitudes are hard to deal with for any kind of player (human or computer). Adversarial Tetris is a variation of Tetris that introduces adversity in the game, making it even more demanding and intriguing; an unknown adversary tries to S. Konstantopoulos et al. (Eds.): SETN 2010, LNAI 6040, pp. 417–422, 2010. c Springer-Verlag Berlin Heidelberg 2010
418
M. Rovatsou and M.G. Lagoudakis
hinder the goals of the player by actively choosing pieces that augment the difficulty of line completion and by even “leaving out” a tile from the entire game, if that suits his adversarial goals. This paper presents our approach to designing a learning player for Adversarial Tetris. Our player employs MiniMax search to produce a strategy that accounts for any adversary and reinforcement learning to learn an appropriate state evaluation function. Our agent exhibits improving performance over an increasing number of learning games.
2
Tetris and Adversarial Tetris
Tetris is a video game created in 1984 by Alexey Pajitnov, a Russian computer engineer. The game is played on a 10 × 20 board using seven kinds of simple tiles, called tetraminoes. All tetraminoes are composed of four colored blocks (minoes) forming a total of seven different shapes. The rules of the game are very simple. The tiles are falling down one-by-one from the top of the board and the user rotates and moves them until they rest on top of existing tiles in the board. The goal is to place the tiles so that lines are completed without gaps; completed lines are eliminated, lowering all the remaining blocks above. The game ends when a resting tile reaches the top of the board. Tetris is a very demanding and intriguing game. It has been proved [1] that finding a strategy that maximizes the number of completed rows, or maximizes the number of the lines eliminated simultaneously, or minimizes the board height, or maximizes the number of tetraminoes placed in the board before the game ends is an N P-hard problem; even approximating an optimal strategy is N P-hard. This inherent difficulty is one of the reasons this game is widely used as a benchmark domain. Tetris is naturally formulated as a Markovian Decision Process (MDP) [2]. The state consists of the current board and the current falling tile and the actions are the approximately 40 placement actions for the falling tile. The transition model is fairly simple; there are seven equiprobable possible next states, since the next board is uniquely determined and the next falling piece is chosen uniformly. The reward function gives positive numerical values for completed lines and the goal is to find a policy that maximizes the long-term cumulative reward. The recent Reinforcement Learning (RL) Competition [3] introduced a variation of Tetris, called Adversarial Tetris, whereby the falling tile generator is replaced by an active opponent. The tiles are now chosen purposefully to hinder the goals of the player (completion or lines). The main difference in the MDP model of Adversarial Tetris is the fact that the distribution of falling tiles is non-stationary and the dimension of the board varies in height and width. Furthermore, the state is produced like the frames of the video game, as it includes the current position and rotation of the falling tile in addition to the configuration of the board and the player can move/rotate the falling tile at each frame. The RL Competition offers a generalized MDP model for Adversarial Tetris which is fully specified by four parameters (the height and width of the board and the adversity and type of the opponent). For the needs of the competition 20 instances of this model were specified with widths ranging from 6 to 11, heights ranging from 16 to 25, and different types of opponents and opponent’s adversity.
Minimax Search and Reinforcement Learning for Adversarial Tetris
3
419
Designing a Learning Player for Adversarial Tetris
Player Actions. In Adversarial Tetris the tile is falling one step downwards every time the agent chooses one of the 6 low-level actions: move the tile left or right, rotate it clockwise or counterclockwise, drop it, and do nothing. Clearly, there exist various alternative sequences of these actions to achieve the same placement of the tile; this freedom yields repeated board configurations that lead to an unnecessary growth of the game tree. Also, playing at the level of the 6 lowlevel actions ruins the idea of a two-player alternating game, as the opponent’s turn appears only once after several turns of the player. Lastly, the branching factor of 6 would lead to an intractable game tree, even before the falling tile reaches a resting position in the board. These observations led us to consider an abstraction of the player’s moves, namely high-level actions that bring the tile from the top of the board directly to its resting position using a minimal sequence of low-level actions planned using a simple look-ahead search. The game tree now contains alternating plies of the player’s and the opponent’s moves, as a true twoplayer alternating game; all unnecessary intermediate nodes of player’s low-level actions are eliminated. The actual number of high-level actions available in each state depends on the width of the board and the number of distinct rotations of the tile itself, but they will be at most 4× wb, where wb is the width of the board (wb columns and 4 rotations). Similarly, the opponent chooses not only the next falling tile, but also its initial rotation, which means that he has as many as 4 × 7 = 28 actions. However, not all these actions are needed to represent the opponent’s moves, since in the majority of cases the player can use low-level actions to rotate the tile at will. Thus, the initial rotation can be neglected to reduce the branching factor at opponent nodes from 28 to just 7. In summary, there are about 4wb choices for the player and 7 choices for the opponent. Game Tree. The MiniMax objective criterion is commonly used in two-player zero-sum games, where any gain on one side (Max) is equal to the loss on the other side (Min). The Max player is trying to select its best action over all possible Min choices in the next and future turns. In Adversarial Tetris, our player is taken as Max, since he is trying to increase his score, whereas the adversarial opponent is taken as Min, since he is trying to decrease our player’s score. We adopted this criterion because it is independent of the opponent (it produces the same strategy irrespectively of the competence of the opponent) and protects against tricky opponents who may initially bluff. Its drawback is that it does not take risks and therefore it cannot exploit weak opponents. The implication is that our agent should be able to play Tetris well against any friendly, adversarial, or no-care opponent. The MiniMax game tree represents all possible paths of action sequences of the two players playing in alternating turns. Our player forms a new game tree from the current state, whenever it is his turn to play, to derive his best action choice. Clearly, our player cannot generate the entire tree, therefore expansion continues up to a cut-off depth. The utility of the nodes at the cut-off depth is estimated by an evaluation function described below. MiniMax is aided by Alpha-Beta Pruning, which prunes away nodes and subtrees not contributing to the root value and to the final decision.
420
M. Rovatsou and M.G. Lagoudakis
Evaluation Function. The evaluation of a game state s whether in favor or against our agent is done by an evaluation function V (s), which also implicitly determines the agent’s policy. Given the huge state space of the game, such an evaluation function cannot be computed or stored explicitly, so it must be approximated. We are using a linear approximation architecture formed by a vector of k features φ(s) and a vector of k weights w. The approximate value is k computed as the weighted sum of the features, V (s) = i=1 φi (s)wi = φ(s) w. We have issued two possible sets of features which will eventually lead to two different agents. The first set includes 6 features for characterizing the board: a constant term, the maximum height, the mean height, the sum of absolute column differences in height, the total number of empty cells below placed tiles (holes), and the total number of empty cells above placed tiles up to the maximum height (gaps). The second set uses a separate block of these 6 features for each one of the 7 tiles of Tetris, giving a total of 42 features. This is proposed because with the first set the agent can learn which boards and actions are good for him, but cannot associate them to the falling tiles that these actions manipulate. The same action on different tiles, even if the board is unchanged, may have a totally different effect; ignoring the type of tile leads to less effective behavior. This second set of features alleviates this problem by simply weighing the 6 base features differently for different falling tiles. Note that only one block of size 6 is active in any state, the one corresponding to the current falling tile. Learning. In order to learn a good set of weights for our evaluation function we applied a variation of the Least-Squares Temporal Difference Learning (LSTD) algorithm [4]. The need for modifying the original LSTD algorithm stems from the fact that the underlying agent policy is determined through the values given to states by our evaluation function, which are propagated to the root; if these values change, so does the policy, therefore it is important to discard old data and use only the recent ones for learning. To this end, we used the technique of exponential windowing, whereby the weights are updated in regular intervals called epochs; each epoch may last for several decision steps. During an epoch the underlying value function and policy remain unchanged for collecting correct evaluation data and only at the completion of the epoch are the weights updated. In the next epoch, data from the previous epoch are discounted by a parameter μ. Therefore, past data are not completely eliminated, but are weighted less and less as they become older and older. Their influence depends on the value of μ which ranges between 0 (no influence) to 1 (full influence). A value of 0 leads to singularity problems due to the shortage of samples within a single epoch, however a value around 0.95 offers a good balance between recent and old data with exponentially decayed weights. A full description of the modified algorithm is given in Algorithm 1 (t indicates the epoch number). In order to accommodate a wider range of objectives we used a rewarding scheme that encourages line completion (positive reward), but discourages loss of a game (negative reward). We balanced these two objectives by giving a reward of +1 for each completed line and a penalty of −10 for each game lost. 
We set the discount factor to 1 (γ = 1) since rewards/penalties do not loose value as time advances.
Minimax Search and Reinforcement Learning for Adversarial Tetris
421
Algorithm 1. LSTD with Exponential Windowing (wt , At , bt ) = LSTD-EW(k, φ, γ, t, Dt , wt−1 , At−1 , bt−1 , μ) if t == 0 then At ← 0; bt ← 0 else At ← μAt−1 ; bt ← μbt−1 end if for all samples (s, r, s ) ∈ Dt do At ← At + φ(s) φ(s) − γφ(s ) ; bt ← bt + φ(s)r end for −1 wt ← (At ) bt return wt , At , bt
Related Work. There is a lot of work on Tetris in recent years. Tsitsiklis and Van Roy applied approximate value iteration, whereas Bertsekas and Ioffe tried policy iteration, and Kakade used the natural policy gradient method. Later, Lagoudakis et al. applied a least-squares approach to learning an approximate value function, while Ramon and Driessens modeled Tetris as a relational reinforcement learning problem and applied a regression technique using Gaussian processes to predict the value function. Also, de Farias and Van Roy used the technique of randomized constraint sampling in order to approximate the optimal cost function. Finally, Szita and L¨ orincz applied the noisy cross-entropy method. In the 2008 RL Competition, the approach of Thiery [5] based on λ-Policy Iteration outperformed all previous work at the time. There is only unpublished work on Adversarial Tetris from the 2009 RL Competition, where only two teams participated. The winning team from Rutgers University applied look-ahead tree search and the opponent in each MDP was modeled as a fixed probability distribution over falling tiles, which was learned using the cross entropy method.
4
Results and Conclusion
Our learning experiments are conducted over a period of 400 epochs of 8,000 game steps each, giving a total of 3,200,000 samples. The weights are updated at the end of each learning epoch. Learning is conducted only on MDP #1 (out of the 20 MDPs of the RL Competition) which has board dimensions that are closer to the board dimensions of the original Tetris. Learning takes place only at the root of the tree in each move, as learning at the internal nodes leads to a great degree of repetition biasing the learned evaluation function. Agent 1 (6 features) learns by backing up values from depth 1 (or any other odd depth). This set of features ignores the choice of Min and thus it would be meaningless to expand the tree one more level deeper at Min nodes, which are found at odd depths. The second agent (42 features) learns by backing up values from depth 2 (or any other even depth). This set of basis functions takes the action choice of the Min explicitly into account and thus it makes sense to cut-off the search at Max nodes, which are found at even depths. The same cut-offs apply to testing.
M. Rovatsou and M.G. Lagoudakis 600
30
500
Steps per Game
L2 Change in Weights
400
Average Lines per Game
422
300
200
100
400 300 200 100
0 0
100
200
300
0 0
400
100
350
5
300
4 3 2 1
200
Epoch
10 5 100
300
400
250 200 150
50 0
200
300
400
Epoch 12
100
100
15
0 0
400
Average Lines per Game
6
Steps per Game
L2 Change in Weights
300
20
Epoch
Epoch
0 0
200
25
100
200
Epoch
300
400
10 8 6 4 2 0 0
100
200
300
400
Epoch
Fig. 1. Learning curve, steps and lines per update for Agents 1 (top) and 2 (bottom)
Learning results are shown in Figure 1. Agent 1 clearly improves with more training epochs. Surprisingly, Agent 2 hits a steady low level, despite an initial improvement phase. In any case, the performance of the learned strategies is way below expectations compared to the current state-of-the-art. A deeper look into the problem indicated that the opponent in Adversarial Tetris is not very aggressive after all and the MiniMax criterion is way too conservative, as it assumes an optimal opponent. In fact, it turns out that an optimal opponent could actually make the game extremely hard for the player; this is reflected in the game tree and therefore our player’s choices are rather mild in an attempt to avoid states where the opponent could give him a hard time. Agent 1 avoids this pitfall because it goes only to depth 1, where he cannot “see” the opponent, unlike Agent 2. Nevertheless, the learned strategies are able to generalize consistently to the other MDPs (recall that training takes place only on MDP #1). For each learned strategy, we played 500 games on each MDP to obtain statistics. Agent 1 achieves 574 steps and 44 lines per game on average over all MDPs (366 steps and 16 lines on MDP #1), whereas Agent 2 achieves 222 steps and 11 lines (197 steps and 5 lines on MDP #1). Note that our approach is off-line; training takes place without an actual opponent. It remains to be seen how it will perform in an on-line setting facing the exploration/exploitation dilemma.
References 1. Breukelaar, R., Demaine, E.D., Hohenberger, S., Hoogeboom, H.J., Kosters, W.A., Liben-Nowell, D.: Tetris is hard, even to approximate. International Journal of Computational Geometry and Applications 14(1-2), 41–68 (2004) 2. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Machine Learning, 59–94 (1994) 3. Reinforcement Learning Competition (2009), http://2009.rl-competition.org 4. Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Machine Learning, 22–33 (1996) 5. Thi´ery, C: Contrˆ ole optimal stochastique et le jeu de Tetris. Master’s thesis, Universit´e Henri Poincar´e – Nancy I, France (2007)
A Multi-agent Simulation Framework for Emergency Evacuations Incorporating Personality and Emotions

Alexia Zoumpoulaki 1, Nikos Avradinis 2, and Spyros Vosinakis 1

1 Department of Product and Systems Design Engineering, University of the Aegean, Hermoupolis, Syros, Greece
{azoumpoulaki,spyrosv}@aegean.gr
2 Department of Informatics, University of Piraeus, Greece
[email protected]

Abstract. Software simulations of building evacuation during an emergency can provide rich qualitative and quantitative results for safety analysis. However, the majority of them do not take into account current surveys on human behaviors under stressful situations, which explain the important role of personality and emotions in crowd behaviors during evacuations. In this paper we propose a framework for designing evacuation simulations that is based on a multi-agent BDI architecture enhanced with the OCEAN model of personality and the OCC model of emotions.

Keywords: Multi-agent Systems, Affective Computing, Simulation Systems.
1 Introduction

Evacuation simulation systems [1] have been accepted as very important tools for safety science, since they help examine how people gather, flow and disperse in areas. They are commonly used for estimating factors such as evacuation times, possible areas of congestion and the distribution of people amongst exits under various evacuation scenarios. Numerous models for crowd motion and emergency evacuation simulation have been proposed, such as fluid or particle analogies, mathematical equations estimated from real data, cellular automata, and multi-agent autonomous systems. Most recent systems adopt the multi-agent approach, where each individual agent is enriched with various characteristics and its motion is the result of rules or decision-making strategies [2, 3, 4, 5].

Modern surveys indicate that there is a number of factors [8, 9] influencing human behavior and social interactions during evacuations. These factors include personality traits, individual knowledge and experience, and situation-related conditions such as building characteristics or crowd density, among others. Contrary to what is commonly believed, people do not immediately rush towards the exits but take some time before they start evacuating, performing several tasks (e.g. gathering information, collecting items) and looking at the behavior of others in order to decide whether to start moving or not. Route and exit choices also depend on familiarity with the building. Preexisting relationships among the individuals also play a crucial role in behavior, as members of the same
group, such as friends or members of a family, will try to stay together, move at similar speeds, help each other and aim to exit together. Additionally, emergency evacuations involve complex social interactions, where new groups form and grow dynamically as the egress progresses. New social relations arise as people exchange information, try to decide between alternatives and select a course of action. Some members act as leaders, committed to helping others by shouting instructions or leading towards the exits, while others follow [10]. Although individuals involved in evacuations continue to be social actors (this is why, under non-immediate danger, people try to find friends, help others evacuate or even collect belongings), stressful situations can result in behaviors such as panic [11]. During an emergency, the nature of the information obtained, time pressure, the assessment of danger, the emotional reactions and the observed actions of others are elements that might lead to catastrophic events, such as stampedes.

The authors claim that the above factors and their resulting actions should be modeled in order for realistic behaviors to emerge during an evacuation simulation. The proposed approach takes into consideration recent research not only in evacuation simulation models but also in multi-agent system development [7], cognitive science, group dynamics and surveys of real situations [8]. In our approach, decision making is based on an emotional appraisal of the environment, combined with personality traits, in order to select the behavior best suited to the agent's psychological state. We introduce an EP-BDI (Emotion Personality Beliefs Desires Intentions) architecture that incorporates computational models of personality (OCEAN) and emotion (OCC). The emotion module participates in the appraisal of obtained information, decision making and action execution. The personality module influences emotional reactions, indicates tendencies towards behaviors and helps address issues of diversity. Additionally, we use a more meaningful mechanism for social organization, where groups form dynamically and roles emerge due to knowledge, personality and emotions. We claim that these additions may provide the necessary mechanisms for simulating realistic, human-like behavior during an evacuation. Although the need for such an approach is widely accepted, to our knowledge no other evacuation simulation framework has been designed with fully integrated computational models of emotion and personality.
2 The Proposed Framework

The proposed agent architecture (Fig. 1) is based on the classic BDI (Beliefs-Desires-Intentions) architecture, enriched with the incorporation of Personality and Emotions.

The agent's operation cycle starts with the Perception phase, where the agent acquires information about the current world state through its sensory subsystem. Depending on the agent's emotional state at the time, its perception may be affected and some information may be missed. The newly acquired information is used to update the agent's Beliefs. Based upon its new beliefs, the agent performs an appraisal process, using its personality and its knowledge about the environment in order to update its emotional state. The agent's Decision making process follows, where current conditions, personality and the agent's own emotional state are synthesized in order to generate a Desire. This desire is fulfilled through an appropriate Intention, which will be executed as a ground action in the simulation environment.

[Fig. 1 diagram components: Perception; Beliefs (Knowledge of World, Emotional State of others, Group Status, Physical Status, Desire Status); Appraisal; Decision Making; Desires (Evacuation, Threat Avoidance, Group Related, Information); Intentions; Action; Personality (OCEAN); Emotion (Emotional State); Simulation Environment; Other Agents.]
Fig. 1. The proposed agent architecture

The personality model adopted in the proposed framework is the Five Factor Model [12], also known as OCEAN after the initials of the five personality traits it defines: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. Every agent is considered to possess these characteristics in varying degrees and is assigned a personality vector, instantiated with values representing their intensity. Psychology research has shown that personality and emotion, although distinct concepts, are closely interdependent and that there is a link between personality and emotion types [13]. Based on this premise, the proposed architecture follows an original approach among evacuation simulation systems, by closely intertwining the functions of emotion and personality in practically every step of the operation cycle.

The emotion model adopted is based on the OCC model, and particularly its revised version as presented in [14]. In the current approach, we model five positive/negative emotion types, the first of which is an undifferentiated positive/negative emotion, as coupled emotion pairs: Joy/Distress, Hope/Fear, Pride/Shame, Admiration/Reproach and SorryFor/HappyFor. The first three emotions concern the agent itself, while the last two focus on other agents. Each agent is assigned a vector representing its emotional status at a specific temporal instance of the simulation.

Agents can perceive objects, events and messages through sensors and update their beliefs accordingly. Their initial beliefs include at least one route to the exit, i.e. the route they followed when entering the building, and, besides immediate perception, they may acquire knowledge about other exits or blocked paths through the exchange of messages. Agents can also perceive the emotional state of others, which may impact their own emotions as well.
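A bare-bones skeleton of this operation cycle, with every class and method name invented purely for illustration (the stubs exist only so the sketch runs), could look as follows:

```python
# Hypothetical EP-BDI agent skeleton; names and stub behaviors are placeholders,
# not identifiers from the paper or its implementation.
class EPBDIAgent:
    def __init__(self, personality, emotions):
        self.personality = personality   # OCEAN trait vector
        self.emotions = emotions         # paired positive/negative emotion vector
        self.beliefs = {}

    def step(self, environment):
        percepts = self.perceive(environment)   # may be filtered by emotional state
        self.beliefs.update(percepts)           # belief revision
        self.appraise(percepts)                 # personality-weighted emotional update
        desire = self.decide()                  # pick the most important active desire
        intention = self.plan(desire)           # e.g. "move to known exit"
        self.act(intention, environment)        # ground action: walk, run, wait, ...

    # Stub methods so that the skeleton runs.
    def perceive(self, env): return {"alarm": env.get("alarm", False)}
    def appraise(self, percepts): pass
    def decide(self): return "evacuate"
    def plan(self, desire): return "search_for_exit"
    def act(self, intention, env): pass

agent = EPBDIAgent(personality=[0.5] * 5, emotions=[0.0] * 5)
agent.step({"alarm": True})
```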
an agent’s ability to notice an exit sign or an obstacle. Relationships between agents like reproach or admiration may cause a communication message to be ignored or accounted as truth respectively. Perceived events, actions and information from the environment are appraised according to their consequences on the agent’s goals, and the well being of itself as well as other agents. All events, negative or positive, affect one or more of the agent’s emotions, in a varying degree, according to its personality. The level of influence a particular event may have on the agent’s emotional status depends on its evaluation, the agent’s personality, and an association matrix that links personality traits to emotions. This drives the agent into an, intermediate emotional state that affects the agent’s appraisal of the actions of other agents. Attribution-related emotions (pride/shame, admiration/reproach) are updated by evaluating the current desire’s achievement status with respect to an emotional expectation that is associated with each desire. Finally, the agent’s emotional state is updated with the calculated impact. This process is repeated for all events newly perceived. Every agent has a number of competing desires each of which is assigned an importance value. This value is affected by the agent’s emotional state, personality and by his beliefs about the state of the environment and of its group. These desires have been determined by surveys on human action during emergency situation [8] and include: a) move towards an exit, b) receive information, c) transmit information d) join a group, e) maintain a group f) expand a group and g) avoid threat. Each of these is assigned a set of activation conditions and they can become active only if these conditions are met. Once the decision process starts, the activation conditions of all desires are checked and the valid desires are determined. The agent, in every cycle, will try to accomplish the desire with the highest importance value. This value is calculated as the weighted sum of two independent values, one calculated from the agent’s personality and one from his current emotional status. The first is produced using an association matrix that relates specific OCEAN profiles to personality-based importance values for each desire. The relative distance of the agent’s personality value to the profiles in the association matrix determines the personality-based importance value that will be assigned to the active desires. On the other hand, emotion-based importance values are assigned according to agent’s current emotional state and the expected emotional outcome, if the desire is fulfilled. Once an agent is committed to pursuit a desire, a list of possible intention for its fulfillment becomes available. For example “evacuate” desire can be translated to either “move to known exit” or “search for exit” and “follow exit sign”. The selection of the most appropriate intention depends on current knowledge of the world. Choosing an intention translates to basic actions like walk, run or wait, which are affected by the emotional state. For example, agents in a state of panic will be less careful in terms of keeping their personal space and they will not decrease their speed significantly when approaching other agents, leading to inappropriate and dangerous behaviors, such as pushing. Social interactions are modeled through group dynamics. 
There are two types of groups: static groups, representing families and friends, which do not change during the simulation, and emergent groups. The latter are formed during the simulation based on the agents' personality, evacuation experience, message exchange, and the relationships
established between agents. Once established, a relationship is evaluated in terms of achieving a goal, staying safe and maintaining personal space. The size of the groups is also an important factor influencing the merging of nearby groups.
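As a rough illustration of the desire-selection step described earlier in this section, the sketch below computes each desire's importance as a weighted sum of a personality-based and an emotion-based component. All matrices, weights and numeric values are invented placeholders, and a simple dot product against per-desire trait weights stands in for the profile-distance computation the paper describes.

```python
import numpy as np

# OCEAN traits, paired emotions (positive pole minus negative pole) and desires,
# as named in the paper; every numeric value below is an illustrative placeholder.
TRAITS = ["O", "C", "E", "A", "N"]
EMOTIONS = ["joy/distress", "hope/fear", "pride/shame",
            "admiration/reproach", "sorryfor/happyfor"]
DESIRES = ["evacuate", "receive_info", "transmit_info", "join_group",
           "maintain_group", "expand_group", "avoid_threat"]

# Hypothetical association matrices (desires x traits, desires x emotions).
TRAIT_ASSOC = np.full((len(DESIRES), len(TRAITS)), 0.2)
TRAIT_ASSOC[0, 1] = 0.8   # e.g. conscientious agents weight "evacuate" highly
TRAIT_ASSOC[3, 2] = 0.7   # extraverted agents weight "join_group" highly
EMO_ASSOC = np.zeros((len(DESIRES), len(EMOTIONS)))
EMO_ASSOC[6, 1] = -0.9    # fear (negative hope/fear value) raises "avoid_threat"
EMO_ASSOC[0, 1] = -0.5    # fear also pushes towards "evacuate"

def desire_importance(personality, emotions, active, w_p=0.6, w_e=0.4):
    """Weighted sum of a personality-based and an emotion-based importance value."""
    p_imp = TRAIT_ASSOC @ personality        # personality-based component
    e_imp = EMO_ASSOC @ emotions             # emotion-based component
    score = w_p * p_imp + w_e * e_imp
    return np.where(active, score, -np.inf)  # inactive desires can never be chosen

personality = np.array([0.5, 0.7, 0.3, 0.6, 0.8])   # OCEAN vector in [0, 1]
emotions = np.array([-0.4, -0.9, 0.0, 0.1, 0.0])    # frightened, distressed agent
active = np.array([True, True, False, True, False, False, True])
chosen = DESIRES[int(np.argmax(desire_importance(personality, emotions, active)))]
```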
3 Simulation Environment

The authors have set up a simulation environment of fire evacuation as an implementation of the proposed framework. The environment is a continuous 2D space in which all static elements are represented as polygonal obstacles and the fire is modeled as a set of expanding regions. The initial agent population, the demographic and personality distributions and the position and spread parameters of the fire are user-defined. Agents have individual visual and aural perception abilities and can detect the alarm sound, other agents, the fire, exit signs and exits. They are equipped with a short-term memory, which they use to remember the last observed positions of people and elements that are no longer in their field of view. The visual and aural perception abilities of each agent can be temporarily reduced due to its current emotional state and crowd density.

The agents can demonstrate a variety of goal-oriented behaviors. They can explore the environment in search of an exit, a specific group or a specific person; they can move individually, for example following an exit sign or moving to a known exit, or they can perform coordinated motion behaviors, such as following a group or waiting for slower group members. These behaviors are selected according to the agent's highest-priority desire and the associated intentions it is committed to. Agents may get injured or die during the simulation if they are found in areas of great congestion or very close to the fire.

The authors ran a series of scenarios under a variety of initial conditions to test the simulation results and to evaluate the proposed framework. The initial tests showed a number of promising results. Emergent groups were formed during the evacuation, as some agents took the role of a leader and invited other agents to follow. Some members abandoned their groups because of increasing anger towards the leader, e.g. due to a series of observed negative events, such as injuries of group members or close proximity to the fire. The sight of fire and the time pressure caused an increase in negative emotions, such as fear and distress, and some agents demonstrated non-adaptive pushing behavior. This behavior was appraised negatively by observing agents, causing distress to spread through the crowd and leading to an increased number of injuries. Furthermore, the perception of the alarm sound caused agents to seek information about the emergency and to exchange messages about exit routes and the fire location. Missing members of preexisting groups caused other group members to search for them, often ignoring passing groups and moving in the opposite direction.
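As one possible realization of the perception reduction mentioned above, the sketch below linearly attenuates an agent's sensing range by its fear level and the local crowd density; the linear form and all constants are illustrative assumptions, not values from the paper.

```python
def effective_range(base_range, fear, crowd_density,
                    fear_penalty=0.5, density_penalty=0.3):
    """Shrink an agent's visual/aural range under fear and congestion.

    base_range:    nominal sensing radius in metres (assumed unit)
    fear:          level of fear in [0, 1]
    crowd_density: local density normalised to [0, 1]
    """
    factor = 1.0 - fear_penalty * fear - density_penalty * crowd_density
    return base_range * max(factor, 0.1)   # never drop below 10% of the nominal range

# A calm agent in an empty corridor keeps its full 10 m range, while a frightened
# agent in a dense crowd senses roughly half as far.
print(effective_range(10.0, 0.0, 0.0))   # 10.0
print(effective_range(10.0, 0.8, 0.5))   # ~4.5
```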
4 Conclusions and Future Work

We presented a simulation framework for crowd evacuation that incorporates computational models of emotion and personality in order to generate realistic behaviors in emergency scenarios. The proposed approach is based on research results
about the actual crowd responses observed during real emergency situations or drills. The initial implementation results demonstrated the ability of the simulation platform to generate a variety of behaviors consistent with real-life evacuations, including emergent group formation, bi-directional motion, altruistic behaviors and emotion propagation. Future work includes further research in emotion models and appraisal theories to formalize the decision-making mechanism under evacuation scenarios. Further study of the complex social processes characterizing group dynamics is also needed. Furthermore, we are planning to run a series of case studies using various age and personality distributions and to compare the results with published data from real emergency evacuations, in order to evaluate the validity of the proposed framework.
References

1. Still, G.K.: Review of pedestrian and evacuation simulations. Int. J. Critical Infrastructures 3(3/4), 376–388 (2007)
2. Pelechano, N., Allbeck, J.M., Badler, N.I.: Virtual Crowds: Methods, Simulation and Control. Morgan & Claypool, San Francisco (2008)
3. Pan, X., Han, C.S., Dauber, K., Law, K.H.: Human and social behavior in computational modeling and analysis of egress. Automation in Construction 15 (2006)
4. Musse, S.R., Thalmann, D.: Hierarchical model for real time simulation of virtual human crowds. IEEE Transactions on Visualization and Computer Graphics, 152–164 (2001)
5. Luo, L., et al.: Agent-based human behavior modeling for crowd simulation. Comput. Animat. Virtual Worlds 19(3-4), 271–281 (2008)
6. Helbing, D., Farkas, I., Vicsek, T.: Simulating dynamical features of escape panic. Nature, 487–490 (2000)
7. Shao, W., Terzopoulos, D.: Autonomous pedestrians. In: Proc. ACM SIGGRAPH, pp. 19–28 (2005)
8. Zhao, C.M., Lo, S.M., Liu, M., Zhang, S.P.: A post-fire survey on the pre-evacuation human behavior. Fire Technology 45, 71–95 (2009)
9. Proulx, G.: Occupant Behavior and Evacuation. In: Proceedings of the 9th International Fire Protection Symposium, Munich, May 25-26, 2001, pp. 219–232 (2001)
10. Turner, R.H., Killian, L.M.: Collective Behavior, 3rd edn. Prentice-Hall, Englewood Cliffs (1987)
11. Chertkoff, J.M., Kushigian, R.H.: Don't Panic: The Psychology of Emergency Egress and Ingress. Praeger, Westport (1999)
12. Costa, P.T., McCrae, R.R.: Normal personality assessment in clinical practice: The NEO personality inventory. Psychological Assessment, 5–13 (1992)
13. Ortony, A.: On making believable emotional agents believable. In: Trappl, R., Petta, P., Payr, S. (eds.) Emotions in Humans and Artifacts. MIT Press, Cambridge (2003)
14. Zelenski, J., Larsen, R.: Susceptibility to affect: a comparison of three personality taxonomies. Journal of Personality 67(5) (1999)