ECAI 2008
Frontiers in Artificial Intelligence and Applications
FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including “Information Modelling and Knowledge Bases” and “Knowledge-Based Intelligent Engineering Systems”. It also includes the proceedings volumes of the biennial ECAI, the European Conference on Artificial Intelligence, and other publications sponsored by ECCAI, the European Coordinating Committee for Artificial Intelligence. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection.
Series Editors: J. Breuker, R. Dieng-Kuntz, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong
Volume 178
Recently published in this series:
Vol. 177. C. Soares et al. (Eds.), Applications of Data Mining in E-Business and Finance
Vol. 176. P. Zaraté et al. (Eds.), Collaborative Decision Making: Perspectives and Challenges
Vol. 175. A. Briggle, K. Waelbers and P.A.E. Brey (Eds.), Current Issues in Computing and Philosophy
Vol. 174. S. Borgo and L. Lesmo (Eds.), Formal Ontologies Meet Industry
Vol. 173. A. Holst et al. (Eds.), Tenth Scandinavian Conference on Artificial Intelligence – SCAI 2008
Vol. 172. Ph. Besnard et al. (Eds.), Computational Models of Argument – Proceedings of COMMA 2008
Vol. 171. P. Wang et al. (Eds.), Artificial General Intelligence 2008 – Proceedings of the First AGI Conference
Vol. 170. J.D. Velásquez and V. Palade, Adaptive Web Sites – A Knowledge Extraction from Web Data Approach
Vol. 169. C. Branki et al. (Eds.), Techniques and Applications for Mobile Commerce – Proceedings of TAMoCo 2008
Vol. 168. C. Riggelsen, Approximation Methods for Efficient Learning of Bayesian Networks
Vol. 167. P. Buitelaar and P. Cimiano (Eds.), Ontology Learning and Population: Bridging the Gap between Text and Knowledge
Vol. 166. H. Jaakkola, Y. Kiyoki and T. Tokuda (Eds.), Information Modelling and Knowledge Bases XIX
Vol. 165. A.R. Lodder and L. Mommers (Eds.), Legal Knowledge and Information Systems – JURIX 2007: The Twentieth Annual Conference
Vol. 164. J.C. Augusto and D. Shapiro (Eds.), Advances in Ambient Intelligence
Vol. 163. C. Angulo and L. Godo (Eds.), Artificial Intelligence Research and Development
Vol. 162. T. Hirashima et al. (Eds.), Supporting Learning Flow Through Integrative Technologies
Vol. 161. H. Fujita and D. Pisanelli (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the sixth SoMeT_07
Vol. 160. I. Maglogiannis et al. (Eds.), Emerging Artificial Intelligence Applications in Computer Engineering – Real World AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies
Vol. 159. E. Tyugu, Algorithms and Architectures of Artificial Intelligence
Vol. 158. R. Luckin et al. (Eds.), Artificial Intelligence in Education – Building Technology Rich Learning Contexts That Work
Vol. 157. B. Goertzel and P. Wang (Eds.), Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms – Proceedings of the AGI Workshop 2006
Vol. 156. R.M. Colomb, Ontology and the Semantic Web
Vol. 155. O. Vasilecas et al. (Eds.), Databases and Information Systems IV – Selected Papers from the Seventh International Baltic Conference DB&IS’2006
Vol. 154. M. Duží et al. (Eds.), Information Modelling and Knowledge Bases XVIII
Vol. 153. Y. Vogiazou, Design for Emergence – Collaborative Social Play with Online and Location-Based Media
Vol. 152. T.M. van Engers (Ed.), Legal Knowledge and Information Systems – JURIX 2006: The Nineteenth Annual Conference
Vol. 151. R. Mizoguchi et al. (Eds.), Learning by Effective Utilization of Technologies: Facilitating Intercultural Understanding
Vol. 150. B. Bennett and C. Fellbaum (Eds.), Formal Ontology in Information Systems – Proceedings of the Fourth International Conference (FOIS 2006)
Vol. 149. X.F. Zha and R.J. Howlett (Eds.), Integrated Intelligent Systems for Engineering Design
Vol. 148. K. Kersting, An Inductive Logic Programming Approach to Statistical Relational Learning
Vol. 147. H. Fujita and M. Mejri (Eds.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the fifth SoMeT_06
Vol. 146. M. Polit et al. (Eds.), Artificial Intelligence Research and Development
Vol. 145. A.J. Knobbe, Multi-Relational Data Mining
Vol. 144. P.E. Dunne and T.J.M. Bench-Capon (Eds.), Computational Models of Argument – Proceedings of COMMA 2006
ISSN 0922-6389
ECAI 2008 18th European Conference on Artificial Intelligence July 21–25, 2008, Patras, Greece Including
Prestigious Applications of Intelligent Systems (PAIS 2008)
Proceedings Edited by
Malik Ghallab INRIA, France
Constantine D. Spyropoulos NCSR Demokritos, Greece
Nikos Fakotakis University of Patras, Greece
and
Nikos Avouris University of Patras, Greece
Organized by the European Coordinating Committee for Artificial Intelligence (ECCAI) and the Hellenic Artificial Intelligence Society (EETN) Hosted by the University of Patras, Greece
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2008 The authors and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. ISBN 978-1-58603-891-5 Library of Congress Control Number: 2008905319 Publisher IOS Press Nieuwe Hemweg 6B 1013 BG Amsterdam Netherlands fax: +31 20 687 0019 e-mail:
[email protected] Distributor in the UK and Ireland Gazelle Books Services Ltd. White Cross Mills Hightown Lancaster LA1 4XS United Kingdom fax: +44 1524 63232 e-mail:
[email protected] Distributor in the USA and Canada IOS Press, Inc. 4502 Rachael Manor Drive Fairfax, VA 22032 USA fax: +1 703 323 3668 e-mail:
[email protected] LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
ECCAI Member Societies
ACIA (Spain) Catalan Association for Artificial Intelligence (Associació Catalana d’Intelligència Artificial)
ADUIS (Ukraine) Association of Developers and Users of Intelligent Systems
AEPIA (Spain) Spanish Association for Artificial Intelligence (Asociación Española para la Inteligencia Artificial)
AFIA (France) French Association for Artificial Intelligence (Association Française pour l’Intelligence Artificielle)
AIAI (Ireland) Artificial Intelligence Association of Ireland
AIIA (Italy) Italian Association for Artificial Intelligence (Associazione Italiana per l’Intelligenza Artificiale)
AISB (United Kingdom) Society for the Study of Artificial Intelligence and the Simulation of Behaviour
APPIA (Portugal) Portuguese Association for Artificial Intelligence (Associação Portuguesa para a Inteligência Artificial)
BAIA (Bulgaria) Bulgarian Artificial Intelligence Association
BCS-SGAI (United Kingdom) British Computer Society Specialist Group on Artificial Intelligence
BNVKI (Belgium/Netherlands) Belgian-Dutch Association for Artificial Intelligence (Belgisch-Nederlandse Vereniging voor Kunstmatige Intelligentie)
CSKI (Czech Republic) Czech Society for Cybernetics and Informatics (Ceská spolecnost pro kybernetiku a informatiku)
DAIS (Denmark) Danish Artificial Intelligence Society
EETN (Greece) Hellenic Artificial Intelligence Society
FAIS (Finland) Finnish Artificial Intelligence Society (Suomen Tekoälyseura ry)
GI/KI (Germany) German Informatics Association (Gesellschaft für Informatik; Sektion KI e.V.)
IAAI (Israel) Israeli Association for Artificial Intelligence
LANO (Latvia) Latvian National Organisation of Automatics (Latvijas Automatikas Nacionala Organizacija)
LIKS-AIS (Lithuania) Lithuanian Computer Society–Artificial Intelligence Section (Lietuvos Kompiuterininku Sajunga)
NJSZT (Hungary) John von Neumann Society for Computing Sciences (Neumann János Számítógéptudományi Társaság)
ÖGAI (Austria) Austrian Society for Artificial Intelligence (Österreichische Gesellschaft für Artificial Intelligence)
RAAI (Russia) Russian Association for Artificial Intelligence
SAIS (Sweden) Swedish Artificial Intelligence Society
SGAICO (Switzerland) Swiss Group for Artificial Intelligence and Cognitive Science (Schweizer Informatiker Gesellschaft)
SLAIS (Slovenia) Slovenian Artificial Intelligence Society (Slovensko drustvo za umetno inteligenco)
SSKI SAV (Slovak Republic) Slovak Society for Cybernetics and Informatics at Slovak Academy of Sciences (Slovenská spolocnost pre kybernetiku a informatiku pri Slovenskej akadémii vied)
ECAI 2008 Conference Chair Constantine D. Spyropoulos, Greece
Programme Committee Chair Malik Ghallab, France
Organizing Committee Chairs Nikos Fakotakis, Greece Nikos Avouris, Greece
Workshops Chairs Boi Faltings, Switzerland Ioannis Vlahavas, Greece
Demonstration Systems Chair Nikos Karacapilidis, Greece
Area Chairs Antoniou, Grigoris, Greece Benhamou, Frédéric, France Bessiere, Christian, France Console, Luca, Italy Cordier, Marie-Odile, France Dague, Philippe, France De Raedt, Luc, Belgium Flach, Peter, UK Geffner, Hector, Spain Horrocks, Ian, UK Ingrand, Felix, France Lakemeyer, Gerhard, Germany Lang, Jérôme, France Milano, Michela, Italy
Myllymaki, Petri, Finland Oliveira, Eugenio, Portugal Pazienza, Maria Teresa, Italy Saffiotti, Alessandro, Sweden Struss, Peter, Germany Thiébaux, Sylvie, Australia Torasso, Pietro, Italy Traverso, Paolo, Italy Trousse, Brigitte, France Uszkoreit, Hans, Germany Van Harmelen, Frank, The Netherlands Van Someren, Maarten, The Netherlands Verfaillie, Gérard, France
PAIS 2008 Chairs Nick Jennings, United Kingdom Alex Rogers, United Kingdom
PAIS Programme Committee Stuart Aitken, UK Joachim Baumeister, Germany Jeremy Baxter, UK Riccardo Bellazzi, Italy Michael Berger, Germany Stefan Bussmann, Germany Andrew Byde, UK Monique Calisti, Switzerland Simon Case, UK Pádraig Cunningham, Ireland Ian Dickinson, UK Partha Dutta, UK
Floriana Esposito, Italy Robert Ghanea-Hercock, UK Josep Lluis Arcos, Spain Simon Maskell, UK David Nicholson, UK Michal Pechoucek, Czech Republic Nicola Policella, Germany Sarvapali Ramchurn, UK Oliviero Stock, Italy Jerome Thomas, France Simon Thompson, UK Franz Wotawa, Austria
ECAI Programme Committee Agirre, Eneko, ES Ågotnes, Thomas, NO Ait-Mokhtar, Salah, FR Alechina, Natasha, UK Alonso, Carlos, ES Alonso, Eduardo, UK Amgoud, Leila, FR Ananiadou, Sophia, UK Antunes, Luis, PT Ardissono, Liliana, IT Areces, Carlos, FR Assayag, Gerard, FR Avesani, Paolo, IT Baldwin, Timothy, AU Baroglio, Cristina, IT Bartak, Roman, CZ Basili, Roberto, IT Battiti, Roberto, IT Beaufils, Bruno, FR Beck, Christopher, CA Beetz, Michael, DE Beldiceanu, Nicolas, FR Ben Naim, Jonathan, FR Bertoli, Piergiorgio, IT Besnard, Philippe, FR Biau, Gérard, FR Biswas, Gautam, US Blockeel, Hendrik, BE Boella, Guido, IT Boissier, Olivier, FR Bonet, Blai, VE Bonnefon, J.-F., FR Booth, Richard, TH Bordeaux, Lucas, UK Borrajo, Daniel, ES Bouchon-Meunier, B., FR Bouillon, Pierrette, CH Bouquet, Paolo, IT Bourreau, Eric, FR Bozzano, Marco, IT Brafman, Ronen, IL Brazdil, Pavel, PT Brown, Ken, IE Brugali, Davide, IT Buffet, Olivier, FR Buntine, Wray, AU Busquets, Didac, ES Cali, Andrea, UK
Camps, Valerie, FR Cancedda, Nicola, FR Cardoso, Amilcar, PT Carlsson, Mats, SE Carroll, John, US Ceberio, Martine, US Chades, Iadine, FR Charpillet, Francois, FR Chevaleyre, Yann, FR Cholvy, Laurence, FR Christie, Marc, FR Coelho, Helder, PT Coghill, George, UK Cohen, David, UK Collet, Jacques, FR Comet, Jean-Paul, FR Conitzer, Vincent, US Cornet, Ronald, NL Cortes, Juan, FR Cortés, Ulises, ES Coste-Manière, Eve, FR Coste-Marquis, Sylvie, FR Crowley, James, FR Cuenca Grau, Bernardo, UK Cussens, James, UK David, Bertrand, FR De Giacomo, Giuseppe, IT De Jong, Hidde, FR De Kleer, Johan, US De Ruyter, Boris, NL de Vries, Gerben Klaas Dirk, NL Dechter, Rina, US Delgrande, James, CA Demazeau, Yves, FR Devy, Michel, FR Dignum, Frank, NL Dignum, Virginia, NL Dimitrakakis, Christos, NL Dombre, Etienne, FR Domingue, John, UK Domshlak, Carmel, IL Dousson, Christophe, FR Dressler, Oskar, DE Duckett, Tom, UK Dutech, Alain, FR Edelkamp, Stefan, DE Eisele, Andreas, DE Eiter, Thomas, AT
El Fallah, S. Amal, FR Elkind, Edith, UK Endriss, Ulle, NL Erdem, Esra, TR Esteva, Marc, ES Euzenat, Jérôme, FR Eveillard, Damien, FR Ferber, Jacques, FR Faltings, Boi, CH Fargier, Hélène, FR Feelders, Ad, NL Fern, Alan, US Fernandez-Madrigal, J.-A, ES Ferrane, Isabelle, FR Ferré, Sébastien, FR Finzi, Alberto, IT Fischer, Klaus, DE Fisher, Michael, UK Forbus, Ken, US Fornara, Nicoletta, CH Fox, Maria, UK Frank, Eibe, NZ Frasconi, Paolo, IT Friedrich, Gerhard, AT Fuernkranz, Johannes, DE Gama, Joao, PT Gebhard, Patrick, DE Gent, Ian, UK Ghidini, Chiara, IT Giordana, Attilio, IT Giordano, Laura, IT Giovannucci, Andrea, ES Giunchiglia, Enrico, IT Gleizes, Marie-Pierre, FR Glimm, Birte, UK Godo, Lluis, ES Goethals, Bart, BE Gordillo, Jose-Luis, MX Governatori, Guido, AU Grastien, Alban, AU Gribonval, Rémi, FR Grobelnik, Marko, SI Gros, Patrick, FR Grosclaude, Irene, FR Grossi, Davide, LU Grunwald, Peter, NL Guéré, Emmanuel, FR Haarslev, Volker, CA
Haase, Peter, DE Habet, Djamal, FR Hajicova, Eva, CZ Hansen, Eric, US Harrenstein, Paul, DE Haslum, Patrik, AU Haton, Jean-Paul, FR Hayes, Pat, US Helmert, Malte, DE Hernandez, Daniel, DE Hernandez-Orallo, Jose, ES Hertzberg, Joachim, DE Herzig, Andreas, FR Hitzler, Pascal, DE Hofbaur, Michael, AT Hoffmann, Joerg, AT Hollink, Vera, NL Hoos, Holger, CA Hosobe, Hiroshi, JP Hu, Wei, CN Huang, Jinbo, AU Huang, Zhisheng, NL Huget, Marc-Philippe, FR Hunter, Aaron, CA Hunter, Anthony, UK Hustadt, Ullrich, UK Infantes, Guillaume, US Ironi, Liliana, IT Isaac, Antoine, NL Jaeger, Manfred, DK Jaffar, Joxan, SG Jannin, Pierre, FR Jonsson, Anders, ES Julio, Alferes Jose, PT Junker, Ulrich, FR Jéron, Thierry, FR Kayser, Daniel, FR Kalech, Meir, US Kalfoglou, Yannis, UK Kalyanpur, Aditya, US Kaplunova, Alissa, DE Karlsson, Lars, SE Kaski, Samuel, FI Kazakov, Yevgeny, UK Kern-Isberner, Gabriele, DE Kersting, Kristian, DE Klein, Michel, NL Koehn, Philipp, UK Koivisto, Mikko, FI Kok, Joost, NL
Konieczny, Sébastien, FR Koubarakis, Manolis, GR Krose, Ben, NL Krüger, Antonio, DE Kudenko, Daniel, UK Kuesters, Ralf, DE Lachiche, Nicolas, FR Lacroix, Simon, FR Lafortune, Stephane, US Lallouet, Arnaud, FR Lamperti, Gianfranco, IT Lanfranchi, Vitaveska, UK Larranaga, Pedro, ES Lavrac, Nada, Slovenia Lechevallier, Yves, FR Lecoutre, Christophe, FR Lembo, Domenico, IT Lesperance, Yves, CA Levene, Mark, UK Lima, Pedro, PT Liz, Sonenberg, AU Long, Derek, UK Longin, Dominique, FR Lorini, Emiliano, FR Lucas, Peter, NL Luis, Correia, PT Lukasiewicz, Thomas, UK Lutz, Carsten, DE López de Mántaras, R., ES Mackay, Wendy, FR Magro, Diego, IT Malerba, Donato, IT Manya, Felip, ES Marchand, Hervé, FR Marquis, Pierre , FR Martelli, Alberto, IT Massa, Paolo, IT Massimo, Zanzotto F., IT Maudet, Nicolas, FR McNeill, Fiona, UK Meisels, Amnon, IL Mendes, Rui, PT Mengin, Jerome, FR Meo, Rosa, IT Meseguer, Pedro, ES Meyer, Tommie, ZA Michel, Laurent, US Milicic, Maja, DE Mille, Alain, FR Mobasher, Bamshad, US
Moeller, Ralf, DE Monfroy, Eric, CL Mosterman, Pieter, US Motik, Boris, UK Mouaddib, Abdel-Illah, FR Muggleton, Stephen, UK Màrquez, Lluís, ES Napoli, Amedeo, FR Narasimhan, Sriram, US Nardi, Daniele, IT Nayak, Abhaya, AU Neumann, Guenter, DE Niemela, Ilkka, FI Nijholt, Anton, NL Nijssen, Siegfried, BE Nivre, Joakim, SE Noirhomme, Monique, BE Nunes, Luís, PT Nyberg, Mattias, SE O’Sullivan, Barry, IE Oddi, Angelo, IT Oepen, Stephan, NO Omicini, Andrea, IT Oriolo, Giuseppe, IT Ossowski, Sascha, ES Ozturk, Escoffier M., FR Pagnucco, Maurice, AU Palacios, Hector, ES Paliouras, Georgios, GR Pan, Jeff, UK Paolucci, Mario, IT Paquet, Thierry, FR Parsia, Bijan, UK Paternò, Fabio, IT Patino Vilchis, Jose Luis, FR Paula, Rocha Ana, PT Payne, Terry, UK Peek, Niels, NL Peischl, Bernhard, AT Pena, Jose, SE Pencolé, Yannick, FR Peppas, Pavlos, GR Perini, Anna, IT Perron, Laurent, FR Petrelli, Daniela, UK Pfahringer, Bernhard, NZ Pianesi, Fabio, IT Picardi, Claudia, IT Pirri, Fiora, IT Poesio, Massimo, IT
Poibeau, Thierry, FR Portinale, Luigi, IT Pralet, Cédric, FR Price, Chris, UK Provan, Gregory, IE Pulido, Junquera B., ES Pulman, Stephen, UK Putnik, Goran, PT Pélachaud, Catherine, FR Quiniou, René, FR Quinou, Rene, FR Regin, Jean-Charles, FR Reis, Luis Paulo, PT Remondino, Marco, IT Renz, Jochen, AU Retore, Christian, FR Ricci, Francesco, IT Rintanen, Jussi, AU Robertson, Dave, UK Rochart, Guillaume, FR Roli, Andrea, IT Roos, Teemu, FI Rosati, Riccardo, IT Rosec, Olivier, FR Rossi, Francesca, IT Rousset, Marie-Christine, FR Rudova, Hana, CZ Ruml, Wheeler, US Sabbadin, Régis, FR Sabou, Marta, UK Sabouret, Nicolas, FR Sachenbacher, Martin, DE Salido, Miguel, ES Sanchez, Daniel, ES Sanner, Scott, AU Sattler, Uli, UK Saubion, Frederic, FR Sauro, Luigi, IT
Saïs, Lakhdar, FR Schaub, Torsten, DE Schiex, Thomas, FR Schlobach, Stefan, NL Schmid, Helmut, DE Schulte, Christian, SE Schulte, im Walde S., DE Schumann, Anika, AU Schwind, Camilla, FR Sellmann, Meinolf, US Semeraro, Giovanni, IT Serafini, Luciano, IT Serrurier, Mathieu, FR Shapiro, Steven, CA Shvaiko, Pavel, IT Sidobre, Daniel, FR Siegel, Anne, FR Simeon, Nicola, FR Simon, Laurent, FR Simonis, Helmut, IE Simov, Kiril, Bulgaria Smith, Barbara, UK Sprinkhuizen-Kuyper I., NL Stamou, Giorgos, GR Stede, Manfred, DE Stergiou, Kostas, GR Stuckenschmidt, Heiner, DE Stumme, Gerd, DE Stumptner, Markus, AU Stylianou, Yannis, GR Teichteil-Königsbuch, F., FR Ten Teije, Annette, NL Terenziani, Paolo, IT Terna, Pietro, IT Terrioux, Cyril, FR Tessaris, Sergio, IT Theseider Dupré, Daniele, IT Thielscher, Michael, DE
Thonnat, Monique, FR Torta, Gianluca, IT Trave-Massuyes, L., FR Trombettoni, Gilles, FR Truszczynski, Miroslaw, US Tsoukias, Alexis, FR Van Atteveldt, Wouter, NL Van Beek, Peter, CA Van Ditmarsch, Hans, NZ Van Hage, Willem, NL Van Hentenryck, Pascal, US Van Hoeve, Willem-Jan, US Van den Bosch, Antal, NL Van der Torre, Leon, LU Verhagen, Harko, SE Viappiani, Paolo, CA Vidal, Thierry, FR Vidal, Vincent, FR Vincent, Nicole, FR Volz, Raphael, DE Wallace, Mark, AU Wang, Kewen, AU Wang, Shenghui, NL Webb, Nick, US Weibelzahl, Stephan, IE Weydert, Emil, LU Widmer, Gerhard, AT Wilks, Yorick, UK Williams, Mary-Anne, AU Wilson, Nic, IE Wotawa, Franz, AT Wrobel, Stefan, DE Yangarber, Roman, FI Yap, Roland, SG Yokoo, Makoto, JP Yu, Huizhen, FI Zancanaro, Massimo, IT Zanella, Marina, IT
Preface
Artificial Intelligence is a highly creative field. Numerous research areas in Computer Science that originated over the past fifty years within AI laboratories and were discussed at AI conferences are now completely independent and mature research domains, whose young practitioners may not even be acquainted with the AI affiliation. It is fortunate to see that, while disseminating and spreading out, the AI field per se remains very active. This is particularly the case in Europe. The ECAI series of conferences keeps growing. This 18th edition received more submissions than the previous ones. About 680 papers and posters were registered in the ECAI 2008 conference system, out of which 518 papers and 43 posters were actually reviewed. The program committee decided to accept:
• 121 full papers, an acceptance rate of 23%, and
• 97 posters.
Several submitted full papers have been accepted as posters. All posters, presented in these Proceedings as short papers, will have formal presentation slots in the technical sessions of the main program of the conference, as well as poster presentations within a specific session. The 561 reviewed submissions originated from 51 different countries, 35 of which are represented in the final program. The following table shows the number of submitted and accepted papers or posters per country, based on the contact author's affiliation. Country Australia Austria Belgium Brazil Bulgaria Canada Chile China Cyprus Czech Republic Denmark Egypt Finland France Germany Greece Hungary
Sub. Acc. 26 12 12 6 4 3 13 1 1 1 13 6 1 6 3 1 1 6 1 1 1 1 4 3 116 42 49 20 34 14 1
Country India Iran Ireland Israel Italy Japan Korea Luxembourg Malaysia Malta Mexico Morocco Netherlands New Zealand Norway Pakistan Poland
Sub. Acc. 2 5 1 13 6 6 2 43 19 9 4 2 4 2 2 1 1 1 1 1 1 23 11 1 2 1 1 4
Country Sub. Acc. Portugal 17 6 Romania 4 1 Russia 4 Saudi Arabia 1 Singapore 1 Slovenia 4 3 South Africa 2 Spain 35 12 Sweden 9 5 Switzerland 2 Taiwan 2 1 Thailand 1 Tunisia 5 1 Turkey 3 1 United Kingdom 46 19 United States 15 6 Venezuela 1
The distribution of the 561 submitted and the 218 accepted papers or posters over the reviewing areas (based on the first keyword chosen by the authors) is given below. With respect to previous ECAI conferences, one may notice a relative growth of the Machine Learning and Cognitive Modeling & Interaction areas. The rest of the distribution remains roughly stable, with marginal fluctuations, given that areas overlap and their frontiers are not sharp.
ECAI 2008 Conference Areas                      Papers Submitted   Papers Accepted
KR&R                                                   102                42
Machine Learning                                       102                32
Distributed & Multi-agents Systems                      92                37
Cognitive Modeling & Interaction                        57                17
Constraints and search                                  51                20
Model-based Reasoning and Diagnosis                     51                26
NLP                                                     47                18
Planning and scheduling                                 33                13
Perception, Sensing and Cognitive Robotics              14                 6
Uncertainty in AI                                       12                 7
Total                                                  561               218
The Prestigious Applications of Intelligent Systems (PAIS) conference, the subconference associated with ECAI, has also been very successful this year in terms of the number and quality of submitted papers. Its program committee received 35 submissions in total and accepted 11 full papers, plus 4 additional papers with short presentations. In conclusion, we are very happy to introduce you to the Proceedings of this 18th edition of ECAI, a conference that is growing and maintaining a high standard of quality. The success of this edition is due to the contribution and support of many colleagues. We would like to gratefully thank all those who helped make ECAI 2008 a tremendous success. The area chairs, the PAIS and workshop chairs, the workshop organizers, as well as the Systems Demonstration Chair were the key actors of this success; they managed a heavy workload in a timely and efficient manner. Many thanks in particular to Felix Ingrand, who acted not only as an area chair but also as a program co-chair throughout the overall process. PC members provided high quality reviews and contributed to detailed discussions of several papers before a decision was reached. Finally, to all the persons involved in the local organization of the conference, many thanks for a tremendous amount of excellent work and much appreciated help.
June 2008
Malik Ghallab Constantine Spyropoulos Nikos Fakotakis Nikos Avouris
Contents ECCAI Member Societies
v
Conference Organization
vii
ECAI Programme Committee
ix
Preface Malik Ghallab, Constantine D. Spyropoulos, Nikos Fakotakis and Nikos Avouris
xiii
I. Invited Talks Semantic Activity Recognition Monique Thonnat
3
Bayesian Methods for Artificial Intelligence and Machine Learning Zoubin Ghahramani
8
The Impact of Constraint Programming Pascal Van Hentenryck
9
Web Science George Metakides
10
II. Papers 1. Knowledge Representation and Reasoning Advanced Preprocessing for Answer Set Solving Martin Gebser, Benjamin Kaufmann, André Neumann and Torsten Schaub
15
A Generic Framework for Comparing Semantic Similarities on a Subsumption Hierarchy Emmanuel Blanchard, Mounira Harzallah and Pascale Kuntz
20
Complexity of Subsumption in the EL Family of Description Logics: Acyclic and Cyclic TBoxes Christoph Haase and Carsten Lutz
25
Reasoning About Dynamic Depth Profiles Mikhail Soutchanski and Paulo Santos
30
Comparing Abductive Theories Katsumi Inoue and Chiaki Sakama
35
Privacy-Preserving Query Answering in Logic-Based Information Systems Bernardo Cuenca Grau and Ian Horrocks
40
Optimizing Causal Link Based Web Service Composition Freddy Lécué, Alexandre Delteil and Alain Léger
45
Extending the Knowledge Compilation Map: Closure Principles Hélène Fargier and Pierre Marquis
50
Semantic Modularity and Module Extraction in Description Logics Boris Konev, Carsten Lutz, Dirk Walther and Frank Wolter
55
New Results for Horn Cores and Envelopes of Horn Disjunctions Thomas Eiter and Kazuhisa Makino
60
Belief Revision with Reinforcement Learning for Interactive Object Recognition Thomas Leopold, Gabriele Kern-Isberner and Gabriele Peters
65
A Formal Approach for RDF/S Ontology Evolution George Konstantinidis, Giorgos Flouris, Grigoris Antoniou and Vassilis Christophides
70
Modular Equivalence in General Tomi Janhunen
75
Description Logic Rules Markus Krötzsch, Sebastian Rudolph and Pascal Hitzler
80
Conflicts Between Relevance-Sensitive and Iterated Belief Revision Pavlos Peppas, Anastasios Michael Fotinopoulos and Stella Seremetaki
85
Conservativity in Structured Ontologies Oliver Kutz and Till Mossakowski
89
Removed Sets Fusion: Performing off the Shelf Julien Hué, Eric Würbel and Odile Papini
94
A Coherent Well-Founded Model for Hybrid MKNF Knowledge Bases Matthias Knorr, José Júlio Alferes and Pascal Hitzler
99
2. Machine Learning Prototype-Based Domain Description Fabrizio Angiulli
107
Online Rule Learning via Weighted Model Counting Frédéric Koriche
112
Focused Ensemble Selection: A Diversity-Based Method for Greedy Ensemble Selection Ioannis Partalas, Grigorios Tsoumakas and Ioannis Vlahavas
117
MTForest: Ensemble Decision Trees Based on Multi-Task Learning Qing Wang, Liang Zhang, Mingmin Chi and Jiankui Guo
122
Many-Valued Concept Lattices for Conceptual Clustering and Information Retrieval Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli and Malika Smail-Tabbone
127
Online Optimization for Variable Selection in Data Streams Christoforos Anagnostopoulos, Dimitris K. Tasoulis, David J. Hand and Niall M. Adams
132
Sub Node Extraction with Tree Based Wrappers Stefan Raeymaekers and Maurice Bruynooghe
137
Automatic Recurrent ANN Development for Signal Classification: Detection of Seizures in EEGs Daniel Rivero, Julian Dorado, Juan Rabuñal and Alejandro Pazos
142
A Method for Classifying Vertices of Labeled Graphs Applied to Knowledge Discovery from Molecules Frédéric Pennerath, Géraldine Polaillon and Amedeo Napoli
147
Nonnegative Decompositions with Resampling for Improving Gene Expression Data Biclustering Stability Liviu Badea and Doina Ţilivea
152
Exploiting Locality of Interactions Using a Policy-Gradient Approach in Multiagent Learning Francisco S. Melo
157
A Fast Method for Property Prediction in Graph-Structured Data from Positive and Unlabelled Examples Susanne Hoche, Peter Flach and David Hardcastle
162
VCD Bounds for Some GP Genotypes José Luis Montaña
167
Robust Division in Clustering of Streaming Time Series Pedro Pereira Rodrigues and João Gama
172
3. Model-Based Diagnosis and Reasoning Generating Diagnoses from Conflict Sets with Continuous Attributes Emmanuel Benazera and Louise Travé-Massuyés
179
A Compositional Mathematical Model of Machines Transporting Rigid Objects Peter Struss, Axel Kather, Dominik Schneider and Tobias Voigt
184
Model-Based Diagnosis of Discrete Event Systems with an Incomplete System Model Xiangfu Zhao and Dantong Ouyang
189
Chronicles for On-Line Diagnosis of Distributed Systems Xavier Le Guillou, Marie-Odile Cordier, Sophie Robin and Laurence Rozé
194
Test Generation for Model-Based Diagnosis Gregory Provan
199
Observation-Subsumption Checking in Similarity-Based Diagnosis of Discrete-Event Systems Gianfranco Lamperti and Marina Zanella
204
Local Consistency and Junction Tree for Diagnosis of Discrete-Event Systems Priscilla Kan John and Alban Grastien
209
Hierarchical Explanation of Inference in Bayesian Networks that Represent a Population of Independent Agents Peter Šutovský and Gregory F. Cooper
214
Coupling Continuous and Discrete Event System Techniques for Hybrid System Diagnosability Analysis Mehdi Bayoudh, Louise Travé-Massuyès and Xavier Olive
219
A Probabilistic Analysis of Diagnosability in Discrete Event Systems Farid Nouioua and Philippe Dague
224
Temporal Logic Patterns for Querying Qualitative Models of Genetic Regulatory Networks Pedro T. Monteiro, Delphine Ropers, Radu Mateescu, Ana T. Freitas and Hidde de Jong
229
Fighting Knowledge Acquisition Bottleneck with Argument Based Machine Learning Martin Možina, Matej Guid, Jana Krivec, Aleksander Sadikov and Ivan Bratko
234
4. Cognitive Modeling and Interaction Automatic Page Turning for Musicians via Real-Time Machine Listening Andreas Arzt, Gerhard Widmer and Simon Dixon
241
CDL: An Integrated Framework for Context Specification and Recognition
246
Fulvio Mastrogiovanni, Antonello Scalmato, Antonio Sgorbissa and Renato Zaccaria Web Page Prediction Based on Conditional Random Fields Yong Zhen Guo, Kotagiri Ramamohanarao and Laurence A.F. Park
251
A Formal Model of Emotions: Integrating Qualitative and Quantitative Aspects Bas R. Steunebrink, Mehdi Dastani and John-Jules Ch. Meyer
256
Modeling Collaborative Similarity with the Signed Resistance Distance Kernel Jérôme Kunegis, Stephan Schmidt, Şahin Albayrak, Christian Bauckhage and Martin Mehlitz
261
Modeling the Dynamics of Mood and Depression Fiemke Both, Mark Hoogendoorn, Michel Klein and Jan Treur
266
Groovy Neural Networks Axel Tidemann and Yiannis Demiris
271
An Efficient Student Model Based on Student Performance and Metadata Arndt Faulhaber and Erica Melis
276
5. Natural Language Processing Reducing Bias Effects in DOP Parameter Estimation Evita Linardaki
283
Multilingual Evidence Improves Clustering-Based Taxonomy Extraction Hans Hjelm and Paul Buitelaar
288
Unsupervised Grammar Induction Using a Parent Based Constituent Context Model Seyed Abolghasem Mirroshandel and Gholamreza Ghassem-Sani
293
Word Sense Induction Using Graphs of Collocations Ioannis P. Klapaftis and Suresh Manandhar
298
Learning Context-Free Grammars to Extract Relations from Text Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras and Constantine D. Spyropoulos
303
Talking Points in Metaphor: A Concise Usage-Based Representation for Figurative Processing Tony Veale and Yanfen Hao
308
Semantic Decomposition for Question Answering Sven Hartrumpf
313
Finding Key Bloggers, One Post at a Time Wouter Weerkamp, Krisztian Balog and Maarten de Rijke
318
Why Is This Wrong? – Diagnosing Erroneous Speech Recognizer Output with a Two Phase Parser Bernd Ludwig and Martin Hacker
323
Task Driven Coreference Resolution for Relation Extraction Feiyu Xu, Hans Uszkoreit and Hong Li
328
WWW Sits the SAT: Measuring Relational Similarity on the Web Danushka Bollegala, Yutaka Matsuo and Mitsuru Ishizuka
333
Improved Statistical Machine Translation Using Monolingual Paraphrases Preslav Nakov
338
Orthographic Similarity Search for Dictionary Lookup of Japanese Words Lars Yencken and Timothy Baldwin
343
6. Uncertainty and AI From Belief Change to Preference Change Jérôme Lang and Leendert van der Torre
351
A General Model for Epistemic State Revision Using Plausibility Measures Jianbing Ma and Weiru Liu
356
Structure Learning of Markov Logic Networks Through Iterated Local Search Marenglen Biba, Stefano Ferilli and Floriana Esposito
361
Single-Peaked Consistency and Its Complexity Bruno Escoffier, Jérôme Lang and Meltem Öztürk
366
Belief Revision Through Forgetting Conditionals in Conditional Probabilistic Logic Programs Anbu Yue and Weiru Liu
371
Mastering the Processing of Preferences by Using Symbolic Priorities in Possibilistic Logic Souhila Kaci and Henri Prade
376
7. Distributed and Multi-Agents Systems Interaction-Oriented Agent Simulations: From Theory to Implementation Yoann Kubera, Philippe Mathieu and Sébastien Picault
383
Optimal Coalition Structure Generation in Partition Function Games Tomasz Michalak, Andrew Dowell, Peter McBurney and Michael Wooldridge
388
Coalition Structures in Weighted Voting Games Edith Elkind, Georgios Chalkiadakis and Nicholas R. Jennings
393
Agents Preferences in Decentralized Task Allocation Mark Hoogendoorn and Maria L. Gini
398
Game Theoretical Insights in Strategic Patrolling: Model and Algorithm in Normal-Form Nicola Gatti
403
Monitoring the Execution of a Multi-Agent Plan: Dealing with Partial Observability Roberto Micalizio and Pietro Torasso
408
A Hybrid Approach to Multi-Agent Decision-Making Paulo Trigo and Helder Coelho
413
Coalition Formation Strategies for Self-Interested Agents Thomas Génin and Samir Aknine
418
Of Mechanism Design and Multiagent Planning Roman van der Krogt, Mathijs de Weerdt and Yingqian Zhang
423
IAMwildCAT: The Winning Strategy for the TAC Market Design Competition Perukrishnen Vytelingum, Ioannis A. Vetsikas, Bing Shi and Nicholas R. Jennings
428
Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion Natalia Akchurina
433
As Safe as It Gets: Near-Optimal Learning in Multi-Stage Games with Imperfect Monitoring Danny Kuminov and Moshe Tennenholtz
438
A Heuristic Based Seller Agent for Simultaneous English Auctions Patricia Anthony and Edwin Law
443
A Truthful Two-Stage Mechanism for Eliciting Probabilistic Estimates with Unknown Costs Athanasios Papakonstantinou, Alex Rogers, Enrico H. Gerding and Nicholas R. Jennings
448
Goal Generation and Adoption from Partially Trusted Beliefs Célia da Costa Pereira and Andrea G.B. Tettamanzi
453
Adaptive Play in Texas Hold’em Poker Raphaël Maîtrepierre, Jérémie Mary and Rémi Munos
458
Theoretical and Computational Properties of Preference-Based Argumentation Yannis Dimopoulos, Pavlos Moraitis and Leila Amgoud
463
Norm Defeasibility in an Institutional Normative Framework Henrique Lopes Cardoso and Eugénio Oliveira
468
8. Constraints and Search SLIDE: A Useful Special Case of the CARDPATH Constraint Christian Bessiere, Emmanuel Hebrard, Brahim Hnich, Zeynep Kiziltan and Toby Walsh
475
Frontier Search for Bicriterion Shortest Path Problems L. Mandow and J.L. Pérez de la Cruz
480
Heuristics for Dynamically Adapting Propagation Kostas Stergiou
485
Near Admissible Algorithms for Multiobjective Search Patrice Perny and Olivier Spanjaard
490
Compressing Pattern Databases with Learning Mehdi Samadi, Maryam Siabani, Ariel Felner and Robert Holte
495
A Decomposition Technique for Max-CSP Hachémi Bennaceur, Christophe Lecoutre and Olivier Roussel
500
Fast Set Bounds Propagation Using BDDs Graeme Gange, Vitaly Lagoon and Peter J. Stuckey
505
A New Approach for Solving Satisfiability Problems with Qualitative Preferences Emanuele Di Rosa, Enrico Giunchiglia and Marco Maratea
510
Combining Binary Constraint Networks in Qualitative Reasoning Jason Jingshi Li, Tomasz Kowalski, Jochen Renz and Sanjiang Li
515
Solving Necklace Constraint Problems Pierre Flener and Justin Pearson
520
Vivifying Propositional Clausal Formulae Cédric Piette, Youssef Hamadi and Lakhdar Saïs
525
Hybrid Tractable CSPs Which Generalize Tree Structure Martin C. Cooper, Peter G. Jeavons and András Z. Salamon
530
Justification-Based Non-Clausal Local Search for SAT Matti Järvisalo, Tommi Junttila and Ilkka Niemelä
535
Multi-Valued Pattern Databases Carlos Linares López
540
Using Abstraction in Two-Player Games Mehdi Samadi, Jonathan Schaeffer, Fatemeh Torabi Asr, Majid Samar and Zohreh Azimifar
545
9. Planning and Scheduling A Practical Temporal Constraint Management System for Real-Time Applications Luke Hunsberger
553
Towards Efficient Belief Update for Planning-Based Web Service Composition Jörg Hoffmann
558
Genetic Optimization of the Multi-Location Transshipment Problem with Limited Storage Capacity Nabil Belgasmi, Lamjed Ben Saïd and Khaled Ghédira
563
Regression for Classical and Nondeterministic Planning Jussi Rintanen
568
Combining Domain-Independent Planning and HTN Planning: The Duet Planner Alfonso Gerevini, Ugur Kuter, Dana Nau, Alessandro Saetti and Nathaniel Waisbrot
573
Learning in Planning with Temporally Extended Goals and Uncontrollable Events André A. Ciré and Adi Botea
578
A Simulation-Based Approach for Solving Generalized Semi-Markov Decision Processes Emmanuel Rachelson, Gauthier Quesnel, Frédérick Garcia and Patrick Fabiani
583
Heuristics for Planning with Action Costs Revisited Emil Keyder and Héctor Geffner
588
Diagnosis of Simple Temporal Networks Nico Roos and Cees Witteveen
593
10. Perception, Sensing and Cognitive Robotics An Attentive Machine Interface Using Geo-Contextual Awareness for Mobile Vision Tasks Katrin Amlacher and Lucas Paletta
601
Learning Functional Object-Categories from a Relational Spatio-Temporal Representation Muralikrishna Sridhar, Anthony G. Cohn and David C. Hogg
606
Sequential Spatial Reasoning in Images Based on Pre-Attention Mechanisms and Fuzzy Attribute Graphs Geoffroy Fouquier, Jamal Atif and Isabelle Bloch
611
Automatic Configuration of Multi-Robot Systems: Planning for Multiple Steps Robert Lundh, Lars Karlsson and Alessandro Saffiotti
616
Structure Segmentation and Recognition in Images Guided by Structural Constraint Propagation Olivier Nempont, Jamal Atif, Elsa Angelini and Isabelle Bloch
621
Theoretical Study of Ant-Based Algorithms for Multi-Agent Patrolling Arnaud Glad, Olivier Simonin, Olivier Buffet and François Charpillet
626
Incremental Component-Based Construction and Verification of a Robotic System Ananda Basu, Matthieu Gallien, Charles Lesire, Thanh-Hung Nguyen, Saddek Bensalem, Félix Ingrand and Joseph Sifakis
631
Salience-Driven Contextual Priming of Speech Recognition for Human-Robot Interaction Pierre Lison and Geert-Jan Kruijff
636
III. Prestigious Applications of Intelligent Systems (PAIS) A New CBR Approach to the Oil Spill Problem Juan Manuel Corchado, Aitor Mata, Juan Francisco De Paz and David Del Pozo
643
QuestSemantics – Intelligent Search and Retrieval of Business Knowledge Ian Blacoe, Ignazio Palmisano, Valentina Tamma and Luigi Iannone
648
Intelligent Adaptive Monitoring for Cardiac Surveillance Lucie Callens, Guy Carrault, Marie-Odile Cordier, Elisa Fromont, François Portet and René Quiniou
653
A Decision Support System for Breast Cancer Detection in Screening Programs Marina Velikova, Peter J.F. Lucas, Nivea Ferreira, Maurice Samulski and Nico Karssemeijer
658
The Design, Deployment and Evaluation of the AnimalWatch Intelligent Tutoring System Paul R. Cohen, Carole R. Beal and Niall M. Adams
663
AI on the Move: Exploiting AI Techniques for Context Inference on Mobile Devices Adolfo Bulfoni, Paolo Coppola, Vincenzo Della Mea, Luca Di Gaspero, Danny Mischis, Stefano Mizzaro, Ivan Scagnetto and Luca Vassena
668
Two Stage Knowledge Discovery for Spatio-Temporal Radio-Emission Data Matthias Haringer, Lothar Hotz and Vera Kamp
673
Using Natural Language Generation Technology to Improve Information Flows in Intensive Care Units James Hunter, Albert Gatt, François Portet, Ehud Reiter and Somayajulu Sripada
678
Application and Evaluation of a Medical Knowledge System in Sonography (SONOCONSULT) Frank Puppe, Martin Atzmueller, Georg Buscher, Matthias Huettig, Hardi Luehrs and Hans-Peter Buscher
683
Automating Accreditation of Medical Web Content Vangelis Karkaletsis, Pythagoras Karampiperis, Konstantinos Stamatakis, Martin Labský, Marek Růžička, Vojtěch Svátek, Enrique Amigó Cabrera, Matti Pöllä, Miquel Angel Mayer, Angela Leis and Dagmar Villarroel Gonzales
688
Pattern Classification Techniques for Early Lung Cancer Diagnosis Using an Electronic Nose Rossella Blatt, Andrea Bonarini, Elisa Calabró, Matteo Matteucci, Matteo Della Torre and Ugo Pastorino
693
A BDD Approach to the Feature Subscription Problem T. Hadzic, D. Lesaint, D. Mehta, B. O’Sullivan, L. Quesada and N. Wilson
698
Continuous Plan Management Support for Space Missions: The RAXEM Case Amedeo Cesta, Gabriella Cortellessa, Michel Denis, Alessandro Donati, Simone Fratini, Angelo Oddi, Nicola Policella, Erhard Rabenau and Jonathan Schulster
703
The i-Walker: An Intelligent Pedestrian Mobility Aid R. Annicchiarico, C. Barrué, T. Benedico, F. Campana, U. Cortés and A. Martínez-Velasco
708
Mixture of Gaussians Model for Robust Pedestrian Images Detection Dymitr Ruta
713
IV. Short Papers 1. Knowledge Representation and Reasoning Deriving Explanations from Causal Information Ph. Besnard, M.-O. Cordier and Y. Moinard
723
A Hybrid Tableau Algorithm for ALCQ Jocelyne Faddoul, Nasim Farsinia, Volker Haarslev and Ralf Möller
725
Semantic Relatedness in Semantic Networks Laurent Mazuel and Nicolas Sabouret
727
HOOPO: A Hybrid Object-Oriented Integration of Production Rules and OWL Ontologies Georgios Meditskos and Nick Bassiliades
729
Rule-Based OWL Ontology Reasoning Using Dynamic ABOX Entailments Georgios Meditskos and Nick Bassiliades
731
Computability and Complexity Issues of Extended RDF Anastasia Analyti, Grigoris Antoniou, Carlos Viegas Damásio and Gerd Wagner
733
Automated Web Services Composition Using Extended Representation of Planning Domain Mohamad El Falou, Maroua Bouzid, Abdel-Illah Mouaddib and Thierry Vidal
735
Propositional Merging Operators Based on Set-Theoretic Closeness Patricia Everaere, Sébastien Konieczny and Pierre Marquis
737
Partial and Informative Common Subsumers in Description Logics Simona Colucci, Eugenio Di Sciascio, Francesco Maria Donini and Eufemia Tinelli
739
Prime Implicate-Based Belief Revision Operators Meghyn Bienvenu, Andreas Herzig and Guilin Qi
741
Approximate Structure Preserving Semantic Matching Fausto Giunchiglia, Mikalai Yatskevich, Fiona McNeill, Pavel Shvaiko, Juan Pane and Paolo Besana
743
Discovering Temporal Knowledge from a Crisscross of Timed Observations Nabil Benayadi and Marc Le Goc
745
Fred Meets Tweety Antonis Kakas, Loizos Michael and Rob Miller
747
Definability in Logic and Rough Set Theory Tuan-Fang Fan, Churn-Jung Liau and Duen-Ren Liu
749
WikiTaxonomy: A Large Scale Knowledge Resource Simone Paolo Ponzetto and Michael Strube
751
Computing ∈-Optimal Strategies in Bridge and Other Games of Sequential Outcome Pavel Cejnar
753
2. Machine Learning Classifier Combination Using a Class-Indifferent Method Yaxin Bi, Shenli Wu, Pang Xiong and Xuhui Shen
757
Reinforcement Learning with Classifier Selection for Focused Crawling Ioannis Partalas, Georgios Paliouras and Ioannis Vlahavas
759
Intuitive Action Set Formation in Learning Classifier Systems with Memory Registers L. Simões, M.C. Schut and E. Haasdijk
761
An Ensemble of Classifiers for Coping with Recurring Contexts in Data Streams Ioannis Katakis, Grigorios Tsoumakas and Ioannis Vlahavas
763
Content-Based Social Network Analysis Paola Velardi, Roberto Navigli, Alessandro Cucchiarelli and Mirco Curzi
765
Efficient Data Clustering by Local Density Approximation Marc-Ismaël Akodjènou and Patrick Gallinari
767
Gas Turbine Fault Diagnosis Using Random Forests Manolis Maragoudakis, Euripides Loukis, Panayotis-Prodromos Pantelides
769
How Many Objects?: Determining the Number of Clusters with a Skewed Distribution Satoshi Oyama and Katsumi Tanaka
771
Active Concept Learning for Ontology Evolution Murat Şensoy and Pınar Yolum
773
Determining Automatically the Size of Learned Ontologies Elias Zavitsanos, Sergios Petridis, Georgios Paliouras and George A. Vouros
775
Dynamic Multi-Armed Bandit with Covariates Nicos G. Pavlidis, Dimitris K. Tasoulis, Niall M. Adams and David J. Hand
777
Reinforcement Learning with the Use of Costly Features Robby Goetschalckx, Scott Sanner and Kurt Driessens
779
Data-Driven Induction of Functional Programs Emanuel Kitzelmann
781
CTRNN Parameter Learning Using Differential Evolution Ivanoe De Falco, Antonio Della Cioppa, Francesco Donnarumma, Domenico Maisto, Roberto Prevete and Ernesto Tarantino
783
3. Model-Based Diagnosis and Reasoning Incremental Diagnosis of DES by Satisfiability Alban Grastien and Anbulagan
787
Characterizing and Checking Self-Healability Marie-Odile Cordier, Yannick Pencolé, Louise Travé-Massuyès and Thierry Vidal
789
Improving Robustness in Consistency-Based Diagnosis Using Possible Conflicts Belarmino Pulido, Anibal Bregon and Carlos Alonso-González
791
Dependable Monitoring of Discrete-Event Systems with Uncertain Temporal Observations Gianfranco Lamperti and Marina Zanella
793
Distributed Repair of Nondiagnosability Anika Schumann, Wolfgang Mayer and Markus Stumptner
795
From Constraint Representations of Sequential Code and Program Annotations to Their Use in Debugging Mihai Nica and Franz Wotawa
797
Compressing Binary Decision Diagrams Esben Rune Hansen, S. Srinivasa Rao and Peter Tiedemann
799
Dependent Failures in Consistency-Based Diagnosis Jörg Weber and Franz Wotawa
801
Cost-Sensitive Iterative Abductive Reasoning with Abstractions Gianluca Torta, Daniele Theseider Dupré and Luca Anselma
803
Computation of Minimal Sensor Sets for Conditional Testability Requirements Gianluca Torta and Pietro Torasso
805
Combining Abduction with Conflict-Based Diagnosis Ildikó Flesch and Peter J.F. Lucas
807
4. Cognitive Modeling and Interaction An Activity Recognition Model for Alzheimer’s Patients: Extension of the COACH Task Guidance System B. Bouchard, P. Roy, A. Bouzouane, S. Giroux and A. Mihailidis
811
Not So New: Overblown Claims for ‘New’ Approaches to Emotion Dylan Evans
813
Emergence of Rules in Cell Assemblies of fLIF Neurons Roman V. Belavkin and Christian R. Huyck
815
ERS: Evaluating Reputations of Scientific Journals Émilie Samuel and Colin de la Higuera
817
Personal Experience Acquisition Support from Blogs Using Event-Depicting Images Keita Sato, Yoko Nishihara and Wataru Sunayama
819
Object Configuration Reconstruction from Descriptions Using Relative and Intrinsic Reference Frames H. Joe Steinhauer
821
Probabilistic Reinforcement Rules for Item-Based Recommender Systems Sylvain Castagnos, Armelle Brun and Anne Boyer
823
An Efficient Behavior Classifier Based on Distributions of Relevant Events Jose Antonio Iglesias, Agapito Ledezma, Araceli Sanchis and Gal Kaminka
825
ContextAggregator: A Heuristic-Based Approach for Automated Feature Construction and Selection Robert Lokaiczyk and Manuel Goertz
827
A Pervasive Assistant for Nursing and Doctoral Staff Alexiei Dingli and Charlie Abela
829
5. Natural Language Processing Author Identification Using a Tensor Space Representation Spyridon Plakias and Efstathios Stamatatos
833
Categorizing Opinion in Discourse Nicholas Asher, Farah Benamara and Yvette Yannick Mathieu
835
A Dynamic Approach for Automatic Error Detection in Generation Grammars Tim vor der Brück and Holger Stenzhorn
837
Answering Definition Question: Ranking for Top-k Chao Shen, Xipeng Qiu, Xuanjing Huang and Lide Wu
839
Ontology-Driven Human Language Technology for Semantic-Based Business Intelligence Thierry Declerck, Hans-Ulrich Krieger, Horacio Saggion and Marcus Spies
841
Evaluation Evaluation David M.W. Powers
843
6. Uncertainty and AI Using Decision Trees as the Answer Networks in Temporal Difference-Networks Laura-Andreea Antanas, Kurt Driessens, Jan Ramon and Tom Croonenborghs
847
An Efficient Deduction Mechanism for Expressive Comparative Preferences Languages Nic Wilson
849
An Analysis of Bayesian Network Model-Approximation Techniques Adamo Santana and Gregory Provan
851
7. Distributed and Multi-Agents Systems Verifying the Conformance of Agents with Multiparty Protocols Laura Giordano and Alberto Martelli
855
Simulated Annealing for Coalition Formation Helena Keinänen and Misa Keinänen
857
A Default Logic Based Framework for Argumentation Emanuel Santos and João Pavão Martins
859
An Empirical Investigation of the Adversarial Activity Model Inon Zuckerman, Sarit Kraus, Jeffrey S. Rosenschein
861
Addressing Temporal Aspects of Privacy-Related Norms Guillaume Piolle and Yves Demazeau
863
Evaluation of Global System State Thanks to Local Phenomenona Jean-Michel Contet, Franck Gechter, Pablo Gruer and Abder Koukam
865
Experience and Trust — A Systems-Theoretic Approach Norman Foo and Jochen Renz
867
Trust-Aided Acquisition of Unverifiable Information Eugen Staab, Volker Fusenig and Thomas Engel
869
BIDFLOW: A New Graph-Based Bidding Language for Combinatorial Auctions Madalina Croitoru, Cornelius Croitoru and Paul Lewis
871
Multi-Agent Reinforcement Learning for Intrusion Detection: A Case Study and Evaluation Arturo Servin and Daniel Kudenko
873
GR-MAS: Multi-Agent System for Geriatric Residences Javier Bajo, Juan M. Corchado and Sara Rodriguez
875
Agent-Based and Population-Based Simulation of Displacement of Crime (extended abstract) Tibor Bosse, Charlotte Gerritsen, Mark Hoogendoorn, S. Waqar Jaffry and Jan Treur
877
Organizing Coherent Coalitions Jan Broersen, Rosja Mastop, John-Jules Ch. Meyer and Paolo Turrini
879
A Probabilistic Trust Model for Semantic Peer-to-Peer Systems Gia-Hien Nguyen, Philippe Chatalic and Marie-Christine Rousset
881
Conditional Norms and Dyadic Obligations in Time Jan Broersen and Leendert van der Torre
883
Trust Aware Negotiation Dissolution Nicolás Hormazábal, Josep Lluis de la Rosa i Esteva and Silvana Aciar
885
On the Role of Structured Information Exchange in Supervised Learning Ricardo M. Araujo and Luis C. Lamb
887
Magic Agents: Using Information Relevance to Control Autonomy B. van der Vecht, F. Dignum and J.-J.Ch. Meyer
889
Infection-Based Norm Emergence in Multi-Agent Complex Networks Norman Salazar, Juan A. Rodriguez-Aguilar and Josep Ll. Arcos
891
Opponent Modelling in Texas Hold’em Poker as the Key for Success Dinis Félix and Luís Paulo Reis
893
8. Constraints and Search LRTA* Works Much Better with Pessimistic Heuristics Aleksander Sadikov and Ivan Bratko
897
Thinking Too Much: Pathology in Pathfinding Mitja Luštrek and Vadim Bulitko
899
Dynamic Backtracking for Distributed Constraint Optimization Redouane Ezzahir, Christian Bessiere, Imade Benelallam, El Houssine Bouyakhf and Mustapha Belaissaoui
901
Integrating Abduction and Constraint Optimization in Constraint Handling Rules Marco Gavanelli, Marco Alberti and Evelina Lamma
903
Symbolic Classification of General Multi-Player Games Peter Kissmann and Stefan Edelkamp
905
Redundancy in CSPs Assef Chmeiss, Vincent Krawczyk and Lakhdar Sais
907
Reinforcement Learning and Reactive Search: An Adaptive MAX-SAT Solver Roberto Battiti and Paolo Campigotto
909
A MAX-SAT Algorithm Porfolio Paulo Matos, Jordi Planes, Florian Letombe, João Marques-Silva
911
On the Practical Significance of Hypertree vs. Tree Width Rina Dechter, Lars Otten and Radu Marinescu
913
9. Planning and Scheduling A New Approach to Planning in Networks Jussi Rintanen
917
Detection of Unsolvable Temporal Planning Problems Through the Use of Landmarks E. Marzal, L. Sebastia and E. Onaindia
919
A Planning Graph Heuristic for Forward-Chaining Adversarial Planning Pascal Bercher and Robert Mattmüller
921
10. Perception, Sensing and Cognitive Robotics Vector Valued Markov Decision Process for Robot Platooning Matthieu Boussard, Maroua Bouzid and Abdel-Illah Mouaddib
925
Learning to Select Object Recognition Methods for Autonomous Mobile Robots Reinaldo A.C. Bianchi, Arnau Ramisa and Ramón López de Mántaras
927
Robust Reservation-Based Multi-Agent Routing Adriaan ter Mors, Xiaoyu Mao, Jonne Zutt, Cees Witteveen and Nico Roos
929
Automatic Animation Generation of a Teleoperated Robot Arm Khaled Belghith, Benjamin Auder, Froduald Kabanza, Philippe Bellefeuille and Leo Hartman
931
Planning, Executing, and Monitoring Communication in a Logic-Based Multi-Agent System Martin Magnusson, David Landén and Patrick Doherty
933
Author Index
935
I. Invited Talks
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-3
Semantic Activity Recognition
Monique Thonnat 1
Abstract. Automatically extracting semantics from visual data is a real challenge. We describe in this paper how recent work in cognitive vision leads to significant results in activity recognition for videosurveillance and video monitoring. In particular we present work performed in the domain of video understanding in our PULSAR team at INRIA in Sophia Antipolis. Our main objective is to analyse in real time video streams captured by static video cameras and to recognize their semantic content. We present a cognitive vision approach mixing 4D computer vision techniques and activity recognition based on a priori knowledge. Applications in videosurveillance and healthcare monitoring are shown. We conclude with current issues in cognitive vision for activity recognition.
1 INTRODUCTION
This paper is focused on activity recognition. Activity recognition is a hot topic in the academic field, not only due to scientific motivations but also due to strong demands coming from industry and society, in particular for videosurveillance and healthcare. In fact, there is an increasing need to automate the recognition of activities observed by visual sensors (usually CCD cameras, omnidirectional cameras, infrared cameras). More precisely we are interested in the real-time semantic interpretation of dynamic scenes observed by video cameras. We thus study spatio-temporal activities performed by mobile objects (e.g. human beings, animals or vehicles) interacting with the physical world.
What does it mean to understand a video? Is it just to perform statistics on the appearance of images and to recognize an image from a set of already seen images? If we really want to understand the activities performed by the physical objects, 2D analysis is not sufficient. We need to locate the physical objects in the 3D real world. The dynamics of the physical objects is a major cue for activity recognition. The computer vision community is very active in the domain of motion detection, mobile object tracking and, more recently, trajectory analysis. Very often these analyses are performed in the image plane and are thus dependent on the sensor parameters such as its field of view, position and orientation. However, for reliable activity recognition the dynamics of the physical objects must be computed in the 4D space.
Is there a unique objective interpretation of a dynamic scene? For instance the scenes shown in figures 1 and 2 can be interpreted more or less precisely as a function of the a priori knowledge of the observer. In the first case (shown in figure 1), without information on the location of the scene, one can recognize an indoor scene where two men are walking together towards a door; a videosurveillance expert knowing the location (a bank agency), its spatial configuration as well as the security rules will interpret the same scene as a bank attack, with the unauthorized person, together with an employee, accessing a forbidden area. In the second case (shown in figure 2), without information on the location of the scene, one can recognize a woman standing alone; a medical expert knowing the patient will interpret the same scene as an active elderly person preparing a meal in her kitchen. In fact, the interpretation of a video sequence is not unique but depends on the a priori knowledge of the observer and on his/her goal.

1 INRIA, France, email: [email protected]

Figure 1. A scene with different valid interpretations: two people walking together towards a door, or a bank attack with access to a forbidden area by an unauthorized person and an employee.

Figure 2. A scene with different valid interpretations: a person standing in a room, or an active elderly person preparing a meal in a kitchen.
2 4D APPROACH
We present a cognitive vision approach mixing 4D computer vision techniques and activity recognition based on a priori knowledge. The major issue in semantic interpretation of dynamic scenes is the gap between the subjective interpretation of data and the objective measures provided by the sensors.
Figure 3. From sensor data to high level interpretation; global structure of an activity monitoring system built with VSIP [1].
Our approach to address this problem is to keep a clear boundary between the application-dependent subjective interpretations and the objective analysis of the videos. We thus define a set of objective measures which can be extracted in real time from the videos, we propose formal models to enable users to express their activities of interest and we build matching techniques to bridge the gap between the objective measures and the activity models. Figure 3 shows the global structure of a videosurveillance system built with this approach. First, a motion detection step followed by frame-to-frame tracking is performed for each video camera. Then the tracked mobile objects coming from different video cameras with overlapping fields of view are fused into a unique 4D representation for the whole scene. Depending on the chosen application, a combination of one or more of the available trackers (individual, group and crowd trackers) is used. Then scenario recognition is performed by a combination of one or more of the available recognition algorithms (automaton based, Bayesian-network based, AND/OR tree based and temporal constraints based). Finally the system generates the alerts corresponding to the predefined recognized scenarios. For robust semantic interpretation of mobile object behaviour it is mandatory to rely on correct physical object type classification. It can be based on simple 3D models like parallelepipeds [12] or complex 3D human body configurations with posture models as in [2]. Figure 4 shows examples of such postures.
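To make the staged structure just described more concrete (per-camera motion detection and frame-to-frame tracking, fusion of cameras with overlapping fields of view into a single 4D scene representation, long-term tracking, and scenario recognition producing alerts), here is a minimal sketch in Python. It only illustrates the data flow; it is not the actual VSIP implementation, and every class, method and parameter name is invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class MobileObject:
    """One tracked physical object in the fused 4D scene representation."""
    object_id: int
    object_type: str                        # e.g. "person", "vehicle"
    position_3d: Tuple[float, float, float]
    timestamp: float

class CameraPipeline:
    """Per-camera processing: motion detection followed by frame-to-frame tracking."""
    def __init__(self, camera_id: str, detector, tracker):
        self.camera_id = camera_id
        self.detector = detector            # returns candidate moving regions for one frame
        self.tracker = tracker              # links regions across consecutive frames
    def process(self, frame, timestamp: float) -> List[MobileObject]:
        regions = self.detector.detect(frame)
        return self.tracker.update(regions, timestamp)

class ActivityMonitoringSystem:
    """Fuses per-camera tracks into one scene and recognizes predefined scenarios."""
    def __init__(self, camera_pipelines, fusion, long_term_trackers, recognizers):
        self.camera_pipelines = camera_pipelines
        self.fusion = fusion                           # combines cameras with overlapping fields of view
        self.long_term_trackers = long_term_trackers   # e.g. individual, group and crowd trackers
        self.recognizers = recognizers                 # e.g. automaton, Bayesian network, AND/OR tree,
                                                       # or temporal-constraint based recognizers
    def step(self, frames: Dict[str, object], timestamp: float) -> List[str]:
        per_camera = [p.process(frames[p.camera_id], timestamp) for p in self.camera_pipelines]
        scene_objects = self.fusion.fuse(per_camera)   # unique 4D representation of the whole scene
        for tracker in self.long_term_trackers:
            scene_objects = tracker.update(scene_objects)
        alerts: List[str] = []
        for recognizer in self.recognizers:
            alerts.extend(recognizer.recognize(scene_objects, timestamp))
        return alerts                                  # alerts for the recognized scenarios
```

The detector, tracker, fusion and recognizer collaborators are deliberately left abstract: each corresponds to one box of Figure 3 and can be swapped depending on the application.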
Figure 4. Different 3D models of human body postures
3 3D MAP
We use 3D maps as a means to model the a priori knowledge of the physical environment captured by the sensors. More precisely, the 3D maps contain the a priori knowledge of the empty scenes:
• Video cameras: 3D position of the sensors, calibration matrix, fields of view, etc.
• 3D geometry: the geometry of the static structure of the empty scene (for instance the buildings and road structure for outdoor scenes or the walls and doors for indoor scenes) as well as the main static 3D objects (for instance the furniture in indoor scenes) and the 2D zones of interest. This geometry is defined in terms of 3D position, shape and volume.
• Semantic information: for each part of the map, semantic information is added, such as its type (e.g. 3D object, 2D zone), its characteristics (e.g. yellow, fragile) or its function (e.g. entrance zone, seat).
We can see in figure 5 a 2D map of an indoor flat and in figure 6 two partial views of the 3D map built for monitoring elderly at home. In this map, in addition to the main structure of the rooms (walls, doors, etc.), the equipment and the furniture are defined, as well as the information related to the sensors.
Figure 5. Top view of the flat
Figure 6. 3D map: the kitchen area and the top view of a flat for monitoring elderly at home
4 ACTIVITY MODELLING
In order to express the semantics of the activities a modelling effort is needed. The models correspond to all the knowledge needed by the system to recognize video events occurring in the scene. To allow security operators to easily define and modify their models, the description of the knowledge is declarative and intuitive
(in natural terms). We propose a video event ontology to share common concepts in video understanding and to decrease the effort of knowledge modelling.
4.1 The Video Event Ontology
The event ontology is a set of concepts for describing physical objects, events and relations between concepts. The physical objects are all the concepts used to describe objects of the real world in the scene observed by the sensors. The attributes of a physical object are those pertinent for the recognition; these attributes characterize the physical object. There are two types of physical objects: contextual objects (which are usually static and whose movement, whenever they are in motion, can be predicted using contextual information) and mobile objects (which can be perceived as moving in the scene and as initiating their motions, without the possibility to predict their movement). The events are all the concepts used to describe mobile object evolutions and interactions in a scene. The terms used to describe these concepts are grouped into two categories: state (including primitive/composite state) and event (including primitive/composite event and single/multi-agent event). A primitive state is a spatio-temporal property valid at a given instant or stable on a time interval which is directly inferred from audiovisual attributes of physical objects computed by low-level signal processing algorithms. A composite state is a combination of states. A primitive event is a change of states. A composite event is a combination of states and events. A single-agent event is an event involving a single mobile object. A multi-agent event is a composite event involving several (at least two) mobile objects with different motions. Currently this ontology contains 151 concepts used for different applications in video understanding. The ontology is implemented in Protégé so as to be independent of a particular activity recognition formalism.
4.2 Activity Models
The formalism for expressing an activity is directly based on the concepts of the video event ontology. A composite event model is composed of five parts: "physical objects" involved in the event (e.g. person, equipment, zones of interest); "components" corresponding to the sub-events composing the event; "forbidden components" corresponding to the events which should not occur during the main event; "constraints", i.e. conditions between the physical objects and/or the components (including symbolic, logical, spatial and temporal constraints, the latter with Allen interval algebra operators); and "alarms" describing the actions to be taken when the event is recognized. Primitive states, composite states and primitive events can be described using the same formalism. Please see [10] and [9] for more details of the formalism.
5 ACTIVITY RECOGNITION
The algorithm proposed in [9] and [10] makes it possible to process a data flow efficiently (i.e. in real time) and to recognize pre-defined activities. Alternative approaches based on probabilistic methods [6] or [7] can also be used. In the following we concentrate on the first approach because it is directly based on the formalism and the ontology presented in the previous section. The video event recognition algorithm recognizes which events are occurring using the primitive video events. To recognize an event composed of sub-events, given the event model, the recognition algorithm selects a set of physical objects matching the remaining physical object variables of the event model. The algorithm then looks back in the past for any previously recognized state/event that matches the first component of the event model. If these two recognized components verify the event model constraints (e.g. temporal constraints), the event is said to be recognized. In order to facilitate complex event recognition, after each event recognition, event templates are generated for all composite events whose last component corresponds to this recognized event. For more details see [9].
6 APPLICATIONS
This approach has been applied to a large set of applications in videosurveillance.
6.1 Videosurveillance
A typical example of complex activities in which we are interested is aircraft monitoring in apron areas (see figure 7). In this example the duration of the servicing activities around the aircraft is about one hour and the activities involve interactions between several ground vehicles and human operators. The goal is to recognize these activities through formal activity models (such as the one shown in figure 9) and data captured by a network of video cameras (such as the ones shown in figure 7). For more details, refer to [3] and the related European project website http://www.avitrack.net/.
Figure 7. Different views (a, b, c, d) of an apron area captured by video cameras for aircraft monitoring
6.2 Healthcare monitoring
In this application the objective is to monitor elderly people at home (see figure 10). In collaboration with gerontologists, we have modeled several primitive states, primitive events and composite events.
Figure 8. Activity recognition problem in airport: the main servicing operations around an aircraft (refuelling, baggage loading, power supply, etc...) and the location of the 8 video cameras (in blue)
First, we are interested in modelling events characteristic of critical situations such as falling down. Second, these events aim at detecting abnormal changes of behavior patterns such as depression. Given these objectives we have selected the activities that can be detected using video cameras [11]. We have modeled thirty-four video events. In particular, we have defined fourteen primitive states; four of them are related to the location of the person in the scene (e.g. inside the kitchen, inside the living room) and the ten remaining ones are related to the proposed 3D key human postures. We have also defined four primitive events corresponding to combinations of these primitive states: "standing up", which represents a change of state from sitting or slumping to standing; "sitting down", which represents a change of state from standing or bending to sitting on a chair; "sitting up", which represents a change of state from lying to sitting on the floor; and "lying down", which represents a change of state from standing or sitting on the floor to lying. We have also defined six primitive events such as "stay in the kitchen" and "stay in the living room". These primitive states and events are used to define further composite events. For this study, we have modeled ten composite events. In this paper, we present just two of them: "feeling faint" and "falling down". The model of the "feeling faint" event is shown below. It involves one physical object (one person), and it contains three 3D human posture components and constraints between these components.
CompositeEvent(PersonFeelingFaint,
  PhysicalObjects( (p: Person) )
  Components(
    (pStand: PrimitiveState Standing(p))
    (pBend: PrimitiveState Bending(p))
    (pSit: PrimitiveState Sitting Outstretched Legs(p)) )
  Constraints(
    (Sequence pStand; pBend; pSit)
    (pSit's Duration >= 10) )
  Alarm(
    AText("Person is Feeling Faint")
    AType("URGENT") ) )
"Feeling faint" model.
Figure 9. Activity recognition problem in airport: example of an activity model enabling to describe an unloading operation with a high-level language
We have also modelled the "falling down" event. There are different ways of describing a person falling down; thus, we have modelled the "falling down" event with three models. Falling down 1: a change of state through standing, sitting on the floor (with flexed or outstretched legs) and lying (with flexed or outstretched legs). Falling down 2: a change of state from standing to lying (with flexed or outstretched legs). Falling down 3: a change of state through standing, bending and lying (with flexed or outstretched legs). An example of the definition of the model "falling down 1" is shown below.
Figure 10. Healthcare monitoring
CompositeEvent(PersonFallingDown1,
  PhysicalObjects( (p: Person) )
  Components(
    (pStand: PrimitiveState Standing(p))
    (pSit: PrimitiveState Sitting Flexed Legs(p))
    (pLay: PrimitiveState Lying Outstretched Legs(p)) )
  Constraints(
    (pSit before meet pLay)
    (pLay's Duration >= 50) )
  Alarm(
    AText("Person is Falling Down")
    AType("VERYURGENT") ) )
"Falling down 1" model.
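To make the constraint part of such a model concrete, here is a rough Python sketch of how the temporal constraints of "falling down 1" could be checked over already-recognized primitive states. It is only an illustration under simplifying assumptions (one interval per posture, given as frame numbers); the names and thresholds are invented for the example and are not those of the actual platform.

recognized = [("Standing", 0, 120), ("Sitting Flexed Legs", 121, 140),
              ("Lying Outstretched Legs", 141, 400)]

def falling_down_1(states, min_lying_duration=50):
    # index the recognized primitive states by posture name
    seq = {name: (start, end) for name, start, end in states}
    needed = ["Standing", "Sitting Flexed Legs", "Lying Outstretched Legs"]
    if not all(name in seq for name in needed):
        return False
    (s1, e1), (s2, e2), (s3, e3) = (seq[name] for name in needed)
    ordered = e1 <= s2 and e2 <= s3              # "before/meet"-style sequencing
    long_enough = (e3 - s3) >= min_lying_duration  # pLay's duration constraint
    return ordered and long_enough

print(falling_down_1(recognized))   # True for this toy trace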
Figure 11 and figure 12 show respectively the camera view and the 3D visualization of the recognition of the ”feeling faint” event.
Figure 11. Recognition of the "feeling faint" event
Figure 12. 3D visualization of the recognition of the "feeling faint" event
Figure 13 and figure 14 show respectively the camera view and the 3D visualization of the recognition of the "falling down" event.
Figure 13. Recognition of the "falling down" event
Figure 14. 3D visualization of the recognition of the "falling down" event
7 CONCLUSION
We have presented a 4D semantic approach for activity recognition in dynamic scenes. There are still many open issues, among which a full theory of visual data interpretation and reliable techniques for 4D analysis able to deal with changing observation conditions and scene content. From an activity recognition point of view, the three main points are the development of shared operational ontologies, of formalisms for activity modelling with good properties such as scalability, and of learning techniques for model refinement. In particular, a large set of learning issues is raised by this 4D semantic approach, for instance: learning contextual variations for physical object detection and image segmentation [5], learning the structure of the activity models [8] or learning the visual concept detectors [4].
REFERENCES
[1] A. Avanzi, F. Bremond, C. Tornieri, and M. Thonnat, 'Design and assessment of an intelligent activity monitoring platform', EURASIP Journal on Applied Signal Processing, special issue on "Advances in Intelligent Vision Systems: Methods and Applications", 2005(14), 2359–2374, (August 2005).
[2] B. Boulay, F. Bremond, and M. Thonnat, 'Applying 3D human model in a posture recognition system', Pattern Recognition Letters, Special Issue on Vision for Crime Detection and Prevention, 27(15), 1788–1796, (2006).
[3] Florent Fusier, Valery Valentin, François Bremond, Monique Thonnat, Mark Borg, David Thirde, and James Ferryman, 'Video understanding for complex activity recognition', Machine Vision and Applications Journal, 18, 167–188, (2007).
[4] N. Maillot and M. Thonnat, 'Ontology based complex object recognition', Image and Vision Computing Journal, Special Issue on Cognitive Computer Vision, 26(1), 102–113, (2008).
[5] V. Martin and M. Thonnat, 'Learning contextual variations for video segmentation', in The 6th International Conference on Vision Systems (ICVS08), Santorini, Greece, (2008).
[6] G. Medioni, I. Cohen, F. Brémond, S. Hongeng, and G. Nevatia, 'Activity Analysis in Video', Pattern Analysis and Machine Intelligence (PAMI), 23(8), 873–889, (2001).
[7] N. Moenne-Loccoz, F. Brémond, and M. Thonnat, 'Recurrent Bayesian network for the recognition of human behaviors from video', in Third International Conference on Computer Vision Systems (ICVS 2003), volume LNCS 2626, pp. 44–53, Graz, Austria, (2003). Springer.
[8] A. Toshev, F. Brémond, and M. Thonnat, 'An a priori-based method for frequent composite event discovery in videos', in Proceedings of the 2006 IEEE International Conference on Computer Vision Systems, New York, USA, (January 2006).
[9] V-T. Vu, F. Brémond, and M. Thonnat, 'Automatic video interpretation: A novel algorithm for temporal scenario recognition', in The Eighteenth International Joint Conference on Artificial Intelligence (IJCAI'03), Acapulco, Mexico, (2003).
[10] V-T. Vu, F. Brémond, and M. Thonnat, 'Automatic video interpretation: A recognition algorithm for temporal scenarios based on pre-compiled scenario models', in The 3rd International Conference on Vision Systems (ICVS'03), Graz, Austria, (2003).
[11] N. Zouba, B. Boulay, F. Brémond, and M. Thonnat, 'Monitoring activities of daily living (ADLs) of elderly based on 3D key human postures', in The 4th International Cognitive Vision Workshop (ICVW08), Santorini, Greece, (2008).
[12] M. Zúniga, F. Brémond, and M. Thonnat, 'Fast and reliable object classification in video based on a 3D generic model', in The 3rd International Conference on Visual Information Engineering (VIE2006), pp. 433–441, Bangalore, India, (September 26–28, 2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-8
Bayesian Methods for Artificial Intelligence and Machine Learning
Zoubin Ghahramani
Department of Engineering, University of Cambridge, UK
Machine Learning Department, Carnegie Mellon University, USA
http://learning.eng.cam.ac.uk/zoubin
Abstract. Bayesian methods provide a framework for representing and manipulating uncertainty, for learning from noisy data, and for making decisions that maximize expected utility – components which are important to both AI and Machine Learning. However, although Bayesian methods have become more popular in recent years, there remains a good degree of skepticism with respect to taking a fully Bayesian approach. This talk will introduce fundamental topics in Bayesian statistics as they apply to machine learning and AI, and address some misconceptions about Bayesian approaches. I will then discuss some current work on non-parametric Bayesian machine learning, particularly in the area of unsupervised learning.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-9
The Impact of Constraint Programming Pascal Van Hentenryck Brown University
Abstract. Constraint programming is a success story for artificial intelligence. It quickly moved from research laboratories to industrial applications and is in daily use to solve complex optimization problems throughout the world. At the same time, constraint programming continued to evolve, addressing new needs and opportunities. This talk reviews some recent progress in constraint programming, including its hybridization with other optimization approaches, the quest for more autonomous search, and its applications in a variety of nontraditional areas.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-10
Web Science George Metakides
Abstract not available at time of printing.
II. Papers
1. Knowledge Representation and Reasoning
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-15
Advanced Preprocessing for Answer Set Solving
Martin Gebser and Benjamin Kaufmann and André Neumann and Torsten Schaub 1 2
Abstract. We introduce the first substantial approach to preprocessing in the context of answer set solving. The idea is to simplify a logic program while identifying equivalences among its relevant constituents. These equivalences are then used for building a compact representation of the program (in terms of Boolean constraints). We implemented our approach as well as a SAT-based technique to reduce Boolean constraints. This allows us to empirically analyze both preprocessing types and to demonstrate their computational impact.
1 INTRODUCTION
Answer Set Programming (ASP; [3]) has become an attractive paradigm for declarative problem solving. This is partly due to the availability of efficient off-the-shelf ASP solvers [9, 19]. In fact, modern ASP solvers rely on Boolean constraint solving technology [1, 8, 7], leading to a similar performance as advanced SAT solvers [17]. On the other hand, the attractiveness of ASP stems from its rich modeling language, allowing for an easy and elaborationtolerant handling of knowledge-intensive applications. In practice, an input program is usually run through multiple preprocessing steps. At first, a so-called grounder instantiates all variables, thus producing a ground logic program. Classical ASP solvers, such as smodels [19], more or less take the resulting program as is without doing further optimizations. In contrast, modern ASP solvers translate a ground program into a set of Boolean constraints (e.g., clauses) in order to exploit advanced SAT solving technology. Such translations necessitate the introduction of extra propositions (see below) in order to avoid an exponential blow-up. Also, this addition may result in exponentially smaller search spaces [16] and permits more succinct representations of loop constraints [14]. Nonetheless, the question arises in how far the introduced redundancy can be trimmed. While ASP solvers still lack full-fledged preprocessing techniques, they already constitute an integral part of many SAT solvers [2, 20, 10]. There are two principal ways to address preprocessing in ASP solving: the external one, aiming at the reduction of a ground program, and the internal one, (recurrently) optimizing its inner representation. Within modern ASP solvers, the latter can be done by adapting corresponding techniques from SAT. Hence, we concentrate in the sequel on the former approach, being specific to ASP. Thereby, we build upon work on program transformations and equivalence [4, 5, 11]. To be precise, we develop preprocessing techniques for ground logic programs under answer set semantics. The idea is to transform a program into a simpler one, along with an assignment and a relation expressing equivalences among the assignable constituents of the program. These equivalences are subsequently exploited when transforming the resulting program into Boolean constraints, represented as clauses. We implemented both our external and a SATbased internal reduction strategy within the ASP solver clasp [7]. This makes clasp the first ASP solver incorporating advanced pre-
processing techniques. Furthermore, our implementation allows us to empirically assess both the external and the internal approach to preprocessing, thus demonstrating their computational impact.
1 Affiliated with SFU, Canada, and Griffith University, Australia.
2 Universität Potsdam, August-Bebel-Str. 89, D-14482 Potsdam, Germany
2 BACKGROUND
A (normal) logic program over an alphabet A is a finite multiset³ of rules of the form a ← b1, . . . , bm, ∼cm+1, . . . , ∼cn, where a, bi, cj ∈ A are atoms for 0 < i ≤ m, m < j ≤ n. A literal is an atom a or its (default) negation ∼a. Furthermore, let ∼A = {∼a | a ∈ A} and ¬A = {¬a | a ∈ A}, where ¬a is used for (classical) negation in propositional formulas. For a rule r, let head(r) = a be the head of r and the multiset body(r) = {b1, . . . , bm, ∼cm+1, . . . , ∼cn} be the body of r. Given a (multi)set B of literals, let B+ = {a ∈ A | a ∈ B} and B− = {a ∈ A | ∼a ∈ B}. The set of atoms occurring in a logic program Π is denoted by atom(Π), and body(Π) = {body(r) | r ∈ Π}. Also, we define body(a) = {body(r) | r ∈ Π, head(r) = a}.
Following [18], we characterize the answer sets of a logic program Π by the models of the completion [6] and loop formulas of Π. As mentioned above, in practice, this involves introducing extra propositions pB for bodies B. Given a program Π over A, its completion formula is then defined as follows:
CF(Π, A) = { a ↔ ⋁_{B ∈ body(a)} pB | a ∈ A } ∪ { pB ↔ ⋀_{b ∈ B+} b ∧ ⋀_{c ∈ B−} ¬c | B ∈ body(Π) }.   (1)
A loop is a (nonempty) set of atoms that circularly depend upon each other in a program's positive atom dependency graph [18]. The set of all loops of Π is denoted by loop(Π). If loop(Π) = ∅, then Π is said to be tight [12]. The loop formula of some L ∈ loop(Π) is
LF(Π, L) = (⋁_{a ∈ L} a) → (⋁_{a ∈ L, B ∈ body(a), B+ ∩ L = ∅} pB),
and LF(Π) = {LF(Π, L) | L ∈ loop(Π)}. The bodies contributing to the consequent of a loop formula provide external support for the antecedent's atoms. An atom is said to be unfounded if it belongs to the antecedent of a loop formula whose consequent is ⊥, expressing the absence of external support. We represent (classical) models by their set of entailed propositions, and let M(F) stand for the set of all models of F. For some alphabet A, we define M(F)|A = {M ∩ A | M ∈ M(F)}. Then, a set X ⊆ A is an answer set of a logic program Π over A if X ∈ M(CF(Π, A) ∪ LF(Π))|A. We let AS(Π) denote the set of all answer sets of Π. Note that, whenever Π is tight, we have X ∈ AS(Π) iff X ∈ M(CF(Π, A))|A.
Consider the following program Π over A = {a, . . . , f}: {a ←; b ← a, ∼c; c ← ∼b, ∼d; e ← ∼c; e ← f; f ← a, e}. We get the following completion formula, CF(Π, A):
{a ↔ p0; b ↔ p1; c ↔ p2; d ↔ ⊥; e ↔ p3 ∨ p4; f ↔ p5} ∪ {p0 ↔ ⊤; p1 ↔ a ∧ ¬c; p2 ↔ ¬b ∧ ¬d; p3 ↔ ¬c; p4 ↔ f; p5 ↔ a ∧ e}.
The usage of multisets is motivated by the syntactic nature of our approach and the fact that grounders produce duplicates. For simplicity, we keep standard set notation for multiset operations.
CF(Π, A) has three models: {a, b, e, f, p0, p1, p3, p4, p5}, {a, c, p0, p2}, and {a, c, e, f, p0, p2, p4, p5}. Furthermore, program Π has one loop, {e, f}, yielding LF(Π) = {e ∨ f → p3}. This loop formula is falsified by {a, c, e, f, p0, p2, p4, p5}, thus {a, c, e, f} is not an answer set of Π. The other two models of CF(Π, A) satisfy LF(Π) and correspond to the answer sets {a, b, e, f} and {a, c} of Π. Finally, a (partial) Boolean assignment A over A ∪ 2^(A ∪ ∼A) is a set of possibly negated elements of its domain. We write Ā = {a ∈ A | ¬a ∈ A} ∪ {B ⊆ A ∪ ∼A | ¬B ∈ A} for the elements assigned false. For instance, A = {a, ¬d, ¬{a, ∼c}} assigns true to a and false to d as well as to the body {a, ∼c}, and Ā = {d, {a, ∼c}} contains all elements assigned false by A.
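For illustration, the completion of equation (1) can be built mechanically from the ground rules. The following Python sketch is written for this running example only (it is not taken from clasp or any ASP system); it constructs the body propositions p0, ..., p5 and the atom equivalences for the program above.

from itertools import count

# ground rules of the example, as (head, positive body, negative body)
rules = [("a", [], []), ("b", ["a"], ["c"]), ("c", [], ["b", "d"]),
         ("e", [], ["c"]), ("e", ["f"], []), ("f", ["a", "e"], [])]
atoms = ["a", "b", "c", "d", "e", "f"]

ids = count()
body_prop = {}        # (positive body, negative body) -> proposition name
completion = []       # textual rendering of CF(Pi, A), for illustration only
for head, pos, neg in rules:
    key = (tuple(pos), tuple(neg))
    if key not in body_prop:
        body_prop[key] = "p%d" % next(ids)
        lits = pos + ["not " + c for c in neg]
        completion.append(body_prop[key] + " <-> " + (" and ".join(lits) or "true"))
for a in atoms:
    props = [body_prop[(tuple(p), tuple(n))] for h, p, n in rules if h == a]
    completion.append(a + " <-> " + (" or ".join(props) or "false"))
print("\n".join(completion))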
3 PREPROCESSING
Our initial goal is to turn a given program Π over an alphabet A into a simplified program Π , a partial assignment A, and an equivalence relation E on the atoms and bodies in Π . More formally, we transform a triple (Π, ∅, ∅) into (Π , A, E). Thereby, Π is obtained from Π by program transformations, mainly involving rule eliminations and body modifications. The semantics of the original program Π is captured by Π along with assignment A and E, where the latter is also exploited to generate a compact representation of Π in terms of Boolean constraints. Our transformation rules, shown in Table 1, are grouped into four building blocks: s = {(s0 ), . . . , (s15 )}, e = {(e16 ), . . . , (e27 )}, a = {(a28 ), . . . , (a35 )}, and u = {(u36 )}. (Note that many of them are subject to conditions, given in the rightmost column.) Roughly, the rules in s permit elementary simplifications, while e partitions atoms and bodies into equivalence classes. As a byproduct of this, all unclassified atoms are unfounded and set to false via (u36 ). Finally, the rules in a substitute the atoms in an equivalence class by a unique representative for that class. Note that s, e, a, and u are intended to be applied till saturation before proceeding to another block of transformations. In what follows, we gradually explain the different transformations and also provide examples. To begin with, rules (s0 ) to (s10 ) build upon well-known program transformations [4, 5, 11]. Let T →∗ T represent the computation of a fixpoint T by repeated applications of → to T . Then, s →∗ amounts to computing the fixpoint of Fitting’s operator [13]. s In addition, →∗ makes assignments to bodies and simplifies the program at hand. Finally, rules (s11 ) to (s15 ) preserve the correspondence between the program Π and its associated assignment A. s For Π0 = {a ←; b ← a, ∼c; c ← ∼b, ∼d}, we get (Π0 , ∅, ∅) →∗ (Π1 , A1 , ∅), where Π1 = {b ← ∼c; c ← ∼b} and A1 = {a, d}. s In general, a fixpoint of → has the following syntactic properties. s Proposition 1 Let (Π, ∅, ∅) →∗ (Π , A, ∅), for logic program Π over alphabet A. Then, we have: 1. body(r) = ∅, for all r ∈ Π ; 2. body(a) = ∅, for all a ∈ atom(Π ); 3. (atom(Π ) ∪ body(Π )) ∩ (A ∪ A) = ∅; 4. A ∩ A = ∅; 5. {B S ⊆ A ∪ ∼A | B ∈ A ∪ A} ⊆ A; 6. B∈A\A (B + ∪ B − ) ⊆ atom(Π ). W W Using BF (Y ) = {( b∈B + b ∨ c∈B − c) | B ∈ Y }, we can capture the relationship between the original program Π and the reduced program Π along with assignment A as follows. s Proposition 2 Let (Π, ∅, ∅) →∗ (Π , A, ∅), for logic program Π over alphabet A. Then, we have AS (Π) = M(CF (Π , A\A)∪LF (Π )∪(A∩A)∪BF (A\A))|A . Rules (e16 ) to (e27 ) comprise the heart of our approach and build an equivalence relation on atoms and bodies. We represent equivalence classes as triples, viz., E = [a, B, C], where a is an atom
representative for E, B is a body (externally) supporting E, and C contains all atoms and bodies belonging to E. We denote the components of E by aE = a, BE = B, and CE = C. Thereby, ∅ denotes a null value, where aE = ∅ means that CE ∩ A = ∅ and BE = ∅ expresses that E is not (externally) supported. For a set E of equivalence classes, define:4 S S s EC = EC = [a,B,C]∈E,B =∅ C [a,B,C]∈E C S s + = B . EB [a,B,C]∈E,B =∅ Some classes in E are defined as dual to each other (and are finally represented by complementary propositional literals). In Table 1, the rules (e16 ) and (e17 ) each introduce a new equivalence class E along e and we assume both classes to be correlated via with its dual class E, e1 ; E2 , E e2 ; . . . ). Finally, we use E e to some unique name (e.g., E1 , E e = E. denote the dual class of E, and let E e Let us illustrate →∗ starting from (Π1 , A1 , ∅): e
→
E
Rule
(e16 ) b ← ∼c (e17 ) b ← ∼c (e18 ) b ← ∼c (e16 ) c ← ∼b (e20 ) (e17 ) c ← ∼b (e18 ) c ← ∼b
E1 E2 E3 E4 E5 E6 E7
= {E1 = E1 ∪ {E2 = {E1 = E3 ∪ {E3 e1 = {E = E5 ∪ {E4 e1 = {E E1 e2 E
e1 = [∅, ∅, ∅]} = [∅, {∼c}, {{∼c}}], E e2 = [∅, ∅, ∅] = [b, {∼c}, {b}], E } e1 , E e2 = [b, {∼c}, {b, {∼c}}], E } e3 = [∅, ∅, ∅]} = [∅, {∼b}, {{∼b}}], E e2 , E e3 = [∅, {∼b}, {{∼b}}], E1 , E } e4 = [∅, ∅, ∅] = [c, {∼b}, {c}], E } = [c, {∼b}, {c, {∼b}}], = [b, {∼c}, {b, {∼c}}], e3 = E e4 = [∅, ∅, ∅] =E }
e1 . We get two non-trivial, dual equivalence classes: E1 and E e1 is repreClass E1 is represented by b and supported by {∼c}; E sented by c and supported by {∼b}. Observe that (e16 ) and (e17 ) introduce equivalence classes and their duals, while (e18 ) and (e20 ) merge different classes. (For simplicity, trivial dual classes are kept.) e The overall proceeding of →∗ is support-driven, that is, rules are only taken into account if their positive body atoms have been classified. Moreover, each (vital) class [a, B, C] must be supported by some body B = ∅. To illustrate this, consider Π0 ∪ Π1 , where Π1 = {e ← ∼c; e ← f ; f ← e; g ← e, ∼f ; g ← h, ∼f ; h ← f, g} . s
We get (Π0 ∪ Π1 , ∅, ∅) →∗ (Π1 ∪ Π1 , A1 , ∅) and continue by ape plying →∗ to (Π1 ∪ Π1 , A1 , E7 ): e
→ (e17 ) (e16 ) (e21 ) (e17 ) (e16 ) (e21 ) (e19 )
E
Rule e ← ∼c f ←e f ←e e←f f ←e
E1 E2 E3 E4 E5 E6 E7
(e22 ) g ← e,∼f E7
= E7 ∪ = E1 ∪ = E7 ∪ = E3 ∪ = E4 ∪ = E3 ∪ = E7 ∪
{E1 {E2 {E1 {E3 {E4 {E3 {E1 e E 1
e = [∅, ∅, ∅] } = [e, {∼c}, {e}], E 1 e = [∅, ∅, ∅] } = [∅, {e}, {{e}}], E 2 e , E e = [e, {∼c}, {e, {e}}], E } 1 2 e = [f, {e}, {f }], E3 = [∅, ∅, ∅] } e = [∅, ∅, ∅]} = [∅, {f }, {{f }}], E 4 e , E e = [f, {e}, {f, {f }}], E } 3 4 = [e, {∼c}, {e, {e}, f, {f }}], e = E e = E e = [∅, ∅, ∅] =E } 2 3 4
We thus get (Π2 , A1 , E7 ), where Π2 = Π1 ∪ (Π1 \ {g ← e, ∼f }). Set E7 augments E7 with E1 , revealing that e and f can be treated as equals. Note that the supporting body {∼c} does not belong to CE1 , given that bodies {e} and {f } in CE1 are involved in loop {e, f }. Notably, the application of (e22 ) to g ← e,∼f allows us to stop without classifying g and h, which are unfounded relative to Π2 . However, by delaying the removal of g ← e,∼f , an equivalence relation E7 such that g and h belong to classes E satisfying BE = ∅ 4
The superscript s indicates supporting bodies B ≠ ∅.
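As a rough illustration of the elementary simplifications (s0)-(s10), the following Python sketch iterates Fitting-style propagation to a fixpoint on ground rules given as (head, positive body, negative body) triples. It is a simplified stand-in written for this note, not the clasp implementation, and it omits the bookkeeping of assignments to bodies.

def simplify(rules):
    true_atoms, false_atoms, changed = set(), set(), True
    while changed:
        changed, kept = False, []
        for h, pos, neg in rules:
            if set(pos) & false_atoms or set(neg) & true_atoms:
                changed = True                       # body can never hold: drop the rule
                continue
            pos = [x for x in pos if x not in true_atoms]
            neg = [x for x in neg if x not in false_atoms]
            if not pos and not neg:                  # body holds: the head is a fact
                changed |= h not in true_atoms
                true_atoms.add(h)
                continue
            kept.append((h, pos, neg))
        heads = {h for h, _, _ in kept}
        occurring = {x for _, p, n in kept for x in p + n}
        for x in occurring - heads - true_atoms - false_atoms:
            false_atoms.add(x)                       # no rule left that can derive x
            changed = True
        rules = kept
    return rules, true_atoms, false_atoms

# For Pi0 = {a <-; b <- a,~c; c <- ~b,~d} this yields {b <- ~c; c <- ~b},
# with a assigned true and d assigned false, as in the example above.
print(simplify([("a", [], []), ("b", ["a"], ["c"]), ("c", [], ["b", "d"])]))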
s
(s0 ) (s1 ) (s2 ) (s3 ) (s4 ) (s5 ) (s6 ) (s7 ) (s8 ) (s9 ) (s10 ) (s11 ) (s12 ) (s13 ) (s14 ) (s15 )
(Π ∪ {r, r}, A, E) (Π ∪ {a ← , , B}, A, E) (Π ∪ {a ← b, ∼b, B}, A, E) (Π ∪ {a ← a, B}, A, E) (Π ∪ {a ←}, A, E) (Π, A, E) (Π ∪ {a ← ∼a, B}, A, E) (Π ∪ {a ← B}, A ∪ {a}, E) (Π ∪ {a ← B}, A ∪ {B}, E) (Π ∪ {a ← , B}, A ∪ {}, E) (Π ∪ {a ← ∼, B}, A ∪ {}, E) (Π, A ∪ {{, } ∪ B}, E) (Π, A ∪ {{b, ∼b} ∪ B}, E) (Π, A ∪ {, {} ∪ B}, E) (Π, A ∪ {, {} ∪ B}, E) (Π, A ∪ {B}, E)
→ s → s → s → s → s → s → s → s → s → s → s → s → s → s → s →
(e16 ) (e17 ) (e18 ) (e19 ) (e20 ) (e21 ) (e22 ) (e23 ) (e24 ) (e25 ) (e26 )
(Π ∪ {a ← B}, A, E) → e (Π ∪ {a ← B}, A, E) → e (Π ∪ {a ← B}, A, E ∪ {E, [a, B, C]}) → e (Π ∪ {a ← B}, A, E ∪ {E, [a, B, C]}) → e e (Π, A, E ∪ {E, E, [a, B, C]}) → e e [a, B, C]}) (Π, A, E ∪ {E, E, → e e (Π ∪ {a ← B}, A, E ∪ {E, E}) → e (Π, A, E ∪ {[a, B, C]}) → e (Π, A, E ∪ {[a, B, C]}) → e (Π ∪ {a ← B}, A, E ∪ {[a, ∅, C]}) → e (Π ∪ {a ← B}, A, E ∪ {[a , ∅, C ]}) →
e
e
(e27 ) (Π ∪ {a ← B}, A, E ∪ {[∅, ∅, C]})
→
e (a28 ) (Π ∪ {a ← B}, A, E ∪ {E, E})
→
e (a29 ) (Π ∪ {a ← b, B}, A, E ∪ {E, E})
→
(a30 ) (a31 ) (a32 ) (a33 ) (a34 ) (a35 )
e (Π ∪ {a ← b, B}, A, E ∪ {E, E}) e (Π ∪ {a ← ∼c, B}, A, E ∪ {E, E}) e (Π ∪ {a ← ∼c, B}, A, E ∪ {E, E}) e (Π, A ∪ {B}, E ∪ {E, E}) (Π, A ∪ {{b} ∪ B}}, E ∪ {E}) (Π, A ∪ {{∼c} ∪ B}}, E ∪ {E})
a
a
a
→ a → a → a → a → a → u
(u36 ) (Π, A, E)
→
(Π ∪ {r}, A, E) (Π ∪ {a ← , B}, A, E) (Π, A, E) (Π, A, E) (Π, A ∪ {a}, E) (Π, A ∪ {a}, E) (Π, A ∪ {{∼a} ∪ B}, E) (Π, A ∪ {a}, E) (Π, A ∪ {B}, E) (Π ∪ {a ← B}, A ∪ {}, E) (Π, A ∪ {}, E) (Π, A ∪ {{} ∪ B}, E) (Π, A, E) (Π, A ∪ {, B}, E) (Π, A ∪ {}, E) (Π, A ∪ {a, B}, E)
` ´ a ∈ (B + ∪ B − ) \ (atom(Π) ∪ A ∪ A) ´ ` + s ,B ∈ e = [∅, ∅, ∅]}) (Π ∪ {a ← B}, A, E ∪ {E = [∅, B, {B}], E B ∪ Es ⊆ EC / EC ´ ` + Bs s ,a ∈ e = [∅, ∅, ∅]}) (Π ∪ {a ← B}, A, E ∪ {E = [a, B, {a}], E B ∪ EB ⊆ EC / EC ` ´ (Π ∪ {a ← B}, A, E ∪ {E = [a, B, C ∪ CE ]}) body(a) ⊆ CE , CE ∩ atom(Π) = ∅ ` ´ (Π ∪ {a ← B}, A, E ∪ {E = [aE , BE , CE ∪ C]}) body(a) ⊆ CE , CE ∩ atom(Π) = ∅ ` ´ e (Π, A, E ∪ {E = [a, B, C ∪ CE ], E}) B ∈ C, B + = ∅, B − ⊆ CEe , CE ∩ atom(Π) = ∅ ` ´ e (Π, A, E ∪ {E = [aE , BE , CE ∪ C], E}) B ∈ C, B + ⊆ CE , B − ⊆ CEe , CE ∩ atom(Π) = ∅ ` + ´ e (Π, A, E ∪ {E, E}) (B ∩ CE ) ∪ (B − ∩ CEe ) = ∅, (B + ∩ CEe ) ∪ (B − ∩ CE ) = ∅ ` ´ (Π, A, E ∪ {[a, ∅, C]}) B = ∅, B ∈ / body(Π) ´ ` s (Π, A, E ∪ {[a, ∅, C]}) B = ∅, B + ⊆ EC ´ ` + s s (Π ∪ {a ← B}, A, E ∪ {[a, B, C]}) B ∪ EB ⊆ EC ` s ⊆ Es , (Π ∪ {a ← B}, A, E ∪ {[a, B, C]}) {a, a } ⊆ C , a = a , B + ∪ EB C´ C = ({a, B} ∩ C ) ∪ (C \ (atom(Π) ∪ body(Π))) ´ ` s ⊆ Es (Π ∪ {a ← B}, A, E ∪ {[∅, B, C]}) B ∈ C, B + ∪ EB C ` e (Π, A, E ∪ {E, E}) a ∈ CE \ {aE }, {(a ← B ) ∈ Π S ∪ {a ← B} | a ∈ CE \ {aE },´ B + = ∅, a ∈ r∈Π∪{a←B} body(r)+ } = ∅ ` ← B ) ∈ Π | a ∈ C \ {a }, e (Π ∪ {a ← aE , B}, A, E ∪ {E, E}) b ∈ CE \ {aE }, {(aS E E ´ B + = ∅, a ∈ r∈Π∪{a←b,B} body(r)+ } = ∅ ` ´ e (Π ∪ {a ← ∼aEe , B}, A, E ∪ {E, E}) b ∈ CE \ {aE }, (b ← B ) ∈ Π, B + = ∅ ` ´ e (Π ∪ {a ← B}, A, E ∪ {E, E}) c ∈ CE , B + ∩ CEe = ∅ ` ´ e (Π ∪ {a ← ∼aE , B}, A, E ∪ {E, E}) c ∈ CE \ {aE }, B + ∩ CEe = ∅ ` + ´ e (Π, A, E ∪ {E, E}) (B ∩ CE ) ∪ (B − ∩ CEe ) = ∅, (B + ∩ CEe ) ∪ (B − ∩ CE ) = ∅ ` ´ (Π, A ∪ {{aE } ∪ B}}, E ∪ {E}) b ∈ CE \ {aE } ` ´ (Π, A ∪ {{∼aE } ∪ B}}, E ∪ {E}) c ∈ CE \ {aE } ` ´ s ∪ A) (Π, A ∪ {a}, E) a ∈ atom(Π) \ (EC
Transformation rules for preprocessing (where ∈ A ∪ A, ∼a = a, ∼a = a, and a = a).
Table 1.
could have been obtained as well. The latter again signals that g and h are unfounded, as in the case that they remain unclassified. The next results shed some light on the syntactic properties of the s e s e consecutive application of →∗ and →∗ , abbreviated by →∗ →∗ . s
e
Proposition 3 Let (Π, ∅, ∅) →∗ →∗ (Π , A, E), for logic program Π over alphabet A. Then, we have: 1. 2. 3. 4. 5.
` ´ a ∈ atom(Π) \ (A ∪ A), body(a) = ∅
s s EB ⊆ EC ⊆ atom(Π ) ∪ body(Π ); EC ∩ (A ∪ A) = ∅; CE ∩ CE = ∅, for all E, E ∈ E such that E = E ; (aE ← BE ) ∈ Π , for all E ∈ E such that aE = ∅, BE = ∅; s s body(r)+ ⊆ EC , for all r ∈ Π such that head (r) ∈ / EC .
We next show that our transformations preserve answer sets and that duality among equivalence classes carries forward to answer sets. s
e
Proposition 4 Let (Π, ∅, ∅) →∗ →∗ (Π , A, E), for logic program Π over alphabet A, and let X ∈ AS (Π). Then, we have: s 1. A ∩ A ⊆ X ⊆ (A ∩ A) ∪ EC ;
2. CE ∩A ⊆ X and CEe ∩X = ∅ or CEe ∩A ⊆ X and CE ∩X = ∅, e ⊆ E. for all {E, E} Equivalences and implicit or explicit unfoundedness of atoms (cf. E7 and E7 above) are exploited by the remaining transformations: (a28 ) to (a35 ) substitute equivalent atoms by the representative aE (or ∼aEe via rule (a30 )) for their class E, while (u36 ) assigns false to unfounded atoms. a u Although → and → leave program Π1 unchanged, they allow for further reducing Π2 in view of the obtained equivalence classes. We a u obtain (Π2 , A1 , E7 ) →∗ (Π3 , A1 , E7 ) →∗ (Π3 , A2 , E7 ), where Π3 = Π1 ∪ {e ← ∼c; e ← e; g ← h, ∼e; h ← e, g} and A2 = A1 ∪ {g, Sh} = {a, d, g, h}. Using E[X] = [a,B,C]∈E,C∩X =∅ (C ∩ A) for accumulating all atoms equivalent to members of X, we obtain the following result. s e a u Proposition 5 Let (Π, ∅, ∅) →∗ →∗ →∗ →∗ (Π , A, E), for logic program Π over alphabet A. Then, we have AS (Π) = {X ∪E[X]∪(A∩A) | X ∈ AS (Π )∩M(BF (A\A))} .
18
M. Gebser et al. / Advanced Preprocessing for Answer Set Solving
Finally, we consider the saturated result of preprocessing, where s e a u ∗ Π → (Π , A, E) stands for (Π, ∅, ∅) ( →∗ →∗ →∗ →∗ )∗ (Π , A, E). Let σ = {y1 /y1 , . . . , yn /yn } denote a substitution, and let Yσ be Y with every occurrence of yi replaced by yi for 1 ≤ i ≤ n. This allows us to formulate the following termination and confluence result. Theorem 6 Let Π be a logic program over A. Then, we have: ∗ 1. Every derivation → from Π terminates with some (Π , A, E) such that no transformation rule in Table 1 is applicable to (Π , A, E); ∗ ∗ 2. If Π → (Π1 , A1 , E1 ) and Π → (Π2 , A2 , E2 ), then (A1 ∩ A) ∪ E[A1 ] = (A2 ∩A)∪E[A2 ], Π1 σ = Π2 , and (A1 \A)σ = A2 \A, where σ = {a/aE | E ∈ E2 , a ∈ CE ∩ A}; ∗ ∗ e1 } ⊆ E1 3. If Π → (Π1 , A1 , E1 ), Π → (Π2 , A2 , E2 ), and {E1 , E e such that BE1 = ∅, then {E2 , E2 } ⊆ E2 such that BE2 = ∅, CE1 σ = CE2 σ, and CEe1 σ = CEe2 σ, where σ = {a/aE | E ∈ E2 , a ∈ CE ∩ A}. ∗
Reconsidering Π0 ∪ Π1 , we get (Π0 ∪ Π1 ) → (Π1 , A2 , E ), where E contains two vital classes, viz., E = [b, {∼c}, {b, {∼c}, e = [c, {∼b}, {c, {∼b}}], while all other e, {e}, f, {f }}] and E classes E ∈ E are such that BE = ∅. This outcome is independent from the order in which transformations are applied. Also note that all six rules of Π1 are removed by preprocessing, thus transforming non-tight program Π0 ∪ Π1 into tight program Π1 . Notably, the result of our transformations goes beyond the wellfounded model [21] of a logic program. ∗ Proposition 7 Let Π → (Π , A, E), for logic program Π over A, and let I ⊆ A ∪ A be the well-founded model of Π. Then, we have I ∩A ⊆ (A∩A)∪E[A] and I ∩A ⊆ (A \ (A ∪ E[A ∪ atom(Π )])). Similar to the known algorithms for computing a program’s well∗ founded model, → can be computed in quadratic time. In fact, if no program rule is removed (via rules other than (a28 )) after the initial s s e a u application of →∗ , a linear pass of →∗ →∗ →∗ →∗ suffices to coms∗ e∗ a∗ u∗ ∗ ∗ pute →, while iteration, viz., ( → → → → ) , is needed otherwise. We now take advantage of the result of our initial preprocessing phase, (Π , A, E), for obtaining a compact completion formula. To this end, we use E to induce a variable mapping ν : atom(Π ) ∪ {pB | B ∈ body(Π )} → V ∪ V, where V is an alphae ⊆ E such that BE = ∅, we bet of variable names. For each {E, E} e as follows: select a unique v ∈ V and map the elements of E and E 1. ν(y) = v iff y ∈ (CE ∩ atom(Π )) ∪ {pB | B ∈ CE ∩ body(Π )}; 2. ν(y) = v iff y ∈ (CEe ∩ atom(Π )) ∪ {pB | B ∈ CEe ∩ body(Π )}. Practically, ν amounts to an abstraction of the original program, as used for the internal representation within ASP solvers. We then use ν for inducing a substitution σν = {y/ν(y) | y ∈ atom(Π ) ∪ {pB | B ∈ body(Π )}}. For (Π1 , A2 , E ), we get mapping ν1 = {b → v; c → v; p{∼c} → v; p{∼b} → v}, using only one variable v. Having mapping ν induced by (Π , A, E), we express the completion and loop formulas of Π using the variables in V: ` VFν (Π , A, E) = LF (Π ) ∪ BF (A \ A) ∪ ´ CF (Π , atom(Π ) ∪ (A \ (A ∪ E[A ∪ atom(Π )]))) σν . Note that applying σν leaves the introduction of body propositions (cf. (1)) implicit. In our example, we get VFν1 (Π1 , A2 , E ) = CF (Π1 , {b, c, d, g, h})σν1 = {v ↔ v; v ↔ v; d ↔ ⊥; g ↔ ⊥; h ↔ ⊥} . Note that LF (Π1 ) is empty (since Π1 is tight), and so is BF (A2 \A). Clearly, CF (Π1 , {b, c, d, g, h})σν1 possesses the models ∅ and {v}. Such models are linked to the atoms in an original program Π by
appeal to EFν (E) = {a ↔ ν(aE ) | E ∈ E, BE = ∅, a ∈ CE ∩ A}; e.g., EFν1 (E ) = {b ↔ v; e ↔ v; f ↔ v; c ↔ v}. Formally, we have the following result. ∗ Theorem 8 Let Π → (Π , A, E), for logic program Π over A, and let ν be a variable mapping induced by (Π , A, E). Then, we have AS (Π) = M((A ∩ A) ∪ E[A] ∪ VFν (Π , A, E) ∪ EFν (E))|A . For instance, for (Π1 , A2 , E ), ν1 , and A = {a, . . . , h}, we obtain M({a} ∪ ∅ ∪ VFν1 (Π1 , A2 , E ) ∪ EFν1 (E ))|A = {{a, b, e, f }, {a, c}}, which are the two answer sets of Π0 ∪ Π1 . Finally, note that our implementation within clasp takes advantage of the preprocessing result only for the initial construction of a compact completion formula, while loop formulas are not computed a priori, but only if they are used for propagation or conflict analysis.
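To give a flavour of how an equivalence relation E over atoms and bodies can be turned into the variable mapping ν, the following sketch merges equivalent constituents with a union-find structure and hands out one Boolean variable per pair of dual classes. The data and helper names are invented for the example and do not come from clasp.

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

uf = UnionFind()
# equivalences as in the running example: b, e, f and their bodies collapse
# into one class; c and the body {~b} form the dual class
for x, y in [("b", "e"), ("e", "f"), ("e", "{e}"), ("f", "{f}"),
             ("b", "{~c}"), ("c", "{~b}")]:
    uf.union(x, y)
dual = {uf.find("b"): uf.find("c"), uf.find("c"): uf.find("b")}

variables = {}
def nu(x):
    # map an atom or body to (variable, sign): dual classes share one variable
    root = uf.find(x)
    if root in variables:
        return variables[root], True
    if dual.get(root) in variables:
        return variables[dual[root]], False
    variables[root] = "v%d" % len(variables)
    return variables[root], True

for item in ["b", "e", "f", "{~c}", "c", "{~b}"]:
    print(item, "->", nu(item))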
4 EXPERIMENTS
We conducted systematic experiments on the benchmark sets used in the categories SCore and SLparse of the ASP competition [15]. Our comparison considers the ASP solver clasp in four modes: (1) no elaborated preprocessing, only elementary simplifications as in (s0 ) to (s15 ); (2) external program reduction (as described in Section 3); (3) internal reduction, extending SatELite-like techniques [10];5 and (4) both types of preprocessing. Table 2 summarizes results in seconds (t), indicating the number of timeouts via a superscript. Each line averages over n runs on n/3 instances, each shuffled three times. Furthermore, |r|, |a|, and |b| give the average number of rules, atoms, and bodies, respectively, in the original programs of each class; |v| and |c| give the average number of variables and Boolean constraints in the internal representation. The number of variables |v| is the same for variant (1) and (3) as well as for (2) and (4), respectively, and thus not duplicated in Table 2. At the bottom of Table 2, all individual runs are summed up, not taking averages. Full details are provided at [7]. In total, we see that variant (4) performs best, even though SatELite-like techniques are currently not applied to so-called extended rules (allowed within SLparse instances, shown in the second part of Table 2), while we have generalized external program reduction to work on such rules too. Furthermore, SatELite-like techniques work best on tight examples, being released from unfounded set checking. (Note that 2/3 of the benchmark classes are tight.) Unlike this, the approach in Section 3 is advantageous on non-tight programs due to its support-driven strategy. Another factor is the size of s e a u input programs. While our external technique ( →∗ →∗ →∗ →∗ ) is implemented in a linear fashion, SatELite-like techniques involve subsumption tests yielding a quadratic worst case behavior. Regarding the number of variables, one has to compare |a|+|b| with |v|. In the worst case, both would be equal. However, we sometimes see significant reductions of more than one order of magnitude. Given that the elementary simplifications already cut down the number of variables, the speed-ups of version (2) over (1) are mainly due to the reduced completion formula (reflected by |c|). Also, the number |c| of constraints is often much smaller than the original number |r| of rules.
5 DISCUSSION
We provided the first ASP-specific approach to preprocessing logic programs, aiming at reducing an input program as well as the number of variables in its internal representation. The latter goal is also pursued by smodels [19], where choices rely on atoms occurring negatively in bodies, and by cmodels [8], where heuristics are used to 5
Note that a straightforward application of SatELite-like techniques is insufficient since it interferes with unfounded set detection.
Problem Name (n) 15-Puzzle (30) BlockedN-Queens (42) EqTest (15) Factoring (15) HamiltonianPath (42) RLP-150 (42) RLP-200 (42) RandomNonTight (42) SchurNumbers (15)
|r| 17203 308796 6901 6974 4228 728 1184 839 12014
|a| |b| 5161 13029 5503 155646 434 2996 4965 6782 1533 2542 151 715 201 1165 55 806 736 4391
|v| 3100 53716 1143 3637 1358 288 455 287 1005
clasp (1) clasp (2) |c| t |v| |c| t 24348 0.3 2930 23942 0.3 69281 18 285.8 50613 2988 16 254.5 12338 16.0 999 11514 14.4 13407 5.6 2244 9524 3.9 5533 0.1 748 2987 0.1 3002 0.3 286 2992 0.3 4850 0.9 453 4838 0.9 5286 32.3 283 5267 32.8 4862 2.3 829 3971 1.4
clasp (3) |c| t 13497 0.3 2720 18 265.1 9866 16.4 3791 1.8 2974 0.1 2994 0.3 4835 1.0 5286 31.3 2451 2.6
clasp (4) |c| t 13296 0.3 2720 18 265.7 9419 14.7 3765 1.9 1277 0.1 2986 0.3 4826 0.9 5252 33.4 1602 1.0
15-Puzzle (15) 38250 11385 37498 15694 116321 1 213.2 15298 115173 96.3 79624 104.1 79624 112.8 BlockedN-Queens (15) 5024 4699 2726 2472 331 17.1 894 331 9.1 331 9.5 331 13.5 BoundedSpanningTree (15) 206557 2359 203226 68524 201427 3.7 67796 198432 3.7 190486 16.5 190486 16.8 CarSequencing (15) 1582 2303 1263 1189 630 15 600.0 695 630 15 600.0 630 15 600.0 630 13 566.3 Factoring (12) 7685 5470 7472 4006 14803 8.6 2473 10525 4.1 4196 2.2 4170 2.1 HamiltonianCycle (15) 10502 7003 4955 3986 12236 0.3 1925 7916 0.2 4676 1.4 4641 1.3 HamiltonianPath (15) 4924 1623 2920 1514 6102 0.1 864 3387 0.1 3364 0.1 1560 0.1 Hashiwokakero (12) 738726 149926 717900 227596 2163406 3 125.2 217954 1912400 3 125.2 1915809 3 125.4 1912400 3 125.3 KnightsTour (15) 58062 10968 37996 14866 16518 0.5 11383 10559 0.5 5317 0.7 3402 0.7 RLP-150 (15) 735 151 721 290 3030 0.4 288 3019 0.3 3023 0.4 3014 0.3 RLP-200 (15) 793 199 781 326 3309 1.1 319 3269 1.0 3276 1.0 3244 1.0 RandomNonTight (15) 848 55 816 290 5380 9.0 287 5361 5.8 5380 9.0 5347 5.5 SchurNumbers (15) 85319 1713 43097 7570 11438 2 129.3 7307 11438 1 164.0 10705 1 129.0 10705 1 97.8 SearchTest-plain (15) 690808 4339 522045 34753 160494 3 122.9 31869 148922 2 114.1 114633 3 124.4 105102 1 81.5 SearchTest-verbose (15) 802803 4959 606804 40320 165791 12.3 36964 152633 13.8 97379 37.5 88708 34.9 SocialGolfer (15) 31506 11269 31108 12500 119754 3 120.6 11857 119754 3 121.3 108148 3 124.4 108148 3 124.2 SolitaireBackward (15) 20508 8381 9305 5473 39345 1.9 2545 18017 1.1 13980 1.7 11740 0.7 SolitaireBackward2 (15) 27435 4397 25517 8713 14323 4 260.4 8366 14323 6 312.8 10008 4 179.1 10009 3 177.7 SolitaireForward (15) 19606 8020 8858 5153 29835 3 120.3 3602 23819 3 120.3 18448 2 90.3 15253 3 120.2 Su-Doku (9) 1003593 17053 502502 173185 12772 7.1 165897 12772 7.9 12772 11.0 12772 11.3 TowersOfHanoi (15) 18340 7215 15028 7294 15903 24.1 5500 13527 24.4 8665 24.7 8664 16.0 TravelingSalesperson (15) 3825 3065 1588 1448 3588 0.4 583 2356 0.2 2356 0.3 2339 1.5 VerifyTest-variableSearchSpace (15) 12914 2296 9134 1061 4285 0.1 608 3088 0.1 1273 0.1 806 0.1 WeightBoundedDominatingSet (15) 3163 2879 798 1187 2048 6 245.9 264 910 4 165.1 453 3 128.2 453 2 105.4 WeightedLatinSquare (15) 997 770 446 405 222 0.0 146 222 0.0 222 0.0 222 0.0 WeightedSpanningTree (15) 112034 2185 108934 36998 81210 2.3 36294 78426 2.2 78052 4.5 78052 4.4 Total time/timeouts 44116.9/58 40774.2/53 38641.0/52 37139.0/47 variables/constraints 10954406/46339719 10172081/39117132 -/35997972 -/35438242 Table 2. Experiments with clasp (1.0.5) on a 2.2GHz PC under Linux; each run restricted to 600s time and 1GB RAM.
eliminate body variables. However, up to now clasp is the only ASP solver integrating advanced preprocessing techniques. Neither ASP-specific (external) nor SatELite-like (internal) preprocessing has yet been implemented elsewhere in the context of ASP. Our experiments show that investments in preprocessing are well spent. In fact, the best results are obtained when combining ASP-specific with SatELite-like preprocessing. Instead of integrating preprocessing into clasp, it could be performed by a dedicated front-end, beneficial also to other solvers. The development of such a tool is left as a future issue.
REFERENCES
[1] http://assat.cs.ust.hk.
[2] F. Bacchus, 'Enhancing Davis Putnam with extended binary clause reasoning', in Proceedings AAAI'02, pp. 613–619. AAAI Press, (2002).
[3] C. Baral, Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, (2003).
[4] S. Brass and J. Dix, 'Semantics of (disjunctive) logic programs based on partial evaluation', Journal of Logic Programming, 40(1), 1–46, (1999).
[5] S. Brass, J. Dix, B. Freitag, and U. Zukowski, 'Transformation-based bottom-up computation of the well-founded model', Theory and Practice of Logic Programming, 1(5), 497–538, (2001).
[6] K. Clark, 'Negation as failure', in Logic and Data Bases, eds., H. Gallaire and J. Minker, pp. 293–322. Plenum Press, (1978).
[7] http://www.cs.uni-potsdam.de/clasp.
[8] http://www.cs.utexas.edu/users/tag/cmodels.
[9] http://www.dlvsystem.com.
[10] N. Eén and A. Biere, 'Effective preprocessing in SAT through variable and clause elimination', in Proceedings SAT'05, eds., F. Bacchus and T. Walsh, pp. 61–75. Springer, (2005).
[11] T. Eiter, M. Fink, H. Tompits, and S. Woltran, 'Simplifying logic programs under uniform and strong equivalence', in Proceedings LPNMR'04, eds., V. Lifschitz and I. Niemelä, pp. 87–99. Springer, (2004).
[12] F. Fages, 'Consistency of Clark's completion and the existence of stable models', Journal of Methods of Logic in Computer Science, 1, 51–60, (1994).
[13] M. Fitting, 'Tableaux for logic programming', Journal of Automated Reasoning, 13(2), 175–188, (1994).
[14] M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub, 'Conflict-driven answer set solving', in Proceedings IJCAI'07, ed., M. Veloso, pp. 386–392. AAAI Press/MIT Press, (2007).
[15] M. Gebser, L. Liu, G. Namasivayam, A. Neumann, T. Schaub, and M. Truszczyński, 'The first answer set programming system competition', in Proceedings LPNMR'07, eds., C. Baral, G. Brewka, and J. Schlipf, pp. 3–17. Springer, (2007).
[16] M. Gebser and T. Schaub, 'Tableau calculi for answer set programming', in Proceedings ICLP'06, eds., S. Etalle and M. Truszczyński, pp. 11–25. Springer, (2006).
[17] C. Gomes, H. Kautz, A. Sabharwal, and B. Selman, 'Satisfiability solvers', in Handbook of Knowledge Representation, eds., V. Lifschitz, F. van Harmelen, and B. Porter. Elsevier, (2008).
[18] F. Lin and Y. Zhao, 'ASSAT: computing answer sets of a logic program by SAT solvers', Artificial Intelligence, 157(1–2), 115–137, (2004).
[19] http://www.tcs.hut.fi/Software/smodels.
[20] S. Subbarayan and D. Pradhan, 'NiVER: Non-increasing variable elimination resolution for preprocessing SAT instances', in Proceedings SAT'04, eds., H. Hoos and D. Mitchell, pp. 276–291. Springer, (2005).
[21] A. Van Gelder, K. Ross, and J. Schlipf, 'The well-founded semantics for general logic programs', Journal of the ACM, 38(3), 620–650, (1991).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-20
A generic framework for comparing semantic similarities on a subsumption hierarchy Emmanuel Blanchard1 and Mounira Harzallah1 and Pascale Kuntz1 Abstract. Defining a suitable semantic similarity between concept pairs of a subsumption hierarchy is becoming a generic problem for many applications in knowledge engineering exploiting ontologies. In this paper, we define a generic framework which can guide the proposition of new measures by making explicit the information on the ontology which has not been integrated into existing definitions yet. Moreover, this framework allows us to rewrite numerous measures, originally proposed in various contexts, which are in fact closely related to each other. From this observation, we show some metrical and ordinal properties. Experimental comparisons on WordNet and on collections of human judgments complete the theoretical results and confirm the relevance of our propositions.
1 Introduction
Semantic similarity is a generic issue in a variety of applications in the areas of computational linguistics, artificial intelligence and biology, both in the academic community and the industry. Examples include word sense disambiguation [20], detection and correction of word spelling errors (malaproprisms) [4], image retrieval [23], information retrieval [13] and biological issues [25]. Similarities have been widely studied for set representations. The similarity σ(A, B) between two subsets of elements A and B is often defined as a function of the elements common to A and B and as a function of the distinct ones. The Jaccard’s coefficient [12] and the Dice’s coefficient [7], which have originally been defined for ecological studies, are probably the most commonly used similarities among a large family of coefficients [11][24]. Their theoretical properties have been carefully studied [10][6]. Another important issue is the evaluation of semantic similarity in a network structure. With a long history in psychology [27][21], the problem of evaluating semantic similarity in a network structure has known a noticeable renewed interest linked to the development of the semantic web. In the 1970’s many studies on categorization were influenced by a theory which stated that, from an external point of view, the categories in a set of objects were organized in a taxonomy according to an abstraction process. It is a common principle of the current knowledge representation systems to describe proximity relationships between domain concepts by a hierarchy, or more generally by a graph, i.e. by the ontologies associated with the new languages of the semantic Web –in particular OWL [1]. The tree-based similarities defined on a subsumption hierarchy contain two categories of similarities: those which, like the Wu and Palmer’s similarity [28], only depend on the hierarchical structure (e.g., path lengths between concept pairs), and those which, like the Lin’s similarity [14], additionally incorporate statistics on a corpus 1
University of Nantes, France, email:
[email protected] (e.g., concept occurrence frequencies). Some recent work has tried to extend the tree-based definitions to graphs by simultaneously taking into account different semantic relationships [15]. But, despite its pertinence, this attempt is faced with many open problems, and in practice the set-based and the tree-based similarities still remain the most widely used. Our main purpose here is to show that these measures, which have originally been proposed in various contexts, are closely related to each other. Most set-based similarities σ (A, B) can be re-written as functions f (|A| , |B| , |A ∩ B|) of the cardinalities of sets A and B and of their intersection set A ∩ B. In data analysis, a classification attempt, not widely used in knowledge engineering, has permitted to gather numerous similarity definitions into two parametrized functions that we denote by fα and fβ [6]. In this paper, we extend the definitions of these functions to the tree-based similarities: we define two generic functions feα and feβ with the same schema as fα and fβ . Each function depends on a real parameter α or β, and on the “information content” ψ(ci ) = − log P (ci ) initially introduced by Resnik [19], where P (ci ) is the probability of encountering an instance of the concept ci . The operational computation of the theoretical probability P(ci ) may vary according to the available information (e.g., a corpus). We show that numerous published tree-based similarities are associated with a α or β value and an approximation of P. The interests of this work are threefold. First, some partial pairwise comparisons have already been presented in the literature, but our unified framework allows to precisely identify the theoretical differences and commonalities of a large set of measures. Second, an analysis of the combinatorics of the subsumption hierarchy has led us to define new approximations of the probability P which exploit information on the subsumption hierarchy which has not been integrated into existing measures yet. Third, we show that ordinal and metrical properties can be straightforwardly deduced from this unified framework. We complete this theoretical study by numerical experiments on WordNet samples (version 2.0) and on benchmarks on which human judgments have been collected.
2 A typology of set-based similarities
In this section, we denote by S a finite set of elements and A, B, C some subsets of S. We briefly recall that a similarity σ on P(S) is a function σ : P(S) × P(S) → IR+ which satisfies two properties: symmetry (σ(A, B) = σ(B, A)) and maximality (σ(A, A) ≥ σ(B, C)). Most of the set-based similarities can be grouped into two parametrized families. The first one σα has been proposed by Caillez and Kuntz [6]. It is defined by a ratio between the cardinality of the intersection |A ∩ B|
and the Cauchy’s mean [5] of the cardinalities of the respective sets |A| and |B|: σα (A, B) = fα (|A| , |B| , |A ∩ B|) =
|A∩B| μα (|A|,|B|)
(1)
”1 “ α α α where μα (|A| , |B|) = |A| +|B| for α ∈ IR. 2 Note that the case α = 1 concides with the classical arithmetic mean. The second family σβ has been studied by Gower and Legendre [10]: σβ (ci , cj ) = fβ (|A| , |B| , |A ∩ B|) =
β·|A∩B| |A|+|B|+(β−2)·|A∩B|
Table 1. Correspondence between different parameter values and well-known set-based similarities α Mean μα Similarity σα −∞ minimum Simpson β Similarity σβ −1 harmonic Kulczinsky 1/2 Sokal&Sneath 0 geometric Ochia¨ı 1 Jaccard 1 arithmetic Dice 2 Dice +∞ maximum Braun&Blanquet
It is easy to check that the values of the similarities σα and σβ are in the interval [0; 1].
3 A new formulation of tree-based similarities

In the following, we denote by C = {c1, c2, . . . , cn} a finite set of concepts. Formally, an ontology can be modeled by a directed graph where the nodes represent concepts and the arcs represent labeled relationships. Here, as is often done in the literature, we restrict ourselves to the subsumption relationship "is-a" on C × C. This relationship is common to every ontology, and different papers have confirmed that it is the most structuring one (e.g., [18]). In this case, if we assume that each concept ci has no more than one parent (direct subsumer), the ontology can be modeled by a rooted tree T(C) where the root c0 is either an informative concept or a "dummy" concept added just for connectivity. We denote by cij the most specific common subsumer of the concepts ci and cj in T(C). In this section, we adapt definitions (1) and (2) above to define new tree-based similarity families using the information content notion [19]. We also propose different ways to compute the information content of a concept which aim at better exploiting the hierarchy. Moreover, we show how our framework supports the rediscovery of existing tree-based similarities. Our proposition makes it possible to better understand both the relationships between the set-based and the tree-based similarities and the relationships between the tree-based similarities themselves.

3.1 Two new generic functions

Like Lin in his seminal paper [14], let us suppose that a concept ci references a subset Ii of an instance set I. By analogy with Shannon's information theory, the information content of the concept ci is measured by ψ(ci) = −log P(ci), where P(ci) ∈ [0, 1] is the probability for a generic instance of ci to belong to Ii. Similarly, the common information associated with a concept pair {ci, cj} is the information content ψ(cij) = −log P(cij) of their most specific common subsumer cij. Consequently, from definitions (1) and (2), we deduce two new parametrized functions which define tree-based similarities:

  σ̃α(ci, cj) = f̃α(ψ(ci), ψ(cj), ψ(cij)) = ψ(cij) / μα(ψ(ci), ψ(cj))    (3)

where μα is the Cauchy mean and α ∈ IR, and

  σ̃β(ci, cj) = f̃β(ψ(ci), ψ(cj), ψ(cij)) = β·ψ(cij) / (ψ(ci) + ψ(cj) + (β − 2)·ψ(cij))    (4)

where β ∈ IR*+. Let us remark that σ̃α(ci, cj) = σ̃β(ci, cj) when α = 1 and β = 2. The parameter α makes it possible to choose different definitions of the mean (e.g., arithmetic, geometric). Formulation (4) explicitly shows that the parameter β weights the importance of the common information associated with the most specific common subsumer. The logarithm base has no influence on this similarity measure due to the use of a ratio.
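A direct transcription of equations (3) and (4) shows how the two generic functions behave once the information contents ψ(ci), ψ(cj) and ψ(cij) are available. The sketch below is ours (invented names, illustrative values only):

```python
def sim_alpha(psi_i, psi_j, psi_ij, alpha=1.0):
    """Tree-based similarity (3): psi(c_ij) / mu_alpha(psi(c_i), psi(c_j))."""
    if alpha == 0:
        mean = (psi_i * psi_j) ** 0.5                       # geometric mean
    else:
        mean = ((psi_i ** alpha + psi_j ** alpha) / 2) ** (1 / alpha)
    return psi_ij / mean

def sim_beta(psi_i, psi_j, psi_ij, beta=2.0):
    """Tree-based similarity (4)."""
    return beta * psi_ij / (psi_i + psi_j + (beta - 2) * psi_ij)

# With alpha = 1 and beta = 2 the two definitions coincide (Dice-like form):
print(sim_alpha(3.0, 2.0, 1.5, alpha=1.0))   # 0.6
print(sim_beta(3.0, 2.0, 1.5, beta=2.0))     # 0.6
```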
3.2 Information content computation

Let us remark that in practice the instance set I is never completely described in extension. Consequently, the operational computation of the probability P(ci) depends both on the information at our disposal and on the hypotheses carried by the construction of the ontology. We denote by P̂(ci) the approximation of P(ci) in practice. The approximation P̂r proposed by Resnik is computed by the formula P̂r(ci) = n(ci)/n(c0), where n(ci) is the number of occurrences of ci plus the number of occurrences of the concepts which are subsumed by ci in T(C). This approximation considers the root as virtual (P̂r(c0) = 1).
The probability P(ci) can also be approximated without considering any additional information. We propose some approximations deduced from various hypotheses on the extension of the concepts. We distinguish three approaches associated with different hypotheses:
• descending approach
  – Hypothesis 1: exponential decrease of the instance number with concept depth in T(C) (P̂d)
  – Hypothesis 2: uniform distribution of the father's instances on its sons (P̂s)
• ascending approach
  – Hypothesis 3: exponential increase of the instance number with concept height in T(C) (P̂h)
  – Hypothesis 4: uniform distribution of the root's instances on the leaves (P̂g)
• combined approach
  – P̂dh: aggregation of P̂d and P̂h
  – P̂sg: aggregation of P̂s and P̂g

3.2.1 Approximation P̂d (Hypothesis 1)

The probability for an instance to be associated with a concept ci decreases exponentially with the depth di of ci in T(C). Then,

  P̂d(ci) = P̂d(parent(ci)) / k = P̂(c0) / k^di    (5)
where k is a fixed integer and parent(ci) is the parent (direct subsumer) of ci. Let us remark that when the logarithm base is set to k, the information content of a concept ci is equivalent to its depth plus the information content of the root:

  ψd(ci) = −log_k P̂d(ci) = di + ψ(c0)    (6)

3.2.2 Approximation P̂s (Hypothesis 2)

We consider a uniform distribution of the instances of a father concept on its son concepts:

  P̂s(ci) = P̂s(parent(ci)) / |Children(parent(ci))|    (7)

where Children(ci) corresponds to the set of sons of ci. The information content (ψs) deduced from this approximation corresponds to the specificity degree in comparison with the root; the depth takes into account a part of the information exploited by this specificity degree. This approximation refines P̂d by considering the number of sons of each subsumer.

3.2.3 Approximation P̂h (Hypothesis 3)

Each leaf has the same instance number, and the probability of an instance to be associated with a concept ci increases exponentially with the height of ci. A leaf concept has a minimal probability which depends on the height of the hierarchy and on the instance number of the root. We can approximate P(ci) by:

  P̂h(ci) = P̂(c0) / k^(h0 − hi)    (8)

In the particular case of a logarithm base equal to k, the information content of a concept ci is defined by:

  ψh(ci) = −log_k P̂h(ci) = h0 − hi + ψ(c0)    (9)

3.2.4 Approximation P̂g (Hypothesis 4)

We consider a uniform distribution of the instances of the root concept on the leaf concepts:

  P̂g(ci) = P̂(c0) · |Leaves(ci)| / |Leaves(c0)|    (10)

where Leaves(ci) corresponds to the leaf set subsumed by ci (when ci is a leaf, Leaves(ci) = {ci}). This case is dual to the previous P̂s case. Here, the information content (ψg) deduced from this approximation corresponds to the generality degree in comparison with the leaves; the height takes into account a part of the information exploited by this generality degree. This approximation refines P̂h by considering the number of sons of the concept and of its subsumed concepts.

3.2.5 Approximations P̂sg and P̂dh

We consider an alternative which simultaneously takes into account the specificity and the generality degrees:

  P̂sg(ci) = (P̂s(ci) + P̂g(ci)) / 2    (11)

The definition of P̂sg is based on the arithmetic mean of P̂s and P̂g. This choice is forced by the preservation of the recursivity: P̂sg(ci) = Σ_{cx ∈ Children(ci)} P̂sg(cx). A dual case is the aggregation of P̂d and P̂h:

  P̂dh(ci) = (P̂d(ci) + P̂h(ci)) / 2    (12)

3.3 Similarity definitions deduced from the approximations

In this subsection, we show that the generic functions σ̃α and σ̃β describe a set of semantic similarities (e.g., Lin, Wu & Palmer). We show that, in some cases, the approximations of P(ci) coincide with known measures of the literature.

3.3.1 Lin's similarity

Lin's similarity [14] is analogous to the Dice coefficient with Resnik's approximation:

  lin(ci, cj) = 2·ψr(cij) / (ψr(ci) + ψr(cj))    (13)

Due to Resnik's approximation, the root concept is considered as virtual (P̂(c0) = 1).

3.3.2 Wu & Palmer's similarity

The Wu & Palmer similarity [28] is analogous to the Dice coefficient with the approximation P̂d:

  wup(ci, cj) = 2·ψd(cij) / (ψd(ci) + ψd(cj))    (14)

3.3.3 Stojanovic's similarity

The approximation P̂d allows us to rewrite Stojanovic's similarity [26], which is analogous to the Jaccard coefficient:

  sto(ci, cj) = ψd(cij) / (ψd(ci) + ψd(cj) − ψd(cij))    (15)

3.3.4 Proportion of Shared Specificity

The Proportion of Shared Specificity (pss) proposed by Blanchard et al. [2] coincides with the Dice coefficient with the P̂s approximation:

  pss(ci, cj) = 2·ψs(cij) / (ψs(ci) + ψs(cj))    (16)
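To illustrate how the corpus-free approximations and the derived similarities interact, here is a small Python sketch. It is ours: the toy "is-a" tree, the constant K and all names are invented, and ψ(c0) is taken to be 0 (the root is treated as virtual, as in Resnik's approximation).

```python
import math

# A toy subsumption tree: child -> parent (root is 'entity').
parent = {"animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "oak": "plant"}
children = {}
for c, p in parent.items():
    children.setdefault(p, []).append(c)

def depth(c):
    return 0 if c not in parent else 1 + depth(parent[c])

def leaves(c):
    if c not in children:
        return {c}
    return set().union(*(leaves(x) for x in children[c]))

def lcs(ci, cj):
    """Most specific common subsumer in the tree."""
    anc = set()
    while True:
        anc.add(ci)
        if ci not in parent:
            break
        ci = parent[ci]
    while cj not in anc:
        cj = parent[cj]
    return cj

K = 2  # logarithm base (the k of Hypothesis 1)

def psi_d(c):   # information content under approximation P_d, eq. (6), with psi(c0) = 0
    return depth(c)

def psi_g(c):   # information content under approximation P_g, eq. (10)
    return -math.log(len(leaves(c)) / len(leaves("entity")), K)

def wup(ci, cj):   # Wu & Palmer, eq. (14)
    c = lcs(ci, cj)
    return 2 * psi_d(c) / (psi_d(ci) + psi_d(cj))

print(wup("dog", "cat"))                 # 0.5: lcs is 'animal' at depth 1, both concepts at depth 2
print(psi_g("dog"), psi_g("animal"))     # leaf concepts are more informative than inner ones
```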
4 Metrical and ordinal properties
Most of the work on the mathematical properties of similarities focuses on their metrical aspect [18]. It usually resorts to preliminary transformations of the similarity into a dissimilarity of the form δ = Max_σ − σ, where Max_σ is the maximal value reached by σ, or δ = 1/σ when Max_σ is not finite, in order to check the triangular inequality δ(ci, cj) ≤ δ(ci, ck) + δ(ck, cj). Here, Max_σα = Max_σβ = 1 and we can consider the transformations δα = 1 − σα and δβ = 1 − σβ. By studying the set-based similarities, Caillez et al. [6] and Gower et al. [10] have proved that the triangular inequality holds for α → +∞ and β ∈ [0, 1]. From a formal point of view, these questions are interesting; however, for practical applications in knowledge engineering, the developed approaches do not generally require this constraining property. When comparing results obtained with different similarities, we can remark that specialists are more often concerned with the ordering associated with the obtained values than with the intrinsic values. Indeed, they order the concept pairs according to the proximities quantified by these measures.
Proposition 1. The similarities of the family {σ̃β}_{β ∈ IR*+} follow the same ordering: for any ci, cj, ck, cl in C, σ̃β(ci, cj) ≤ σ̃β(ck, cl) ⇔ σ̃β′(ci, cj) ≤ σ̃β′(ck, cl) for any β and β′ ∈ IR*+.
We show that σ̃β(ci, cj) ≤ σ̃β(ck, cl) ⇐⇒ σ̃1(ci, cj) ≤ σ̃1(ck, cl) for any β ∈ IR*+. When ψ(ci) + ψ(cj) − 2·ψ(cij) = 0 then σ̃1(ci, cj) = σ̃β(ci, cj) for any β > 0. Otherwise, it is easy to check that, for ψ(ci) + ψ(cj) − 2·ψ(cij) ≠ 0,

  σ̃β(ci, cj) = β·σ̃1(ci, cj) / (1 + (β − 1)·σ̃1(ci, cj))

Consequently, σ̃1(ci, cj) ≥ σ̃1(ck, cl) ⇐⇒ σ̃β(ci, cj) ≥ σ̃β(ck, cl).

Proposition 2. The similarities of the family {σ̃α}_{α ∈ IR} do not follow the same ordering.
Let us consider the following counter-example on a set C = {c1, c2, c3, c4}. We suppose that c1 is a subsumer of c2, and that ψ(c1) = 1, ψ(c2) = 3, ψ(c3) = ψ(c4) = 2 and ψ(c34) = 2. In this case, the Cauchy means are μα(ψ(c1), ψ(c2)) = ((1 + 3^α)/2)^(1/α) and μα(ψ(c3), ψ(c4)) = 2. Due to the convexity of the power function when α > 1, μα(ψ(c1), ψ(c2)) > μα(ψ(c3), ψ(c4)) and consequently σ̃α(c1, c2) < σ̃α(c3, c4). When α < 1, the inequality is inverted.

Proposition 3. The similarities of the family {σ̃α}_{α ∈ IR} are decreasing functions of α.
This is due to the fact that the α-means are increasing functions of α (e.g., [5]).
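The relation used in the proof of Proposition 1 and the monotonicity stated in Proposition 3 can be checked numerically; the following short sketch is ours, with arbitrary illustrative values:

```python
def sim_beta(pi, pj, pij, beta):
    return beta * pij / (pi + pj + (beta - 2) * pij)

def sim_alpha(pi, pj, pij, alpha):
    mean = ((pi ** alpha + pj ** alpha) / 2) ** (1 / alpha)
    return pij / mean

# Proposition 1: sigma_beta is a monotone transform of sigma_1.
pi, pj, pij, beta = 5.0, 3.0, 2.0, 4.0
s1 = sim_beta(pi, pj, pij, 1.0)
print(sim_beta(pi, pj, pij, beta),
      beta * s1 / (1 + (beta - 1) * s1))   # both print the same value

# Proposition 3: sigma_alpha decreases as alpha grows (alpha-means increase).
print([round(sim_alpha(5.0, 3.0, 2.0, a), 4) for a in (-2, -1, 1, 2, 4)])
```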
5 Experimental results
In this section, we present two complementary comparisons based on the subsumption hierarchy of WordNet 2.0 [8]. First, we compare the information content restricted to the structural information with the well-known Resnik’s information content which additionally requires a corpus. This allows us to quantify the information deduced from the corpus. Second, we use three well-known benchmarks (Rubenstein & Goodenough [22], Miller & Charles [16], Finkelstein et al. [9]) which gather human judgments on some concept pairs. This allowed us to evaluate the relevance of the different approximations.
5.1 Comparison on WordNet

This subsection presents a comparison between the information content based on different approximations. We restrict ourselves to nouns and to the subsumption hierarchy (hyperonymy/hyponymy) of WordNet. This hierarchy, which contains 146690 nodes, constitutes the backbone of the noun subnetwork, accounting for close to 80% of the links [3]. The computations have been performed with the Perl modules of Pedersen et al. [17], which allowed us to adapt tree-based measures to the WordNet structure. Hence, although a synset could have more than one hyperonym, we have represented it as a tree model T_WordNet(C). We have also added some Perl modules to take into account all the new approximations presented in this paper. The main interest of T_WordNet(C) is to be large enough to allow the computation of robust statistics, and we do not enter here into the discussion between experts concerning the ontological nature of WordNet. We have computed the information content for four different concept sets: the whole set of WordNet (146690 concepts) and three subsets of WordNet composed of the concept sets used respectively in the Miller & Charles [16], Rubenstein & Goodenough [22] and Finkelstein & Gabrilovich [9] benchmarks. We have compared the approximations P̂d, P̂g and P̂r. The correlations ρ(ψd, ψr) and ρ(ψg, ψr) are reported in Figure 1 (the rank correlations, not reported here, give similar results).

Figure 1. Correlation of the ψd and ψg information content with that of Resnik (ψr) on WordNet concepts and four subsets

The approximation P̂r, which serves as a yardstick, has been computed with the British National Corpus, with the Resnik counting method and a smoothing by 1 [17]. We can remark that each benchmark uses a sample of concepts which is not very representative of the whole set of concepts. Indeed, the corpus effect on the information content is more important on the whole set than on the three samples. From this point of view, the Finkelstein & Gabrilovich benchmark is the worst one. Unsurprisingly, the information content based on the approximation P̂d is the least correlated with P̂r. However, the positive correlations show the relationship between the ascending and descending approximations: the depth tends to be inversely proportional to the height. The correlations between ψg and ψr show that the information quantity deduced from the corpus is limited compared to the information deduced from the hierarchical structure. Nevertheless, these results depend on the corpus and on the structure of WordNet. Hence, further work is required to generalize this conclusion to a large set of ontologies.
5.2 Comparisons with human judgments

As shown in Section 3.1, two components are essential when comparing two concepts ci and cj: the shared information content ψ∩(ci, cj) = ψ(cij) and the distinguishing information content ψ(ci, cj) = ψ(ci) + ψ(cj) − 2·ψ(cij). To measure the specific influence of these two components, we have computed the correlation of each of them with the human judgment. The considered human judgment evaluations are taken from the Miller & Charles [16], Rubenstein & Goodenough [22] and Finkelstein & Gabrilovich [9] experiments, and the approximation of P is Resnik's approximation. The results (Figure 2) closely depend on the test sets. The contribution of ψr is more important than that of ψr∩ for the benchmarks of Miller & Charles and Rubenstein & Goodenough,
contrary to the Finkelstein & Gabrilovich benchmark. This tends to reflect the variability of human sensibility, which may be due to the evaluation processes of the three benchmarks. Moreover, the previous experiments have shown that P̂g seems to be the most efficient approximation (the best correlated with human judgments) compared to Resnik's approximation, which uses a corpus. Hence, we have computed the correlations of the two components ψ∩ and ψ with the human judgment with P̂g (Figure 3). The results are very similar to those obtained with Resnik's approximation. This tends to suggest that the information deduced from the corpus contains as much noise as information.

Figure 2. Contribution of ψ∩ and ψ with P̂r to simulate human judgment
Figure 3. Contribution of ψ∩ and ψ with P̂g to simulate human judgment

6 Conclusion

The concept of similarity is fundamental in numerous fields (e.g., classification, AI, psychology). Originally, the definitions were often built to fulfill precise objectives in specific domains. However, several measures (e.g., [12, 7]) have shown their relevance to very different applications. Nowadays, similarities are enjoying a significant renewal of interest associated with the expansion of ontologies in knowledge engineering. In this framework, the measures most often used to quantify proximities between concept pairs are tree-based similarities, whose definitions may or may not integrate additional information from a textual corpus. In practice, the choice of a similarity is a critical step since the results of the algorithms often closely depend on this choice. In this paper, we have built a new theoretical framework which makes it possible to rewrite, in a homogeneous way, numerous similarity functions used in knowledge engineering. We believe that such an approach, in the spirit of the pioneering work of Lin, is important for two major reasons. First, this rewriting highlights relationships, both semantic and structural, between a large set of measures which were originally defined for very different purposes, and it has allowed us to deduce mathematical properties. Second, it can guide the proposition of new measures by making explicit the information on the ontology which has not been integrated into the definitions yet. In this way, we have proposed new approximations which better exploit the information associated with the hierarchical structure of the ontology. We have restricted ourselves to similarities for subsumption hierarchies without multiple inheritance; we have started to extend our approach to subsumption hierarchies with multiple inheritance.

ACKNOWLEDGEMENTS

We would like to thank the referees for their comments which helped improve this paper.
REFERENCES
[1] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, and L. A. Stein, OWL Web Ontology Language reference, 2004. http://www.w3.org/TR/owl-ref/.
[2] E. Blanchard, P. Kuntz, M. Harzallah, and H. Briand, 'A tree-based similarity for evaluating concept proximities in an ontology', in Proc. 10th Conf. Int. Federation Classification Soc., pp. 3–11. Springer, (2006).
[3] A. Budanitsky, 'Lexical semantic relatedness and its application in natural language processing', Technical report, Univ. of Toronto, (1999).
[4] A. Budanitsky and G. Hirst, 'Evaluating WordNet-based measures of semantic distance', Computational Linguistics, 32(1), 13–47, (2006).
[5] P.S. Bullen, D. S. Mitrinovic, and P. M. Vasics, Means and their Inequalities, Reidel, 1988.
[6] F. Caillez and P. Kuntz, 'A contribution to the study of the metric and euclidean structures of dissimilarities', Psychometrika, 61(2), 241–253, (1996).
[7] L. R. Dice, 'Measures of the amount of ecologic association between species', Ecology, 26(3), 297–302, (1945).
[8] WordNet: An Electronic Lexical Database, ed., C. Fellbaum, MIT Press, 1998.
[9] L. Finkelstein, E. Gabrilovich, Y. Matias, G. Wolfman, E. Rivlin, Z. Solan, and E. Ruppin, 'Placing search in context: The concept revisited', ACM Trans. Information Systems, 20(1), 116–131, (2002).
[10] J.C. Gower and P. Legendre, 'Metric and euclidean properties of dissimilarity coefficients', J. of Classification, 3, 5–48, (1986).
[11] Z. Hubalek, 'Coefficient of association and similarity based on binary (presence, absence) data: an evaluation', Biological Reviews, 57(4), 669–689, (1982).
[12] P. Jaccard, 'Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines', Bulletin de la Société Vaudoise de Sciences Naturelles, (37), 241–272, (1901). (In French.)
[13] J. H. Lee, M. H. Kim, and Y. J. Lee, 'Information retrieval based on conceptual distance in is-a hierarchies', J. Documentation, 49(2), 188–207, (1993).
[14] D. Lin, 'An information-theoretic definition of similarity', in Proc. 15th Int. Conf. Machine Learning, pp. 296–304. Morgan Kaufmann, (1998).
[15] A. G. Maguitman, F. Menczer, H. Roinestad, and A. Vespignani, 'Algorithmic detection of semantic similarity', in Proc. 14th Int. Conf. World Wide Web, pp. 107–116. ACM Press, (2005).
[16] G.A. Miller and W.G. Charles, 'Contextual correlates of semantic similarity', Language and Cognitive Processes, 6(1), 1–28, (1991).
[17] T. Pedersen, S. Patwardhan, and J. Michelizzi, 'WordNet::Similarity – measuring the relatedness of concepts', in Proc. 5th Ann. Meet. North American Chapter Assoc. Comp. Linguistics, pp. 38–41, (2004).
[18] R. Rada, H. Mili, E. Bicknell, and M. Blettner, 'Development and application of a metric on semantic nets', IEEE Trans. Syst., Man, Cybern., 19(1), 17–30, (1989).
[19] P. Resnik, Selection and Information: A Class-based Approach to Lexical Relationships, Ph.D. dissertation, University of Pennsylvania, 1993.
[20] P. Resnik, 'Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language', J. Artificial Intell. Research, 11, 95–130, (1999).
[21] E. Rosch, 'Cognitive representations of semantic categories', Experimental Psychology: Human Perception and Performance, 1, 303–322, (1975).
[22] H. Rubenstein and J.B. Goodenough, 'Contextual correlates of synonymy', Comm. ACM, 8(10), 627–633, (1965).
[23] A.W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, 'Content-based image retrieval at the end of the early years', IEEE Trans. Pattern Anal. Machine Intell., 22(12), 1349–1380, (2000).
[24] R. R. Sokal and P. H. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, 1963.
[25] O. Steichen, C. Daniel-Le Bozec, M. Thieu, E. Zapletal, and M.-C. Jaulent, 'Computation of semantic similarity within an ontology of breast pathology to assist inter-observer consensus', Computers in Biology and Medicine, 36(7-8), 768–788, (2006).
[26] N. Stojanovic, A. Maedche, S. Staab, R. Studer, and Y. Sure, 'Seal: a framework for developing semantic portals', in Proc. Int. Conf. Knowledge Capture, pp. 155–162, (2001).
[27] A. Tversky, 'Features of similarity', Psychological Review, 84(4), 327–352, (1977).
[28] Z. Wu and M. Palmer, 'Verb semantics and lexical selection', in Proc. 32nd Annual Meeting Assoc. Computational Linguistics, pp. 133–138, (1994).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-25
Complexity of Subsumption in the EL Family of Description Logics: Acyclic and Cyclic TBoxes

Christoph Haase¹ and Carsten Lutz²

Abstract. We perform an exhaustive study of the complexity of subsumption in the EL family of lightweight description logics w.r.t. acyclic and cyclic TBoxes. It turns out that there are interesting members of this family for which subsumption w.r.t. cyclic TBoxes is tractable, whereas it is ExpTime-complete w.r.t. general TBoxes. For other extensions that are intractable w.r.t. general TBoxes, we establish intractability already for acyclic and cyclic TBoxes.
1 MOTIVATION
Description logics (DLs) are a popular family of KR languages that can be used for the formulation of and reasoning about ontologies [5]. Traditionally, the DL research community has strived to identify more and more expressive DLs for which reasoning is still decidable. In recent years, however, there have been two lines of development that have led to significant popularity also of DLs with limited expressive power. First, a number of novel and useful lightweight DLs with tractable reasoning problems have been identified, see e.g. [3, 8]. And second, many large-scale ontologies that are formulated in such lightweight DLs have emerged from practical applications. Prominent examples include the Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT), which underlies the systematized medical terminology used in the health systems of the US, the UK, and other countries [19]; and the Gene Ontology (GO), which aims at consistent descriptions of gene products in different databases [20]. In this paper, we are concerned with the EL family of lightweight DLs, which consists of the basic DL EL and its extensions. Members of this family underlie many large-scale ontologies including SNOMED CT and GO. The DL counterpart of an ontology is called a TBox, and the most important reasoning task in DLs is subsumption. In particular, computing subsumption makes it possible to classify the concepts defined in the TBox/ontology according to their generality [5]. In the DL literature, different kinds of TBoxes have been considered. In decreasing order of expressive power, the most common ones are general TBoxes, (potentially) cyclic TBoxes, and acyclic TBoxes. For the EL family, the complexity of subsumption w.r.t. general TBoxes has been exhaustively analyzed in [3] and its recent successor [4]. In all of the considered cases, subsumption is either tractable or ExpTime-complete. However, the study of general TBoxes does not reflect common practice of ontology design, as most ontologies from practical applications correspond to cyclic or acyclic TBoxes. For example, SNOMED CT and GO both correspond to so-called acyclic TBoxes. Since cyclic and acyclic TBoxes are often preferable in terms of computational complexity [7, 14], the question arises whether there are useful extensions of EL for which reasoning w.r.t. such TBoxes is computationally cheaper than reasoning w.r.t. general TBoxes.
The goal of the current paper is to analyse the computational complexity of subsumption in the EL family of description logics w.r.t. acyclic TBoxes and cyclic TBoxes, with a special emphasis on the border of tractability. In our analysis, we omit extensions of EL for which tractability w.r.t. general TBoxes has already been established. Our results exhibit a more varied complexity landscape than in the case of general TBoxes: we identify cases in which reasoning is tractable, co-NP-complete, PSpace-complete, and ExpTime-complete. Notably, we identify two maximal extensions of EL for which subsumption w.r.t. cyclic TBoxes is tractable, whereas it is ExpTime-complete w.r.t. general TBoxes. In particular, these extensions include primitive negation and at-least restrictions. They also include concrete domains, but fortunately do not require the strong convexity condition that was needed in the case of general TBoxes to guarantee tractability [3]. For other extensions of EL such as inverse roles and functional roles, we show intractability results already w.r.t. acyclic TBoxes. Compared to the case of general TBoxes, it is often necessary to develop new approaches to lower bound proofs. We also show that the union of the two identified tractable fragments is not tractable. Detailed proofs are provided in [10].

¹ University of Oxford, UK, [email protected]
² TU Dresden, Germany, [email protected]
2 DESCRIPTION LOGICS
The two types of expressions in a DL are concepts and roles, which are built inductively starting from infinite sets NC and NR of concept names and role names, and applying concept constructors and role constructors. The basic description logic EL provides the concept constructors top (⊤), conjunction (C ⊓ D) and existential restriction (∃r.C), and no role constructors. Here and in what follows, we denote the elements of NC with A and B, the elements of NR with r and s, and concepts with C and D. The semantics of concepts and roles is given in terms of an interpretation I = (ΔI, ·I), with ΔI a non-empty set called the domain and ·I the interpretation function, which maps every A ∈ NC to a subset AI of ΔI and every role name r to a binary relation rI over ΔI. Extensions of EL are characterized by the additional concept and role constructors that they offer. Figure 1 lists all relevant constructors, concept constructors in the upper part and role constructors in the lower part. The left column gives the syntax, and the right column shows how to inductively extend interpretations to composite concepts and roles. In the presence of role constructors, composite roles can be used inside existential restrictions. In at-least restrictions (≥ n r) and at-most restrictions (≤ n r), we use n to denote a nonnegative integer.
  Syntax                Semantics
  ⊤                     ΔI
  ¬C                    ΔI \ CI
  C ⊓ D                 CI ∩ DI
  C ⊔ D                 CI ∪ DI
  (≤ n r)               {x | #{y | (x, y) ∈ rI} ≤ n}
  (≥ n r)               {x | #{y | (x, y) ∈ rI} ≥ n}
  ∃r.C                  {x | ∃y : (x, y) ∈ rI ∧ y ∈ CI}
  ∀r.C                  {x | ∀y : (x, y) ∈ rI → y ∈ CI}
  p(f1, . . . , fk)     {x | ∃d1, . . . , dk : f1I(x) = d1 ∧ . . . ∧ fkI(x) = dk ∧ (d1, . . . , dk) ∈ pD}
  r ∩ s                 rI ∩ sI
  r ∪ s                 rI ∪ sI
  r−                    {(x, y) | (y, x) ∈ rI}
  r+                    ∪_{i>0} (rI)^i

Figure 1. Syntax and semantics of concept and role constructors.
The concrete domain constructor p(f1, . . . , fk) deserves further explanation, to be given below. To denote extensions of EL, we use the symbols of the added constructors in superscript. For example, EL⊔,∪,− denotes the extension of EL with concept disjunction (C ⊔ D), role disjunction (r ∪ s), and inverse roles (r−).
The concrete domain constructor permits reference to concrete data objects such as strings and integers. It provides the interface to a concrete domain D = (ΔD, ΦD), which consists of a domain ΔD and a set of predicates ΦD [13]. Each p ∈ ΦD is associated with a fixed arity n and a fixed extension pD ⊆ ΔD^n. In the presence of a concrete domain D, we assume that there is an infinite set NF of feature names disjoint from NR and NC. In Figure 1 and in general, f1, . . . , fk are from NF and p ∈ ΦD. An interpretation I maps every f ∈ NF to a partial function fI from ΔI to ΔD. We use EL(D) to denote the extension of EL with the concrete domain D.
In this paper, a TBox T is a finite set of concept definitions A ≡ C, where A ∈ NC and C is a concept. We require that the left-hand sides of all concept definitions in a TBox are unique. A concept name A ∈ NC is defined if it occurs on the left-hand side of a concept definition in T, and primitive otherwise. A TBox T is acyclic if there are no concept definitions A1 ≡ C1, . . . , Ak ≡ Ck ∈ T such that Ai+1 occurs in Ci for 1 ≤ i ≤ k, where Ak+1 := A1. An interpretation I is a model of T iff AI = CI for all A ≡ C ∈ T.
The main reasoning task considered in this paper is subsumption. A concept C is subsumed by a concept D w.r.t. a TBox T, written T |= C ⊑ D, if CI ⊆ DI for all models I of T. If T is empty or missing, we simply write C ⊑ D. Sometimes, we also consider satisfiability of concepts. A concept C is satisfiable w.r.t. a TBox T if there is a model I of T such that CI ≠ ∅. For many extensions of EL, satisfiability is trivial because there are no unsatisfiable concepts.
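For plain EL concepts and the empty TBox, subsumption can be decided by the standard structural (homomorphism) check. The following Python sketch is ours, not taken from the paper; it shows the idea for concepts built from concept names, conjunction and existential restrictions only:

```python
from dataclasses import dataclass

@dataclass
class ELConcept:
    """An EL concept: a conjunction of concept names and existential restrictions."""
    names: frozenset = frozenset()
    exists: tuple = ()          # tuple of (role_name, ELConcept) pairs

def subsumed(c, d):
    """Decide c ⊑ d for EL concepts w.r.t. the empty TBox (structural check)."""
    if not d.names <= c.names:
        return False
    return all(any(r2 == r and subsumed(c2, d2) for r2, c2 in c.exists)
               for r, d2 in d.exists)

# Example: Human ⊓ ∃has_child.Human  is subsumed by  ∃has_child.⊤
child_of_human = ELConcept(frozenset({"Human"}),
                           (("has_child", ELConcept(frozenset({"Human"}))),))
some_child = ELConcept(frozenset(), (("has_child", ELConcept()),))
print(subsumed(child_of_human, some_child))   # True
print(subsumed(some_child, child_of_human))   # False
```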
3 TRACTABLE EXTENSIONS
We identify two extensions of EL for which subsumption w.r.t. TBoxes is tractable: EL∪,(¬)(D) and EL≥,∪. This should be contrasted with the results in [3], which imply that subsumption w.r.t. general TBoxes is ExpTime-complete in both extensions. In Section 4.1, we show that taking the union of the two extensions results in intractability already w.r.t. acyclic TBoxes.
(C1) LT(B) ⊆ LT(A)
(C2) For each ∃rB.B′ ∈ ET(B) there is ∃rA.A′ ∈ ET(A) such that rA ⊆ rB and (A′, B′) ∈ S
(C3) ConD(A) implies ConD(B)

Figure 2. EL∪,(¬)(D): Conditions for adding (A, B) to S.

3.1 Role Disjunction, Primitive Negation, and Concrete Domains
We show that subsumption in EL∪,(¬)(D) w.r.t. (acyclic and cyclic) TBoxes is tractable. The superscript ·(¬) indicates primitive negation, i.e., negation can only be applied to concept names. The following is an example of an EL∪,(¬)(D)-TBox, where has_age is a feature, and ≥13 and ≤19 are unary predicates of the concrete domain D:

  Parent   ≡ Human ⊓ ∃(has_child ∪ has_adopted).⊤
  Mother   ≡ Parent ⊓ Female ⊓ ¬Male
  Teenager ≡ Human ⊓ ≥13(has_age) ⊓ ≤19(has_age)
To guarantee tractability, we require the concrete domain D to satisfy a standard condition. Namely, we require D to be p-admissible, i.e., satisfiability of, and implication between, concrete domain expressions of the form p1(v_1^1, . . . , v_{n1}^1) ∧ · · · ∧ pm(v_1^m, . . . , v_{nm}^m) are decidable in polynomial time, where the v_j^i are variables that range over ΔD. In [3], it is shown that a much stronger condition is required to achieve tractability in EL(D) with general TBoxes. This condition is convexity, which requires that if a concrete domain atom p(v1, . . . , vn) implies a disjunction of such atoms, then it implies one of the disjuncts. For our result, there is no need to impose convexity.
When deciding subsumption, we only consider concept names instead of composite concepts. This is sufficient since T |= C ⊑ D iff T′ |= A ⊑ B, where T′ := T ∪ {A ≡ C, B ≡ D} and A and B do not occur in T. The subsumption algorithm requires the input TBox T to be in the following normal form. In each A ≡ C ∈ T, C is of the form

  ⊓_{1≤i≤k} Li  ⊓  ⊓_{1≤i≤ℓ} ∃ri.Bi  ⊓  ⊓_{1≤i≤m} pi(f_1^i, . . . , f_{ni}^i)

where the Li are primitive literals, i.e., possibly negated primitive concept names; the ri are of the form r1 ∪ . . . ∪ rn; and the Bi are defined concept names. In the following, we refer to the set of literals occurring in C with LT(A), to the set of existential restrictions as ET(A), and define the following concrete domain expression, which for simplicity uses features as variables: ConD(A) := p1(f_1^1, . . . , f_{n1}^1) ∧ · · · ∧ pm(f_1^m, . . . , f_{nm}^m). To ease notation, we confuse a role ri = r1 ∪ . . . ∪ rn with the set {r1, . . . , rn}. It is easy to see how to adapt the algorithm given in [2] to convert an EL∪,(¬)(D)-TBox into normal form in quadratic time. During the normalization, we check for unsatisfiable concepts. This is easy since a defined concept name A with A ≡ C ∈ T is unsatisfiable w.r.t. T iff one of the following three conditions holds: (i) there is a primitive concept P with {P, ¬P} ⊆ LT(A); (ii) ConD(A) is unsatisfiable; or (iii) there is an ∃r.B ∈ ET(A) with B unsatisfiable.
Suppose we want to decide whether A is subsumed by B w.r.t. a TBox T in normal form. If A is unsatisfiable, the algorithm answers
"yes". Otherwise, if B is unsatisfiable, it answers "no". If A and B are both satisfiable, it computes a binary relation S on the defined concept names of T. The relation S is initialized with the identity relation and then completed by exhaustively adding pairs (A, B) for which the conditions in Figure 2 are satisfied. It is easily seen that the algorithm runs in time polynomial w.r.t. the size of the input TBox. Let S0, . . . , Sn be the sequence of relations that it produces. To show soundness, it suffices to prove that if (A, B) ∈ Si, i ≤ n, then T |= A ⊑ B. This is straightforward by induction on i. To prove completeness, we have to exhibit a model I of T with AI \ BI ≠ ∅. Such a model is constructed in a two-step process. First, we start with an instance of A, and then "apply" the concept definitions in the TBox as implications from left to right, constructing a potentially infinite, tree-shaped interpretation. In the second step, we apply the concept definitions from right to left, filling up the interpretation of defined concepts. Both steps involve some careful bookkeeping which ensures that the constructed instance of A is not an instance of B.

Theorem 1 Subsumption in EL∪,(¬)(D) w.r.t. TBoxes is in PTime.

This result still holds if we additionally allow role conjunction (r ∩ s) and require that composite roles are in disjunctive normal form (without DNF, subsumption becomes co-NP-hard). It is worth mentioning that, in the presence of general TBoxes, extending EL with each single one of (i) primitive negation, (ii) role disjunction, and (iii) any non-convex concrete domain results in ExpTime-hardness [3]. Note that convexity of a concrete domain is a rather strong restriction, and it is pleasant that we do not need it to achieve tractability. We point out that it should be possible to enhance the expressive power of EL∪,(¬)(D) by enriching it with additional constructors of the DL EL++ [3]. Examples include nominals and transitive roles.
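The completion procedure just described can be sketched in a few lines of Python. The code below is ours and is deliberately simplified: it covers conditions (C1) and (C2) for literals and existential restrictions and omits the concrete-domain condition (C3); the example TBox is a renamed, simplified variant of the Parent/Mother example, with an invented filler concept Person.

```python
def subsumes(L, E, A, B):
    """Return True iff the normalized TBox entails A ⊑ B.
    L[X]: set of primitive literals of X;  E[X]: set of (roles, Y) pairs,
    where roles is a frozenset standing for a role disjunction r1 u ... u rn.
    The concrete-domain condition (C3) is omitted in this sketch."""
    defined = set(L) | set(E)
    S = {(X, X) for X in defined}            # start with the identity relation
    changed = True
    while changed:                            # exhaustively add pairs
        changed = False
        for X in defined:
            for Y in defined:
                if (X, Y) in S:
                    continue
                c1 = L.get(Y, set()) <= L.get(X, set())
                c2 = all(any(ra <= rb and (Xp, Yp) in S
                             for ra, Xp in E.get(X, set()))
                         for rb, Yp in E.get(Y, set()))
                if c1 and c2:
                    S.add((X, Y))
                    changed = True
    return (A, B) in S

# Parent ≡ ∃(has_child u has_adopted).Person,  Mother ≡ Female ⊓ ∃has_child.Person
L = {"Parent": set(), "Mother": {"Female"}, "Person": set()}
E = {"Parent": {(frozenset({"has_child", "has_adopted"}), "Person")},
     "Mother": {(frozenset({"has_child"}), "Person")},
     "Person": set()}
print(subsumes(L, E, "Mother", "Parent"))   # True: has_child ⊆ has_child u has_adopted
print(subsumes(L, E, "Parent", "Mother"))   # False: the literal Female is missing
```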
3.2 Role Disjunction and At-Least Restrictions

In EL≥,∪, we allow role disjunction only in existential restrictions, but not in number restrictions. To show that subsumption w.r.t. TBoxes is tractable, we use a variation of the algorithm in the previous section. In the following, we only list the differences. A TBox is in normal form if, in each A ≡ C ∈ T, C is of the form

  ⊓_{1≤i≤k} Pi  ⊓  ⊓_{1≤i≤ℓ} ∃ri.Bi  ⊓  ⊓_{1≤i≤m} (≥ ni si)

where the Pi are primitive concept names, the ri are of the form r1 ∪ . . . ∪ rn, the Bi are defined concept names, and the si are role names. We use PT(A) to refer to the set of primitive concept names occurring in C, ET(A) is as in the previous section, and NT(A) is the set of number restrictions in C. The conditions for adding a pair (A, B) to the relation S are given in Figure 3.

(C1) PT(B) ⊆ PT(A)
(C2) For each ∃rB.B′ ∈ ET(B) there is ∃rA.A′ ∈ ET(A) such that rA ⊆ rB and (A′, B′) ∈ S
(C3) For each (≥ m r) ∈ NT(B), there is (≥ n r) ∈ NT(A) such that n ≥ m.

Figure 3. EL≥,∪: Conditions for adding (A, B) to S.

Theorem 2 Subsumption in EL≥,∪ w.r.t. TBoxes is in PTime.

In the extension of EL with only at-least restrictions (≥ n r), subsumption w.r.t. general TBoxes is ExpTime-complete [3]. As we will show in Section 4.3, EL extended with at-most restrictions (≤ n r) is intractable already w.r.t. acyclic TBoxes.

4 INTRACTABLE EXTENSIONS

We identify extensions of EL for which subsumption is intractable w.r.t. acyclic and cyclic TBoxes.

4.1 Primitive Negation and At-Least Restrictions

We show that taking the union of the DLs EL∪,(¬)(D) and EL≥,∪ from Sections 3.1 and 3.2 results in intractability. To this end, we consider EL≥,(¬) and show that subsumption w.r.t. the empty TBox is co-NP-complete. It is easy to establish the lower bound also for EL≥(D) as long as there are two concepts p(f1, . . . , fn) and p′(f1, . . . , fm) that are mutually exclusive. This is the case for most practically useful concrete domains D.
For the lower bound, we reduce 3-colorability of graphs to non-subsumption. Given an undirected graph G = (V, E), reserve one concept name Pv for each node v ∈ V, and a single role name r. Then, G is 3-colorable iff CG ⋢ (≥ 4 r), where

  CG := ⊓_{v∈V} ∃r.( Pv ⊓ ⊓_{{v,w}∈E} ¬Pw )

Intuitively, if d ∈ CG^I \ (≥ 4 r)^I, then d has at most three r-successors, each describing one of the three colors. The use of primitive negation in CG ensures that no two adjacent nodes have the same color. A matching upper bound can be derived from the co-NP upper bound for subsumption in ALUN, which has the concept constructors top, bottom (⊥), value restriction (∀r.C), conjunction, disjunction, primitive negation, number restrictions, and unqualified existential restriction [11]. Given two EL≥,(¬)-concepts C, D, we have C ⊑ D iff ¬D ⊑ ¬C. It remains to observe that bringing ¬C and ¬D into negation normal form yields two ALUN-concepts.

Theorem 3 Subsumption in EL≥,(¬) is co-NP-complete.

4.2 Inverse Roles

In [1], it is shown that subsumption w.r.t. the empty TBox is tractable in (an extension of) EL−. We prove that, w.r.t. acyclic TBoxes, subsumption in EL− is PSpace-complete. Since the upper bound follows from PSpace-completeness of subsumption in ALCI [5], we concentrate on the lower bound. We reduce validity of quantified Boolean formulas (QBFs). Let ϕ = Q1 v1 · · · Qk vk.ψ be a QBF, where Qi ∈ {∀, ∃} for 1 ≤ i ≤ k. W.l.o.g., we may assume that ψ = c1 ∧ · · · ∧ cn is in conjunctive normal form. We construct an acyclic TBox Tϕ and select two concept names L0 and E0 such that ϕ is valid iff Tϕ |= L0 ⊑ E0. Intuitively, a model of L0 and Tϕ is a binary tree of depth k that is used to evaluate ϕ. In the tree, a transition from a node at level i to its left successor corresponds to setting vi+1 to false, and a transition to the right successor corresponds to setting vi+1 to true. Thus, each node on level i corresponds to a truth assignment to the variables v1, . . . , vi. In Tϕ, we use a single role name r and the following concept names:

• L0, . . . , Lk represent the level of nodes in the tree model;
• Ci,j, 1 ≤ i ≤ n and 1 ≤ j ≤ k, represents truth of the clause ci on level j of the tree model;
• E0, . . . , Ek are used for evaluating ψ, and the index again refers to the level.

For 1 ≤ j ≤ k, we use Pj to denote the conjunction of all concept names Ci,j, 1 ≤ i ≤ n, such that vj occurs positively in ci; similarly, Nj denotes the conjunction of all concept names Ci,j, 1 ≤ i ≤ n, such that vj occurs negatively in ci. Now, the TBox Tϕ is as follows:

  L0   ≡ ∃r.(L1 ⊓ P1) ⊓ ∃r.(L1 ⊓ N1)
  ···
  Lk−1 ≡ ∃r.(Lk ⊓ Pk) ⊓ ∃r.(Lk ⊓ Nk)
  Ci,j ≡ ∃r−.Ci,j−1                                  for 1 ≤ i ≤ n and 1 < j ≤ k
  Ek   ≡ C1,k ⊓ · · · ⊓ Cn,k
  Ei   ≡ ∃r.Ei+1                                     for 0 ≤ i < k where Qi+1 = ∃
  Ei   ≡ ∃r.(Pi+1 ⊓ Ei+1) ⊓ ∃r.(Ni+1 ⊓ Ei+1)         for 0 ≤ i < k where Qi+1 = ∀

The definitions for L0, . . . , Lk−1 build up the tree. The use of P1 and N1 in these definitions together with the definition of Ci,j sets the truth value of the clause ci according to a partial truth assignment of length j. Finally, the definitions of E0, . . . , Ek evaluate ϕ according to its matrix formula ψ and quantifier prefix. It can be checked that ϕ is valid iff Tϕ |= L0 ⊑ E0.

Theorem 4 Subsumption in EL− w.r.t. acyclic TBoxes is PSpace-complete.

We leave the case of cyclic TBoxes as an open problem. In this case, the lower bound from Theorem 4 is complemented only by the ExpTime upper bound for subsumption in EL− w.r.t. general TBoxes from [3].
4.3 Functional Roles

Let ELf be EL extended with functional roles, i.e., there is a countably infinite subset NF ⊆ NR such that all elements of NF are interpreted as partial functions. It is shown in [3] that subsumption in ELf w.r.t. general TBoxes is ExpTime-complete. We show that it is co-NP-complete w.r.t. acyclic TBoxes and PSpace-complete w.r.t. cyclic ones. We use ELF to denote the variation of ELf in which all role names are interpreted as partial functions. It has been observed in [3] that there is a close connection between ELF and FL0, which provides the concept constructors conjunction and value restriction. It is easy to exploit this connection to transfer the known co-NP-hardness (PSpace-hardness) of subsumption in FL0 w.r.t. acyclic (cyclic) TBoxes, as proved in [16, 12], to ELF. We omit details for brevity.
Since the described approach is not very illuminating regarding the source of intractability, however, we give a dedicated proof of co-NP-hardness of subsumption in ELF w.r.t. acyclic TBoxes using a reduction from 3-SAT to non-subsumption. Let ϕ = c1 ∧ . . . ∧ ck be a 3-formula in the propositional variables p1, . . . , pn and with cj = ℓ_1^j ∨ ℓ_2^j ∨ ℓ_3^j for 1 ≤ j ≤ k. We construct a TBox Tϕ and select concept names Aϕ and B1 such that ϕ is satisfiable iff Tϕ ⊭ Aϕ ⊑ B1. In the reduction, we use two role names r0 and r1 to represent falsity and truth of variables. More precisely, a path r_{v1} · · · r_{vn} with r_{vi} ∈ {r0, r1} corresponds to the valuation pi → vi, 1 ≤ i ≤ n. Additionally, we use a number of auxiliary concept names. The TBox Tϕ is as follows:

  A_i^j ≡ ∃r0.A_{i+1}^j                      if pi ∈ {ℓ_1^j, ℓ_2^j, ℓ_3^j}
  A_i^j ≡ ∃r1.A_{i+1}^j                      if ¬pi ∈ {ℓ_1^j, ℓ_2^j, ℓ_3^j}
  A_i^j ≡ ∃r0.A_{i+1}^j ⊓ ∃r1.A_{i+1}^j      otherwise
  A_{n+1}^j ≡ ⊤
  Aϕ ≡ ⊓_{1≤j≤k} A_1^j
  Bi ≡ ∃r0.B_{i+1} ⊓ ∃r1.B_{i+1}
  B_{n+1} ≡ ⊤

If I is a model of Tϕ and d ∈ (A_1^j)^I, 1 ≤ j ≤ k, then d is the root of a tree in I whose edges are labelled with r0 and r1 and whose paths are the valuations that make the clause cj false. Due to functionality of r0 and r1, each d ∈ Aϕ^I is thus the root of a (single) tree whose paths are precisely the valuations that make some clause in ϕ false. Finally, d ∈ B1^I means that d is the root of a full binary tree of depth n whose paths describe all valuations. It follows that ϕ is satisfiable iff Tϕ ⊭ Aϕ ⊑ B1.
To prove matching upper bounds for ELf, we exploit the fact that, due to the FL0-connection, subsumption in ELF is easily shown to be in co-NP w.r.t. acyclic TBoxes and in PSpace w.r.t. cyclic ones. We give an algorithm for subsumption in ELf that uses subsumption in ELF as a subprocedure. Like the algorithms in Section 3, it computes a binary relation S on the set of defined concept names by repeatedly adding pairs (A, B) such that the input TBox entails A ⊑ B. The algorithm works for both acyclic and cyclic TBoxes, giving us the desired upper bound in both cases. We assume the input TBox T to be in the same normal form as described in Section 3.2, but without concepts of the form (≥ n r). Let S be a binary relation on the defined concept names in T. For every concept ∃r.A occurring in T with r ∉ NF, introduce a fresh concept name Xr,A such that Xr,A = Xr′,A′ iff r = r′, (A, A′) ∈ S, and (A′, A) ∈ S. Now let the ELF-TBox TS be obtained from T by (i) replacing every concept ∃r.A where r ∉ NF with Xr,A, and (ii) for each ∃r.A in T with r ∉ NF, adding the concept definition

  Xr,A ≡ Xr,B1 ⊓ · · · ⊓ Xr,Bn ⊓ Zr,A

where B1, . . . , Bn are all concept names with (A, Bi) ∈ S and (Bi, A) ∉ S, and Zr,A is a fresh concept name. The algorithm starts with S as the identity relation and then exhaustively performs the following step: add (A, B) to S if TS |= A ⊑ B. It returns "yes" if the input concepts form a pair in S, and "no" otherwise. Additionally, we can show that subsumption in ELf without TBoxes is in PTime by a reduction to subsumption in EL.

Theorem 5 Subsumption in ELf is in PTime, co-NP-complete w.r.t. acyclic TBoxes and PSpace-complete w.r.t. cyclic TBoxes.

It is not hard to see that the lower bounds carry over to EL≤.
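The 3-SAT reduction above can be made concrete with a small generator for Tϕ. The sketch below is ours, not the authors': it encodes clauses in DIMACS style, renders the leaf definitions A_{n+1}^j and B_{n+1} as empty conjunctions (reading them as ⊤ is an assumption of this reconstruction), and uses (None, X) pairs for plain conjuncts.

```python
def elf_tbox(n_vars, clauses):
    """Build the acyclic ELF TBox T_phi of the 3-SAT reduction.
    clauses: list of clauses, each a set of non-zero ints (+i for p_i, -i for ¬p_i).
    Returns a dict: name -> list of (role, name) conjuncts; role None means a
    plain concept-name conjunct, and an empty list stands for the top concept."""
    T = {}
    for j, clause in enumerate(clauses, start=1):
        for i in range(1, n_vars + 1):
            succ = f"A{i+1}_{j}"
            if i in clause:            # p_i occurs positively: falsify it with r0
                T[f"A{i}_{j}"] = [("r0", succ)]
            elif -i in clause:         # p_i occurs negatively: falsify it with r1
                T[f"A{i}_{j}"] = [("r1", succ)]
            else:                      # p_i does not occur: branch both ways
                T[f"A{i}_{j}"] = [("r0", succ), ("r1", succ)]
        T[f"A{n_vars+1}_{j}"] = []
    T["A_phi"] = [(None, f"A1_{j}") for j in range(1, len(clauses) + 1)]
    for i in range(1, n_vars + 1):     # B_1, ..., B_{n+1}: the full binary tree
        T[f"B{i}"] = [("r0", f"B{i+1}"), ("r1", f"B{i+1}")]
    T[f"B{n_vars+1}"] = []
    return T

# phi = (p1 v ¬p2) over two variables is satisfiable, so T_phi should not entail A_phi ⊑ B1.
print(elf_tbox(2, [{1, -2}]))
```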
4.4 Booleans
We consider extensions of EL with Boolean constructors, starting with negation. Since EL¬ is a notational variant of ALC, we obtain the following from the results in [17, 18].

Theorem 6 Satisfiability and subsumption in EL¬ are PSpace-complete without TBoxes and w.r.t. acyclic TBoxes, and ExpTime-complete w.r.t. cyclic TBoxes.

Now for disjunction. It has been shown in [6] that subsumption in EL⊔ is co-NP-complete without TBoxes. In order to establish lower
bounds for subsumption w.r.t. TBoxes, we reduce satisfiability in EL¬ to non-subsumption in EL⊔. An EL¬-TBox T is in normal form if for each A ≡ C ∈ T, C is of the form ⊤, P, ¬B, ∃r.B, or B1 ⊓ B2 with P primitive and B, B1, B2 defined. It is straightforward to show that any EL¬-TBox T can be transformed into normal form in linear time such that all (non-)subsumptions are preserved. Thus, let T = {A1 ≡ C1, . . . , An ≡ Cn} be an EL¬-TBox in normal form. Since the proofs underlying Theorem 6 use only a single role name, we may assume w.l.o.g. that T contains only a single role name r. We convert T into an EL⊔-TBox T′ by introducing fresh concept names Ā1, . . . , Ān representing the negations of A1, . . . , An, and replacing every A ≡ ¬Aj ∈ T with A ≡ Āj and every Ai ≡ ∃r.Aj ∈ T with
  Ai ≡ ∃r.(Aj ⊓ ⊓_{1≤k≤n} (Ak ⊔ Āk)),    M ≡ ⊓_{0≤i<…} ∃r. · · · ∃r.(Aj ⊔ Āj) …

… u > 0 ∧ z > 0 ∧ d > 0 ∧ facing(θ, loc(xr, yr), s) ∧ fieldView(β) ∧ visible(loc(xr, yr), b, β, θ, s) ∧
  /* there are no invisible peaks in p */
  (¬∃bI, uI, zI, dI) ( pk(bI, uI, zI, dI) ∈ p ∧ ¬visible(loc(xr, yr), bI, β, θ, s) ),
or in English, sensing a profile p is a possible action if p includes a peak (with positive attributes) from a visible object and has no peaks from objects that are currently not visible (given the robot's orientation and aperture). The predicate visible(v, b, β, θ, s) means that a body b is visible from the current viewpoint v if the field of view is β and the robot is facing a direction θ in the situation s. This predicate …
Below, only the SSA for depth is shown; those for size and dist are analogous.
depth(pk(b, u, z, d), u, loc(xr, yr), do(a, s)) ≡
  (∃t, p) a = sense(p, loc(xr, yr), t) ∧ pk(b, u, z, d) ∈ p
  ∨ (∃t, x, y, x1, y1, r, e) ( a = endMove(R, loc(x1, y1), loc(xr, yr), t) ∧
      location(b, loc(x, y), s) ∧ location(R, loc(x1, y1), s) ∧
      radius(b, r) ∧ euD(loc(x, y), loc(xr, yr), e) ∧ (u = e − r) )
  ∨ (∃t, x1, y1, x2, y2, r, e) ( a = endMove(b, loc(x1, y1), loc(x2, y2), t) ∧
      location(R, loc(xr, yr), s) ∧ location(b, loc(x1, y1), s) ∧
      radius(b, r) ∧ euD(loc(xr, yr), loc(x2, y2), e) ∧ (u = e − r) )
  ∨ depth(pk(b, u, z, d), u, loc(xr, yr), s) ∧
      location(R, loc(xr, yr), s) ∧ (∃x, y) location(b, loc(x, y), s) ∧
      (¬∃t, l, p′, u′, z′, d′, x1, y1) ( a = endMove(R, loc(xr, yr), l, t)
        ∨ a = endMove(b, loc(x, y), loc(x1, y1), t)
        ∨ a = sense(p′, loc(xr, yr), t) ∧ pk(b, u′, z′, d′) ∈ p′ ∧ u′ ≠ u ).
extending(peak, viewpoint, do(a, s)) iff a is a sensing action which measured that the angular size of peak is currently larger than it was at s or a is an endM ove action terminating the process of robot’s motion resulting in the viewpoint such that a computed size of peak from the viewpoint is larger than it was at s or a is an endM ove action terminating the motion of an object to a new position such that from robot’s viewpoint a computed size of peak became larger than it was at s or extending(peak, viewpoint, s) and % frame axiom % a is none of those actions which have effect of decreasing the perceived angular size of peak
33
One of the predicates referring to the transition between pairs of peaks is approaching(pk(b1 , u1 , z1 , d1 ), pk(b2 , u2 , z2 , d2 ),loc(xr , yr ), s), which represents that peaks pk(b1 , u1 , z1 , d1 ) and pk(b2 , u2 , z2 , d2 ) (related, respectively, to objects b1 and b2 ) are approaching each other in situation s as perceived from the viewpoint loc(xr , yr ). (The following relations have analogous arguments to those of approaching, they were omitted here for brevity.) Similarly, receding, states that two peaks are receding from each other. The predicate coalescing, states that two peaks are coalescing. Analogously to coalescing, the relation hiding represents the case of a peak coalescing completely with another peak (corresponding to total occlusion of one body by another). The predicate splitting, states the case of one peak splitting into two distinct peaks; finally, two peak static, states that the two peaks are static. Axioms constraining the transitions between pairs of peaks are straightforward, but long and tedious (due to involved geometric calculations). Therefore, for simplicity, we discuss only a high-level description of the SSA for approaching (the axioms for receding, coalescing, shrinking and hiding are analogous). The axiom for approaching expresses that two depth peaks are approaching iff an apparent angle between them obtained by a sensing action is smaller at the situation do(a, s) than at s or, the observer (or an object) moved to a position such that a calculated apparent angle is smaller at do(a, s) than at s. In the latter case, the apparent angle between peaks from b1 , b2 is calculated by the predicate angle(loc(xb1 , yb1 ), loc(xb2 , yb2 ), loc(xν , yν ), rb1 , rb2 , γ) that has as arguments, respectively, the location of the centroids of objects b1 and b2 , the location of viewpoint ν, the radii of b1 and b2 and γ is an angle that we want to compute. The computations accomplished by angle include the straightforward solution (in time O(1)) of a system of equations (including quadratic equations for the circles representing the perimeter of the objects and linear equations for the tangent rays going from the viewpoint to the circles). Similarly to the threshold L used in the SSA for extending above, the SSA for approaching uses a pre-defined (hardware dependent) threshold Δ (roughly, the number of pixels between peaks) that differentiates approaching (receding) from coalescing (splitting). Another threshold is used in an analogous way to differentiate coalescing from hiding. Figure 1 also exemplifies a case where approaching can be entailed. Consider for instance a robot going from viewpoint ν1 to ν2 , in this case, the angular distance (k − j) between peaks p and q in Fig. 1(d) is less than (e − n) in Fig. 1(b). Moving from viewpoint ν2 to ν1 would result in the entailment of receding. If it was the case that the apparent distance between the objects was less than Δ, coalescing or splitting could be entailed. 
approaching(peak1, peak2, viewpoint, do(a, s)) iff a is a sensing action that measured the angle between peak1 and peak2 and this angle is smaller than it was at s or a is an endM ove action terminating the process of robot’s motion resulting in the viewpoint such that a computed angle between peak1 and peak2 is currently smaller than it was at s or a is an endM ove action terminating the motion of an object to a new position such that from robot’s viewpoint a computed angle between peaks decreased in comparison to what it was at s or approaching(peak1, peak2, viewpoint, s) and % frame axiom% a is none of those actions which have an effect of increasing the perceived angle between peak1 and peak2.
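The geometric computation behind the angle predicate can be approximated with elementary trigonometry when the objects are modelled as circles. The following Python sketch is ours (it is not the system of tangent-line equations used in the paper, and the threshold handling for Δ is omitted): it computes the apparent angular half-width of a circular object and the apparent gap between the facing edges of two objects as seen from a viewpoint.

```python
from math import asin, atan2, hypot, pi

def angular_halfwidth(view, centre, radius):
    """Half of the apparent angular size of a circle seen from `view`."""
    d = hypot(centre[0] - view[0], centre[1] - view[1])
    return asin(min(1.0, radius / d))        # clamp in case the viewpoint touches the circle

def apparent_gap(view, c1, r1, c2, r2):
    """Apparent angle between the facing edges of two circles, the kind of quantity
    used to decide approaching/receding (a non-positive value would mean the peaks
    overlap, i.e. coalescing/hiding territory)."""
    a1 = atan2(c1[1] - view[1], c1[0] - view[0])
    a2 = atan2(c2[1] - view[1], c2[0] - view[0])
    between = abs((a2 - a1 + pi) % (2 * pi) - pi)    # angle between the two centroid directions
    return between - angular_halfwidth(view, c1, r1) - angular_halfwidth(view, c2, r2)

view = (0.0, 0.0)
print(apparent_gap(view, (5.0, 2.0), 0.5, (5.0, -2.0), 0.5))   # gap in radians
```

Comparing this value at s and at do(a, s) for the same pair of objects is, under these assumptions, what decides between approaching and receding.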
We name Theory of Depth and Motion (T DM) a theory consisting of the precondition axioms Dap for actions introduced in this section, SSA Dss for all fluents in this section, an initial theory DS0 (with at least two objects and the robot), together with Duna and Σ.
5 Perception and Motion in T DM

The previous section introduced SSA for depth profiles constraining the fluents on depth peaks to hold when either a particular transition in the attributes of a depth peak was sensed, or the robot (or an object) moved to a position such that a particular transition happens. It is easy to see that the axioms presented above define the conceptual neighbourhood diagram (CND) for depth profiles (Fig. 2). It is worth noting also that the vertices in the conceptual neighbourhood diagram (and the edges connecting them) in Figure 2 represent all the percepts that can be sensed given the depth profile calculus in a domain where objects and the observer can move. Therefore, we can say that perception in T DM is sound and complete wrt motion, in the sense that the vertices and edges of the CND in Fig. 2 result from objects' motion (i.e. perception is sound) and that every motion in the world is accounted for by a fluent or by an edge between fluents in this CND (i.e. it is complete). Our first result is a schema applying to each fluent in T DM that represents perception of relations between peaks.

Theorem 1 (Perception is sound wrt motion). For any fluent F in the CND the following holds:
  T DM |= a ≠ sense(p, loc(xr, yr), t′) ⊃ ( ¬F(x̄, s) ∧ F(x̄, do(a, s)) ⊃ (∃b, l1, l2, t) a = endMove(b, l1, l2, t) )
  T DM |= a ≠ sense(p, loc(xr, yr), t′) ⊃ ( F(x̄, s) ∧ ¬F(x̄, do(a, s)) ⊃ (∃b, l1, l2, t) a = endMove(b, l1, l2, t) ).
For any fluents F and F′ in T DM, if there is an edge between F and F′ in the CND, then the following holds:
  T DM |= a ≠ sense(p, loc(xr, yr), t′) ⊃ ( F(x̄, s) ∧ ¬F′(x̄, s) ∧ ¬F(x̄, do(a, s)) ∧ F′(x̄, do(a, s)) ⊃ (∃b, l1, l2, t) a = endMove(b, l1, l2, t) ).

Proof sketch: The proof of this theorem rephrases the explanation closure axiom that follows from the corresponding SSA (see [11] for details). For every vertex in the CND (i.e., for every perception-related fluent F of T DM), if the last action that the robot did is not a sense action, then the change in the value of this fluent can happen only due to an action endMove. In addition, we show that for every edge linking two distinct fluents F and F′ of the CND in Fig. 2, the transition is due to a move action such that in the resulting situation the fluent F ceases to hold, but F′ becomes true. □

The next theorem states that every motion in the domain is accounted for by a vertex or by an edge of the CND in Fig. 2. We denote by Fi, Fj all perception-related fluents (Fi and Fj can be different vertices or can be the same).

Theorem 2 (Perception is complete wrt motion). For any moving action a in T DM there is a fluent Fi or an edge between two fluents Fi and Fj in the CND:
  T DM |= (∃b, l1, l2, t) a = endMove(b, l1, l2, t) ⊃
    [ ⋁_i Fi(x̄, do(a, s)) ∨ ⋁_{i,j} ( Fi(x̄, s) ∧ ¬Fj(x̄, s) ∧ ¬Fi(x̄, do(a, s)) ∧ Fj(x̄, do(a, s)) ) ]

Proof sketch: The proof follows from the geometric fact that the twelve numbered regions defined by the bi-tangents between two objects (Figure 3) define all possible qualitatively distinct viewpoints from which to observe these objects. It is easy to see that for every motion of the observer within each region or across adjacent regions in Figure 3 there is an action A mentioned in the SSAs that corresponds to this motion. Therefore, it follows from the SSAs that either a vertex of the CND (a fluent F) describes the perception resulting from the motion, or there are two fluents F and F′ such that F ceases to hold after doing a, but F′ becomes true. For instance, take a robot in Region 5 (Fig. 3)
3) facing the two objects a and b, but moving backward from them. The SSAs would allow the conclusion that the peaks referring to a and b would be approaching and shrinking. On the other hand, a robot (still facing a and b) crossing from Region 5 to 6 would be able to en-
tail the transition from approaching to coalescing by using SSAs.

Figure 3. Bi-tangents between two visible objects.
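The bookkeeping behind Theorems 1 and 2 can be pictured as a simple graph check. The sketch below is only an illustration of the idea under stated assumptions: the fluent names and the adjacency relation are hypothetical stand-ins for the actual CND of Fig. 2, and the action encoding is an invention of this sketch, not part of T_DM.

```python
# Hypothetical stand-in for the CND of Fig. 2: vertices are perception fluents,
# edges are the transitions licensed by the depth profile calculus.
CND_EDGES = {
    ("approaching", "coalescing"),
    ("coalescing", "splitting"),
    ("splitting", "receding"),
    ("receding", "approaching"),
}
CND_VERTICES = {f for edge in CND_EDGES for f in edge}

def sound_wrt_motion(action, fluent_before, fluent_after):
    """Theorem 1 as a runtime check: if a non-sense action changed a perception
    fluent, that action must be an endMove and the change must follow a CND edge."""
    if action[0] == "sense" or fluent_before == fluent_after:
        return True                       # nothing to account for
    return action[0] == "endMove" and (fluent_before, fluent_after) in CND_EDGES

def complete_wrt_motion(action, fluent_before, fluent_after):
    """Theorem 2 as a runtime check: every endMove is reflected in a vertex
    (some fluent still holds afterwards) or in an edge (a licensed transition)."""
    if action[0] != "endMove":
        return True
    return fluent_after in CND_VERTICES and \
           (fluent_before == fluent_after or (fluent_before, fluent_after) in CND_EDGES)

# Example: crossing from Region 5 to Region 6 (Fig. 3) while both objects stay visible.
move = ("endMove", "robot", "region5", "region6")
print(sound_wrt_motion(move, "approaching", "coalescing"))      # True
print(complete_wrt_motion(move, "approaching", "coalescing"))   # True
```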
6 Discussion and conclusion

We propose a logical theory built within the situation calculus for reasoning about depth perception and motion of a mobile robot amidst moving objects. The resulting formalism, called Theory of Depth and Motion (T_DM), is a rich language that allows both sensor data assimilation and reasoning about motion in the world, where their effects are calculated with Euclidean geometry. We show that reasoning about perception of depth in T_DM is sound and complete with respect to actual motion in the world. This result proves the conjecture made in [12], which hypothesises that the transitions in the conceptual neighbourhood diagrams of the depth profile calculus are logical consequences of a theory about actions and change. Note that T_DM relies on standard models of dense orders, computational geometry and other quantitative abstractions, but this pays off in the end: we can obtain logical consequences about purely qualitative phenomena (e.g., objects approaching each other) from T_DM. This theory is an important contribution of our paper. Future research includes the implementation of the proposed formalism in a simulator of a dynamic traffic scenario. We expect that the theory presented in this paper will allow the reasoning system to recognize and summarize (in simple sentences) the plans of other vehicles based on knowledge about its own motion and its perceptions. Acknowledgements: Thanks to Joshua Gross, Frédo Durand, and Sherif Ghali for comments about computing visibility efficiently in dynamic 2D scenes. This research has been partially supported by the Canadian Natural Sciences and Engineering Research Council (NSERC) and FAPESP, São Paulo, Brazil.
REFERENCES [1] A. G. Cohn and J. Renz, 'Qualitative spatial representation and reasoning', in Handbook of Knowledge Representation, 551–596, (2008). [2] M. de Berg et al., Computational Geometry, Algorithms and Applications (Chapter 15), 2nd Edition, Springer, 2000. [3] A. Goultiaeva and Y. Lespérance, 'Incremental plan recognition in an agent programming framework', in Cognitive Robotics, Papers from the 2006 AAAI Workshop, pp. 83–90, Boston, MA, USA, (2006). [4] Gerd Herzog, VITRA: Connecting Vision and Natural Language Systems, http://www.dfki.de/vitra/, Saarbrücken, Germany, 1986-1996. [5] H. Levesque and G. Lakemeyer, 'Cognitive robotics', in Handbook of Knowledge Representation, 869–886, Elsevier, (2008). [6] R. Mann, A. Jepson, and J. M. Siskind, 'The computational perception of scene dynamics', CVIU, 65(2), 113–128, (1997). [7] A. Miene, A. Lattner, U. Visser, and O. Herzog, 'Dynamic-preserving qualitative motion description for intelligent vehicles', in IEEE Intelligent Vehicles Symposium (IV-04), pp. 642–646, Parma, Italy, (2004). [8] Hans-Hellmut Nagel, 'Steps toward a cognitive vision system', AI Magazine, 25(2), 31–50, (2004). [9] R. P. A. Petrick, A Knowledge-level approach for effective acting, sensing, and planning, Ph.D. dissertation, University of Toronto, 2006. [10] D. Randell, M. Witkowski, and M. Shanahan, 'From images to bodies: Modeling and exploiting spatial occlusion and motion parallax', in Proc. of IJCAI, pp. 57–63, Seattle, U.S., (2001). [11] Raymond Reiter, Knowledge in Action. Logical Foundations for Specifying and Implementing Dynamical Systems, MIT, 2001. [12] Paulo Santos, 'Reasoning about depth and motion from an observer's viewpoint', Spatial Cognition and Computation, 7(2), 133–178, (2007). [13] M. Soutchanski, 'A correspondence between two different solutions to the projection task with sensing', in Proc. of the 5th Symposium on Logical Formalizations of Commonsense Reasoning, pp. 235–242, New York, USA, May 20-22, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-35
Comparing Abductive Theories Katsumi Inoue 1 and Chiaki Sakama 2 Abstract. This paper introduces two methods for comparing explanation power of different abductive theories. One is comparing explainability for observations, and the other is comparing explanation contents for observations. Those two measures are represented by generality relations over abductive theories. The generality relations are naturally related to the notion of abductive equivalence introduced by Inoue and Sakama. We also analyze the computational complexity of these relations.
1
Introduction
Abduction has been used in many applications of AI including diagnosis, design, updates, and discovery. Abduction is incorporated in problem-solving and programming technologies as abductive logic programming [11]. In the process of building knowledge bases, we need to update an abductive theory in accordance with situation change and discovery of surprising facts. For example, to refine an incomplete description, one may need to add more details to a part of the current theory. Such a refinement is expected to ensure that the revised theory is more powerful in abductive reasoning than the previous one. Then, it is important to evaluate abductive theories by comparing abductive power of each theory in such processes. In predicate logic, comparison of information contents between theories is done by comparing their logical consequences. For example, given two first-order theories T1 and T2 , T1 is considered more informative than T2 if T2 |= ψ implies T1 |= ψ for any formula ψ, i.e., T1 |= T2 . In this case, it is also said T1 is more general than T2 [13, 14]. On the other hand, T1 and T2 are equally informative if T1 |= T2 and T2 |= T1 , that is, if T1 and T2 are logically equivalent (T1 ≡ T2 ). Recently, Inoue and Sakama considered the generality conditions for answer set programming (ASP) [9] and for Reiter’s default logic [10]. These generality/equivalence relations compare monotonic/nonmonotonic theories in terms of deduction. The topic of our interest in this paper is how to compare abductive theories. That is, we seek conditions under which an abductive theory has more explanation power than another abductive theory. As far as the authors know, no answer to this question is given in the literature of abduction. To understand the problem, suppose that an abductive theory A1 is defined to be stronger than another abductive theory A2 . This might imply that there is a formula which can be explained in the former but cannot be in the latter. Then, we would expect that A1 has more background knowledge than A2 or A1 has more hypotheses than A2 . However, the situation is not so simple because addition of background knowledge may violate the consistency of some combination of hypotheses. Hence, relationships between 1 2
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan. email:
[email protected] Wakayama University, Sakaedani, Wakayama 640-8510, Japan. email:
[email protected] amounts of background theories and hypotheses need to be analyzed in depth to compare abductive theories precisely. In this paper, we consider two logical frameworks for abduction, first-order abduction and abductive logic programming (ALP). Then, we introduce two methods for comparing explanation power of different abductive theories, which were originally introduced by Inoue and Sakama [8] to identify equivalence of two abductive theories. The first one is aimed at comparing explainability for observations in different theories, while the second one is aimed at comparing explanation contents for observations. Those two comparison measures are represented by generality relations over abductive theories. Moreover, the generality relations can naturally be related to the notion of abductive equivalence in [8]. Note that the proposed techniques for first-order abduction can also be applied to comparing frameworks for explanatory induction in inductive logic programming. The rest of this paper is organized as follows. Section 2 introduces two generality relations for comparing abductive first-order theories. Section 3 applies the similar techniques to ALP. Section 4 relates the abductive generality relations to abductive equivalence. Section 5 discusses the complexity issues. Section 6 gives concluding remarks.
2
Generality Relations in First-order Abduction
In this section, we consider abductive theories represented in firstorder logic, which have often been used in abduction in AI, e.g., [17]. In this setting, abductive theories are compared by two measures. Definition 1 Suppose that B and H are sets of first-order formulas, where B represents background knowledge and H is a set of (candidate) hypotheses. We call a pair (B, H) a (first-order) abductive theory. Given a formula O as an observation, a set E of formulas belonging to H 3 is an explanation of O in (B, H) if B ∪ E |= O and B ∪ E is consistent. We say that O is explainable in (B, H) if it has an explanation in (B, H).
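Definition 1 can be prototyped directly for small propositional theories. The sketch below is an illustration only: it uses a brute-force truth-table check for entailment and consistency and encodes formulas as Python predicates over an assignment dictionary; neither the encoding nor the enumeration is part of the paper's framework.

```python
from itertools import combinations, product

def assignments(atoms):
    """All truth assignments over the given atoms, as dicts."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def satisfiable(atoms, formulas):
    return any(all(f(v) for f in formulas) for v in assignments(atoms))

def entails(atoms, theory, goal):
    """theory |= goal, checked by brute-force truth tables (propositional case only)."""
    return all(goal(v) for v in assignments(atoms) if all(f(v) for f in theory))

def explanations(atoms, B, H, O):
    """All E ⊆ H such that B ∪ E is consistent and B ∪ E |= O (Definition 1).
    H maps hypothesis names to formulas; explanations are tuples of names."""
    found = []
    for r in range(len(H) + 1):
        for E in combinations(sorted(H), r):
            theory = list(B) + [H[h] for h in E]
            if satisfiable(atoms, theory) and entails(atoms, theory, O):
                found.append(E)
    return found

# Toy instance in the spirit of Example 1 in Section 2.1: B = {sprinkler ⊃ wet}.
atoms = ["sprinkler", "rained", "wet"]
B = [lambda v: (not v["sprinkler"]) or v["wet"]]       # sprinkler_was_on ⊃ grass_is_wet
H = {"sprinkler": lambda v: v["sprinkler"],
     "rained":    lambda v: v["rained"]}
O = lambda v: v["wet"]                                  # observation: grass is wet

print(explanations(atoms, B, H, O))
# [('sprinkler',), ('rained', 'sprinkler')]
```

Note that, as in the paper, no minimality of explanations is required, so supersets of an explanation that remain consistent are reported as well.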
2.1
Comparing Explainability
We first consider a measure for comparing explainability between abductive theories. Definition 2 An abductive theory A1 = (B1 , H1 ) is more (or equally) explainable than an abductive theory A2 = (B2 , H2 ), written as A1 ≥ A2 , if every observation explainable in A2 is also explainable in A1 . 3
In this paper we do not specify how H is constructed. For example, when hypotheses contain variables, we could just assume that the set H is closed under instantiation. In another case, we could specify the language of H with a bias and then define that any formula which is constructed from H and satisfies the bias belongs to H. This latter treatment enables us to deal with comparing theories for inductive logic programming (ILP) [14] within the same logical framework as abduction. In any case, we simply denote as E ⊆ H when E is a set of formulas belonging to H.
Example 1 Consider three abductive theories A1 = (B1, H1), A2 = (B2, H2) and A3 = (B3, H3), where
B1 = { sprinkler was on ⊃ grass is wet },
H1 = { sprinkler was on, rained last night },
B2 = B1 ∪ { rained last night ⊃ grass is wet },
H2 = H1 ∪ { ¬(sprinkler was on ⊃ grass is wet) },
B3 = B2 ∪ { grass is wet ⊃ shoes are wet },
H3 = H1 ∪ { ¬(sprinkler was on ⊃ shoes are wet) }.
Then, A3 ≥ A2 ≥ A1 holds. In fact, every observation explainable in Ai is explainable in Ai+1 for i = 1, 2. Notice that A1 ≥ A2 also holds because rained last night can be explained by itself in both A1 and A2. By contrast, shoes are wet is explainable in A3, but is not in either A1 or A2, i.e., A2 ≱ A3. Note that each additional hypothesis in Hj \ H1 for j = 2, 3 has no effect in explaining any formula, as it cannot be added to Bj without violating consistency. We provide a necessary and sufficient condition for the explainable generality relation. In the following, Th(Σ) denotes the set of logical consequences of a set Σ of first-order formulas.
Proof: For any abductive theory (B, H), we can associate a prerequisite-free normal default theory Δ = (D_H, B), where D_H = { :h / h | h ∈ H }. Then there is a 1-1 correspondence between the extensions of Δ (in the sense of Reiter [18]) and Ext((B, H)) [17, Theorem 4.1]. By the semi-monotonicity of normal default theories [18, Theorem 3.2], H1 ⊇ H2 implies that, for any extension F of Δ2 = (D_H2, B), there is an extension E of Δ1 = (D_H1, B) such that F ⊆ E. By Theorem 2, the result holds. ✷ For abductive theories A1 = (B1, H) and A2 = (B2, H) with the same hypotheses, B1 |= B2 implies neither A1 ≥ A2 nor A2 ≥ A1. This explains the name of semi-monotonicity in Proposition 4. Example 2 Suppose the abductive theories A = (B, H) and A′ = (B′, H) where B = {a ∧ b ⊃ p}, B′ = B ∪ {¬b}, and H = {a, b}. Then, A′ ≱ A because p has the explanation {a, b} in A but is not explainable in A′. On the other hand, A ≱ A′ because ¬b has the explanation ∅ in A′ but is not explainable in A.
2.2
Comparing Explanations
We next provide a second measure for comparing abductive theories. This time we compare explanation contents.
Definition 3 An extension of an abductive theory A = (B, H) is Th(B ∪ S), where S is a maximal set of formulas belonging to H such that B ∪ S is consistent. The set of all extensions of A is denoted as Ext(A).
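Extensions are infinite theories Th(B ∪ S), so a finite prototype can only manipulate their generating sets. The sketch below (propositional, brute force, using the same hypothesis-dictionary encoding as the previous sketch) enumerates the maximal consistent S ⊆ H of Definition 3 and tests the containment between extensions that Theorem 2 below turns into a characterisation of ≥; it is an illustration under those assumptions, not machinery from the paper.

```python
from itertools import combinations, product

def _models(atoms, formulas):
    """Truth assignments over `atoms` satisfying every formula (propositional only)."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            yield v

def extensions(atoms, B, H):
    """Generating sets of the extensions of (B, H): maximal S ⊆ H with B ∪ S
    consistent (Definition 3), returned as tuples of hypothesis names."""
    consistent = [S for r in range(len(H) + 1)
                  for S in combinations(sorted(H), r)
                  if any(True for _ in _models(atoms, list(B) + [H[h] for h in S]))]
    return [S for S in consistent if not any(set(S) < set(F) for F in consistent)]

def th_contained(atoms, B2, H2, S2, B1, H1, S1):
    """Th(B2 ∪ S2) ⊆ Th(B1 ∪ S1): every generator of the former holds in every
    model of the latter (the finite reformulation of the test used by Theorem 2)."""
    generators = list(B2) + [H2[h] for h in S2]
    return all(all(g(m) for g in generators)
               for m in _models(atoms, list(B1) + [H1[h] for h in S1]))
```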
Definition 4 An abductive theory A1 = (B1, H1) is more (or equally) explanatory than an abductive theory A2 = (B2, H2), written as A1 ⊒ A2, if, for any observation O, every explanation of O in A2 is also an explanation of O in A1.
Lemma 1 ([17]) Let O be a (possibly infinite) set of formulas. There is an explanation that explains every formula in O in (B, H) iff there is an extension X of (B, H) such that O ⊆ X.
Example 3 For three abductive theories in Example 1, A3 ⊒ A2 ⊒ A1 holds. Although A1 ≥ A2 holds, we see that A1 ⋢ A2 because {rained last night} is an explanation of grass is wet in A2 but is not in A1.
Theorem 2 Let A1 = (B1 , H1 ) and A2 = (B2 , H2 ) be abductive theories. Then, A1 ≥ A2 holds iff for any extension X2 of A2 , there is an extension X1 of A1 such that X2 ⊆ X1 .
It is easy to see that the relation ⊒ is stronger than the relation ≥, that is, A1 ⊒ A2 implies A1 ≥ A2. Now we show the necessary and sufficient condition for explanatory generality.
Proof: (⇐) By Lemma 1, if an observation O is explainable in A2, there is X2 ∈ Ext(A2) such that O ∈ X2. For any such X2, there is X1 ∈ Ext(A1) such that X2 ⊆ X1. Then, O ∈ X1 and O is explainable in (B1, H1) by Lemma 1. Hence, A1 ≥ A2. (⇒) Assume that there is X2 ∈ Ext(A2) such that X2 ⊄ X1 for any X1 ∈ Ext(A1). Pick a formula ψ_i for each X1_i ∈ Ext(A1) such that ψ_i ∈ (X2 \ X1_i) (≠ ∅), and let O be the set of ψ_i's from every X1_i. Then, O ⊆ X2 but O ⊄ X1 for any X1 ∈ Ext(A1). By Lemma 1, ⋀_{F∈O} F is explainable in A2 but is not explainable in A1. Hence, A1 ≱ A2. ✷ There are several classes of abductive theories in which we can see explainable generality holds under some simple conditions. Proposition 3 (Assumption-freeness) Suppose two abductive theories (B1, L) and (B2, L), where L is the set of all literals in the underlying language. Then, (B1, L) ≥ (B2, L) iff B2 |= B1. Proof: Any extension of an abductive theory (Bi, L) is logically equivalent to a (complete) model of Bi. By Theorem 2, (B1, L) ≥ (B2, L) iff, for any model M of B2, there is a model N of B1 such that M ⊆ N. Because both M and N are complete, M ⊆ N implies M = N. Hence, any model of B2 is a model of B1. ✷ Proposition 4 (Semi-monotonicity) Suppose that (B, H1) and (B, H2) are two abductive theories with the same background knowledge. If H1 ⊇ H2, then (B, H1) ≥ (B, H2).
Theorem 5 Let A1 = (B1, H1) and A2 = (B2, H2) be abductive theories. Then, A1 ⊒ A2 holds iff B1 |= B2 and H1′ ⊇ H2′ hold, where Hi′ = { E ⊆ Hi | Bi ∪ E is consistent } for i = 1, 2. Proof: Note that any explanation E of an observation O in (Bi, Hi) satisfies that (1) Bi ∪ E |= O and (2) E ∈ Hi′. (⇐) Suppose A1 ⋢ A2. Then there exist a formula O and a set E of formulas such that B2 ∪ E |= O and E ∈ H2′ while B1 ∪ E ⊭ O or E ∉ H1′. If B1 ∪ E ⊭ O holds, we have B1 ⊭ E ⊃ O and B2 |= E ⊃ O, which implies B1 ⊭ B2. If E ∉ H1′ holds, by E ∈ H2′ we have H2′ ⊄ H1′. Hence, the result holds. (⇒) Suppose A1 ⊒ A2. Then for any formula O and any set E of formulas, B2 ∪ E |= O and E ∈ H2′ imply B1 ∪ E |= O and E ∈ H1′. By the fact that B2 ∪ E |= O implies B1 ∪ E |= O for any O, we have B1 ∪ E |= B2 ∪ E for any E ∈ H2′ ∩ H1′. Then, B1 |= B2 holds when E = ∅. By the fact that E ∈ H2′ implies E ∈ H1′, we also have H2′ ⊆ H1′. Hence, the result holds. ✷ Corollary 6 Let A1 = (B1, H1) and A2 = (B2, H2) be abductive theories. Then, A1 ⊒ A2 holds iff B1 |= B2 and A1 ≥ A2 hold. Proof: The set Hi′ in Theorem 5 contains every subset E of Hi such that Bi ∪ E is consistent. Hi′ can be characterized by Ext(Ai), as each consistent theory is a subset of some extension. Then, it can be proved that H1′ ⊇ H2′ iff for any X2 ∈ Ext(A2), there is X1 ∈ Ext(A1) such that X2 ⊆ X1. Hence, the result follows from Theorem 2. ✷ Corollary 7 If H1 ⊇ H2, then (B, H1) ⊒ (B, H2) holds.
3
Generality Relations in Abductive Logic Programming
In this section, we turn our attention to generality relations in abductive logic programming (ALP) [11]. The most significant difference between abduction in first-order logic and ALP is that ALP allows the nonmonotonic negation-as-failure operator not in a background program. When the background program P is nonmonotonic, the fact that P ∪ E is consistent for some set E of hypotheses does not necessarily imply that P ∪ E′ is consistent for E′ ⊂ E. Hence abductive power in ALP has to be compared in a more naive manner, upon each subset of hypotheses.
Definition 8 Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs, and G an observation. A1 is more (or equally) explainable than A2, written as A1 ≥ A2, if every observation explainable in A2 is also explainable in A1. On the other hand, A1 is more (or equally) explanatory than A2, written as A1 ⊒ A2, if, for any observation G, every explanation of G in A2 is also an explanation of G in A1. Example 4 Let A1 = ⟨P1, Γ⟩ and A2 = ⟨P2, Γ⟩ be abductive programs, where P1 = { p ← a, a ← b }, P2 = { p ← a, p ← b }, and Γ = {a, b}. Then, A1 ≥ A2 and A2 ≥ A1, while A1 ⊒ A2 but A2 ⋣ A1. In fact, {b} is an explanation of a in A1, but is not in A2. The following results hold for the two generality relations.
Definition 5 An abductive (logic) program is a pair ⟨P, Γ⟩ where • P is a (logic) program, which is a set of rules of the form:
L1 ; · · · ; Lk ; not Lk+1 ; · · · ; not Ll ← Ll+1 , . . . , Lm , not Lm+1 , . . . , not Ln   (1)
where each Li is a literal (n ≥ m ≥ l ≥ k ≥ 0), and not represents negation as failure (NAF). The symbol ; represents disjunction. The left-hand side of the rule is the head, and the right-hand side is the body. A program containing variables is a shorthand of its ground instantiation. • Γ is a set of literals, called abducibles. Any instance of an abducible is also an abducible. Logic programs mentioned above belong to the class of general extended disjunctive programs (GEDPs) [6]. If any rule of the form (1) in a program P does not contain not in its head, i.e., k = l, P is called an extended disjunctive program (EDP) [4]. Moreover, if the head of any rule in an EDP P contains no disjunction, i.e., k = l ≤ 1, P is called an extended logic program (ELP). A semantics of a logic program is given by the answer set semantics [4, 6]. We denote the set of all ground literals in the language of a program as Lit. For a program P , the set of answer sets of P is denoted as AS(P ). When P is an EDP, AS(P ) is an antichain in 2Lit , that is, for any two answer sets S1 , S2 ∈ AS(P ), S1 ⊆ S2 implies S1 = S2 [4], but this is not the case for a GEDP. A semantics for ALP is given by extending answer sets of the background program with addition of abducibles. Such an extended answer set is called a belief set, which has also been called a generalized stable model [11]. Definition 6 Let A = P, Γ be an abductive program, and E ⊆ Γ. A belief set of A (with respect to E) is a consistent answer set of the logic program P ∪ E. The set of all belief sets of A is denoted as BS(A). A set S ∈ BS(A) is often denoted as SE when S is a belief set with respect to E. Definition 7 Let A = P, Γ be an abductive program, and G a conjunction of ground literals called an observation. We will often identify a conjunction G with the set of literals in G. A set E ⊆ Γ is an explanation of G in A if every ground literal in G is true in a belief set of A with respect to E.4 When G has an explanation in A, G is explainable in A. Note that restrictions in ALP can be removed so that not only literals but rules can be allowed as abducibles and that observations can contain NAF formulas as well as literals. As in the case of first-order abduction, two generality relations are defined for ALP as follows. 4
This definition provides credulous explanations. Alternatively, skeptical explanations are defined as E ⊆ Γ such that G is true in every belief set of A with respect to E.
Theorem 8 Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs. Then, A1 ≥ A2 holds iff for any belief set S2 of A2, there is a belief set S1 of A1 such that S2 ⊆ S1. Proof: (⇐) If G is explainable in A2, there is S2 ∈ BS(A2) such that G ⊆ S2. For any such S2, there is S1 ∈ BS(A1) such that S2 ⊆ S1. Then, G ⊆ S1 and G is explainable in A1. Hence, A1 ≥ A2. (⇒) Assume that there is S2 ∈ BS(A2) such that S2 ⊄ S1 for any S1 ∈ BS(A1). For each S1_i ∈ BS(A1), pick a literal Li such that Li ∈ (S2 \ S1_i) (≠ ∅), and let G be the set of Li's from every S1_i. Then, G ⊆ S2 but G ⊄ S1 for any S1 ∈ BS(A1). That is, G is explainable in A2 but is not in A1, i.e., A1 ≱ A2. ✷ Theorem 9 Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs. Then, A1 ⊒ A2 holds iff for any E ⊆ Γ2 and any S_E ∈ BS(A2), there is T_E ∈ BS(A1) such that E ⊆ Γ1 and S_E ⊆ T_E. Proof: (⇒) Suppose A1 ⊒ A2. Then, for any observation G and any E ⊆ Γ2, the fact that G ⊆ S_E for some S_E ∈ BS(A2) implies that G ⊆ T_E for some T_E ∈ BS(A1). Thus, S_E ⊆ T_E. (⇐) Suppose S_E ∈ BS(A2) for any E ⊆ Γ2 implies the existence of T_E ∈ BS(A1) with E ⊆ Γ1 such that S_E ⊆ T_E. Then, for any observation G, G ⊆ S_E implies G ⊆ T_E. That is, if G has an explanation E in A2, G has the same explanation E in A1. ✷ Theorem 8 and Theorem 9 might look similar, but the condition of the latter is finer-grained than that of the former. In fact, as in the case of first-order abduction, A1 ⊒ A2 implies A1 ≥ A2.
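The belief-set machinery of Definitions 6-8 can be prototyped by brute force for small ground programs. The sketch below is illustrative only, assuming normal programs without classical negation or disjunction and a naive guess-and-check over the reduct; it is not the decision procedure analysed in Section 5.

```python
from itertools import chain, combinations

def powerset(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def least_model(rules):
    """Least model of a definite (negation-free) ground program."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def answer_sets(program, atoms):
    """Stable models of a ground normal program by guess-and-check via the reduct.
    A rule is (head, positive_body, negative_body) over atom names."""
    stable = []
    for guess in map(set, powerset(atoms)):
        reduct = [(h, set(pos)) for h, pos, neg in program if not set(neg) & guess]
        if least_model(reduct) == guess:
            stable.append(frozenset(guess))
    return stable

def belief_sets(program, abducibles, atoms):
    """Belief sets of <P, Γ> (Definition 6): answer sets of P ∪ E, indexed by E ⊆ Γ."""
    return {frozenset(E): answer_sets(program + [(a, set(), set()) for a in E], atoms)
            for E in powerset(abducibles)}

# Example 4: P1 = {p ← a, a ← b}, P2 = {p ← a, p ← b}, Γ = {a, b}.
P1 = [("p", {"a"}, set()), ("a", {"b"}, set())]
P2 = [("p", {"a"}, set()), ("p", {"b"}, set())]
atoms, gamma = ["a", "b", "p"], ["a", "b"]
bs1, bs2 = belief_sets(P1, gamma, atoms), belief_sets(P2, gamma, atoms)
# {b} is an explanation of a in <P1, Γ> but not in <P2, Γ>:
print(any("a" in S for S in bs1[frozenset({"b"})]))   # True
print(any("a" in S for S in bs2[frozenset({"b"})]))   # False
```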
4
Connection to Abductive Equivalence
In this section, we consider the relationship between the generality relations in abduction proposed in this paper and the equivalence relations in abduction proposed in the literature. Inoue and Sakama [8] study different types of equivalence relations in abduction: explainable/explanatory equivalence of abductive theories under both first-order abduction and ALP. Pearce et al. [16] characterize a part of these problems in the context of equilibrium logic. In the following, an abductive framework A means either a first-order abductive theory A = (B, H) or an abductive logic program A = P, Γ . Definition 9 ([8]) Let A1 and A2 be abductive frameworks. 1. A1 and A2 are explainably equivalent if, for any observation O,5 O is explainable in A1 iff O is explainable in A2 . 2. A1 and A2 are explanatorily equivalent if, for any observation O, E is an explanation of O in A1 iff E is an explanation of O in A2 . 5
This definition of explainable equivalence for ALP is not exactly the same as that in [8, Definition 4.3]. In [8] an observation is a single ground literal, while we allow a conjunction of ground literals as an observation.
Explainable equivalence requires that two abductive frameworks have the same explainability for any observation. Explainable equivalence may reflect a situation that two programs have different knowledge to derive the same goals. On the other hand, explanatory equivalence assures that two abductive frameworks have the same explanation contents for any observation. Explanatory equivalence is stronger than explainable equivalence: if two abductive frameworks are explanatorily equivalent then they are explainably equivalent. By Definitions 2, 4, 8, and 9, it is obvious that all generality relations defined in this paper are “anti-symmetric”6 in the sense that two abductive frameworks are explainably/explanatorily equivalent iff one is both more (or equally) and less (or equally) explainable/explanatory than another at the same time.
there is T ∈ max(AS(P1 ∪ E)) such that T ⊆ T . By A2 A1 , there is S ∈ AS(P2 ∪ E) such that T ⊆ S , and then there is S ∈ max(AS(P2 ∪ E)) such that S ⊆ S . Then S ⊆ S holds and both belong to max(AS(P2 ∪ E)), which imply S = T = S , and thus S ∈ max(AS(P1 ∪ E)). Hence, (1) if E ⊆ Γ2 and P2 ∪ E is consistent then E ⊆ Γ1 and P1 ∪ E is consistent, and (2) max(AS(P2 ∪ E)) ⊆ max(AS(P1 ∪ E)) for any E ⊆ Γ2 . Similarly, (3) if E ⊆ Γ1 and P1 ∪ E is consistent then E ⊆ Γ2 and P2 ∪ E is consitent, and (4) max(AS(P1 ∪ E)) ⊆ max(AS(P2 ∪ E)) for any E ⊆ Γ1 . By (1) and (3), C1 = C2 holds. By (2) and (4), max(AS(P1 ∪ E)) = max(AS(P2 ∪ E)) holds for any E ⊆ Γ1 and for any E ⊆ Γ2 . Hence, the result follows. (⇐) can be proved in a similar way. 2
Proposition 10 Let A1 and A2 be abductive frameworks.
Two logic programs P1 and P2 are strongly equivalent with respect to a rule set R if AS(P1 ∪ R) = AS(P2 ∪ R) for any logic program R ⊆ R [7]. This equivalence notion is a restricted version of strong equivalence [12], and is called relative strong equivalence [7].7 The next result was originally shown in [8]8 and then was discussed in [16] for EDPs. Now it can be simply proved by the antichain property of AS(P ) for any EDP P .
1. A1 and A2 are explainably equivalent iff A1 ≥ A2 and A2 ≥ A1. 2. A1 and A2 are explanatorily equivalent iff A1 ⊒ A2 and A2 ⊒ A1. With this correspondence and results in previous sections, we can derive either new characterizations of abductive equivalence or new (and simple) proofs of previously presented results. For first-order abduction, the following results can be verified with new proofs. Proposition 11 Two first-order abductive theories A1 and A2 are explainably equivalent iff Ext(A1) = Ext(A2) holds. Proposition 12 For first-order abductive theories A1 = (B1, H1) and A2 = (B2, H2), the following four statements are equivalent.
1. A1 and A2 are explanatorily equivalent.
2. A1 and A2 are explainably equivalent and B1 ≡ B2.
3. B1 ≡ B2 and H1 = H2.
4. B1 ≡ B2 and H1′ = H2′, where Hi′ = { h ∈ Hi | Bi ∪ {h} is consistent } for i = 1, 2.
For ALP, the next results can be newly obtained. In the following, for any set X, let max(X) = { x ∈ X | ¬∃y ∈ X. x ⊂ y }. Theorem 13 Let A1 = P1 , Γ1 and A2 = P2 , Γ2 be abductive programs. Then, A1 and A2 are explainably equivalent iff max(BS(A1 )) = max(BS(A2 )). Proof: (⇒) By Theorem 8, A1 ≥ A2 implies that, for any S2 ∈ max(BS(A2 )) there exists S1 ∈ BS(A1 ) such that S2 ⊆ S1 , and then there exists S1 ∈ max(BS(A1)) such that S1 ⊆ S1 . By A2 ≥ A1 , there exists S2 ∈ BS(A2 ) such that S1 ⊆ S2 , and then there exists S2 ∈ max(BS(A2 )) such that S2 ⊆ S2 . Then S2 ⊆ S2 holds, but because both belong to max(BS(A2 )), S2 = S2 holds. Hence, S2 (= S1 ) also belongs to max(BS(A1 )), and thus the result holds. (⇐) can be proved by tracing the above proof backward. 2 Theorem 14 Let A1 = P1 , Γ1 and A2 = P2 , Γ2 be abductive programs. A1 and A2 are explanatorily equivalent iff C1 = C2 holds and max(AS(P1 ∪ E)) = max(AS(P2 ∪ E)) for any E ∈ Ci , where Ci = { E ⊆ Γi | Pi ∪ E is consistent } for i = 1, 2. Proof: (⇒) Suppose that A1 and A2 are explanatorily equivalent. By Theorem 9, A1 A2 implies that, for any E ⊆ Γ2 and any SE ∈ BS(A2 ), there is TE ∈ BS(A1 ) such that E ⊆ Γ1 and SE ⊆ TE . Then, for any E ⊆ Γ2 and any S ∈ max(AS(P2 ∪ E)), E ⊆ Γ1 and there is T ∈ AS(P1 ∪ E) such that S ⊆ T , and then 6
The relations ≥ and ⊒ are also preorders, i.e., reflexive and transitive, for both first-order abduction and ALP.
Corollary 15 Let A1 = P1 , Γ and A2 = P2 , Γ be abductive programs with the same hypotheses such that both P1 and P2 are EDPs. Also, let Pi = Pi ∪{ ← L, ¬L | L ∈ Lit} for i = 1, 2. Then, A1 and A2 are explanatorily equivalent iff P1 and P2 are strongly equivalent with respect to Γ.
5
Complexity Results
We show that the computational complexity of deciding generality between abductive theories becomes more complex in general than that of abductive equivalence presented in [8]. Theorem 16 Let A1 and A2 be two propositional abductive theories. Deciding if A1 ≥ A2 is Π^P_3-complete. Proof: Let A1 = (B1, H1) and A2 = (B2, H2). We here identify Ext(Ai) with the extensions of the prerequisite-free normal default theory (D_Hi, Bi) for i = 1, 2, as in the proof of Proposition 4. For any subset S ⊆ H2, checking if E = Th(B2 ∪ S) is an extension of A2 is coNP-complete [19]. If E ∈ Ext(A2), then deciding if there does not exist F ∈ Ext(A1) such that E ⊆ F can be determined by checking if the formula ⋀B2 ∧ ⋀S belongs to some extension of A1, which is Σ^P_2-complete [5]. Thus, we can choose S ⊆ H2 in nondeterministic polynomial time with a Σ^P_2-oracle to decide if A1 ≱ A2 holds. Hence, the original problem is the complement of this, and belongs to Π^P_3. We omit the proof of Π^P_3-hardness because of the space limitation. ✷ Theorem 17 Let A1 and A2 be two propositional abductive theories. Deciding if A1 ⊒ A2 is Π^P_3-complete. Proof: Follows from Corollary 6 and Theorem 16. ✷
This definition is due to [7], and is slightly different from the notion of relativized equivalence in [20, 16]. In [20], P1 and P2 are defined as strongly equivalent relative to a literal set U iff AS(P1 ∪ R) = AS(P2 ∪ R) for any set R of rules that are constructed using literals in U . 8 The condition of EDPs was missing in [8, Theorem 4.4]. In fact, only Theorem 14 holds for GEDPs. Moreover, to characterize inconsistent programs in ALP, an EDP having the answer set Lit should be translated to an EDP without an answer set in Corollary 15.
Theorem 18 Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs. Deciding if A1 ≥ A2 is (i) Π^P_2-complete when P1 and P2 are ELPs, and is (ii) Π^P_3-complete when P1 and P2 are GEDPs. Proof: A computation problem in GEDPs reduces in polynomial time to the corresponding problem in EDPs [6], so we here consider the cases that each Pi is either an ELP or an EDP. (Membership) For any guess S ⊆ Lit, deciding if S ∈ BS(A2) is NP-complete for an ELP P2 (resp. Σ^P_2-complete for an EDP P2) [2]. For such an S, deciding if there does not exist T ∈ BS(A1) such that S ⊆ T can be determined by credulous reasoning that contains S, which is NP-complete for an ELP P1 (resp. Σ^P_2-complete for an EDP P1) [2]. Hence, by Theorem 8, A1 ≱ A2 can be decided nondeterministically with two calls to an NP-oracle (resp. a Σ^P_2-oracle). Therefore, the complement is in Π^P_2 (resp. Π^P_3). (Hardness) We prove the ELP case. Let Φ = ∀X∃Y.φ be a closed QBF, where φ = C1 ∨ · · · ∨ Cn is a DNF formula, that is, each Cj is a conjunction of literals. Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs such that P1 = {g ← Cj | 1 ≤ j ≤ n}, Γ1 = X ∪ ¬X ∪ Y ∪ ¬Y, P2 = {g ← }, and Γ2 = X ∪ ¬X, where ¬X = {¬x | x ∈ X} and ¬Y = {¬y | y ∈ Y}. Note that both P1 and P2 are ELPs. We prove that: A1 ≥ A2 ⇔ Φ is valid. (⇒) Suppose A1 ≥ A2. By Theorem 8, for any S ∈ BS(A2), there is T ∈ BS(A1) such that S ⊆ T. In particular, for any IX ⊆ X, there is a belief set S ∈ BS(A2) with respect to IX ∪ ¬(X \ IX), and hence IX ∪ ¬(X \ IX) ⊆ T for some T ∈ BS(A1). Since g ∈ S, g must be in T too. Then, some Cj (1 ≤ j ≤ n) must be true under IX ∪ ¬(X \ IX) and IY ∪ ¬(Y \ IY) for some IY ⊆ Y. Hence, φ is true under such an interpretation. Since IX was arbitrary, Φ is valid. (⇐) Suppose Φ is valid. Then for any IX ⊆ X, φ is true under IX ∪ ¬(X \ IX) and IY ∪ ¬(Y \ IY) for some IY ⊆ Y. Then some Cj is true under this interpretation, and hence g holds. It is easy to see for any S ∈ BS(A2) that there is T ∈ BS(A1) such that S ⊆ T. By Theorem 8, A1 ≥ A2 holds. For the EDP case, we can apply a transformation of a QBF ∀X∃Y∀Z.φ into a disjunctive program, which is analogous to the one presented in [1, Theorem 3.1] and [2, Lemma 2]. ✷ Theorem 19 Let A1 = ⟨P1, Γ1⟩ and A2 = ⟨P2, Γ2⟩ be abductive programs. Deciding if A1 ⊒ A2 is (i) Π^P_2-complete when P1 and P2 are ELPs, and is (ii) Π^P_3-complete when P1 and P2 are GEDPs. Proof: Like Theorem 18, we can assume that each Pi is either an ELP or an EDP. For any guess S ⊆ Lit, deciding if S_E ∈ BS(A2) for some E ⊆ Γ2 is NP-complete for an ELP P2 (resp. Σ^P_2-complete for an EDP P2) [2]. For any such E, deciding if AS(P1 ∪ E) ≠ ∅ is NP-complete for an ELP P2 (resp. Σ^P_2-complete for an EDP P2) [1]. For S_E, deciding if there does not exist T ∈ AS(P1 ∪ E) such that S_E ⊆ T can be determined by credulous reasoning that contains S_E, which is NP-complete for an ELP P1 (resp. Σ^P_2-complete for an EDP P1) [2]. Hence, by Theorem 9, A1 ⋢ A2 can be decided nondeterministically with three calls to an NP-oracle (resp. a Σ^P_2-oracle). Therefore, the complement is in Π^P_2 (resp. Π^P_3). The hardness can be shown in the same way as in Theorem 18. ✷
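The hardness construction of Theorem 18 is purely syntactic and easy to mechanise. The sketch below only builds the two abductive programs from a ∀X∃Y DNF formula; the encoding of literals as (atom, polarity) pairs, of classical negation as a "-" prefix, and of rules as (head, body) pairs is an assumption of this sketch, and no validity checking is attempted.

```python
def qbf_reduction(X, Y, dnf):
    """Build A1 = <P1, Γ1> and A2 = <P2, Γ2> from Φ = ∀X∃Y.φ (φ in DNF), following
    the hardness proof of Theorem 18: P1 = {g ← Cj}, Γ1 = X ∪ ¬X ∪ Y ∪ ¬Y,
    P2 = {g ←}, Γ2 = X ∪ ¬X. An empty body denotes a fact."""
    lit = lambda a, pol: a if pol else "-" + a
    P1 = [("g", [lit(a, pol) for (a, pol) in conj]) for conj in dnf]   # g ← Cj
    gamma1 = list(X) + ["-" + x for x in X] + list(Y) + ["-" + y for y in Y]
    P2 = [("g", [])]                                                    # g ←
    gamma2 = list(X) + ["-" + x for x in X]
    return (P1, gamma1), (P2, gamma2)

# Φ = ∀x ∃y. (x ∧ y) ∨ (¬x ∧ ¬y) is valid, so Theorem 18 predicts A1 ≥ A2 here.
(P1, G1), (P2, G2) = qbf_reduction(["x"], ["y"],
                                   [[("x", True), ("y", True)],
                                    [("x", False), ("y", False)]])
print(P1)   # [('g', ['x', 'y']), ('g', ['-x', '-y'])]
print(G2)   # ['x', '-x']
```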
6
Discussion
The relation ≥ introduced in this paper can be represented by generality relations defined by Inoue and Sakama [9, 10]. We briefly sketch the relationships here. For first-order abductive theories A1 = (B1 , H1 ) and A2 = (B2 , H2 ), by identifying Ext(Ai) with the extensions of the prerequisite-free normal default theory (DHi , Bi ) for
i = 1, 2, we can prove that A1 ≥ A2 iff A1 |=dt A2 , where |=dt is a Hoare order defined on the class of default theories [10]. On the other hand, for abductive logic programs A1 = P1 , Γ1 and A2 = P2 , Γ2 , let Pi (i = 1, 2) be the GEDP defined by Pi = Pi ∪ { l; not l ← | l ∈ Γi }. Then, BS(Ai ) = AS(Pi ) holds [6]. With this result, we can see that A1 ≥ A2 iff P1 |=lp P2 , where |=lp is a Hoare order defined on the class of GEDPs (originally defined on the class of EDPs in [9]). Besides work on generality relations in ASP [9], a general correspondence framework has been proposed in [3, 15] to compare logic programs. This framework is defined to compare equivalence and inclusion between the semantics of logic programs instead of generality, but the notions of projection and contexts are also introduced to enable a variety of equivalence comparison. Incorporating these notions into our generality framework is a topic of future work.
REFERENCES [1] T. Eiter and G. Gottlob. On the computational cost of disjunctive logic programs: propositional case. Annals of Mathematics and Artificial Intelligence, 15:289–323, 1995. [2] T. Eiter, G. Gottlob and N. Leone. Abduction from logic programs: semantics and complexity. Theoretical Computer Science, 189:129– 177, 1997. [3] T. Eiter, H. Tompits and S. Woltran. On solution correspondences in answer-set programming. In: Proc. IJCAI-05, pp. 97–102, 2005. [4] M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991. [5] G. Gottlob. Complexity results for nonmonotonic logics. J. Logic and Computation, 2:397–425, 1992. [6] K. Inoue and C. Sakama. Negation as failure in the head. J. Logic Programming 35, pp. 39–78, 1998. [7] K. Inoue and C. Sakama. Equivalence of logic programs under updates. In: Proc. 9th European Conference on Logics in Artificial Intelligence, LNAI 3229, pp. 174–186, Springer, 2004. [8] K. Inoue and C. Sakama. Equivalence in abductive logic. In: Proc. IJCAI-05, 2005, pp. 472–477. [9] K. Inoue and C. Sakama. Generality relations in answer set programming. In: Proc. 22nd International Conference on Logic Programming, LNCS 4079, pp. 211–225, Springer, 2006. [10] K. Inoue and C. Sakama. Generality and equivalence relations in default logic. In: Proc. 22nd Conference on Artificial Intelligence (AAAI07), pp. 434–439, 2007. [11] A. Kakas, R. Kowalski and F. Toni. The role of abduction in logic programming. In: D. Gabbay, C. Hogger and J. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 5, pp. 235–324, Oxford University Press, 1998. [12] V. Lifschitz, D. Pearce and A. Valverde. Strongly equivalent logic programs. ACM Transactions on Computational Logic, 2:526–541, 2001. [13] T. Niblett. A study of generalization in logic programs. In: Proc. 3rd European Working Sessions on Learning, pp. 131–138, Pitman, 1988. [14] S.-H. Nienhuys-Cheng and R. De Wolf. Foundations of Inductive Logic Programming. LNAI 1228, Springer, 1997. [15] J. Oetsch, H. Tompits and S. Woltran. Facts do not cease to exist because they are ignored: relativised uniform equivalence with answer-set projection. In: Proc. 22nd Conference on Artificial Intelligence (AAAI07), pp. 458–464, 2007. [16] D. Pearce, H. Tompits and S. Woltran. Relativised equivalence in equilibrium logic and its applications to prediction and explanation: preliminary report. In: Proc. LPNMR’07 Workshop on Correspondence and Equivalence for Nonmonotonic Theories, pp. 37–48, 2007. [17] D. Poole. A logical framework for default reasoning. Artificial Intelligence, 36:27–47, 1988. [18] R. Reiter. A logic for default Reasoning. Artificial Intelligence, 13:81– 132, 1980. [19] R. Rosati. Model checking for nonmonotonic logics: algorithm and complexity. In: Proc. IJCAI-99, pp. 76–81, 1999. [20] S. Woltran. Characterizations for relativized notions of equivalence in answer set programming. In: Proc. 9th European Conference on Logics in Artificial Intelligence, LNAI 3229, pages 161–173, Springer, 2004.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-40
Privacy-Preserving Query Answering in Logic-based Information Systems Bernardo Cuenca Grau and Ian Horrocks 1 Abstract. We study privacy guarantees for the owner of an information system who wants to share some of the information in the system with clients while keeping some other information secret. The privacy guarantees ensure that publishing the new information will not compromise the secret one. We present a framework for describing privacy guarantees that generalises existing probabilistic frameworks in relational databases. We also formulate different flavors of privacy-preserving query answering as novel, purely logic-based reasoning problems and establish general connections between these reasoning problems and the probabilistic privacy guarantees.
1
Motivation
Privacy protection is an important issue in modern information systems. The digitalization of data on the Web has dramatically increased the risks of private information being either accidentally or maliciously disclosed. These risks have been witnessed by numerous cases of personal data theft from systems that were believed to be secure. The design of information systems that provide provable privacy guarantees is, however, still an open problem—in fact, the notion of privacy is itself still open to many interpretations [2]. This paper addresses the problem of privacy-preserving query answering. In this setting it is assumed that the information itself is kept secret, but that the owner of the information wants to allow some query access to it while at the same time preventing private information from being revealed. For example, a hospital may want to allow researchers studying prescribing practices to query the patients’ records database for information about medicines dispensed in the hospital, but they want to ensure that no information is revealed about the medical conditions of individual patients. To make this more precise, the hospital wants to check whether answering specified legal queries could augment knowledge (from whatever source) that an attacker may have about the answer to a query for patient names and their medical conditions (the so-called sensitive query). Taking into account that an attacker may have previous knowledge about the system is of crucial importance, as such knowledge may connect the answers to legal and sensitive queries, and lead to the (partial) revelation of the latter. For example, allowing a query for drugs and the dates on which they were prescribed may seem harmless, but if the attacker knows the dates on which patients have been in hospital and drugs that are used to treat AIDS, then he may deduce that there must be an AIDS patient amongst the group known to be in hospital on a date when AIDS drugs were dispensed. This problem has been recently investigated in the context of relational databases (DBs) [9, 10, 6]. In these privacy frameworks, the knowledge and/or beliefs about the system of a potential attacker are 1
Oxford University Computing Laboratory, UK
modeled as a probability distribution over possible states of the information system. Privacy checking then amounts to verifying whether publishing new information, such as the answer to a legal query, could change the probability (from an attacker’s perspective) of any particular answer to the sensitive query. In the first part of this paper, we extend the probabilistic notions of privacy explored in the DB literature to cover a very general class of logic-based languages which includes, for example, ontology languages [12]. Furthermore, since these notions are too strict in practice, we propose ways to weaken them. In the second part, we formulate privacy-preserving query answering in terms of novel, purely logic-based reasoning problems. We show that our logic-based notions have natural probabilistic counterparts. Finally, we argue that these reasoning problems are related to existing ones; to illustrate this fact, we point out a connection with the notion of a conservative extension, an important concept in modular ontology design [8, 7]. Given the generality of our notion of an information system, we do not make claims concerning computational properties. Our results, however, provide an excellent formal base for studying such properties for particular languages.
2
Logic-based Information Systems
We adopt a general framework for describing logic-based information systems that captures any language whose formal semantics is based on First Order (FO) models; the framework is open toward different mechanisms for selecting admissible models and thus comprises a wide range of languages. We distinguish between intensional knowledge (background knowledge about the application domain) and extensional knowledge (data involving specific objects of the domain). This allows us to make the usual distinction in KR between schema knowledge and data. The framework here has been adapted from existing general frameworks in the literature [5, 1]. An Information System Formalism (ISF) is a tuple F = (Σ, LS , LD , Sem) where Σ is a countably infinite FO-signature, LS , LD are FO-languages over Σ, called the schema and dataset language respectively, and Sem is a specification of the semantics (of which more below). A schema S (respectively a dataset D ) is a set of LS -sentences (respectively a set of LD -sentences) over Σ. For example, in relational DBs, Σ is a set of relations and constants; LD only allows for ground atomic formulas, and LS is the language of FO Predicate Logic with equality. Datasets and schemas are called relational instances and relational schemas respectively. In the case of description logic (DL) ontologies, Σ contains unary relations, binary relations and constants; LS is a DL, such as SH I Q [12], and LD again only allows for ground atomic formulas over the predicates in Σ; Datasets are called ABoxes and schemas TBoxes.
The semantics is given by a pair Sem = (δ, ◦); δ is a function that assigns to each FO-interpretation I over Σ and each possible set S of LS-sentences (respectively LD-sentences D) a truth value δ(I, S) ∈ {true, false} (respectively δ(I, D) ∈ {true, false}); ◦ is a binary operation on sets of interpretations, such that for each pair of sets M1, M2, ◦ returns a set of interpretations M3 = M1 ◦ M2. An information system (IS) in F is a pair ℑ = (S, D), with S an LS-schema, and D an LD-dataset. The set of models of ℑ is Mod(ℑ) = Mod(S) ◦ Mod(D), with Mod(S) = {I | δ(I, S) = true} and Mod(D) = {I | δ(I, D) = true}. ℑ is satisfiable if Mod(ℑ) ≠ ∅. For example, in both ontologies and relational DBs, schemas are interpreted in the usual way in FOL: δ(I, S) = true iff I |=FOL S. In SHIQ ontologies, datasets are also interpreted in the usual way: δ(I, D) = true iff I |=FOL D, and ◦ is the intersection between the schema and the dataset models. In relational DBs, however, the data usually has a single model—that is, δ(I, D) = true iff I = ID, where ID is the minimal Herbrand model of D; the operation ◦ is also defined differently: I1 ◦ I2 ∈ Mod(ℑ) iff I2 = ID and ID |=FOL S. We are also very permissive w.r.t. query languages. A query language for F is an FO-language LQ over Σ. A boolean query Q is an LQ-sentence. The semantics is given by a function δLQ that assigns to each interpretation I and boolean query Q a truth value δLQ(I, Q) ∈ {true, false}. A system ℑ entails Q, written ℑ |=F Q, if, for each I ∈ Mod(ℑ), δLQ(I, Q) = true. A general query Q is an LQ-formula, where x is the vector of free variables in Q. Let σ[x/o] be a function that, when applied to a general query Q, yields a new boolean query σ[x/o](Q) by replacing in Q the variables in x by the constants in o. The answer set for Q in ℑ is the following set of tuples of constants: ans(Q, ℑ) = {o | ℑ |=F σ[x/o](Q)}. An example of a query language could be the language of conjunctive queries in both DBs and ontologies. Given a query language LQ, a view over ℑ is a pair V = (V, v), with V—the definition of the view—an LQ-query, and v—the extension of the view—a finite set of tuples of constants, such that v = ans(V, ℑ).
Table 1. Conditions on Information Systems
Condition [S↑]: Syst([S↑]) = {ℑ = (S′, D) | ℑ ∈ IS and S ⊆ S′}
Condition [S∗]: Syst([S∗]) = {ℑ = (S, D) | ℑ ∈ IS}
Condition [V]: Syst([V]) = {ℑ ∈ IS | each V ∈ V is a view over ℑ}
Condition [Q = q]: Syst([Q = q]) = {ℑ ∈ IS | ans(Q, ℑ) = q}
Given F = (Σ, LS , LD , Sem), we denote by IS, D the set of all satisfiable systems and datasets respectively in F , and by Tup the set of all tuples of constants over Σ. We also consider systems in IS that satisfy certain conditions; the conditions we consider are given in Table 1. Given a schema S , the first and second rows in the table represent respectively the set of ISs whose schemas extend S and are equal to S ; given a set of views V, the third row represents the set of ISs over which every V ∈ V is a view; finally, given a query Q and an answer set q, the last row represents the ISs for which q is the answer to Q. We denote with [C1 , . . . ,Cn ] the conjunction of conditions [C1 ], . . . , [Cn ], and with Syst([C1 , . . . ,Cn ]) the subsets of IS that satisfy all of [C1 ], . . . , [Cn ].
3
The Privacy Problems
Given F = (Σ, LS , LD , Sem) and a query language LQ , our goal is to study privacy guarantees for Bob —the owner of a system ℑ = (S , D ) in IS— against the actions of Alice— a potential attacker. Existing privacy frameworks for DBs[9, 10, 6] assume that the actual data D is kept hidden. The data to be protected is defined by
a query Q, called the sensitive query, whose definition is known by Alice. As an external user, Alice can only access the system through a query interface which allows her to ask certain "legal" queries; these legal queries, together with their answers, are represented as a set V of views over ℑ. Bob wants to extend the set of legal queries, i.e., to publish new views. The problem of interest is the following: The publishing problem: Given ℑ = (S, D), an initial set of views V and a final set of views W over ℑ with V ⊆ W, verify that no additional information about the answers to Q is disclosed.2
Table 2. Example Hidden Dataset
R(x, y): (dis1, drug1), (dis2, drug1), (dis3, drug2), (dis4, drug2)
S(z, y): (pat1, drug1), (pat2, drug1), (pat3, drug2), (pat4, drug2)
T(z, w, x): (pat1, male, dis1), (pat2, male, dis2), (pat3, fem, dis3), (pat4, male, dis4)
F(z, t): (pat1, flo1), (pat2, flo2), (pat3, flo3), (pat4, flo2)
Example 1 The IS of a hospital, modeled in FO-logic, contains data about the following predicates: R(x, y), which relates diseases to drugs, S(z, y), which relates patients to their prescribed drugs, T(z, w, x), which relates patients, their gender, and their diagnosed disease, and F(z,t) which specifies the floor of the hospital where each patient is located. Their extension in the hidden dataset D is given in Table 2. The schema S is public and contains FO-sentences such as ∀x, y : [R(x, y) ⇒ Disease(x)∧Drug(y)], which ensures that R only relates diseases to drugs, and sentences like ∀x : [Disease(x) ⇒ ¬Drug(x))], which ensures disjointness between drugs, diseases, patients, genders and floors. S also models other common-sense knowledge, e.g. that the gender of a patient is unique. Bob does not want to reveal any information about which patients suffer from dis1, i.e., the answer to the query Q(z) = ∃w : [T(z, w, dis1)] should be secret; however, Bob also wants to publish views V1 = (V1 , v1 ), and V2 = (V2 , v2 ) with V1 (x, y) ← F(z,t) and V2 (z, w) ← ∃x : [T(z, w, x)], and where v1 , v2 are their respective extensions w.r.t. D . Publishing these views could lead to a privacy breach w.r.t. Q. For example, if S contains a sentence α stating that all the patients in f lo1 suffer from dis1 then, by publishing V1 , Alice could deduce that pat1 suffers from dis1 and thus belongs to the answer to Q1 , which clearly causes a privacy breach. Even if the identity of patients suffering from dis1 is not revealed, the views could still provide useful information to Alice. Suppose that S contains β stating that dis1 is a kind of disease that only affects men; then by publishing V2 Alice could infer that pat3, a woman, cannot be in the answer to Q1 , which would permit Alice to discard possible answers. Such privacy breaches are datasetdependent: if all patients in D were male and none of them is on the first floor, then publishing V1 and V2 would be harmless. 3 Existing DB frameworks assume that the schema is static and fully known by Alice, which are not always reasonable assumptions. For inferential systems like ontologies [12], where the schema participates in query answering by allowing the deduction of new data, Bob may prefer to hide a part of the schema. In fact, some widely used ontologies, such as SNOMED-CT—a component of the Care Record Service in the British Health System—are not fully available. Furthermore, the schema may undergo continuous modifications; indeed many ontologies are updated on a daily basis. To overcome these limitations, we propose to formalise and study the following problems: The generalised publishing problem: New views or schema axioms are published, but the IS ℑ = (S , D ) remains static. Given an initial public schema S1 and a final public schema S2 with S1 ⊆ S2 ⊆ S , 2
Note that this generalises the "standard" case where V = ∅.
initial views V and final views W with V ⊆ W, Bob wants to verify that no additional information about the answers to Q is disclosed. The system evolution problem: The IS ℑ = (S , D ) evolves to ℑ = (S , D ). Bob wants to ensure that, if it was possible to safely publish certain information before the change, then the same information can be safely published after the change. DB frameworks are probabilistic and apply to the publishing problem [10, 6, 11]. In the next section, we generalise them. Our presentation differs from [10, 6, 11] in two aspects: we consider arbitrary ISFs instead of relational DBs; and we consider the generalised publishing problem: instead of assuming that the schema is fixed and known, we allow for partially secret schemas. We show that known results for DBs can be naturally lifted to our more general setting.
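For concreteness, the hidden dataset of Table 2, the sensitive query of Example 1, and the two views can be phrased directly over the relations. The relation names and constants below transcribe Table 2; the query and view encodings as Python comprehensions are an illustrative choice of this sketch, not part of the framework.

```python
# Table 2 as plain relations (sets of tuples).
R = {("dis1", "drug1"), ("dis2", "drug1"), ("dis3", "drug2"), ("dis4", "drug2")}
S_rel = {("pat1", "drug1"), ("pat2", "drug1"), ("pat3", "drug2"), ("pat4", "drug2")}  # S(z, y)
T = {("pat1", "male", "dis1"), ("pat2", "male", "dis2"),
     ("pat3", "fem", "dis3"), ("pat4", "male", "dis4")}
F = {("pat1", "flo1"), ("pat2", "flo2"), ("pat3", "flo3"), ("pat4", "flo2")}

# Sensitive query Q(z) = ∃w : T(z, w, dis1) -- which patients are diagnosed with dis1?
sensitive_answer = {z for (z, w, x) in T if x == "dis1"}
print(sensitive_answer)                       # {'pat1'}

# Legal views: V1 over the floor relation and V2(z, w) <- ∃x : T(z, w, x).
view_v1 = set(F)                              # patients and their floors
view_v2 = {(z, w) for (z, w, x) in T}         # patients and their genders
print(sorted(view_v2))
```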
4
Probabilistic Frameworks
The framework by Miklau & Suciu [10] is based on Shannon's information-theoretic notion of perfect secrecy. As mentioned before, we present the framework in a more general form. Alice's (additional) knowledge about the IS being attacked is given as a distribution P : IS → [0, 1] over all possible ISs. Given P, the probability that an IS satisfies a condition [C] in Table 1 is as follows: P([C]) = ∑_{ℑ∈Syst([C])} P(ℑ). Given [C1], [C2], P([C1] | [C2]) represents the probability, according to Alice's knowledge, that an IS satisfies [C1] given that it satisfies [C2]; this can be computed using the Bayes formula: P([C1] | [C2]) = P([C1, C2]) / P([C2]). Let ℑ = (S, D) be the system to be protected. Alice initially knows part of the schema S1 ⊆ S and views V over ℑ. After publication, she observes the new schema S2 with S1 ⊆ S2 and views W = V ∪ U; she is also aware that the real schema S extends both S1 and S2. The a-priori and a-posteriori probabilities, according to Alice's knowledge, that q is the answer to Q are respectively given as follows:3
P([Q = q] | [S1↑, V])   (a-priori)   (1)
P([Q = q] | [S2↑, W])   (a-posteriori)   (2)
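Over a finite toy universe of candidate information systems, the conditional probabilities (1) and (2) reduce to the Bayes computation above. The sketch below is illustrative only: candidate systems are plain records, conditions are Python predicates, and the attacker distribution is an arbitrary dictionary; none of these choices come from the paper.

```python
def prob(dist, condition):
    """P([C]) = sum of P(ℑ) over the systems satisfying the condition."""
    return sum(p for system, p in dist.items() if condition(system))

def conditional(dist, c1, c2):
    """P([C1] | [C2]) = P([C1, C2]) / P([C2]) (undefined if P([C2]) = 0)."""
    denom = prob(dist, c2)
    if denom == 0:
        raise ValueError("conditional probability undefined")
    return prob(dist, lambda s: c1(s) and c2(s)) / denom

# Toy universe: each candidate system is just the hidden answer to Q plus a view value.
# dist encodes Alice's beliefs P : IS -> [0, 1].
dist = {("ans_q1", "view_a"): 0.25, ("ans_q1", "view_b"): 0.25,
        ("ans_q2", "view_a"): 0.25, ("ans_q2", "view_b"): 0.25}

answer_is_q1 = lambda s: s[0] == "ans_q1"            # condition [Q = q1]
initial_info = lambda s: True                        # nothing published yet
published_view = lambda s: s[1] == "view_a"          # condition after publishing a view

a_priori = conditional(dist, answer_is_q1, initial_info)         # 0.5
a_posteriori = conditional(dist, answer_is_q1, published_view)   # still 0.5 for this P
print(a_priori, a_posteriori)
```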
The privacy condition under consideration is called perfect privacy: intuitively, Alice should not learn anything about the possible outcomes of Q, whatever her additional knowledge or beliefs (i.e., for any P). Note that the condition is trivially satisfied if S1 and V already reveal the answer to Q, i.e., if each ℑ ∈ Syst([S1 ↑, V]) yields the same outcome to Q; in this case we say that Q is trivial. Example 2 Suppose that in Example 1, the schema S with β ∈ S is known, and V2 —the relation between patients and their genders— is published. Suppose that Alice has only vague knowledge about the IS and considers all datasets consistent with S equally likely. Consider an answer set q containing pat3. Before publishing the view, the probability (1) is non-zero for q, whereas, after publishing V2 , (2) is zero. Intuitively, Alice’s knowledge about Q has increased. 3 Definition 1 (Perfect Privacy). Perfect privacy holds if, for each P : IS → [0, 1] and q ∈ Tup with (1) well-defined, (2) equals (1).
In Example 1, Alice may believe that the answer to Q is q1 = {pat1} with P(q1) = 2/3, q2 = {pat1, pat2} with P(q2) = 1/6 and q3 = {pat1, pat3} with P(q3) = 1/6. Note the difference with [10], where Alice had prior knowledge about the possible ISs themselves. The distribution P induces possible compatible distributions P′ : IS → [0, 1] over ISs as follows: P′ is compatible with P, written P′ ∈ Comp(P), if, for each q, the sum of the probabilities of the ISs for which ans(Q, ℑ) = q is precisely P(q) (i.e., ∑_{ℑ∈Syst([Q=q])} P′(ℑ) = P(q)). Alice's a-priori and a-posteriori knowledge is given respectively by (1) and (2) over P′, and the privacy condition is the following: Definition 2 (Safety). Safety holds if, for each P : Tup → [0, 1], P′ ∈ Comp(P), and q ∈ Tup with (1) well-defined, (2) equals (1). Triviality of Perfect Privacy and Safety: In the relational DB literature, it has been observed that, on the one hand, safety and perfect privacy are closely related [6] and that, on the other hand, they are too strict in practice: revealing any new information, even if apparently irrelevant to Q, causes perfect privacy and safety not to hold; intuitively, this is because the attacker's beliefs can establish a (possibly spurious) connection between any revealed information and the answer to the sensitive query. We show that these results can be naturally lifted to the generalised publishing problem for arbitrary ISFs as follows: Theorem 1 For given ℑ, Q, S1, S2, and V, W: (i) Safety ⇔ Perfect Privacy, and (ii) Perfect Privacy ⇔ Syst([S1↑, V]) ⊆ Syst([S2↑, W]). Relaxing Perfect Privacy and Safety: A number of recent papers have tried to weaken these notions. Miklau and Suciu [10] proposed to place constraints on P and consider only product distributions; this amounts to assuming that the tuples in the DB are independent. This assumption, however, is not reasonable if the schema is nontrivial: schema constraints can impose arbitrary correlations between tuples. Other proposals, e.g. [3], involve making (1) only approximately equal to (2). In this paper, we propose two novel notions, quasi-safety and quasi-privacy, that significantly relax Definitions 1 and 2 respectively; we show later on that both notions are equivalent and have a nice logical counterpart in terms of purely logic-based reasoning problems. Consider the notion of safety. Given P : Tup → [0, 1], Definition 2 requires (1) and (2) to coincide for all its compatible distributions. Definition 2 can be relaxed by requiring, for each P, only the existence of a compatible distribution P′ for which (1) and (2) coincide. Moreover, such a distribution must be "reasonable" given the public information S1, V: that is, if P assigns non-zero probability to q1, then P′ cannot assign zero probability to all ISs that satisfy [S1, V] and yield q1. Formally, we say that P′ ∈ Comp(P) is admissible for S1, V if, for each q such that P(q) ≠ 0, there is an IS ℑ ∈ Syst([S1, V, Q = q]) such that P′(ℑ) ≠ 0. Definition 3 (Quasi-Safety). Quasi-safety holds if, for each P : Tup → [0, 1] there is an admissible P′ ∈ Comp(P) s.t., for each q ∈ Tup for which (1) is well-defined, (2) equals (1).
The framework by Deutsch and Papakonstantinou [6, 11] models Alice’s knowledge or beliefs as a distribution P : Tup → [0, 1] over the possible outcomes of the sensitive query. Here, we present the framework in a more general form.
That is, whatever Alice’s knowledge or beliefs about the answers to Q, there is always a compatible opinion about the hidden IS that is “reasonable” given the public information and that would not cause her to revise her beliefs after the new information is published. A similar principle can be used for weakening perfect privacy:
These probabilities are well-defined if P([S1 ↑, V]) and P([S2 ↑, W]) are non-zero; that is, if there is an IS with non-zero probability that is compatible with the available information.
Definition 4 (Quasi-Privacy). Quasi-privacy holds if, for each P : IS → [0, 1], there is a P′ : IS → [0, 1] s.t., for each q ∈ Tup for which (1) is well-defined over P, (2) over P′ equals (1) over P.
That is, whatever Alice’s initial beliefs about the hidden IS, she can always revise them such that her opinion about the answers to Q does not change when the new information is published.
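To make the notions of compatibility and admissibility concrete, the following brute-force sketch (not taken from the paper; the toy space of ISs, the flag names and the probabilities are illustrative assumptions) checks whether a distribution over ISs is compatible with a distribution over query outcomes and admissible w.r.t. the published information.

```python
# Toy space of information systems: each IS records the outcome it yields
# for the sensitive query Q and whether it satisfies the published [S1, V].
# All names and numbers are illustrative, not taken from the paper.
ISS = {
    "I1": {"outcome": frozenset({"pat1"}), "satisfies_S1_V": True},
    "I2": {"outcome": frozenset({"pat1", "pat2"}), "satisfies_S1_V": True},
    "I3": {"outcome": frozenset({"pat1", "pat3"}), "satisfies_S1_V": False},
}

P_outcomes = {  # Alice's beliefs over the outcomes of Q
    frozenset({"pat1"}): 2 / 3,
    frozenset({"pat1", "pat2"}): 1 / 6,
    frozenset({"pat1", "pat3"}): 1 / 6,
}

def is_compatible(P_prime):
    """P' over ISs is compatible with P: for each outcome q, the total
    probability of the ISs yielding q equals P(q)."""
    for q, p in P_outcomes.items():
        mass = sum(P_prime[i] for i, d in ISS.items() if d["outcome"] == q)
        if abs(mass - p) > 1e-9:
            return False
    return True

def is_admissible(P_prime):
    """For each q with P(q) != 0 there must be an IS satisfying [S1, V]
    that yields q and has non-zero probability under P'."""
    for q, p in P_outcomes.items():
        if p == 0:
            continue
        if not any(P_prime[i] > 0 and d["satisfies_S1_V"] and d["outcome"] == q
                   for i, d in ISS.items()):
            return False
    return True

P_prime = {"I1": 2 / 3, "I2": 1 / 6, "I3": 1 / 6}
print(is_compatible(P_prime), is_admissible(P_prime))  # True False (I3 violates [S1, V])
```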
5 A Logic-based Framework
In this section, we formalise privacy from a purely logic-based perspective as a guarantee that the published information will not “change the meaning” of the sensitive query. We propose a collection of privacy conditions that model this notion of meaning change, and consider both the publishing and the evolution problems.
5.1 The Generalised Publishing Problem
The most basic information about Q is obviously its answer. The most dangerous privacy breach occurs when publishing new information reveals part of such answer. In Example 1, before publishing any views, Alice cannot deduce the name of any patient suffering from dis1; after publication of V1 , Alice learns that pat1 does have dis1 and therefore belongs to the answer of Q. We will then say that the set of certain answers to Q has changed. Furthermore, as seen in Example 1, a privacy breach could also occur if Alice can discard possible answers and therefore formulate a “better guess”, even if part of the actual answer has not been disclosed. Initially, all possible sets of patients (e.g. q3 = {pat2, pat3}) are possible. Upon publication of V2 , all answers including pat3 (e.g. q3 = {pat2, pat3}) become impossible. We will then say that the set of possible outcomes of Q has changed. Possible outcomes and certain answers: Given Q and a condition [C] (see Table 1), the possible outcomes of Q given [C] are as follows: out([C]) = {q ∈ Tup | ∃ℑ ∈ Syst([Q = q,C])}
(3)
The set of certain answers of Q given [C] is defined as the common subset of all the possible outcomes: cert([C]) = ⋂ out([C]). As argued before, a privacy condition should at least guarantee that the set of certain answers given the initial schema and views stays the same after publishing the new information:4
cert([S1 ↑, V]) = cert([S2 ↑, W])   (4)
A stronger privacy condition can be obtained if we require the set of possible outcomes not to change as follows:
out([S1 ↑, V]) = out([S2 ↑, W])   (5)
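Conditions (4) and (5) can be illustrated with a small brute-force computation over an explicitly enumerated set of candidate ISs; the sketch below is illustrative only, and the finite enumeration and field names are assumptions made to keep the example small.

```python
# Each candidate information system is represented by the answer set it
# yields for Q; a "condition" is a flag saying whether the IS remains
# consistent with the published schema and views.  Illustrative only.
CANDIDATE_ISS = [
    {"answer": frozenset({"pat1"}), "consistent_before": True, "consistent_after": True},
    {"answer": frozenset({"pat1", "pat2"}), "consistent_before": True, "consistent_after": True},
    {"answer": frozenset({"pat1", "pat3"}), "consistent_before": True, "consistent_after": False},
]

def out(key):
    """Possible outcomes of Q under the given condition (Equation (3))."""
    return {s["answer"] for s in CANDIDATE_ISS if s[key]}

def cert(key):
    """Certain answers: the intersection of all possible outcomes."""
    outcomes = out(key)
    return frozenset.intersection(*outcomes) if outcomes else frozenset()

# Condition (4): certain answers unchanged; Condition (5): possible outcomes unchanged.
print("condition (4):", cert("consistent_before") == cert("consistent_after"))  # True
print("condition (5):", out("consistent_before") == out("consistent_after"))    # False
```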
It is ultimately up to the data owner to decide which condition is most appropriate for his application needs. Monotonicity for answer sets: Sometimes in this section we will focus only on ISFs and query languages that have a monotonic behavior with respect to answer sets—that is, if new schema axioms and/or views are published, the set of possible answers to a query Q can only decrease. In the limit, if the whole system is published, then only one answer remains possible, namely the “real” answer for Q against the IS . This property can be formalized as follows:
S1 ⊆ S2 and V ⊆ W ⇒ out([S2∗ , W]) ⊆ out([S1∗ , V])   (6)
Many languages currently used in practice, such as relational DBs and DL ontologies satisfy this property. Checking Condition (5) in ISFs that satisfy Property (6) just requires to consider the initial and final schemas, instead of all their super-sets. 4
It can be easily seen that Condition (5) implies Condition (4).
Proposition 1 If F satisfies Property (6), then Condition (5) holds iff out([S1∗ , V]) ⊆ out([S2∗ , W]), In what follows, if a result depends on Property (6), it will be explicitly stated; otherwise, we assume general ISFs and queries. Bridges between probability and logic: At this stage, we can establish a first general bridge between our logic-based conditions and the probabilistic ones. In particular, it turns out that Condition (5) is equivalent to both quasi-privacy and quasi-safety: Theorem 2 Quasi-safety ⇔ Quasi-privacy ⇔ Condition (5). Note that Theorem 2, on the one hand, implies that quasi-safety and quasi-privacy are indeed equivalent notions; on the other hand, it provides a natural logical interpretation to our probabilistic weakening of safety and perfect privacy. Breaches in logic privacy: Condition (5) may still lead to potential security breaches if new schema axioms are published, as shown by the following example: Example 3 Suppose LS is FO predicate logic, LD only allows for ground atomic formulas, and LQ is the language of conjunctive queries. Let A, B be unary predicates and R a binary predicate; consider a Σ with two constants: a, b. The sensitive query is A(x). Suppose that Bob publishes V1 with definition B(x) and extension {a, b}. Initially, S1 = 0/ and hence all outcomes Tup = {{}, {a}, {b}, {a, b}} are possible. Suppose that Bob publishes S2 = {∀x : [A(x) ↔ ∃y : [R(x, y) ∧ B(y)]]}. Upon publication of S2 , no possible outcome is ruled out, but S2 has introduced a correlation between V1 and Q. These correlations could potentially lead to a security breach. 3 Indeed, even if Alice cannot discard any possible outcome of Q, Bob may want to prevent the new information from establishing potentially dangerous correlations; to this end, we introduce a stronger notion of logic-based privacy. Strengthening logic privacy: We propose an additional condition in case new schema axioms are published. Our condition is only defined for ISs satisfying Property (6) and it ensures that for each possible dataset D , Alice obtains the same answer for Q independently of whether she considers the initial schema S1 or the final one S2 . That is, for each ℑ = (S2 , D ) ∈ Syst([S2∗ , W]), the following should hold: ans(Q, ℑ) = ans(Q, ℑ )
(7)
where ℑ = (S1 , D ). If we enforce this condition in the example above, we would have that publishing S2 yields a privacy breach. Indeed, consider D = {R(a, b), B(a), B(b)}; we have ans(Q, S1 = {}) = {}, whereas ans(Q, S2 ) = {a}. These intuitions motivate the following notion of privacy for ISFs satisfying Property (6): Definition 5 (Strong Logic-based Privacy). Given Q, S1 , S2 , V, W, strong logic-based privacy holds if Conditions (5) and (7) hold. The above establishes a middle ground between too strict privacy notions (Definitions 1, 2) and rather permissive ones (Definitions 3, 4). Definition 5 implies that a privacy breach may only occur if the new information correlates the public one to the answers of Q; that is, publishing information that is completely unrelated to Q will not break privacy. Note, however, that if S1 = S2 , then Definition 5 reduces to Condition (5) since Condition (7) trivially holds. A connection with conservative extensions: Definition 5 is close to conservative extensions, a well-established notion in mathematical logic, and an important concept in ontology design and reuse [8, 4, 7].
Conservative extensions have been recently proposed as the basic notion for defining modules in ontologies—independent parts of a given theory— and safe refinements—extensions of a theory that do not affect certain aspects of the meaning of the original theory. In the context of privacy-preserving query answering, the notion of a query conservative extension [7] for monotonic ISFs is of special relevance: Definition 6 (Query Conservative Extension). 5 Given S1 ⊆ S2 , sets Q, D of queries and datasets respectively, S2 is a query conservative extension of S1 w.r.t. Q, D if, for each Q ∈ Q and D ∈ D, we have that ans(Q, ℑ = (S2 , D )) = ans(Q, ℑ = (S1 , D )). In order to establish a connection between Definitions 5 and 6, let us introduce the following notation. Given [C], we denote the set of datasets that an IS that satisfies [C] can have as follows: Data([C]) = {D ∈ D | ∃ℑ ∈ Syst([C]), ℑ has dataset D }. If D = Data([S2∗ , W]), then Definition 6 corresponds precisely to Condition (7). If V = W, and D = Data([S1∗ , V]), then Definition 6 is a sufficient condition for strong logic-based privacy.
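Under the assumption that the sets Q of queries and D of datasets are finite and that an answering function is available, Definition 6 can be checked by brute force; the following sketch is illustrative and uses a toy forward-chaining answerer rather than the authors' setting.

```python
def is_query_conservative_extension(ans, S1, S2, queries, datasets):
    """S2 (a superset of S1) is a query conservative extension of S1
    w.r.t. the given queries and datasets iff every query gets the same
    answer over (S2, D) and (S1, D) for every dataset D (Definition 6)."""
    return all(ans(Q, S2, D) == ans(Q, S1, D)
               for Q in queries for D in datasets)

# Tiny propositional illustration: schemas are sets of implications (a, b)
# meaning "a implies b"; datasets are sets of facts; a query is a single
# atom answered by forward chaining.  Illustrative only.
def ans(query, schema, dataset):
    facts = set(dataset)
    changed = True
    while changed:
        changed = False
        for a, b in schema:
            if a in facts and b not in facts:
                facts.add(b)
                changed = True
    return query in facts

S1 = set()
S2 = {("HasR", "A")}           # a new axiom correlating HasR with the query atom A
queries = ["A"]
datasets = [frozenset(), frozenset({"HasR"})]
print(is_query_conservative_extension(ans, S1, S2, queries, datasets))  # False
```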
5.2 The System Evolution Problem
Suppose that the privacy of ℑ = (S , D ) w.r.t. a query Q and a set V of published views has been tested and the system evolves to ℑ = (S , D ). We want to ensure that ℑ behaves in the same way as ℑ w.r.t. the secrecy of Q given V. Such notion of robustness under changes can be characterized as follows. Let ℑ = (S , D ), ℑ = (S , D ) be ISs, and let Q be a sensitive query. Consider a notion of security characterized by a predicate Privacy(ℑ, Q, V), e.g. (strong) logic-based privacy, which is evaluated to true if, given the IS ℑ = (S , D ), with S being public, Q is secure for the publication of V. Definition 7 (Secure Evolution). The evolution of ℑ = (S , D ) to ℑ = (S , D ) is secure w.r.t. Q and V if Privacy(ℑ, Q, V) implies Privacy(ℑ , Q, V ) with V being the views over ℑ with the same view definitions as V. We distinguish two situations: (i) the data changes during the evolution of the system, but the schema remains constant, and (ii) the data remains constant, but the schema changes. Varying the data: We first formulate the notion of data independence, which ensures robust evolution w.r.t. changes in the data. Definition 8 (Data Independence). A notion of privacy is dataindependent w.r.t. S , Q and V if, for each ℑ, ℑ ∈ Syst([S ∗ ]) the evolution of ℑ to ℑ is secure w.r.t. Q, V. It is not hard to see that, given any non-trivial Q and any S , Perfect privacy and safety are data-independent w.r.t. S , Q. In contrast, the notion of privacy derived from Condition 5 is not data-independent for all S . Consider Example 1 and suppose that the schema S contains the sentence β and that the dataset D only contains male patients. In this case, Condition (5) holds since no possible outcome of Q can be ruled out when publishing V2 ; however, if D evolves to D containing a female patient, then the condition is violated. As a consequence, strong logic-based privacy is not data-indepedent and, given Theorem 2, nor are quasi-privacy and quasi-safety. Data independence for any schema is, indeed, a strict requirement. For ISFs satisfying Property (6), certain schemas and certain views, it is possible to obtain data-independence results: 5
In [7], D and Q are the sets of all datasets and all queries respectively over a given signature.
Proposition 2 Let S be a query conservative extension of S = {} w.r.t. Q = {Q} and D = D; let V, V be s.t. out([V]) = out([V ]). Then (strong) logic-based privacy is data-independent w.r.t. S , Q. Proposition 2 guarantees that data independence is obtained for schemas and views that are uncorrelated with the sensitive query. Varying the schema: we now assume that the data remains constant and the schema changes. Suppose that, in Example 1, the initial schema S does not contain β; let S = S ∪ {β} and let the dataset D contain a female patient. Publishing the names and gender of the patients (view V2 ) does not cause a privacy breach since S does not introduce any correlation between diseases and the gender of patients; however, when ℑ = (S , D ) evolves to ℑ = (S , D ) then such correlation does exist and the publication of V2 is no longer safe. Note that, given Q, D , we have that S is not a query conservative extension of S . This observation suggests the following sufficient condition for secure evolution of ISFs satisfying Property (6): Proposition 3 Let S is a query conservative extension of S w.r.t Q = {Q} and D = Data([S ∗ ]); let out([S∗ , V]) = out([S∗ , V ]). Then, the evolution of ℑ = (S , D ) to ℑ = (S , D ) is secure w.r.t. Q, V for both privacy as in Condition (5) and strong logic-based privacy. Propositions 2 and 3 establish a bridge between the notions of conservative extension and secure evolution and show that the former can be used to provide sufficient conditions for the latter.
6 Conclusion
In this paper, we have generalised existing results for privacy in databases, and proposed novel privacy conditions. We have proposed a novel logic-based approach and established bridges with existing information-theoretic approaches. Our results provide a deeper fundamental understanding of privacy-preserving query answering and can be used as a starting point for studying the decidability and complexity of the different privacy guarantees for particular languages.
REFERENCES [1] F. Baader, C. Lutz, H. Sturm, and F. Wolter, ‘Fusions of Description Logics and Abstract Description Systems’, JAIR, 16, 1–58, (2002). [2] E. Bertino, S. Jajodia, and P. Samarati, ‘Database security: Research and practice’, Inf. Syst., 20(7), 537–556, (1995). [3] A. Blum, C. Dwork, F. McSherry, and K. Nissim, ‘Practical privacy: the sulq framework’, in PODS, pp. 128–138. ACM, (2005). [4] B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler, ‘A logical framework for modularity of ontologies’, in IJCAI-07, pp. 298–304. AAAI, (2007). [5] G. De Giacomo E. Franconi I. Horrocks A. Kaplunova D. Lembo M. Lenzerini C. Lutz D. Martinenghi R. Moeller R. Rosati S. Tessaris A.Y. Turhan D. Calvanese, B. Cuenca Grau. Common framework for representing ontologies. TONES Project Deliverable, 2007. [6] A. Deutsch and Y. Papakonstantinou, ‘Privacy in database publishing’, in ICDT-2005, volume 3363 of LNCS, pp. 230–245. Springer, (2005). [7] R. Kontchakov, F. Wolter, and M. Zakharyaschev, ‘Modularity in dl lite’, in DL-2007. [8] C. Lutz, D. Walther, and F. Wolter, ‘Conservative extensions in expressive description logics’, in IJCAI-07, pp. 453–459. AAAI, (2007). [9] A. Machanavajjhala and J. Gehrke, ‘On the efficiency of checking perfect privacy’, in PODS-2006, pp. 163–172. ACM, (2006). [10] G. Miklau and D. Suciu, ‘A formal analysis of information disclosure in data exchange’, J. Comput. Syst. Sci., 73(3), 507–534, (2007). [11] A. Nash and A. Deutsch, ‘Privacy in GLAV information integration’, in ICDT, pp. 89–103, (2007). [12] P.F. Patel-Schneider, P. Hayes, and I. Horrocks. Web ontology language OWL Abstract Syntax and Semantics. W3C Recommendation, 2004. [13] L. Sweeney, ‘K-anoniminity: a model for protecting privacy’, Int. J. on Uncertainty, Fuzziness and Knowledge-based Systems., 10(5), (2002).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-45
Optimizing Causal Link Based Web Service Composition
Freddy Lécué1,2, Alexandre Delteil2 and Alain Léger2
Abstract. Automation of Web service composition is one of the most interesting challenges facing the Semantic Web today. Since Web services have been enhanced with formal semantic descriptions, it becomes conceivable to exploit causal links, i.e., semantic matching between their functional parameters (i.e., outputs and inputs). The semantic quality of the causal links involved in a composition can then be used as an innovative and distinguishing criterion to estimate its overall semantic quality. Therefore, non-functional criteria such as quality of service (QoS) are no longer considered as the only criteria to rank compositions satisfying the same goal. In this paper we focus on the semantic quality of causal link based semantic Web service composition. First of all, we present a general and extensible model to evaluate the quality of both elementary causal links and compositions of causal links. From this, we introduce a global causal link selection based approach to retrieve the optimal composition. This problem is formulated as an optimization problem which is solved using efficient integer linear programming methods. The preliminary evaluation results showed that our global selection based approach is not only more suitable than the local approach but also outperforms the naive approach.
1 Introduction
The semantic web [6] is considered to be the future of the current web. Web services in the semantic web are enhanced using rich description languages such as the Web Ontology Language (OWL) [19]. Formally the latter semantic descriptions are expressed by means of Description Logics concepts [4] in ontologies. An ontology is defined as a formal conceptualization of a domain we require to describe the semantics of services e.g., their functional input, output parameters. Intelligent software agents can, then, use these descriptions to reason about web services and automate their use to accomplish intelligent tasks e.g., selection, discovery, composition. In this work we focus on web service composition and more specifically on its functional level (aka causal link composition). Starting from an initial set of web services, such a level of composition aims at selecting and inter-connecting web services by means of their (semantic) causal links according to a goal to achieve. The functional criterion of causal link, first introduced in [14], is defined as a semantic connection between an output of a service and an input parameter of another service. Since the quality of the latter links are valued by a semantic matching between their parameters, causal link compositions could be estimated and ranked as well. From their estimation results, some compositions can be considered as unsuitable in case of under specified causal links. Indeed a composite service that does not provide acceptable quality of causal links might be as useless as a service not providing the desired functionality. Unlike most of approaches [5, 22, 23] which focus on the quality of composition by means of non functional parameters i.e., quality of 1 2
1 Ecole de Mines de Saint-Etienne, France, email: [email protected]. 2 Orange Labs, France, email: {firstname.lastname}@orange-ftgroup.com
service (QoS), the quality of causal links can be considered as a distinguishing functional criterion for semantic web service compositions. Here we address the problem of optimization in service composition with respect to this functional criterion. Retrieving such a composition is defined as the global selection of causal links maximizing the quality of the composition, taking into account preferences and constraints defined by the end-user. To this end, an objective function maximizing the overall quality subject to causal links constraints is introduced. This leads to an NP-hard optimization problem [8] which is solved using integer linear programming methods. The remainder of this paper is organised as follows. In the next section we briefly review i) causal links, ii) a distinguishing criterion i.e., their robustness and iii) the causal link composition model. Section 3 defines the causal link quality criteria we require during the global selection phase. Section 4 formulates the problem of global causal link selection and describes an integer linear programming method to efficiently solve it. Section 5 presents its computational complexity and some experimentations. Section 6 briefly comments on related work. Finally section 7 draws some conclusions and talk about possible future directions.
2 Background
First of all, we present causal links. Then we recall the definition of their robustness, and finally describe causal link composition.
2.1 Web Service Composition & its Causal Links
In the semantic web, parameters (i.e., input and output) of services refer to concepts in a common ontology3 or Terminology T, where the OWL-S profile [1] or SA-WSDL [18] can be used to describe them (through semantic annotations). At the functional level, web service composition consists in retrieving some semantic links between output parameters Out si ∈ T of services si and input parameters In sj ∈ T of other services sj. Such a link, i.e., a causal link [14] cl_{i,j} (Figure 1) between two functional parameters of si and sj, is formalized as ⟨si, SimT(Out si, In sj), sj⟩. Thereby si and sj are partially linked according to a matching function SimT. This function expresses which matching type is employed to chain services. The range of SimT is reduced to the four well-known matching types introduced by [16] and the extra type Intersection [15]:
• Exact If the output parameter Out si of si and the input parameter In sj of sj are equivalent; formally, T |= Out si ≡ In sj.
• PlugIn If Out si is a sub-concept of In sj; formally, T |= Out si ⊑ In sj.
• Subsume If Out si is a super-concept of In sj; formally, T |= In sj ⊑ Out si.
• Intersection If the intersection of Out si and In sj is satisfiable; formally, T ⊭ Out si ⊓ In sj ⊑ ⊥.
Distributed ontologies are not considered here but are largely independent of the problem addressed in this work.
• Disjoint Otherwise, Out si and In sj are incompatible, i.e., T |= Out si ⊓ In sj ⊑ ⊥.

Figure 1. Illustration of a Semantic Causal Link cl_{i,j} (a service si with output parameters linked through cl_{i,j} = SimT(Out si, In sj) to the input parameters of a service sj).
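The five matching types amount to a case analysis over subsumption tests. The sketch below is illustrative; the subsumption oracle, the helper names and the toy hierarchy are assumptions, not part of the paper.

```python
def match_type(out_c, in_c, subsumes, disjoint):
    """Classify the causal link from output concept out_c to input concept in_c.
    subsumes(x, y) means T |= x is subsumed by y; disjoint(x, y) means
    T |= the intersection of x and y is unsatisfiable."""
    if subsumes(out_c, in_c) and subsumes(in_c, out_c):
        return "Exact"
    if subsumes(out_c, in_c):
        return "PlugIn"
    if subsumes(in_c, out_c):
        return "Subsume"
    if not disjoint(out_c, in_c):
        return "Intersection"
    return "Disjoint"

# Toy hierarchy: SlowNetworkConnection is a sub-concept of NetworkConnection.
SUB = {("SlowNetworkConnection", "NetworkConnection")}
subsumes = lambda x, y: x == y or (x, y) in SUB
disjoint = lambda x, y: False  # nothing is declared disjoint in this toy example

print(match_type("NetworkConnection", "SlowNetworkConnection", subsumes, disjoint))  # Subsume
```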
2.2 Robust Causal Link
The latter matching function SimT enables, at design time, finding some levels of semantic compatibilities (i.e., Exact, PlugIn, Subsume, Intersection) and incompatibilities (i.e., Disjoint) among independently defined web service descriptions. However, as emphasized by [13], the matching types Intersection and Subsume need some refinements to be fully efficient for causal link composition. Example 1. (Causal Link & Subsume Matching Type) Let s1 and s2 be two services such that the output parameter NetworkConnection of s1 is (causally) linked to the input parameter SlowNetworkConnection of s2 (cl^1_{1,2} in Figure 3). This causal link is valued by a Subsume matching type since NetworkConnection ⊒ SlowNetworkConnection (Figure 2). It is obvious that such a causal link should not be directly applied in a service composition since NetworkConnection is not specific enough to be used by the input SlowNetworkConnection. Indeed the output parameter NetworkConnection requires some Extra Descriptions to ensure a composition of s1 and s2.
Example 2. (Robustness, Extra & Common Description) Suppose the causal link presented in Example 1. Such a link is not robust enough (Definition 1) to be applied in a composition. The description missing in NetworkConnection to be used by the input parameter SlowNetworkConnection is defined by the Extra Description SlowNetworkConnection \ NetworkConnection, i.e., ∀netSpeed.Adsl1M. However, the Common Description is not empty since it is defined by SlowNetworkConnection ⊓ NetworkConnection, i.e., ∀netPro.Provider. Robust causal links can be obtained by retrieving an Extra Description that changes an Intersection into a PlugIn matching type, and a Subsume into an Exact matching type.
2.3 Causal Link Composition Model
In this work, the process model of web service composition and its causal links is specified by a statechart [10]. Its states refer to services whereas its transitions are labelled with causal links. In addition some basic composition constructs such as sequence, conditional branching (i.e., OR-Branching), structured loops, concurrent threads (i.e., AND-Branching), and inter-thread synchronization can be found. To simplify the presentation, we assume that all considered statecharts are acyclic and consists of only sequences, OR-Branching and AND-Branching. In case of cycle, a technique for unfolding statechart into its acyclic form needs to be applied beforehand. Details about this unfolding process are omitted for space reasons. Example 3. (Process Model of a Causal Link Composition) Suppose si,3≤i≤8 be six services extending Example 1 in a more complex composition. The process model of this composite service is illustrated in Figure 3. The composition consists in an OR-Branching and AND-Branching wherein nine causal links are involved.
A causal link valued by the Intersection matching type requires a comparable refinement. From this, [13] defined a robust causal link.

Figure 2. Sample of an ALE domain ontology T: NetworkConnection ≡ ∀netPro.Provider ⊓ ∀netSpeed.Speed; SlowNetworkConnection ≡ NetworkConnection ⊓ ∀netSpeed.Adsl1M; Adsl1M ≡ Speed ⊓ ∀mBytes.1M.
Definition 1. (Robust Causal link) A causal link si , SimT (Out si , In sj ), sj is robust iff the matching type between Out si and In sj is either Exact or PlugIn. Property 1. (Robust Web Service Composition) A composition is robust iff all its causal links are robust. A possible way to replace a link si , SimT (Out si , In sj ), sj valued by Intersection or Subsume in its robust form consists in computing the information contained in the input In sj and not in the output Out si . To do this, the difference or subtraction operation [7] for comparing ALE DL descriptions is adapted in [13]. Even if [20] previously presented an approach to capture the real semantic difference, the [7]’s difference is preferred since its result is unique. From this, in case a causal link si , SimT (Out si , In sj ), sj is neither valued by a Disjoint matchmaking nor robust, Out si and In sj are compared to obtain two kinds of information, a) the Extra Description In sj \Out si that refers to the information required but not provided by Out si to semantically link it with the input In sj of sj , and b) the Common Description Out si In sj that refers to the information required by In sj and effectively provided by Out si .
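Definition 1 and Property 1 translate directly into two small predicates; in the illustrative sketch below, the representation of a link as a (service, matching type, service) triple is an assumption.

```python
ROBUST_TYPES = {"Exact", "PlugIn"}

def is_robust_link(link):
    """A causal link (s_i, matching_type, s_j) is robust iff its matching
    type is Exact or PlugIn (Definition 1)."""
    _, matching_type, _ = link
    return matching_type in ROBUST_TYPES

def is_robust_composition(links):
    """A composition is robust iff all of its causal links are robust (Property 1)."""
    return all(is_robust_link(l) for l in links)

links = [("s1", "Subsume", "s2"), ("s2", "Exact", "s3")]
print(is_robust_link(links[0]), is_robust_composition(links))  # False False
```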
Figure 3. Illustration of an (Executable) Causal Link Composition (tasks T1–T8 performed by services s1–s8, connected by nine causal links, with an OR-Branching and an AND-Branching).
Example 3 illustrates an executable composition wherein tasks Ti have been concretized by one of their candidate services, e.g., here si. Indeed, some services with common functionality, preconditions and effects, although different input and output parameters, are given and can be used to perform a target task in the composition. In this way we address the issue of composing a large and changing collection of semantic web services. In our approach the choice of services is done at composition time, based only on their causal links with other services. Thus each abstract causal link cl^A_{i,j} between two tasks Ti, Tj of an abstract composition needs to be concretized. Ideally, a relevant link is selected among its n candidate causal links cl^k_{i,j}, 1 ≤ k ≤ n, between two of their services to obtain an executable composition. Example 4. (Tasks, Candidate Services & Causal Links) Let s2 be a candidate service for T2 with NetworkConnection as input parameter. The causal link cl^2_{1,2} between s1 and s2 is then more robust than cl^1_{1,2}. Indeed cl^2_{1,2} is valued by an Exact matching type whereas cl^1_{1,2} is valued by a Subsume matching type.
3 Causal Link Quality Model
As previously presented, several candidate services are grouped together in every task of an abstract composition. A way to differentiate their causal links (e.g., cl^1_{1,2} and cl^2_{1,2} in Example 4) consists in considering their different functional quality criteria. To this end, we adopt a causal link quality model applicable to any causal link. In this section, we first present the quality criteria used for elementary causal links, before turning our attention to composite causal links. For each criterion, we provide a definition and indicate rules to compute its value for a given causal link.
3.1 Quality Criteria for Elementary Causal Links
We consider three generic quality criteria for elementary causal links cl_{i,j} defined by ⟨si, SimT(Out si, In sj), sj⟩: i) Robustness, ii) Common Description rate, and iii) Matching Quality.
• Robustness. The Robustness q_r of a causal link cl_{i,j} is defined as 1 in case the link cl_{i,j} is robust (see Definition 1), and 0 otherwise.
• Common Description rate. This rate⁴ q_cd ∈ (0, 1] is defined by:
q_cd(cl_{i,j}) = |Out si ⊓ In sj| / (|In sj \ Out si| + |Out si ⊓ In sj|)   (1)
This criterion estimates the rate of description which is well specified for upgrading a non-robust causal link into its robust form. In (1), Out si ⊓ In sj is supposed to be satisfiable since only relevant links between two services are considered in our model.
• Matching Quality. The Matching Quality q_m of a link cl_{i,j} is a value in (0, 1] defined by SimT(Out si, In sj), i.e., either 1 (Exact), 3/4 (PlugIn), 1/2 (Subsume) or 1/4 (Intersection). The Disjoint match type is not considered since Out si ⊓ In sj is satisfiable. In case we consider Out si ⊓ In sj to be unsatisfiable, it is straightforward to extend and adapt our quality model by computing contraction [9] between Out si and In sj.
Given the above quality criteria, the quality vector of a causal link cl_{i,j} is defined as follows:
q(cl_{i,j}) = (q_r(cl_{i,j}), q_cd(cl_{i,j}), q_m(cl_{i,j}))   (2)
In case services si and sj are related by more than one causal link, the value of each criterion is retrieved by computing their average.
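A direct transcription of the three criteria and of the quality vector (2) could look as follows; this is a sketch in which the sizes of the common and extra descriptions are assumed to be given as integers (as produced by the difference operation), and the example sizes are illustrative.

```python
MATCH_QUALITY = {"Exact": 1.0, "PlugIn": 0.75, "Subsume": 0.5, "Intersection": 0.25}

def quality_vector(matching_type, common_size, extra_size):
    """Return q(cl) = (q_r, q_cd, q_m) for an elementary causal link.
    common_size = |Out s_i ⊓ In s_j|, extra_size = |In s_j \\ Out s_i|."""
    q_r = 1.0 if matching_type in {"Exact", "PlugIn"} else 0.0
    q_cd = common_size / (extra_size + common_size)          # Equation (1)
    q_m = MATCH_QUALITY[matching_type]
    return (q_r, q_cd, q_m)

# Illustrative sizes for a Subsume-valued link.
print(quality_vector("Subsume", 2, 2))  # (0.0, 0.5, 0.5)
```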
3.2 Quality Criteria for Causal Link Composition
The above quality criteria are also applied to evaluate the quality of any causal link composition c. To this end, Table 1 provides aggregation functions for such an evaluation. A brief explanation of each criterion’s aggregation function follows (here cl stands for cli,j ): • Robustness. On the one hand the robustness Qr of both a sequential and an AND-Branching composition c is defined as the average of its causal link cl’s robustness qr (cl). On the other hand the robustness of an OR-Branching causal link composition is a sum of qr (cl) weighted by pr i.e., the probability that causal link cl be chosen at run time. • Common Description rate. This Description rate Qcd of c is defined as its robustness, by simply changing qr (cl) by qcd (cl). • Matching Quality. The matching quality Qm of a sequential and AND-Branching causal link composition c is defined as a product of qm (cl). The matching quality of an OR-Branching causal link composition c is defined as Qr (c), by changing qr (cl) by qm (cl). 4
|·| refers to the size of ALE concept descriptions ([12], p. 17), i.e., |⊤|, |⊥|, |A|, |¬A| and |∃r| is 1; |C ⊓ D| = |C| + |D|; |∀r.C| and |∃r.C| is 1 + |C|. For instance |Adsl1M| is 3 in Figure 2.
Using the above aggregation functions, the quality vector of an executable causal link composition is defined by (3). For each criterion l ∈ {r, cd, m}, the higher the value Q_l for c, the higher its l-th quality.
Q(c) = (Q_r(c), Q_cd(c), Q_m(c))   (3)
Even if the criteria q_r, q_m used to value a single causal link are correlated, their aggregated values Q_r, Q_m for Sequential and AND-Branching compositions are independent, since they are computed from different functions, i.e., linear for Q_r but not for Q_m. Thus a composition c with a high robustness may have either a high or a low overall matching quality. We have the same conclusion for the other criteria.

Table 1. Quality Aggregation Rules for Causal Link Composition.
Construct | Robustness Q_r | Com. Desc. rate Q_cd | Match. Qual. Q_m
Sequential / AND-Branching | (1/|cl|) Σ_cl q_r(cl) | (1/|cl|) Σ_cl q_cd(cl) | Π_cl q_m(cl)
OR-Branching | Σ_cl q_r(cl)·p_cl | Σ_cl q_cd(cl)·p_cl | Σ_cl q_m(cl)·p_cl
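Table 1 can be implemented with a few folds. The sketch below is illustrative and aggregates per-link quality vectors for a sequential/AND-Branching block and for an OR-Branching block with given branch probabilities.

```python
from math import prod

def aggregate_sequential_or_and(vectors):
    """Sequential / AND-Branching row of Table 1: average robustness and
    common description rate, product of matching qualities."""
    n = len(vectors)
    q_r = sum(v[0] for v in vectors) / n
    q_cd = sum(v[1] for v in vectors) / n
    q_m = prod(v[2] for v in vectors)
    return (q_r, q_cd, q_m)

def aggregate_or_branching(vectors, probs):
    """OR-Branching row of Table 1: probability-weighted sums."""
    return tuple(sum(p * v[i] for v, p in zip(vectors, probs)) for i in range(3))

vectors = [(1.0, 1.0, 1.0), (0.0, 0.5, 0.5)]
print(aggregate_sequential_or_and(vectors))         # (0.5, 0.75, 0.5)
print(aggregate_or_branching(vectors, [0.7, 0.3]))  # (0.7, 0.85, 0.85)
```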
4 Global Causal Link Selection
In the following we study the optimal composition⁵ as the selection of causal links that optimizes the overall quality of the composition. On the one hand, the selection can be locally optimized at each abstract causal link cl^A_{i,j} of the composition, but two main issues arise. First, the local selection of a candidate link cl^k_{i,j} enforces a specific service for both tasks Ti and Tj. Thus, these constraints can no longer ensure selecting either the best links for its closest abstract links cl^A_{α,i} and cl^A_{j,β} or the optimal composition (e.g., the best local selection in cl^A_{1,2}, i.e., cl^1_{1,2}, does not lead to the optimal composition in Figure 4). Secondly, quality constraints may not be satisfied, leading to a sub-optimal composition, e.g., a constraint requiring a robustness of more than 70% cannot be enforced. On the other hand, the naive global approach considers an exhaustive search for the optimal composition among all the executable compositions. Let |cl^A_{i,j}| be the number of abstract links in a composition and n be the number of candidate services per task; the total number of executable causal link compositions is n^(2·|cl^A_{i,j}|), making this approach impractical for large scale composition. Here, we address these issues by presenting an integer linear programming (IP) [21] based global causal link selection, which i) further constrains causal links, and ii) meets a given objective.
4.1 IP Based Global Selection & Objective Function
There are 3 inputs in an IP problem: an objective function, a set of integer decision variables (restricted to value 0 or 1), and a set of constraints (equalities or inequalities), where both the objective function and the constraints must be linear. IP attempts to maximize or minimize the value of the objective function by adjusting the values of the variables while enforcing the constraints. The problem of retrieving an optimal executable composition is mapped into an IP problem. Here we suggest to formalize its objective function. To this end, the robustness, common description rate and matching values Q^λ_l (1 ≤ λ ≤ p, l ∈ {r, cd, m}) of the p potential executable compositions have first been determined by means of the aggregation functions in Table 1. Then, the latter quality values Q^λ_r, Q^λ_cd, Q^λ_m have been scaled according to (4):
Q̃^λ_l = (Q^λ_l − Q^min_l) / (Q^max_l − Q^min_l) if Q^max_l − Q^min_l ≠ 0, and 1 otherwise, for l ∈ {r, cd, m}.   (4)
In (4), Q^max_l is the maximal value of the l-th quality criterion whereas Q^min_l is the minimal value of the l-th quality criterion. The complexity of this scaling phase is linear in the number of abstract links in the composition. Finally, the objective function (5) of the IP problem follows.⁵
The relation and combination with quality of services is not addressed here.
max_{1≤λ≤p} ( Σ_{l∈{r,cd,m}} Q̃^λ_l × ω_l )   (5)
where ω_l ∈ [0, 1] is the weight assigned to the l-th quality criterion and Σ_{l∈{r,cd,m}} ω_l = 1. In this way preferences on the quality of the desired executable compositions can be done by simply adjusting ω_l, e.g., the Common Description rate could be weighted higher.
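Equations (4) and (5) amount to a min-max normalisation followed by a weighted maximisation; a minimal sketch with illustrative values follows.

```python
def scale(values):
    """Equation (4): min-max scaling of one quality criterion over the
    p candidate compositions; returns 1.0 for a constant criterion."""
    lo, hi = min(values), max(values)
    if hi - lo == 0:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def best_composition(quality, weights):
    """Equation (5): pick the composition maximising the weighted sum of
    the scaled criteria.  quality[l] lists Q_l for each composition."""
    scaled = {l: scale(vs) for l, vs in quality.items()}
    p = len(next(iter(quality.values())))
    scores = [sum(weights[l] * scaled[l][i] for l in quality) for i in range(p)]
    return max(range(p), key=scores.__getitem__), scores

quality = {"r": [0.5, 1.0], "cd": [0.6, 0.9], "m": [0.5, 0.25]}
weights = {"r": 0.4, "cd": 0.4, "m": 0.2}
print(best_composition(quality, weights))  # (1, [0.2, 0.8])
```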
4.2 Integer Variables & Constraints of IP Problem
For every candidate link cl^k_{i,j} (1 ≤ k ≤ n) of an abstract link cl^A_{i,j}, we include an integer variable y^k_{i,j} in the IP problem indicating the selection or exclusion of link cl^k_{i,j}. By convention, y^k_{i,j} is 1 if the k-th candidate link cl^k_{i,j} is selected to concretize cl^A_{i,j} between tasks Ti and Tj, and 0 otherwise. The selected links will form an optimal executable composition satisfying (5) and meeting the following constraints:
Allocation Constraint. Only one candidate link should be selected for each abstract link cl^A_{i,j} between tasks Ti and Tj. This constraint is formalized in (6) by exploiting the integer variables y^k_{i,j}, 1 ≤ k ≤ n:
Σ_{k=1}^{n} y^k_{i,j} = 1,  ∀cl^A_{i,j}   (6)
Example 5. (Allocation Constraint) Suppose the sequential composition of tasks T1, T2, T3 in Figure 4. Two candidate causal links can be applied between tasks T1 and T2, i.e., cl^1_{1,2} and cl^2_{1,2}. Since only one candidate between two tasks will be selected, we have y^1_{1,2} + y^2_{1,2} = 1. Similarly, y^1_{2,3} + y^2_{2,3} = 1 for cl^A_{2,3}.
Incompatibility Constraint. Since the selection of a candidate causal link cl^k_{i,j} for cl^A_{i,j} enforces a specific service for both tasks Ti (e.g., si) and Tj (e.g., sj), the number of candidate links concretizing its closest abstract links cl^A_{α,i} and cl^A_{j,β} is highly reduced. Indeed the candidate links for cl^A_{j,β} (cl^A_{α,i}) have to use only input (output) parameters of sj (si). Thus, a constraint (7) for each pair of incompatible candidate links (cl^k_{i,j}, cl^l_{j,β}) is required in our IP problem:
y^k_{i,j} + y^l_{j,β} ≤ 1,  ∀cl^A_{i,j} ∀cl^A_{j,β}   (7)
Example 6. (Incompatibility Constraint) Suppose the composition in Figure 4. According to (7), the incompatibility constraints are i) y^1_{1,2} + y^2_{2,3} ≤ 1, ii) y^2_{1,2} + y^1_{2,3} ≤ 1. Indeed (cl^1_{1,2}, cl^2_{2,3}) and (cl^2_{1,2}, cl^1_{2,3}) are pairs of incompatible candidate links since task T2 cannot be performed by two distinct services sa and sb.
Besides (6) and (7), IP constraints on the quality criteria of the whole abstract composition are required. Here, we focus on sequential and AND-Branching compositions, but a similar formalization for OR-Branching compositions and a fortiori their combinations is required.
Robustness Constraint. Let r^k_{i,j} be a function of (i, j, k) representing the robustness quality of a causal link cl^k_{i,j}. Constraint (8) is required to capture the robustness quality of a causal link composition:
Q_r = (1 / |cl^A_{i,j}|) Σ_{cl^A_{i,j}} Σ_{k=1}^{n} r^k_{i,j} · y^k_{i,j}   (8)
An additional constraint (9) can be used to constrain the robustness quality of the executable composition to not be lower than L:
(1 / |cl^A_{i,j}|) Σ_{cl^A_{i,j}} Σ_{k=1}^{n} r^k_{i,j} · y^k_{i,j} ≥ L,  L ∈ [0, 1]   (9)
Common Description Rate Constraint. Let cd^k_{i,j} be a function of (i, j, k) representing the Common Description rate of a link cl^k_{i,j}. Its constraint is defined as (8) and (9), replacing Q_r by Q_cd and r^k_{i,j} by cd^k_{i,j}.
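The allocation and incompatibility constraints, together with the weighted objective, can be handed to any off-the-shelf IP solver. The sketch below uses the PuLP modelling library as an assumption (the experiments reported later rely on a commercial solver); the link names, scores and service pairs are illustrative.

```python
# A minimal IP model for global causal link selection, assuming the PuLP
# package is installed (pip install pulp).  Illustrative data only.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Candidate links per abstract link, each with an aggregated (scaled, weighted)
# score and the pair of concrete services it enforces for its two tasks.
candidates = {
    ("T1", "T2"): [("cl12_1", 0.9, ("s1", "sa")), ("cl12_2", 0.4, ("s1", "sb"))],
    ("T2", "T3"): [("cl23_1", 0.3, ("sa", "s3")), ("cl23_2", 0.9, ("sb", "s3"))],
}

prob = LpProblem("causal_link_selection", LpMaximize)
y = {name: LpVariable(name, cat=LpBinary)
     for links in candidates.values() for name, _, _ in links}

# Objective: maximise the total score of the selected links.
prob += lpSum(score * y[name] for links in candidates.values() for name, score, _ in links)

# Allocation constraint (6): exactly one candidate per abstract link.
for links in candidates.values():
    prob += lpSum(y[name] for name, _, _ in links) == 1

# Incompatibility constraint (7): links sharing task T2 must agree on its service.
for n1, _, (_, svc1) in candidates[("T1", "T2")]:
    for n2, _, (svc2, _) in candidates[("T2", "T3")]:
        if svc1 != svc2:
            prob += y[n1] + y[n2] <= 1

prob.solve()
print({name: var.value() for name, var in y.items()})
```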
Figure 4. Tasks, Candidate Services & Causal Links.
Matching Quality Constraint. Among the criteria used to select causal links, the Matching quality is associated with a non-linear aggregation function (see Table 1). A transformation into a linear function is then required to capture it in the IP problem. Let m^k_{i,j} be a function of (i, j, k) representing the Matching quality of causal link cl^k_{i,j}. The overall Matching quality of the executable composition is:
Q_m = Π_{cl^A_{i,j}} Π_{k=1}^{n} (m^k_{i,j})^(y^k_{i,j})   (10)
The Matching quality constraints can be linearised by applying the logarithm function ln. Equation (10) then becomes:
ln(Q_m) = Σ_{cl^A_{i,j}} Σ_{k=1}^{n} ln(m^k_{i,j}) · y^k_{i,j}   (11)
since Σ_{k=1}^{n} y^k_{i,j} = 1 and y^k_{i,j} = 1 or 0 for each causal link cl^A_{i,j}; ln(Q_m) is thus formalized to capture the Matching quality in our work. Changing a non-linear constraint into its linear form also requires linearising the objective function. Thus, (12) is replaced by (13) in (4):
(Q^λ_m − Q^min_m) / (Q^max_m − Q^min_m)   (12)
(ln(Q^λ_m) − ln(Q^min_m)) / (ln(Q^max_m) − ln(Q^min_m))   (13)
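Because each y^k_{i,j} is 0 or 1 and exactly one candidate is selected per abstract link, taking logarithms turns the product (10) into the linear expression (11); the following short numerical check uses illustrative values.

```python
from math import log, prod, isclose

m = {("A", 1): 0.75, ("A", 2): 1.0, ("B", 1): 0.5}   # matching qualities m^k_{i,j}
y = {("A", 1): 1, ("A", 2): 0, ("B", 1): 1}          # one candidate selected per abstract link

q_m = prod(m[k] ** y[k] for k in m)                  # Equation (10)
lin = sum(log(m[k]) * y[k] for k in m)               # Equation (11)
print(isclose(log(q_m), lin))                        # True
```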
Local Constraint. The IP problem can also include local selection and encompass local constraints. Such constraints can then predicate on properties of a single link and can be formally included in the model. In case a target causal link cl^A_{i,j} requires its local robustness to be higher than a given value v, this constraint is defined by (14):
Σ_{k=1}^{n} r^k_{i,j} · y^k_{i,j} > v,  v ∈ [0, 1]   (14)
Local constraints are enforced during the causal links selection. Those which violate the local constraints are filtered from the list of candidate links, reducing the number of variables of the model. The proposed method for translating the problem of selecting an optimal execution composition into an IP problem is generic and, although it has been illustrated with criteria introduced in Section 3, other semantic criteria to value causal links can be accommodated.
5 Computational Complexity & Experimentation
The optimization problem formulated in Section 4, which is equivalent to an IP problem, is NP-hard [17]. In case the number of abstract and candidate causal links is very high, finding the exact optimal solution to such a problem takes exponential run time in the worst case, and is thus not practical. However, our approach scales well by running a heuristic-based IP solver wherein hundreds of abstract and candidate causal links are involved. This is a suitable upper bound for practicable industrial applications. We conducted experiments on an Intel(R) Core(TM)2 CPU, 1.86GHz with 512 RAM. Compositions with up to 500 abstract causal links and 100 candidates for each abstract link have been considered. In our experiments we assumed that robustness, common
description rate and matching quality of each causal link have been inferred in a pre-processing step of semantic reasoning. From these, the IP model formulation is computed, and the optimization problem is solved by running CPLEX, a state-of-the-art integer linear programming solver based on the branch and cut technique⁶ [21]. The experimentation (Figure 5) aimed at comparing the global selection based approach by IP with the local optimization and naive global selection (i.e., exhaustive search). We measured the computation cost (in ms) of selecting causal links to create an optimal executable composition under the three different selection approaches.
Figure 5. Number of Abstract Causal Links vs. Computation Cost for Optimal Executable Composition (100 candidates for each abstract causal link; curves for global selection using exhaustive search, global selection using IP, and local optimization based selection).
The computation cost of global selection by exhaustive search is very high, even at a very small scale in terms of the number of abstract causal links and their candidates. Although the computation cost of global selection by IP is higher than that of local optimization, it is still acceptable. Finding the optimal solution to the optimization problem takes 10 seconds for a composition of 450 abstract causal links with 100 candidate links (i.e., 10 candidate services per task). In case of a higher number of links, the problem can, for instance, be divided into several global selection problems. Alternatively, sub-optimal solutions satisfying revisited quality thresholds can be sufficient.
6 Related Work
Despite considerable work in the area of service composition, few efforts have specifically addressed optimization in 'causal link'-based service composition. Even if [13] introduces validity and robustness in causal link composition, no quality model is explicitly supported. In addition, the most valid and robust compositions are only addressed in their future work. In contrast, we present a model with various types of quality criteria used for optimizing the composition. Unlike our work, which considers the quality of causal links, [23, 2] focused on QoS-aware service composition. To this end, they suggest a QoS-driven approach to select candidate services valued by non-functional criteria such as price, execution time, and reliability. In the same way as our approach, they consider their problem as an optimization problem. To address this issue, different optimization strategies can be adopted, e.g., Integer Programming [23], Genetic Algorithms (GAs) [8], or Constraint Programming [11]. As discussed in [8], GAs better handle non-linearity of aggregation functions, and better scale up when the number of candidate services for each abstract service is high. In IP-based approaches all quality criteria are used for specifying both constraints and the objective function. In contrast to our problem, the incompatibility constraints are not required since they assume independence between the services of any task. The global selection problem is also modelled as a knapsack problem [22], wherein [3] performed dynamic programming to solve the problem. Unfortunately, all the previous QoS-aware service composition approaches consider only causal links valued by an Exact match. The causal link quality is thus disregarded by these approaches.
LINDO API version 5.0, Lindo Systems Inc. http://www.lindo.com/
7 Conclusion and Future Work
In this work we study causal link based semantic web service composition. Our approach has been directed to meet the main challenge facing this problem, i.e., how to effectively retrieve optimal compositions of causal links. To this end we have first presented a general and extensible model to evaluate the quality of both elementary causal links and compositions of causal links. Since the global causal link selection is formalized as an optimization problem, IP techniques are used to compute optimal executable compositions of services. Our global selection based approach is not only more suitable than the local approach but also outperforms the naive approach. Moreover, the experimental results show an acceptable computation cost of the IP-based global selection for a high number of abstract and candidate causal links. Since several executable compositions maximizing the overall quality of causal links may be retrieved, the main direction for future work is to consider optimality for quality of service (driven by empirical analysis of composition usage) to further optimize them.
REFERENCES [1] Anupriya Ankolenkar, Massimo Paolucci, Naveen Srinivasan, and Katia Sycara, ‘The owl-s coalition, owl-s 1.1’, Technical report, (2004). [2] Danilo Ardagna and Barbara Pernici, ‘Adaptive service composition in flexible processes’, IEEE Trans. Software Eng., 33(6), 369–384, (2007). [3] Ismailcem Budak Arpinar, Ruoyan Zhang, Boanerges Aleman-Meza, and Angela Maduko, ‘Ontology-driven web services composition platform’, Inf. Syst. E-Business Management, 3(2), 175–199, (2005). [4] F. Baader and W. Nutt, in The Description Logic Handbook: Theory, Implementation, and Applications, (2003). [5] Rainer Berbner, Michael Spahn, Nicolas Repp, Oliver Heckmann, and Ralf Steinmetz, ‘Heuristics for qos-aware web service composition’, in ICWS, pp. 72–82, (2006). [6] Tim Berners-Lee, James Hendler, and Ora Lassila, ‘The semantic web’, Scientific American, 284(5), 34–43, (2001). [7] S. Brandt, R. Kusters, and A. Turhan, ‘Approximation and difference in description logics’, in KR, pp. 203–214, (2002). [8] Gerardo Canfora, Massimiliano Di Penta, Raffaele Esposito, and Maria Luisa Villani, ‘An approach for qos-aware service composition based on genetic algorithms’, in GECCO, pp. 1069–1075, (2005). [9] Simona Colucci, Tommaso Di Noia, Eugenio Di Sciascio, Francesco M. Donini, and Marina Mongiello, ‘Concept abduction and contraction in description logics’, in DL, (2003). [10] David Harel and Amnon Naamad, ‘The statemate semantics of statecharts’, ACM Trans. Softw. Eng. Methodol., 5(4), 293–333, (1996). [11] Ahlem Ben Hassine, Shigeo Matsubara, and Toru Ishida, ‘A constraintbased approach to horizontal web service composition’, in ISWC, pp. 130–143, (2006). [12] Ralf K¨usters, Non-Standard Inferences in Description Logics, volume 2100 of Lecture Notes in Computer Science, Springer, 2001. [13] Freddy L´ecu´e and Alexandre Delteil, ‘Making the difference in semantic web service composition.’, in AAAI, pp. 1383–1388, (2007). [14] Freddy L´ecu´e and Alain L´eger, ‘A formal model for semantic web service composition’, in ISWC, pp. 385–398, (2006). [15] L. Li and I. Horrocks, ‘A software framework for matchmaking based on semantic web technology’, in WWW, pp. 331–339, (2003). [16] M. Paolucci, T. Kawamura, T.R. Payne, and K. Sycara, ‘Semantic matching of web services capabilities’, in ISWC, pp. 333–347, (2002). [17] Christos H. Papadimitriou, ‘On the complexity of integer programming’, J. ACM, 28(4), 765–768, (1981). [18] K. Sivashanmugam, K. Verma, A. Sheth, and J. Miller, ‘Adding semantics to web services standards’, in ICWS, pp. 395–401, (2003). [19] Michael K. Smith, Chris Welty, and Deborah L. McGuinness, ‘Owl web ontology language guide’, W3c recommendation, W3C, (2004). [20] Gunnar Teege, ‘Making the difference: A subtraction operation for description logics’, in KR, pp. 540–550, (1994). [21] L. Wolsey, Integer Programming, John Wiley and Sons, 1998. [22] Tao Yu, ‘Service selection algorithms for composing complex services with multiple qos constraints’, in ICSOC, pp. 130–143, (2005). [23] Liangzhao Zeng, Boualem Benatallah, Marlon Dumas, Jayant Kalagnanam, and Quan Z. Sheng, ‘Quality driven web services composition’, in WWW, pp. 411–421, (2003).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-50
Extending the Knowledge Compilation Map: Closure Principles
Hélène Fargier1 and Pierre Marquis2
Abstract. We extend the knowledge compilation map introduced by Darwiche and Marquis with new propositional fragments obtained by applying closure principles to several fragments studied so far. We investigate two closure principles: disjunction and implicit forgetting (i.e., existential quantification). Each introduced fragment is evaluated w.r.t. several criteria, including the complexity of basic queries and transformations, and its spatial efficiency is also analyzed.
1 INTRODUCTION
This paper is concerned with knowledge compilation (KC). The key idea underlying KC is to pre-process parts of the available data (i.e., turning them into a compiled form) for improving the efficiency of some computational tasks (see among others [2, 1, 10, 4]). A research line in KC [7, 3] addresses the following important issue: How to choose a target language for knowledge compilation? In [3], the authors argue that the choice of a target language for a compilation purpose in the propositional case must be based both on the set of queries and transformations which can be achieved in polynomial time when the data to be exploited are represented in the language, as well as the spatial efficiency of the language (i.e., its ability to represent data using little space). Thus, the KC map reported in [3] is an evaluation of dozen of significant propositional languages (called propositional fragments) w.r.t. several dimensions: the spatial efficiency (i.e., succinctness) of the fragment and the class of queries and transformations it supports in polynomial time. The basic queries considered in [3] include tests for consistency, validity, implicates (clausal entailment), implicants, equivalence, sentential entailment, counting and enumerating theory models (CO, VA, CE, EQ, SE, IM, CT, ME). The basic transformations are conditioning (CD), (possibly bounded) closures under the connectives ∧, ∨, and ¬ ( ∧ C, ∧BC, ∨C, ∨BC, ¬C) and (possibly bounded) forgetting which can be viewed as a closure operation under existential quantification (FO, SFO). The KC map reported in [3] has already been extended to new propositional languages, queries and transformations in [12, 5, 11]. In this paper, we extend the KC map with new propositional fragments obtained by applying closure principles to several fragments studied so far. Intuitively, a closure principle is a way to define a new propositional fragment from a previous one. In this paper, we investigate in detail two disjunctive closure principles, disjunction (∨) 1 2
1 IRIT-CNRS, Université Paul Sabatier, France, email: [email protected]. 2 Université Lille-Nord de France, Artois, CRIL UMR CNRS 8188, France, email: [email protected].
and implicit forgetting (∃), and their combinations. Roughly speaking, the disjunction principle when applied to a fragment C leads to a fragment C[∨] which allows disjunctions of formulas from C, while implicit forgetting applied to a fragment C leads to a fragment C[∃] which allows existentially quantified formulas from C. Obviously enough, whatever C, C[∨] satisfies polytime closure under ∨ (∨C) and C[∃] satisfies polytime forgetting (FO). Applying any/both of those two principles may lead to new fragments, which can prove strictly more succinct than the underlying fragment C; interestingly, this gain in efficiency does not lead to a complexity shift w.r.t. the main queries and transformations; indeed, among other things, our results show that whenever C satisfies CO (resp. CD), then C[∨] and C[∃] satisfy CO (resp. CD). The remainder of this paper is organized as follows. In Section 2, we define the language of quantified propositional DAGs. In Section 3, we extend the usual notions of queries, transformations and succinctness to this language. In Section 4, we introduce the general principle of closure by a connective or a quantification before focusing on the disjunctive closures of the fragments considered in [3] and studying their attractivity for KC, thus extending the KC map. In Section 5, we discuss the results. Finally, Section 6 concludes the paper.
2 A GLIMPSE AT QUANTIFIED PDAGS
All the propositional fragments we consider in this paper are subsets of the following language of quantified propositional DAGs QPDAG: Definition 1 (quantified PDAGs) Let P S be a denumerable set of propositional variables (also called atoms). • QPDAG is the set of all finite, single-rooted DAGs α (called formulas) where each leaf node is labeled by a literal over P S or one of the two Boolean constants or ⊥, and each internal node is labeled by ∧ or ∨ and has arbitrarily many children or is labeled by ¬, ∃x or ∀x (where x ∈ P S) and has just one child. • Qp PDAG is the subset of all proper formulas of QPDAG, where a formula α is proper iff for every literal l = x or l = ¬x labelling a leaf of α, at most one path from the root of α to this leaf contains quantifications of the form ∃x or ∀x, and if such a path exists, it is the unique path from the root of α to the leaf. Restricting the language QPDAG to proper formulas α ensures that every occurrence of a variable x corresponding to a literal at a leaf of α depends on at most one quantification on x, and is either free or bound. As a consequence (among others), conditioning a proper formula can be achieved as usual (without requiring any duplication of nodes).
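A direct data-structure reading of Definition 1 might look as follows; the class and field names are ours, not the authors', and the encoding is only a sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One node of a (quantified) propositional DAG.
    label is a literal ('x', '-x'), 'T', 'F', a connective ('and', 'or', 'not')
    or a quantifier ('exists', 'forall'); quantified nodes carry a variable
    and have exactly one child, 'not' has one child, 'and'/'or' any number."""
    label: str
    variable: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

# Exists x. (x or not y) -- a proper formula: the only quantification on x dominates its leaf.
formula = Node("exists", "x", [Node("or", children=[Node("x"), Node("-y")])])
print(formula.label, formula.variable, len(formula.children))  # exists x 1
```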
PDAG [12] is the subset of Qp PDAG obtained by removing the possibility to have internal nodes labeled by ∃ or ∀; PDAG-NNF [3] (resp. ∃PDAG-NNF, resp. ∀PDAG-NNF) is the subset of Qp PDAG obtained by removing the possibility to have internal nodes labeled by ¬, ∃ or ∀ (resp. ¬, ∀, resp. ¬, ∃). Distinguished formulas from QPDAG are the literals over P S; if V is any subset of P S, LV denotes the set of all literals built over V , i.e., {x, ¬x | x ∈ V }. If a literal l of LP S is an atom x from P S, it is said to be a positive literal; otherwise it has the form ¬x with x ∈ P S and it is said to be a negative literal. If l is a literal built up from the atom x, we have var(l) = x. A clause (resp. a term) is a (finite) disjunction (resp. conjunction) of literals or the constant ⊥ (resp. ). The size |α| of any QPDAG formula α is the number of nodes plus the number of arcs in α. The set V ar(α) of free variables of a Qp PDAG formula α is defined in the standard way. Let I be an interpretation over P S (i.e., a total function from P S to BOOL = {0, 1}). The semantics of a QPDAG formula α in I is the truth value from BOOL defined inductively in the standard way; the notions of model, logical consequence (|=) and logical equivalence (≡) are also as usual. Finally, if α ∈ QPDAG and X = {x1 , . . . , xn } ⊆ P S, then ∃X.α (resp. ∀X.α) is a short for ∃x1 .(∃x2 .(...∃xn .α)...) (resp. ∀x1 .(∀x2 .(...∀xn .α)...)) (this notation is well-founded since whatever the chosen ordering on X, the resulting formulas are logically equivalent).
3 QUERIES, TRANSFORMATIONS, AND SUCCINCTNESS
The following queries CO, VA, CE, EQ, SE, IM, CT, ME for PDAG-NNF formulas have been considered in [3]; their importance is discussed in depth in [3], so we refrain from recalling it here; we extend them to Qp PDAG formulas and add to them the MC query (model checking), which is trivial for PDAG formulas (every formula from PDAG satisfies MC), but not for Qp PDAG formulas. Definition 2 (queries)
Let C denote any subset of Qp PDAG.
• C satisfies CO (resp. VA) iff there exists a polytime algorithm that maps every formula α from C to 1 if α is consistent (resp. valid), and to 0 otherwise. • C satisfies MC iff there exists a polytime algorithm that maps every formula α from C and every interpretation I over V ar(α) to 1 if I is a model of α, and to 0 otherwise. • C satisfies CE iff there exists a polytime algorithm that maps every formula α from C and every clause γ to 1 if α |= γ holds, and to 0 otherwise. • C satisfies EQ (resp. SE) iff there exists a polytime algorithm that maps every pair of formulas α, β from C to 1 if α ≡ β (resp. α |= β) holds, and to 0 otherwise. • C satisfies IM iff there exists a polytime algorithm that maps every formula α from C and every term γ to 1 if γ |= α holds, and to 0 otherwise. • C satisfies CT iff there exists a polytime algorithm that maps every formula α from C to a nonnegative integer that represents the number of models of α over V ar(α) (in binary notation). • C satisfies ME iff there exists a polynomial p(., .) and an algorithm that outputs all models of an arbitrary formula α from C in time p(n, m), where n is the size of α and m is the number of its models (over V ar(α)). The following transformations for PDAG-NNF formulas have been considered in [3]; again, we extend them to Qp PDAG formulas:
The following transformations for PDAG-NNF formulas have been considered in [3]; again, we extend them to Qp PDAG formulas:

Definition 3 (transformations) Let C denote any subset of Qp PDAG.
• C satisfies CD iff there exists a polytime algorithm that maps every formula α from C and every consistent term γ to a formula from C that is logically equivalent to the conditioning α | γ of α on γ, i.e., the formula obtained by replacing each free occurrence of a variable x of α by ⊤ (resp. ⊥) if x (resp. ¬x) is a positive (resp. negative) literal of γ.
• C satisfies FO iff there exists a polytime algorithm that maps every formula α from C and every subset X of variables from PS to a formula from C equivalent to ∃X.α. If the property holds for each singleton X, we say that C satisfies SFO.
• C satisfies ∧C (resp. ∨C) iff there exists a polytime algorithm that maps every finite set of formulas α1, . . . , αn from C to a formula of C that is logically equivalent to α1 ∧ . . . ∧ αn (resp. α1 ∨ . . . ∨ αn).
• C satisfies ∧BC (resp. ∨BC) iff there exists a polytime algorithm that maps every pair of formulas α and β from C to a formula of C that is logically equivalent to α ∧ β (resp. α ∨ β).
• C satisfies ¬C iff there exists a polytime algorithm that maps every formula α from C to a formula of C logically equivalent to ¬α.

Finally, the following notion of succinctness (modeled as a preorder over propositional fragments) has been considered in [3]; we also extend it to QPDAG formulas:

Definition 4 (succinctness) Let C1 and C2 be two subsets of QPDAG. C1 is at least as succinct as C2, denoted C1 ≤s C2, iff there exists a polynomial p such that for every formula α ∈ C2, there exists an equivalent formula β ∈ C1 with |β| ≤ p(|α|). ∼s is the symmetric part of ≤s, defined by C1 ∼s C2 iff C1 ≤s C2 and C2 ≤s C1. <s is the asymmetric part of ≤s, defined by C1 <s C2 iff C1 ≤s C2 and C2 ≰s C1.
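To illustrate the CD transformation of Definition 3, the following continuation of the node sketch above (again purely illustrative, not the authors' code) conditions a quantifier-free PDAG formula on a consistent term γ, given as a map from variables to truth values; each affected leaf is simply replaced by a constant, so the result stays in the same fragment and is computed in linear time.

def condition(node, gamma, cache=None):
    """Return a DAG equivalent to node | gamma (free occurrences replaced by constants)."""
    if cache is None:
        cache = {}
    if id(node) in cache:
        return cache[id(node)]
    if node.kind == 'const':
        new = node
    elif node.kind == 'lit':
        if node.var in gamma:     # x (resp. ¬x) in gamma: the leaf becomes a constant
            new = Node('const', value=(gamma[node.var] == node.sign))
        else:
            new = node
    else:
        new = Node(node.kind,
                   children=tuple(condition(c, gamma, cache) for c in node.children))
    cache[id(node)] = new
    return new

# Conditioning (x1 ∨ ¬x2) ∧ x3 on the term x1 yields a formula equivalent to x3.
print(mc(condition(phi, {'x1': True}), {'x2': True, 'x3': True}))   # True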
4 EXTENDING THE KC MAP BY DISJUNCTIVE CLOSURES

4.1 Closure Principles
Intuitively, a closure principle is a way to define a new propositional fragment starting from a previous one, through the application of “operators” (i.e., connectives or quantifications):3

Definition 5 (closures) Let C be a subset of QPDAG and let Δ be any finite subset of {∨, ∧, ¬, ∃, ∀}. C[Δ] is the subset of QPDAG inductively defined as follows:4
• if α ∈ C, then α ∈ C[Δ],
• if δ ∈ Δ ∩ {∨, ∧} and αi ∈ C[Δ] for i ∈ {1, . . . , n} with n > 0, then δ(α1, . . . , αn) ∈ C[Δ],
• if ¬ ∈ Δ and α ∈ C[Δ], then ¬α ∈ C[Δ],
• if δ ∈ Δ ∩ {∀, ∃}, α ∈ C[Δ], and x ∈ PS, then δx.α ∈ C[Δ].

Observe that if C ⊆ Qp PDAG then C[Δ] ⊆ Qp PDAG: closure does not question properness. We also have the following easy proposition, which makes precise the interplay between elements of Δ in the general case:
3 Other closure principles could be defined in a similar way, were the underlying propositional language to contain other connectives.
4 In order to alleviate the notation, when Δ = {δ1, . . . , δn}, we write C[δ1, . . . , δn] instead of C[{δ1, . . . , δn}].
Proposition 1 For every subset C of QPDAG and all finite subsets Δ1, Δ2 of {∨, ∧, ¬, ∃, ∀}, we have:
• C[∅] = C.
• If Δ1 ⊆ Δ2, then C[Δ1] ⊆ C[Δ2].
• (C[Δ1])[Δ2] ⊆ C[Δ1 ∪ Δ2].
• If Δ1 ⊆ Δ2 or Δ2 ⊆ Δ1, then (C[Δ1])[Δ2] = C[Δ1 ∪ Δ2].
Before focusing on some specific “operators”, we add to succinctness the following notions of polynomial translation and polynomial equivalence, which prove helpful in the following evaluations:

Definition 6 (polynomial translation) Let C1 and C2 be two subsets of QPDAG. C1 is said to be polynomially translatable into C2, noted C1 ≥P C2, iff there exists a polytime algorithm f such that for every α ∈ C1, we have f(α) ∈ C2 and f(α) ≡ α.

Like ≥s, ≥P is a preorder (i.e., a reflexive and transitive relation) over the power set of QPDAG. It refines the spatial efficiency preorder ≥s over QPDAG in the sense that for any two subsets C1 and C2 of QPDAG, if C1 ≥P C2, then C1 ≥s C2 (but the converse does not hold in general). Thus, if C1 is polynomially translatable into C2, then C2 is at least as succinct as C1. Furthermore, whenever C1 is polynomially translatable into C2, every query which is supported in polynomial time in C2 is also supported in polynomial time in C1; conversely, every query which is not supported in polynomial time in C1 unless the polynomial hierarchy collapses cannot be supported in polynomial time in C2, unless the polynomial hierarchy collapses. The corresponding indifference relation ∼P, given by C1 ∼P C2 iff C1 ≥P C2 and C2 ≥P C1, is an equivalence relation; when C1 ∼P C2, C1 and C2 are said to be polynomially equivalent. Obviously enough, polynomially equivalent fragments are equally efficient (and succinct) and possess the same set of tractable queries and transformations.

Before presenting some useful polynomial equivalences, we first need to introduce the notion of stability under uniform renaming. It characterizes the subsets C of Qp PDAG for which, intuitively, the choice of variable names does not really matter; technically, it allows one to rename (bound) variables in a formula α of C without leaving the fragment.

Definition 7 (stability under uniform renaming) Let C be any subset of Qp PDAG. C is stable under uniform renaming iff for every α ∈ C, there exist arbitrarily many distinct bijections r from Var(α) to subsets V of fresh variables from PS (i.e., not occurring in α) such that the formula r(α), obtained by replacing in α (in a uniform way) every free occurrence of x ∈ Var(α) by r(x), belongs to C as well.

We are now ready to present more specific results:

Proposition 2 Let C be any subset of Qp PDAG, s.t. C is stable under uniform renaming. We have:
• (C[∃])[∨] ∼P (C[∨])[∃] ∼P C[∨, ∃].
• (C[∀])[∧] ∼P (C[∧])[∀] ∼P C[∧, ∀].

It is important to note that such polynomial equivalences, showing in some sense that the “sequential” closure of a propositional fragment stable under uniform renaming by a set of “operators” among {∨, ∃} (resp. among {∧, ∀}) is equivalent to its “parallel” closure, cannot be systematically guaranteed for any choices of fragments
and “operators”. For instance, if C is the set LPS ∪ {⊤, ⊥}, then (C[∨])[∧] is the set of all CNF formulas, (C[∧])[∨] is the set of all DNF formulas, and C[∨, ∧] is the set of all PDAG-NNF formulas. From the succinctness results reported in [3], it is easy to conclude that those three fragments are not pairwise polynomially equivalent. Similarly, if C is the set of all clauses over PS, then (C[∧])[∃] and C[∧, ∃] are polynomially equivalent to CNF[∃], but (C[∃])[∧] is polynomially equivalent to CNF, which is not polynomially equivalent to CNF[∃] (this follows from the forthcoming Proposition 8).
4.2 Disjunctive Closures
In the rest of this paper, we focus on the two disjunctive closure principles [∨] (closure by disjunction) and [∃] (closure by forgetting), and their combinations. At the start, this choice was motivated by the fact that any closure C[∃] obviously satisfies forgetting, which is an important transformation for a number of applications, including planning, diagnosis, reasoning about action and change, and reasoning under inconsistency (see e.g. [2, 8, 9] for details), while any closure C[∨] clearly preserves the crucial query CO and transformation CD. Our purpose is now to locate on the KC map all languages obtained by applying the disjunctive closure principles to the eight languages PDAG-NNF, DNNF, CNF, OBDD<, DNF, PI, IP, MODS considered (among others) in [3]; all those languages are subsets of PDAG:
• PDAG-NNF is the subset of PDAG consisting of negation normal form formulas.
• DNNF is the subset of PDAG-NNF consisting of decomposable negation normal form formulas.
• CNF is the subset of PDAG-NNF consisting of conjunctive normal form formulas.
• OBDD< is the subset of DNNF consisting of ordered binary decision diagrams, where < is a strict and complete ordering over PS and we assume the ordered set (PS, <)

ϕ1NH(ψ) = {c ∈ ϕ | |P(c)| > 1, |DIψ(c)| = 1},
ϕ≠1NH(ψ) = {c ∈ ϕ | |P(c)| > 1, |DIψ(c)| ≠ 1}.
The following example illustrates the definitions.

Example 4 Suppose that ϕ = c1 ∧ c2 ∧ c3, where
c1 = (¬x1 ∨ ¬x3 ∨ x4 ∨ x6),
c2 = (¬x1 ∨ ¬x2 ∨ x3 ∨ ¬x5 ∨ x6),
c3 = (¬x1 ∨ ¬x2 ∨ ¬x3 ∨ ¬x5 ∨ x6),
and ψ = (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6). Then
DIψ(c1) = {¬x1 ∨ ¬x3 ∨ x4},
DIψ(c2) = {¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6},
DIψ(c3) = {c3},
and ϕH = {c3}, ϕ1NH(ψ) = {c1, c2}, ϕ≠1NH(ψ) = ∅. As predicted, ϕ = ϕH ∧ ϕ1NH(ψ) ∧ ϕ≠1NH(ψ) = c3 ∧ c1 ∧ c2.
Using the above concepts, we now characterize Horn cores.

Definition 6 (ϕ*ψ) Given a CNF ϕ and a formula ψ, define the CNF
ϕ*ψ = ϕH ∧ μϕ(ψ) ∧ ϕ≠1NH(ψ),
where μϕ(ψ) = {c′ ∈ DIψ(c) | c ∈ ϕ1NH(ψ)}. That is, each non-Horn clause c in the CNF for ϕ is replaced by its strengthening to a definite clause c′, if c′ is the only such clause implied by ψ. We now have the following result.

Theorem 1 (Horn Core Characterization) A given Horn CNF ψ is a Horn core of a CNF ϕ if and only if ψ ≡ ϕ*ψ.

The formal proof is omitted here. Intuitively, by construction of ϕ*ψ, any Horn core ψ of ϕ must fulfill ψ ≤ ϕ*ψ. On the other hand, ϕ*ψ ≤ ϕ; thus if ϕ*ψ is equivalent to ψ, it must be a Horn core.

Example 5 (cont'd) In Example 4, we had ϕH = {c3}, ϕ1NH(ψ) = {c1, c2}, and ϕ≠1NH(ψ) = ∅. We thus obtain μϕ(ψ) = DIψ(c1) ∪ DIψ(c2) = {¬x1 ∨ ¬x3 ∨ x4, ¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6}, and thus ϕ*ψ = ϕH ∧ μϕ(ψ) = (¬x1 ∨ ¬x2 ∨ ¬x3 ∨ ¬x5 ∨ x6) ∧ (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6). Now ψ = (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6) ≡ ϕ*ψ; hence, ψ is a Horn core of ϕ.
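To make Definition 6 concrete, here is an illustrative Python sketch (not the authors' code). Clauses are pairs (N, P) of frozensets of variables; entailment of a clause by a Horn CNF is checked by unit propagation in the spirit of [4]. Since the definition of DIψ(c) lies outside this excerpt, the sketch assumes, as the use of DIψ in Examples 4–6 and Figure 3 suggests, that DIψ(c) collects the definite strengthenings N(c) ∪ {xj}, xj ∈ P(c), of c that are implied by ψ.

def horn_entails(psi, clause):
    """Return True iff the Horn CNF psi (a list of (N, P) clauses) implies clause."""
    n_c, p_c = clause
    true_vars = set(n_c)      # assume the negated variables of the clause to be true...
    false_vars = set(p_c)     # ...and its positive variables to be false, then propagate
    changed = True
    while changed:
        changed = False
        for n, p in psi:
            if n <= true_vars:                 # the body of this Horn clause is satisfied
                if not p:
                    return True                # empty head: contradiction, so psi |= clause
                head = next(iter(p))
                if head in false_vars:
                    return True                # head forced true although assumed false
                if head not in true_vars:
                    true_vars.add(head)
                    changed = True
    return False                               # a countermodel exists

def definite_strengthenings(c):
    """All definite clauses N(c) ∪ {xj}, xj ∈ P(c)."""
    n, p = c
    return [(n, frozenset({x})) for x in p]

def DI(psi, c):
    """Assumed reading of DI_psi(c): the definite strengthenings of c implied by psi."""
    return [d for d in definite_strengthenings(c) if horn_entails(psi, d)]

def phi_star(phi, psi):
    """phi*_psi as in Definition 6 (phi and psi given as lists of (N, P) clauses)."""
    result = []
    for c in phi:
        n, p = c
        if len(p) <= 1:
            result.append(c)                   # Horn clause: part of phi_H, kept
        else:
            di = DI(psi, c)
            result.append(di[0] if len(di) == 1 else c)   # mu_phi(psi) resp. unchanged
    return result

# Example 4/5 (variables numbered 1..6): c1 and c2 get strengthened, c3 is kept.
c1 = (frozenset({1, 3}), frozenset({4, 6}))
c2 = (frozenset({1, 2, 5}), frozenset({3, 6}))
c3 = (frozenset({1, 2, 3, 5}), frozenset({6}))
psi = [(frozenset({1, 3}), frozenset({4})), (frozenset({1, 2, 5}), frozenset({6}))]
print(phi_star([c1, c2, c3], psi))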
5 Computation
We now turn to computing a Horn core of a Horn disjunction, for which we exploit the characterization in the previous section. Our strategy is to increase an initial Horn CNF repeatedly, until we arrive at a Horn CNF that satisfies the condition in Theorem 1. To this end, we first consider recognizing a Horn core, and show that the problem is polynomial if a CNF for ϕ is constructible in polynomial time.
5.1 Recognizing Horn Cores

We observe the following facts.

Lemma 2 Let ϕ1 ∨ · · · ∨ ϕl be a disjunction of l ≥ 2 Horn CNFs ϕi, and let ϕ be a CNF for it. Let ψ be a Horn CNF. Then,
1. ϕ*ψ is constructible from ϕ and ψ in polynomial time;
2. checking whether ψ ≤ ϕ*ψ is feasible in polynomial time;
3. checking whether ϕ*ψ ≤ ψ is feasible in polynomial time.
Proof Items 1 and 2 are clearly feasible in polynomial time (note that ϕ*ψ is a CNF). For item 3, we rewrite ϕ*ψ as a Horn disjunction:
ϕ*ψ = ϕH ∧ μϕ(ψ) ∧ ϕ≠1NH(ψ)
    ≡ ϕH ∧ μϕ(ψ) ∧ ϕ≠1NH(ψ) ∧ ϕ1NH(ψ)
    ≡ ϕ ∧ μϕ(ψ) ≡ (ϕ1 ∧ μϕ(ψ)) ∨ · · · ∨ (ϕl ∧ μϕ(ψ)).
As α ∨ β ≤ γ iff α ≤ γ and β ≤ γ, we can check for i = 1, . . . , l that ϕi ∧ μϕ(ψ) ≤ ψ; this is feasible in polynomial time. □

In particular, if l is bounded by a constant, a CNF ϕ for ϕ1 ∨ · · · ∨ ϕl is computable in polynomial time by simple means (e.g., ϕ := S(ϕ1, . . . , ϕl)). We thus obtain the following result.

Theorem 2 Deciding whether a given Horn CNF ψ is a Horn core of a given Horn disjunction ϕ = ϕ1 ∨ · · · ∨ ϕl, l ≥ 2, is feasible in polynomial time, if a CNF for ϕ is computable in polynomial time. In particular, if l is bounded by a constant, this is decidable in time O(max{n, l} · n · |ψ| · Πi=1..l |ϕi|) (here |γ| is the number of clauses in γ).

Here and later, we assume in the time analysis that clauses c are represented by bitmaps (of size n) for P(c) and N(c).
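Continuing the sketch above, the recognition test behind Theorem 2 can be spelled out directly (illustrative only). Here S(ϕ1, . . . , ϕl) is assumed to be the standard product CNF of a disjunction of CNFs, taking one clause from each disjunct; its actual definition lies outside this excerpt, but this reading is consistent with the bound m̂ = Πi |ϕi| used in the time analysis.

from itertools import product

def S(*phis):
    """A CNF for phi_1 ∨ ... ∨ phi_l: all disjunctions of one clause per phi_i."""
    alpha = []
    for choice in product(*phis):
        n = frozenset().union(*(c[0] for c in choice))
        p = frozenset().union(*(c[1] for c in choice))
        if not (n & p):                        # drop tautological clauses
            alpha.append((n, p))
    return alpha

def is_horn_core(psi, phis):
    """Check psi ≡ phi*_psi (Theorem 1) for phi = phi_1 ∨ ... ∨ phi_l."""
    alpha = S(*phis)
    star = phi_star(alpha, psi)
    # psi ≤ phi*_psi: psi implies every clause of phi*_psi.
    if not all(horn_entails(psi, c) for c in star):
        return False
    # phi*_psi ≤ psi: by Lemma 2, check phi_i ∧ mu ≤ psi for every i.  Taking the
    # whole Horn part of phi*_psi as mu is harmless here, since every clause of
    # alpha is already implied by each phi_i.
    mu = [c for c in star if len(c[1]) <= 1]
    return all(all(horn_entails(list(phi_i) + mu, d) for d in psi) for phi_i in phis)

# Example 6: psi0 below satisfies psi0 ≤ phi, but it is not yet a Horn core.
phi1 = [(frozenset({1, 3}), frozenset({4})), (frozenset({2, 5}), frozenset({6}))]
phi2 = [(frozenset({1, 2}), frozenset({3})), (frozenset({1, 3}), frozenset({6}))]
psi0 = [(frozenset({1, 3}), frozenset({4})), (frozenset({1, 2}), frozenset())]
print(is_horn_core(psi0, [phi1, phi2]))        # False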
5.2 Constructing a Horn Core
Algorithm NEWCORE
Input: Horn CNFs ψ, ϕ1, . . . , ϕl, l ≥ 2.
Output: A Horn core ψ′ of ϕ = ϕ1 ∨ · · · ∨ ϕl such that ψ ≤ ψ′ ≤ ϕ, or "no" if none exists.
Step 1. convert ϕ to a CNF α (e.g., α := S(ϕ1, . . . , ϕl));
  if ψ ≰ α then return "no";
  S≠1 := {c ∈ α | |P(c)| > 1, |DIψ(c)| ≠ 1};
  β := {N(c) ∪ {xj} | c ∈ S≠1, xj ∈ P(c)};
  μ := {c′ ∈ DIψ(c) | c ∈ α, |P(c)| > 1, |DIψ(c)| = 1};
  ψ′ := ψ;
Step 2. while ϕi ∧ μ ≰ ψ′ for some i ∈ {1, . . . , l} do // (ϕ*ψ′ ≰ ψ′)
  begin
    select v ∈ {0, 1}^n witnessing ϕi ∧ μ ≰ ψ′;
    β := β − {c′ ∈ β | c′(v) = 0};
    for each c ∈ S≠1 do
      if a single clause c′ ∈ β fulfills N(c′) = N(c), P(c′) ⊆ P(c) then
        begin S≠1 := S≠1 − {c}; μ := μ ∪ {c′}; end
    ψ′ := {c ∈ α | |P(c)| ≤ 1} ∪ μ ∪ β;
  end{while};
Step 3. Output ψ′.

Figure 3. New algorithm for Horn core computation
We now present our algorithm to construct a Horn core of a Horn disjunction ϕ = ϕ1 ∨ · · · ∨ ϕl that contains a given Horn CNF ψ. If ψ ≰ ϕ, then obviously there is no Horn core ψ′ of ϕ such that ψ ≤ ψ′ ≤ ϕ. Otherwise, we can construct some such ψ′ by iteratively increasing ψ, exploiting the characterization in Theorem 1. The following lemma is crucial.

Lemma 3 Suppose ψ ≤ ϕ and ψ ≢ ϕ*ψ. Then, there exists some v ∈ {0, 1}^n such that (i) ψ(v) = 0 and ϕ*ψ(v) = 1 (i.e., ϕ*ψ ≰ ψ), and (ii) for every such v and Horn CNF ψ′ = ϕH ∧ μϕ(ψ) ∧ β, where β contains for each clause c ∈ ϕ≠1NH(ψ) at least one clause c′ ∈ DIψ(c) such that c′(v) = 1, it holds that ψ < ψ′ ≤ ϕ.

The algorithm NEWCORE, shown in Figure 3, proceeds as follows. After converting ϕ to a CNF α and testing ψ ≤ α, it initializes auxiliary variables and a candidate Horn core ψ′. In Step 2, ψ′ is tested using Lemma 2; if it is not a Horn core yet, ψ′ is repeatedly updated according to Lemma 3.

Example 6 (cont'd) Reconsider ϕ = ϕ1 ∨ ϕ2 in Example 3, where ϕ1 = (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x2 ∨ ¬x5 ∨ x6) and ϕ2 = (¬x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ ¬x3 ∨ x6), and let ψ = (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2). (As seen from Example 4, ψ ≤ ϕ but ψ is not a Horn core of ϕ.) In Step 1 of NEWCORE, α = c1 ∧ c2 ∧ c3 and
DIψ(c1) = {¬x1 ∨ ¬x3 ∨ x4},
DIψ(c2) = {¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x3, ¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6},
DIψ(c3) = {¬x1 ∨ ¬x2 ∨ ¬x3 ∨ ¬x5 ∨ x6}.
Thus,
S≠1 = {c2},
β = {¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x3, ¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6},
μ = {¬x1 ∨ ¬x3 ∨ x4}, and
ψ′ = (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2).
In Step 2, the test of the while loop succeeds, as ϕ1 ∧ μ ≡ ϕ1 ≰ ψ′ holds; e.g., for v = (110011), we have ϕ1(v) = 1 and ψ′(v) = 0. The set β is then updated to β = {¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6}, and for c2 the updates S≠1 := ∅ and μ = {¬x1 ∨ ¬x3 ∨ x4, ¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6} are performed; finally, ψ′ is updated to
ψ′ = (¬x1 ∨ ¬x2 ∨ ¬x3 ∨ ¬x5 ∨ x6) ∧ (¬x1 ∨ ¬x3 ∨ x4) ∧ (¬x1 ∨ ¬x2 ∨ ¬x5 ∨ x6).
The test for the next while-iteration fails, since ψ′ ≡ μ; hence, ψ′ is output. Note that ψ′ is indeed a Horn core of ϕ such that ψ ≤ ψ′.

The following result states that the new algorithm is correct (the formal proof is omitted here).

Theorem 3 NEWCORE correctly computes a Horn core ψ′ of ϕ = ϕ1 ∨ · · · ∨ ϕl such that ψ ≤ ψ′ ≤ ϕ. Moreover, it can be implemented to run in time O(nlm̂(lm̂ + |ψ|)), where m̂ = Πi=1..l |ϕi|.

In particular, if l is bounded by a constant, NEWCORE runs in polynomial time: In Step 1, building α = S(ϕ1, . . . , ϕl) is feasible in time O(nlm̂), and the test ψ ≤ α in time O(n|ψ|m̂). Each DIψ(c) is computable in time O(nl|ψ|), and thus the initial S≠1, β, and μ in time O(m̂nl|ψ|). In total, Step 1 is feasible in time O(nlm̂|ψ|). In Step 2, the while loop is executed at most (l−1)|S≠1| + 1 ≤ (l−1)m̂ + 1 times. Using appropriate data structures, the loop body is executable in time O(lnm̂), and the loop tests need, throughout the computation, total time O(nlm̂(lm̂ + |ψ|)): since μ only increases, for each potential clause c′ in ψ′ all tests ϕi ∧ μ ≤ c′ are feasible in total time O(Σi=1..l n(m̂ + |ϕi|)) = O(lnm̂). There are at most lm̂ such clauses c′ from the initial β and at most |ψ| many from ψ′ \ β. In total, Step 2 is feasible in time O(nlm̂(lm̂ + |ψ|)). In summary, we obtain a bound of O(nlm̂(lm̂ + |ψ|)). However, in practice better behavior is plausible, as |α|, |S≠1| etc. are likely to be smaller than m̂, and far fewer than (l−1)m̂ + 1 loop executions are expected; furthermore, simple optimizations can be incorporated.

Important features of algorithm NEWCORE, which distinguish it from the algorithms CORE and CORE*, are that it can compute targeted Horn cores ψ′ above a given ψ, and that it is nondeterministically complete: upon proper choices of v, each Horn core ψ′ such that ψ ≤ ψ′ ≤ ϕ is obtainable.
6 Horn Envelope of a Horn Disjunction
We now turn to the question of whether a Horn envelope of a Horn disjunction can be computed efficiently. As we show, it has a negative answer, which is a consequence of the intractability of recognizing the Horn envelope. More precisely, the following holds.

Theorem 4 Given Horn CNFs ψ, ϕ1, and ϕ2, deciding whether ψ is a Horn envelope of ϕ = ϕ1 ∨ ϕ2 is co-NP-complete.

Proof (Sketch) As for the membership in co-NP, ψ is not a Horn envelope of ϕ if and only if either (a) ϕ ≰ ψ, or (b) there exists a Horn clause c such that (b.1) ϕ ≤ ψ ∧ c and (b.2) ψ ≰ c (thus ϕ ≤ (ψ ∧ c) < ψ). Such a clause c can be guessed, and the tests (a), (b.1), and (b.2) are feasible in polynomial time. The co-NP-hardness is shown by a reduction from the complement of SAT. Let α = c1 ∧ · · · ∧ cm be a CNF of nonempty clauses ci on variables x1, . . . , xn. Let y, z, x′1, . . . , x′n be fresh variables. Define
=
β2
=
c∗1 ∧ · · · ∧ c∗m ,
V (¬y ∨ ¬x1 ∨ · · · ∨ ¬xn ) ∧ n i=1 (¬y ∨ xi ∨ ¬xi ), W W where c∗i = y ∨ x ∈ N (c ) ¬xj ∨ x ∈ P (c ) ¬xj , and let ϕ 1 = z ∧ β1 ∧ β 2 ,
ϕ2 = y ∧ β2 , and
ϕ = ϕ 1 ∨ ϕ2 .
Note that ϕ1 and ϕ2 are Horn CNFs and that ψ* ≤ β1 ∧ β2 must hold, where ψ* is any Horn envelope of ϕ, as ϕ ≡ (y ∨ z) ∧ β1 ∧ β2. Intuitively, ϕ2 generates the models v for the variables x1, . . . , xn, which are encoded by prime implicates of β2 of the form pv = ∨vi=0 ¬x′i ∨ ∨vi=1 ¬xi, while ϕ1 generates, via interaction (resolution) of clauses in β1 and β2, all models v resp. clauses pv such that α(v) = 0. Now if α is unsatisfiable, then each pv is an implicate of both ϕ1 and ϕ2, thus of ψ*, and ψ* ≡ β1 ∧ β2 follows. Otherwise, some pv is not a joint implicate, and ψ* < β1 ∧ β2 holds. Thus ψ = β1 ∧ β2 is a Horn envelope of ϕ iff α is unsatisfiable. □

Armed with this result, we now derive that most likely we cannot efficiently construct a compact Horn envelope of a Horn disjunction.

Theorem 5 There is no algorithm that constructs, given Horn CNFs ϕ1 and ϕ2, a prime irredundant Horn envelope ψ for ϕ1 ∨ ϕ2 in time polynomial in the size of ψ, ϕ1, and ϕ2, unless P = NP.

Proof We show that if such an algorithm existed, then the co-NP-complete problem of recognizing the Horn envelope in Theorem 4 could be solved in polynomial time. The proof makes use of the following lemma, which states an important property of Horn CNFs. Denote by ‖α‖ the representation size of a CNF α.

Lemma 4 All prime irredundant (Horn) CNFs for a Horn CNF ϕ differ at most polynomially in size, i.e., there exists a polynomial p(·) such that for every two irredundant prime CNFs ϕ1 and ϕ2 equivalent to ϕ, ‖ϕ1‖ ≤ p(‖ϕ2‖) and ‖ϕ2‖ ≤ p(‖ϕ1‖).

This lemma follows, e.g., by combining results in [8] on prime irredundant Horn CNFs (especially, the number of clauses c with P(c) = ∅ in them) and in [7] on FD-covers, which correspond to sets of definite clauses. Now suppose that an algorithm A existed that computes a prime irredundant Horn envelope for ϕ = ϕ1 ∨ ϕ2 in polynomial total time, i.e., in time bounded by a polynomial q(os, is) in the output size os = ‖A(ϕ1, ϕ2)‖ and the input size is = ‖ϕ1‖ + ‖ϕ2‖. We then use A to decide, given Horn CNFs ψ, ϕ1 and ϕ2, whether ψ is a Horn envelope of ϕ1 ∨ ϕ2 in polynomial time (which implies P = NP) as follows. We run A for at most q(os*, is) steps,
where os* = p(‖ψ‖) is a (polynomial) upper bound on the size of A(ϕ1, ϕ2) from Lemma 4 (note that ψ need not be prime). If A halts, then we check whether the output of A is equivalent to ψ; this is feasible in polynomial time. Otherwise, A will compute a Horn CNF ψ* such that ψ* ≢ ψ, and hence ψ is not the Horn envelope of ϕ. This algorithm works in polynomial time in the size of ψ, ϕ1, and ϕ2. □

We remark that in the hardness proof of Theorem 4 neither ϕ1 nor ϕ2 may be replaced by a small Horn CNF. In fact, we can show that the problem is tractable if in some ϕi the number of variables that occur positively, i.e., |⋃{P(c) | c ∈ ϕi}|, is bounded by a constant. Moreover, if both ϕi have this property, a Horn envelope of ϕ1 ∨ ϕ2 is computable in input-polynomial time. This holds since a CNF ϕ*i with all prime implicates of ϕi is computable in polynomial time (in the size of ϕi) and S(ϕ*1, ϕ*2) contains all prime implicates of ϕ1 ∨ ϕ2. More generally, we have the following result.

Proposition 2 Given arbitrary CNFs ϕi such that |⋃{P(c) | c ∈ ϕi}| ≤ k for a constant k, where 1 ≤ i ≤ l and l is bounded by a constant, a prime irredundant Horn envelope for ϕ1 ∨ · · · ∨ ϕl is computable in time polynomial in the size of ϕ1 ∨ · · · ∨ ϕl.
7 Conclusion
Horn cores and Horn envelopes are important concepts for propositional formulas that have appealing properties. We have obtained both positive results, like a novel characterization of Horn cores for CNFs and a new algorithm to compute Horn cores for a Horn disjunction, and a negative result in terms of the intractability of computing the Horn envelope of a Horn disjunction wrt. polynomial total-time. These results provide a computational basis for crafting implementations in the context of knowledge bases. Several issues remain for future work. One is to explore consequences and applicability of the present results to other combinations of Horn theories than disjunctions. Another is to further delineate the (in)tractability frontier for Horn envelopes that was briefly discussed here. Finally, efficient enumeration of multiple or all Horn cores would be interesting (a suitable variant of algorithm NEWCORE is non-obvious).
REFERENCES
[1] Y. Boufkhad, ‘Algorithms for Propositional KB Approximation’, in Proc. National Conference on AI (AAAI ’98), pp. 280–285, AAAI Press.
[2] M. Cadoli and F. Scarcello, ‘Semantical and Computational Aspects of Horn Approximations’, Artificial Intelligence, 119(1–2), 1–17, (2000).
[3] A. del Val, ‘An Analysis of Approximate Knowledge Compilation’, in Proc. IJCAI ’95, pp. 830–836, (1995).
[4] W. Dowling and J. H. Gallier, ‘Linear-time Algorithms for Testing the Satisfiability of Propositional Horn Theories’, Journal of Logic Programming, 3, 267–284, (1984).
[5] T. Eiter, K. Makino, and T. Ibaraki, ‘Disjunctions of Horn Theories and their Cores’, SIAM Journal on Computing, 31(1), 269–288, (2001).
[6] G. Gogic, Ch. Papadimitriou, and M. Sideri, ‘Incremental Recompilation of Knowledge’, J. Artif. Intell. Res., 8, 23–37, (1998).
[7] G. Gottlob, ‘On the size of nonredundant FD-covers’, Information Processing Letters, 24(6), 355–360, (1987).
[8] P. Hammer and A. Kogan, ‘Horn functions and their DNFs’, Information Processing Letters, 44, 23–29, (1992).
[9] P. Hammer and A. Kogan, ‘Optimal Compression of Propositional Horn Knowledge Bases: Complexity and Approximation’, Artificial Intelligence, 64, 131–145, (1993).
[10] D. Kavvadias, Ch. Papadimitriou, and M. Sideri, ‘On Horn Envelopes and Hypergraph Transversals’, in Proc. 4th Int’l Symp. Algorithms and Computation (ISAAC-93), LNCS 762, pp. 399–405, Springer.
[11] B. Selman and H. Kautz, ‘Knowledge Compilation and Theory Approximation’, Journal of the ACM, 43(2), 193–224, (1996).
Belief revision with reinforcement learning for interactive object recognition
Thomas Leopold1 and Gabriele Kern-Isberner2 and Gabriele Peters3

Abstract. From a conceptual point of view, belief revision and learning are quite similar. Both methods change the belief state of an intelligent agent by processing incoming information. However, for learning, the focus is on the exploitation of data to extract and assimilate useful knowledge, whereas belief revision is more concerned with the adaption of prior beliefs to new information for the purpose of reasoning. In this paper, we propose a hybrid learning method called SPHINX that combines low-level, non-cognitive reinforcement learning with high-level epistemic belief revision, similar to human learning. The former represents knowledge in a sub-symbolic, numerical way, while the latter is based on symbolic, non-monotonic logics and allows reasoning. Beyond the theoretical appeal of linking methods of very different disciplines of artificial intelligence, we will illustrate the usefulness of our approach by employing SPHINX in the area of computer vision for object recognition tasks. The SPHINX agent interacts with its environment by rotating objects, depending on past experiences and newly acquired generic knowledge, to choose those views which are most advantageous for recognition.
1 INTRODUCTION
One of the most challenging tasks of computer vision systems is the recognition of known and unknown objects. An elegant way to achieve this is to show the system some samples of each object class and thereby train the system, so that it can recognize objects that it has not seen before, but which look similar to some objects of the training phase (due to some defined features). Several methods to do so have been successfully used and analyzed. One of them is to set up a rule-based system and have it reason; another one is to use numerical learning methods such as reinforcement learning. Both of them have advantages, but also disadvantages. Reinforcement learning yields good results in different kinds of environments, but its training is time consuming, since it is a trial-and-error method and the agent has to learn from scratch. The possibilities to introduce background knowledge (e.g., by the choice of the initial values of the QTable) are more limited than, for example, with knowledge representation techniques. Another disadvantage is the limited ability to generalize experiences and so to act appropriately in unfamiliar situations. Though some generalization can be obtained by the application of function approximation techniques, the possibilities to generalize from learned rules to unfamiliar situations are again more varied with, for example, knowledge representation techniques.

1 University of Technology Dortmund, Germany
2 University of Technology Dortmund, Germany
3 University of Applied Sciences and Arts Dortmund, Germany

Knowledge representation and belief revision techniques have the advantage that the belief of the agent is represented quite clearly and allows reasoning about actions. The belief can be extended by new information, but needs to be revised when the new information contradicts the current belief. One drawback is that it is difficult to decide which parts of the belief should be given up, so that the new belief state is consistent, i.e., without inherent contradictions. In this paper, we present our hybrid learning system SPHINX, named after the Egyptian statue of a hybrid between a human and a lion. It combines the advantages of both Q-Learning and belief revision and diminishes the disadvantages, so that synergy effects can emerge. SPHINX agents, on the one hand, are intelligent agents equipped with epistemic belief states, which allows them to build a model of the world and to apply reasoning techniques to focus on the most plausible actions. On the other hand, they use QTables to determine which action should be carried out next, and are able to process reward signals from the environment. Moreover, SPHINX agents can learn situational as well as generic knowledge which is incorporated into their epistemic states via belief revision. In this way, they are able to adjust faster and more thoroughly to the environment, and to improve their learning capabilities considerably. This will be illustrated in detail by experiments in the field of computer vision. This paper is organized as follows: Chapter 2 summarizes related work. In chapter 3 we recall basic facts on Q-Learning, ordinal conditional functions and revision. Chapter 4 contains the main contribution of this paper, the presentation of the SPHINX system. Chapter 5 summarizes results from experiments in computer vision carried out in different environments. Finally, we conclude in chapter 6.
2 RELATED WORK
Psychological findings propose a two-level learning model for human learning [1], [6], [3], [10]. On the so-called bottom level, humans learn implicitly and acquire procedural knowledge. They are not aware of the relations they have learned and can hardly put them into words. On the other level, the top level, humans learn explicitly and acquire declarative knowledge. They are aware of the relations they have learned and can express them, e.g., in the form of if-then rules. A special form of declarative knowledge is episodic knowledge. This kind of knowledge is not of a general nature, but refers to specific events, situations or objects. Episodic knowledge makes it possible to remember specific situations where general rules do not hold. These two levels do not work separately. Depending on what is learned, humans learn top-down or bottom-up [11]. It has been found [8] that in completely unfamiliar situations mainly implicit learning
takes place and procedural knowledge is acquired. The declarative knowledge is formed afterwards. This indicates that the bottom-up direction plays an important role. It is also advantageous to continually verbalize, to a certain extent, what one has just learned, and so speed up the acquisition of declarative knowledge and thereby the whole learning process. Sun, Merrill and Peterson developed the learning model CLARION [9]. It is a two-level, bottom-up learning model which uses Q-Learning for the bottom level and a set of rules for the top level. The rules have the form 'Premise ⇒ Action', where the premise can be met by the current state signal of the environment. For the maintenance of the set of rules (i.e., adding, changing and deleting rules) the authors have conceived a specific technique. They have shown their model, which works similarly to human learning, to be successful in a mine field navigation task. Cang Ye, N. H. C. Yung and Danwei Wang propose a neural fuzzy system [2]. Like CLARION, this is a two-level learning model, combining reinforcement learning and fuzzy logic. The system has successfully been applied to a mobile robot navigation task.
3 BASICS AND BACKGROUND
In this section, we will recall basic facts on the two methodologies that are used and combined in this paper. First, we briefly describe Q-Learning, a popular approach used for solving Markov Decision Processes (MDPs) (see e.g. [12]). The scenario is the usual one for agents, where one or more agents interact with an environment. Normally, the environment starts in an initial state and stops when a terminal state is reached; this timespan is called an episode. For each action, the agent is rewarded. The more reward it collects during an episode, the better. Episodes consist of steps in which the agent first perceives the current state s of the environment via a (numerical) state signal, e.g., an ID. It looks up in its memory, called the QTable, which action a seems to be the best in this situation and performs it. The environment reacts to this action by changing its state to s′. After this change, the agent gets a reward r for its choice and updates its QTable. Q(λ)-learning is an enhanced Q-Learning method that not only takes the expected rewards into account but also considers the state-action pairs that have led to a state s. Let Q(s, a) represent the sum of rewards the agent expects to receive until the end of the episode if it performs action a in situation s, and let A(s) be the set of actions the agent can perform in state s. The update formula for a state-action pair (s̃, ã) in Q(λ)-learning is Q(s̃, ã) := Q(s̃, ã) + α · e(s̃, ã) · δ, where e(s̃, ã) is an eligibility factor, expressing how much influence on (s, a) is conceded to (s̃, ã) (the longer ago, the smaller the value), and δ := r + max a′∈A(s′) Q(s′, a′) − Q(s, a).
Before updating the (s̃, ã)-values, the eligibility factor of the current state-action pair (s, a) is increased by 1. After the update, the parameter λ is used to decrease the e(s̃, ã)-values to e(s̃, ã) := λ · e(s̃, ã). For λ = 0, we get the basic Q-Learning approach. The decision which action to take in a situation s is usually made by choosing the one with the greatest Q(s, a)-value. To make the discovery of new solutions possible, the agent chooses a random action with a small probability ε.
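For concreteness, the update rule and the eligibility bookkeeping just described can be realized with a simple table (an illustrative sketch, not the SPHINX implementation; it uses no discount factor, matching the formula for δ given here, and omits refinements such as resetting traces on exploratory moves).

from collections import defaultdict

class QLambdaTable:
    def __init__(self, alpha=0.1, lam=0.9):
        self.q = defaultdict(float)    # (state, action) -> expected sum of rewards
        self.e = defaultdict(float)    # (state, action) -> eligibility factor
        self.alpha = alpha
        self.lam = lam

    def update(self, s, a, r, s_next, actions_next):
        """One Q(lambda) step after performing a in s and observing reward r and state s_next."""
        best_next = max((self.q[(s_next, a2)] for a2 in actions_next), default=0.0)
        delta = r + best_next - self.q[(s, a)]
        self.e[(s, a)] += 1.0                       # current pair becomes fully eligible
        for key in list(self.e):
            self.q[key] += self.alpha * self.e[key] * delta
            self.e[key] *= self.lam                 # lambda = 0 yields plain Q-learning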
Now, the concept of ordinal conditional functions (OCFs) and appropriate revision techniques will be explained. OCFs will serve as representations of epistemic states of agents in this paper. Ordinal conditional functions [7] are also called ranking functions, as they assign a degree of plausibility, in the form of a degree of disbelief or surprise, respectively, to each possible world. We will work within a propositional framework, making use of multi-valued propositional variables di with domains {vi,1, . . . , vi,mi}. Possible worlds are simply interpretations here, assigning exactly one value to each di, and thus correspond to complete elementary conjunctions of multi-valued literals (di = vi,j), mentioning each di. Let Ω be the set of all possible worlds. Formally, an ordinal conditional function (OCF) is a mapping κ : Ω → N ∪ {∞} with κ⁻¹(0) ≠ ∅. The lower κ(ω), the more plausible is ω; hence the most plausible worlds have κ-value 0. A degree of plausibility can be assigned to formulas A by setting κ(A) := min{κ(ω) | ω |= A}, so that κ(A ∨ B) = min{κ(A), κ(B)}. This means that a formula is considered as plausible as its most plausible models. Therefore, due to κ⁻¹(0) ≠ ∅, at least one of κ(A), κ(¬A) must be 0. A proposition A is believed if κ(¬A) > 0 (which implies particularly κ(A) = 0). Moreover, degrees of plausibility can also be assigned to conditionals by setting κ(B|A) = κ(AB) − κ(A). A conditional (B|A) is accepted in the epistemic state represented by κ, or κ satisfies (B|A), written as κ |= (B|A), iff κ(AB) < κ(A¬B), i.e., iff AB is more plausible than A¬B. OCFs represent the epistemic attitudes of agents in quite a comprehensible way and offer simple arithmetics to propagate information. Therefore, they can be revised by new information in a straightforward manner, making use of the idea of so-called c-revisions [4], which are capable of revising ranking functions even by sets of new conditional beliefs. Here, we will only consider revisions by one conditional belief, so we will present the technique for this particular case. Given a prior epistemic state in the form of an OCF κ and a new conditional belief (B|A), the revision κ* = κ ∗ (B|A) is defined by
κ*(ω) = κ0 + κ(ω) + λ, if ω |= A¬B,   (1)
κ*(ω) = κ0 + κ(ω), otherwise,
where κ0 is a normalizing additive constant and λ is the least natural number to ensure that κ*(AB) < κ*(A¬B). Although c-revisions are defined in [4] for logical languages built from binary atoms, the approach can easily be generalized to multi-valued propositional variables. Note that c-revision by facts is also covered, as facts are identified with degenerate conditionals with tautological premises, i.e., A ≡ (A|⊤). OCFs and c-revisions provide a framework to carry out high-quality belief revision meeting all standards which are known to date, even going beyond that [4].
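As an illustration (again not the authors' implementation), an OCF over a small set of possible worlds can be stored as a list of (world, rank) pairs and revised by a single conditional (B|A) exactly as in equation (1); the worlds, A and B below are plain Python stand-ins.

def rank(kappa, formula):
    """kappa(F): minimal rank of a world satisfying F (a predicate on worlds); infinity if none."""
    ranks = [r for world, r in kappa if formula(world)]
    return min(ranks) if ranks else float('inf')

def c_revise(kappa, A, B):
    """Revise kappa by the conditional (B|A) following equation (1); assumes A ∧ B is satisfiable."""
    # Least natural lambda ensuring kappa*(A ∧ B) < kappa*(A ∧ ¬B):
    lam = max(0, rank(kappa, lambda w: A(w) and B(w))
                 - rank(kappa, lambda w: A(w) and not B(w)) + 1)
    shifted = [(w, r + lam) if A(w) and not B(w) else (w, r) for w, r in kappa]
    kappa0 = -min(r for _, r in shifted)        # normalization: most plausible worlds get rank 0
    return [(w, r + kappa0) for w, r in shifted]

# Example: two Boolean variables, all four worlds initially maximally plausible.
worlds = [{'p': x, 'q': y} for x in (0, 1) for y in (0, 1)]
kappa = [(w, 0) for w in worlds]
kappa = c_revise(kappa, A=lambda w: w['p'] == 1, B=lambda w: w['q'] == 1)
# Now the world p=1, q=0 is less plausible than p=1, q=1, so (q|p) is accepted.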
4 THE SPHINX LEARNING METHOD
Similar to the cognitive model, our learning method consists of two levels. For the bottom level we use Q(λ)-Learning, and for the top level, ordinal conditional functions (OCFs) are employed to represent the epistemic state of an agent and perform belief revision. This brings together two powerful methodologies from rather opposite ends of the scale of cognitive complexity, meeting the challenge of combining learning and belief revision in a particularly extreme case. To combine belief revision and reinforcement learning, each (subsymbolic) state s is described by a logical formula from a language defined over propositional variables di with domains {vi,1 , . . . , vi,mi }. The symbolic representation of a specific state is a conjunction of literals mentioning all di and reflects the logical perception of s by the agent. Furthermore, we define a variable action having as domain the set Actions of possible actions. Hence, the possible worlds on which ranking functions are defined here correspond to elementary conjunctions of the form (d1 = v1,k1 ) ∧ . . . ∧ (dn = vn,kn ) ∧ (action = a).
Figure 1. The SPHINX system
The SPHINX system interlinks Q-learning, the epistemic state and belief revision in two ways: First, it uses current beliefs to restrict the search space of actions for Q-Learning. Second, direct feedback to an action in the form of a reward is processed to acquire specific or generic symbolic knowledge from the most recent experience, by which the current epistemic state is revised. It is displayed in figure 1 and works as follows:

Algorithm 'Sphinx-Learning':
While the current state s is not a terminal state
1. The Sphinx agent perceives the signal of the state s coming from the environment and its logical description d(s).
2. The agent queries its current epistemic state κ which actions Aκ(s) = {a1, . . . , ak} are most plausible in s.
3. The agent looks up the Q-values of these actions and determines the set Abest(s) ⊆ Aκ(s) of those actions in Aκ(s) that have the greatest Q-value.
4. The agent chooses a random action a ∈ Abest(s) and performs it.
5. The environment changes to the successor state.
6. The agent receives the reward r from the environment.
7. The agent updates the QTable as described in section 3.
8. The new Q-values for actions in s are read, and the new best actions for s are determined.
9. The agent tries to find new rules that relate d(s) to best actions (according to the updated QTable) and revises κ with this information in the form of conditionals.
End While

We will now explain the algorithm step by step. When a state s is perceived (step 1), then κ is browsed for the most plausible worlds satisfying d(s). Aκ(s) (step 2) is the set of actions occurring in the most plausible d(s)-worlds:
Aκ(s) = {a ∈ Actions | κ(d(s) ∧ action = a) = κ(d(s))}.
Then, the actions in Aκ(s) are filtered according to their Q-values (step 3), and one of these actions is carried out (step 4); a small sketch of this selection step is given at the end of this section. It is particularly in these two steps that the enhancement of reinforcement learning with epistemic background pays off, since an ordinary Q-agent determines the set of best actions from the set of all possible actions. Steps 5 to 7 are pure Q-Learning. In step 8, the best actions for s due to the new Q-values are determined. This is done to exploit the experience gained by the received reward for future situations and make it usable on the epistemic level in step 9. The operations performed in step 9 are quite complex and are described in the following. The aim of the mentioned revision of κ is to make those actions most plausible in d(s) that have the greatest Q-value in s. As inputs for this revision, the agent tries to find patterns in the state descriptions for which certain actions are generally
better than others. This is done by a frequency-based heuristic. For each pattern (i.e., a conjunction of literals of some of the variables) p and each action a, the agent remembers how often a was a best resp. a poor action by using counters. If the agent finds in step 8 that an action a is a best action in s and has not been among the best actions before, then the counters for a of all patterns covered by d(s) are increased by 1. If a was a best action in s before but is no longer, the counters are decreased by 1. Negative experiences where a was a poor action are handled in an analogous manner. With these counters, probabilities can be calculated, expressing whether a is usually a best resp. a poor action when a situation s for which d(s) satisfies p is perceived. If such a relation between a pattern and a set of actions is found, a revision of κ with a conditional encoding such newly acquired strategic knowledge is performed; basically, the following four different types of revision occur:
1. Revision with information about a poor action in a specific state (episodic knowledge).
2. Revision with information about a poor action in several, similar states (generalization).
3. Revision with information about best actions in a specific state (episodic knowledge).
4. Revision with information about best actions in several, similar states (generalization).
Q-values) in s. W 4. ( action = ai |p), where each ai is a best action in at least one i
of the states covered by the pattern p. ai needs not to be a best action in all states covered by p. The last form of revision should exclude not best actions from being plausible when p is perceived, so the agent has to find the best action for a specific state covered by p only among the actions ai . Since revisions and especially revisions with generalized rules have a strong influence on the choice of actions, they have to be handled carefully, i. e., the agent should be quite sure about the correctness of a rule before adding it to its belief. Therefore, the agent uses several counters counting, how often an action has been poor, not poor, a best or not a best one under certain circumstances. With these counters some probabilities can be calculated which can be used to evaluate the certainty about the correctness of a specific rule. However, since all rules are merely plausible but not correct in a logical sense, further revisions may alleviate or even cancel the effects of erroneously acquired rules. Our learning model also supports background knowledge. If the user knows some rules that might be helpful for the agent and its task, he can formulate them as conditionals and let the agent revise κ with them before starting to learn.
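The action-selection steps 2–4 announced above can be sketched as follows (illustrative only; it reuses rank(...) from the OCF sketch and the QLambdaTable sketch above and assumes that possible worlds carry an 'action' entry, as described at the beginning of this section).

import random

def plausible_actions(kappa, d_s, actions):
    """A_kappa(s): actions a with kappa(d(s) ∧ action = a) = kappa(d(s))."""
    base = rank(kappa, d_s)
    return [a for a in actions
            if rank(kappa, lambda w, a=a: d_s(w) and w['action'] == a) == base]

def choose_action(kappa, qtable, s, d_s, actions):
    """Steps 2-4: filter by plausibility, then by Q-value, then pick randomly."""
    candidates = plausible_actions(kappa, d_s, actions)
    best_q = max(qtable.q[(s, a)] for a in candidates)
    best = [a for a in candidates if qtable.q[(s, a)] == best_q]
    return random.choice(best)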
5 INTERACTIVE OBJECT RECOGNITION
We tested our learning method in a navigation environment and in two different simulations of object recognition environments. In this
paper, we present the results of the latter in two different scenarios.
5.1 Recognition of Geometric Objects
In this test environment, the agent has to learn to recognize the following objects: sphere, ellipsoid, cylinder, cone, tetrahedron, pyramid, prism, cube, cuboid. By interacting with the environment, the agent can look at the object from the front, from the side or from the top, or it can choose to try to name the current object. The possible front, side, and top views are represented by five elementary shapes, namely: circle, ellipse, triangle, square, and rectangle. For example, the cone has the front view 'triangle', the side view 'triangle', and the top view 'circle'. The prism is given by the front view 'triangle', the side view 'rectangle', and the top view 'rectangle'. This leads to the following domains for this environment:
• FrontView = {Unknown, Circle, Ellipse, Triangle, Square, Rectangle}
• SideView = {Unknown, Circle, Ellipse, Triangle, Square, Rectangle}
• TopView = {Unknown, Circle, Ellipse, Triangle, Square, Rectangle}
• Action = {LookAtFront, LookAtSide, LookAtTop, RecognizeUnknown, RecognizeSphere, RecognizeEllipsoid, RecognizeCylinder, RecognizeCone, RecognizeTetrahedron, RecognizePyramid, RecognizePrism, RecognizeCube, RecognizeCuboid}
At the beginning of each episode, the environment chooses one of the nine geometric objects and generates the state signal 'FrontView = Unknown ∧ SideView = Unknown ∧ TopView = Unknown'. If the agent's action is LookAtFront, LookAtSide, resp. LookAtTop, the FrontView, SideView, resp. TopView is revealed in the new state signal following the agent's action. If the agent's action is a 'Recognize' action, the episode ends. The reward function returns −1 if one of the 'Look' actions has been performed. Otherwise, the agent is rewarded with 10 if it has recognized the object correctly, and with −10 if not. After ten steps the running episode is forced to end. Figure 2 shows the recognition rates after each training phase. In each training phase, each object is shown ten times to the current agent. The values result from 1000 independent agents. If the agents are provided with the background knowledge "If no view has been perceived yet, then look at the front, the side, or the top of the object" via the conditional (action = LookAtFront ∨ action = LookAtSide ∨ action = LookAtTop | FrontView = Unknown ∧ SideView = Unknown ∧ TopView = Unknown), the recognition rates improve, as can also be seen from figure 2. In the following, we list some of the rules that the agents learned by exploring the effects of updating the QTables on the cognitive (i.e., logical) level (a sketch of this environment follows the list):
• If FrontView = Circle, then action = RecognizeSphere
• If FrontView = Unknown ∧ SideView = Triangle, then action = LookAtFront
• If FrontView = Triangle ∧ SideView = Unknown, then action = RecognizePrism
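For illustration, the environment just described can be simulated in a few lines (a sketch, not the setup actually used for the experiments; the view table is abbreviated — the cone and prism rows follow the text, the sphere and cube rows are the obvious geometric views).

import random

VIEWS = {                                 # object -> (front view, side view, top view)
    'Sphere': ('Circle', 'Circle', 'Circle'),
    'Cone':   ('Triangle', 'Triangle', 'Circle'),
    'Prism':  ('Triangle', 'Rectangle', 'Rectangle'),
    'Cube':   ('Square', 'Square', 'Square'),
}

class GeometricObjectEnv:
    def reset(self):
        self.obj = random.choice(list(VIEWS))
        self.state = ['Unknown', 'Unknown', 'Unknown']      # FrontView, SideView, TopView
        self.steps = 0
        return tuple(self.state)

    def step(self, action):
        """Return (next_state, reward, done)."""
        self.steps += 1
        looks = ('LookAtFront', 'LookAtSide', 'LookAtTop')
        if action in looks:
            idx = looks.index(action)
            self.state[idx] = VIEWS[self.obj][idx]          # reveal the requested view
            reward, done = -1, False
        else:                                               # any 'Recognize...' action ends the episode
            reward = 10 if action == 'Recognize' + self.obj else -10
            done = True
        if self.steps >= 10:                                # episodes are cut off after ten steps
            done = True
        return tuple(self.state), reward, done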
5.2 Recognition of Simulated Real Objects
To analyse Sphinx under more realistic conditions, we set up another environment. We defined shape attributes that are suitable for representing objects within a simple object recognition task and then
Figure 2. Recognition Rates for Geometric Objects
chose random objects and described them with these previously defined attributes. These attributes are the input to Sphinx. Again, there are three possible perspectives: the front view, the side view, and a view from a position between these two views. The decision for these perspectives, especially for the intermediate view, was made based on the results found by [5], who revealed that the intermediate view plays a special role in human object recognition. The front and the side view are described by three attributes each: approximate (idealized) shape, size (i.e., proportion) of the shape, and deviance from the idealized shape. The approximate shape can take on the values unknown, circle, square, triangle up, and triangle down. The size can be unknown, flat, regular, or tall. The deviance can be little, medium, or big. Besides these attributes, the object is described by the complexity of its texture. This attribute can take on the values simple, medium, and complex. We set the attributes for each object manually. In a real application they can be determined easily by a simple image processing module which merely has to quantize the shape and texture of an object. If the agent looks at the object from the front or the side, it perceives the matching idealized shape, its size, its deviance, and the complexity of the texture. From the intermediate view the agent can only perceive the idealized shapes of the front and the side view and the complexity of the texture, but not the sizes and deviances. Formally, the domains are:
• FrontViewShape = {Unknown, Circle, Square, TriangleUp, TriangleDown}
• FrontViewSize = {Unknown, Flat, Regular, Tall}
• FrontViewDeviation = {Unknown, Little, Medium, Much}
• SideViewShape = {Unknown, Circle, Square, TriangleUp, TriangleDown}
• SideViewSize = {Unknown, Flat, Regular, Tall}
• SideViewDeviation = {Unknown, Little, Medium, Much}
• Texture = {Simple, Medium, Complex}
• Action = {RotateLeft, RotateRight, RecognizeUnknown} ∪ R, where R is the set of 'Recognize' actions.
At the beginning of each episode, the agent looks at the current object from a random perspective and the variables are set according to this perspective. Now, the agent can rotate the object clockwise or counter-clockwise, or name it. If the agent's action is a 'Recognize' action, the episode ends. After ten steps the running episode is forced to end. The reward function is the same as in the previous test environment. We have chosen
15 different objects from nine different object classes such as bottle, tree, and house for which we provide the three attributes mentioned (shape, size, and deviation) (see figure 3).
• If FrontViewShape = Circle ∧ SideViewShape = Unknown ∧ Texture = Simple, then action = RotateLeft
• If Texture = Complex, then action = RecognizeBottle
What remains to be done at this point to apply our system to real images of objects is the extraction of shape attributes from the images. This can be done by existing segmentation methods.
Figure 3. Approximated geometrical forms of objects
Similar to the previous scenario, the experimental results obtained by testing 100 independent agents are depicted in Figure 4. Again, it can be seen clearly that SPHINX-Learning does better than Q(λ)-learning with respect to the speed of learning.
6 CONCLUSION
Both low-level, non-cognitive learning and high-level learning that uses epistemic background knowledge and acquires generic knowledge are present in human learning processes. In this paper, we presented the hybrid SPHINX approach that enables intelligent agents to adjust to their environment in a similar way by combining epistemic-based belief revision with experience-based reinforcement learning. We linked both methodologies for two purposes: First, the current epistemic state allows the agent to focus on the most plausible actions, which are evaluated by QTables to find the most promising actions in some current state. Second, the direct feedback by the environment is used not only to update QTables, but also to generate specific or generic knowledge with which the epistemic state is revised. In order to illustrate the usefulness of our approach, we described application scenarios from computer vision and performed experiments in which SPHINX agents are employed for object recognition tasks. The evaluation of these experiments shows clearly that the proposed interplay of belief revision and reinforcement learning benefits from the advantages of both methodologies. Therefore, the SPHINX approach allows complex yet flexible interactions between learning and reasoning that help agents perform considerably better.
Figure 4. Recognition Rates for Simulated Real Objects

REFERENCES
In a second step we added background knowledge that enabled the agent to recognize all objects correctly, if it has perceived all of the three views. Furthermore, we added rules to the background knowledge that told the agent to look at the object from all perspectives first. With these rules the agent has a complete, but not optimal, solution for the task. We wanted to find out how fast the agent learns that it does not need all views to classify the current object. To protect the background knowledge from being overwritten by the agent’s own rules too early, some parameters were changed, so that the agent had to be more sure about the correctness of a rule before adding it to its belief. This setup resulted in a constantly high recognition rate of over 99 %. The number of perceived views decreased over time from 3.28 to 1.99. The value of 3.28 perceived view vs. 3 possible views results from the fact, that the intermediate view has to be perceived twice if the environment starts in this view. Then, the agent perceives this view at the beginning, then rotates the object to the front and then back to the intermediate view so it can rotate the object to the side view in the next step (or vice versa). Here are some of the rules the agent learned and assimilated during its training: • If FrontViewShape = TriangleUp ∧ FrontViewSize = Tall, then action = RecognizeBottle
[1] Anderson, J. R., The Architecture of Cognition, Harvard University Press, Cambridge, MA, 1983.
[2] C. Ye, N. H. C. Yung and D. Wang, ‘A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance’, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 33(1), 17–27, (2003).
[3] Gombert, J.-E., ‘Implicit and explicit learning to read: Implication as for subtypes of dyslexia’, Current Psychology Letters, 10(1), (2003).
[4] G. Kern-Isberner, Conditionals in Nonmonotonic Reasoning and Belief Revision, Springer, LNAI 2087, 2001.
[5] Pereira, A., James, K. H., Jones, S. S., and Smith, L. B., Preferred views in children’s active exploration of objects, 2006.
[6] Reber, A. S., ‘Implicit learning and tacit knowledge’, Journal of Experimental Psychology: General, 118(3), 219–235, (1989).
[7] W. Spohn, ‘Ordinal conditional functions: a dynamic theory of epistemic states’, in Causation in Decision, Belief Change, and Statistics, II, eds. W.L. Harper and B. Skyrms, 105–134, Kluwer Academic Publishers, (1988).
[8] Stanley, W. B., Mathews, R. C., Buss, R. R. and Kotler-Cope, S., ‘Insight without awareness: On the interaction of verbalization, instruction and practice in a simulated process control task’, The Quarterly Journal of Experimental Psychology Section A, 41(3), 553–577, (1989).
[9] Sun, R., Merrill, E. and Peterson, T., ‘From implicit skills to explicit knowledge: a bottom-up model of skill learning’, Cognitive Science, 25(2), 203–244, (2001).
[10] Sun, R., Slusarz, P. and Terry, C., ‘The interaction of the explicit and the implicit in skill learning: A dual-process approach’, Psychological Review, 112(1), 159–192, (2005).
[11] Sun, R., Zhang, X., Slusarz, P. and Mathews, R., ‘The interaction of implicit learning, explicit hypothesis testing learning and implicit-to-explicit knowledge extraction’, Neural Networks, 20(1), 34–47, (2007).
[12] Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction, Bradford Book, The MIT Press, 1998.
A Formal Approach for RDF/S Ontology Evolution George Konstantinidis and Giorgos Flouris and Grigoris Antoniou and Vassilis Christophides1 Abstract. In this paper, we consider the problem of ontology evolution in the face of a change operation. We devise a general-purpose algorithm for determining the effects and side-effects of a requested elementary or complex change operation. Our work is inspired by belief revision principles (i.e., validity, success and minimal change) and allows us to handle any change operation in a provably rational and consistent manner. To the best of our knowledge, this is the first approach overcoming the limitations of existing solutions, which deal with each change operation on a per-case basis. Additionally, we rely on our general change handling algorithm to implement specialized versions of it, one per desired change operation, in order to compute the equivalent set of effects and side-effects.2
1 INTRODUCTION
Stored knowledge, in any knowledge-based application, may need to change due to various reasons, including changes in the modeled world, new information on the domain, newly-gained access to information previously unknown, and other eventualities [11]. Here, we consider the case of ontologies expressed in RDF/S (as most of the Semantic Web schemas (85.45%) are expressed in RDF/S [14]) and introduce a formal framework to handle the evolution of an ontology given a change operation. We pay particular attention to the semantics of change operations, which can, in principle, be either elementary (involving a change in a single ontology construct) or composite (involving changes in multiple constructs) [5]. Even though RDF/S does not support negation, the problem is far from trivial, as inconsistencies may arise due to the validity rules associated with RDF/S ontologies. In fact, naive set-theoretical addition or removal of ontological constructs (i.e., direct application of a change) has been acknowledged as insufficient for ontology evolution [4, 6, 12]. Most of the implemented ontology management systems (e.g., [1, 2, 8, 13]) are designed using an ad-hoc approach that solves the problems related to each change operation on a per-case basis. More specifically, they explicitly define a finite, and thus incomplete, set of change operations that they support, and have determined, a priori, the semantics of each such operation. Hence, given the lack of a formal methodology, the designers of these systems have to determine, in advance, all the possible invalidities that could occur in reaction to a change and the various alternatives for handling any such possible invalidity, and to pre-select the preferable solutions for implementation per case [6]; this selection is hard-coded into the systems' implementations. This approach requires a highly tedious, case-based reasoning which is error-prone and gives no formal guarantee that
the cases and options considered are exhaustive. To overcome these limitations, we propose an ontology evolution framework and elaborate on its formal foundations. Our methodology is driven by ideas and principles of the belief revision literature [3]. In particular, we adopt the Principle of Success (every change operation is actually implemented) and the Principle of Validity (the resulting ontology is valid, i.e., it satisfies all the validity constraints of the underlying language). Satisfying both these requirements is not trivial, as the straightforward application of a change operation upon an ontology can often lead to invalidity, in which case certain additional actions (side-effects) should be executed to restore validity. Sometimes, there may be more than one way to do so, in which case a selection mechanism should be in place to determine the “best” option. In this paper, we employ a technique inspired by the Principle of Minimal Change [3] (stating that the appropriate result of changing an ontology should be as “close” as possible to the original). The general idea of our approach is to first determine all the invalidities that any given change (elementary or composite) could cause upon the updated ontology, using a formal, well-specified validity model, and then to determine the best way to overcome potential invalidity problems in an automatic way, by exploring the various alternatives and comparing them using a selection mechanism based on an ordering relation on potential side-effects. In particular, our formal approach is parameterizable with respect to this relation, thus providing a customizable way to guarantee the determination of the “best” result. Although our framework is general, in this paper we focus on a fragment of the RDF/S model which exhibits interesting properties for deciding query containment and minimization [10]. To the best of our knowledge, our implementation is the first one that allows processing any type of change operation, and in a fully automatic way.
2 PROBLEM FORMULATION

2.1 Modeling RDF/S, ontologies and updates

In order to abstract from the syntactic peculiarities of the underlying language and develop a uniform framework, we will map RDF to First-Order Logic (FOL). Table 1 (restricted for presentation purposes) shows the FOL representation of certain RDF statements. The language’s semantics is not carried over during the mapping, so we need to combine the FOL representation with a set of validity rules that capture such semantics. For technical reasons, we assume that all constraints can be encoded in the form of (or can be broken down into a conjunction of) DEDs (disjunctive embedded dependencies), which have the following general form:

∀u P(u) → ∨_{i=1,...,n} ∃v_i Q_i(u, v_i)    (DED)

where:
• u, v_i are tuples of variables
• P, Q_i are conjunctions of relational atoms of the form R(w_1, ..., w_n) and equality atoms of the form (w = w′), where w_1, ..., w_n, w, w′ are variables or constants
• P may be the empty conjunction

We employ DEDs, as they are expressive enough for capturing the semantics of different RDF fragments and other simple data models which are appropriate for our purposes in this paper. Moreover, DEDs will prove suitable for constructing a convenient mechanism for detecting and repairing invalidities. Table 2 shows some rules that are used to capture the semantics of the various RDF constructs (e.g., R11 captures IsA transitivity), as well as the restrictions imposed by our RDF model (e.g., R8 captures that the domain of a property should be unique).

Table 1. Representation of RDF facts using FOL predicates

RDF triple                | Intuitive meaning        | Predicate
C rdf:type rdfs:Class     | C is a class             | CS(C)
P rdf:type rdf:Property   | P is a property          | PS(P)
x rdf:type rdfs:Resource  | x is a class instance    | CI(x)
P rdfs:domain C           | domain of property       | Domain(P, C)
P rdfs:range C            | range of property        | Range(P, C)
C1 rdfs:subClassOf C2     | IsA between classes      | C_IsA(C1, C2)
P1 rdfs:subPropertyOf P2  | IsA between properties   | P_IsA(P1, P2)
x rdf:type C              | class instantiation      | C_Inst(x, C)
x P y                     | property instantiation   | PI(x, y, P)

Table 2. Validity Rules

Rule ID/Name                        | Integrity Constraint                                          | Intuitive Meaning
R2 Domain Applicability             | ∀x, y ∈ Σ: Domain(x, y) → PS(x) ∧ CS(y)                       | Domain applies to properties; the domain of a property is a class
R4 C_IsA Applicability              | ∀x, y ∈ Σ: C_IsA(x, y) → CS(x) ∧ CS(y)                        | Class IsA applies between classes
R6 C_Inst Applicability             | ∀x, y ∈ Σ: C_Inst(x, y) → CI(x) ∧ CS(y)                       | Class Instanceof applies between a class instance and a class
R8 Domain is unique                 | ∀x, y, z ∈ Σ: Domain(x, y) → ¬Domain(x, z) ∨ (y = z)          | The domain of a property is unique
R10 Domain and Range exists         | ∀x ∈ Σ, ∃y, z ∈ Σ: PS(x) → Domain(x, z) ∧ Range(x, y)         | Each property has a domain and a range
R11 C_IsA Transitivity              | ∀x, y, z ∈ Σ: C_IsA(x, y) ∧ C_IsA(y, z) → C_IsA(x, z)         | Class IsA is Transitive
R12 C_IsA Irreflexivity             | ∀x, y ∈ Σ: C_IsA(x, y) → ¬C_IsA(y, x)                         | Class IsA is Irreflexive
R15 Determining C_Inst              | ∀x, y, z ∈ Σ: C_Inst(x, y) ∧ C_IsA(y, z) → C_Inst(x, z)       | Class instance propagation
R17 Property Instance of and Domain | ∀x, y, z, w ∈ Σ: PI(x, y, z) ∧ Domain(z, w) → C_Inst(x, w)    | Instanceof between properties reflects in their sources/domains
R23 P_IsA Irreflexivity             | ∀x, y ∈ Σ: P_IsA(x, y) → ¬P_IsA(y, x)                         | Property IsA is Irreflexive

It should be stressed that the semantics of the language captured by Tables 1 and 2 essentially corresponds to a fragment of the standard RDF/S data model3 in which there is a clear role distinction between ontology primitives and no cycles in the subsumption relationships, while property subsumption respects corresponding domain/range subsumption relationships. Such a fragment has first been studied in [10] in an effort to provide a group of sound and complete algorithms for query containment and minimization, while it is compatible with W3C guidelines4 for devising restricted fragments of the RDF/S data model. Similarly, the general-purpose change handling algorithm presented in this paper can also be applied to other fragments of RDF/S (see also [7, 9]) or the standard RDF/S semantics.

In Table 2, Σ denotes the set of constants in our language. We equip our FOL with closed semantics, i.e., CWA (closed world assumption). This means that, for two formulas p, q, if p ⊭ q then p ⊨ ¬q. Abusing notation, for two sets of ground facts U, V, we will say that U implies V (U ⊨ V) to denote that U ⊨ p for all p ∈ V. Any expression of the form P(x_1, ..., x_k) is called a positive ground fact, where P is a predicate of arity k and x_1, ..., x_k are constant symbols. Any expression of the form ¬P(x_1, ..., x_k) is called a negative ground fact iff P(x_1, ..., x_k) is a positive ground fact. L denotes the set of all well-formed formulae that can be formed in our FOL. We denote by L+ the set of positive ground facts, L− the set of negative ground facts, and set L0 = L+ ∪ L−, called the set of ground facts of the language. We define:
• An ontology is a set K ⊆ L+
• An update is a set U ⊆ L0
In simple words, an ontology is any set of positive ground facts whereas an update is any set of positive or negative ground facts. Applying an update to an ontology should result in the incorporation of the update in the ontology. By definition, ontologies have two properties: (a) they are always consistent (in the purely logical sense) and (b) they imply only the positive ground facts that are already in the ontology. The above two properties, together with the CWA semantics, imply that:
• P(x) ∈ K ⇔ K ⊨ P(x) ⇔ K ⊭ ¬P(x)
• P(x) ∉ K ⇔ K ⊨ ¬P(x) ⇔ K ⊭ P(x)
An application of these properties is that updating K with ¬P(x) corresponds to contracting P(x) from K, because “incorporating” ¬P(x) in an ontology could be achieved only by removing P(x) from K. Therefore, updating an ontology with negative ground facts corresponds to contraction/erasure in the standard terminology, whereas updating an ontology with positive ground facts corresponds to revision/update in the standard terminology.

1 Institute of Computer Science, FO.R.T.H., Heraklion, Greece, email: gconstan, fgeo, antoniou, [email protected]
2 This work was partially supported by the EU projects CASPAR (FP6-2005-IST-033572) and KP-LAB (FP6-2004-IST-27490).
3 http://www.w3.org/TR/rdf-concepts/
4 http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#technote
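To make the set-based reading of ontologies and updates concrete, here is a minimal Python sketch (ours, not from the paper): ground facts are plain tuples, an ontology is a set of positive facts, CWA entailment reduces to membership tests, and incorporating a negative fact amounts to removing the corresponding positive one.

```python
# Minimal sketch (not from the paper): ontologies as sets of positive ground facts.
# A positive fact is ("Pred", arg1, ...); a negative fact is ("not", positive_fact).

K = {("CS", "A"), ("CS", "B"), ("C_IsA", "A", "B")}   # a tiny ontology
U = {("not", ("C_IsA", "A", "B"))}                    # an update (a contraction)

def entails(K, fact):
    # CWA semantics: K |= P(x) iff P(x) in K, and K |= not P(x) iff P(x) not in K.
    return fact[1] not in K if fact[0] == "not" else fact in K

print(entails(K, ("CS", "A")))           # True: CS(A) is in K
print(entails(K, ("not", ("CS", "C"))))  # True: K |= not CS(C), since CS(C) is not in K

# Incorporating the negative fact in U can only be achieved by removing the
# corresponding positive fact, i.e. updating with "not C_IsA(A, B)" contracts it:
K_after = {f for f in K if ("not", f) not in U}
print(K_after)                           # {("CS", "A"), ("CS", "B")}
```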
2.2 Updating under constraints

We say that an ontology K satisfies a validity rule c iff K ⊨ c. Obviously, for a set C of validity rules, K satisfies C (K ⊨ C) iff K ⊨ c for all c ∈ C. It is easy to see that for a simple constraint of the form c = ∀u P(u) → Q(u), where P, Q are simple positive predicates and u is a variable, it holds that: K ⊨ c iff for all constants x: K ⊨ {¬P(x)} or K ⊨ {Q(x)}. This can be easily extended to the general case. Suppose that c = ∀u P(u) → ∨_{i=1,...,n} ∃v_i Q_i(u, v_i), where P(u) = P_1(u) ∧ ... ∧ P_k(u) for some k ≥ 0 and Q_i(u, v_i) = Q_i1(u, v_i) ∧ ... ∧ Q_im(u, v_i) for some m > 0 depending on i. Then K ⊨ c iff for all tuples of constants x at least one of the following is true (note that in case of obvious reference to tuples of constants or variables we will be omitting the tuple symbol):
• There is some j: 0 < j ≤ k such that K ⊨ {¬P_j(x)}.
• There is some i: 1 ≤ i ≤ n and some tuple of constants z such that for all j = 1, 2, ..., m, K ⊨ {Q_ij(x, z)}.
We can conclude that K ⊨ c iff for all tuples of constants x at least one of the following sets is implied by K:
• {¬P_j(x)}, 0 < j ≤ k
• {Q_i1(x, z) ∧ Q_i2(x, z) ∧ ... ∧ Q_im(x, z)}, 1 ≤ i ≤ n, z: constant

Based on the above observation, we define the component set of c with respect to some tuple of constants x as follows:

Comp(c, x) = {{¬P_j(x)} | 0 < j ≤ k} ∪ {{Q_i1(x, z) ∧ Q_i2(x, z) ∧ ... ∧ Q_im(x, z)} | 1 ≤ i ≤ n, z: constant}

Prop. 1 will subsequently help us define a valid ontology.

Prop. 1 K ⊨ c iff for all constants x there is some V ∈ Comp(c, x) such that K ⊨ V.

Def. 1 Consider a FOL language L and a set of validity rules C. An ontology K will be called valid with respect to L and C iff K is consistent and it satisfies the validity rules C.

Note that a valid ontology, by our rules of Table 2, contains all its implicit knowledge as well (i.e., it is closed with respect to inference). Due to the special characteristics of our framework (e.g., CWA, the form of rules, etc.), one does not need to employ full FOL reasoning to determine whether an ontology K is valid (i.e., using Def. 1 and Prop. 1); instead, we can use the specialized procedure described below (Prop. 2).

Prop. 2 A ground fact P(x), added to an ontology K, would violate rule c iff there is some set V and tuple of constants u for which ¬P(x) ∈ V and V ∈ Comp(c, u), and for all V′ ∈ Comp(c, u), V′ ≠ V, it holds that K ⊭ V′.

As an example, consider the ontology of Fig. 1(a). The original ontology in our case, per Table 1, is: K = {CS(A), CS(B), CI(a), CI(b), PS(P), Domain(P, A), Range(P, B), PI(a, b, P), C_Inst(a, A), C_Inst(b, B)} and the update is: U = {Domain(P, D)}. To detect rule violations in an automated way, according to Prop. 2, we must find all the rules that contain ¬Domain(x, y), set x = P, y = D, and determine whether some other component for the particular instantiation is implied by the ontology. If the answer is no, then the addition of Domain(P, D) would violate the particular instantiation of this rule. In our case, this is true for rule R2.2 (domain applicability), for x = P, y = D, and rule R8 (unique domain) for x = P, y = D, z = A as well as for x = P, y = A, z = D (see also Table 3 for some rules in their component set format). Moreover, it violates rule R17 for x = a, y = b, z = P, w = D. One nice property of our detection mechanism is that it provides an immediate way to restore invalidities as well, i.e., to generate potential side-effects that would restore the violation. In particular, the violation that Prop. 2 detects can be restored by making any of the elements of Comp(c, u) true in the ontology. At this point note that when a Q_ij(x, z) in some set V ∈ Comp(c, x) is an equality of the form w = w′, then the truth value of this equality is revealed as soon as we instantiate this rule’s variables to constants. Therefore, by evaluating an equality as a tautology (⊤) or contradiction (⊥) and replacing it accordingly in the rule’s instances, we are able to eliminate all the equality atoms from the component sets. Without equalities, the elements of Comp(c, x) contain only positive and negative ground facts, so they are updates in our terminology. This is a very useful remark, as we will subsequently take advantage of the elements of Comp(c, x), applying them as updates. In our example, the validity of rule R2.2, for x = P, y = D, can be restored iff either {¬Domain(P, D)} or {CS(D)} are added as additional updates (side-effects) to the ontology. Note that side-effects could trigger side-effects of their own if they violate any rules.

Table 3. Some validity rules in component set form

Rule ID/Name | Components of the rule
R2 Domain Applicability | R2.1: ∀x, y ∈ Σ: Comp(R2.1, (x, y)) = {{¬Domain(x, y)}, {PS(x)}}; R2.2: ∀x, y ∈ Σ: Comp(R2.2, (x, y)) = {{¬Domain(x, y)}, {CS(y)}}
R8 Domain is unique | ∀x, y, z ∈ Σ: Comp(R8, (x, y, z)) = {{¬Domain(x, y)}, {¬Domain(x, z)}, {(y = z)}}
R10 Domain and Range exists | R10.1: ∀x ∈ Σ, ∃z ∈ Σ: Comp(R10.1, (x, z)) = {{¬PS(x)}, {Domain(x, z)}}; R10.2: ∀x ∈ Σ, ∃y ∈ Σ: Comp(R10.2, (x, y)) = {{¬PS(x)}, {Range(x, y)}}
R17 Property Instance of and Domain | ∀x, y, z, w ∈ Σ: Comp(R17, (x, y, z, w)) = {{¬PI(x, y, z)}, {¬Domain(z, w)}, {C_Inst(x, w)}}

Figure 1. Adding a new domain to a property: (a) the original ontology; (b) the result of the update “Make D domain of P”.
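To make the detection-and-repair mechanism concrete, here is a small, self-contained Python sketch (ours, not the paper's implementation) that replays the R2.2 check of the example above, using the same tuple encoding of ground facts as in the earlier sketch.

```python
# Illustrative sketch (not the paper's implementation) of the Prop. 2 check for one
# instance of rule R2.2 and the update Domain(P, D).
# Positive facts are ("Pred", args...); negative facts are ("not", positive_fact).

K = {("CS", "A"), ("CS", "B"), ("CI", "a"), ("CI", "b"), ("PS", "P"),
     ("Domain", "P", "A"), ("Range", "P", "B"), ("PI", "a", "b", "P"),
     ("C_Inst", "a", "A"), ("C_Inst", "b", "B")}

def entails(K, fact):
    # CWA: K |= f iff f in K; K |= not f iff f not in K
    return fact[1] not in K if fact[0] == "not" else fact in K

# Component set of R2.2 instantiated at (x, y) = (P, D):
# either Domain(P, D) is absent, or CS(D) holds.
components = [{("not", ("Domain", "P", "D"))}, {("CS", "D")}]

new_fact = ("Domain", "P", "D")          # the requested change
neg_new = ("not", new_fact)

hit = any(neg_new in V for V in components)
rescued = any(all(entails(K, f) for f in V) for V in components if neg_new not in V)

if hit and not rescued:
    # Making any component true restores this instance; the one compatible with the
    # requested change, {CS(D)}, is the admissible side-effect here.
    print("violation of R2.2; candidate repairs:", components)
```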
2.3 Selection of side-effects: ordering
If there were no validity rules, or we were not interested in the result being a valid ontology, the most rational way to perform an update would be to simply apply the changes in U upon K.

Def. 2 The raw application of an update U to an ontology K is denoted by K + U and is the following ontology: K + U = {P(x) ∈ L+ | P(x) ∈ K ∪ U and ¬P(x) ∉ U}

When a set of changes (i.e., an update U) is raw applied to a valid ontology K, some of the changes that appear in U may be void, i.e., they don’t need to be performed because they are already implemented (implied) by the original ontology. We define an operator which, given a resulting ontology K′ that an update would produce on a valid ontology K, calculates the actual effects of the update:

Def. 3 For K a valid ontology and K′ an ontology: Delta(K, K′) = {P(x) ∈ L0 | K′ ⊨ {P(x)} and K ⊭ {P(x)}}

The Delta function is some kind of “edit distance”5 between K and K′; if K′ = K + U, then Delta represents the actual changes that U enforces upon K. Thus, K + U = K + Delta(K, K + U) = K′, so Delta(K, K + U) produces the same result as U when applied upon an ontology; however, they may be different, as U could contain void changes.

5 Note that the term “edit-distance” is usually used for sequences and not sets (i.e., edit scripts).
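Under the CWA, Defs. 2 and 3 translate directly into set operations; the following sketch (ours, with an illustrative ontology and update under the tuple encoding used above) shows raw application and the Delta of the result.

```python
# Sketch (not from the paper) of raw application (Def. 2) and Delta (Def. 3),
# with positive facts as tuples and negative facts as ("not", positive_fact).

def raw_apply(K, U):
    """K + U: positive facts of K ∪ U whose negation does not appear in U."""
    positives = {f for f in K | U if f[0] != "not"}
    return {f for f in positives if ("not", f) not in U}

def delta(K, K2):
    """Under CWA, Delta(K, K2) reduces to the facts on which K and K2 disagree."""
    return ({f for f in K2 - K}                       # positive facts that were added
            | {("not", f) for f in K - K2})           # facts that were removed

K = {("CS", "A"), ("PS", "P"), ("Domain", "P", "A")}
U = {("CS", "D"), ("Domain", "P", "D"),
     ("not", ("Domain", "P", "A")),
     ("CS", "A")}                                     # CS(A) is a void change

K2 = raw_apply(K, U)
print(K2)             # {CS(A), PS(P), CS(D), Domain(P, D)}
print(delta(K, K2))   # {CS(D), Domain(P, D), not Domain(P, A)} -- the void CS(A) is dropped
```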
As already mentioned, the raw application of an update would not work for our case, because it may not respect the validity constraints of the language. Thus, applying an update involves the application of some side-effects. In some cases, it may not be possible to find adequate side-effects for the update at hand; such updates are called infeasible and cannot be executed. For example, any inconsistent update (such as U = {CS(A), ¬CS(A)}) is infeasible. In most cases though, an update has several possible alternative sets of side-effects, which implies that a selection should be made. Consider an update U with alternative side-effects U1 and U2. Then, the set of changes that should be raw applied on the initial ontology, in order to reach a valid result, is either U ∪ U1 or U ∪ U2. According to the Principle of Minimal Change we should choose the one which causes the “mildest” effects upon the ontology; to determine the “relative mildness” (or “relative cost”) of such effects, we define an ordering between updates. Note that this ordering should depend on K itself: for example, the “cost” of removing an IsA relation between A and B should depend on the importance of the concepts A, B in the RDF graph itself. The following conditions have proven necessary for an ordering to produce “rational” results.

Def. 4 An ordering ≤K is called update-generating iff the following conditions hold:
Delta Antisymmetry: For any U, U′: U ≤K U′ and U′ ≤K U implies Delta(K, K + U) = Delta(K, K + U′).
Transitivity: For any U, U′, U″: U ≤K U′ and U′ ≤K U″ implies U ≤K U″.
Totality: For any U, U′: U ≤K U′ or U′ ≤K U.
Conflict Sensitivity: For any U, U′: U ≤K U′ iff Delta(K, K + U) ≤K Delta(K, K + U′).
Monotonicity: For any U, U′: U ⊆ U′ implies U ≤K U′.

Similarly, an ordering scheme {≤K | K: a valid ontology} is called update-generating iff ≤K is update-generating for all valid ontologies K. For our RDF case we introduced a particular update-generating ordering, which is based on the ordering shown in Table 4 (among the positive and negative predicates presented in Table 1, for simplicity). The details of the expansion of this ordering to refer to ground facts and sets of ground facts (i.e., updates) are omitted due to space limitations. In short, the general idea is that an update U1 is “cheaper” (or preferable) than U2 (denoted by U1 ≤K U2) iff the “most expensive” predicate used in update U1 is “cheaper” than the “most expensive” predicate used in update U2, where the predicates’ relative preference is determined by the order shown in Table 4. Ties are resolved using cardinality considerations and/or the relative importance of the predicate’s arguments in the original ontology (details omitted). Our ordering was based on the results of experimentation on various alternative orderings and results in an efficient and intuitive implementation. Nonetheless, our algorithm works with any update-generating ordering; each different ordering would model and impose a different global evolution policy on our algorithm. Fig. 1(b) depicts the outcome of the requested update with respect to our ordering.

Table 4. Ordering of predicates
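Table 4 itself did not survive extraction, so the sketch below (ours) only illustrates the comparison rule stated in the text — an update is preferable if its most expensive predicate is cheaper — with a purely hypothetical cost assignment standing in for the paper's actual predicate ordering.

```python
# Toy sketch (not the paper's ordering): compare two candidate side-effect sets by
# their most expensive predicate. HYPOTHETICAL_COST is made up -- the real predicate
# ordering of Table 4 is not reproduced here.

HYPOTHETICAL_COST = {"C_Inst": 1, "CS": 2, "Domain": 3, "C_IsA": 4}

def pred_of(fact):
    # fact is ("Pred", args...) or ("not", ("Pred", args...))
    return fact[1][0] if fact[0] == "not" else fact[0]

def cost(update):
    """Cost of an update = cost of its most expensive predicate."""
    return max(HYPOTHETICAL_COST[pred_of(f)] for f in update)

def cheaper(U1, U2):
    """U1 is preferred over U2 iff its most expensive predicate is cheaper.
       (Ties would be broken by cardinality / argument importance, omitted here.)"""
    return cost(U1) <= cost(U2)

U1 = {("CS", "D")}                           # side-effect option: add CS(D)
U2 = {("not", ("Domain", "P", "D"))}         # side-effect option: drop the new domain
print(cheaper(U1, U2))                       # True under the hypothetical costs
```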
α_t = α_w if Σ_a π_θ(b_t, a) Q̂_θ(b_t, a) ≥ Σ_a π̄(b_t, a) Q̂_θ(b_t, a), and α_t = α_l otherwise, where α_l > α_w are, respectively, the losing and winning learning rates and π̄ is the “average” policy, obtained by using in (6) the average parameter vector over time.
3.4 The critic: Advantage estimation in belief space
In the gradient expressions (4) and in Theorem 1, one can add an arbitrary function F(x) to Q_θ and Q̂_θ. Such a function is known as a baseline function and, as shown in [3], if F is to be chosen so as to minimize the mean-squared error between Q̂_θ and Q_θ, the optimal choice of baseline function is F(x) = V_θ(x). Recalling that the advantage function associated with a policy π is defined as A_π(x, a) = Q_π(x, a) − V_π(x), the performance of the overall algorithm can be improved by estimating the advantage function instead of the Q-function [3]. As seen in the previous subsection, the actor component will update the parameter along the direction of the parameter vector w corresponding to the orthogonal projection of Q_θ (or, equivalently, A_θ) on the linear space spanned by the compatible basis functions,6 defined in (7). However, unlike Q_θ or V_θ, the advantage function does not verify a Bellman-like recursion and, therefore, it is necessary to independently estimate the value function V_θ, for which we also consider a linear approximation. In particular, we admit that A_θ(b, a) ≈ φ_θ⊤(b, a) w and V_θ(b) ≈ ξ⊤(b) v, where φ_θ are the compatible basis functions defined according to (7) and each component ξ_i belongs to a second set of linearly independent basis functions that we use to approximate the value function. Since we are considering multiagent problems, where multiple independent decision makers interact in a common environment, it is best that each agent k computes this estimate online, since the transition data sampled from the process reflects (although implicitly) the eventual learning process taking place in the other agents. Therefore, our critic uses a TD-based update to estimate both the value function V_θ and the advantage function A_θ by means of the following recursion (similar in spirit to that in [3])7

v_{t+1} = v_t + β_t ξ_t⊤ (r_t + γ ξ_{t+1} v_t − ξ_t v_t);
w_{t+1} = (I − β_t φ_t⊤ φ_t) w_t + β_t φ_t⊤ (r_t + γ ξ_{t+1} v_t − ξ_t v_t),

where I is the identity matrix, ξ_t is the row-vector ξ⊤(b_t), ξ_{t+1} = ξ⊤(b_{t+1}) and φ_t = φ_θ⊤(b_t, a_t).

6 We take Q̂_θ as the orthogonal projection of Q_θ on L(φ) with respect to the inner product ⟨f, g⟩ = ∫_B Σ_a f(b, a) · g(b, a) π_θ(b, a) p(b) db, where B denotes the belief-space and p is the distribution introduced in (5), with the beliefs b playing the role of x.
7 We remark, however, that we are using a discounted framework, unlike the average per-step reward framework featured in [3].

Figure 2. Two simple problems used to illustrate the application of our algorithm: a) Grid world; b) Dec-Tiger problem.
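Read as code, the recursion is a linear TD update for v together with a projected update for w driven by the same TD error; the following numpy sketch (ours, with made-up feature vectors, reward and step sizes) shows a single critic step.

```python
import numpy as np

# Minimal sketch (not the paper's code) of one step of the critic recursion:
# a linear TD update for the value parameters v and the projected update of the
# advantage parameters w. xi_t and xi_next are the feature vectors of the current
# and next belief, phi_t the compatible features of the current belief-action pair.

def critic_step(v, w, xi_t, xi_next, phi_t, r_t, beta, gamma):
    delta = r_t + gamma * xi_next @ v - xi_t @ v           # TD error at time t
    v_new = v + beta * xi_t * delta                        # v_{t+1}
    w_new = (np.eye(len(w)) - beta * np.outer(phi_t, phi_t)) @ w + beta * phi_t * delta
    return v_new, w_new

rng = np.random.default_rng(0)
v, w = np.zeros(4), np.zeros(3)
xi_t, xi_next = rng.random(4), rng.random(4)               # value features xi(b_t), xi(b_{t+1})
phi_t = rng.random(3)                                      # compatible features phi_theta(b_t, a_t)
v, w = critic_step(v, w, xi_t, xi_next, phi_t, r_t=1.0, beta=0.1, gamma=0.95)
print(v, w)
```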
4 EXPERIMENTAL RESULTS
To illustrate the working of our algorithm, we tested it in several very simple Dec-POMDP scenarios. The first set of results was obtained in a small grid-world problem, as represented in Figure 2.a. In this problem, each of two robots must reach the opposite corner in a 3×3 maze. When both agents reach the corresponding corners, they receive a common reward of 20. If they “collide” in some state, they receive a reward of −10. Otherwise, they receive a reward of −1. The robots can move in one of four directions, N, S, E and W. The transitions in each direction have some uncertainty associated: with probability 0.8 the movement succeeds and, with probability 0.2, it fails. The robots can observe “Null”, indicating that nothing is detected; “Goal”, indicating that the robot has reached its individual target position; and “Crash”, indicating that both robots are in the same position. After successfully reaching the goal, the position of the robots is reset. We ran the algorithm for 10^4 learning steps and then tested the learnt policy on the environment for 50 time-steps. In Figure 3.a, we present the total discounted reward obtained during a sample run. Notice that the robots are able to quickly reach the goal, which clearly indicates that they were able to learn the desired task. Notice also that the robots are able to avoid collisions, which indicates that they were able to coordinate without communicating and using only local information during learning.
Figure 3. Sample runs with the learnt policies in the two test problems: a) Grid world; b) Dec-Tiger. Each panel shows the (sampled) discounted performance, i.e., the total discounted reward over 50 time steps.
The second problem is the well-known Dec-Tiger problem [9]. In this problem, two agents must choose between two doors, behind one of which is hidden a tiger. The other door hides a treasure. The purpose of the two agents is to figure out behind which door the treasure is hidden, by listening to the noises behind the doors. They must act in a coordinated fashion at all times, since their performance greatly depends on this ability to coordinate. We remark that this problem, unlike the grid-world problem, is not particularly suited to be addressed by our algorithm. In fact, the Dec-Tiger problem is not transition independent: the state-space cannot be factored, and the actions of each agent have a large influence on the states, observations and rewards received by the other agent. Nevertheless, we applied our algorithm to this problem, to better understand the general applicability of the method. Once again, we ran the algorithm for 10^4 learning steps and then tested the learnt policy on the environment for 50 time-steps. In Figure 3.b, we present the total discounted reward obtained during a sample run. Notice that, although some miscoordinations sometimes occur (which are impossible to overcome since each agent only has available local information), the agents are nevertheless able to attain many coordinated action choices. Remarkably, once again, this was achieved without communication and using only local information during learning (and execution). Finally, to conclude this section, we summarize in Table 1 the average total discounted reward obtained during a 50-step run. The results presented correspond to the average over 2,000 independent Monte-Carlo trials.

Table 1. Total discounted reward obtained in the two problems. The results correspond to the average over 2,000 independent Monte Carlo runs.

Environment | Total disc. reward
Grid world  | 34.001
Dec-Tiger   | 11.049
5 CONCLUSIONS
We conclude the paper with several important remarks. First of all, the algorithm introduced here is closely related to the Gra-WoLF algorithm in [4]. The main differences lie in our use of natural gradients and in our ability to address problems with partial state observability and no joint-action observability. Partial observability is addressed by considering the problem to be described by a transition independent Dec-POMDP. We take advantage of this fact by proposing several strategies that allow the agents to maintain independent beliefs that can be used for decision-making. Another important observation is that the optimistic initialization considered will naturally bias the initial policy of the agents towards
the goal. This bias may potentially lead to more frequent initial visits to the rewarding states and thus allow the learning process to converge more rapidly. Finally, it is important to remark that the results presented herein allow for little comprehension of the actual potential of the algorithm. We are currently testing this algorithm in much larger problems, which will allow us to infer how well our algorithm can cope with the high dimensionality arising from the consideration of large problems. We remark, however, that since our algorithm does not take into account any global information, it is reasonable to expect its complexity to grow linearly with the number of agents (instead of the exponential growth in fully coupled approaches). It is also important to somehow compare the performance of our algorithm with that of the several planning methods in the literature, in the particular class of problems that can adequately be addressed by our algorithm. We remark, however, that these algorithms compute the policy off-line, which makes a direct comparison difficult.
ACKNOWLEDGEMENTS This research was partially sponsored by the Portuguese Fundação para a Ciência e a Tecnologia under the Carnegie Mellon-Portugal Program and the Information and Communications Technologies Institute (ICTI), www.icti.cmu.edu. The views and conclusions contained in this document are those of the author only.
References [1] R. Becker, S. Zilberstein, V. Lesser, and C. Goldman, ‘Transitionindependent decentralized Markov decision processes’, in Proc. AAMAS, pp. 41–48, (2003). [2] D. Bernstein, S. Zilberstein, and N. Immerman, ‘The complexity of decentralized control of Markov decision processes’, Mathematics of Operations Research, 27(4), 819–840, (2002). [3] S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee, ‘Incremental natural actor-critic algorithms’, in Proc. NIPS 20, pp. 105–112, (2007). [4] M. Bowling and M. Veloso, ‘Scalable learning in stochastic games’, in Workshop on Game & Decision Theor. Agents, pp. 11–18, (2000). [5] S. Kakade, ‘A natural policy gradient’, in Proc. NIPS 14, pp. 1531– 1538, (2001). [6] V. Konda and J. Tsitsiklis, ‘On actor-critic algorithms’, SICON, 42(4), 1143–1166, (2003). [7] H. Kuhn, ‘Extensive games and the problem of information’, Annals of Mathematics Studies, 28, 193–216, (1953). [8] M. Littman, ‘Value-function reinforcement learning in Markov games’, J. Cognitive Systems Research, 2(1), 55–66, (2001). [9] R. Nair, D. Pynadath, M. Yokoo, M. Tambe, and S. Marsella, ‘Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings’, in Proc. IJCAI, pp. 705–711, (2003). [10] F. Oliehoek, M. Spaan, S. Whiteson, and N. Vlassis, ‘Exploiting locality of interaction in factored Dec-POMDPs’, in Proc. AAMAS, pp. 517–524, (2008). [11] C. Papadimitriou and J. Tsitsiklis, ‘The complexity of Markov chain decision processes’, Mathematics of Operations Research, 12(3), 441– 450, (1987). [12] J. Peters, S. Vijayakumar, and S. Schaal, ‘Natural Actor-Critic’, in Proc. ECML, pp. 280–291, (2005). [13] M. Roth, R. Simmons, and M. Veloso, ‘Exploiting factored representations for decentralized execution in multi-agent teams’, in Proc. AAMAS, pp. 469–475, (2007). [14] S. Singh, M. Kearns, and Y. Mansour, ‘Nash convergence of gradient dynamics in general-sum games’, in Proc. UAI, pp. 541–548, (2000). [15] M. Spaan and F. Melo, ‘Interaction-driven Markov games for decentralized multiagent planning under uncertainty’, in Proc. AAMAS, pp. 525–532, (2008). [16] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, ‘Policy gradient methods for reinforcement learning with function approximation’, in Proc. NIPS 13, pp. 1057–1063, (2000). [17] X. Wang and T. Sandholm, ‘Reinforcement learning to play an optimal Nash equilibrium in team Markov games’, in Proc. NIPS 15, pp. 1571– 1578, (2002).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-162
A Fast Method for Property Prediction in Graph-Structured Data from Positive and Unlabelled Examples Susanne Hoche1 and Peter Flach2 and David Hardcastle3 Abstract. The analysis of large and complex networks, or graphs, is becoming increasingly important in many scientific areas including machine learning, social network analysis and bioinformatics. One natural type of question that can be asked in network analysis is “Given two sets R and T of individuals in a graph with complete and missing knowledge, respectively, about a property of interest, which individuals in T are closest to R with respect to this property?”. To answer this question, we can rank the individuals in T such that the individuals ranked highest are most likely to exhibit the property of interest. Several methods based on weighted paths in the graph and Markov chain models have been proposed to solve this task. In this paper, we show that we can improve previously published approaches by rephrasing this problem as the task of property prediction in graph-structured data from positive examples, the individuals in R, and unlabelled data, the individuals in T , and applying an inexpensive iterative neighbourhood’s majority vote based prediction algorithm (“iNMV”) to this task. We evaluate our iNMV prediction algorithm and two previously proposed methods using Markov chains on three real world graphs in terms of ROC AUC statistic. iNMV obtains rankings that are either significantly better or not significantly worse than the rankings obtained from the more complex Markov chain based algorithms, while achieving a reduction in run time of one order of magnitude on large graphs.
1 Introduction
The analysis of large and complex networks or graphs is becoming increasingly important in a variety of scientific disciplines. Graphs allow us to model various tasks for graph-structured data which consist of individuals that are connected to each other in terms of, e.g., a shared interest or common function. In a graph G = (V, E), the individuals are modelled as nodes v ∈ V, and the connections between the individuals as links e ∈ E ⊆ V × V between the nodes. One prominent task in the analysis of graph-structured data is to rank one fraction T ⊂ V of target nodes in a graph relative to another fraction R ⊂ V of root nodes exhibiting a certain property of interest φ, in order to answer the question of how close or similar they are to the ones in R with respect to φ. Here, we focus on co-authorship graphs where the nodes are papers which are linked to each other by an undirected weighted edge iff the papers have one or more authors in common; R ⊂ V is a set of papers having scientific topic φ, and T ⊂ V is a set of papers with unknown topics for which we want to know how similar they are to the papers in R with observed topic φ. To answer such a question, we can attempt to rank the nodes in T such that the nodes ranked highest are most likely to exhibit φ and can thus be assumed to be closest to R with respect to φ. A number of approaches have been proposed in different scientific areas to determine a node’s importance in a graph, such as, e.g., numerous node centrality measures in social network analysis [19], and ranking algorithms motivated by the necessity to sort Web pages in a specific Web search task (e.g., HITS [11] and PageRank [3]). However, while these algorithms operate on a global level, the task we are interested in is to rank nodes on a local level, i.e., with respect to a given set R of nodes exhibiting property φ which can be interpreted as existing background knowledge, or ranking bias. Several such local ranking methods which answer the question of relative importance for graph-structured data have been proposed in [20]. These methods are based on weighted paths and Markov chain models and are thus computationally expensive, which makes their application to large graphs inefficient. We can improve these approaches by rephrasing the ranking problem as the task of property prediction in graph-structured data from positive examples, the nodes in R, and unlabelled data, the nodes in T, and applying an inexpensive iterative neighbourhood’s majority vote based prediction algorithm (“iNMV”) that allows an effective and efficient ranking of the nodes in T with respect to the nodes in R. Given a set R ⊂ V of papers in a co-authorship graph G with an observed topic φ ∈ Φ, one can predict – on the basis of the known topics and the graph’s link structure – the probability that for a given set T of papers with unknown topics, t ∈ T has topic φ, and rank the nodes in T according to this predicted probability, i.e., according to their similarity to R with respect to φ. The remainder of the paper is organised as follows. We discuss two Markov chain based methods proposed in [20] for ranking individuals in graphs in Section 2. In Section 3, we present our iNMV prediction algorithm and detail how we obtain a ranking of T. In Section 4, we show that on three real world graphs the iNMV prediction algorithm achieves rankings that are either significantly better or not significantly worse than the rankings obtained from the two methods described in Section 2, and at the same time reduces the run time on large graphs by one order of magnitude. We review related work in Section 5 and conclude in Section 6.

1 University of Bristol, Department of Computer Science, UK, email: [email protected]
2 University of Bristol, Department of Computer Science, UK, email: [email protected]
3 University of Bristol, Department of Computer Science, UK, email: [email protected]
2 Local Ranking Methods based on Markov Chains
White and Smyth propose in [20] several local ranking methods – based on weighted paths and Markov chain models – which answer the question of the relative importance of a set T of nodes in a graph G with respect to another set R in G. Here, we discuss two of their proposed methods that are based on Markov chains. In a Markov chain based approach, G is viewed as representing a first-order Markov chain. The idea is to traverse the graph in a Markov random walk, i.e., to start at some node and then randomly follow an outgoing edge to the next node, from where the process then repeats itself. The first-order Markov chain, or the transitions between the nodes, is characterized by a transition probability matrix P. The descriptions in the next two sections are based on [20].
2.1 Inverse Average Mean First Passage Time
The mean first passage time m_rt from a node r to a node t in a first-order Markov chain is defined as the expected number of steps in an infinite-length Markov random walk starting at r until the first arrival at t, i.e., as

m_rt = Σ_{n=1}^{∞} n f_rt^(n),    (1)

where f_rt^(n) denotes the probability that the random walk starting at r reaches t after exactly n steps. [20] defines the importance I1(t|R) of a node t with respect to a set R in terms of the inverse average mean first passage time, i.e., as

I1(t|R) = 1 / ( (1/|R|) Σ_{r∈R} m_rt )    (2)
That is, important nodes are relatively close to all the nodes in R. A so-called mean first passage time matrix M with entries m_ij for all pairs of nodes (v_i, v_j) in the graph can be obtained as follows. The fundamental matrix is defined as Z = (I − P + eπ^T)^(−1), where P is the Markov transition probability matrix, e a column vector containing all ones, and π a column vector of the stationary distribution for the Markov chain. The mean first passage time matrix is then obtained as

M = (I − Z + E Z_dg) D,    (3)

where I is the identity matrix, E a matrix containing all ones, Z_dg the matrix that agrees with Z on the diagonal but is 0 elsewhere, and D the diagonal matrix with elements d_ii = 1/π(i) for node i’s stationary distribution π(i) for the Markov chain.
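Assuming an irreducible, row-stochastic transition matrix, equations (1)–(3) and the importance of eq. (2) can be computed directly; the numpy sketch below is ours, with an illustrative four-node graph and root set. The matrix inversion is the expensive step, which is consistent with this method being practical only for smaller graphs.

```python
import numpy as np

# Sketch (ours) of the inverse average mean first passage time ranking (eqs. 1-3).
# P must be the row-stochastic transition matrix of an irreducible Markov chain.

def mean_first_passage_times(P):
    n = P.shape[0]
    evals, evecs = np.linalg.eig(P.T)                  # stationary distribution pi:
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))   # fundamental matrix
    Zdg = np.diag(np.diag(Z))
    D = np.diag(1.0 / pi)
    return (np.eye(n) - Z + np.ones((n, n)) @ Zdg) @ D            # eq. (3)

def importance_iamfpt(P, roots, target):
    M = mean_first_passage_times(P)
    return 1.0 / np.mean([M[r, target] for r in roots])           # eq. (2)

# toy 4-node graph with illustrative weights
A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
print(importance_iamfpt(P, roots=[0, 1], target=3))
```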
2.2 K-Step Markov Approach
An alternative approach investigated in [20] defines the importance I2(t|R) of a node t with respect to a set R on the basis of a Markov random walk of fixed length K, i.e., as the probability that the Markov random walk starting at r and ending after exactly K steps reaches t. The value K determines the bias towards the set R: the smaller K, the larger is R’s influence; the larger K, the more we approach the Markov chain’s stationary distribution. I2(t|R) can be computed as

I2(t|R) = [P p_R + P² p_R + ··· + P^K p_R]_t,    (4)

where P is the Markov transition probability matrix, p_R is a column vector containing the initial probabilities for the set R, and [X]_t denotes the t-th entry of the column vector X.
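Equation (4) amounts to accumulating K matrix–vector products; a numpy sketch (ours, following eq. (4) as written and using an illustrative graph and root set):

```python
import numpy as np

# Sketch (ours) of the K-step Markov importance (eq. 4): accumulate the vectors
# P p_R, P^2 p_R, ..., P^K p_R and read off the entry for each target node.

def k_step_markov(P, roots, K):
    n = P.shape[0]
    p = np.zeros(n)
    p[roots] = 1.0 / len(roots)      # initial probabilities concentrated on R
    importance = np.zeros(n)
    x = p.copy()
    for _ in range(K):
        x = P @ x                    # P^k p_R, multiplying by P as in eq. (4)
        importance += x
    return importance                # I2(t|R) is the t-th entry

A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
print(k_step_markov(P, roots=[0, 1], K=5))
```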
3 Rephrasing the Task of Local Ranking in Terms of Property Prediction
Our main contribution in this paper is to show that we can solve the local ranking problem more efficiently by rephrasing it as the task of property prediction from positive and unlabelled examples. Specifically, let G = (V, E) be a given co-authorship graph with a set of nodes (papers) V and a set E ⊆ V × V of undirected (co-authorship) edges (v_i, v_j) with weight w_ij, and let Φ be a set of topics that each paper can have (we assume that a paper can have several topics). Furthermore, let V = R ∪ T, R ∩ T = ∅, where R is a set of root nodes, or positive examples, for which we have observed the topics, and T is a set of target nodes, or unlabelled examples, for which we do not know the topics. The task is to rank the nodes in T for each φk ∈ Φ separately on the basis of the set R of root nodes and the graph’s link structure given by E according to their probability of exhibiting topic φk.
3.1 Iterative Neighbourhood’s Majority Vote based Property Prediction
To this end, we apply our iterative neighbourhood’s majority vote prediction algorithm iNMV which is based on a simple majority vote of directly linked nodes, or neighbours, and which consists of an initialisation step and an update step which can be applied iteratively. In the initialisation step, we assign for each target node an initial estimate to its topic probability on the basis of the topics observed for the root set R. In an update step, a node’s existing estimate is modified based on the neighbouring nodes’ current estimates. This way, entities are classified in dependence of each other, and mutual influence of the predictions is accounted for. The more often the update step is iterated, the more the predictions are propagated through the graph. Since papers can have multiple topics, we consider for each topic φk ∈ Φ a binary learning problem where nodes having topic φk constitute the positive examples. For each topic φk ∈ Φ separately, iNMV derives for each target node v_i ∈ T an estimate of the probability of observing φk for v_i. We denote the set of topics of paper v_i as its topic set y_i ⊆ Φ. Our approach assumes that nodes in the same neighbourhood of the graph tend to have similar properties, and that the predicted topic for one node in the graph depends on the topic of the nodes directly linked to it. Therefore, we assume that the probability of observing topic φk for node v_i ∈ T given G is equal to the probability of observing φk for v_i given v_i’s neighbourhood N_i := {v_j ∈ V | (v_i, v_j) ∈ E} consisting of those nodes in V that are directly linked to v_i. We base the prediction of an unlabelled node’s topic probability both on labelled and unlabelled neighbours in the graph, and thus derive a topic probability estimate from the known topics and topic probability estimates of directly linked root and target nodes, respectively. To predict the probability of observing φk for a node v_i ∈ T with unknown topic set y_i, we assign to v_i an initial estimate p_ik^(1) := P(φk ∈ y_i | R), where P(φk ∈ y_i | R) denotes the probability that paper v_i has topic φk, conditioned on the topics observed in R. This estimate is based on the number n_k of times that φk is observed in R, using the maximum likelihood based m-estimate where the observations are augmented by m additional samples which are assumed to be distributed according to p:

p_ik^(1) := P(y_i = φk | R) = (n_k + p · m) / (|R| + m),    (5)
where |R| denotes the cardinality of set R. We choose m = 1 and p = 0.5 (each topic is equally likely to be present or absent). For a node v_i ∈ R with observed topic, let p_ik^(1) := 1 for every topic φk that is observed for v_i. For each topic φk, we update the initial probability estimates p_ik^(1) for each node v_i ∈ T based on its neighbourhood’s estimates: the modified estimate p_ik^(t+1) := P^(t+1)(y_i = φk | N_i) is derived on the basis of the estimates p_jk^(t) := P^(t)(y_j = φk | N_j) for observing φk for v_i’s neighbours v_j ∈ N_i in the t-th update step:

p_ik^(t+1) := P^(t+1)(y_i = φk | N_i) = (1 / Σ_{v_j ∈ N_i} w_ij) Σ_{v_j ∈ N_i} w_ij p_jk^(t),    (6)

where w_ij is the weight of the edge between the nodes v_i and v_j. As we are dealing with an undirected graph, equation (6) is recursive. To account for the mutual influence between linked nodes, the estimates can be propagated through the graph by iterating equation (6) several times. With more iterations, predictions are propagated further through the graph.
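A compact sketch (ours) of the iNMV initialisation (eq. 5) and update (eq. 6) for a single topic on a toy weighted graph; the weights, root set and number of iterations are illustrative.

```python
import numpy as np

# Sketch (ours) of iNMV for one topic: eq. (5) initialises target-node scores from
# the root set, eq. (6) repeatedly replaces each target's score by the weighted
# average of its neighbours' scores. Root nodes keep their observed value of 1.

def inmv(W, roots, n_iters, m=1.0, p=0.5):
    n = W.shape[0]
    targets = [i for i in range(n) if i not in roots]
    # eq. (5): in this single-topic binary setting every root exhibits the topic,
    # so n_k = |R| and the m-estimate gives the same initial value to every target.
    n_k = len(roots)
    init = (n_k + p * m) / (len(roots) + m)
    scores = np.full(n, init)
    scores[list(roots)] = 1.0                      # observed topic on the roots
    for _ in range(n_iters):
        new = scores.copy()
        for i in targets:                          # eq. (6): weighted neighbour average
            nbrs = np.nonzero(W[i])[0]
            new[i] = W[i, nbrs] @ scores[nbrs] / W[i, nbrs].sum()
        scores = new
    return scores                                  # ranking scores for the targets

W = np.array([[0, 2, 1, 0],                        # toy symmetric co-authorship weights
              [2, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
print(inmv(W, roots={0}, n_iters=5))
```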
3.2 Ranking the Target Set using ROC Analysis
iNMV obtains for every topic φk ∈ Φ and every node v_i ∈ T an estimate p_ik of the probability of observing φk for v_i. We interpret p_ik as a score which we use to order the target nodes T. iNMV learns from positive and unlabelled examples, i.e., from root and target nodes. However, for each topic φk ∈ Φ we have originally positive and negative examples, i.e., those examples which exhibit φk and those which do not. To generate unlabelled examples, we delete for each topic and each target node the label indicating to which topic the paper belongs, but use it, after we have obtained the ranking of the nodes, to compute the ranking’s AUC. The area under the ROC Curve statistic, or AUC, is a measure based on the pairwise comparisons between the results of a binary prediction problem, and is often used to evaluate the performance of a prediction or ranking algorithm. It can be interpreted as the probability that for a pair (+, −) of a positive and a negative example that are both drawn uniformly at random, a higher score will be assigned to the positive example than to the negative (which means that these two examples are ranked correctly relative to each other). An algorithm’s AUC is the fraction of (+, −)-pairs that it correctly ranks relative to each other, and is defined as

AUC = ( Σ_{i=1}^{m} Σ_{j=1}^{n} 1(+_i > −_j) ) / (m · n),    (7)

where +_1, ..., +_m are the scores assigned to the m positive examples, −_1, ..., −_n are the scores assigned to the n negative examples, and 1(+_i > −_j) is the indicator function which is equal to 1 if +_i > −_j, and 0 otherwise. An algorithm’s AUC is maximal, i.e., equal to 1, iff it ranks all positive examples higher than the negative examples. Any misranked (+, −)-tuple decreases the AUC.
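Equation (7) can be mirrored directly by enumerating all (positive, negative) score pairs; a quadratic-time sketch (ours) with toy scores:

```python
# Sketch (ours) of eq. (7): the fraction of (positive, negative) score pairs that
# are ranked correctly, using strict > as in the indicator function of eq. (7).

def auc(pos_scores, neg_scores):
    pairs = [(p > n) for p in pos_scores for n in neg_scores]
    return sum(pairs) / len(pairs)

# toy scores for 3 positive and 4 negative target nodes
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))   # 11 of 12 pairs correct -> 0.9166...
```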
4 Empirical Evaluation
We evaluate the three methods described in Sections 2 and 3 on co-authorship graphs induced from the bibliographic data sets “ILPNet2” [1] and “Cora” [14]. The weighted links between the nodes are modelled in terms of an adjacency matrix A which holds for each pair (v_i, v_j) of connected nodes v_i, v_j ∈ V a non-zero entry
wi j according to the overlap of the papers’ author lists. We obtain the Markov transition probability matrix P from A by normalising the rows in A.
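The row normalisation of A into P described here is a one-liner in numpy; a sketch (ours) with an illustrative weighted adjacency matrix:

```python
import numpy as np

# Sketch (ours): turning a weighted co-authorship adjacency matrix A into the
# row-stochastic Markov transition probability matrix P by normalising each row.

A = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)    # illustrative author-overlap weights

P = A / A.sum(axis=1, keepdims=True)
print(P.sum(axis=1))                      # each row sums to 1
```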
4.1 Data and Experimental Setup
The ILPNet2 bibliographic database contains hand-selected ILP-related references from 1970 onwards. Our co-authorship graph consists of the largest connected component of 406 nodes with known topics and 6354 links (on average ≈ 15 links per node). We restrict our evaluation to the 10 topics that include at least 20 papers each. For each topic φ, we generate in 10 trials 4 distinct root and target set partitions. In each partition, the root set consists of 75% of the positive examples, i.e., the papers which have topic φ. The target set contains the remaining 25% of the positive examples and all negative examples, i.e., the papers which do not have topic φ. The target nodes are distinct in each of the 4 root and target set partitions, and their union results in the complete set of nodes. Thus, each node serves for each topic and trial exactly once as an unlabelled example, or target node. For each topic, we apply the three methods to the 40 distinct data partitions. From this we obtain for each topic φ and each node v ∈ T an estimated degree to which v belongs to φ. We interpret these values as scores and use them to rank the nodes as detailed in Section 3.2, where a higher score indicates a higher probability of exhibiting φ. Cora is a collection of ≈ 34,000 computer science research papers that have been automatically collected from the web [14]. Our co-authorship graph consists of the largest connected component of 10,513 nodes with known topics and 87,438 links (on average ≈ 8 links per node). The topics establish a hierarchy with general computer science topics at the top level which branch out into several sub-levels. We restrict our evaluation to the 6 top-level topics with the highest number of positive examples (“6 Top”), and to the 7 Machine Learning sub-topics on the lowest hierarchy level (“7 ML”). For each topic φ, we generate in 5 trials 2 distinct root and target set partitions, where a root set consists of 50% of the positive examples, and a target set of the remaining 50% of the positive examples and all negative examples. For each topic, we apply the three methods to 10 “6 Top” and “7 ML” root and target set partitions, respectively, and use the resulting scores to generate rankings of the target nodes which we evaluate in terms of the ROC AUC statistic.
4.2 Results
In Figure 1, we show for the three methods described in Sections 2 and 3 and the three domains described in Section 4.1 boxplots of the AUCs for all topics averaged over all partitions and trials. We show for the ILPNet2 data from left to right boxplots for the AUCs obtained from the inverse average mean first passage time (iaMFPT) method, iNMV with 1, 5, and 10 iterations, respectively, and the K-Step Markov method for K = 1, 2, 5, 10, 25. Each boxplot shows the median, lower and upper quartile, and the lower and upper limit of the AUCs for the single topics, for one method. Since the iaMFPT method has been found numerically too complex for the large Cora graph, results for this method are only shown for the small ILPNet2 graph. We think that this is justified since the ranking of this method is significantly worse than the rankings of all other methods (see below). We have also performed experiments for the K-Step Markov method for K > 25 but found that the AUCs are further decreasing and significantly lower than those for iNMV with 1, 5 or 20 iterations, and thus omit these results.
For the two Cora domains, we show in Figure 1 from left to right boxplots for the AUCs obtained from iNMV with 1, 5, and 10 iterations, respectively, and the K-Step Markov method for K = 1, 2, 5, 10, 25. For the two Cora domains and all methods, the single topics’ AUCs are in close range to each other. In contrast, the AUCs of the ILPNet2 topics exhibit large differences for all methods. In all the domains, nodes belonging to some topics form heterogeneous clusters in the graph, while nodes belonging to other topics are spread more widely over the graph. This seems to be more problematic when only a small number of positive examples exists. We perform a significance test to answer the question whether the results are significantly different. When comparing more than two classifiers, the non-parametric Friedman test [9] is widely recommended [6]. The Friedman test compares k algorithms over N data sets by ranking each algorithm on each data set separately, with the best result receiving rank 1, etc., and assigning average ranks in case of ties. The test then compares the average ranks of all algorithms on all data sets. If the null-hypothesis – that all algorithms are performing equivalently – is rejected under the Friedman test statistic, post-hoc tests such as the Nemenyi test [15] can be used to determine which algorithms perform statistically different. Note that for each topic φ, distinct root and target set partitions are generated, and that the Friedman test can thus be applied to these N = |φ| mutually independent data sets. According to the Friedman test, the AUCs averaged over all trials and partitions for the ILPNet2 data set obtained from the iaMFPT method are significantly worse than the rankings obtained from any other method. The AUC of the ranking obtained from the iaMFPT is most likely so much smaller because a target node t’s importance I1(t|R) is equally influenced by all root nodes in R. By contrast, a target node’s ranking obtained from iNMV or the K-Step Markov method for small K depends on a much smaller neighbourhood. This seems to indicate that the set of root nodes has to be rather coherent in order for the iaMFPT to produce a good ranking as, e.g., in the data sets evaluated in [20] (e.g., a set of collaborating authors, or interacting terrorists, where |R| = 2). In the ILPNet2 data, where the root set consists of a set of papers which have the topic of interest but which most likely belong to different “co-authorship cliques”, this assumption does not seem to hold; rather, the neighbourhood assumption that directly linked papers tend to be on the same topic does. For the Cora “6 Top” data, the Friedman test reports for the AUCs averaged over all trials and partitions that both iNMV with 5 and 20 iterations are significantly better than the K-Step Markov method for both K = 1 and K = 25. No significant differences have been found for the rankings on the Cora “7 ML” data.

Figure 1. Boxplots for the AUCs of the rankings resulting from the methods described in Sections 2 and 3 on the ILPNet2, Cora “6 Top” and “7 ML” data sets for all topics averaged over all partitions and trials. For each domain, we show – from left to right – a boxplot for iNMV with 1, 5, and 20 iterations, and for the K-Step Markov method for K = 1, 2, 5, 10, 25, respectively. For the ILPNet2 data, the leftmost boxplot is for the iaMFPT method. Each boxplot shows the median, lower and upper quartile, and the lower and upper limit of the data points (not considered to be outliers), i.e., the AUCs for the single topics, for one method. An outlier is depicted as “+”. The vertical axis shows the AUC averaged over all nCV runs.

4.3 Discussion

For iNMV, we obtain with 5 iterations on all three domains rankings with the highest AUCs. Equally, the K-Step Markov method yields for small K (2 or 5) the best AUCs. This indicates that on the domains we are investigating, the rankings benefit from a mixture of local patterns from small neighbourhoods in the graph rather than from a global method that considers information from large areas of the graph (as, e.g., the K-Step Markov with larger K, or iaMFPT). The K-Step Markov method considers for a target node t ∈ T all nodes r ∈ R that are K hops in G away from t. In contrast, iNMV with K iterations of the update step considers for the estimate of t’s topic probability all nodes r ∈ R that are K hops in G away from t, and additionally all nodes t′ ∈ T that are K hops in G away from t, where the topic probability estimate of t′ itself is modified in each iteration of the update step on the basis of its direct neighbourhood. This way, mutual influence of the unlabelled nodes is also taken into account, which seems to be advantageous for the ranking of T with respect to R and φ.
For the domains investigated in this paper, the obtained AUCs do not seem to depend on the percentage of positive examples for a topic. Rather, the main factors seem to be the number of intra- and inter-topic neighbours, respectively, that a node is linked to, and the way that the nodes with the same topic are positioned in the graph G. The more the nodes in G establish areas that are homogeneous with respect to their topics, the more successful a method can be that assumes similar nodes to lie in each other’s neighbourhood and thus bases its prediction for a node v on a small region around v in the graph.
Method          | ILPNet2    | Cora6Top   | Cora7ML
iNMV 1 It       | 2.3 ± 0.06 | 216 ± 12   | 218 ± 6
iNMV 5 Its      | 13.4 ± 0.7 | 252 ± 15   | 266 ± 7
iNMV 20 Its     | 34 ± 1.6   | 414 ± 29   | 465 ± 16
1-Step Markov   | 7.5 ± 0.6  | 1477 ± 2   | 1508 ± 27
2-Step Markov   | 7.5 ± 0.6  | 1479 ± 2   | 1555 ± 33
5-Step Markov   | 7.6 ± 0.6  | 1638 ± 27  | 1649 ± 29
10-Step Markov  | 7.9 ± 0.7  | 2309 ± 23  | 2312 ± 21
25-Step Markov  | 8.6 ± 0.6  | 4446 ± 6   | 4460 ± 19
inv. avg MFPT   | 17.5 ± 1.6 | n/a        | n/a

Figure 2. Run time complexity and standard deviations of the compared methods in seconds on an Intel(R) Xeon(TM) MP CPU 3.16GHz processor.
In Figure 2, we report the run time complexity for the iNMV and K-Step Markov methods and all domains, and that of the iaMFPT method for ILPNet2. On the small ILPNet2 co-authorship graph, iNMV with 5 and 20 iterations is 2 to 5 times slower than the K-Step Markov method. However, all methods’ run times lie in the range of a few seconds only. For the large graphs, the K-Step Markov method’s run time is 6 to 10 times larger than that of iNMV, i.e., in the range of hours rather than minutes.
5 Related Work
Closely related to our work with respect to prediction methods in graph-structured data are the publications in the fields of link-based object classification, collective inference, and iterative classification. [4] and [17] were among the first to study the effects of using related objects’ attributes to enhance classification in graph-structured domains. [4] proposes a relaxation-labelling based method for topic prediction in hyperlinked domains. [17] incrementally classifies a collection of encyclopedia articles and takes into account the classes of unlabelled documents only after they have been classified on the basis of neighbouring documents. [2] introduces conditional random fields for link-based object classification, e.g. for part-of-speech tagging, while [18] extends this approach to a setting of arbitrary graphs instead of chains. [16] proposes the use of relational dependency networks and Gibbs sampling to collectively infer labels for linked instances. [12] proposes an iterative link-based object classification method based on modelling link distributions which describe the neighbourhood of directed links around an object. [13] investigates the effectiveness of relaxation labelling based methods for classification of graph-structured data similar to the one proposed in [4]. However, none of these works consider the task of ranking a set of target nodes with respect to a set of root nodes exhibiting a specific property. Although we have for all domains that we investigate in this paper both positive and negative labelled examples, we only consider the positive examples as labelled. We argue that it is realistic to assume a paper that is not labelled as belonging to a specific topic to be unlabelled rather than to be a negative example. In the areas of social network analysis and Web mining, several approaches have been proposed to determine a node’s importance in a graph. Freeman developed several measures of node centrality which express how important a node is in a graph [7, 8]. A comprehensive overview of centrality measures in graphs is given in [19]. Several algorithms have been proposed to rank the nodes in a graph of Web pages. Well-known examples are HITS [11] and PageRank [3] – which operate on a global level – and personalised variants thereof, e.g., a topic-sensitive PageRank [10] where the ranking of Web pages is biased towards a set of specific topics, and a personalised version of HITS [5] which adjusts the measure of an authoritative source on the basis of incorporating user feedback. These personalised variants bias the standard ranking towards a set of a priori defined root nodes. However, they have been designed specifically for the context of Web queries.
6
Conclusion
We presented an effective and efficient algorithm to solve the task of ranking a set of target nodes in a graph with respect to a pre-defined set of root nodes which exhibit a specific property of interest. To this end, we rephrased the ranking problem as the task of property prediction in graph-structured data from positive and unlabelled examples, and proposed an inexpensive iterative neighbourhood’s majority vote based prediction algorithm, iNMV. On three real-world co-authorship networks, iNMV obtains rankings that are either significantly better or not significantly worse with respect to AUC than the rankings obtained from two previously published Markov chain based algorithms, and at the same time achieves a reduction in run time of one order of magnitude on large graphs. For a local ranking method, it seems to be advantageous to not only account for the root nodes’ influence on the prediction for a target node but to also consider, as iNMV with several iterations of the update step does, the
mutual influence of linked target nodes. In future work we plan to investigate whether there are benefits in learning a joint model for two or more topics. Topics are likely to be correlated (overlapping or disjoint), and we may be able to take advantage of that. We are furthermore investigating the time dependency of co-authorship networks and paper topics.
Acknowledgements The authors would like to acknowledge funding and support for this work from GCHQ in Cheltenham, UK, and would like to thank Jörg Kaduk for numerous interesting discussions.
REFERENCES
[1] ILPnet2 on-line library. http://www.cs.bris.ac.uk/~ILPnet2/Tools/Reports.
[2] J. Lafferty, A. McCallum, and F. Pereira, 'Conditional random fields: Probabilistic models for segmenting and labeling sequence data', in Proceedings of the 18th International Conference on Machine Learning, pp. 282–289 (2001).
[3] S. Brin and L. Page, 'The anatomy of a large-scale hypertextual web search engine', in Proceedings of the 7th International World Wide Web Conference, pp. 107–117 (1998).
[4] S. Chakrabarti, B.E. Dom, and P. Indyk, 'Enhanced hypertext categorization using hyperlinks', in Proceedings of the SIGMOD-98 ACM International Conference on Management of Data, pp. 307–318 (1998).
[5] H. Chang, D. Cohn, and A. McCallum, 'Creating customized authority lists', in Proceedings of the 17th International Conference on Machine Learning, pp. 167–174 (2000).
[6] J. Demšar, 'Statistical comparisons of classifiers over multiple data sets', Journal of Machine Learning Research, 7, 1–30 (2006).
[7] L.C. Freeman, 'A set of measures of centrality based on betweenness', Sociometry, 40, 35–41 (1977).
[8] L.C. Freeman, 'Centrality in social networks: I. Conceptual clarification', Social Networks, 1(3), 215–239 (1979).
[9] M. Friedman, 'The use of ranks to avoid the assumption of normality implicit in the analysis of variance', Journal of the American Statistical Association, 32, 675–701 (1937).
[10] T. Haveliwala, 'Topic-sensitive PageRank', in Proceedings of the 11th International World Wide Web Conference, pp. 517–526 (2002).
[11] J. Kleinberg, 'Authoritative sources in a hyperlinked environment', in Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms (1998).
[12] Q. Lu and L. Getoor, 'Link based classification', in Proceedings of the 20th International Conference on Machine Learning, pp. 496–503 (2003).
[13] S.A. Macskassy and F. Provost, 'Classification in networked data: A toolkit and a univariate case study', Journal of Machine Learning Research, 8, 935–983 (2007).
[14] A. McCallum, K. Nigam, J. Rennie, and K. Seymore, 'Automating the construction of internet portals with machine learning', Information Retrieval, 3(2), 127–163 (2000).
[15] P.B. Nemenyi, Distribution-free Multiple Comparisons, Ph.D. dissertation, Princeton University, 1963.
[16] J. Neville and D. Jensen, 'Iterative classification in relational data', in Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, pp. 13–20 (2000).
[17] H.-J. Oh, S.H. Myaeng, and M.-H. Lee, 'A practical hypertext categorization method using links and incrementally available class information', in Proceedings of the 23rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 264–271 (2000).
[18] B. Taskar, P. Abbeel, and D. Koller, 'Discriminative probabilistic models for relational data', in Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence, pp. 485–492 (2002).
[19] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
[20] S. White and P. Smyth, 'Algorithms for estimating relative importance in networks', in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 266–275 (2003).
VCD Bounds for Some GP Genotypes
José Luis Montaña 1
Abstract. We provide upper bounds for the Vapnik-Chervonenkis dimension (VCD) of classes of subsets of Rn that can be recognized by computer programs represented by expression trees built from arithmetic operations ({+, −, ∗, /}), infinitely differentiable algebraic operations (like l-root extraction), conditional instructions and sign tests. Our VCD bounds for this genotype are expressed as a polynomial function of the height of the expression trees used to represent the programs. This implies, in particular, that a GP learning machine dealing with a search space containing sequential exponential time computer programs of polynomial parallel complexity needs only a polynomial number of training examples.
1
Introduction
In recent years GP has been applied to a range of complex learning problems, including classification and symbolic regression, in a variety of fields like quantum computing, electronic design, sorting, searching, game playing, etc. A common feature of both tasks is to evolve a population composed of GP expressions built from a set of functionals F = {f1 , . . . , fk } and a set of terminals T = {x1 , . . . , c1 , . . .} (including the variables and the constants). Once we have chosen the functionals and the terminals, the classification (respectively, regression) task can be thought of as a supervised learning problem where the hypothesis class C is the tree-structured search space described by the set of leaves T and the set of nodes F. Analogously, the GP algorithm evolving computer programs P represented by the concepts of class C can be regarded as a learning algorithm. In the seventies the work by Vapnik and Chervonenkis ([9], [7], [8]) provided a remarkable family of bounds relating the performance of a learning machine to its capacity (see [5] for a modern presentation of the theory). The Vapnik-Chervonenkis dimension (VCD) is a measure of the capacity of a family of functions (or learning machines) {f(x, α)}α as classifiers, where α denotes the set of parameters of the learning machine. In general, the error ε(α) of a learning machine with parameters α is written as ε(α) = ∫ Q(x, α; y) dμ, where Q measures some notion of loss between f(x, α) and the target concept y, and μ is the distribution from which examples (x, y) are drawn for the learner. For example, for classification problems, the error of misclassification is obtained by taking Q(x, α; y) = |y − f(x, α)|. Similarly, for regression tasks one takes Q(x, α; y) = (y − f(x, α))². Many of the classic applications of learning machines can be explained inside this formalism. The starting point of Statistical Learning Theory is that we might not know μ. At this point one replaces the theoretical error ε(α) by the empirical error ε_m(α) = (1/m) Σ_{i=1}^{m} Q(x_i, y_i, α). Now, the results by Vapnik state that the error ε(α) of a learning machine
1 Department of Mathematics, Statistics and Computer Sciences, University of Cantabria, Spain, email: [email protected]. This work was partially supported by Spanish grant TIN2007-67466-C02-02.
with parameters α can be estimated, independently of the distribution μ(x, y), by means of the following formula:
ε(α) ≤ ε_m(α) + √( (h(log(2m/h) + 1) − log(η/4)) / m ),    (1)
where η is the probability that the bound is violated and h is the VCD of the family of classifiers f(x, α). While the existence of the bounds in Equation 1 is impressive, very often these bounds remain meaningless. The VC dimension h depends on the class of classifiers, equivalently, on a fully specified learning machine. Hence, it does not make sense to calculate the VCD for GP in general; however, it does make sense if we choose a particular class of computer programs as classifiers (i.e. a particular genotype). For the simplified genotype that only uses the binary standard arithmetic operators, some chosen computer program structure and a bound on the size of the program, the VC dimension remains polynomial in the size of the program and in the number of parameters of the learning machine. This last statement is an easy consequence of [3] (see Theorem 6 below); the bound also applies to the Decision Tree Model. Hence, a GP approach with arithmetic functionals and "short" programs (of size polynomial in the dimension of the space of events) has small VC dimension. Inspired by the above considerations, our aim is to go deeper into the study of formal properties of GP algorithms, taking the analysis of the classification complexity (VC dimension) of GP-trees as a starting point. This point of view is not new: a statistical learning approach to GP is proposed in [2]. We mention that, as the main difference with previous related work ([3]), where polynomial bounds in the size of the computer programs are given for the VC dimension, our bounds show that the capacity of classification of GP-trees depends essentially on parallel complexity rather than on sequential time complexity. Moreover, if the GP-tree internal nodes consist of infinitely differentiable algebraic functionals, sign tests and conditional statements, then the VC dimension depends polynomially on the height of the tree. This is quite strong since the known polynomial dependence on the size is improved (in the well-parallelizable case) by a logarithmic factor.
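The bound in Equation 1 is easy to evaluate numerically. The following is a minimal sketch (ours, not code from the paper); the function name and the example values of m, h and η are illustrative only, and natural logarithms are assumed, as in the formula above.

```python
import math

def vc_confidence_term(m, h, eta):
    """Confidence term of Equation 1 for m examples, VC dimension h and
    violation probability eta (a direct transcription of the formula)."""
    return math.sqrt((h * (math.log(2 * m / h) + 1) - math.log(eta / 4)) / m)

# Illustrative values only: the bound becomes informative once the number
# of training examples m is large relative to the VCD h.
for m in (100, 1_000, 10_000, 100_000):
    print(m, round(vc_confidence_term(m, h=50, eta=0.05), 3))
```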
1.1
Main results
Following the approach in [3] we deal with general concept classes whose concepts and instances are represented by tuples of real numbers. For such a concept class C, let Ck,n be C restricted to concepts represented by k real values and instances represented by n real values. The membership test of a concept class C over domain X takes as input a concept C ∈ C and an instance x ∈ X, and returns the boolean value "x ∈ C". Throughout this paper, the membership test for a concept class Ck,n is assumed to be expressed as a GP-tree Tk,n
taking k + n real inputs, representing a concept C ∈ Rk and an instance x ∈ X = Rn. The tree Tk,n uses exact real arithmetic, analytic algebraic operators as primitives (this includes the usual arithmetic operators and other more sophisticated operators like series having fractional exponents) and conditional statements, and when evaluated at input (x, y) it returns the truth value "x belongs to the concept represented by y". For classes defined by GP-trees as described above we state the following results.
• For a hierarchy of concept classes Ck,n, defined by GP-trees Tk,n using analytic algebraic functionals and height bounded by h = h(k, n), the VC dimension of Ck,n is polynomial in h, k, n and in the number of analytic algebraic operators that the programs contain.
• For a hierarchy of concept classes Ck,n, defined by GP-trees Tk,n using analytic algebraic functionals and height bounded by a polynomial in k and n, the VC dimension of Ck,n is also polynomial in k and n and in the number of analytic algebraic operators that the programs contain.
The precise statement of our main result is given in Section 5.2, Theorem 17.
2
Tree Structured Search Spaces
Historically the first GP search space was a subset of the LISP language. Today, GP has been extended to deal with any tree-structured search space. This space is usually described by a set of leaves or terminals T = {x1, x2, ...}, including constants, variables and auxiliary variables, and a set of internal nodes or functionals representing the operators with a given arity, N = {fk1, fk2, ...}. The search space includes all well-formed expressions, recursively defined as being either a terminal or the application of a k-ary operator fk to a list of k well-formed expressions.
Example 1 Rational functions. A simple example of a tree-structured search space is that of rational functions of any degree in the variables x1, ..., xn. The set of terminals includes all variables xi and a particular terminal R standing for any real-valued constant. The set of nodes includes the binary operations +, −, ∗, /.
Example 2 Straight Line Programs. Another tree-structured search space is that of computer programs without goto instructions. These programs are usually known as straight line programs. The main restriction is that only functions returning a value can be represented. As in the general tree case, a program or a function is recursively defined as a terminal, or as the result of a k-ary operator applied to k functions. The terminal set (leaves) includes the input variables of the program (real variables) and the constants in R. The set of functionals (internal nodes) includes the following nodes:
• Computation nodes, which are the binary nodes +, −, ∗, / and a finite set of nodes labeled with elements {f1, . . . , fq} that are infinitely differentiable algebraic operators of arities ki for every i, 1 ≤ i ≤ q.
• Sign nodes sign(f, ⊲), where ⊲ ∈ {>, =, <} is a sign condition. These nodes have a single son, which must be either a variable, a computation node or a branching node. Associated to each sign node there is a function sign(f, ⊲) that outputs true if the condition f ⊲ 0 is satisfied and false otherwise.
• Branching nodes if (·) then {·} else {·}, which are 3-ary operators having as sons: a node with boolean output representing the condition B, and two sons f and g with numerical output representing the conditional statements. Associated to a branching node there is a function branch(B, f, g) that outputs f if condition B evaluates to true and outputs g otherwise.
Remark 3 Examples of infinitely differentiable algebraic functions are the polynomials, the rational maps and also functions involving k-root extraction. Other more sophisticated examples are Puiseux series, i.e. series having fractional exponents, like Σ_{i=k}^{∞} a_i x^{i/q} with k ∈ Z, q ∈ N+ and a_i ∈ R. See [1] for a definition and properties of Puiseux series.
Remark 4 The sequential running time of a straight line program represented by a GP-tree T is given by the size of the tree T, s(T), while the parallel running time corresponds to the height of the tree T and will be denoted by h(T).
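To make the genotype of Example 2 concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of a tree with computation, sign and branching nodes and a recursive evaluator. The node class names and the example tree are hypothetical; the sign-condition set {>, =, <} and the branch(B, f, g) semantics are assumed as described above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:                     # terminal: input variable
    name: str

@dataclass
class Const:                   # terminal: real constant
    value: float

@dataclass
class Op:                      # computation node: +, -, *, /
    symbol: str
    left: "Node"
    right: "Node"

@dataclass
class Sign:                    # sign node: tests whether the child is <, = or > 0
    cond: str
    child: "Node"

@dataclass
class Branch:                  # branching node: if test then then_branch else else_branch
    test: "Node"
    then_branch: "Node"
    else_branch: "Node"

Node = Union[Var, Const, Op, Sign, Branch]

def evaluate(node: Node, env: dict):
    """Recursive evaluation of a GP expression tree (illustrative sketch)."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Op):
        a, b = evaluate(node.left, env), evaluate(node.right, env)
        return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[node.symbol]
    if isinstance(node, Sign):
        v = evaluate(node.child, env)
        return {"<": v < 0, "=": v == 0, ">": v > 0}[node.cond]
    if isinstance(node, Branch):
        chosen = node.then_branch if evaluate(node.test, env) else node.else_branch
        return evaluate(chosen, env)
    raise TypeError(f"unknown node type: {node!r}")

# Example tree: if x1*x1 - x2 > 0 then x1 else x2
tree = Branch(Sign(">", Op("-", Op("*", Var("x1"), Var("x1")), Var("x2"))),
              Var("x1"), Var("x2"))
print(evaluate(tree, {"x1": 3.0, "x2": 5.0}))   # 9 - 5 > 0, so prints 3.0
```

In this sketch the size s(T) is the total number of nodes and the height h(T) is the longest root-to-leaf path, matching Remark 4.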
3
VC Dimension of Formulas
The following definition of VC dimension is standard; see for instance [7].
Definition 5 Let C be a class of subsets of a set X. We say that C shatters a set A ⊂ X if for every subset E ⊂ A there exists S ∈ C such that E = S ∩ A. The VC dimension of C is the cardinality of the largest set that is shattered by C.
Throughout this section we deal with concept classes Ck,n such that concepts are represented by k real numbers, w = (w1, . . . , wk), instances are represented by n real numbers, x = (x1, . . . , xn), and the membership test for the family Ck,n is expressed by a formula Φk,n(w, x) taking as input the pair concept/instance (w, x) and returning the value 1 if "x belongs to the concept represented by w" and 0 otherwise. We can think of Φk,n as a function from Rk+n to {0, 1}. So for each concept w, define

Cw := {x ∈ Rn : Φk,n(w, x) = 1}.    (2)
The objective is to obtain an upper bound on the VC dimension of the collection of sets

Ck,n = {Cw : w ∈ Rk}.    (3)
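Definition 5 and the parametric membership test Φk,n above can be checked mechanically on toy classes. The sketch below is ours: the interval class (k = 2, n = 1) and the grid of candidate concepts are illustrative choices, not from the paper. It verifies by brute force that intervals on the real line shatter a two-point set but not a three-point set, so their VC dimension is 2.

```python
from itertools import combinations

def phi(w, x):
    """Membership test Phi_{2,1}: the concept w = (w1, w2) is the interval
    [w1, w2] on the real line (a toy instance of the setting above)."""
    w1, w2 = w
    return 1 if w1 <= x <= w2 else 0

def shatters(points, concepts):
    """Brute-force check that every subset of `points` is cut out by some
    concept in `concepts`, as in Definition 5."""
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            if not any(all(phi(w, x) == (x in subset) for x in points)
                       for w in concepts):
                return False
    return True

# A crude grid of candidate interval concepts.
grid = [(a / 2, b / 2) for a in range(-10, 11) for b in range(-10, 11) if a <= b]

print(shatters([0.0, 1.0], grid))        # True: intervals shatter two points
print(shatters([0.0, 1.0, 2.0], grid))   # False: cannot pick {0, 2} without 1
```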
For boolean combinations of polynomial equalities and inequalities the following seminal result by Goldberg and Jerrum is known.
Theorem 6 ([3], Theorem 2.2) Suppose Ck,n is a class of concepts whose membership test can be expressed by a boolean formula Φk,n involving a total of s polynomial equalities and inequalities, where each polynomial has degree no larger than d. Then the VC dimension V of Ck,n satisfies

V ≤ 2k log2(4eds).    (4)

Now assume that the formula Φk,n is a boolean combination of s atomic formulas, each of them being of one of the following forms:

τi(w, x) > 0    (5)

or

τi(w, x) = 0,    (6)
where {τi(w, x)}1≤i≤s are infinitely differentiable functions from Rk+n to R. Next, make the following assumptions about the functions τi. Let α1, ..., αv ∈ Rn. Form the s·v functions τi(w, αj) from Rk to R. Choose Θ1, ..., Θr among these, and let

Θ : Rk → Rr    (7)

be defined by

Θ(w) := (Θ1(w), ..., Θr(w)).    (8)

Assume there is a bound B, independent of the αi, r and ε1, ..., εr, such that if Θ^{−1}(ε1, ..., εr) is a (k − r)-dimensional C∞-submanifold of Rk, then Θ^{−1}(ε1, ..., εr) has at most B connected components. With the above set-up, the following result is proved in [4].
Theorem 7 The VC dimension V of a family of concepts Ck,n whose membership test can be expressed by a formula Φk,n satisfying the above conditions satisfies

V ≤ 2 log2 B + 2k log2(2es).    (9)

4

VC Dimension of Formulas with Infinitely Differentiable Algebraic Operators
We study the VC dimension of formulas involving analytic algebraic functions. Such functions are called Nash functions in the mathematical literature (see [1]). A Nash function f : Rn → R is an analytic function satisfying a nontrivial polynomial equation P(x, f(x)) = 0.² The degree of a Nash function is the minimal degree of the nontrivial polynomials vanishing on its graph. A sign assignment to a Nash function f is one of the (in)equalities f > 0, f = 0 or f < 0. A sign assignment to a set of s Nash functions is consistent if all s (in)equalities can be satisfied simultaneously by some assignment of real numbers to the variables. The following lemma is an easy consequence of Bézout's Theorem for Nash functions, which is proved in [6].
Lemma 8 Let f1, . . . , fs be n-variate Nash functions, each fi of degree bounded by d. Then the subset of Rn defined by the equations

f1 = 0, . . . , fs = 0    (10)

has at most (2d)^{(s+1)(2n−1)} connected components.
We next state a result that bounds the number of consistent sign assignments of a finite family of Nash functions. The technical details of the proof are omitted; they are based on [10].
Lemma 9 Let F be a finite family of s n-variate Nash functions with degree bounded by d ≥ 1. If s ≥ (n + 1)(2n − 1), the number of consistent sign assignments to the functions of the family F is at most

( 8eds / ((n + 1)(2n − 1)) )^{(n+1)(2n−1)}.    (11)
2 Polynomial and regular rational functions are Nash functions; the function √(1 + x²) is Nash on R; many activation functions used in neural networks are Nash; the function which associates to a real symmetric matrix its i-th eigenvalue (in increasing order) is Nash on the open subset of symmetric matrices with no multiple eigenvalue. Actually, Nash functions are exactly those functions needed in order to have an implicit function theorem in real algebraic geometry.
Next we give a result concerning the VC dimension of families of concepts defined by Nash functions. The proof is a technical consequence of Theorem 7 and Lemma 8.
Proposition 10 Let x = (x1, ..., xn) and y = (y1, ..., yk) denote vectors of real variables. Suppose Ck,n is a class of concepts whose membership test can be expressed by a boolean formula Φk,n involving a total of s (in)equalities of polynomials belonging to the polynomial ring R[x, y, f1(x, y), ..., fq(x, y)], where each polynomial has degree no larger than d, and each function fi is Nash of degree bounded by d′. Then the VC dimension of Ck,n is bounded above by

2(1 + log2 max{d, d′})(k(q + 1) + 1)(2k(q + 1) − 1) + 2k log2(2es).    (12)
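As a quick sanity check of how these bounds scale, the following sketch (ours; the function names and the parameter values are arbitrary illustrations) evaluates the bound of Theorem 6 (Equation 4) and the bound of Proposition 10 (Equation 12) numerically.

```python
import math

def goldberg_jerrum_bound(k, s, d):
    """VCD bound of Theorem 6: 2k log2(4eds)."""
    return 2 * k * math.log2(4 * math.e * d * s)

def nash_bound(k, q, s, d, d_prime):
    """VCD bound of Proposition 10 (Equation 12)."""
    kq = k * (q + 1)
    return (2 * (1 + math.log2(max(d, d_prime))) * (kq + 1) * (2 * kq - 1)
            + 2 * k * math.log2(2 * math.e * s))

# Illustrative parameters: k concept parameters, s atomic (in)equalities,
# polynomial degree d, and q Nash operators of degree d'.
print(round(goldberg_jerrum_bound(k=10, s=100, d=8), 1))
print(round(nash_bound(k=10, q=2, s=100, d=8, d_prime=8), 1))
```

Both bounds grow only polynomially in the stated parameters, which is the behaviour the two results in Section 1.1 refer to.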
5
VC Dimension Bounds for GP-trees
There is an alternative definition of GP-trees representing straight line code to that given in Section 2, obtained by allowing sign gates to output a value in {0, 1} with the obvious meaning. Next we provide a precise definition of this alternative model, which is more convenient for the combinatorial analysis.
Definition 11 A Nash (q, β)-GP tree T of degree D over R is a GP-tree whose leaves are labeled with inputs or with elements of R. The internal nodes having outdegree 2 are labeled with a binary arithmetic operation of R, that is, one operation in {+, −, ∗, /}; the nodes with outdegree 1, which are sign gates, are labeled by a sign condition. Finally, there are q nodes labeled by a Nash operator of degree bounded by D with outdegree at most β.
The following statement, whose proof is straightforward from the definitions, states the relation between GP-trees with branching nodes and boolean sign gates and the alternative definition given above.
Proposition 12 Nash GP-trees with Nash operations and sign nodes as described in Definition 11 are able to simulate Nash GP-trees with boolean sign nodes and selection nodes, defining equivalent models of computation and complexity.
The output function of a GP-tree as in Definition 11 can be defined as follows. To each node v we inductively associate a function:
• If v is an input or constant node then fv is the label of v.
• If v has outdegree 2 and v1 and v2 are the sons of v then fv = fv1 opv fv2, where opv ∈ {+, −, ∗, /} is the label of v.
• If v is labeled by a Nash operator f and v1, . . . , vk are the sons of v then fv = f(fv1, . . . , fvk) with k ≤ β. There are at most q nodes of this form.
• If v is a sign node then fv = sign(fv′), where v′ is the son of v in the tree.
Remark 13 Observe that the combination of computation nodes with sign nodes (equivalently, the presence of branching nodes and boolean sign nodes) may increase the number of terms involved in the description of a GP-tree as a formula (the size of the formula) up to a number which is doubly exponential in the height of the tree. This implies that the best we can expect from Theorem 6 is an O(k²(2^h + 1)²) upper bound for the VC dimension of concept classes Ck,n whose membership test is represented by a GP-tree Tn,k having only arithmetic nodes and height h = h(n, k). A formal explanation of this situation is given in the following proposition.
Proposition 14 For every l there is a GP-tree T(l) having height O(l), expressing the membership test of a concept class C(l), and involving 2^{2^l} L-terms in its description as a formula in the first order language L with symbols +, −, ∗, /, 0, 1 and < for the order.
We explicitly construct the GP-tree T(l) as follows.
• The input nodes of T(l) are the variables x and y. The dimension of the space of variables x and y is not meaningful in this example.
• Consider any set of 3·2^l polynomials Qi(x, y) that can be computed in constant height.
• In constant height and size O(2^l), build 2^l nodes v_i^0, 1 ≤ i ≤ 2^l, as follows: the output f_{v_i^0} is the polynomial Q_{3i−2}, when Q_{3i} ≠ 0, or Q_{3i−1}, when Q_{3i} = 0.
• Within height l + 1 and size 2^{l+1} − 1, add product nodes v_1^i, ..., v_{2^{l−i+1}}^i where

f_{v_k^i} = f_{v_{2k−1}^{i−1}} ∗ f_{v_{2k}^{i−1}}.    (13)
In this latter definition, the superscript index i indicates the height level and ranges over 1, ..., l + 1, and the subscript index k indicates the node number at level i; moreover, k ranges over 1, ..., 2^{l−i+1}.
• Finally, add a root node v whose output is given by fv = sign(f_{v_1^{l+1}}).
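A rough accounting of the construction of T(l) may help; the sketch below is ours (hypothetical function and variable names) and only counts nodes and branch combinations rather than building the tree. The point is that the height grows linearly in l, while each of the 2^l level-0 selection nodes contributes a factor of 2 to the number of cases a quantifier-free formula for the root output has to distinguish, giving the 2^{2^l} L-terms of the proposition.

```python
# Rough accounting for the construction of T(l) above -- our own sketch,
# not code from the paper.
def t_l_statistics(l):
    selection_nodes = 2 ** l          # nodes v_i^0, each choosing Q_{3i-2} or Q_{3i-1}
    branch_combinations = 2 ** selection_nodes   # 2 cases per selection node
    return selection_nodes, branch_combinations

for l in range(1, 6):
    sel, cases = t_l_statistics(l)
    print(f"l={l}: height grows like O(l), selection nodes={sel}, "
          f"branch combinations={cases}")
```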
the set of new variables zv. We introduce at most qβ new variables. Let v(i, 1), . . . , v(i, li) be the collection of sign nodes of the GP-tree Tk,n whose height is i ≤ h = h(k, n). Now, for each pair (i, j), 1 ≤ j ≤ li, let fi,j be the function that the sign node v(i, j) receives as input. Since the outdegree of the arithmetic nodes is bounded by 2, it easily follows by induction that fi,j is a piecewise rational function of (x, y, z, (fl(x, y, z))_{1≤l≤q}) of formal degree bounded by 2^i (the variables z can be eliminated by substitution to get fi,j as a function of the input variables x, y). Note that at height i the number of non-spurious sign nodes li is bounded above by max{β, 2}^{h−i}. Now, for each sign assignment ε = (εi,j) ∈ {>, =, <}