AGENT ENGINEERING
World Scientific
AGENT ENGINEERING
SERIES IN MACHINE PERCEPTION AND ARTIFICIAL INTELLIGENCE* Editors:
H. Bunke (Univ. Bern, Switzerland) P. S. P. Wang (Northeastern Univ., USA)
Vol. 25: Studies in Pattern Recognition Memorial Volume in Honor of K. S. Fu (Ed. H. Freeman)
Vol. 26: Neural Network Training Using Genetic Algorithms (Eds. L. C. Jain, R. P. Johnson and A. F. J. van Rooij)
Vol. 27: Intelligent Robots — Sensing, Modeling & Planning (Eds. B. Bolles, H. Bunke and H. Noltemeier)
Vol. 28: Automatic Bankcheck Processing (Eds. S. Impedovo, P. S. P. Wang and H. Bunke)
Vol. 29: Document Analysis II (Eds. J. J. Hull and S. Taylor)
Vol. 30: Compensatory Genetic Fuzzy Neural Networks and Their Applications (Y.-Q. Zhang and A. Kandel)
Vol. 31: Parallel Image Analysis: Tools and Models (Eds. S. Miguet, A. Montanvert and P. S. P. Wang)
Vol. 33: Advances in Oriental Document Analysis and Recognition Techniques (Eds. S.-W. Lee, Y. Y. Tang and P. S. P. Wang)
Vol. 34: Advances in Handwriting Recognition (Ed. S.-W. Lee)
Vol. 35: Vision Interface — Real World Applications of Computer Vision (Eds. M. Cheriet and Y.-H. Yang)
Vol. 36: Wavelet Theory and Its Application to Pattern Recognition (Y. Y. Tang, L. H. Yang, J. Liu and H. Ma)
Vol. 37: Image Processing for the Food Industry (E. R. Davies)
Vol. 38: New Approaches to Fuzzy Modeling and Control — Design and Analysis (M. Margaliot and G. Langholz)
Vol. 39: Artificial Intelligence Techniques in Breast Cancer Diagnosis and Prognosis (Eds. A. Jain, A. Jain, S. Jain and L. Jain)
Vol. 40: Texture Analysis in Machine Vision (Ed. M. K. Pietikainen)
Vol. 41: Neuro-Fuzzy Pattern Recognition (Eds. H. Bunke and A. Kandel)
Vol. 42: Invariants for Pattern Recognition and Classification (Ed. M. A. Rodrigues)
*For the complete list of titles in this series, please write to the Publisher.
Series in Machine Perception and Artificial Intelligence - Vol. 43
AGENT ENGINEERING Editors
Jiming Liu Hong Kong Baptist University
Ning Zhong Maebashi Institute of Technology, Japan
Yuan Y Tang Hong Kong Baptist University
Patrick S P Wang Northeastern University, USA
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
AGENT ENGINEERING Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4558-0
Printed in Singapore by World Scientific Printers (S) Pte Ltd
List of Contributors
K. Suzanne Barber
Electrical and Computer Engineering, The University of Texas at Austin, 201 24th Street, ACE 5.436, Austin, Texas 78712, USA

Alan D. Blair
Department of Computer Science, University of Melbourne, Parkville, Victoria 3052, Australia
email: [email protected]

Bengt Carlsson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Box 520, S-372 25 Ronneby, Sweden
email: [email protected]

Stefan J. Johansson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Box 520, S-372 25 Ronneby, Sweden
email: [email protected]

Sam Joseph
NeuroGrid Consulting, 205 Royal Heights, 18-2 Kamiyama-cho, Shibuya-ku, Tokyo 150-0047, Japan
email: [email protected]

Takahiro Kawamura
Corporate Research & Development Center, TOSHIBA Corp., 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan
email: [email protected]

Chunnian Liu
School of Computer Science, Beijing Polytechnic University (BPU), Beijing 100022, P.R. China
email: [email protected]

Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
email: [email protected]

Felix Lor
Intelligent & Interactive Systems, Department of Electrical & Electronic Engineering, Imperial College of Science Technology & Medicine, London, SW7 2BT, United Kingdom

Cheryl E. Martin
Electrical and Computer Engineering, The University of Texas at Austin, 201 24th Street, ACE 5.436, Austin, Texas 78712, USA

Setsuo Ohsuga
Department of Information and Computer Science, School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169, Japan
email: [email protected]

Jordan B. Pollack
Department of Computer Science, Brandeis University, Waltham, MA 02454-9110, USA
email: [email protected]

Elizabeth Sklar
Computer Science Department, Fulton Hall, Room 460, Boston College, Chestnut Hill, MA 02467, USA
email: [email protected]

Yuan Y. Tang
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
email: [email protected]

John K. Tsotsos
Department of Computer Science, York University, 4700 Keele Street, Toronto, Ontario, Canada M3P 1P3

Patrick S. P. Wang
College of Computer Science, Northeastern University, Boston, MA 02115, USA

Yiming Ye
IBM T.J. Watson Research Center, 30 Saw Mill River Road (Route 9A), Hawthorne, N.Y. 10532, USA
email: [email protected]

Ning Zhong
Head of Knowledge Information Systems Lab., Department of Information Engineering, Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City, 371-0816, Japan
email: [email protected]

TABLE OF CONTENTS
List of Contributors

Introduction to Agent Engineering
Jiming Liu, Ning Zhong, Yuan Y. Tang, and Patrick S. P. Wang

Chapter 1  Why Autonomy Makes the Agent
Sam Joseph and Takahiro Kawamura

Chapter 2  Knowledge Granularity Spectrum, Action Pyramid, and the Scaling Problem
Yiming Ye and John K. Tsotsos

Chapter 3  The Motivation for Dynamic Decision-Making Frameworks in Multi-Agent Systems
K. Suzanne Barber and Cheryl E. Martin

Chapter 4  Dynamically Organizing KDD Processes in a Multi-Agent KDD System
Ning Zhong, Chunnian Liu, and Setsuo Ohsuga

Chapter 5  Self-Organized Intelligence
Jiming Liu

Chapter 6  Valuation-Based Coalition Formation in Multi-Agent Systems
Stefan J. Johansson

Chapter 7  Simulating How to Cooperate in Iterated Chicken and Prisoner's Dilemma Games
Bengt Carlsson

Chapter 8  Training Intelligent Agents Using Human Data Collected on the Internet
Elizabeth Sklar, Alan D. Blair, and Jordan B. Pollack

Chapter 9  Agent Dynamics: Soap Paradigm
Felix W. K. Lor

Author and Subject Index
Introduction to Agent Engineering
Jiming Liu
Hong Kong Baptist University

Ning Zhong
Maebashi Institute of Technology, Japan

Yuan Y. Tang
Hong Kong Baptist University

Patrick S. P. Wang
Northeastern University, U.S.A.
Agent engineering is concerned with the development of autonomous computational or physical entities capable of perceiving, reasoning, adapting, learning, cooperating, and delegating in a dynamic environment. It is one of the most promising areas of research and development in information technology, computer science, and engineering today.

Motivation

Traditionally, ever since the Dartmouth workshop where the notion of AI was first defined, the field of AI has been based primarily on the following premise: for a given problem (whether in mathematics, in engineering, or even in medicine), we can always represent it in precise, formal mathematical expressions, such as logical predicates or symbolic manipulation operators. Thereafter, based on these well-formulated representations, we can derive or deduce the exact solution to the problem. One difficulty with this approach can readily be recognized, namely that the problem sometimes requires formal models from a variety of domains. Owing to this difficulty, researchers later considered the alternative of having several AI systems working
concurrently, each of which would be assigned to handle a particular task at a particular time. Note that the fundamental "philosophy" here remained unchanged: it was still symbolic, relying on the manipulation of logical expressions, and still top-down, requiring us to define how many subsystems are needed and how they should cooperate with each other. This later task-decomposition approach represents a one-step improvement over the single-AI-system approach.

A question that one may now ask is the following: if we do not have complete knowledge or a complete model of the given problem, how can we formulate it into well-defined expressions or statements? This is particularly true in the case of solving real-life problems. The key shortcomings of the traditional AI approaches may be summarized in one sentence: they rely on human beings to plan the exact steps for transforming and solving the problem, and to carefully distribute the task to individual AI systems. Unfortunately, most of the time, we as systems designers fail to do this job very effectively. This is in fact one of the main reasons why we have not seen many of the early-promised AI applications in real life, ever since 1956 when Herbert A. Simon, Allen Newell, Marvin L. Minsky, Seymour A. Papert, and even Alan Turing (before 1956) founded this field. In an attempt to account for the above-mentioned limitations and to move away from the rather unrealistic assumptions of traditional AI, people have started to realize the importance of developing autonomous agent-based systems. Unfortunately, this realization has come only recently, over forty years later.

Key Issues

The general questions that remain are: What is meant by "autonomous agents"? How can we build agents with autonomy? What are the desirable capabilities of agents, with respect to surviving (they will not die) and living (they will furthermore enjoy their being or existence)? How can agents cooperate among themselves?

Each autonomous agent may have its own very primitive behaviors, such as searching, following, aggregation, dispersion, filtering, homing, and wandering. Two important issues to be considered here are how to develop learning and selection mechanisms (e.g., action observation and evaluation mechanisms) for acquiring agent behaviors and how to implement an array of parameters (e.g., search depth, life-span, and age) for controlling the behaviors.

A system of decentralized agents may contain more than one class of agents. All the agents belonging to one class may share some (but not all) of their behavioral
characteristics. The issues to be addressed here are how to develop agent collective learning and collective behavior evolution/emergence, and how to establish and demonstrate the interrelationships between the autonomy of individual agents and the emergent global properties of agent teams, which result from the dynamic interaction as well as (co)evolution among several classes of agents and their environment. In this regard, some specific questions can readily be posed. For instance, if the behaviors are finite and defined locally for the individual agents as well as for the classes of agents, then in a given environment, how will the agents evolve in time? That is, how will the dynamics of the population change over time? How will the agents dynamically diffuse within the environment? Can they converge to a finite number of observable steady states? How will the parameters (such as the initial number/distribution of the agents and their given behavioral parameters) affect the converged states?

Let us take one step further. Say we have two concurrent ways to linearly change the behavioral parameters, in order to make the above-mentioned steady-state convergence faster and also more selective (since we may be interested in only one of the states). The first way is through each of the individual agents itself; i.e., the agent records its own performance, such as the number of encounters and the number of moves, and then, based on such observations, tries to control its own behavioral parameters in order to achieve an optimal performance, such as the maximum number of encounters and the minimum number of moves. Another way is through feedback of global information that can be observed from the entire environment, such as changes in the pattern formation of different classes of agents; the control in this case is exerted globally. Examples of this second kind of behavioral change would be: one particular class of agents switches from one behavior to another as commanded by the global control mechanism, or one behavioral parameter in a particular class changes in a certain way. The purpose of doing this is to achieve the globally optimal performance for the entire system. This leads to the following question: in order to achieve optimal performance at the global level, how much optimization at the local individual level and how much at the global level would be necessary?

An Overview of this Volume

The aim of this volume is to address some of the key issues and questions in agent engineering.

In Chapter 1, Joseph and Kawamura provide a definition of (mobile) agents in terms of whether or not mobile objects can autonomously decide to adjust their stated objectives and modify the ways they achieve their objectives by assessing the outcome of their decisions. The authors explain why this distinction should be made
and illustrate this idea with examples from distributed mobile agent systems for conserving network resources.

In order to select actions and exhibit goal-directed behaviors, an agent should develop its own awareness, i.e., knowledge of itself and its environment. In Chapter 2, Ye and Tsotsos explicitly address two important questions concerning this issue: how much detail the agent should include in its knowledge representation so that it can efficiently achieve its goal, and how an agent should adapt its methods of representation such that its performance can scale to different task requirements.

In Chapter 3, Barber and Martin argue that a multiagent system must maintain an organizational policy that allows for the dynamic distribution of decision-making control and authority-over relationships among agents in order to adapt to dynamically changing run-time situations. Based on a series of simulation-based experiments and comparative studies focusing on a form of organizational restructuring called Dynamic Adaptive Autonomy (DAA), they explain why such organizational-level adaptation is desirable from the point of view of the system's performance and how it can be effectively implemented.

By increasing autonomy and versatility, multiagent systems have much to offer in solving large-scale, high-complexity computational problems. One such example is in the area of KDD (Knowledge Discovery and Data Mining), where different techniques should be appropriately selected in achieving different discovery goals. In Chapter 4, Zhong, Liu, and Ohsuga provide a generic framework for developing a multiagent KDD system. The important feature of their framework is that the KDD processes are dynamically organized through a society of KDD agents. The coordination among the KDD agents is achieved using a planning agent (i.e., a meta-agent).

Chapter 5 is concerned with the issue of inducing self-organized collective intelligence in a multi-agent system. The specific tasks that Liu uses to demonstrate his approach are: (1) cellular agents are used to efficiently search and dynamically track a moving object, and (2) distributed robots are required to navigate in an unknown environment toward shared common goal locations.

Besides coordinating among distributed agents, agent engineering must also deal with the problems of cooperation and competition. What makes agents form coalitions? Under what conditions will an agent be included in or excluded from a coalition? How can coalitions be strengthened? In Chapter 6, Johansson argues for a rational, continuous view of agent membership in coalitions. Membership is based on how valuable a coalition is for an agent and vice versa. His work results in a theoretical model for updating and correcting group values in order to have a
trustful relationship.

In multiagent cooperation situations, sometimes there can be a conflict of interest among agents. When this happens, how should agents cooperate with each other? Carlsson explicitly addresses this issue in Chapter 7 and examines several game strategies based on rational and evolutionary behavior.

Can a population of software agents be produced using human behavior as a basis? Sklar, Blair, and Pollack's chapter (Chapter 8) describes a method for training such a population using human data collected at two Internet gaming sites. Their work proposes and tests two different training approaches: individual and collective.

Modeling a multiagent system can be helpful in explaining and predicting certain organizational properties of the system. Lor's chapter (Chapter 9) presents an attempt at modeling multiagent dynamics by considering an analogue between an agent and a soap bubble. In the proposed model of soap agents, the interaction among agents in a system is modeled as the dynamic expansion or shrinkage of soap bubbles.

We hope you enjoy reading this book.

Acknowledgements

We wish to express our gratitude to all the contributing authors of this book, not only for submitting their research work, but also for devoting their time and expertise to the cross-review of the chapters. Our special thanks go to Ms. Lakshmi Narayanan of World Scientific for coordinating and handling the publication/production-related matters.
Chapter 1
Why Autonomy Makes the Agent

Sam Joseph
NeuroGrid Consulting

Takahiro Kawamura
Computer & Network Systems Laboratory, Toshiba
1.1 Introduction

This chapter presents a philosophical position regarding the agent metaphor that defines an agent in terms of behavioural autonomy, while autonomy is defined in terms of agents modifying the way they achieve their objectives. Why might we want to use these definitions? We try to show that learning allows different approaches to the same objective to be critically assessed and thus the most appropriate selected. This idea is illustrated with examples from distributed mobile agent systems, but it is suggested that the same reasoning can be applied to communication issues amongst agents operating in a single location. The chapter is structured as follows. Section 1.2 looks at the fundamental metaphors of agents, objects and data, while section 1.3 moves on to consider the more complex concepts such as autonomy and mobility. In section 1.4 the authors attempt to define what a mobile agent actually is, and how one might be used to conserve network resources is addressed in section 1.5. Finally we explore the relationship between autonomy and learning, and try to clear up some loose ends.
1.2 Agents, Objects & Data

This paper works on the premise that the position stated by Jennings et al. [17] is correct: specifically that, amongst other things, the agent metaphor is a useful extension of the object-oriented metaphor. Object-oriented (OO) programming [29] is programming where data-abstraction is achieved by users defining their own data-structures (see figure 1), or "objects". These objects encapsulate data and methods for operating on that data, and the OO framework allows new objects to be created that inherit the properties (both data and methods) of existing objects. This allows archetypal objects to be defined and then extended by different programmers, who needn't have a complete understanding of exactly how the underlying objects are implemented. While one might develop an agent architecture using an object-oriented framework, the OO metaphor itself has little to say about the behavioural autonomy of the agents, i.e. their ability to control access to their methods. In OO the process of hiding data and associated methods from other objects, and other developers, is achieved by specifying access permissions on object-internal data elements and methods. Ideally, object-internal data is invisible from outside the object, which offers functionality through a number of public methods. The locus of control is placed upon external entities (users, other objects) that manipulate the object through its public methods. The agent-oriented (AO) approach pressures the developer to think about objects as agents that make requests of each other, and then grant those requests based upon who has made the request. Agent systems have been developed that rely purely on the inherited network of accessibility of OO systems [4], but ideally an AO programming environment would provide more fine-grained access/security control through an ACL (Agent Communication Language) interface (see figure 1).
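To make the contrast concrete, the following sketch sets a plain object next to an agent-style wrapper that grants or refuses a request depending on who is asking. It is only an illustration of the shift in perspective described above: the class names, the requester identifiers, and the single "performative" string are all invented here, and a real ACL would of course be far richer than a string comparison.

    import java.util.HashSet;
    import java.util.Set;

    // Plain OO: any caller holding a reference may invoke the public method.
    class ScoreTable {
        private final int[] scores = {3, 9, 4};

        public int getHighestInt() {
            int max = Integer.MIN_VALUE;
            for (int s : scores) {
                max = Math.max(max, s);
            }
            return max;
        }
    }

    // AO flavour: the object is wrapped so that every request carries the
    // identity of the requester, and the agent grants or refuses it.
    class ScoreAgent {
        private final ScoreTable table = new ScoreTable();
        private final Set<String> trustedRequesters = new HashSet<>();

        ScoreAgent() {
            trustedRequesters.add("planner-agent"); // hypothetical collaborator
        }

        public Integer request(String requesterId, String performative) {
            // The agent, not the caller, controls access to its behaviour.
            if (!trustedRequesters.contains(requesterId)) {
                return null; // request refused
            }
            if ("query-highest".equals(performative)) {
                return table.getHighestInt();
            }
            return null; // unknown request type
        }

        public static void main(String[] args) {
            ScoreAgent agent = new ScoreAgent();
            System.out.println(agent.request("planner-agent", "query-highest")); // 9
            System.out.println(agent.request("unknown-agent", "query-highest")); // null
        }
    }

The point of the sketch is only that the declared requester becomes part of the interface, which is what the ACL layer of figure 1 adds on top of the object.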
[Figure 1 shows three levels of abstraction: 1) a data structure, 2) an object (e.g. public int getHighestInt(void);), and 3) an agent.]
Figure 1. Example specifications at each level of abstraction: 1) a user-created data structure; 2) methods are added to allow manipulation of the underlying data, giving us an object; 3) to create an agent the object(s) are wrapped in an ACL interface that specifies how to interact with the agent, in this case via a DTD (Document Type Definition).

Thus, the important aspect of the Agent-Oriented approach is that, in opposition to object method specification, an ACL interface requires that the communicating parties must be declared, allowing the agent to control access to its internal methods, and thus its behaviour. This in itself means that the agent's objectives must be considered, even if only in terms of which other entities the agent will collaborate with. The AO framework thus supports objects with objectives, which leads us on to the subject of Autonomy.
1.3 Autonomy, Messages & Mobility

Autonomy is often thought of as the ability to act without the intervention of humans [3,5,12,13]. Autonomy has also been more generally defined as an agent's possession of self-control, or self-motivation, which encompasses the broader concept of an autonomous agent being free from control by any other agent, including humans [8,10,17]. This is all well and good, but what does it mean in functional terms?
Autonomous behaviour is often thought of as goal-directed [22], with autonomous agents acting in order to achieve their goals. Pro-activeness is also thought of as another fundamental quality of autonomous agents [14], inasmuch as an agent will periodically take the initiative, performing actions that will support future goal achievement. Barber & Martin [2] define autonomy as the ability to pursue objectives without outside interference in the decision-making process being employed to achieve those objectives. Going further, they make the distinction that agents can have different degrees of autonomy with respect to different goals. For example, a thermostat autonomously carries out the goal to maintain a particular temperature range, but it does not autonomously determine its own set point. To be as concrete as possible, given that an objective may be specified (e.g. transfer data X from A to B), Autonomy can be thought of as the ability of an entity to revise that objective, or the method by which it will achieve that objective (to save space, from now on, when we talk about modifying an objective we also mean modifying the way in which it is achieved), and an Agent is an Object that possesses Autonomy.

We consider messaging a pre-requisite of Autonomy, in that if an agent cannot interact with its environment then it has no relevant basis upon which to modify, or even achieve, its objectives. Perhaps we should say sensing/acting rather than messaging, but the exchange of information in computer networks is arguably closer to a messaging paradigm than a sensing/acting one. This relates to the previous discussion in which we considered how an agent communication language (ACL) forces an agent developer to specify who to collaborate with, which is part of the process of specifying an objective. For the purposes of this chapter, let us take an objective to be a goal with a set of associated security constraints. One might argue that there is not a strong connection between being able to modify one's objectives and restrictions about who to communicate with. However, if one thinks of the different agents that one can communicate with as offering different functionalities, then the extent to which one can modify one's objective becomes dependent on what information and functionality we can gain from those around us. For example, for our hypothetical agent attempting to transfer data X from A to B, its ability to change its approach in the light of ongoing circumstances depends crucially on its continuing interaction with the drivers that support different message protocols, and with the agents it has received its objectives from. If transfer cannot be achieved by available methods, then the agent will need to refer
back to other agents to get permission to access alternate transport routes, or to receive new instructions.

This might all be seen as a needless change in perspective over existing object development frameworks, but before we can demonstrate the benefits of this approach we need to consider code mobility, or the ability to transfer code from one processor to another. If we start to ask questions about whether this means that a process running in one location gets suspended and continued in a new location, we head into dangerous territory. The actual advantage of mobile agent techniques (a term we will define more concretely shortly) over other remote interaction frameworks such as Remote Procedure Calls (RPC), mobile code systems, process migration, Remote Method Invocation (RMI), etc., is still highly disputed [23]. There are various studies that show advantages for mobile agents over other techniques under certain circumstances, but in general they appear to rely on assumptions about the degree of semantic compression that can be achieved by the mobile agent at a remote site [1,9,16,28,32]. In this context semantic compression refers to the ability of an agent to reduce the size of the results of an operation due to its additional understanding of what is and isn't required (e.g. disposing of copies of the same web page and further filtering them based on some user profile). However, it is difficult to predict the level of semantic compression a particular agent will be able to achieve in advance (although there are examples in Network Management applications that avoid this problem, e.g. finding the machine with the most free memory [1]).

By moving into the area of mobile agents we encounter various disputes, particularly as regards the concept of a multi-hop agent: a mobile agent that moves to and performs some activity at a number of remote locations without returning to its starting location. Some researchers such as Nwana and Ndumu [24] even go so far as to question the value of current mobile agent research. Nwana & Ndumu advocate that we should solve the problems associated with stationary agents before moving on to the more complex case of mobile agents. While there might be some truth in this, the authors of this chapter would like to suggest that it is in fact possible to gain insight into solutions that can be applied to stationary agents by investigating mobile agents. This seemingly backwards notion might become a little clearer if we allude to the possibility of constructing virtual locations within a single location.
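As a hedged sketch of the definitions just given, the code below treats an objective as a goal plus a set of security constraints (here, the transport routes the agent is allowed to use) and gives the agent a way to revise how the objective is pursued when the permitted methods fail, referring back for a widened objective. The route names and the revision policy are assumptions made purely for illustration; they are not taken from the chapter or from any particular agent platform.

    import java.util.ArrayList;
    import java.util.List;

    // An objective: a goal plus the constraints under which it may be pursued.
    class Objective {
        final String goal;                 // e.g. "transfer data X from A to B"
        final List<String> allowedRoutes;  // security constraints on how

        Objective(String goal, List<String> allowedRoutes) {
            this.goal = goal;
            this.allowedRoutes = allowedRoutes;
        }
    }

    class TransferAgent {
        private Objective objective;

        TransferAgent(Objective objective) {
            this.objective = objective;
        }

        // A stand-in for an actual transfer attempt over one route.
        private boolean tryRoute(String route) {
            return route.equals("http"); // pretend only this route works today
        }

        // The agent works through the routes its objective permits; if none
        // succeed it refers back for a revised objective rather than giving up.
        boolean pursue() {
            for (String route : objective.allowedRoutes) {
                if (tryRoute(route)) {
                    System.out.println(objective.goal + " achieved via " + route);
                    return true;
                }
            }
            objective = requestRevisedObjective();
            return objective != null && pursue();
        }

        // Here the "other agent" simply grants one extra route.
        private Objective requestRevisedObjective() {
            List<String> widened = new ArrayList<>(objective.allowedRoutes);
            widened.add("http");
            return new Objective(objective.goal, widened);
        }

        public static void main(String[] args) {
            List<String> routes = new ArrayList<>();
            routes.add("ftp");
            new TransferAgent(new Objective("transfer data X from A to B", routes)).pursue();
        }
    }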
1.4 Defining a Mobile Agent

For further clarity we shall have to dive into definitions of state, mobile code, and mobile agent; but once we have done so we hope to show the utility of all these definitions. Specifically, that they help us to think about the different types of techniques that can be used to help an agent or group of agents achieve an objective. In terms of a distributed environment, possible techniques include messaging between static agents, or multi-hop mobile agents, or combinations thereof. It will hopefully become clear to the reader that these approaches can be translated into virtual agent spaces, where we consider interactions between agents in a single location. We can perhaps rephrase the issue in terms of the question: Is it more valuable to perform a serial operation (using multi-hop mobility) or a parallel operation (using messaging)? Or in other words, if we need to poll a number of knowledgeable entities in order to solve a problem, should we ask them all and potentially waste some of their time, or should we first calculate a ranking of their ability to help us and then ask them each in turn, finishing when we get the result we want? Or some combination of the two? This question is especially pertinent in the distributed network environment, since transferring information around can be highly expensive, but in the case that all our agents reside in the same place (potentially on a number of adjacent processors), the same issues arise, the same kinds of tools (chain messages, serial agents, parallel messages) are available as alternate strategies, and their respective utilities need to be evaluated on a case-by-case basis. So let's be specific and further define our terms:

• Message: a read-only data structure
• State: a read/write data structure
• Code: a set of static operations on data
Here an "operation" means something that can be used to convert one data structure into another. A data structure is taken to follow the C/C++ language idea of a data structure: a variable or set of variables that may be type-specified (e.g. float, int, hashtable, etc.) and that may be arbitrarily nested (e.g. a hashtable of hashtables of char arrays). State is often used to refer to the maintenance of information about the currently executing step in some code, which requires a read/write data structure.
Given that we are transmitting something from one location to another, it is possible to imagine the transmission of any of the eight possible combinations of the three types defined above (e.g. message & code, message & state, etc.). Some of the possible combinations are functionally identical, since a read/write component (state) can replicate the functionality of a read-only component (message). We might have considered write-only components as well, but they would not appear to add anything to our current analysis. In summary we can distinguish four distinct entities:

• MESSAGE (implicitly parallel): message only
• CHAIN MESSAGE (serial): message & state
• MOBILE CODE (parallel): code only
• MOBILE OBJECT (serial): code & state
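One minimal way to express this taxonomy is as types: a message is read-only data, state is read/write data, code is a set of operations, and a mobile object bundles code with state so that results can be kept and compared between hops. The sketch below is our own rendering of the four entries above; the interface and class names are invented for the illustration.

    // Read-only data: once constructed it cannot be altered.
    final class Message {
        private final String payload;
        Message(String payload) { this.payload = payload; }
        String read() { return payload; }
    }

    // Read/write data: results accumulated so far can be kept and revised.
    // A chain message is essentially such a state passed from host to host.
    class State {
        private String bestResultSoFar = null;
        String read() { return bestResultSoFar; }
        void write(String value) { bestResultSoFar = value; }
    }

    // A set of static operations on data.
    interface Code {
        String operate(String input);
    }

    // Code only: can be shipped and run, but remembers nothing between hops.
    class MobileCode implements Code {
        public String operate(String input) { return input.toUpperCase(); }
    }

    // Code plus state: can compare the current hop's result with earlier ones.
    class MobileObject implements Code {
        private final State state = new State();
        public String operate(String input) {
            String current = input.toUpperCase();
            if (state.read() == null || current.compareTo(state.read()) > 0) {
                state.write(current);   // keep the better of old and new
            }
            return state.read();
        }

        public static void main(String[] args) {
            MobileObject mo = new MobileObject();
            mo.operate("alpha");
            System.out.println(mo.operate("beta")); // BETA: remembered across "hops"
        }
    }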
We can consider each of the above entities in terms of sending them to a number of network locations in either a serial or parallel fashion (see figures 2 & 3). While there are other possibilities such as star-shaped itineraries [31] or combinations of serial and parallel, we shall leave those for the moment. The important thing to note is that in a parallel operation, state has little value since any individual entity will only undergo a single-hop (one step migration), while state becomes essential to take advantage of a serial operation in order to maintain and compare the results of current processing with previous steps.
Figure 2. Serial Chain Message or Mobile Object framework. SA (Stationary Agent), MA (Mobile Agent). Arrows represent movement of object or message.

Basically we are considering the utility of each of these entities in terms of performing distributed computation or search. If the objective is merely to gather a number of remote data items in one location, then sending a request message to each remote location will probably be sufficient. If we want to run a number of different processes on different machines, mobile code becomes necessary, if not a mobile object. However, if we think an advantage can be gained by remotely comparing and discarding the results of some processing, then chain messages and mobile objects seem more appropriate (since they can maintain state in order to know what has been achieved so far, etc.).
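Read as a decision rule, the paragraph above can be written out explicitly, as in the hedged sketch below; the task categories and the returned strings are simply our restatement of the text, not an algorithm proposed by the authors.

    // A rough restatement of the text's advice as a decision rule.
    enum Task { GATHER_REMOTE_ITEMS, RUN_DIFFERENT_PROCESSES, COMPARE_AND_DISCARD_REMOTELY }

    class EntityChooser {
        static String choose(Task task, boolean serialItinerary) {
            switch (task) {
                case GATHER_REMOTE_ITEMS:
                    return "parallel request messages";        // no state needed
                case RUN_DIFFERENT_PROCESSES:
                    return "mobile code (or a mobile object)"; // code must travel
                case COMPARE_AND_DISCARD_REMOTELY:
                    // state is what lets intermediate results be kept and compared
                    return serialItinerary ? "chain message or mobile object"
                                           : "mobile code with results returned home";
                default:
                    return "unknown";
            }
        }

        public static void main(String[] args) {
            System.out.println(choose(Task.COMPARE_AND_DISCARD_REMOTELY, true));
        }
    }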
Figure 3. Parallel Messaging or Mobile Code framework. SA (Stationary Agent), MA (Mobile Agent). Arrows represent movement of messages or objects.

A mobile agent can be defined as a mobile object that possesses autonomy, where autonomy was previously defined as the ability to revise one's objective. In order to support autonomy in an entity we need some way of storing previous occurrences, e.g. state. This means that a message or a piece of mobile code cannot by itself support autonomy. We also require some kind of processing in order to make the decision to change an objective or the method of achieving it, which means that by itself a chain message cannot be autonomous, although by operating in tandem with the processing ability of multiple stationary agents, autonomous behaviour can be achieved. Which leaves mobile objects, which carry all the components required to support autonomy. This by itself does not make a mobile object an autonomous entity, but given that it is set up with an objective and a framework for revising it, it may be made autonomous, and we would suggest that in this case it is worth breaking out a new term, i.e. Mobile Agent. So, just to be clear about the distinction we are making: in the serial itinerary of figure 2, a mobile object will visit all possible locations, while a mobile agent has the ability to stop and revise the locations it plans to visit at
any point (in the simplest case this could be a while() loop monitoring some environmental variable; the complexity of the decision-making process is not at issue here). While the reader might disagree with the use of these particular words, there does seem to be a need to distinguish between the two concepts, particularly since, as we shall discuss in the next section, the presence of autonomy enables a more efficient usage of network resources.
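The distinction can be phrased directly in code: a mobile object works through its whole itinerary, while the simplest mobile agent wraps the same itinerary in a loop that checks, at each hop, whether the objective has already been met and returns early if so. The sketch below simulates the hops locally with a stubbed query; the host names and the item being sought are invented, and no real mobile agent platform is assumed.

    import java.util.Arrays;
    import java.util.List;

    class ItinerarySearch {
        // Pretend each host either holds the item we want or does not.
        static boolean queryHost(String host, String item) {
            return host.equals("hostB") && item.equals("X");
        }

        // Mobile object: visits every location on the itinerary regardless.
        static int visitAll(List<String> itinerary, String item) {
            int hops = 0;
            for (String host : itinerary) {
                hops++;
                queryHost(host, item);
            }
            return hops;
        }

        // Mobile agent: the simplest autonomy is a loop that stops (returns
        // early) once the objective is achieved or conditions change.
        static int visitUntilFound(List<String> itinerary, String item) {
            int hops = 0;
            for (String host : itinerary) {
                hops++;
                if (queryHost(host, item)) {
                    break;          // objective met: abandon the rest of the plan
                }
            }
            return hops;
        }

        public static void main(String[] args) {
            List<String> plan = Arrays.asList("hostA", "hostB", "hostC", "hostD");
            System.out.println("mobile object hops: " + visitAll(plan, "X"));        // 4
            System.out.println("mobile agent hops:  " + visitUntilFound(plan, "X")); // 2
        }
    }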
1.5 Efficient Use of Network Resources

It might well be the case that there is no killer application for mobile agents, or indeed for non-mobile agents. Unlike previous "killer apps" such as spreadsheets or web browsers that introduced users to a new way of using a computer, agents should perhaps instead be considered as a development methodology with no associated killer app. There is perhaps little disagreement that software should be easy to develop, maintain, upgrade, modify, re-use, and fail gracefully. One might go so far as to suggest that these kinds of qualities are likely to be provided by systems based on independent autonomous modules, or indeed agents. The pertinent question is what the associated cost of creating such a system is, and whether we will suffer a loss of efficiency as a consequence.

What is efficiency? When we employ a system to achieve some objective on our behalf, any number of resources may be consumed, such as our patience, or emotional resolve, but more quantifiably, things like time (operation-, development-, preparation-, maintenance-), CPU cycles, network bandwidth, and heap memory usage. In determining whether a (mobile) agent system is helping us achieve our goals it is important to look at all the different resources that are consumed by its operation in comparison with alternate systems. Some are more difficult to measure than others, and different people and organisations put different premiums on different resources. The authors' research into mobile agents has focused on time and bandwidth consumption, since these are considered to currently be in short supply. If we can keep all of this in mind then we might be able to assess the agent-oriented metaphor with a greater degree of objectivity than previously. The OO metaphor has overheads in terms of specifying and maintaining object hierarchies and permissions, but it seems to have become widely accepted that this is outweighed by the greater maintainability and flexibility of the code developed in this fashion. If we can show that the costs of constructing more complex agent-oriented systems are outweighed by some
similar advantage, then perhaps we can put some arguments about agents to rest.
Figure 4. Communicating across the network with RPC calls. Copyright General Magic 1996.

A key paper in the recent history of the mobile agent field is the Telescript white paper [33], in which some benefits of using mobile agents were introduced. There are two diagrams from this paper that have been reproduced both graphically and logically in many papers/talks/discussions on mobile agents. The first diagram shows us the Remote Procedure Call (RPC) paradigm approach to communicating with a remote location (figure 4), while the second (figure 5) indicates how all the messy individual communication strands of the RPC can be avoided by sending out a mobile agent. The central idea being that the mobile agent can reduce the number of messages moving around the network, and the start location (perhaps a user on their home computer or Personal Data Assistant, PDA) can be disconnected from the network.
Figure 5. Communicating across the network with mobile agents. Copyright General Magic 1996. The advantage of being able to disconnect from the network is tied up with the idea that one is paying for access to the network, i.e. connecting twice for twenty seconds, half an hour apart will be a lot cheaper than being continuously connected for half an hour. While this might be the case for a
lot of users connecting to the network through a phone-company stranglehold, it in fact does not work well as an argument for using mobile agents throughout the network. A TCP/IP based system will break the mobile agent up into packets in order to transmit it, so the real question becomes, "Is the agent larger than the sum of the sizes of the messages it is replacing?" Or, more generally, does encoding our communication in terms of a mobile agent gain any tangible efficiency improvements over encoding it as a sequence of messages? The problem is predicting which communication encoding will be more effective for a given task and network environment.

1.5.1 Prediction Issue

The use of an agent-oriented development methodology helps in the design and maintenance of large software systems, at least as far as making them more comprehensible. But this does not automatically mean that mobile agents will necessarily have any advantage over a group of stationary agents communicating via messages. To illustrate the point, let us imagine an example application that is representative of those often used to advocate mobile agent advantages. Let us say that we are searching for a number of web pages from a variety of different search engines, a meta-search problem so to speak (see figure 6). Search engines currently available on the web allow us to submit a set of search terms, but will not host our mobile agents. In some future situation in which we could send mobile agents out to web search engines, or in some intranet enterprise environment where database wrappers can host mobile agents [19], we might be tempted to try and send out a single mobile agent rather than lots of separate queries.
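The question "is the agent larger than the sum of the messages it replaces?" can at least be asked arithmetically before dispatch, provided the sizes can be estimated. The sketch below does that back-of-the-envelope comparison for the meta-search setting; every byte count in it is made up, and the reply sizes are precisely the quantities that are hard to predict, which is the prediction issue under discussion here.

    class DispatchEstimator {
        // Estimated bytes for a messaging round trip to each search engine.
        static long messagingCost(int engines, long queryBytes, long expectedReplyBytes) {
            return (long) engines * (queryBytes + expectedReplyBytes);
        }

        // Very rough estimate for one mobile agent visiting every engine in turn:
        // the agent (code + state) is transmitted on each of its engines + 1 hops,
        // and because it filters remotely, only a compressed result set comes home.
        static long agentCost(int engines, long agentBytes, long compressedResultBytes) {
            return (long) (engines + 1) * agentBytes + compressedResultBytes;
        }

        public static void main(String[] args) {
            int engines = 5;                                        // invented numbers throughout
            long byMessage = messagingCost(engines, 200, 50_000);
            long byAgent = agentCost(engines, 20_000, 8_000);
            System.out.println(byMessage < byAgent ? "send messages" : "send the agent");
        }
    }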
[Figure 6 depicts a user connecting, either by agent or by message, over the Internet/Intranet to several search engine interfaces/database wrappers, each backed by a database.]
Figure 6. MetaSearch through the Internet or Intranet. Quite apart from whether we might benefit from a multi-hop mobile agent performing this search for us, we can ask whether we can gain anything from having our agent perform some local processing at a single remote search engine/database wrapper. For example perhaps we are keen not to receive more than ten results from each remote location; perhaps we are searching for documents that contain the word "Microsoft", but rather than just returning the top ten hits when we get more than ten, maybe we would like the search to be narrowed by the use of additional keywords; a sort of conditional increase in search specificity as shown by the flow-chart in figure 7.
[Figure 7 flow chart: search for "Microsoft"; if more than ten documents match, add the keyword "Monopoly"; if more than ten still match, narrow the search further; otherwise return the results.]

Figure 7. Conditionally increasing search specificity.

The flow chart summarises the kind of code that an agent might execute at a remote location as part of our meta-search. The main point is that if the number of matched documents is actually less than the threshold, then all the information apart from the first search term is not needed. Sending out code has just consumed bandwidth without delivering any benefits. Of course, you cry, sometimes the rest of the code will be used, just not on every occasion. Exactly, but what is the likelihood that we will need the extra code or indeed the extra information? Clearly we need to hedge our bets; in a search where we expect large numbers of results to require some semantic compression at a remote location, we can happily send out lots of code and data, just in case, to make sure we don't take up too much bandwidth. However, we need to be more specific about the details of this trade-off. If we want to show any kind of non-situation-specific advantage of transferring code/agents over the network, we need to be able to predict the kinds of time/bandwidth efficiency savings they will create against the time/bandwidth their implementation consumes.
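The flow chart amounts to only a few lines of code, which is exactly the trade-off being weighed: shipping this logic to the remote site pays off only when the broader query really does return too many hits. In the hedged sketch below the remote search engine is a stub returning made-up match counts; the keywords and the threshold of ten are taken from the example, while everything else is invented.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    class RemoteRefinement {
        // Stub for the remote engine: returns how many documents match all terms.
        static int countMatches(List<String> terms) {
            if (terms.contains("Monopoly")) return 7;   // pretend values
            return 4200;
        }

        // The logic of figure 7: start broad, narrow the query only on demand.
        static List<String> refine(List<String> initialTerms, List<String> extraKeywords, int threshold) {
            List<String> terms = initialTerms;
            int matches = countMatches(terms);
            for (String keyword : extraKeywords) {
                if (matches <= threshold) break;        // few enough hits: stop early
                terms = new ArrayList<>(terms);
                terms.add(keyword);                     // narrow the query
                matches = countMatches(terms);
            }
            return terms;                               // the extra keywords may never be needed
        }

        public static void main(String[] args) {
            List<String> result = refine(Arrays.asList("Microsoft"), Arrays.asList("Monopoly"), 10);
            System.out.println(result);                 // [Microsoft, Monopoly] in this stubbed run
        }
    }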
Joseph et al. [18] work through a more specific example of an object search application, showing that the ability to roughly predict the location of an object allows efficient switching between two different search protocols. If we refer back to figure 3, the parallel diagram indicates either Mobile Code or Message transfer, while the serial diagram in figure 2 indicates Mobile Object or Chain Message paradigms. Let us re-emphasise what it means to change our mobile object into a mobile agent. In the serial diagram we can see that the presence of behavioural autonomy, the ability to adjust one's method of achieving a goal, would allow the entity being transferred around the system to return early if the desired item was found, the network environment changed, etc. The ability to adjust a plan, in this case to visit four locations in sequence, and to return to base at will allows network resources to be conserved. For our single-hop agent performing meta-search, autonomy concerns controlling when to finish processing and return the results. All this requires is a while loop waiting for some change (achieving the goal), which can be achieved using a mobile object, you protest; but wait, the OO paradigm has nothing to say about whether or not that kind of framework should be set up. What the AO framework should provide is a way for these goals, and the circumstances under which they should be adjusted, to be easily specified [34]. The remaining issue is how an agent can make a decision to adjust its goals or its method of achieving them if it can't predict the effects of the change. It is the authors' humble opinion that in the absence of predictive ability, agents cannot effectively make decisions, except in relatively simple environments. If a problem is sufficiently well understood then the probabilities of any occurrence might well be known, but in all those really interesting problems, where they aren't known in advance, we are forced to rely upon learning as we go along.

1.5.2 Learning

We define learning as the adjustment of a model of the environment in response to experience of the environment, with the implicit objective being that one is trying to create a model that accurately reflects the true nature of the environment, or perhaps more specifically those sections of the environment that influence the objectives an agent is trying to achieve. In these terms the simple updating of a location database to accurately reflect the contents of a network location can be considered learning, but really what additional characteristics are required? Where learning is mentioned
one tends to think of the benefits gained through generalisation and analogy, which are in fact properties of the way in which the environment is represented in the memory of our learner. A more concrete example of this is provided by Joseph et al. [18], which we summarise here: specifically, that learning about the location of objects within a distributed network environment can lead to a more efficient use of resources. Essentially, the particular learning algorithm used is not as important as the representation of the objects, although the learning algorithm needs to be able to output probabilistic estimates of an object's location (for a review of probabilistic learners see Buntine [6]); Joseph et al. [18] used a representation based on object type (in fact file type: executable, text, word file, etc.) after Segal [27]. To make a long story short, when a chain message or mobile agent is performing a serial search for an object, knowing the probability that it exists in each of the search locations allows one to estimate when the search will terminate. Even when using parallel messaging, the same estimates can be used to choose a subset of possible locations to make an initial inquiry. That information then allows the alternative methods of achieving the same objective to be quantitatively compared and the most efficient option selected. The natural question that follows is "how can we be sure that our probability estimates are correct?" We can't, but the reasoning of Etzioni [11] seems sound: that we should use the results of previous searches, or processing, in order to create future predictions. It might also be expedient to rank the different representational units in terms of their predictive ability, such as finding that knowing a file is a word file allows us to predict its location with some accuracy, while knowing that something is a binary file is not so useful. In the meta-search example, an estimate of how many results will be generated in response to a particular set of search terms can perform the same function, effectively setting up a profile of which search engines are experts in which domains, so that the most appropriate subset can be contacted depending on our current query. One can easily imagine a network of static agents that function as searchable databases learning about each other's specialities and forwarding queries based on their mutual understanding of each other. It is in this kind of environment that one could practically use mobile agents and expect to make measurable gains in efficiency, or at least be able to determine with some accuracy if there were any gains to be made.
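As a worked illustration of how such estimates feed a decision, the sketch below computes the expected number of hops of a serial, stop-when-found search given a per-location probability of finding the object, and sets it against the fixed cost of querying every location in parallel. The probabilities are made up and treated as independent; this is our own simplification of the kind of comparison described in Joseph et al. [18], not their algorithm.

    class SearchCostEstimate {
        // Expected number of hops of a serial search that stops when the object
        // is found, visiting locations in the given order. If the object is
        // nowhere to be found, every location gets visited.
        static double expectedSerialHops(double[] pFoundAt) {
            double expected = 0.0;
            double pNotFoundYet = 1.0;
            for (int i = 0; i < pFoundAt.length; i++) {
                // Hop i+1 is paid whenever the object was not at any earlier location.
                expected += pNotFoundYet;
                pNotFoundYet *= (1.0 - pFoundAt[i]);
            }
            return expected;
        }

        public static void main(String[] args) {
            double[] p = {0.6, 0.2, 0.1, 0.05};      // made-up per-location estimates
            double serial = expectedSerialHops(p);    // roughly 2.0 hops on average
            int parallel = p.length;                  // parallel messaging queries all 4
            System.out.printf("expected serial hops %.2f vs %d parallel queries%n", serial, parallel);
        }
    }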
1.6 Discussion

The main point that has been brought up in this chapter is that the Agent-Oriented approach to software might have something to offer over and above the Object-Oriented approach. We can think of an Agent-Oriented approach as offering a developer an easy way to establish an ACL and a format for specifying the objectives of individual agents. This can be thought of as just a shift in terminology, but the authors of this chapter go further, to suggest that if the Agent-Oriented framework allows for the dynamic adjustment of agents' objectives, then functional differences can be achieved in system performance and efficiency. We have tried to present an argument that there is a quantifiable difference between those mobile objects that can decide to adjust their objectives en route and those that can't, and that if we want to take advantage of "mobile objects" they should be able to switch behaviours to suit circumstances and make those decisions on the basis of predictions about the most effective course of action. There are of course many unanswered questions, such as what our Agent-Oriented programming languages should look like, and what functions they should provide in order to assist developers in creating agents with objectives that can be modified in the face of their ongoing experience; we hope to make these the subject of future publications. For an example of the kind of work going on in this area we refer readers to the work of Wooldridge et al. [35].
Appendix

Now, on to some of those prickly issues we dodged in the main sections. Firstly, there is the question of how we specify an agent's objective, for example in terms of the Belief-Desire-Intention (BDI) framework [25]. While this kind of framework is clearly very important in the long term, it would seem expedient in the short term at least to simply encourage agent developers to think about their agents' objectives. Insisting that agent system developers employ (and by implication learn) an unfamiliar new
objective modelling language is likely to put many people off. In the short term the objectives of agents get specified implicitly by way of any number of restrictions on agent behaviour (security restrictions, temporal restrictions, etc.). For example, an agent may be trying to load balance, but be restricted in which resources can be used to balance the processing load over a number of machines, for security reasons or whatever; this creates a bounded objective, e.g. "Solve this problem, but don't use machine B to do it". It is likely to be only a matter of time before agreed-upon (or at least widely used) formats for these kinds of specifications arise, but until they do the authors believe it is useful to work towards some kind of philosophically consistent agent-oriented metaphor before working on a detailed specification, in much the same way that the object-oriented language specifications came after the philosophical development of the OO metaphor.

Next comes the problem of definitions and their value. Throughout this paper we make a number of definitions, and the fall-out may be that, for example, some systems people would not like to think of as agents will be labelled as agents. An analogy could be drawn with trying to define a concept like "alive", which might be the quality an entity possesses if it matches a number of criteria such as growth, metabolism, energy use, nutrition, respiration, reproduction and response to stimuli (as one of the authors seems to remember from a high-school biology textbook). The point is that with any such definition there might be unfortunate side effects, such as a car-making factory being classified as more "alive" than a virus, or that kind of thing. While this might be regarded as a horrific consequence by some, it seems that rather than repeatedly modifying definitions to try and make them fit in with our "intuition" about what is meant by a particular term, we should focus on making definitions that draw a distinction of some value, e.g. whether or not a system can modify its stated objectives, and gaining insight from the categorisations that follow.

There are various issues relating to the messaging protocols, since autonomy is by its nature tied up with the ability to communicate. This might not be clear at first, but if autonomy is defined as an ability to modify one's objectives, there needs to be some basis upon which to make those decisions. In the absence of any interaction with an environment (whether or not it has any other autonomous entities in it), any such decision becomes of no consequence. In a relatively static environment we might want to talk about sensing instead of communicating, but when we think about computer
networks, any sensing of the environment takes place in an active fashion, i.e. we might just be "sensing" the file system, but increasingly we are communicating with some file system agent or wrapper. In order for any sensing or communication to be useful in the computer network environment, protocols are necessary; or perhaps we mean ontologies? The distinction becomes complex and a full investigation is beyond the scope of this paper. In order to summarise the current convoluted state of affairs, let us describe four possible outcomes of current research:

1. Everyone spontaneously agrees on some communication framework/protocol (FIPA-ACL, KQML, Labrou et al. [21]).
2. Someone works out how to formally specify all the different ACL (Agent Communication Language) dialects within one overarching framework that includes lots of helpful ontology brokering services that make communication work [30].
3. Someone figures out how to give agents enough wits to be able to infer the meanings of speech acts from the context in which they are communicated [20].
4. Some combination of the above.

While this is not a trivial issue, it is possible to overlook it in a given agent system by assuming that all the agents subscribe to a single protocol, which is often the case in most implemented agent systems.

There is also the "who does what for free?" issue. Jennings et al. [17] summarise the difference between objects and agents in terms of the slogan "Objects do it for free; agents do it for money". However, due to possible semantic conflict with the saying "Professionals do it for money; amateurs do it for the love of it", we suggest the possible alternate slogan "Objects do it because they have to; agents do it because they want to", in order to directly capture the point that the agent-oriented approach is advocating that software entities (i.e. agents) have a policy regarding their objectives: what they are intending to achieve, and which objectives they are prepared to collaborate in achieving.

Finally, we should look more closely at our definition of code. The issue is that any piece of information could be taken to represent an operation. We can get into complex epistemological questions about whether a meteor shower or RNA protein manufacture constitutes data processing. However, for the current purposes we seek to define an operation as something that can
be interpreted within our current system as an operation. For example, a simple list of letters (e.g. E, F, U, S) could be taken to represent a series of operations in a system set up to recognise that representation. In summary, we are tempted to think of code as a set of operations that can be interpreted within the system in question, although a different distinction could be created by suggesting that code distinguishes itself from data by having control flow, i.e. that conditional statements can be interpreted so that different policies will be employed under different circumstances.

One final note is that we could actually construct a read-only chain message by having each remote stationary agent check its own ID against the read-only destination IDs in the chain message, but this is not a general solution and would create security issues about untrusted hosts knowing the complete itinerary of the chain message, although some Peer-to-Peer (P2P) protocols do use this approach. Also of note is that we have a lot more possibilities than just sending purely parallel or purely serial messages, but then our search space gets very big very quickly. Still, these possibilities do deserve further attention.
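The read-only chain message mentioned above can be pictured as follows: the full itinerary travels with the message, and each stationary agent simply checks whether it is named before processing it. The sketch is ours rather than a protocol taken from the text, and it makes the stated drawback visible, since every receiving host can read the complete itinerary.

    import java.util.Arrays;
    import java.util.List;

    // A read-only chain message: the itinerary and payload never change in flight.
    final class ChainMessage {
        final List<String> destinationIds;
        final String payload;
        ChainMessage(List<String> destinationIds, String payload) {
            this.destinationIds = destinationIds;
            this.payload = payload;
        }
    }

    class StationaryAgent {
        private final String id;
        StationaryAgent(String id) { this.id = id; }

        void receive(ChainMessage m) {
            // Security caveat from the text: this host can read the whole itinerary.
            if (m.destinationIds.contains(id)) {
                System.out.println(id + " processes: " + m.payload);
            }
            // Forwarding to the next hop is omitted; results would have to be sent
            // home separately, since the message itself is read-only and cannot carry them.
        }

        public static void main(String[] args) {
            ChainMessage m = new ChainMessage(Arrays.asList("agentA", "agentC"), "query: X?");
            new StationaryAgent("agentA").receive(m);
            new StationaryAgent("agentB").receive(m);  // not addressed: ignores it
        }
    }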
Acknowledgements

We wish to thank Takeshi Aikawa, Leader of the Computer & Network Systems Laboratory, for allowing us the opportunity to conduct this research, and Shinichi Honiden & Akihiko Ohsuga for their input and support.
References

1. Baldi M. & Picco G. P. Evaluating the Tradeoffs of Mobile Code Design Paradigms in Network Management Applications. In Kemmerer R. and Futatsugi K. (Eds.), Proc. 20th Int. Conf. Soft. Eng. (ICSE'98), ACM Press, 146-155, 1998.
2. Barber S. K. & Martin C. Specification, Measurement, and Adjustment of Agent Autonomy: Theory and Implementation. Technical Report TR99-UT-LIPS-AGENTS-04, University of Texas, 1999.
3. Beale R. & Wood A. Agent-based Interaction. Proc. People and Computers IX: Proceedings of HCI'94, Glasgow, UK, 239-245, 1994.
4. Binder W. Design and Implementation of the J-SEAL2 Mobile Agent Kernel. 6th ECOOP Workshop on Mobile Object Systems: Operating System Support, Security, and Programming Languages, http://cui.unige.ch/~ecoopws/ws00/index.html, 2000.
5. Brown S. M., Santos Jr. E., Banks S. B., & Oxley M. E. Using Explicit Requirements and Metrics for Interface Agents User Model Correction. Proc. Second International Conference on Autonomous Agents, Minneapolis/St. Paul, MN, 1-7, 1998.
6. Buntine W. A guide to the literature on learning probabilistic networks from data. IEEE Trans. Knowl. & Data Eng., 8(2):195-210, 1996.
7. Carzaniga A., Picco G. P., & Vigna G. Designing distributed applications with mobile code paradigms. In Taylor R. (Ed.), Proc. 19th Int. Conf. Soft. Eng. (ICSE'97), ACM Press, 22-32, 1997.
8. Castelfranchi C. Guarantees for Autonomy in Cognitive Agent Architecture. Intelligent Agents: ECAI-94 Workshop on Agents Theories, Architectures, and Languages, M. J. Wooldridge and N. R. Jennings, Eds. Berlin: Springer-Verlag, 56-70, 1995.
9. Chia T. H. & Kannapan S. Strategically mobile agents. In Rothermel K. and Popescu-Zeletin R. (Eds.) Lecture Notes in Computer Science: Mobile Agents, Springer, 1219:174-185, 1997.
10. Covrigaru A. A. & Lindsay R. K. Deterministic Autonomous Systems. AI Magazine, vol. 12, 110-117, 1991.
11. Etzioni O. Embedding decision-analytic control in a learning architecture. Artificial Intelligence, 49:129-159, 1991.
12. Etzioni O. & Weld D. S. Intelligent Agents on the Internet: Fact, Fiction, and Forecast. IEEE Expert, 10(4), 44-49, 1995.
13. Evans M., Anderson J. & Crysdale G. Achieving Flexible Autonomy in Multi-Agent Systems Using Constraints. Applied Artificial Intelligence, vol. 6, 103-126, 1992.
14. Foner L. N. What's An Agent, Anyway? A Sociological Case Study. MIT Media Lab, Boston, Technical Report, Agents Memo 93-01, 1993.
15. Fuggetta A., Picco G. P., and Vigna G. Understanding code mobility. IEEE Trans. Soft. Eng., 24(5):342-361, 1998.
16. Ismail L. & Hagimont D. A performance evaluation of the mobile agent paradigm. OOPSLA, ACM SigPlan Notices, 34(10):306-313, 1998.
17. Jennings N. R., Sycara K., and Wooldridge M. A roadmap of agent research and development. Autonomous Agents and Multi-Agent Systems, 1:7-38, 1998.
18. Joseph S., Hattori M. & Kase N. Efficient Search Mechanisms For Learning Mobile Agent Systems. Concurrency: Practice and Experience, in press.
19. Kawamura T., Joseph S., Hasegawa T., Ohsuga A. & Honiden S. Evaluating the Fundamental Agent Paradigms. In Kotz D. & Mattern F. (eds) Agent Systems, Mobile Agents, and Applications. Lecture Notes in Computer Science 1882, 2000.
20. Kirby S. Syntax out of learning: the cultural evolution of structured communication in a population of induction algorithms. In Floreano D., Nicoud J.-D. and Mondada F. (eds), Advances in Artificial Life. Lecture Notes in Computer Science 1674, 1999.
21. Labrou Y., Finin T., & Peng Y. Agent communication languages: the current landscape. IEEE Intelligent Systems & their Applications, 14:45-52, 1999.
22. Luck M. & D'Inverno M. P. A Formal Framework for Agency and Autonomy. Proc. First International Conference on Multi-Agent Systems, San Francisco, CA, 254-260, 1995.
23. Milojicic D. Mobile agent applications. IEEE Concurrency, 80-90, 1999.
24. Nwana H. S. & Ndumu D. T. A perspective on software agents research. To appear in Knowledge Engineering Review.
25. Rao A. S. & Georgeff M. P. Modeling rational agents within a BDI-architecture. In Fikes R. & Sandewall E. (eds) Proceedings of Knowledge Representation and Reasoning, Morgan Kaufmann, 473-484, 1991.
26. Schwartz C. Web search engines. Journal of the American Society for Information Science, 49(11):973-982, 1998.
27. Segal R. St. Bernard: the file retrieving softbot. Unpublished Technical Report, FR-35, Washington University, 1993.
28. Strasser M. & Schwehm M. A performance model for mobile agent systems. In Arabnia H. (Ed.) Proc. Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA'97), II:1132-1140, 1997.
29. Stroustrup B. What is "Object-Oriented Programming"? AT&T Bell Laboratories Technical Report, 1991.
30. Sycara K., Lu J. & Klusch M. Interoperability amongst heterogeneous software agents on the internet. Carnegie Mellon University, PA (USA), Technical Report CMU-RI-TR-98-22, 1998.
31. Tahara Y., Ohsuga A. & Honiden S. Agent system development method based on agent patterns. Proc. ICSE, IEEE, 1999.
32. Theilmann W. & Rothermel K. Disseminating mobile agents for distributed information filtering. Proc. ASA/MA, IEEE Press, to appear.
33. White J. Mobile agents white paper. http://wwwiiuf.ch/~chantem/white_whitepaper/whitepaper.html, 1996.
34. Wooldridge M., Jennings N. R. & Kinny D. A Methodology for Agent-Oriented Analysis and Design. Autonomous Agents 99, 69-76, 1999.
35. Wooldridge M., Jennings N. R. & Kinny D. The Gaia Methodology for Agent-Oriented Analysis and Design. Autonomous Agents and Multi-Agent Systems, 3:285-312, 2000.
Chapter 2
Knowledge Granularity Spectrum, Action Pyramid, and the Scaling Problem
Yiming Ye
IBM T. J. Watson Research Center, USA
John K. Tsotsos
York University, Canada
2.1 Introduction
This paper studies the scaling problem with respect to an agent - a computational system that inhabits dynamic, unpredictable environments. An agent has sensors to gather data about the environment and can interpret this data to reflect events in the environment. Furthermore, it can execute motor commands that produce effects in the environment. Usually, it has certain knowledge about itself and the world. This knowledge can be used to guide its action selection process when exhibiting goal-directed behaviors [1] [13]. It is important for an agent to choose a reasonable representation scheme in order to scale to the task at hand. There are two extremes regarding granularity of knowledge representation. At one end of the spectrum is the scheme where the selection of actions requires little or even no knowledge representation [3]. At the other end of the spectrum is the purely planning scheme, which requires the agent to maintain and use as much detailed knowledge as possible. Experience suggests that neither of the above two extreme schemes is capable of producing the range of behaviors required by intelligent agents in a dynamic, unpredictable environment. For example, Tyrrell [18] has noted the difficulty of applying, without modification, the model of Brooks [3] to the problem of modeling
action selection in animats whose behavior is supposed to mirror that of real animals. On the other hand, although it is theoretically possible to compute the optimal action selection policy for an agent that has a fixed set of goals and that lives in a deterministic or probabilistic environment [18], it is impossible to do so in most practical situations for the following reasons: (A) resource limitations (time limit, computation complexity [20], memory limit); (B) incomplete and incorrect information (knowledge difference [21], sensor noise, etc.); (C) dynamic, non-deterministic environment. Thus, many researchers argue for hybrid architectures [19] [9] [15] [11], a combination of classical and alternative approaches, to build agent systems. One example is the layered architecture [9] [15]. In such an architecture, an agent's control subsystems are arranged into a hierarchy, with higher layers dealing with information at increasing levels of abstraction. Thus, the very lowest layer might map raw sensor data directly onto effector outputs, while the uppermost layer deals with long-term goals. Or, the upper abstract space might be used to solve a problem and then the solution might be refined at successive levels of detail by inserting operators to achieve the conditions that were ignored in the more abstract spaces [12]
[14]. Much of the previous work on scaling emphasizes the absolute complexities (efficiency) of planning systems. We, however, believe that the problem of scaling is a relative term and is closely related to the task requirements of an agent in uncertain, dynamic or real-time environments. We will say that an agent scales to a given task if the agent's planning system and knowledge representation scheme are able to generate the range of behaviors required by the task. We consider knowledge abstraction over a spectrum based on the granularity of knowledge representation. Our approach is different from previous approaches [9] [15] in that there is no logical relationship between elements of any two adjacent layers. We study the scaling problem related to different representation schemes, be it a single granularity scheme or a hybrid granularity scheme. Many factors, such as the planning engine, the way knowledge is represented, and the dynamic environment can influence whether an agent scales to a given task. Here, we concentrate on the influences of knowledge granularity. It is obvious that knowledge granularity can influence the efficiency of a given inference engine, since granularity influences the amount of data to be processed by the engine. It has been suggested that one may increase the computational efficiency by limiting the form of the statements in the knowledge base [16]
[7]. In this paper, we study the relationship between different representation schemes and the performance of an agent's planning system. The goal is to find the proper scheme for representing an agent's knowledge such that the representation allows the agent to scale to a given task. We address the following issues. The first is how to define the granularity of an agent's representation of a certain kind of knowledge. The second is how this granularity influences the agent's action selection performance. The third is how the hierarchical granularity representation influences the agent's action selection performance. The study of these issues can help an agent in finding a reasonable granularity or scheme of representation such that its behavior can scale to a given task.
2.2 A Case Study: the Object Search Agent
To start, we use object search as an example to study the influence of knowledge granularity on the performance of an agent. Object search is the task of searching for a given object in a given environment by a robotic agent equipped with a pan, tilt, and zoom camera (Figure 2.1). It is clear that exhaustive, brute-force blind search will suffice for its solution; however, the goal of the agent is to design efficient strategies for search, because exhaustive search is computationally and mechanically prohibitive for nontrivial situations. The action selection task for the agent refers to the task of selecting the sensing parameters (the camera's position, viewing direction and viewing angle size) so as to bring the target into the field of view of the sensor and to make the target in the image easily detectable by the given recognition algorithm. Sensor planning for object search is very important if a robot is to interact intelligently and effectively with its environment. In [23] [20] Ye and Tsotsos systematically study the task of object search and give an explicit algorithm to control the state parameters of the camera by considering both the search agent's knowledge about the target distribution and the ability of the recognition algorithm. In this section, we first briefly describe the two dimensional object search agent and its action selection strategy (please refer to [23] for corresponding three dimensional descriptions). Then we study the issue of knowledge granularity with respect to the object search agent and present experimental results.
2.2.1 Task Formulation
We need to formulate the agent's sensor planning task in a way that incorporates the available knowledge of the agent and the detection ability of the recognition algorithm. The search region Ω can be of any two dimensional form, such as a two dimensional room with many two dimensional tables, etc. In practice, Ω is tessellated into a series of elements c_i, with Ω = ∪_{i=1}^{n} c_i and c_i ∩ c_j = ∅ for i ≠ j. In the rest of the paper, it is assumed that the search region is a two dimensional office-like environment and it is tessellated into little square cells of the same size. An operation f = f(x_c, y_c, ϑ, w, a) is an action of the search agent within the region Ω. Here (x_c, y_c) is the position of the two dimensional camera center (the origin of the camera viewing axis); ϑ is the direction of the camera viewing axis, 0 ≤ ϑ < 2π; w is the width of the viewing angle of the camera; and a is the recognition algorithm used to detect the target.
Fig. 2.1 An example hardware of a search agent and a search environment. (a) The search agent - a mobile platform equipped with a camera; (b) the pan, tilt, and zoom camera on the platform; (c) an example search region.
The agent's knowledge about the possible target position can be specified by a probability distribution function p, so that p(c_i, T_f) gives the agent's knowledge about the probability that the center of the target is within square c_i before an action f (where T_f is the time just before f is applied). Note, we use p(c_0, T_f) to represent the probability that the target is outside the search region at time T_f. The detection function on Ω is a function b, such that b(c_i, f) gives the conditional probability of detecting the target given that the center of the target is located within c_i and the operation is f. For any operation, if the projection of the center of the square c_i is outside the image, we assume
b(c_i, f) = 0. If the square is occluded or it is too far from the camera or too near to the camera, we also have b(c_i, f) = 0. It is obvious that the probability of detecting the target by applying action f is given by
P(f) = \sum_{i=1}^{n} p(c_i, T_f) b(c_i, f).        (2.1)
The reason that the term T_f is introduced in the calculation of P(f) is that the probability distribution needs to be updated whenever an action fails. Here we use Bayes' formula. Let α_i be the event that the center of the target is in square c_i, and α_0 be the event that the center of the target is outside the search region. Let β be the event that after applying a recognition action, the recognizer successfully detects the target. Then P(¬β | α_i) = 1 − b(c_i, f). It is obvious that the updated probability distribution value after an action f has failed should be P(α_i | ¬β), thus we have p(c_i, T_{f+}) = P(α_i | ¬β), where T_{f+} is the time after f is applied. Since the above events α_1, . . ., α_n, α_0 are mutually complementary and exclusive, from Bayes' formula we get the following probability updating rule:
p(c_i, T_{f+}) = \frac{p(c_i, T_f)(1 - b(c_i, f))}{\sum_{j} p(c_j, T_f)(1 - b(c_j, f))},        (2.2)

where i = 1, . . ., n, 0. The cost t(f) gives the total time needed to perform the operation f. Let O_Ω be the set of all the possible operations that can be applied. The effort allocation F = {f_1, . . ., f_k} gives the ordered set of operations applied in the search, where f_i ∈ O_Ω. It is clear that the probability of detecting the target by this allocation is:
P[F] = P(f_1) + [1 - P(f_1)] P(f_2) + \cdots + \Big\{ \prod_{i=1}^{k-1} [1 - P(f_i)] \Big\} P(f_k).        (2.3)
The total cost for applying this allocation is:

T[F] = \sum_{i=1}^{k} t(f_i).        (2.4)
Suppose K is the total time that can be allowed in applying selected actions during the search process; then the task of sensor planning for object search can be defined as finding an allocation F ⊆ O_Ω which satisfies T[F] ≤ K and maximizes P[F]. Since this task is NP-Complete [20], we consider a simpler problem: decide only which is the very next action to execute. Our objective then is to select as the next action the one that maximizes the term
E(f) = \frac{P(f)}{t(f)}.        (2.5)
We have proved that in some situations, the one step look ahead strategy may lead to an optimal answer.
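The quantities just defined can be prototyped directly. The Python sketch below is a simplified illustration of Equations (2.1), (2.2) and (2.5), not the authors' implementation: the cell names, detection values and costs are made-up toy data. It computes the detection probability of a candidate action, greedily picks the action maximizing E(f) = P(f)/t(f), and updates the target distribution after a failed action.

```python
# Minimal sketch of Equations (2.1), (2.2) and (2.5); the environment,
# detection values and costs below are illustrative assumptions.

def detection_probability(p, b_f):
    # Eq. (2.1): P(f) = sum_i p(c_i) * b(c_i, f); 'p' maps cell -> prior,
    # 'b_f' maps cell -> detection value b(c_i, f) for this action.
    return sum(p[c] * b_f.get(c, 0.0) for c in p if c != "outside")

def update_distribution(p, b_f):
    # Eq. (2.2): Bayes update of p(c_i) after action f fails.
    unnorm = {c: p[c] * (1.0 - b_f.get(c, 0.0)) for c in p}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def select_next_action(p, actions, costs):
    # Eq. (2.5): pick the action maximizing E(f) = P(f) / t(f).
    return max(actions, key=lambda f: detection_probability(p, actions[f]) / costs[f])

# Toy example: three cells plus the "outside" event.
p = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "outside": 0.1}
actions = {"f1": {"c1": 0.8, "c2": 0.1}, "f2": {"c2": 0.6, "c3": 0.6}}
costs = {"f1": 1.0, "f2": 1.5}

f = select_next_action(p, actions, costs)
print("selected:", f, "P(f) =", round(detection_probability(p, actions[f]), 3))
p = update_distribution(p, actions[f])     # distribution after a failed attempt
```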
2.2.2 The Sensor Planning Strategy
The agent needs to select the camera's viewing angle size and viewing direction for the next action f such that E(f) is maximized. Normally, the space of available candidate actions is huge, and it is impossible to take this huge space of candidate actions into consideration. According to the image formation process and geometric relations, we have developed a method that can tessellate this huge space of candidate actions into a small number of actions that must be tried. A brief description of the sensor planning strategy is as follows (please refer to [23] for detail). For a given recognition algorithm, there are many possible viewing angle sizes. However, the whole search region can be examined with high probability of detection using only a small number of them. For a given angle size, the probability of successfully recognizing the target is high only when the target is within a certain range of distance. This range is called the effective range for the given angle size. Our purpose here is to select those angles whose effective ranges will cover the entire depth of the search region, and at the same time there will be no overlap of their effective ranges. Suppose that the biggest viewing angle for the camera is w_0, and its effective range is [N_0, F_0]. Then the necessary angle sizes w_i (where 1 ≤ i ≤ n_0) and the corresponding effective ranges [N_i, F_i] (where 1 ≤ i ≤ n_0) are:
w_i = 2 \arctan\Big[ \Big(\frac{N_0}{F_0}\Big)^{i} \tan\frac{w_0}{2} \Big], \quad N_i = N_0 \Big(\frac{F_0}{N_0}\Big)^{i}, \quad F_i = F_0 \Big(\frac{F_0}{N_0}\Big)^{i}.        (2.6)
For each angle size derived above, there are an infinite number of viewing directions that can be considered. We have designed an algorithm that can generate only directions such that their union can cover the whole viewing sphere with minimum overlap [23]. Only the actions with the viewing angle sizes and the corresponding directions obtained by the above method are taken as the candidate actions. So, the huge space of possible sensing actions is decomposed into a finite set of actions that must be tried. Finally, E(f) can be used to select among them the best viewing angle size and direction. After the selected action is applied, if the target is not detected, the probability distribution will be updated and a new action will be selected again. If the current position does not seem to find the target, the agent will select a new position and begin to search for the target at the new position.
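A small sketch of the angle tessellation just described, under the assumption that the reconstructed form of Equation (2.6) above is correct: starting from the widest angle w_0 with effective range [N_0, F_0], each successive angle is chosen so that the effective ranges tile the depth of the region without overlap. With w_0 = 40° and [N_0, F_0] = [50, 150], this reproduces the values used later in the experiments (second angle ≈ 14° with effective range [150, 450]).

```python
import math

def necessary_angles(w0_deg, n0, f0, max_depth):
    # Assumes w_i = 2*arctan((N0/F0)**i * tan(w0/2)) with effective range
    # [N0*(F0/N0)**i, F0*(F0/N0)**i]; this is a reconstruction, not the
    # authors' exact formula.
    ratio = f0 / n0
    angles = []
    i = 0
    while n0 * ratio**i < max_depth:
        w = 2.0 * math.degrees(math.atan((n0 / f0)**i * math.tan(math.radians(w0_deg) / 2.0)))
        angles.append((round(w, 1), n0 * ratio**i, f0 * ratio**i))
        i += 1
    return angles

# Example matching the text: w0 = 40 degrees, effective range [50, 150],
# search region depth 450.
for w, near, far in necessary_angles(40.0, 50.0, 150.0, 450.0):
    print(f"angle {w} deg, effective range [{near}, {far}]")
```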
2.2.3 Knowledge Granularity for Search Agent
As we have illustrated above, the object search agent uses its knowledge about the target position to guide its action selection process. This knowledge is encoded as a discrete probability density that is updated whenever a sensing action occurs. To do this, the search environment is tessellated into a number of small squares, and each square c is associated with a probability p(c). To perfectly encode the agent's knowledge, the size of the square should be infinitely small - resulting in a continuous encoding of the knowledge. But this will not work in general because an infinite amount of memory is needed. In order to make the system work, we are forced to represent the knowledge discretely - to use squares with finite size. This gives rise to an interesting question: how we should determine the granularity of the representation (the size of the square) such that the best effects or reasonable effects can be generated. To make the discussion easier, we denote an object search agent a as a = (s, k_s, k_p, G, I, t_select, t_apply, M, T, U), where s is the state parameters of the agent, k_s is the agent's knowledge about the geometric configuration of the environment, and k_p is the agent's knowledge about the
target position and is encoded as probabilities associated with tessellated cubes. G is the granularity function, which gives a measurement of the granularity of a certain knowledge representation scheme. I is the inference engine, which selects actions and updates the agent's knowledge. By applying I to k_s and k_p, an action is generated. The term t_apply is the cost function for applying actions: t_apply(f) gives the time needed to apply an action f and is determined by the time needed to take a picture and run the recognition algorithms. The term t_select is the cost function for selecting actions. M is the agent's memory limit. The memory used to store all the knowledge and inference algorithms should not exceed this limit. T is the time limit. The total time spent by the agent in selecting actions and executing actions should be within T. U is the utility function, which measures how well the agent performs during its search process within T. The granularity function G can be defined as the total memory used by the agent to represent a certain kind of knowledge divided by the memory used by the agent to represent a basic element of the corresponding knowledge. For example, G(k_p) gives the granularity measurement of the knowledge representation scheme k_p. Suppose the length of the search environment is L units (the side length of a square is one unit) and the width of the search environment is W units. Then the total environment contains LW squares. The probability p(c) associated with each square c is a basic element in the representation scheme k_p. Suppose m[p(c)] gives the memory of the agent used to represent p(c). Then the total memory for the agent to represent k_p is LW m[p(c)]. Thus, G(k_p) = LW m[p(c)] / m[p(c)] = LW. Here we study the influence of G(k_p) on the performance of the search agent. This performance can be measured by the utility and time limit pair (U, T), where U = P[F] is calculated by Formula (2.3). The actions in F are selected according to Section 2. For a finer granularity G(k_p), more time will be spent on action selection, leaving less time for action execution. The selected actions are generally of better quality because the calculation of E(f) is more accurate in most situations. For a coarser granularity G(k_p), less time will be spent on action selection, leaving more time for action execution. The selected actions are generally of lower quality because the calculation of E(f) is less accurate in most situations. In the following sections, we will present experiments to illustrate the influence of knowledge granularity on the agent's performance.
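As a concrete instance of the granularity function: for a search environment tessellated into L × W unit squares with one probability value stored per square, G(k_p) = L·W·m[p(c)]/m[p(c)] = L·W. The toy computation below (the cell counts and the byte cost per value are illustrative assumptions) just makes the memory/granularity trade-off explicit.

```python
# Toy illustration of G(k_p) = L * W and the memory it implies; the grid
# sizes and the per-cell byte count are illustrative assumptions.
bytes_per_prob = 8                                   # e.g. one double per cell
for cells_per_side in (30, 100, 500):
    g = cells_per_side * cells_per_side              # G(k_p) for an L = W grid
    print(f"G(k_p) = {g:>7}, memory for k_p = {g * bytes_per_prob} bytes")
```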
2.2.4 Experiments
A two dimensional simulation object search system has been implemented to test the influence of the knowledge granularity on the performance of the action selection process. The system is implemented in C on an IBM RISC System/6000. The search environment is a two dimensional square as shown in Figure 2.2(a). If we tessellate the two dimensional square into 1000 × 1000 small square cells, then the relevant data for the system are as follows. The two dimensional camera has two effective angle sizes. The width of the first angle size is 40°. Its effective range is [50, 150]. Its detection function is b(c, f) = D(l)(1 − α/20.5°), where α < 20.5° is the angle between the agent's viewing direction and the line connecting the agent center and the cell center, D(l) is as shown in Figure 2.2(c), and l is the distance from the cell center to the agent center. According to the formulas in Section 2, the width of the second effective angle size is 14°, and its effective range is [150, 450]. The initial target distribution is as follows. The outside probability is 0.05. For any cell c within region A (bounded by 30 < x < 75 and 30 < y < 75), p(c) = 0.000004. For any cell c within region C (bounded by 600 < x < 900 and 600 < y < 900), p(c) = 0.000005. For any other cell c, p(c) = 0.000001. The agent is at position [10, 10] in the beginning. We assume that there is only one recognition algorithm, thus the time needed to execute any action is the same.
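The experimental set-up above can be mocked up as follows. The grid sizes, region bounds and the linear angular fall-off b(c, f) = D(l)·(1 − α/20.5°) follow the description as reconstructed here, while D(l) is stubbed with a crude step function, since Figure 2.2(c) only gives it graphically; treat all of it as an approximation rather than the authors' code.

```python
import math

GRID = 1000            # 1000 x 1000 tessellation, as in the experiments

def initial_prior(x, y):
    # Region A: 30 < x < 75, 30 < y < 75 -> 0.000004
    # Region C: 600 < x < 900, 600 < y < 900 -> 0.000005
    # elsewhere 0.000001; the outside-region probability 0.05 is kept separately.
    if 30 < x < 75 and 30 < y < 75:
        return 0.000004
    if 600 < x < 900 and 600 < y < 900:
        return 0.000005
    return 0.000001

def D(l):
    # Crude placeholder for the distance factor of Figure 2.2(c): roughly 1
    # inside the effective range [50, 150] of the first angle, 0 outside.
    return 1.0 if 50.0 <= l <= 150.0 else 0.0

def detection_value(agent_xy, view_dir_deg, cell_xy):
    # b(c, f) = D(l) * (1 - alpha / 20.5), alpha in degrees, alpha < 20.5.
    dx, dy = cell_xy[0] - agent_xy[0], cell_xy[1] - agent_xy[1]
    l = math.hypot(dx, dy)
    alpha = abs((math.degrees(math.atan2(dy, dx)) - view_dir_deg + 180) % 360 - 180)
    if alpha >= 20.5:
        return 0.0
    return D(l) * (1.0 - alpha / 20.5)

print(initial_prior(50, 50), initial_prior(700, 700))
print(detection_value((10, 10), 45.0, (80, 80)))   # a cell roughly ahead of the agent
```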
Fig. 2.2 (a) The two dimensional environment. The agent is at the lower left corner of the region. An obstacle is present within the region. (b) The two dimensional environment when it is tessellated into a square of size 1000 × 1000. (c) The value of D(l).
In the first group of experiments, the agent only selects actions at position [10, 10]. In the second group of experiments, the agent first selects 7
actions at position [10, 10], then it moves to position [700, 400] to begin the new search. The following sections list the experimental results.

2.2.4.1 Knowledge Granularity and Action Selection Time
To select the next action f, the agent needs to calculate P(f) (Equation (2.1)) for all candidate actions (Section 2). It is obvious that the knowledge granularity G(k_p) has a great influence on the action selection time t_select(f). The higher the value of the knowledge granularity, the longer the time needed to select an action. We have performed a series of experiments to test this influence. The results are listed in the following table.
G(k_p)      [30 x 30]   [40 x 40]   [50 x 50]   [60 x 60]   [70 x 70]   [80 x 80]
t_select        15          30          41          91         121         157

G(k_p)      [90 x 90]   [100 x 100]   [200 x 200]   [300 x 300]   [400 x 400]   [500 x 500]
t_select       217           289          1083          2443          4380          7467
Table 1

Note that t_select(f) (measured in seconds) is obtained by taking the difference in times obtained from the command system("date") executed before the system enters the action selection module and after the system finishes the action selection module. The average value for different actions with the same granularity is taken as the value of t_select for the corresponding granularity. The accuracy is within one second.

2.2.4.2 The Error Associated with Knowledge Granularity
Clearly the approximations involved in discretization will cause errors in calculating various values. In general, the higher the value of the knowledge granularity, the smaller the error caused by discretization. The error associated with knowledge granularity may influence the quality of the selected actions, and thus influence the performance of the agent. Figures 2.3(d)(e)(f)(g) show how the granularity influences the error in calculating P(f). We notice that in general the higher the knowledge granularity, the smaller the error of the calculated P(f).
Fig. 2.3 Real and calculated values of P(f) plotted against the action index.
U(g_c, n_{g_c}) − U(100, 100) as a function of g_c and n_{g_c}. In order to make the comparison easier, we also draw the surface of z = 0. Figure 2.7(1,1) shows the situation where the action selection time is not influenced by the granularity and the number of actions to be selected from. In this situation, the two layer strategy is always worse than the single layer strategy by a constant. This constant is the time used to pre-select the set of actions by the coarse layer. Figures 2.7(1,1)(1,2)(1,3) show the situation when the action selection time is not influenced by granularity. In this case, adding a new coarse layer does not save time, because the coarse layer itself will spend the same time as the old granularity g_o, and extra time must be spent by g_o to select an action from the action pools pre-selected by g_c. Figures 2.7(2,1)(2,2)(2,3)(2,4)(2,5) show the situation where the influence of granularity on the action selection is governed by γ_2, while the influence of the number of actions is governed
by γ_1, γ_2, γ_3, γ_4, and γ_5 respectively. We can notice that the more sensitively the action selection time is influenced by the number of actions in the action pool, the better the two layer strategy. This is illustrated by the increase of the area of those g_c and n_{g_c} that is below the plane z = 0. The reason is that a decrease in granularity for a more sensitive situation tends to give a bigger saving in action selection time. The same analysis can be applied to Figures 2.7(3,2)(3,3)(3,4), Figures 2.8(4,1)(4,2)(4,3)(4,4)(4,5), and Figures 2.8(5,1)(5,2)(5,3)(5,4)(5,5). From Figure 2.7 and Figure 2.8 we can also notice that for a fixed granularity g_c, the smaller the value of n_{g_c}, the better the two layer strategy. The reason is that a smaller n_{g_c} tends to save time for g_o. We can also notice that for a fixed n_{g_c}, the smaller the value of g_c, the better the two layer strategy. From the above experiment, we know that in some situations adding a coarse layer can increase the performance of an agent. Thus, when a single granularity does not allow the agent to scale to the task at hand, we can consider adding a coarse layer to increase the chances of scaling. To do this, we can first draw the performance figure as above, and then select the granularity that corresponds to the lowest point on the surface as the granularity for the coarse layer.
2.5.2 Adding a finer layer to obtain better quality actions
Another way to use hierarchical representation to increase performance and the chances of scaling is to add a finer layer. The idea is to use the current granularity g_o to pre-select a small set of candidate actions, and then use a finer granularity g_f to choose a better quality action to execute. The utility for the single layer strategy is:

U(g_o) = \frac{T}{t_s(g_o, N) + t_e} Q(g_o).

For the two layer strategy, suppose n_{g_o} is the number of actions that must be selected by g_o in order to guarantee that the actions selected by g_f will reach a desired quality Q(g_f). The time to select an action for the two layer strategy is t_s = t(g_o, N) + t(g_f, n_{g_o}). Suppose the total time available for the agent is T. The utility of the new strategy is:

U(g_f) = \frac{T}{t(g_o, N) + t(g_f, n_{g_o}) + t_e} Q(g_f).
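Under the utility expressions as reconstructed above (treat them as assumptions rather than the authors' exact formulas), the single-layer and two-layer strategies can be compared numerically as in the sketch below; the specific forms of the selection-cost function t_s and the quality function Q are invented for illustration only.

```python
# Sketch comparing single-layer and two-layer utilities; the forms of
# t_s and Q below are illustrative assumptions, not the paper's data.

def t_s(g, n):
    # Assumed separable cost t_s(g, n) = t1(g) * t2(n), as in Section 2.5.
    return (g / 100.0) ** 2 * (n / 100.0)

def Q(g):
    # Assumed quality: finer granularity gives better actions, saturating at 1.
    return g / (g + 50.0)

def U_single(g_o, N, T, t_e):
    return T / (t_s(g_o, N) + t_e) * Q(g_o)

def U_two_layer(g_o, g_f, N, n_go, T, t_e):
    return T / (t_s(g_o, N) + t_s(g_f, n_go) + t_e) * Q(g_f)

T, t_e, g_o, N = 100.0, 6.0, 100.0, 100.0
for g_f in (200.0, 400.0):
    for n_go in (5.0, 20.0):
        diff = U_two_layer(g_o, g_f, N, n_go, T, t_e) - U_single(g_o, N, T, t_e)
        print(f"g_f={g_f:.0f}, n_go={n_go:.0f}, U_diff={diff:+.2f}")
```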
Fig. 2.7 The performance comparison when a coarse layer of granularity is added to pre-select a small set of actions to consider.
Fig. 2.8 Continued: the performance comparison when a coarse layer of granularity is added to pre-select a small set of actions to consider.
Experiments have been performed to show the performance difference of the new strategy and the old strategy, U_diff = U(g_f) − U(g_o). In the experiments, we assume T = 100, g_o = 100, N = 100. We also assume that the action execution time is t_e = 6. In general t_e has a big influence on the analysis result. Here we take t_e = 6 as an example to study the influence of other factors on the agent performance. As in the previous section, we assume t_s(g, n) = t_1(g) t_2(n). Q(g) is another function, which gives the quality of the action selected with granularity g. In the experiments, we take g_f as one variable, and n_{g_o} as another variable, and we draw the surface formed by U_diff.

... q → p from A to START (i.e. put q into WSD) ELSE excise all the relevant ranges q → p /* q fails to appear */

With the new rules, failed effects always cause new flaws (unsupported Preconds), which sooner or later will trigger some planning activities (Rules 1 and 2). As a result, some of the executed (thus removed) agents will be re-introduced into the plan, and will be re-executed in due time. Obviously this can be regarded as a nice mechanism to automatically handle the problem of process iteration. The feedback paths of iteration are determined dynamically, automatically, and based on logical reasoning, which is at the
core of AI planning. We again use the stars database as an example to handle the KDD process iteration. First we show how the coupling mode works. Based on the specifications of WSD, goal, and KDD agent types, the planner and the controller can cooperate to produce a full KDD process plan as shown in Figure 4.4. Next we show how the integration mode automatically solves the problem of process iteration. Suppose that in the above scenario the KOSI agent KOSI-1 is being executed, and that the Monitoring Rule (Rule 5) has detected that the expected effect does not appear (i.e. Regression Models Store-1 is not produced or is not acceptable). Then, on the one hand, according to the Agent Execution Rule (Rule 3), agent KOSI-1 is removed nevertheless when it times out; on the other hand, the ELSE part of the Monitoring Rule (Rule 5) excises all the relevant ranges, leaving the subsequent agent IIBR-1 with an unsupported precondition: there is no proper regression models store as its input to work on. This unsupported Precond flaw will trigger some planning activities (the Planning Rules: Rules 1 and 2) to re-introduce to the plan the previously removed KOSI agent to re-establish the precondition of IIBR-1 (that is, to fix up the unsupported Precond flaw). But the re-introduced KOSI agent will have its own Precond unsupported, therefore some Select agent will also be re-introduced to the plan, and so on. All these re-introduced agents will be re-executed to select a better DB-1 and/or learn better regression models. In summary, we may name this mechanism automatic iteration, which possesses the following desirable features:
• Execution failures are detected by the Monitoring Rule;
• Feedback paths are determined dynamically and automatically by the cooperation of several (meta-)rules;
• Re-execution is also realized by the (meta-)rules;
• The iteration number for each loop is also determined dynamically and automatically.

Finally we give a remark about the overall architecture of the GLS system. Even with this integrated mode, our GLS system still needs two meta agents (planning and controlling), because here only a part of the functionalities of the controlling meta agent is integrated with planning. The controlling meta agent remains in the architecture as shown in Figure 4.1 and
is responsible for other tasks such as scheduling, resource allocation, man-machine interaction, and interaction and communication among KDD agents.
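A rough, hypothetical sketch of the automatic-iteration mechanism described above: a failed effect removes the support of a downstream agent's precondition, and the planning step then re-introduces the producer. None of the identifiers or data structures below come from the GLS implementation; they simply mimic the rule behaviour.

```python
# Hypothetical sketch of monitoring-driven automatic iteration.

plan = [
    {"name": "Select-1", "effects": {"DB-1"},  "preconds": set()},
    {"name": "KOSI-1",   "effects": {"RMS-1"}, "preconds": {"DB-1"}},
    {"name": "IIBR-1",   "effects": {"MT-1"},  "preconds": {"RMS-1"}},
]
wsd = set()            # world state description: facts currently established

def monitor(agent, succeeded):
    # Monitoring rule: on success put effects into WSD, on failure excise them.
    if succeeded:
        wsd.update(agent["effects"])
    else:
        wsd.difference_update(agent["effects"])

def unsupported(agent):
    return agent["preconds"] - wsd

def replan(agent):
    # Planning rules: re-introduce (re-run) producers of missing preconditions.
    for need in unsupported(agent):
        for producer in plan:
            if need in producer["effects"]:
                print("re-introducing", producer["name"], "to re-establish", need)
                monitor(producer, succeeded=True)     # assume the retry succeeds

# Example run: KOSI-1 fails, so IIBR-1's precondition becomes unsupported.
monitor(plan[0], succeeded=True)     # Select-1 produces DB-1
monitor(plan[1], succeeded=False)    # KOSI-1 fails: RMS-1 not produced
replan(plan[2])                      # triggers re-execution of KOSI-1
print("WSD:", wsd)
```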
4.5.2 Change Management by Incremental Replanning
The KDD process is a long-term and evolving process. During its lifetime, many kinds of change may occur; hence change management is recognized as an important research issue with practical significance in the field of the KDD process. We can identify the following kinds of changes (but we do not claim that the list is complete):
• Local Data Changes in Databases: When the KDD process is planned and executed for the first time, the original database is used and the discovered regression models are stored for each sub-database. Later, whenever a local data change (a new data item is added, an old data item is deleted/modified, etc.) occurs, the planning and execution process will iterate to find and add new versions of regression models to the stores, and each IIBR agent will manage and refine the corresponding tree of regression models. This is a universal and important problem in all real world KDD applications, as the contents of most databases are ever changing.
• Large-Scale and/or Structural Changes in Databases: Some changes in the data could be big and structural, resulting in a different decomposition of the central database, for example. In this case, the process plan itself should be changed accordingly. This is called process evolution.
• Changes in the Process Schema: The formal description of all available KDD techniques (i.e. the Agent types) in the KDD system is called the process schema. The process schema could change during the lifetime of the KDD process. For example, new KDD techniques can be introduced into the KDD system; existing KDD techniques could become obsolete, or remain in the system but with new parameter settings; new/modified strategies coordinating various discovery steps are adopted; etc. These changes should be reflected in the process schema accordingly: some new Agent types are added, while some old Agent types are either removed, or modified in their "type-level" attributes (In/Out, Precond/Effect, Decomp). Finally,
process schema changes in turn cause process plan changes. That is, we see process evolution here again. For some of the changes mentioned above, the integration mode presented in the previous section can be further extended to deal with them. For example, we can add the following Monitoring Rule:

6. (More Monitoring Rule:) IF there is a local change in the databases THEN restart the process according to the same process plan.

With this new (meta-)rule, the databases are under monitoring. Whenever their contents change locally (a new data item is added, or an old data item is deleted/modified), the integrated meta agent restarts the KDD process according to the same process plan. However, in the case of process evolution, changes are difficult to handle in this way, because the problem we are facing is not the re-execution of (part of) the existing plan. Rather, we should replan the KDD process to reflect the changing environment. More precisely, we have the following observations:
• If we insist on solving the problem of process evolution and process replanning by further extending the set of (meta-)rules, our production system will become too complicated. As Jonsson & Backstrom point out [7], the integrated mode of planning and execution is suitable only for some restricted classes of planning problems (the 3S class, for example). As we are not sure if the KDD planning problem in its full scale can be solved properly by an ever-expanding set of (meta-)rules, we may try to realize replanning as an additional component of the searching strategy of the production system.
• As replanning from scratch is in most cases unpleasant and unnecessary, we need a method to reuse the existing KDD process plan, with local adjustment adapted to the changes. In other words, we need an incremental replanning algorithm.
• The big variety of possible changes does not mean that we need a separate replanning algorithm for each kind of change. In fact, all possible changes can disturb an existing plan only in the following ways:
— Some new preconditions come in;
— Some old preconditions become unsupported;
— Some old effects become obsolete;
— Decomps of some agents change when new (old) Agent types are added into (removed from) the schema.
A general incremental replanning algorithm just needs to consider all these situations and take proper replanning activities.
• Because of the hierarchical planning, the KDD process plan has a hierarchical structure. Incremental replanning always works on a particular part at particular levels of the existing plan, and at a particular time. So we should specify when, where and how to replan.

In light of the above observations, we have designed a general, incremental replanning algorithm. In the following, we present the algorithm in the context of the original coupling mode (replanning in the context of the integrated mode can be described similarly). The incremental replanning algorithm is also called by the KDD controller. Recall that one of the main tasks of the KDD controller is to monitor the execution of the process plan. Concerning change management, we charge it with the following extra responsibilities:
• Detecting changes in the databases;
• Receiving and approving changes in the process schema;
• Determining the starting point of replanning - the high-level KDD agent A that is the root of the affected part in the existing, hierarchical plan;
• Calling the replanning algorithm (ALGORITHM-2 below) with agent A and the changes as the input parameters.

ALGORITHM-2: Incremental Replanning
INPUT: (1) High-level agent A and its existing plan
       (2) Changes demanding replanning from A
       (3) Current WSD
OUTPUT: Re-adjusted plan of A, coping with the changes
METHOD:
1. IF there is any change in WSD (databases), or in Out/Effect of A THEN re-create the STRIPS goal G' for A;
2. IF there is a change in Decomp of A THEN delete those agents whose types disappear in the new Decomp of A;
   /* New agents of new types may be added in step 6 below */
3. For each agent Ai in the existing plan of A: IF there is any change in WSD, or in Out/Effect of Ai THEN re-adjust Effect of Ai according to the change; /* specially, START will have the new WSD as its Effect */
4. For each agent Ai in the existing plan of A: IF there is any change in WSD, or in In/Precond of Ai THEN re-adjust Precond of Ai according to the change; /* specially, FINISH will have the new goal G' as its Precond */
5. Delete all "dead" agents in the existing plan; /* an agent supporting no Precond of other agents becomes "dead" */
6. /* Now the existing plan is disturbed, because steps 1-4 above have introduced various flaws into it. */ Invoke the planner to resume its work at step 3 of ALGORITHM-1 to find and fix up new flaws, returning a new plan of A;
7. For each high-level sub-agent HAj in the new plan of A: IF HAj is newly introduced, or HAj was in the old plan but had not been expanded THEN do nothing here /* planning of HAj will be done later when the controller tries to execute it */ ELSE apply this ALGORITHM-2 recursively to HAj /* because HAj may need replanning as well as its parent A */.

Note that the replanning algorithm (ALGORITHM-2) is a recursive procedure, and it in turn calls the non-linear planning algorithm (ALGORITHM-1 shown in Section 4.4.1). Let us look at an example of replanning (see Figure 4.5); a sketch of the replanning skeleton follows the example. Suppose that we have got the following events:
• Time-series data come in, implying possible structural changes in the central DB;
• A new KDD technique SCT (stepwise Chow test to discover structural changes in time-series data) is introduced into the KDD system;
• Decomp of the Select type is modified from (FSN, CBK) to
(FSN, CBK, SCT). When the KDD controller detects and approves these changes, it determines that the high-level agent Kdiscover in Figure 4.5 is the starting point of replanning, and calls ALGORITHM-2 to recursively re-adjust the existing, hierarchical plan, resulting in the following changes in the plan, which are marked in Figure 4.5 by bold lines and boxes:
• The sub-plan of the Select agent has an additional SCT sub-agent to discover possible structural changes in time-series data;
• The Select agent has more sub-databases as its output;
• There are more KOSIs in the sub-plan of Kelicit to learn Regression Models from the new subDBs;
• There are more IIBRs in the sub-plan of Krefine to build Model Trees from the new Regression Models.

Fig. 4.5 A sample KDD process plan and replan
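The recursive skeleton of ALGORITHM-2 above can be expressed roughly as follows. This is only a structural sketch (Decomp pruning, precondition re-adjustment, re-invoking the planner, recursing into expanded sub-agents); the plan representation and the trivial "planner" are invented for illustration and are not GLS data structures.

```python
# Structural sketch of ALGORITHM-2 (incremental replanning); illustrative only.

def fix_flaws(agent):
    # Stand-in for step 3 of ALGORITHM-1: here we simply keep the sub-plan.
    return agent["subplan"]

def replan(agent, changes):
    # Step 2: drop sub-agents whose types disappeared from the new Decomp
    # (new types would be instantiated later by the planner, step 6).
    new_decomp = changes.get("decomp", {}).get(agent["type"], agent["decomp"])
    agent["decomp"] = new_decomp
    agent["subplan"] = [a for a in agent["subplan"] if a["type"] in new_decomp]
    # Steps 3-5 (condensed): re-adjust preconditions touched by the changes.
    for a in agent["subplan"]:
        a["precond"] = changes.get("precond", {}).get(a["type"], a["precond"])
    # Step 6: invoke the planner to find and fix the remaining flaws.
    agent["subplan"] = fix_flaws(agent)
    # Step 7: recurse into already-expanded high-level sub-agents.
    for sub in agent["subplan"]:
        if sub.get("expanded"):
            replan(sub, changes)
    return agent

# Toy example mirroring the text: Decomp of Select changes from (FSN, CBK)
# to (FSN, CBK, SCT).
select = {"type": "Select", "decomp": ["FSN", "CBK"], "expanded": True,
          "subplan": [{"type": "FSN", "precond": [], "subplan": [], "decomp": []},
                      {"type": "CBK", "precond": [], "subplan": [], "decomp": []}]}
kdiscover = {"type": "Kdiscover", "decomp": ["Select"], "expanded": True,
             "subplan": [select]}
replan(kdiscover, {"decomp": {"Select": ["FSN", "CBK", "SCT"]}})
print(select["decomp"])    # ['FSN', 'CBK', 'SCT']
```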
4.6 Concluding Remarks
We presented a methodology of dynamically organizing KDD processes for increasing both the autonomy and the versatility of a discovery system, and the framework of the GLS system based on this methodology. Among related systems, GLS is most similar to INLEN [14]. In INLEN, a database, a knowledge-base, and several existing methods of machine learning are integrated as several operators. These operators can generate diverse kinds of knowledge about the properties and regularities existing in the data. INLEN was implemented as a multi-strategy KDD system
like GLS. However, GLS can dynamically plan and organize the discovery process, performed in a distributed cooperative mode, for different discovery tasks. Moreover, knowledge refinement is one of the important capabilities of GLS that was not developed in INLEN. Since the GLS system is very large and complex, we have so far finished only several parts of the system and have undertaken to extend it for creating a more integrated, organized society of autonomous knowledge discovery agents. That is, the work that we are doing takes but one step toward a multi-strategy and multi-agent KDD system.
Acknowledgements The authors would like to thank Prof. Jan Zytkow and Mr. Y. Kakemoto for their valuable comments and help. This work was partially supported by Telecommunications Advancement Foundation (TAF).
Bibliography
[1] Brachman, R.J. and Anand, T. "The Process of Knowledge Discovery in Databases: A Human-Centred Approach", In Advances in Knowledge Discovery and Data Mining, MIT Press (1996) 37-58.
[2] Dong, J.Z., Zhong, N., and Ohsuga, S. "Probabilistic Rough Induction: The GDT-RS Methodology and Algorithms", Z.W. Ras and A. Skowron (eds.) Foundations of Intelligent Systems, LNAI 1609, Springer-Verlag (1999) 621-629.
[3] Dong, J.Z., Zhong, N., and Ohsuga, S. "Using Rough Sets with Heuristics to Feature Selection", Zhong, N., Skowron, A., and Ohsuga, S. (eds.) New Directions in Rough Sets, Data Mining, Granular-Soft Computing, LNAI 1711, Springer-Verlag (1999) 178-187.
[4] Engels, R. "Planning Tasks for Knowledge Discovery in Databases - Performing Task-Oriented User-Guidance", Proc. Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press (1996) 170-175.
[5] Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery: an Overview", In Advances in Knowledge Discovery and Data Mining, MIT Press (1996) 1-36.
[6] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, AAAI Press (1996).
[7] Jonsson, P. and Backstrom, C. "Incremental Planning", in New Directions in AI Planning, IOS Press (1996) 79-90.
[8] Klosgen, W. "Problems for Knowledge Discovery in Databases and Their Treatment in the Statistics Interpreter Explora", International Journal of Intelligent Systems, Vol.7, No.7 (1992) 649-673.
[9] Liu, C. "Software Process Planning and Execution: Coupling vs. Integration", LNCS 498, Springer (1991) 356-374.
[10] Liu, C. and Conradi, R. "Automatic Replanning of Task Networks for Process Evolution in EPOS", Proc. ESEC'93, LNCS 717, Springer (1993) 437-450.
[11] Liu, C. and Zhong, N. "Handling KDD Process Iteration by Integration of Planning and Controlling", Proc. 1998 IEEE International Conference on Systems, Man, and Cybernetics (SMC'98) (1998) 411-416.
[12] Liu, C. and Zhong, N. "Rough Problem Settings for Inductive Logic Programming", Zhong, N., Skowron, A., and Ohsuga, S. (eds.) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, LNAI 1711, Springer-Verlag (1999) 168-177.
[13] Matheus, C.J., Chan, P.K., and Piatetsky-Shapiro, G. "Systems for Knowledge Discovery in Databases", IEEE Transactions on Knowledge and Data Engineering, Vol.5, No.6 (1993) 904-913.
[14] Michalski, R.S., Kerschberg, L., Kaufman, K.A., and Ribeiro, J.S. "Mining for Knowledge in Databases: The INLEN Architecture, Initial Implementation and First Results", Journal of Intell. Infor. Sys., Kluwer Academic Publishers, Vol.1, No.1 (1992) 85-113.
[15] Minsky, M. The Society of Mind, Simon and Schuster, New York (1986).
[16] Nguyen, S.H. and Nguyen, H.S. "Quantization of Real Value Attributes for Control Problems", Proc. Fourth European Congress on Intelligent Techniques and Soft Computing (EUFIT'96) (1996) 188-191.
[17] Ohsuga, S. "Framework of Knowledge Based Systems - Multiple Meta-Level Architecture for Representing Problems and Problem Solving Processes", Knowledge Based Systems, Vol.3, No.4 (1990) 204-214.
[18] Ohsuga, S. "A Way of Designing Knowledge Based Systems", Knowledge Based Systems, Vol.8, No.4 (1995) 211-222.
[19] Ohsuga, S. and Yamauchi, H. "Multi-Layer Logic - A Predicate Logic Including Data Structure as Knowledge Representation Language", New Generation Computing, Vol.3, No.4 (1985) 403-439.
[20] Russell, S.J. and Norvig, P. Artificial Intelligence - A Modern Approach, Prentice Hall, Inc. (1995).
[21] Piatetsky-Shapiro, G. and Frawley, W.J. (eds.) Knowledge Discovery in Databases, AAAI Press and The MIT Press (1991).
[22] Zhong, N. and Ohsuga, S. "GLS - A Methodology for Discovering Knowledge from Databases", P.S. Glaeser and M.T.L. Millward (eds.) New Data Challenges in Our Information Age (1992) A20-A30.
[23] Zhong, N. and Ohsuga, S. "The GLS Discovery System: Its Goal, Architecture and Current Results", Z.W. Ras and M. Zemankova (eds.) Methodologies for Intelligent Systems, LNAI 869, Springer-Verlag (1994) 233-244.
[24] Zhong, N. and Ohsuga, S. "Discovering Concept Clusters by Decomposing Databases", Data & Knowledge Engineering, Vol.12, No.2, Elsevier Science Publishers (1994) 223-244.
[25] Zhong, N. and Ohsuga, S. "KOSI - An Integrated Discovery System for Discovering Functional Relations from Databases", Journal of Intelligent Information Systems, Vol.5, No.1, Kluwer Academic Publishers (1995) 25-50.
[26] Zhong, N. and Ohsuga, S. "Toward A Multi-Strategy and Cooperative Discovery System", Proc. First International Conference on Knowledge Discovery and Data Mining (KDD-95), AAAI Press (1995) 337-342.
[27] Zhong, N. and Ohsuga, S. "A Hierarchical Model Learning Approach for Refining and Managing Concept Clusters Discovered from Databases", Data & Knowledge Engineering, Vol.20, No.2, Elsevier Science Publishers (1996) 227-252.
[28] Zhong, N. and Ohsuga, S. "System for Managing and Refining Structural Characteristics Discovered from Databases", Knowledge Based Systems, Vol.9, No.4, Elsevier Science Publishers (1996) 267-279.
[29] Zhong, N., Kakemoto, Y., and Ohsuga, S. "An Organized Society of Autonomous Knowledge Discovery Agents", Peter Kandzia and Matthias Klusch (eds.) Cooperative Information Agents, LNAI 1202, Springer-Verlag (1997) 183-194.
[30] Zhong, N., Liu, C., and Ohsuga, S. "A Way of Increasing both Autonomy and Versatility of a KDD System", Z.W. Ras and A. Skowron (eds.) Foundations of Intelligent Systems, LNAI 1325, Springer-Verlag (1997) 94-105.
[31] Zhong, N., Liu, C., Kakemoto, Y., and Ohsuga, S. "KDD Process Planning", Proc. Third International Conference on Knowledge Discovery and Data Mining (KDD-97), AAAI Press (1997) 291-294.
[32] Zhong, N. and Ohsuga, S. "A Multi-Phase Process for Discovering, Managing and Refining Strong Functional Relationships Hidden in Databases", Transactions of Information Processing Society of Japan, Vol.38, No.4 (1997) 698-706.
[33] Zhong, N., Dong, J.Z., and Ohsuga, S. "Data Mining: A Probabilistic Rough Set Approach", L. Polkowski and A. Skowron (eds.) Rough Sets in Knowledge Discovery, Vol.2, Physica-Verlag (1998) 127-146.
[34] Zhong, N., Liu, C., and Ohsuga, S. "Handling KDD Process Changes by Incremental Replanning", J. Zytkow and M. Quafafou (eds.) Principles of Data Mining and Knowledge Discovery, LNAI 1510, Springer-Verlag (1998) 111-120.
[35] Zhong, N., Yao, Y.Y., and Ohsuga, S. "Peculiarity Oriented Multi-Database Mining", J. Zytkow and J. Rauch (eds.) Principles of Data Mining and Knowledge Discovery, LNAI 1704, Springer-Verlag (1999) 136-146.
[36] Ziarko, W. "The Discovery, Analysis, and Representation of Data Dependencies in Databases", Piatetsky-Shapiro and Frawley (eds.) Knowledge Discovery in Databases, The AAAI Press (1991) 195-209.
[37] Zytkow, J.M. "Introduction: Cognitive Autonomy in Machine Discovery", Machine Learning, Kluwer Academic Publishers, Vol.12, No.1-3 (1993) 7-16.
[38] Zytkow, J.M. and Zembowicz, R. "Database Exploration in Search of Regularities", Journal of Intelligent Information Systems, Kluwer Academic Publishers, Vol.2, No.1 (1993) 39-81.
Chapter 5
Self-Organized Intelligence
Jiming Liu
Department of Computer Science
Hong Kong Baptist University
5.1 Introduction
This chapter is concerned with the problem of how to induce self-organized intelligence in a multi-agent system. It addresses one of the central issues in the development and applications of multi-agent robotic systems, namely, how to develop self-organized multi-agent systems to collectively accomplish certain tasks in robot vision and group navigation. In doing so, we will explicitly define and implement two multi-agent systems; one is for searching and tracking digital image features and another is for controlling a group of distributed robots to navigate in an unknown task environment toward goal locations. The coordination, cooperation, and competition among the agents will manifest in the ways in which the agents exchange and share certain information, such as the current status of the overall system and/or those of neighboring agents, and in which they select their own actions based on such information. We will consider the problem of goal-attainability with a group of distributed autonomous agents. The agents self-organize their behaviors based on their previously-acquired individual dynamics, called local memory-driven behavioral selection, and their average distance to target locations, termed global performance-driven behavioral learning. The aim of our work is to show to what extent the
two types of information can affect the goal-attainability of the system. We will empirically investigate the performance of agent behavioral self-organization incorporating local memory-driven behavioral selection and global performance-driven behavioral learning with respect to the goal-attainability as well as the task-efficiency of the multi-agent system.
5.2 Organization of the Chapter
The remainder of this chapter is organized as follows: Section 5.3 introduces the key notions and states the general problem to be addressed from the point of view of a multi-agent approach. Section 5.4 describes the self-organized vision approach, covering the models of reactive behaviors, their adaptive self-organization, and the empirical validation of an implemented multi-agent vision system in performing image feature tracking. Section 5.5 focuses on the formulation of a robot group navigation problem into a multi-agent self-organized motion problem. Section 5.6 provides an overview of some of the related work in the areas of image processing, robot group behavior, and adaptive self-organization. Finally, Section 5.7 concludes the chapter by highlighting the key contributions of this chapter and pointing out several avenues for future extension.
5.3 Problem Statement
The goal of our work is to show (1) how the tasks of robot vision and group motion can be handled collectively by classes of distributed agents that respond locally to the conditions of their environment and (2) how the behavioral repository of the agents can be constructed. For the ease of understanding our proposed approach, in the sequel we will carry out our discussions based on the following general search problem: There are several convex regions in a rectangular search space, S, each of which is composed of a number of feature elements or locations with the same physical feature characteristics, i.e., each convex region is homogeneous. Distributed agents are required to search and find all the feature (goal) locations within S. Now let us formally describe the specific problem of self-organized intelligence to be addressed in this chapter.
(1) The environment:
(a) Physical features: S contains a number of homogeneous regions composed of elements or locations with the same physical feature characteristics. The feature characteristics can be calculated and evaluated based on certain numerical measures.
(b) Geometrical characteristics: S is a rectangular grid-like search space in a two-dimensional plane, with the size of U x V. Each homogeneous region in S is connected and convex, and possesses a boundary of connected locations.

(2) The task: Distributed agents are dispatched in S in order to search and label all the feature locations of homogeneous regions.

(3) The behaviors of the agents:
(a) Primitive behaviors: The agents can recognize and distinguish certain physical feature locations, if encountered, based on some predefined criteria.
(b) Complex behaviors: The agents can decide and execute their next-step reactive behavior. That is, they may breed, move, or vanish in S, based on their task, previously executed behavior, and current environment characteristics.

Remark 5.1: Primitive behaviors are the fixed intrinsic operations of agents. We may create and distinguish various classes of agents based on their primitive behaviors. In our present work, we assume that the feature locations to be found correspond to the borders of certain homogeneous regions. Mathematically, we will define the feature characteristics of the border of a homogeneous region using the relative contrast of measurement values within a small region. Here, the term measurement is taken as a generic notion; the specific quantity that it refers to will depend on the nature of the application. For instance, it may refer to the grey-level intensity of an image in the case of image processing. Or it may refer to a spatial measurement function in the case of robot environment modeling.

Remark 5.2: By complex behaviors, we mean that the agents can self-organize and make decisions on what behaviors to produce next. In this regard, we say that the agents possess the characteristics of autonomy.
Remark 5.3: The agents may vanish as soon as they leave a marker, breed the next generation of offspring agents, or leave the space geometrically described by the environment.

(4) Feature searching:

Definition 5.1 (Feature searching) Let N denote the total number of feature locations in S. The goal of the distributed agents in S is to extract all the feature regions. This problem is equivalent to the problem of extracting the borders, or all the locations on the borders, of the homogeneous regions. If the total number of feature locations detected and labeled by the distributed agents is equal to N, it is said that all the feature locations in S are reachable by the agents. In other words, the goal of the agents is attainable in the given environment.
5.4 Adaptive Self-Organized Vision for Image Feature Detection and Tracking
In this section, we will consider our first task, i.e., applying collective behavior to the robot vision problem of searching and tracking image features. Here, the two-dimensional lattice, S, in which the proposed autonomous agents reside is a grey-level image of size U x V (i.e., an array of U columns by V rows of pixels). Suppose that S contains a certain number of pixels whose intensities relative to those of their neighboring pixels satisfy some specific, mathematically well-defined conditions. Furthermore, whether a pixel p in S can be classified as belonging to a feature can be decided by evaluating the outcome of a mathematical operator, D (i.e., a feature descriptor), as applied at p. The total number of feature pixels in S is denoted by M. Thus, the objective of the autonomous agents in S is to extract all the predefined features of S by finding and marking the feature pixels. This is essentially an optimization problem, as stated below.

Definition 5.2 (Optimal feature extraction) If the total number of feature pixels detected and marked by active agents, N, is equal to M, it is said that an optimal feature extraction is achieved.
Definition 5.3 (Active agents) At a certain time t in the two-dimensional lattice, autonomous agents whose ages do not exceed a given life span will continue to react to their image environment by way of evaluating the pixel grey-level intensity and selecting accordingly some of their behaviors. Such agents are called active agents at time t.
5.4.1 An Overview of Adaptive Self-Organized Vision
With respect to the image feature detection problem mentioned above, one may consider an extreme approach in which agents are placed over the entire image plane and each of them reacts to its immediate environment simultaneously, whereas in another extreme approach a border is traced using some predefined templates. Our approach can be viewed as a compromise between these two approaches. The main distinction lies in the fact that, in our approach, each autonomous agent can locally reproduce and diffuse, and hence adaptively extract image features (e.g., contours in an image). Now let us take a look at the detailed formalisms of adaptive self-organizing autonomous agents, including their environment, local pixel evaluation functions, the fitness definition, and the evolution of asexual self-reproduction and diffusion.
5.4.2 Two-Dimensional Lattice of an Agent Environment
The adaptive nature of our proposed agent automata consists in the way in which generations of autonomous agents are replicated and selected. Such agents directly operate in two-dimensional rectangular grid lattices that correspond to the digitized images of natural scenes. That is, each 8-connected grid cell represents an image pixel. The grid cell also signifies a possible location for an autonomous agent to inhabit, either temporarily or permanently, as illustrated in Figure 5.1.

Definition 5.4 (Neighboring region of an agent) The neighboring region of an agent at location (i, j) is a circular region centered at the given location with radius R(i, j).
128
J. Liu
Fig. 5.1 An autonomous agent, at location (i, j), and its local neighboring region.

5.4.3 Local Stimulus in Two-Dimensional Lattice
Definition 5.5 The local stimulus that selects and triggers the behaviors of an agent at pixel location (i, j) is computed from the number of pixels in its neighboring region that satisfy the following condition: the difference between their grey-level intensity values and the value at (i, j) is less than a positive threshold. In other words, the stimulus is determined by the density distribution of all the pixels in the neighboring region whose grey-level intensity values are close to the intensity at (i, j). More specifically, the density distribution can be defined as follows:
D^{R}_{(i,j)} = \sum_{s=-R}^{R} \sum_{t=-R}^{R} \{ (s,t) \mid \| m(i+s, j+t) - m(i,j) \| < δ \}     (5.1)

where

R        the radius of a circular region centered at (i, j),
s, t     the indices of a neighboring pixel relative to (i, j),
m(i, j)  the grey-level value at location (i, j), and
δ        a predefined positive constant.
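As a concrete illustration, the following sketch computes the density of Eq. 5.1 for a single pixel. It is not taken from the chapter; the array and parameter names, the use of a circular mask (per Definition 5.4, whereas Eq. 5.1 writes the sums over the bounding square), and the exclusion of the centre pixel are assumptions made for illustration.

```python
import numpy as np

def density(image, i, j, R, delta):
    """Count the neighbours of (i, j) within radius R whose grey level
    differs from m(i, j) by less than delta (cf. Eq. 5.1)."""
    rows, cols = image.shape
    centre = float(image[i, j])
    count = 0
    for s in range(-R, R + 1):
        for t in range(-R, R + 1):
            if s == 0 and t == 0:
                continue                 # exclude the centre pixel (an assumption)
            if s * s + t * t > R * R:
                continue                 # circular neighbourhood, per Definition 5.4
            y, x = i + s, j + t
            if 0 <= y < rows and 0 <= x < cols:
                if abs(float(image[y, x]) - centre) < delta:
                    count += 1
    return count
```

An agent would evaluate this quantity at its current location and compare it against the acceptable interval φ introduced below.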
5.4.4 Self-Organizing Behaviors

5.4.4.1 Diffusion
When the age of an agent does not exceed its life span (i.e., it is an active agent) and the agent has not found a feature pixel whose grey-level intensity satisfies the condition set by Eq. 5.1, it will move in a certain direction to a location inside its neighboring region. The diffusion behavior plays an important role in enabling the agent to search for feature pixels in the two-dimensional lattice. The specific stimulus that triggers this behavior is given as follows:

Definition 5.6 (Diffusion) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent moves to one of its adjacent locations whenever the outcome of its evaluation of the density distribution falls outside this interval, i.e., D^{R}_{(i,j)} ∉ φ. The direction of the diffusion is selected based on an 8-element probability vector in which each value indicates the probability of becoming high-fitness if the agent diffuses in the corresponding direction.

The direction vector of the agent mentioned in the above definition is updated based on the diffusion directions of previously selected high-fitness agents. The details of the updating computation are given in Subsection 5.4.4.5.

5.4.4.2 Self-Reproduction
When an agent detects a feature pixel, p, it will reproduce a finite number of offspring agents within its neighboring sectors. This behavior enables the agent to populate its offspring agents near a pixel location that meets the feature definition, and hence increases the likelihood of further feature extraction.

Definition 5.7 (Self-reproduction) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent reproduces a finite number of offspring agents inside its neighboring region of radius R(i, j), in a direction computed from its direction probability vector, if the outcome of its evaluation of the density distribution at p falls into the φ interval, i.e., D^{R}_{p} ∈ φ. The direction vectors for self-reproduction by the parent agent and its offspring will be determined based on an updating mechanism.
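A minimal sketch, assuming the density function above and an interval φ = [φ1, φ2], of how an active agent might choose between the two behaviors just defined; the direction handling is simplified and the helper names are hypothetical rather than the chapter's implementation.

```python
import random

def select_behavior(density_value, phi_low, phi_high):
    """Self-reproduce when the density lies inside [phi_low, phi_high]
    (Definition 5.7); otherwise diffuse (Definition 5.6)."""
    if phi_low <= density_value <= phi_high:
        return "reproduce"
    return "diffuse"

def choose_direction(direction_probs):
    """Sample one of the eight compass directions from the agent's
    direction probability vector."""
    directions = range(len(direction_probs))
    return random.choices(directions, weights=direction_probs, k=1)[0]
```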
5.4.4.3 Feature Marking

When an agent detects a feature pixel, p, it will place a fixed marker at p. There may be different kinds of features in an image; hence several kinds of markers can exist. The marking behavior of an autonomous agent is necessary in order to label detected image features. The stimulus for selecting this behavior is stated as follows:

Definition 5.8 (Feature marking) Let φ = [φ1, φ2] be an acceptable range of the pixel count as defined by Eq. 5.1, where φ1 < φ2. An agent places a marker at pixel p if the outcome of its evaluation of the density distribution at p falls into the φ interval, i.e., D^{R}_{p} ∈ φ.

5.4.4.4 Fitness Function

The fitness of an autonomous agent, F(ω), reflects how quickly the agent finds a feature pixel within its life span. Thus,

F(ω) = 1 − (steps before reproduction − 1) / (life span),  if ω finds a triggering stimulus;  F(ω) = 0, otherwise.     (5.2)

As can be noted from this definition, the fitness function measures how long it takes the agent to find a feature pixel. The maximum fitness value will be equal to one if the agent is directly placed at the feature pixel when being reproduced.

5.4.4.5 Direction Adaptation
What follows describes the updating mechanism for an autonomous agent to compute its diffusion and self-reproduction direction vectors. By definition, a direction vector for a certain behavior specifies an array of probabilities of success if respective directions are chosen for that behavior.
Assume that a parent agent ω^{(g)} of generation g produces a set of agents {ω_j^{(g+1)}}. This set will further produce offspring of generation g+2, denoted as {ω_j^{(g+2)}}, if any of them encounters a triggering condition in the environment. Thus, the directions for diffusion and reproduction by agent ω^{(g)} are determined from the directions of the selected agents from {ω_j^{(g+1)}} and {ω_j^{(g+2)}}. The selection criterion is based on their fitness values, as computed using Eq. 5.2. Specifically, the probability values associated with a diffusion direction θ and a self-reproduction direction τ for agent ω^{(g)} can be derived, respectively, as follows. For all ω ∈ {ω_j^{(g+1)}} ∪ {ω_j^{(g+2)}} with F(ω) > 0, compute:

P(θ)_{ω^{(g)}} = N_θ / \sum_i N_i     (5.3)

P(τ)_{ω^{(g)}} = M_τ / \sum_i M_i     (5.4)

where

θ      the directions for diffusion,
τ      the directions for self-reproduction,
N_i    the number of agents diffused to stimuli in direction i,
M_i    the number of agents reproduced in direction i.
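The direction-vector update of Eq. 5.3 can be read as a normalized count of high-fitness descendants per direction; the sketch below implements that reading for the diffusion vector (the reproduction vector of Eq. 5.4 is updated analogously from the counts M_i). The offspring record format is an assumption made for illustration.

```python
def update_direction_probs(offspring, num_directions=8):
    """offspring: iterable of (direction, fitness) pairs collected from the
    parent's descendants of generations g+1 and g+2.
    Returns the diffusion vector P(theta) = N_theta / sum_i N_i, counting
    only descendants with positive fitness (cf. Eq. 5.3)."""
    counts = [0] * num_directions
    for direction, fitness in offspring:
        if fitness > 0:
            counts[direction] += 1
    total = sum(counts)
    if total == 0:
        # no high-fitness descendants: fall back to a uniform vector
        # (a choice made here, not prescribed by the chapter)
        return [1.0 / num_directions] * num_directions
    return [c / total for c in counts]
```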
5.4.5 Experimental Studies
The preceding section has provided a formal model (e.g., rules) for agent behavioral self-organization. Now let us examine how such agents are applied in a digital image environment to extract interesting image features. Specifically, we discuss a typical image-processing experiment on feature tracking. Figure 5.2(a) illustrates a sequence of time steps, over which the location of a T-shaped feature region changes. In this example, self-organizing agents may detect the borders of this region at one time step t as shown in Figure 5.2(b), but lose some of the detected feature pixels at another time step, t + 1, simply because the feature region has moved to a new location. Thus, some of the agents previously selected at t may no longer be selected at time t + 1, as illustrated in Figure 5.2(c). When this change
Fig. 5.2 (a) An example dynamic environment in which a T-shaped object moves in discrete space and time. (b) Assume that agents have been selected in the T-shaped environment at time step t. (c) At time step t + 1, the T-shaped object moves to a new location, causing previously high-fitness agents to become lower-fitness.
in the agent environment occurs, the low-fitness agents will actively diffuse to their adjacent locations, self-reproduce offspring agents as soon as some feature pixels are encountered again, and update their behavioral vectors accordingly in order to maximize their fitness in the new environment (i.e., local fitness optimization). As a result, the agents can quickly figure out the right diffusion and self-reproduction directions for tracking the moving target at the subsequent time steps.
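For illustration only, the sketch below strings the behaviors of Section 5.4.4 into one synchronous tracking step of the kind described in this experiment; the Agent fields, the single-offspring reproduction, and the missing boundary checks are simplifications, not the experimental code behind Figure 5.2.

```python
from dataclasses import dataclass, field
import random

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

@dataclass
class Agent:
    i: int
    j: int
    age: int = 0
    direction_probs: list = field(default_factory=lambda: [1.0 / 8] * 8)

def step(agents, is_feature, life_span, markers):
    """One synchronous update: marking and reproduction on feature pixels,
    diffusion elsewhere, and removal of agents past their life span.
    is_feature(i, j) stands in for the density test of Eq. 5.1;
    boundary checks are omitted."""
    survivors = []
    for a in agents:
        if a.age > life_span:
            continue                                      # the agent vanishes
        d = random.choices(range(8), weights=a.direction_probs, k=1)[0]
        di, dj = OFFSETS[d]
        if is_feature(a.i, a.j):
            markers.add((a.i, a.j))                       # feature marking
            survivors.append(Agent(a.i + di, a.j + dj))   # one offspring (simplified)
            survivors.append(a)
        else:
            a.i, a.j, a.age = a.i + di, a.j + dj, a.age + 1   # diffusion
            survivors.append(a)
    return survivors
```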
5.5 Self-Organized Motion in Group Robots
Self-organized motion is concerned with the problem of how to effectively generate emergent motion (e.g., navigation) behaviors in a group of robots when complete information about the robot environment is not available or too costly to obtain. This issue is particularly relevant if we are to develop robust group robots that can work collectively and adaptively on a common task, even though each of them only senses and hence reacts to its environment locally.
5.5.1 The Task of Group Robot Navigation and Homing
In the task of distributed robot navigation and homing, we assume that a commonly-shared goal, i.e., a set of points p ∈ L ⊂ R^n, is given, where L satisfies certain constraints P. The constraints may be mathematically expressed as follows:

P = \{ p \mid ∂f/∂x \,|_{x = x_a} = 0 \}     (5.5)
In addition to the goal locations, we also delineate a closed area as the robot environment, which is denoted as follows:

S = \{ (x, y) \mid x_{min} ≤ x ≤ x_{max},\; y_{min} ≤ y ≤ y_{max} \}     (5.6)
Note that a robot environment can be of various shapes, e.g., the enclosed area of a circle. The only requirement is that the environment be closed and connected. For the sake of illustration, we will consider the environment as being a convex set in R^n.
5.5.1.1 Performance Criteria
When distributed agents (i.e., group robots) with different behavioral rules are dispatched in an environment, what kind of collective behaviors, with respect to the given goal locations in the given environment, can be expected or self-organized by the agents? Before we address this question in detail, let us first define two notions.
Definition 5.10 Suppose that there is a group of N agents in environment S. The shared goal locations for the agents are specified by L, and the attributes of the agents are defined in A. We say that the agents with attributes A, following behavioral self-organization rules V, can attain their goal L in S iff the agents can reach goal L after their interaction with the environment. That is,
\{S, I, A, V\} \xrightarrow{t→∞} \{S, L, A, V\}     (5.7)
where I denotes the initial distribution of agents. Otherwise, we say the goal of the agents is unreachable.
The above notion can also be defined in terms of probability:

Definition 5.11 If

p\big( \{S, I, A, V\} \xrightarrow{t→∞} \{S, L, A, V\} \big) = 1     (5.8)

then we say the goal of the agents, L, is reachable with probability 1.
Our task is to create proper self-organizing rules (i.e., local motion controllers) for the agents that would enable them to move from their current positions toward the given goal locations L. In our system, the position of an agent will change according to its velocity. Note that here the velocity is a vector, representing both the direction and the magnitude of changes in the agent position. The velocity of the agent will change based on observations from the environment; that is, the velocity will be updated according to certain rules triggered by the signals received from other, locally neighboring agents. In addition to such local signals, the agent will also receive global performance feedback, denoted by gB(t), which corresponds to an overall group performance evaluation calculated and sent by a higher-level agent.
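Since the actual velocity update rule (Eq. 5.13) is not reproduced in this excerpt, the sketch below only assembles, in an assumed form, the terms the chapter refers to: a memory of the previous velocity, a local contribution from neighboring agents, the global feedback term −λ sign(v(t)) sat(gB(t)), and a random term γ r(t).

```python
import numpy as np

def sat(x, limit=1.0):
    """Saturation: clip x to [-limit, limit]."""
    return np.clip(x, -limit, limit)

def update_velocity(v, local_signal, gB, lam, gamma, a=0.7, rng=None):
    """One assumed-form velocity update for a single agent (2-D vectors):
    a*v keeps part of the previous velocity (coefficient assumed),
    local_signal is the neighbour-triggered contribution (left abstract),
    -lam*sign(v)*sat(gB) is the global performance feedback, and
    gamma*r(t) is a zero-mean random term.  This is an illustrative
    stand-in for Eq. 5.13, which is not reproduced in this excerpt."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, 1.0, size=np.shape(v))
    return (a * np.asarray(v) + np.asarray(local_signal)
            - lam * np.sign(v) * sat(gB) + gamma * noise)
```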
5.5.2 An Overview of the Multi-Agent System

5.5.2.1 The Attributes of Agents
For a group of robots A_i, where i is numbered from 1 to N, the attributes of A_i, i.e., A, are defined as a vector [a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8, a_9].
B(t) converges to 0 as t → ∞. Generally speaking, a larger λ value leads to a faster convergence speed, i.e., B(t) at t = 3000 is lower. Nevertheless, a quantitative relationship between the λ value and the resulting B(t) value at t = 3000 still remains to be explored.
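One plausible reading of the averaged distance B(t) used as the global performance signal is simply the mean distance from the agents to their nearest goal locations; the sketch below computes that quantity (the chapter's precise definition is given in the part of Section 5.5 omitted here).

```python
import numpy as np

def averaged_distance(agent_positions, goal_points):
    """B(t): the mean distance from each agent to its nearest goal location.
    agent_positions: (N, 2) array; goal_points: (M, 2) array."""
    diff = np.asarray(agent_positions)[:, None, :] - np.asarray(goal_points)[None, :, :]
    dists = np.linalg.norm(diff, axis=2)      # (N, M) pairwise distances
    return float(dists.min(axis=1).mean())
```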
5.5.5 Discussions
Our simulations have shown that if global performance-driven learning is incorporated, the agents will move faster toward goal L and successfully attain
their shared goal (i.e., be goal-attainable).

Fig. 5.5 The dynamics of a multi-agent system with 127 agents, without global performance feedback (λ = 0): snapshots of the distribution of agents at t = 1, 2000, and 3000, together with the averaged distance curve. In this case, many agents remain far away from the goal and keep wandering.

On the other hand, if there is no global performance feedback term λ sign(v(t)) sat(gB(t)) involved, the agents in G3 are not goal-attainable from a practical point of view. If we use −λ sat(gB(t)) to modify the velocity of the agents according to Eq. 5.13, all the agents pertaining to G0, G1, and G2 will be goal-attainable. Concerning the agents in G3 when λ > 0, although our simulation supports the goal-attainability conclusion, we have not analytically examined the goal-attainability of the agents in group G3. What we can say is that it is very likely that they will be goal-attainable due to the existence of a global performance feedback term −λ sign(v(t)) sat(gB(t)). There are other factors affecting the convergence rate of agents as they move toward the target locations. In our simulations, it has been found
out that increasing the coefficient of the random term γ, or just increasing the variance of the random signal r(t), has only a little influence on the convergence rate ξ. It simply increases the variations of the curves at different time steps; the curves vibrate with a larger magnitude. A nonzero mean of r(t) will cause the agents to move in a fixed direction until they are reflected back by the boundary of the robot environment S.

Fig. 5.6 The effects of the global performance feedback factor λ on the convergence of the averaged distance B(t) in the multi-agent system.
5.6 Related Work

5.6.1 Image Feature Detection
In robot vision, detecting geometric features such as regions, edges, curves, corners, and borders can greatly facilitate the interpretation of scenes. Many theories and algorithms have been proposed and applied in the fields of computer vision and image processing. For instance, Liow [13] proposed an extended border tracing technique that combines the operations of region finding and closed contour detection. Alter and Basri [1] applied the so-called Salient Network method for extracting salient curves and noted that this method could suffer from the problem of failing to identify any salient curve other than the most salient one (according to their proposed saliency measure). Lee and Kim [12] presented a method of extracting topographic features directly from a grey-level character image, without calculating eigenvalues and eigenvectors of the underlying image intensity surface. The method efficiently computes the directions of principal curvature. Maintz et al. [16] investigated the problem of evaluating ridge seeking operators for multimodality medical image matching. They constructed various ridge measures related to isophote curvature in an attempt to identify useful convolution operators for CT/MRI matching of human brain scans. With conventional techniques for image feature identification, grid template-like look-up tables [13] and/or models [3; 19] are used to determine the existence of any features by tracing from a current pixel or region to its neighbors. The main disadvantage of this approach is that all the possible situations must be carefully analyzed and exhaustively searched.
5.6.2 Learning in Group Robots
Fukuda and Iritani [6] proposed a mechanism for modeling group cooperative behaviors among decentralized autonomous robots, called CEBOT (i.e., Cellular Robots). Their work simulated the generation of group behaviors based on a globally stable attractor and the identification of new group behaviors based on bifurcation-generated new attractors. Mataric [18; 17] studied the problem of group behaviors such as coordination among robots, and developed a group behavioral learning method in which heterogeneous reward function-based reinforcement learning (RL)
was applied to synthesize collective behaviors, such as flocking, foraging, and docking by means of direct/temporal summation and switching of some basic behaviors.
5.6.3 Adaptive Self-Organization
Adaptation is concerned with applying the computational models of evolutionary processes (e.g., [2]) either to achieving intelligent agent behaviors, where intelligence is measured in terms of the agent's ability to contribute to its self-maintenance at the genetic, structural, individual, as well as group levels [21], or to solving real-life, computation-intensive engineering problems, such as numerical optimization. Fogel [5] has provided a thorough treatment of the foundation and scope of this field (also see [7; 8]). Adaptive self-organizing agents as applied to digital image processing is a newly-explored area of research that studies the emergent behaviors in a lattice of finite automata in which agents react locally according to a set of behavioral rules [4; 9; 10; 11; 14; 15]. Each of the agents may be viewed as a learning automaton [20]; the probabilities of individual actions are updated whenever the output of a certain action is observed and evaluated using a performance criterion.
5.7 Concluding Remarks
In this chapter, we have investigated how to apply a multi-agent approach to tackling robot vision and group motion problems. The key to the emergence of collective agent behavior to solve those problems lies in the utilization of some bottom-up, self-organizing rules by autonomous agents. We have presented and demonstrated an approach to image feature searching and tracking that utilizes adaptive self-organizing agents. In our approach, an adaptive agent, being a distributed computational entity, resides in the two-dimensional lattice of the digital image and exhibits a number of reactive behaviors.

Also presented in this chapter is a self-organized motion approach applicable to the cases where a group of distributed robots is required to navigate in an unknown environment. While providing the detailed formulations and self-organizing rules for each individual robot, i.e., an agent in the self-organized multi-agent system, we have also carried out various case studies. It is evident from our simulations that if a global performance feedback signal is introduced, distributed agents can quickly navigate toward the shared common goal L.
Acknowledgements

The author wishes to acknowledge the support provided by Hong Kong Baptist University throughout this research project. Special thanks go to Mr. Y. Lei for his assistance and help in part of the experimentation.
Bibliography
[1] T. D. Alter and R. Basri. Extracting salient curves from images: An analysis of the saliency network. Memo 1550, MIT AI Lab, 1995.
[2] W. Banzhaf and F. H. Eeckman, editors. Evolution and Biocomputation: Computational Models of Evolution. Springer-Verlag, Berlin, 1995.
[3] M. Barzohar and D. B. Cooper. Automatic finding of main roads in aerial images by using geometric-stochastic models and estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):707-721, 1996.
[4] F. Dellaert and R. D. Beer. Toward an evolvable model of development for autonomous agent synthesis. In R. A. Brooks and P. Maes, editors, Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pages 246-257. The MIT Press, Cambridge, MA, 1994.
[5] D. B. Fogel. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ, 1995.
[6] T. Fukuda and G. Iritani. Construction mechanism of group behavior with cooperation. In Proceedings of the 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 535-542, Pennsylvania, 1995.
[7] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, Reading, MA, 1989.
[8] J. H. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, 1975.
[9] C. G. Langton. Self-reproduction in cellular automata. Physica D, 10:135-144, 1984.
[10] C. G. Langton. Studying artificial life with cellular automata. Physica D, 22:120-140, 1986.
[11] C. G. Langton. Artificial life. In Artificial Life: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos, New Mexico, pages 1-47, Redwood City, CA, 1988. Addison-Wesley Publishing Company, Inc.
[12] S. Lee and C. Yi. Assemblability evaluation based on tolerance propagation. In Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pages 1593-1598, 1995.
[13] Y. Liow. A contour tracing algorithm that preserves common boundaries between regions. CVGIP: Image Understanding, 53(3):313-321, 1991.
[14] M. W. Lugowski. Computational metabolism: Towards biological geometries for computing. In Artificial Life: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems, Los Alamos, New Mexico, pages 341-368, Redwood City, CA, 1988. Addison-Wesley Publishing Company, Inc.
[15] P. Maes. Modeling adaptive autonomous agents. Artificial Life, 1(1-2):135-162, 1994.
[16] J. B. A. Maintz, P. A. van den Elsen, and M. A. Viergever. Evaluation of ridge seeking operators for multimodality medical image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):353-365, 1996.
[17] M. J. Mataric. Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1):73-83, 1997.
[18] M. J. Mataric and D. Cliff. Challenges in evolving controllers for physical robots. Robotics and Autonomous Systems, 19(1), 1996.
[19] N. Merlet and J. Zerubia. New prospects in line detection by dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):426-431, 1996.
[20] K. S. Narendra and M. A. L. Thathachar. Learning Automata. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1989.
[21] L. Steels. Intelligence — dynamics and representations. In L. Steels, editor, The Biology and Technology of Intelligent Autonomous Agents, pages 72-89. Springer-Verlag, Berlin, 1995.
Chapter 6
Valuation-Based Coalition Formation in Multi-Agent Systems
Stefan J. Johansson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden
6.1 Introduction
The notions of coalitions, norms and agents raise a lot of interesting questions. What makes agents form coalitions? When are new agents in a Multi-Agent System (MAS) considered to be members in a coalition? And when does a coalition think it might be time for certain agents to leave? Can we design agents in such a way that they will continuously improve and strengthen the coalitions that they are part of? Does the size of the coalition matter? How are the norms of a coalition updated? Not all of these questions will be treated here, but we will try to provide some thoughts on issues such as the value of a coalition having a certain agent as a member, continuous degrees of membership, and whether cheating is possible in such models or not. The main contribution of this work is a discussion about design principles for coalition formation based on some (of many) possible value-based models of choice of actions. The question of whether to cooperate or not is not new. Game theorists such as Lloyd Shapley discussed the matter of values and alternative costs in n-person games in the fifties [14]. Shapley argued that the value of a cooperating agent is directly associated with the alternative cost of it leaving the coalition of cooperators (i.e. its Shapley value), and thus, it would be fair for that agent to have a share of the profit that is proportional to its Shapley value. Fifteen years later, Owen showed that the Shapley values cannot be interpreted as a measure of power (i.e. the ability to bargain) of the agents [10].
Others, for example Conte and Paolucci, have tried to model situations in which social control may be reached, but neither ultimate, utilitarian, nor normative strategies are optimal in all situations [3]. The art of building states in which people voluntarily (or by force) cooperate for the best of the state, even though it implies paying high taxes, is discussed by Iwasaki et al. [6]. Klusch and Vielhak have discussed negotiations in coalition formation and implemented the COALA environment for simulating it [8; 18].
6.1.1 Examples of coalitions
Let us give an example of some situations in which there are both agents and explicitly or implicitly stated coalitions: A person normally has several coalitions with other people he knows. His family, his employer, his neighbors, and friends all have some expectations on how he should behave, explicitly or implicitly stated in their common norms. For instance, a forgotten birthday may result in a weakened value in the coalition with the forgotten person, and receiving help from a neighbor will hopefully increase the value of the neighbor from the perspective of the person. In the same way, we may expect computerized agents to have explicit or implicit expectations on getting paid for the services they provide. Micro payments and similar fine-grained ways to describe debts could be of use here. Automated multiple multi-commodity markets have in some sense caught the essence of mutual valuations. If an agent is unable to find what it is looking for (at the right price) in one market, it proceeds to the next one. If the number of potential buyers has a positive impact on the utilities of the sellers, then the sellers will try to make themselves as valuable as possible for the buyers, and the buyers go to the marketplaces where the most valuable offers are available. The agents may of course include their probabilities of actually getting their hands on one of the cheap offers. Therefore, it may be the case that the market with the lowest price is not the market an agent eventually prefers when, e.g., the expected delivery time is taken into account.
All three examples show how coalitions of agents (the companies, the persons known, and the markets) affect the choice of actions of individual agents (the consultants, the person herself, and the agent at the market), as well as how the individual agent may have an impact on the norms and the membership of others in the coalition. Given these examples, we will describe a theory of mutual valuation between agents and their coalitions that is able to, at least in theory, model the situations above. We take the approach of considering membership in coalitions as something continuous, where the degree of membership is decided through how the agent values the coalition as well as how the coalition values the agent. We will get back to the question of how to calculate the degree of membership later in the chapter, where we give one model (of many possible ones) of such a calculation. Of course this will lead to a possibility for a rational agent to leave a coalition or join more than one coalition if there are other, more tempting offers, as discussed for example by Sandholm [13]. Our approach also opens up the possibility for an agent to believe it is 42% part of one coalition, even if the coalition as such does not think that the agent is part of it to more than 17%, and for dynamic continuous coalitions to evolve both over time and in strength. Based on this point of view, we would like to make the following definition of the term coalition:

Definition 1 A coalition is a tuple (N, M) consisting of a set of norms N and a set of degrees of membership M = {m_1, ..., m_n}, where each m_i is a pair (a_i, d_i) in which a_i is a unique agent id and d_i is a number describing to what degree the agent is part of the coalition.

The agents are also supposed to be rational, or at least boundedly rational, in their decision of what to do, i.e. they will, as far as they know, do their best to reach the goals that they are designed to achieve. Such a point of view has been discussed and criticized for example by Doyle [4]. One of the advantages of the approach is the possibility of characterizing types of behaviors at a knowledge level, rather than just enumerating them, making it easier to relate them to relevant concepts in other sciences. However, in practice no agent is skilled enough to make truly rational choices, and even if they are, the choices of actions they make are rational given a set of conditions that in turn may be inaccurate and dynamically changing. We will take a pragmatic point of view, claiming that boundedly rational agents
will do their best regardless of their knowledge, i.e. they will be able to choose the best action, given their limited amount of knowledge and sparse reasoning capabilities.

6.1.2 Outline of the chapter
In the next section, we will make some definitions concerning agents, their actions, the consequences, probabilities, and so on. Sec. 6.3 will introduce two different values, V_j^i and V_i^j — the value of the coalition i for an agent j and the value of agent j for the coalition i, respectively. Sec. 6.3 also proposes a set of recurrence relations that may work as a simple model for updating these values and refines them to employ an adjustable degree of memory loss or forgiveness. Finally, we draw some conclusions and point out possible future trajectories of the work.
6.2 Agents and Actions
An agent a_i ∈ A = {a_1, ..., a_n} may at each point in time choose to perform one of the actions b_1, ..., b_m from the set of possible actions B. We refer to the action taken by a_i at time t as β(i, t), i.e. β : A × t → B. Three things are worth taking notice of in this description. Not to do anything is also a decision of what to do, hence an action and thus in B. |B| = m may or may not be finite, but for reasons of simplicity we assume that it is finite and that the agents of the system may be unable to perform all actions in B. The actions that an agent performs may lead to intended or unintended consequences, i.e. partial descriptions of states in the environment. Regardless of whether the agent acted with the intention to cause a certain consequence or not, we will assume that the causal relations between actions and consequences are describable, at least on an a posteriori basis. Of course, we could use the notion of states instead of consequences, but by letting the consequences be partial descriptions of the environment, the current state (as interpreted by the agent) is the current set of consequences that the agent believes are true.* Each action b_i will, by a probability of p_j^i, lead to a consequence q_j ∈ Q =
Valuation-Based
Coalition Formation
in Multi-Agent
Systems
153
{ 0
(62)
When deciding how interested a^ is in coalition Cj, it takes into account the actions of the "members" of c,- and the effect these actions have on the
Valuation-Based
Coalition Formation
in Multi-Agent
Systems
159
state sj. V?(0) = 0,
(6.3)
(6.4)
6.3.2.2
A geometric model
Instead of updating the values arithmetically, it may be done geometrically, i.e. by multiplying the values with their updates and then averaging by taking the n:th root of the product. V?'(0)=0,
(6.5)
Vi (t) = y ( V ? ' ( t - l ) ) ' - i . ^ ( / 3 ( * , t - l ) ) , t > 0
(6.6)
In this model, not only the previous moves are averaged geometrically; also the effect of the actions of the other agents are multiplied and rooted. V?(0) = 0,
Vjii) = « (v;(t- i))*-i • (|e .,_W JJ I - v>(i -z$(s))Ui(s,t-
(6-7)
i),t > o
N (6.8) 6.3.2.3
A comparison between the models
Both models are in one respect like two elephants.* Every single move by every agent done so far is remembered and equally valued, no matter how long it has been since it was performed. However, note that the models cope with changes in the norms of a coalition in that an action always is judged in its actual time and environment, not in a future state (where norms may have changed the value of that action). What varies between the models is that in the former one, other agents influence the final result in a way proportional to their part of the coalition. *I assume without proofs that elephants lack the gift of forgetting things. Since this assumption is used for strictly metaphoric reasons, we will leave the discussion of the memory function of the elephant here.
160
S. J. Johansson
If we for instance have a coalition in which nine out of ten agents do a very good job (with Zj(s)v,i(s,t — 1) near 1), and the tenth behaves badly, the latter will only effect 10% of the result and the overall impression of the coalition will be that it solves its problems quite well (Vj! at about 0.9). This may work for some domains, but in others, especially the ones where agents are highly dependent on each other, the deceit of one agent may spoil the result of the coalition as a whole. It may therefore be of interest for an agent to know if the whole coalition works or not. In that case, the geometric model might be handy, since it focus more on the weaknesses of the coalition. However, it is very hard for the model to forget previous mistakes and since all previous moves are weighted equally, a "bad action" will effect the Vs for an (unnecessarily) long time. The elephant property make it impossible to fully forget a "mistake" in the sense that every single move by every agent done so far is remembered and equally valued, no matter how long it has been since it was performed. However, note that the model copes with changes in the norms of a coalition in that an action always is judged in the time and environment in which it occurs, not in a future state (where norms may have changed the value of that action). Sometimes we may prefer a model that let the present actions have a greater impact on the valuations than the actions of a previous step in time.
6.3.3
Two forgiving
models
Just as humans are able to forget and forgive, this may be a desired property in a MAS as well. It turns out that such a change is quite easy to implement and the previous model can be changed to the following: 6.3.3.1
The arithmetic model V?'(0) = 0, m
1
=
(6.9)
7-Vj^-i)
+
W(M-D)
(6 . 10)
>0
1+7
y/(o) = o, VUt) 3
7-F/ft-l + = —2± i
(6.ii) ^ ' 1+ 7
|c. M
,*>0
, ^ (6.12)
Valuation-Based
An B = m
{ai,...,an} {bi,...,bm}
C = {ci,...c e } Ni Mi = (m1,...mn) Q = {qi,--,qr} r S = {si,s 2 ,...,s 2 -}
M,t) P) Pl Pi
Uj{s,t)
VI
vi •4>j{h) z){s)
M 7
4>
Coalition Formation
in Multi-Agent
Systems
161
The set of agents (considered in the system) The number of agents (considered in the system) The set of possible actions The number of actions possible to perform in the system The set of possible coalitions Cj = (Ni,Di) The set of norms of coalition Ci The vector of degrees of memberships of coalition c; The set of consequences The number of possible consequences in a system The set of states of the system, each si C Q The action performed by a,j at time point t The probability that action bi leads to qj The vector (of size r) that describes the probabilities of each one of the consequences of action bi The vector (of size m) that describes the probabilities that the action bj will lead to a certain consequence The utility of agent aj being in a state formed by s C Q at time t The value of coalition i for agent aj The value of agent i for coalition j The fitness of coalition j given its norms and an action bi G B performed by agent a; The opinion of coalition Cj that agent a; is responsible for the system being in state s C Q The size of cj measured e.g. by the sum ^- V? The forgiveness factor, i.e. the parameter deciding the weight of long-term vs. short-term memory. The payoff function used in the scenarios.
Table 6.1
Symbols used in this chapter
We see that the γ-factor decreases the influence of the past V_j^i and V_i^j values to the benefit of the most recent action and judgment.
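Read as a recurrence, the γ-weighted update amounts to a running average in which the previous value carries weight γ and the newest judgment weight 1. The scalar sketch below implements that reading; the full Eqs. 6.9–6.12 additionally average the newest term over the coalition members, which is omitted here.

```python
def forgiving_update(previous_value, new_judgment, gamma):
    """Arithmetic forgiving update: the old value enters with weight gamma,
    the newest judgment with weight 1 (a scalar simplification of
    Eqs. 6.9-6.12, which also average over the coalition members)."""
    return (gamma * previous_value + new_judgment) / (1.0 + gamma)
```

With γ = 0 only the newest judgment counts, while a large γ approaches the equally weighted, non-forgiving behaviour discussed above.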
162
S. J. Johansson
6.3.3.2
the geometric model
Vi(0) = 0, V/(t) =
(6.13)
1+
^/(V/(t-l))-r.^(P(i,t-l)), t > 0
VJ(Q) = 0,
(6.14)
(6.15)
Vj(t) = i+j
(Vj(t - 1))T • ( , e .,_^/ J ] 1 - V£(l - z^s))Ui(s,t-
l),t>0
aky£a,i
^
(6.16) The forgiveness factor 7 > 0 will decrease the weight of previous actions ranging from they have no impact on calculations (7 = 0) to all previous actions are equally valued (just as in our former model 7 = t — 1). 6.3.4
An illustration
of the
models
So how does these models work out in practice? Let us construct an easy example in order to illustrate the differences. 6.3.4.1
Specification of the models
Imagine a situation where four agents ai,...,a± can choose between fully cooperative actions (bi — 1.0) and fully selfish actions (bi = 0.0). The total payoff is then built upon two payoffs, the individual payoff (fiind(i) and the total coalition payoff <j>Coai, each defined through: 4>ind(i) = V(0.5+ bt),
(6.17)
4
&00i = E M 1 ' 5
(6-18)
As can be seen in Fig. 6.1, the example requires that more than one agent join the coalition in order for it to be successful (in the sense that the agent get a higher payoff than the rest of the agents). To create a fair split of the coalition payoff between the members of the coalition, they get as much of the payoff as they are members in the coalition (relative to the
Valuation-Based
Coalition Formation
in Multi-Agent
1.5
3
Systems
163
Coalition payoff Individual payoff
3
0.5
1
2
2.5
3.5
4
Coalition size
Fig. 6.1 The payoff of the coalition Co as a function of its size (|co|) compared to the maximum individual payoff <j>i = 2.
other members), i.e.:
coai(i) =
l
| , c ° a ', where
(6.19)
4
\co\ = Y,V?>and
(6.20)
1=1
total{l) = coal{i) + ind(i)
(6.21)
What the norms are concerned, we will in this example use a norm function bnorm{t) that is the averaged action in the previous step of time among the members, where the influence is relative to their degree of membership.
M _T.Uv?-i>i
(6.22)
col
How well the own actions correspond to the norm (ipoih)) is then calculated through: ipo(bi) =
l-\bi-b„
(6.23)
164
S. J.
Time 0 10 20 30 60 80
Johansson
Event The scenario starts. Four agents are present of which agent one is cooperating. Agent two joins the coalition Agent three joins the coalition The last agent joins the coalition Agent one perform an action that differ considerably from the norm of the coalition The scenario ends Table 6.2
The events of the scenario
In this simple model, we will let the -Zg(s)-function be:
4(s)
= M*);v?
(6.24)
l c o| In the case where we look at the forgiving models, 7 is set to 1.5. 6.3.5
Specification
of the
scenario
The chosen scenario will show us two things. Firstly, how the model will react when new agents join the coalition. Secondly how will it react on agents trying to rip off the coalition by choosing a single very uncooperative action 6j = 0.01. The scenario is described in Table 6.2. 6.3.5.1
The results of the scenario evaluation
So, let us move on to the results of the evaluation of the models. We have no noise in the calculations and will show the results in terms of the total payoff of the agents, showing how temptations of fast pay-backs will be punished by the coalition. In Fig. 6.2, we see how the arithmetic model is very slow in converging to a fair distribution of the payoff. The payoff for the agents within the coalition raises steep as a new agent joins. That is because the new agent contributes as much as the full members of the coalition, but without being able to collect more than its relative impact (which is roughly based on its value of V"). When agent one is uncooperative, the other agents gets a temporary dip in their payoff, at the same time as the former agent collects
Valuation-Based
Coalition Formation
in Multi-Agent
Systems
165
Fig. 6.2 The total payoff for the agents in the arithmetic non-forgiving model. At time points 10, 20 and 30, new agents join the coalition and at time point 60, the first agent break the norm to make a short term profit.
Agenl 1 Agent 2 Agent3
-
—
I7\^J
^
-
~\l ~"~
......
" " " ' • . /
i
Fig. 6.3
'
The total payoff for the agents in the geometric non-forgiving model
the overhead payoff. Fig 6.3 shows the geometric version of Fig. 6.2. We recognize a slower convergence but also that the deviation from the norms done by agent one is punished harder.
166 S. J. Johansson Agent Agent Agent Agent
£
1 2 3 4
-u\ _ J
2
'
0
10
20
30
40
50
60
70
80
Tina
Fig. 6.4 The total payoff for the agents in the arithmetic forgiving model
We see a great difference in the shape of the payoffs as we move on to the forgiving models. In the forgiving models in Fig. 6.4 and 6.5, we see how the agents reach a convergence in payoffs in just a few rounds. Also, these models will punish the breaking of norms more immediate than the previous models. The difference between these two models lies in the shape of the payoff function, but also in the ability to punish misbehavior, where the geometric forgiving model is less forgiving than the arithmetic one.
6.4
Some thoughts on agent self-contemplation
In Sec. 6.3 we argued that if the values of V? and V/ is not in balance, the parts may try to level out the differences by exploiting the other. Is it then possible, as an agent system designer, to actively differentiate valuations in order to get systems that are more cooperative? 6.4.1
Agents
and the law of J ante
One thing that is put forth as typical for Scandinavia is the law of Jante. The law consists of ten statements written down by Aksel Sandemose based on studies of how people in his home town behaved [12] and consists of the following rules:
Valuation-Based
Coalition Formation
1N ^ ^ M
Agent 1 Agent 2 Agent 3 Agent 4
-^
/
\
I I
in Multi-Agent
Systems
167
1 1/ /
I
-'
y
Fig. 6.5
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
You You You You You You You You You You
•
•
y
/
y
The total payoff for the agents in the geometric forgiving model
shall shall shall shall shall shall shall shall shall shall
not not not not not not not not not not
believe you are something. believe you are as good as we. believe you are more wise than we are. fancy yourself better than we. believe you know more than we. believe you are greater than we. believe you amount to anything. laugh at us. believe that anyone is concerned about you. believe you can teach us anything.
The essence of it is that You shall not plume yourself or think that your work is better than ours or that you could teach us anything, etc.. If applied to the agents, what would the result be? Well, firstly we must find out: what does it mean in terms of VJ and V? ? To underestimate its own value to the coalition in relation to the coalitions value to itself is to create a situation in which (at least from the perspective of the agent) Vj > V?. In order to re-establish the equilibrium, the agent may try even harder to follow norms, etc so that it will be accepted by the coalition. If the coalition have the same opinion about the relationship, it may lead to that it will try to exploit the agent, in order to decrease the agents interest for it, but since the assumption is that it is the agent that underes-
168
S. J.
Johansson
timates its own value, the coalition may think they are in balance, while the agent does not. In all, if a majority of the agents have low self-confidence, it will lead to stronger coalitions with high degrees of membership. 6.4.2
Agents
with high
self-confidence
The opposite of the agent underestimating its value, is the one overestimating it, i.e. V- < V?. For various reasons it has a tremendous self-confidence and it thinks that it is irreplaceable for the coalition, or at least that the coalition has much more use of it, than it has use for the coalition. Such a situation lead to that the agent performs actions that decrease its value for the group (that is, the V?) in order to make short-term gains e.g. by cheating on its coalition members although it breaks the norms of the coalition. Or the coalition increase the value for the agent to a level that fits it, e.g. by changing its norms so that it suits the agent better^ . In all, if all agents apply a self-confident strategy, the system as a whole will have trouble creating stable coalitions, since none of them will work for the sake of the coalition, if they may gain more by not doing so. 6.4.3
An evolutionary
perspective
Although a small proportion of the law of Jante in every agent may seem to be a promising design principle (in that it strengthens coalitions), it is not the case that it automatically leads to robust systems. On the contrary, Janteists are subject to invasion and exploitation by the self-confident agents in an open system. This leaves us with two kinds of stable solutions1': • either with self-confident agents only (if the gain of strong coalitions is low). Then no agent is willing to sacrifice anything of their resources for the good of the other agents, since if it did, the other agents would take the resources and never pay back. • or a mixed equilibrium of Janteists, true valuators and self-confident agents (if the gain of strong coalitions exceeds the expected value §This is done for false reasons. It is mainly the opinion of the agent that it is worth more than it actually is, thus the inequality is a concern of the agent, rather than the coalition. ' s t a b l e in the sense that no agent will improve their payoff if they alone change their behavior, c.f. the Hawk-and-Dove game described e.g. in [9]
Valuation-Based
Coalition Formation
D D H
in Multi-Agent
Systems
169
H
R, R 0,2R 2R,0 R-F,R-F
Table 6.3 A Hawk-and-Dove game matrix. When two doves (D) meet, they equally share the common resource (2R), a hawk (H) will always take all of the resources when meeting a dove, and two hawks will fight over the resources to an averaged cost F. An evolutionary stable strategy in this game is a mix of a hawk behavior 2R/F parts of the time, and a dove behavior the rest of the time.
of acting in a self-confident way). In this case, there are enough agents willing to trust each other and build a coalition in order to maintain the coalition, but neither the self-confident agents, nor the Janteists would improve their payoff by changing strategy. In Table 6.3 we see a the (famous) Hawk-and-Dove (HD) game. Compared to the discussion earlier on equilibria in coalition formation, we see that there are similarities, as a matter of fact, the HD game is a formalization of the decision of whether or not to cooperate in a coalition. If F is high enough compared to R, the agents will cooperate, since the risk of a hawk to run into another hawk may make the dove behavior beneficial. If F is low, e.g. F = 0, the payoffs for two hawk (self-confident) agents meeting will be equal to the ones of two dove (Janteist) agents meeting, but for every time the hawk meets the dove, it will win over the dove, making the only rational choice of strategy being the hawk (or self confident) behavior. In the literature of evolutionary game theory the matters of mixed strategies and equilibria are discussed thoroughly for instance in the classic book by Maynard Smith [9]. Rosenschein and Zlotkin formulated several agent scenarios in terms of game theory in their Rules of Encounter [ll] and Weibull gives an rationalistic economics perspective [19]. Given that every system possible to exploit will be exploited, we must ask ourselves the question whether the behaviors described above (the law of Jante and the self-confident) are exploitable or not. 6.4.3.1
Exploiting the Janteists
It would actually be enough not to underestimate the value of yourself in the coalition in order to get an advantage over the "Janteists". By doing
170
S. J.
Johansson
so, you will have more impact on the coalition and forming its norms" and this can be used to form norms that at an average suits you slightly better than the others. Better norms (for an agent) in this case, is interpreted as norms that suits the agents own intentions better, so that it does not have to choose actions that contradicts its own goals, just because the norms of the coalition says so. 6.4.3.2
Exploiting the self-confident
To exploit self-confident agents is harder. We cannot approach the problem in the same way as we did with in the previous section, since if we were to raise our own value above the ones of the self-confident, it would only make us even more self-confident, i.e. non-willing to cooperate in an altruistic fashion.
6.5
Conclusions
We have argued for a rational, continuous view of membership in coalitions, where the membership is based on how valuable the coalition is for the agent and vice versa. We have also presented a theoretical model of updating group values, both from the individual agent and the coalition perspectives and an improvement that generalizes the notion of forgiveness and make the model range from elephants to "forgetters". Three examples of how valuations between agents and coalitions may work have been discussed and one of them has been explicitly expressed in the proposed models. However, the models are just examples and we believe that several other models will fit into the discussion about exploiters and Janteists as well, e.g. the work of Verhagen [17]. The main contribution of this work is instead the discussions around the models and that of what actually can be done by the agents themselves and what we as designers have to think about when designing agents that will form coalitions. It seems like if the law of Jante may give the coalitions extra fuel in that agents will do a little bit more than they are expected to, in order to be even more accepted in the coalition; however that behavior is possible to exploit and an equilibrium may be expected between exploiters and II This is under the assumption that the more "member" you are, the more impact will you have on the norms of the coalition.
Valuation-Based
Coalition Formation
in Multi-Agent
Systems
171
exploited agents. What the self-confident agents are concerned, they do not seem to suffer from exploiters, but instead, the system in which they act might be characterized by weak (if any) coalitions, a claim that is supported e.g. by the work of Shoham and Tanaka [15]. Acknowledgements I would like to thank Paul Davidsson, Magnus Boman, Harko Verhagen, Patrik Werle, Bengt Carlsson, Sam Joseph and the anonymous reviewers for their comments on various drafts of this work (first published at IAT '99 [7]), and the participants of the IAT '99 for the discussions.
172 S. J. Johansson
Bibliography
[1] M. Boman. Norms in artificial decision making. Artificial Intelligence and Law, 7:17-35, 1999.
[2] K. Carley and A. Newell. The nature of the social agent. Journal of Mathematical Sociology, 19(4):221-262, 1994.
[3] R. Conte and M. Paolucci. Tributes or norms? The context-dependent rationality of social control. In R. Conte, R. Hegelmann, and P. Terna, editors, Simulating Social Phenomena, volume 456 of Lecture Notes in Economics and Mathematical Systems, pages 187-193. Springer Verlag, 1997.
[4] J. Doyle. Rationality and its role in reasoning. Computational Intelligence, 8(2):376-409, 1992.
[5] E.H. Durfee. Practically coordinating. AI Magazine, 20(1):99-116, 1999.
[6] A. Iwasaki, S.H. Oda, and K. Ueda. Simulating an n-person multi-stage game for making a state. In Proceedings of Simulated Evolution and Learning, volume 2, 1998.
[7] S. Johansson. Mutual valuations between agents and their coalitions. In Proceedings of Intelligent Agent Technology '99, 1999.
[8] M. Klusch. Cooperative Information Agents on the Internet. PhD thesis, University of Kiel, Germany, 1997. In German.
[9] J. Maynard Smith. Evolution and the Theory of Games. Cambridge University Press, 1982.
[10] G. Owen. A note on the Shapley value. Management Science, 14:731-732, 1968.
[11] J.S. Rosenschein and G. Zlotkin. Rules of Encounter. MIT Press, 1994.
[12] A. Sandemose. En flykting korsar sitt spar. Forum, 1977. In Swedish; first edition in Danish 1933.
[13] T. Sandholm. Leveled commitment contracting among myopic individually rational agents. In Proceedings of the Third International Conference on Multi-Agent Systems (ICMAS) '98, pages 26-33, 1998.
[14] L.S. Shapley. A value for n-person games. Annals of Mathematics Studies, 2(28):307-317, 1953.
[15] Y. Shoham and K. Tanaka. A dynamic theory of incentives in multi-agent systems. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI) '97, volume 1, pages 626-631, 1997.
[16] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[17] H.J.E. Verhagen. Norm Autonomous Agents. PhD thesis, Department of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, 2000.
[18] T. Vielhak. COALA — a general testbed for simulation of coalition formation among autonomous agents. Master's thesis, Institute of Computer Science and Applied Mathematics, University of Kiel, Germany, 1998. User's guide.
[19] J. Weibull. Evolutionary Game Theory. MIT Press, 1996.
[20] D.H. Wolpert and K. Turner. An introduction to collective intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Centre, 2000.
Chapter 7
Simulating How to Cooperate in Iterated Chicken and Prisoner's Dilemma Games

Bengt Carlsson
Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden

7.1 Introduction
In the field of multi-agent systems (MAS), the concepts of game theory are widely used ([15]; [23]; [30]). The initial aim of game theorists was to find principles of rational behavior. When an agent behaves rationally it "will act in order to achieve its goal and will not act in such a way as to prevent its goals from being achieved without good cause" [19]. In some situations it is rational for an agent to cooperate with other agents to achieve its goal. With the introduction of "trembling hand" noise ([32]; [4]), a perfect strategy would take into account that agents occasionally do not perform the intended action.1 Learning, adapting, and evolving will therefore be of major interest for the agent. It became a major task for game theorists to describe the dynamical outcome of model games defined by strategies, payoffs, and adaptive mechanisms, rather than to prescribe solutions based on a priori reasoning. The crucial question is what happens when the emphasis is on a conflict of interest among agents.
1 In this metaphor, an agent chooses between two buttons. The trembling hand may, by mistake, cause the agent to press the wrong button.
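To make this kind of noise concrete, the sketch below (our own illustration, not taken from the chapter; the function name and the 1% error rate are assumptions) perturbs an agent's intended move with a small probability, which is how a trembling hand is typically simulated.

    import random

    def trembling_hand(intended_move, error_rate=0.01):
        """Return the intended move ('C' or 'D'), except that with
        probability error_rate the other move is played by mistake
        (the wrong button is pressed)."""
        if random.random() < error_rate:
            return 'D' if intended_move == 'C' else 'C'
        return intended_move

    # Example: a pure cooperator whose hand trembles 1% of the time.
    moves = [trembling_hand('C') for _ in range(1000)]
    print(moves.count('D'), "accidental defections in 1000 moves")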
How should agents cooperate with one another in such situations, if at all? A central assumption of classical game theory is that the agent will behave rationally and according to some criterion of self-interest. Most analyses of iterated cooperative games have focused on the payoff environment defined by the Prisoner's dilemma ([5]; [10]), while the similar chicken game has been analyzed to a much lesser extent. In this chapter, a large number of different (Prisoner's dilemma and chicken) games are analyzed for a limited number of simple strategies.
7.2 Background
Game theory tools have primarily been applied to human behavior, but have more recently been used for the design of automated interactions. Rosenschein and Zlotkin [30] give an example of two agents, each controlling a telecommunication network with associated resources such as communication lines, routing computers, and short- and long-term storage devices. The load that each agent has to handle varies over time, so it would be beneficial for each of them if they could share the resources, but it is not obvious how to do so for the common good. The interaction for coordinating these loads could involve prices for renting out resources under varying message traffic on each network. An agent may have its own goal of trying to maximize its own profit. In this chapter, games with two agents, each having two choices, are considered.2 It is presumed that the different outcomes are measurable in terms of money, a time-consuming value, or something equivalent.
7.2.1 Prisoner's dilemma and chicken game
Prisoner's dilemma (PD) was originally formulated as a paradox where the obviously preferable solution for both prisoners, low punishment, was unattainable. The first prisoner does not know what the second prisoner intends to do, so he has to guard himself. The paradox lies in the fact that both prisoners have to accept a high penalty, in spite of a better solution being available for both of them. This paradox presumes that the prisoners were unable to talk to each other or take revenge after the years in jail. It is a symmetrical game with no background information. In the original single-play PD, two agents each have two options: to cooperate or to defect (not cooperate). If both cooperate, they receive a reward, R. The payoff of R is larger than the punishment, P, obtained if both defect, but smaller than the temptation, T, obtained by a defector against a cooperator. If the sucker's payoff, S, where one cooperates and the other defects, is less than P, there is a Prisoner's dilemma defined by T > R > P > S and 2R > T+S (see Fig. 7.1). The second condition means that the value of the payoff, when shared in cooperation, must be greater than when it is shared by a cooperator and a defector. Because it pays more to defect, no matter how the opponent chooses to act, an agent is bound to defect if the agents do not derive an advantage from repeating the game. More generally, there is an optimal strategy in the single-play PD (playing defect). This should be contrasted with the repeated or iterated Prisoner's dilemma, where the agents are supposed to cooperate instead. We will further discuss iterated games in the following sections.

The original chicken game (CG), according to Russell [31], was described as a car race: "It is played by choosing a long straight road with a white line down the middle and starting two very fast cars towards each other from opposite ends. Each car is expected to keep the wheels of one side of the white line. As they approach each other, mutual destruction becomes more and more imminent. If one of them swerves from the white line before the other, the other, as he passes, shouts 'Chicken!' and the one who has swerved becomes an object of contempt..."3 The big difference compared to the Prisoner's dilemma is the increased cost of mutually playing defect. The car drivers should not really risk crashing into the other car (or falling off the cliff). In a chicken game the payoff of S is bigger than that of P, that is, T > R > S > P. Under the same conditions as in the Prisoner's dilemma, defectors will not be the optimal winners when playing the chicken game. Instead, a combination of playing defect and playing cooperate will win the game.

2 Games may be generalized to more agents with more choices, an n-person game. In such games the influence of a single agent is reduced with the size of the group. In this chapter we simulate repeated two-person games with an enlarged group of agents, which at least partly may be treated as an n-person game (but still with two choices).
3 An even earlier version of the chicken game came from the 1955 movie "Rebel Without a Cause" with James Dean. Two cars are simultaneously driving off the edge of a cliff, with the car-driving teenagers jumping out at the last possible moment. The boy who jumps out first is "chicken" and loses.
In Fig. 7.1b, R and P are assumed to be fixed to 1 and 0, respectively. This can be done through a two-step reduction, where in the first step P is subtracted from all payoffs and in the second step they are divided by R-P. This makes it possible to describe the games with only two parameters, S' and T' (see Fig. 7.7 in the simulation section of this chapter). In fact, we can capture all possible 2 x 2 games in a two-dimensional plane.4
a.              Cooperate      Defect
   Cooperate        R            S
   Defect           T            P

b.              Cooperate      Defect
   Cooperate        1        (S-P)/(R-P)
   Defect      (T-P)/(R-P)       0
Fig. 7.1 Pay-off matrices for 2 x 2 games, where R = reward, S = sucker, T = temptation and P = punishment. In (b) the four variables R, S, T and P are reduced to two variables S' = (S-P)/(R-P) and T' = (T-P)/(R-P).

As can be seen in Fig. 7.2, these normalized games are limited below the line S' = 1 and above the line T' = 1. CG has an open area restricted by 0 < S' < 1 and T' > 1, whereas PD is restricted by T' + S' < 2, S' < 0 and T' > 1. If T' + S' > 2 is allowed, there is no upper limit for the value of the temptation. There is no definite reason for excluding this possibility (see also [12]). This was already pointed out when the restriction was introduced: "The question of whether the collusion of alternating unilateral defections would occur and, if so, how frequently is doubtless interesting. For the present, however, we wish to avoid the complication of multiple 'cooperative solutions'." [28] In this study no strategy explicitly makes use of unilateral defections, so the extended area of PD is used.
4 Although there is an infinite number of different possible games, we may reduce this number by regarding the preference orderings of the payoffs. Each agent has 24 (4!) strict preference orderings of the payoffs between its four choices. This makes 24 x 24 different pairs of preference orderings, but not all of them represent distinct games. It is possible to interchange rows, columns and agents to obtain equal games. If all duplicates are removed we still have 78 games left [29]. Most of these games are trivial because there is one agent with a dominating, winning strategy.
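As a small illustration of the normalization and of the payoff orderings above, the sketch below (our own; the function names are not from the chapter) reduces a payoff matrix (R, S, T, P) to the two parameters T' and S' and classifies the game. The example payoffs T=5, R=3, P=1, S=0 are the values commonly used in Axelrod-style Prisoner's dilemma tournaments.

    def normalize(R, S, T, P):
        """Two-step reduction: subtract P from every payoff,
        then divide by R - P, so that R maps to 1 and P to 0."""
        T_prime = (T - P) / (R - P)
        S_prime = (S - P) / (R - P)
        return T_prime, S_prime

    def classify(R, S, T, P):
        """Classify a symmetric 2 x 2 game by its payoff ordering."""
        if T > R > P > S:
            return "Prisoner's dilemma"   # normalized: S' < 0, T' > 1
        if T > R > S > P:
            return "chicken game"         # normalized: 0 < S' < 1, T' > 1
        return "other 2 x 2 game"

    print(normalize(R=3, S=0, T=5, P=1))  # (2.0, -0.5)
    print(classify(R=3, S=0, T=5, P=1))   # Prisoner's dilemma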
[Fig. 7.2: the regions of the chicken game (0 < S' < 1, T' > 1) and the Prisoner's dilemma (S' < 0, T' > 1) in the normalized (T', S') plane, together with the line T' + S' = 2.]
If s1 > 1 it is a PD; if s1 < 1 it is a CG. To obtain a more general treatment of IPD and ICG, we used several variants of payoff matrices within these games, based on the general matrix of Fig. 7.7 (corresponding to Fig. 7.2).
In the first set of simulations we investigated the successfulness of the agents using different strategies (one strategy per agent) in a round-robin tournament. Since this is independent of the actual payoff values, the same round-robin tournament can be used for both IPD and ICG. Every agent was paired with all the other agents plus a copy of itself. Every meeting between agents in the tournament was iterated on average 100 times (randomly stopped) and played 5000 times. The results from the two-by-two meetings between agents using different strategies in the round-robin tournament were used in a population tournament. The tournament starts with a population of 100 agents for each strategy, making a total population of 900. The simulation halts when there is a winning strategy (all 900 agents use the same strategy) or when the number of generations exceeds 10,000. Agents are allowed to change strategy and the population size remains the same during the whole contest. For the IPD the following parameters were used: s1 ∈ {1.1, 1.2, ..., 2.0} and s2 ∈ {0.1, 0.2, ..., 1.0, 2.0}, making a total of 110 different games.8 For the ICG, games with parameter settings s1 ∈ {0.1, 0.2, ..., 0.9} and s2 ∈ {0.1, 0.2, ..., 1.0, 2.0}, a total of 99 different games, were run. Each game is repeated for 100 plays and the average success is calculated for each strategy. For each kind of game there is both a cooperative-set and a defective-set.
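The chapter does not spell out the exact update rule of the population tournament, so the sketch below should be read only as one plausible reading of the procedure: agents re-adopt strategies in proportion to the round-robin payoffs earned against the current population, and the run halts when a single strategy has taken over all agents or when the generation limit is reached. The function and variable names, and the proportional update itself, are assumptions made for illustration.

    import random

    def population_tournament(payoff, counts, max_generations=10_000):
        """payoff[i][j]: average round-robin score of strategy i against
        strategy j; counts[i]: initial number of agents using strategy i
        (e.g. 100 each, 900 agents in total). Payoffs are assumed positive."""
        n = len(counts)
        total = sum(counts)
        for generation in range(max_generations):
            # Average score of each strategy against the current population mix.
            fitness = [sum(payoff[i][j] * counts[j] for j in range(n)) / total
                       for i in range(n)]
            # Agents re-adopt strategies with probability proportional to
            # (fitness x current share), keeping the population size fixed.
            weights = [f * c for f, c in zip(fitness, counts)]
            new_counts = [0] * n
            for _ in range(total):
                new_counts[random.choices(range(n), weights=weights)[0]] += 1
            counts = new_counts
            if max(counts) == total:           # one strategy has taken over
                return generation, counts
        return max_generations, counts         # no single winner: mixed outcome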
7.4 Results

7.4.1 Variants of Axelrod's original matrix
Out of 36 different strategies, Gradual won in the PD game. Gradual cooperates on the first move, then defects n times after n defections, and then calms down its opponent with two cooperation moves. In CG a strategy called Coop_puis_tc won. This strategy cooperates until the other agent defects and then alternates between defection and cooperation for the rest of the time. TfT was around 5th place for both games. Two other interesting strategies are joss_mou (2nd place) and joss_dur (35th place). Both start with cooperation
8 For the strategies used in this simulation, the constraint 2R > T + S does not affect the results, so these combinations are not excluded.
and basically play TfT. Joss_mou cooperates one time out of ten instead of defecting, and joss_dur defects one time out of ten instead of cooperating. This causes the large differences in scores between the two strategies.
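The exact implementations used in the tournament are not reproduced in the chapter, but the behaviour described above can be sketched as follows (our own code; the 10% perturbation comes from the description of the joss variants).

    import random

    def tft(opponent_last):
        """Tit-for-tat: cooperate on the first move, then copy the
        opponent's previous move."""
        return 'C' if opponent_last is None else opponent_last

    def joss_mou(opponent_last):
        """TfT that cooperates one time out of ten when it would
        otherwise defect."""
        move = tft(opponent_last)
        if move == 'D' and random.random() < 0.1:
            return 'C'
        return move

    def joss_dur(opponent_last):
        """TfT that defects one time out of ten when it would
        otherwise cooperate."""
        move = tft(opponent_last)
        if move == 'C' and random.random() < 0.1:
            return 'D'
        return move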
Fig. 7.8 Comparing PD and CG. In the figure, the CG is in the foreground and the PD in the background; the best strategies are to the left and the worst to the right.

The top-scoring strategies start with cooperation and react to the other player, i.e. they are not static. Both PD and CG have the same top strategies. A majority of the low-scoring strategies either start with defect or are static. Always defect has the biggest difference in favor of PD, and always cooperate the biggest difference in favor of CG. The five strategies with the largest difference in favor of CG are all cooperative with a static counter. There is no such connection for the strategies in favor of PD; instead there is a mixture of cooperative, defecting and static strategies. Our simulation indicates that the chicken game rewards cooperative strategies to a higher extent than the Prisoner's dilemma because of the increased cost of mutual defections. The following parts of the results confirm these statements. All of the top six strategies are nice and start with cooperation. They have small or moderate differences in scores between the chicken game and the Prisoner's dilemma. TfT is a successful strategy but not the best. All 11 strategies with a lower score than random either start with defect or, if
they start with cooperation, are not nice. All of these strategies do significantly worse in the CG than in the PD. This means that we have a game that benefits cooperators better than the PD, namely the CG. A few of the strategies obtained, despite the overall decreasing average score, a better score in the CG than in the PD. They all seem to have taken advantage of the increased score for cooperating against defect. In order to do that, they must, on average, play more C than D when their opponent plays D. The mimicking strategies, like TfT, cannot be in this group, since they are not that forgiving. In fact, most strategies that demand some kind of revenge for an unprovoked defection will be excluded, leaving only the static strategies.9 All static strategies which cooperate on the first move, and some of the partially static ones, do better in the CG than in the PD. We interpret this result as yet another indicator of the importance of being forgiving in a CG.
7.4.2 Adding noise to PD and CG
Fig. 7.9 The four most successful strategies in PD games with increasing noise. Total represents the percentage of the population these four strategies represented.
9 In fact, extremely nice non-static strategies (e.g. a TfT-based strategy that, on an opponent's defection, defects with a lower probability than it cooperates) would probably also do better in a CG than in a PD, but such strategies were not part of our simulations.
Instead of looking at all the different games, we formed two groups: PD, consisting of the Axelrod, 1.6D and 1.9D matrices, and CG, consisting of the 2.1D, 2.4D and 3.0D matrices. For each group we examined the five most successful strategies at different levels of noise. Fig. 7.9 and Fig. 7.10 show these strategies for PD and CG when 0, 0.01, 0.1, 1.0, and 10.0 per cent noise is introduced. Among the four most successful strategies in PD there were three greedy and one even-matched strategy (Fig. 7.9; see also Fig. 7.3). In all, these strategies constituted between 85% (1% noise) and 60% (0.1% noise) of the population. TfT was doing well with 0.01% and 0.1% noise; Davis was most successful with 1% noise, and AllD with 10% noise.
Fig. 7.10 The five most successful strategies in CG games with increasing noise. Total represents the percentage of the population these five strategies represented.

Three out of five of the most successful strategies in CG were generous. The total line in Fig. 7.10 shows that five strategies constituted between 50% (no noise) and nearly 100% (0.1% and 1% noise) of the population. TfT, the only even-matched strategy, was the first strategy to decline, as shown in the diagram. At a noise level of 0.1% or more, TfT never won a single population competition. Grofman increased its population up to 0.1% noise, but then rapidly disappeared as noise increased. Simpleton, which declined after the 1% noise level, showed the same pattern. Only Fair continued to increase when more noise was added, making it a dominating strategy at 10% noise together with the greedy strategy AllD.
7.4.3 Normalized matrices

7.4.3.1 Playing random

If agents with a number of random strategies are allowed to compete with each other, they will find a single winning strategy after a number of generations. This has to do with genetic drift and small simulation variations between different random strategies in how they actually play their C and D moves. As can be seen in Fig. 7.11, an increasing number of generations is needed for finding a winning strategy when the total population size increases. This almost linear increase (r = 0.99) is only marginally dependent on what game is played.
Fig. 7.11 Number of generations for finding a winning strategy among 15 random strategies with a varying population size.

The simulation consists of strategies with a population size of 100 individuals each. Randomized strategies with 100 individuals are, according to Fig. 7.11, expected to halt after approximately 2800 generations in a population game. There are two possible kinds of winning strategies: pure strategies that halt and mixed strategies (two or more pure strategies) that do
not halt. If there is an active choice of a pure strategy, it should halt before 2800 generations, because otherwise playing random could be treated as a winning pure strategy. Fig. 7.12 shows the relation between pure and mixed strategies for IPD and ICG. For all 110 games, each run with one cooperative-set and one defective-set within IPD, this is true. For the ICG, only one out of 99 different games halted before 2800 generations. This game (T=1.1, S=0.1) was very close to an IPD. For the rest of the ICG there was a mixed-strategy outcome. There is no reason to believe that we would find a single-strategy solution by extending the simulation beyond 10000 generations. If there exists a pure solution, it should turn up much earlier.

7.4.3.2 Pure and mixed strategies for cooperative and defective sets

Fig. 7.12 shows a major difference between pure and mixed strategies for IPD and ICG. IPD has no successful mixed strategies at all, while ICG favors mixed strategies for an overwhelming majority of the games. Some details not shown in Fig. 7.12 are discussed below.
                        Pure strategies        Mixed strategies
IPD   cooperative-set   TfT 78%, AllD 20%      none
      defective-set     TfT 75%, AllD 20%      none
ICG   cooperative-set   TfT 3%                 2-strat. 61%, 3-strat. 33%
      defective-set     TfT 2%                 2-strat. 69%, 3-strat. 24%
Fig. 7.12 The difference between pure and mixed strategies in IPD and ICG. For details see text.

For the cooperative-set there is a single-strategy winner after on average 167 generations. TfT wins 78% of the plays and dominates 91 of the 110 games.10 AllD dominates the rest of the games and wins 20% of the plays. For the defective-set there is a single strategy winning after 47 generations on average. TfT dominates 84 games, AllD 21 games and 99.99D, playing D 99.99% of the time, 5 games out of the 110 games in all. TfT wins 75% of the plays, AllD 20% and 99.99D 4%. In the cooperative-set there are two formations of mixed strategies winning most of the games, one with two strategies and the other with three strategies involved. This means that when the play was finished after 10000 generations, not a single play could separate these strategies and find a single winner. The two-strategy set ATfT and AllD wins 61% of the plays, and the three-strategy set ATfT, AllD and AllCtot wins 33% of the plays. AllCtot means that one and just one of the strategies AllC, 99.99C, 99.9C, 99C or 90C is the winning strategy. For 3% of the games there was a single TfT winner within relatively few generations (on average 754 generations). In the defective-set the same two formations win most of the games. ATfT + AllDtot wins 69% of the plays and ATfT + AllC + AllDtot wins 24% of the plays. AllDtot means that one and just one of the strategies AllD, 99.99D, 99.9D, 99D or 90D is the winning strategy. TfT is a single winning strategy in 2% of the plays, which needs on average 573 generations before winning a play.

10 A game is dominated by a certain strategy if it wins more than 50 out of 100 plays.

7.4.3.3 Generous and greedy strategies in IPD and ICG

In the C-variant set all AllC variants are generous and TfT is even-matched. AllD, ATfT and Random are all greedy strategies. In the D-variant set all AllD variants are greedy and TfT is still even-matched. AllC, ATfT and Random now represent generous strategies. In the IPD the even-matched TfT is a dominating strategy in both the C- and D-variant sets, with the greedy AllD as the only primary alternative. So the IPD will end up being a fully cooperative game (TfT) or a fully defecting game (AllD) after relatively few generations. This is the case both for the C-variant set and, within even fewer generations, for the D-variant set. In ICG there is instead a mixed solution between two or three strategies. In the C-variant, ATfT and AllD form a greedy two-strategy set.11 In the three-strategy variant the generous AllCtot joins the other two. In all, generous strategies only constitute about 10% of the mixed strategies. In the D-variant the generous ATfT forms various strategy sets with the greedy AllDtot.

11 With just ATfT and AllD left, ATfT will behave as a generous strategy even though it starts off as a greedy strategy in the C-variant environment.
7.5 Discussion
In our first study of variants of Axelrod's original matrix, a CG tends to favor cooperation more than a PD because of the values of the payoff matrix. The
payoff matrix in this first series of simulations is constant, a situation that is hardly the case in a real-world application, where agents act in environments in which they interact with other agents and human beings. This changes the context of the agent and may also affect its preferences. None of the strategies in our simulation actually analyses its score and acts upon it, which gave us significant linear changes in score between the games. We then looked at an uncertain environment, free from the assumption of perfect information between strategies, by introducing noise. Generous strategies dominated the CG, while greedy strategies were more successful in the PD. In the PD, TfT was successful in a low-noise environment, and Davis and AllD in a high-noise environment. Fair was increasingly successful in the CG when more noise was added. We conclude that the generous strategies are more stable in an uncertain environment in the CG. Especially Fair and Simpleton were doing well, indicating that these strategies are likely to be suitable for a particularly unreliable and dynamic environment. The same conclusion about generous strategies in the PD, for another set of strategies, has been drawn by Bendor ([6]; [7]). In our PD simulations we found TfT to be a successful strategy when a small amount of noise was added, while greedy strategies did increasingly better as the noise increased. This indicates that generous strategies are more stable in the CG part of the matrix, both with and without noise.
Unlike the single-play PD, which always favors defect, the IPD favors playing cooperate. In the CG the advantage of cooperation should be even stronger, because it costs more to defect compared to the PD, but in our simulation greedier strategies were favored with memory-0 and memory-1 strategies. We think this new paradox can be explained by a larger "robustness" of the chicken game. This robustness may be present if more strategies, like the strategies in the two other simulations, are allowed and/or noise is introduced. Robustness is expressed by two or more strategies winning the game instead of a single winner, or by a more sophisticated single winner. Such a winner could be cTfT, Pavlov, or Fair in the presence of noise, instead of TfT. In Carlsson and Jonsson [14], 15 different strategies were run in a population game within different IPD and ICG settings and with different levels of noise. TfT and greedy strategies like AllD dominated the IPD, while Pavlov and two variants of cTfT dominated the ICG. For all levels of noise it took on average fewer generations to find a winner in the IPD. This winner was greedier than the winner in the ICG. If instead a lot of non-intuitive strategies were used together with AllD, AllC, TfT and ATfT, the IPD very quickly terminated with TfT and AllD winning the games, while the ICG did not terminate at all for most of the noise levels. We propose that the difference between IPD and ICG can be explained by pure and mixed-strategy solutions for simple memory-0 or memory-1 strategies. For simple strategies like TfT and ATfT, the ICG will not have a pure-strategy winner at all but a mixture between two or more strategies, while the IPD quickly finds a single winner. For an extended set of strategies and/or when noise is present, the ICG may have more robust winners than the IPD by favoring more complex and generous strategies. Instead of TfT, a complex strategy like Fair is favored. From an agent engineering perspective the strategies presented in this chapter are quite simple. The presupposed agents are modeled in a predestined game-theoretical environment without a sophisticated internal representation. If we give the involved agents the ability to establish trust, the difference between the two kinds of games is easier to understand. In the PD, establishing trustworthiness between the agents means establishing trust, whereas in the CG it involves creating fear, i.e. avoiding situations where there is too much to lose. This makes the CG a strong candidate for being a major cooperative game together with the PD.
Acknowledgements

The author wishes to thank Paul Davidsson, Stefan Johansson, Ingemar Jonsson and the anonymous reviewers from the IAT conference for their critical reviews of previous versions of the manuscript, and Stefan Johansson for running the simulations.
Bibliography

[1] Axelrod, R., Effective choice in the Prisoner's Dilemma. Journal of Conflict Resolution, vol. 24, No. 1, pp. 379-403, 1980a.
[2] Axelrod, R., More effective choice in the Prisoner's Dilemma. Journal of Conflict Resolution, vol. 24, No. 3, pp. 3-25, 1980b.
[3] Axelrod, R., The Evolution of Cooperation. Basic Books, New York, 1984.
[4] Axelrod, R. and Dion, D., The further evolution of cooperation. Nature, 242:1385-1390, 1988.
[5] Axelrod, R. and Hamilton, W.D., The evolution of cooperation. Science 211, 1390, 1981.
[6] Bendor, J., Kramer, R.M. and Stout, S., "When in doubt... Cooperation in a noisy Prisoner's Dilemma." Journal of Conflict Resolution, vol. 35, No. 4, pp. 691-719, 1991.
[7] Bendor, J., "Uncertainty and the evolution of cooperation." Journal of Conflict Resolution, vol. 37, No. 4, pp. 709-734, 1993.
[8] Binmore, K., Playing Fair: Game Theory and the Social Contract. The MIT Press, Cambridge, MA, 1994.
[9] Boerlijst, M.C., Nowak, M.A. and Sigmund, K., Equal Pay for all Prisoners / The Logic of Contrition. IIASA Interim Report IR-97-73, 1997.
[10] Boyd, R., Mistakes allow evolutionary stability in the repeated Prisoner's Dilemma game. J. Theor. Biol. 136, pp. 47-56, 1989.
[11] Carlsson, B., How to cooperate in iterated chicken game and iterated Prisoner's dilemma. Intelligent Agent Technology, pp. 94-98, 1999.
[12] Carlsson, B. and Johansson, S., "An iterated hawk-and-dove game." In W. Wobcke, M. Pagnucco and C. Zhang, eds., Agents and Multi-Agent Systems, Lecture Notes in Artificial Intelligence 1441, pp. 179-192, Springer-Verlag, 1998.
[13] Carlsson, B., Johansson, S. and Boman, M., Generous and greedy strategies. Proceedings of the Congress on Complex Systems, Sydney, 1998.
[14] Carlsson, B. and Jonsson, K.I., The fate of generous and greedy strategies in the iterated Prisoner's Dilemma and the Chicken Game under noisy conditions. Manuscript, 2000.
[15] Durfee, E.H., Practically coordinating. AI Magazine 20(1), pp. 99-116, 1999.
[16] Fudenberg, D. and Maskin, E., Evolution and cooperation in noisy repeated games. American Economic Review 80, pp. 274-279, 1990.
[17] Goldberg, D., Genetic Algorithms. Addison-Wesley, Reading, MA, 1989.
[18] Holland, J.H., Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA, 1975.
[19] Jennings, N. and Wooldridge, M., "Applying agent technology." Applied Artificial Intelligence, vol. 9, No. 4, pp. 357-369, 1995.
[20] Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, MA, 1992.
[21] Lindgren, K., Evolutionary dynamics in game-theoretic models. In The Economy as an Evolving Complex System II (Arthur, Durlauf and Lane, eds., Santa Fe Institute Studies in the Sciences of Complexity, Vol. XXVII), Addison-Wesley, 1997.
[22] Lipman, B.L., Cooperation among egoists in Prisoner's Dilemma and Chicken games. Public Choice 51, pp. 315-331, 1986.
[23] Lomborg, B., Game theory vs. multiple agents: the iterated Prisoner's Dilemma. In Artificial Social Systems (C. Castelfranchi and E. Werner, eds., Lecture Notes in Artificial Intelligence 830), 1994.
[24] Mathieu, P. and Delahaye, J.P., http://www.lifl.fr/~mathieu/ipd/
[25] Maynard Smith, J. and Price, G.R., The logic of animal conflict. Nature, vol. 246, 1973.
[26] Maynard Smith, J., Evolution and the Theory of Games. Cambridge University Press, Cambridge, 1982.
[27] Molander, P., The optimal level of generosity in a selfish, uncertain environment. J. Conflict Resolution 29, pp. 611-618, 1985.
[28] Rapoport, A. and Chammah, A.M., Prisoner's Dilemma: A Study in Conflict and Cooperation. Ann Arbor, The University of Michigan Press, 1965.
[29] Rapoport, A. and Guyer, M., A taxonomy of 2 x 2 games. Yearbook of the Society for General Systems Research, XI, 203-214, 1966.
[30] Rosenschein, J. and Zlotkin, G., Rules of Encounter. MIT Press, Cambridge, MA, 1994.
[31] Russell, B., Common Sense and Nuclear Warfare. Simon & Schuster, 1959.
[32] Selten, R., Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4:25-55, 1975.
Chapter 8
Training Intelligent Agents Using Human Data Collected on the Internet
Elizabeth Sklar
Department of Computer Science, Boston College, USA

Alan D. Blair
Department of Computer Science and Software Engineering, University of Melbourne, Australia

Jordan B. Pollack
DEMO Lab, Department of Computer Science, Brandeis University, USA
8.1 Introduction
Hidden inside every mouse click and every key stroke is valuable information that can be tapped to reveal something of the human who entered each action. On the Internet, these inputs are called clickstream data, "derived from a user's navigational choices expressed during the course of visiting a World Wide Web site or other online area." [6] Clickstream data can be analyzed in two ways: individually, as input from single users, or collectively, as input from groups of users. Individualized input may be utilized to create user profiles that can guide activities on a web site tailored to the needs of a particular person. Data mining
the clickstream to customize to individual users is nothing new. Starting as early as 1969, Teitelman began working on the automatic error correction facility that grew into DWIM (Do What I Mean) [24]. In 1991, Allen Cypher demonstrated "Eager", an agent that learned to recognise repetitive tasks in an email application and offered to jump in and take over for the user [7]. In 1994, Pattie Maes used machine learning techniques to train agents to help with email, to filter news messages and to recommend entertainment, gradually gaining confidence at predicting what a user wants to do next [12]. Today, commercial products like Microsoft Word provide context-sensitive "wizards" that observe their users and pop up to assist with current tasks. Internet sites like altavista (http://www.altavista.com) recognise keywords in search requests, offering alternate suggestions to help users hone in on desired information. At the amazon.com (http://www.amazon.com) book store, after finding one title, other books are recommended to users who might be interested in alternate or follow-up reading. On many sites, advertisements which at first seem benign slowly adapt their content to the user's input, subtly wooing unsuspecting surfers. Input from users may also be examined collectively and grouped to illuminate trends in human behavior. Users can be clustered, based on a feature like age or gender or win rate (of a game), and the behavioral data for all humans exhibiting the same feature value can be grouped and analyzed, in an attempt to recognize characteristics of different user groups. An Internet system allows us to combine user profile knowledge with statistics on group behavior (from a potentially very large set of humans) in order to make more informed decisions about software adaptation than input from a single source would provide. These techniques may prove especially useful when applied to educational software. The work presented here examines these ideas in the context of an Internet learning community where humans and software agents play games against each other.
8.2 Motivation
Many believe that the secret to education is motivating the student. Researchers in human learning have been trying to identify the elements of electronic environments that work to captivate young learners. In 1991, Elliot Soloway wrote "Oh, if kids were only as motivated in school as they are in playing Nintendo." [23] Two years later, Herb Brody wrote: "Children assimilate information and acquire skills with astonishing speed when playing video games. Although much of this gain is of dubious value, the phenomenon suggests a potent medium for learning more practical things." [5] Thomas Malone is probably the most frequently referenced author on the topic of motivation in educational games. In the late 1970's and early 1980's, he conducted comprehensive experimental research to identify elements of educational games that made them intrinsically motivating [13]. He highlighted three characteristics: challenge, fantasy and curiosity. We are primarily interested in the first characteristic. Challenge involves games having an obvious goal and an uncertain outcome. Malone recommends that goals be "personally meaningful", reaching beyond simple demonstration of a certain skill; instead, goals should be intrinsically practical or creative. He emphasizes that achieving the goal should not be guaranteed and suggests several elements that can help provide this uncertainty: variable difficulty level, multiple goal levels, hidden information, randomness. He states that "involvement of other people, both cooperatively and competitively, can also be an important way of making computer-based learning more fun." [14] We concentrate on multi-player games, particularly on the Internet because it is widely accessible. The Internet offers the additional advantage that participants can be anonymous. Indeed, participants do not even have to be human — they can be software agents. We take a population-based approach to agency [15]. Rather than building one complex agent that can play a game using many different strategies, we create a population of simple software agents, each exhibiting a single strategy. The notion of training agents to play games has been around since at least the 1950's, beginning with checkers [20] and chess [21; 4], and branching out to include backgammon [3; 25; 16], tic-tac-toe [1], Prisoner's Dilemma [2; 9] and the game of tag [18]. With these efforts, the goal was to build a champion agent capable of defeating all of its opponents.
Our work differs because our goal is to produce a population of agents exhibiting a range of behaviors that can challenge human learners at a variety of skill levels. Rather than trying to engineer sets of strategies associated with specific ability levels or to adapt to individual players, we observe the performance of humans interacting in our system and use these data to seed the population of agents. This chapter describes our efforts training agents in two domains: one is a video game and the other is an educational game. In both cases, the agents were trained using human data gathered on our web site. We use this data both individually and collectively. With the individual, or one-toone, method, we use input from one human to train a single agent. With the collective, or many-to-one, approach, we use input from a group of humans to train a single agent. The first major section of the chapter details the video game domain, outlining the agent architecture, the specifics of the training algorithm and experimental results. The second major section provides similar discussion of the educational game and additionally compares the results obtained in the two domains. Finally, we summarize our conclusions and highlight future directions.
8.3 The first domain: Tron
Tron is a video game which became popular in the 1980's, after the release of the Disney film with the same name. In Tron, two futuristic motorcycles run at constant speed, making right-angle turns and leaving solid wall trails behind them — until one crashes into a wall and dies. In earlier work led by Pablo Funes [8], we built a Java version of the Tron game and released it on the Internet (http://www.demo.cs.brandeis.edu/tron) (illustrated in Figure 8.1). Human visitors play against an evolving population of intelligent agents, controlled by genetic programs (GP) [11]. During the first 30 months online (beginning in September 1997), the Tron system collected data on over 200,000 games played by over 4000 humans and 3000 agents. In our version of Tron, the motorcycles are abstracted and are represented only by their trails. Two players — one human and one software agent — each control a motorcycle, starting near the middle of the screen and heading in the same direction. The players may move past the edges of the screen and re-appear on the opposite side in a wrap-around, or toroidal, game arena. The size of the arena is 256 x 256 pixels. The agents are provided with 8 simple sensors with which to perceive their environment (see Figure 8.2).
Fig. 8.1 The game of Tron.
The game runs in simulated real time (i.e., play is regulated by synchronised time steps), where each player selects moves: left, right or straight.
Fig. 8.2 Agent sensors.
Each sensor evaluates the distance in pixels from the current position to the nearest obstacle in one direction, and returns a maximum value of 1.0 for an immediate obstacle (i.e., a wall in an adjacent pixel), a lower number for an obstacle further away, and 0.0 when there are no walls in sight.
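The chapter does not give the exact scaling of the sensor values, so the sketch below assumes a simple inverse-distance reading on the 256 x 256 toroidal arena, which has the properties just described (1.0 for a wall in the adjacent pixel, a lower value for a wall further away, 0.0 when nothing is in sight); the grid representation, the choice of the eight compass directions and the function names are our own.

    SIZE = 256  # the arena is 256 x 256 pixels and wraps around (toroidal)

    # Eight sensed directions: N, NE, E, SE, S, SW, W, NW (assumed layout).
    DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1),
                  (0, 1), (-1, 1), (-1, 0), (-1, -1)]

    def sensor(walls, x, y, dx, dy):
        """Reading for one direction: 1.0 for a wall in the adjacent pixel,
        1/d for a wall d pixels away, 0.0 if no wall is in sight."""
        for d in range(1, SIZE):
            cell = ((x + d * dx) % SIZE, (y + d * dy) % SIZE)
            if cell in walls:
                return 1.0 / d
        return 0.0

    def sense(walls, x, y):
        """The 8-element input vector fed to the agent's controller."""
        return [sensor(walls, x, y, dx, dy) for dx, dy in DIRECTIONS]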
Our general performance measure is the win rate, calculated as the number of games won divided by the number of games played. The overall win rate of the agent population has increased from 28% at the beginning of our experiment (September 1997) to nearly 80%, as shown in Figure 8.3(a).
During this time, the number of human participants has increased. Figure 8.3(b) illustrates the distribution of performances within the human population, grouped by (human) win rate. While some segments of the population grow a bit faster than others, overall the site has maintained a mix of human performances.
Fig. 8.3 Results from the Internet experiment. (a) Agent win rate. (b) Distribution of human population.
The data collected on the Internet site consist of these win rate results as well as the content of each game (referred to as the moves string). This includes the length of the game (i.e., number of time steps) and, for every turn made by either player, the global direction of the turn (i.e., north, south, east or west) and the time step in which the turn was made.
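One possible in-memory representation of such a game record is sketched below; the storage format used on the site is not described here, so the class and field names are assumptions.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class GameRecord:
        """One game reconstructed from the moves-string data.

        length: number of time steps in the game.
        turns:  every turn made by either player, stored as
                (player, global_direction, time_step), where the global
                direction is one of 'N', 'S', 'E', 'W'."""
        length: int
        turns: List[Tuple[str, str, int]] = field(default_factory=list)

    # Example: a 300-step game where the human turned north at step 42
    # and the agent turned east at step 57.
    game = GameRecord(length=300,
                      turns=[("human", "N", 42), ("agent", "E", 57)])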
8.3.1 Agent Training and Control
We trained agents to play Tron, with the goal of approximating the behaviour of the human population in the population of trained agents. The training procedure uses supervised learning [17; 26], as follows. We designate a player to be the trainer and select a sequence of games (i.e., moves strings) that were played by that player against a series of opponents, and we replay these games. After each time step, play is suspended and the sensors of the trainer are evaluated. These values are fed to a third player, the trainee (the agent being trained), who makes a prediction of which move the trainer will make next. The move predicted by the trainee is then compared to the move made by the trainer, and the trainee's control mechanism is adjusted accordingly.
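A rough sketch of how such training pairs can be assembled from a replayed game is shown below, reusing the sense() function sketched earlier; the list-of-steps format of the replay is an assumption, since the authors' exact data structures are not given.

    def build_training_pairs(game_steps):
        """game_steps: a replayed game as a list of tuples
        (walls, trainer_x, trainer_y, trainer_next_move), where
        trainer_next_move is 'left', 'right' or 'straight'.

        Returns (inputs, targets): the trainer's eight sensor readings
        at each time step, paired with the move the trainer made next.
        The trainee network is trained to predict the target from the input."""
        inputs, targets = [], []
        for walls, x, y, next_move in game_steps:
            inputs.append(sense(walls, x, y))   # sense() as sketched above
            targets.append(next_move)
        return inputs, targets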
The trained agents are controlled by a feed-forward neural network (see Figure 8.4). We adjust the networks during training using the backpropagation algorithm [19] with Hinton's cross-entropy cost function [10]. The results presented here were obtained with momentum = 0.9 and learning rate = 0.0002.
Fig. 8.4 Agent control architecture. Each agent is controlled by a feed-forward neural network with 8 input units (one for each of the sensors in Figure 8.2), 5 hidden units and 3 output units — representing each of the three possible actions (left, right, straight); the one with the largest value is selected as the action for the agent.

8.3.2 Challenges
The supervised learning method described above is designed to minimize the classification error of each move (i.e., choosing left, right or straight). However, a player will typically go straight for 98% of time steps, so there is a danger that a trainee will minimize this error simply by choosing this option 100% of the time; and indeed, this behaviour is exactly what we observed in many of our experiments. Such a player will necessarily die after 256 time steps (see Figure 8.5). Conversely, if turns are emphasized too heavily, a player will turn all the time and die even faster (Figure 8.5b). The discrepancy between minimizing move classification error and playing a good game has been noted in other domains [25] and is particularly pronounced in Tron. Every left or right turn is generally preceded by a succession of straight moves and there is a natural tendency for the straight moves to drown out the turn, since they will typically occur close together in sensor space. In order to address this problem, we settled on an evaluation strategy based on the frequency of each type of move. During training, we construct a table (table 8.1) that tallies the number of times the trainer
and trainee turn, and then emphasize turns proportionally, based on these values.
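The chapter states only that turns are emphasized proportionally, based on the tallies in Table 8.1; one plausible realization (an assumption on our part) is to weight each training example by the inverse frequency of its move class, as sketched below.

    from collections import Counter

    def move_weights(targets):
        """Weight each training example by the inverse frequency of its
        move class, so that the rare left/right examples are not drowned
        out by the overwhelming number of straight moves."""
        counts = Counter(targets)              # e.g. {'straight': 658290, ...}
        total = sum(counts.values())
        class_weight = {move: total / count for move, count in counts.items()}
        return [class_weight[move] for move in targets]

    # During backpropagation, each example's error term would then be
    # multiplied by its weight before the network update.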
Fig. 8.5 A comparison of different trainees: (a) a trainee that makes no turns; (b) a trainee that only makes turns; (c) a trainee that learns to turn; (d) the trainer. All had the same trainer; trainee variations include using a 12-input network and different move evaluation strategies. All games are played against the same GP opponent. The player of interest is represented by the solid black line and starts on the left.
Table 8.1 Frequency of moves, for the best human trainer.

                            trainer
                    left    straight    right
trainee  left        852       5723       123
         straight   5360     658290      4668
         right       161       5150       868

8.3.3 Experiments and Results
We trained three populations of players: one with GP trainers and two with human trainers. Although our goal is to approximate the behaviour of the human population, we initially tuned our training algorithm by training agents to emulate the behaviour of the GP players from the Internet site. These GPs are deterministic players (so their behaviour is easier to predict than humans'), thus providing a natural first step toward our goal. Separate training and evaluation sets were compiled for both train-
ing efforts, as detailed in Figure 8.6. There were 69 GPs who had played more than 1000 games on the Internet site (agents1000); these were used as trainers. There were 135 GPs who had played between 100 and 1000 games (agents100); these were used for evaluation purposes. There were 58 humans who had played more than 500 games on the Internet site (humans500); these were used as human trainers.

Fig. 8.6 Data sets for training and evaluation.

  data for GP trainees:     training set = agents1000 vs agents1000;  evaluation set = agents1000 vs agents100
  data for human trainees:  training set = humans500 vs GPs (Internet data);  evaluation set = humans500 vs agents100

  humans500  = humans with more than 500 Internet games (58 humans)
  agents100  = GPs with more than 100 and fewer than 1000 Internet games (135 agents)
  agents1000 = GPs with more than 1000 Internet games (69 agents)
The humans500 data set was used both individually and collectively. First, 58 individual trainees were produced, based on a one-to-one correspondence between trainers and trainees. Second, 10 collective trainees were produced, based on a many-to-one correspondence between trainers and trainees, where the 58 individuals were sorted into 10 groups based on their win rates (e.g., group 1 had 0-10% win rate, group 2 had 10-20% win rate, etc.). Each GP trainer played against agents1000 to produce a training set and against agents100 to produce an evaluation set. The games played by humans500 were alternately placed into training and evaluation sets, and then the evaluation set was culled so that it consisted entirely of games played against members of the agents100 group. We examine our training efforts in two ways. First, we look directly at the training runs and show the improvement of the networks during training. Second, we present the win rates of the two populations of trainees, obtained from playing them against a fixed set of opponents, and compare trainers with their trainees. Our measure of improvement during training is based on the frequency-of-moves table and how it changes. Referring back to Table 8.1, if the trainee were a perfect clone of its trainer, then all values outside the diagonal would be 0 and the correlation coefficient between the two players would be 1. In reality, the GP trainees reach a correlation of approximately 0.5, while
the human trainees peak at around 0.14. For comparison, we computed correlation coefficients for 127 random players, i.e., players that choose a move randomly at each time step, resulting in a much smaller correlation of 0.003. Figure 8.7 shows the change in correlation coefficient during training for selected trainees.
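The chapter does not state the exact formula behind these correlation coefficients. The sketch below uses one standard choice for such a table, the multiclass Matthews (Gorodkin) correlation; applied to the frequencies of Table 8.1 it gives roughly 0.14, matching the peak reported for the human trainees, and it is 1 for a perfect clone and near 0 for random play. It should still be read as a reconstruction, not as the authors' stated measure.

    import math

    def table_correlation(table):
        """Multiclass Matthews correlation for a square move-frequency
        table such as Table 8.1; the result is unchanged if the roles of
        the two players (rows vs. columns) are swapped."""
        n = len(table)
        s = sum(sum(row) for row in table)               # total moves
        c = sum(table[k][k] for k in range(n))           # agreements
        t = [sum(row) for row in table]                  # row sums
        p = [sum(table[i][k] for i in range(n)) for k in range(n)]  # column sums
        num = c * s - sum(t[k] * p[k] for k in range(n))
        den = math.sqrt((s * s - sum(x * x for x in p)) *
                        (s * s - sum(x * x for x in t)))
        return num / den if den else 0.0

    best_human = [[852, 5723, 123],      # the tallies of Table 8.1
                  [5360, 658290, 4668],
                  [161, 5150, 868]]
    print(round(table_correlation(best_human), 2))   # -> 0.14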
Fig. 8.7 Change in correlation coefficient during training runs: (a) GPs; (b) humans (one-to-one).
In the GP experiment, the best trainer gave rise to the worst trainee, hence the labels in the figure on the left, "best trainer" and "worst trainee", refer to the same player. In the human one-to-one experiment, the best trainer produced the best trainee, hence the labels in the figure on the right, "best trainer" and "best trainee", refer to the same player. The terms "best" and "worst" refer to the win rates of the players as measured in games played against the evaluation set (see Figures 8.8(b) and 8.8(d)). The win rates in the evaluation games for the trainers and trainees are shown in Figure 8.8, for each of three training efforts: (1) GP training (Figures 8.8a and 8.8b), (2) human one-to-one training (Figures 8.8c and 8.8d), and (3) human many-to-one training (Figures 8.8e and 8.8f). There are two types of plots shown. The first column contains the first type of plot (for each training group, Figures 8.8a, 8.8c and 8.8e). Here, the players are sorted within each population according to their win rate, so the ordering of individuals is different within each trainer and trainee population. The plot demonstrates that the controllers have learned to play Tron at a variety of different levels. The second column contains the second type of plot (for each training group, Figures 8.8b, 8.8d and 8.8f). Here, we plot the win rate of individual
trainees against the win rate of their corresponding trainers. It is interesting to notice that the best human trainer (from Figure 8.8d) has given rise to the best trainee (see Figures 8.9a and 8.9b), while the best GP trainer (from Figure 8.8b) has produced the worst trainee (see Figures 8.9c and 8.9d). A few of the trainees play very poorly. These are cases where the network either fails to make any turns or makes turns at every move (in spite of the strategy described in section 8.3.2). Also, in a number of cases, the trainee outperforms its trainer.
[Fig. 8.8: win rates in the evaluation games. Panels: (a) GPs, players sorted by win rate; (b) GPs, trainee win rate plotted against trainer win rate; (c) humans, one-to-one, players sorted by win rate; (d) humans, one-to-one, trainee win rate plotted against trainer win rate; (e) composite human trainers and trainees; (f) win rates of the composite human population.]