Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editor
Stefan Covaci
GMD Fokus
German National Research Center for Information Technology
Research Institute for Open Communications Systems
Kaiserin-Augusta-Allee 31, D-10589 Berlin, Germany
E-mail:
[email protected]

Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme

Active network : first international workshop ; proceedings / IWAN '99, Berlin, Germany, June 30 - July 2, 1999. Stefan Covaci (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 (Lecture notes in computer science ; Vol. 1653) ISBN 3-540-66238-3
CR Subject Classification (1998): C.2, K.6, K.4, K.5, D.2

ISSN 0302-9743
ISBN 3-540-66238-3 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany

Typesetting: Camera-ready by author
SPIN: 10703977 0613142 - 5 4 3 2 1 0
Printed on acid-free paper
The First International Working Conference on Active Networks — IWAN '99

The network model is changing from the traditional "store-forward" towards the "store-compute-forward" one. We are moving from a passive network, where the level of abstraction is the protocol (service model), towards an active network where the level of abstraction is raised to APIs (programming model) for programming the new network resources (communication, processing and storage). The network becomes a distributed computer capable of routing at gigabit and terabit per second speeds and hosting several programming environments. This is going to change the whole network service and application design paradigm, enabling a new generation of network-aware software following the model of "compute while travelling." For example, packets will carry their custom network service which will be computed in the active nodes. The implications of the active network infrastructure go beyond the technical issues and also address the way business will be created and managed in the future networked environment. Service creation, deployment and operations will no longer be the sole business of the owners and manufacturers of the infrastructure, but will become a customized business shared with the users of this infrastructure, too. In this sense the Active Network Operators will be adopting a new outsourcing-based business model in order to be capable of hosting an increased number of network-aware applications and also to manage an applications-aware network. The 1st International Working Conference on Active Networks – IWAN '99 has set itself the goal of pulling together the main streams of activities in the area of Active Networks in order to strengthen synergies and create a common view of the domain and its most difficult problems.
Although no widely agreed taxonomy exists to this end, the contributions have demonstrated that a general architecture will include several dimensions such as active code distribution, active code execution (EE-execution environment, programming language, safety), active code communication (cooperation), active node resource control (NodeOS) and active network security. Each of these subarchitectures is component-based and includes a management component that could offer an API to be used by applications. Important aspects like autonomy (degree of self-management) and asynchronicity start to be addressed in the context of scalability and fault tolerance, and technologies such as mobile agents and CORBA are the first ones to provide the right support. Another important aspect is the integration of active networks with the legacy networks as well as the interoperability at the level of active networks. Issues related to active network architecture are well represented in the program, and this is reflected in 9 papers in this volume. The challenge in implementing such architectures is to find the right balance between flexibility, usability, security, robustness, and performance. Solutions to this problem are assessed by a number of prototype platforms, and results are presented by 8 papers in this volume. As one of the main advantages of active networks is based upon the rapid introduction of customized network services and applications, we continue to see a growing number of examples primarily related to the Internet. The next generation of the Internet will provide its users with dynamically managed QoS, multicast and security
and could rely on active networking as an ideal infrastructure for such implementations. Most of the papers about applications (14) relate to infrastructure-owner applications – management, control – but some also address the class of customer-owned applications. The papers also give a clear indication that service and application creation paradigms and supporting methods and tools are going to come into focus once the field matures. That active networks are gaining momentum is apparent from the large number and high quality of papers submitted to this first international working conference – some 80 submissions, of which the best 30 have been accepted and published in this volume. This book provides a unique state-of-the-art account of architectural approaches, technologies and prototype systems that will impact the way future networked businesses will be created and managed. It is unique not only in that it reflects all relevant achievements to date from every continent, but also because via its cooperative preparation, it has led to a truly Active Network of authoritative persons in the field. I hope you will benefit from reading it. June 1999
Stefan Covaci
IWAN '99 Program Chair
Acknowledgements

This volume resulted from the papers accepted for presentation at the 1st International Working Conference on Active Networks – IWAN '99. First, we would like to thank the authors for providing their material, and in addition would like to acknowledge the effort of the many other authors whose submitted papers were not accepted. We would like to thank the members of the Technical Program Committee, listed below, for their quality reviews and useful suggestions regarding the technical contents and organization of the book. We are also indebted to our own reviewers, namely Hui Guo, Bharat Bhushan, and Ascan Morlang. Special thanks go to Dr. Eckhard Moeller and Ms. Cynthia Hardey for their continuous help in coordinating the review process and for their support in preparing and organizing this book. Our general chairs, Professor Dr. h. c. Radu Popescu-Zeletin and Mr. Masanori Kataoka, and our sponsors, Hitachi Ltd., European Commission - ACTS Programme, OMG, IBM Zurich Research Lab, Deutsche Telekom Technologiezentrum, and IKV++, were a continuous source of help and encouragement.

General Chair
R. Popescu-Zeletin, Technical University of Berlin, Germany

General Co-Chair
M. Kataoka, Hitachi Ltd, Japan

Technical Program Committee
Chair: S. Covaci, GMD FOKUS, Germany
Vice Chair: A. Lazar, Columbia University, USA
Vice Chair: H. Yasuda, The University of Tokyo, Japan
M. Bonatti, ITALTEL, Italy
I. Busse, GMD FOKUS, Germany
M. Campolargo, European Commission, Belgium
H. Dibold, Deutsche Telekom, Germany
P. Dini, CRIM, Canada
A. Gavras, Sprint, USA
M. Gien, Sun Microsystems, France
G. Le Lann, INRIA, France
T. Magedanz, IKV++, Germany
W. Marcus, Bellcore, USA
I. Marshall, British Telecom, United Kingdom
G. Minden, The University of Kansas, USA
H. Miyahara, Osaka University, Japan
E. Moeller, GMD FOKUS, Germany
K. Nakane, Hitachi Ltd, Japan
F. Schulz, Deutsche Telekom, Germany
J. Smith, University of Pennsylvania, USA
R. Soley, OMG, USA
L. Svobodova, IBM, Switzerland
A. Tantawi, IBM, USA
S. Weinstein, NEC, USA
A. Wolisz, Technical University of Berlin, Germany

Steering Committee
A. Casaca, INESC, Portugal
E. Raubold, Deutsche Telekom, Germany
O. Spaniol, RWTH, Germany
D. Tennenhouse, MIT, USA
Organizing Committee Local Arrangements: C. Hardey Publicity: B. Intelmann
IWAN '99

Message from the Conference General Chairs

Internet technologies have begun to affect our lives and work. We are still talking about network protocols, reservation mechanisms, performance, and applications. It is time, based on the experience we have and the lessons we are learning from the large deployment of the Internet, to consider new research and development directions for the future. One of the most attractive areas is that of ACTIVE NETWORKS. Welcome to the first International Working Conference on Active Networks 'IWAN '99' in Berlin, a city which, similar to this new domain, is in the process of developing its structure and image for the future. We hope that this first workshop will become a forum for the exchange of ideas and results in this domain on an international level. Already a network of R&D activities worldwide in Active Networks can be identified. Most of them are presented at this workshop. The positive answers we received during the organization of the workshop from authors, industry, and research groups from all over the world guarantee that this event will continue and grow in the future. We are sure that you will find the program stimulating and that you will take advantage of the opportunity to meet your colleagues from around the world. The success of the symposium depends on the dedication and contribution of many volunteers, committee members, authors, reviewers, invited speakers, and sponsors. Our personal thanks go to all of them and to you who, we are confident, will make IWAN '99 a success.

Prof. Dr. Dr. h.c. Radu Popescu-Zeletin
Technical University of Berlin, Germany
Masanori Kataoka Hitachi Ltd, Japan
Table of Contents

Architectures

The Architecture of ALIEN
D. Scott Alexander, Jonathan M. Smith
1
Designing Interfaces for Open Programmable Routers Spyros Denazis, Kazuho Miki, John Vicente, Andrew Campbell
13
Rcane: A Resource Controlled Framework for Active Network Services Paul Menage
25
The Protean Programmable Network Architecture: Design and Initial Experience Raghupathy Sivakumar, Narayanan Venkitaraman, Vaduvur Bharghavan
37
A Dynamic Pricing Framework to Support a Scalable, Usage-Based Charging Model for Packet-Switched Networks Mike Rizzo, Bob Briscoe, Jérôme Tassel, Konstantinos Damianakis
48
Active Information Networks and XML Ian Marshall, Mike Fry, Luis Velasco, Atanu Ghosh
60
Policy Specification for Programmable Networks Morris Sloman, Emil Lupu
73
A Self-Configuring Data Caching Architecture Based on Active Networking Techniques
Gaëtan Vanet, Yoshiaki Kiriha
85

Interference and Communications among Active Network Applications
Luca Delgrossi, Giuseppe Di Fatta, Domenico Ferrari, Giuseppe Lo Re
97
Platforms

The Grasshopper Mobile Agent Platform Enabling Short-Term Active Broadband Intelligent Network Implementation
C. Bäumer, T. Magedanz
109
LARA: A Prototype System for Supporting High Performance Active Networking
R. Cardoe, J. Finney, A. C. Scott, W. D. Shepherd
117
A Programming Interface for Supporting IP Traffic Processing Ariel Cohen, Sampath Rangarajan
132
New Generation of Control Planes in Emerging Data Networks Nelu Mihai, George Vanecek
144
An Active Networks Overlay Network (ANON) Christian Tschudin
156
Autonomy and Decentralization in Active Networks: A Case Study for Mobile Agents Ingo Busse, Stefan Covaci, André Leichsenring
165
Towards Active Hardware David C. Lee, Mark T. Jones, Scott F. Midkiff, Peter M. Athanas
180
The Impact of AN on Established Network Operators Arto Juhola, Ian Marshall, Stefan Covaci, Thomas Velte, Seppo Parkkila, Mike Donohoe
188
Active Management and Control

Using Active Processes as the Basis for an Integrated Distributed Network Management Architecture
Dominic P. A. Greenwood, Damianos Gavalas
199
ANMAC: An Architectural Framework for Network Management and Control Using Active Networks
Samphel Norden, Kenneth F. Wong
212
An Active Network Approach to Efficient Network Management Danny Raz, Yuval Shavitt
220
Virtual Networks for Customizable Traffic Treatments Jens-Peter Redlich, Masa Suzuki, Steve Weinstein
232
Flexible Network Management Using Active Network Framework Kiminori Sugauchi, Satoshi Miyazaki, Kenichi Yoshida, Keiichi Nakane, Stefan Covaci, Tianning Zhang
241
Managing Spawned Virtual Networks
Andrew T. Campbell, John Vicente, Daniel A. Villela
249
Active Organisations for Routing Steven Willmott, Boi Faltings
262
A Dynamic Interdomain Communication Path Setup in Active Network Jyh-haw Yeh, Randy Chow, Richard Newman
274
Active Network Challenges to TMN Bharat Bhushan, Jane Hall
285
Survivability of Active Networking Services Amit Kulkarni, Gary Minden, Victor Frost, Joseph Evans
299
Security

A Secure PLAN
Michael Hicks, Angelos D. Keromytis
307
Control on Demand Gísli Hjálmtýsson, Samrat Bhattacharjee
315
Agent Based Security for the Active Network Infrastructure Stamatis Karnouskos, Ingo Busse, Stefan Covaci
330
Author Index
345
The Architecture of ALIEN

D. Scott Alexander¹ and Jonathan M. Smith²

¹ Bell Labs, Lucent Technologies, Murray Hill, NJ 07974, USA
[email protected]
² University of Pennsylvania, Department of CIS, Philadelphia, PA 19104
[email protected]

Abstract. The alien architecture exposes all node-resident features to modification by a module loader, with the exception of the loader itself. As a structuring principle, alien divides its loadable portions into a privileged loader-initiated Core Switchlet and an unprivileged collection of libraries which use the Core Switchlet and are loaded by it. The loader, Core Switchlet and libraries comprise the network-resident functionality of alien. We make three claims. First, by dint of a library for Active Packets written in Caml, alien is the first system to support both Active Packets and Active Extensions. Second, by use of language features such as module thinning, alien can provide multiuser security within a single address space. Third, by isolating only a small set of functions with privilege, the system achieves security, flexibility and good performance, with a measured throughput of about 60 Mbps when used for LAN bridging.
1 Introduction
There are a variety of conceptual models or “visions” of what “Active Networking” means and what it could be. These are centered around the programming model. We believe that the distinction between the “programmable switch” (active extension) and “capsule” (active packet) models outlined in [1] is a distraction rather than a central question. The real question is what the programmer of the active extension or active packet can expect from the active network. This question resembles the debate and exploration of what should be resident in an operating system. One model from early computing history was that of the “I/O Control System,” loaded with the compiler in the card deck preceding the program itself. The active network analogy would be active packets which carry everything they need to be “activated” along with them, in some form recognized by the universal “computing machine.” The other historical model was the operating system — a universally available, but privileged “kernel” of resource management services enhancing the raw machine, coupled with standard compilers and run time environments. The active network analogy in this case would be the set of services available in network elements as well as any
This work was supported by DARPA under Contract #N66001-96-C-852.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 1–13, 1999. © Springer-Verlag Berlin Heidelberg 1999
structuring principles used to organize them. The tension between the two historical approaches continues, as there has been a proliferation of micro-, nano- and exo-kernels which place less and less behind a privilege boundary to provide greater flexibility to unprivileged applications. The analogy in active networking is the tension between the programmer's freedom, which when maximized offers great flexibility, and the services available at a network element, such as concurrent multiprocessing, which add functionality in exchange for reducing certain freedoms. We believe that we have made significant progress towards showing what can be done to balance concerns of flexibility, usability, security and performance in an active network element. Sect. 2 discusses the design of the alien architecture for active networking. Sect. 3 suggests reasons why many security and performance issues can be addressed by an appropriate programming language. Sections 4, 5 and 6 detail the three layers of the alien architecture and Sect. 7 explains how choices are made in locating functionality. Sect. 8 briefly addresses performance issues stemming from the use of byte-coded languages. Sect. 9 makes our point that the active extensions vs. active packets distinction is of secondary concern by describing implementations of each within alien, and Sect. 10 summarizes our contributions and outlines challenges for the future.
2 The Design of alien
In designing alien, our goal was a system which would allow experimentation and prototyping to test Active Networks ideas. In particular, we wanted to be able to implement experiments built on either active extensions or active packets. At the same time, we felt that it was important that alien have sufficient performance to allow a realistic understanding of the compromises of various designs. An important element of our design is the choice of a computing model. We have chosen to use a full Turing machine model for computation by picking a Turing equivalent language. We then provide the ability to control which shared resources are available to which active program. This allows us to build security into the system. One of the elements of the alien design intended to ensure performance was our choice to use a single address space for alien. This has security implications that we will discuss below, but allows processes to communicate very quickly using a shared memory model. Additionally, we did not feel that it was realistic to expect a hardware router to contain a memory management unit. This affected our choice of language, as we will describe in Sect. 3. Another goal was to make the system flexible. This means that it should be possible to change the system at runtime to the greatest reasonable extent. To achieve this, we used a layered design with only a small kernel of unloadable functionality. Similarly, for security reasons, we attempted to minimize the size of the trusted code. The combination of these considerations leads to a three-layer design as shown in Fig. 1.
unprivileged loadable: libraries
privileged loadable: Core Switchlet
privileged non-loadable: Loader, runtime (Caml), OS (Linux)

Fig. 1. alien layers
The lowest layer is the alien loader which is invoked to start the system. This is described in Sect. 4. The loader loads in the Core Switchlet as described in Sect. 5. Both of these elements are privileged. Finally, the Core Switchlet loads in various libraries (Sect. 6) which are unprivileged, but provide “expected” services.
3 The Choice of a Language
The tradeoffs between security and performance are critical in the choice of a language. If one were working in a completely trusted environment and failures due to programming errors were not a concern, any modern programming language that was capable of dynamically loading programs would be sufficient. Our need for security in an environment where resources must be shared by competing interests leads to the need for more restrictions on our choice of programming language. When combined with our desire to run in a single address space, we exclude any language that allows manipulation of pointers. Additionally, because different principals may have different access to the system, we need to be able to control the view of the system granted to each principal. This led us to identify six characteristics that we feel are useful to build alien:

1. strong typing,
2. garbage collection,
3. module thinning,
4. dynamic loading,
5. homogeneous representation of active programs, and
6. performance.
3.1 Primary Considerations
Strong typing and garbage collection were chosen to aid security. Strong typing ensures that any readable memory location has a type and that memory locations can only be accessed by functions that operate on the appropriate type. Moreover, conversion functions are carefully regulated to ensure that these properties hold. For further discussion of why we chose strong typing, see [2]. Garbage collection supports strong typing. If an active program can explicitly deallocate memory using a function like free(), the chunk of memory freed loses its type, but the user can still access it. This becomes most important when the memory is reallocated. If the memory is allocated to a different active program, it now is possible for the first program to read (and possibly modify) the second's data. Module thinning is a technique which allows us to maintain multiple interfaces for a single module. Thus, we might have a module which allows access to the filesystem. One interface would allow access to the entire set of functions normally offered by the filesystem. A second interface would offer access to the functions which allow one to read data, but thin out the functions which would allow modification of the filesystem. A third interface might allow access only to functions which would interpret and return information from the files in the filesystem without providing access to the files themselves. These three interfaces would be appropriate, for example, to a node administrator, to a program gathering status information, and to an application. Dynamic loading is clearly necessary. Our definition of active networking requires that it be possible to load functions while the system is running. Dynamic loading is a name for this ability. While it would be possible to represent active programs heterogeneously [3], it adds considerable complexity to the system.
For this reason, we have chosen to require that programs be represented in the same form regardless of the hardware present at any node that the program transits. In particular, this means that we will be transmitting an intermediate representation of the programs and that this will need to be translated or interpreted to be executed. Finally, performance is a concern. If alien ran too slowly, we could not have made useful conclusions about active networking systems. At the same time, the experimental nature of alien means that performance is one concern of many.

3.2 Secondary Considerations
In addition to the features in the previous section, we found threads and static typing to be useful characteristics that were available to us. Having a thread system allows us to more naturally structure the system. Each active program in alien consists of at least one thread. The thread scheduler is allowed to mediate access to the CPU. If a thread system was not present in the language we chose, we suspect that we would have ended up implementing one of our own. Static typing allows all types to be determined at compile time. Thus, the only type-related checks which occur at runtime are array bounds checks. Additionally, certain errors are caught at compile time instead of runtime. Since an
active network can be a complex distributed system, this can be an aid to debugging. There can be a tradeoff in a system like alien, though. Types must be checked at link time and when an object is unmarshaled. With dynamic loading, these are essentially runtime checks. The potential advantage is that these types are checked only once regardless of frequency of use. The potential disadvantage is that if a module is sent with unused functions or a data object is sent with unused values, dynamic typing would never have checked those types.

3.3 The Caml Programming Language
In our implementation of the alien architecture, we chose to use an existing language, Objective Caml [4], as it implements all of the properties identified to varying degrees. We will discuss it and some of the other choices that we considered and discarded in the following sections.

Objective Caml is a language from the ML [5] family of languages. It is a strongly typed, garbage collected, functional language. The compilers provided can produce byte code for a wide variety of Unix variants as well as for Microsoft Windows 95. Additionally, native code compilers are provided for the Digital Alpha [6] and Intel x86 [7] amongst others. Both types of compilers use static type checking.

Caml also has a dynamic loader which allows byte code to be loaded into a running byte code program. Since the byte code is machine independent, our active programs are composed of byte code files which may be shipped around the network without regard for the underlying machine architecture. Additionally, module thinning is provided. Currently, we use the ability to define one unrestricted interface for the internal components of our system and a second, restricted, interface for active programs that are loaded. This ability is discussed further in Sect. 5.

Caml performs a set of checks when a byte code file is loaded. In particular, it checks to see that each interface required by the new byte code file matches an interface provided by the running system. To facilitate this check, the compiler stores an MD5 hash [8,9] of each interface that it compiles or compiles against. At link time, these hashes are compared.¹

Because Caml is not designed as a network language, per se, the dynamic loader is designed to load byte code files from disk. This was an area in which an extension of the Caml library was required to make Caml suitable for our purposes. (See Sect. 9 for more details.)

3.4 Other Possible Languages
We considered other possibilities before choosing Caml. There are other languages which meet the requirements listed, and alien could have been implemented in most if not all of them. Nonetheless, there were some factors which inclined us toward Caml and away from some other languages.

Java [11] has been a popular language for implementing active networking systems. While Java meets our requirements, meeting our security requirements requires using the SecurityManager to implement a scheme similar to module thinning. This need, along with the need to use native methods to implement some of the network access we require (i.e., access to Ethernet frames), destroys the "write once" advantage that Java enjoys for applets. Thus, we could have implemented alien in Java, but such an implementation did not have any clear technical advantage over Caml.

Another approach would be to design a new language. This approach has been taken with PLAN [12] and Netscript [13]. The advantage to such an approach is that the language can be tailored to active networking. The difficulty is that if the designer leaves out a feature needed by a user, it can become difficult or impossible to implement desirable active programs in the new language. Nonetheless, it would have been possible to design a new language and use it to implement alien.

¹ Obviously, the hashes can be trivially forged to attack this system. Rouaix [10] suggests using a well-known, certified compiler which digitally signs its output. A verifier such as is used in Java [11] would be another approach to this problem.
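The interface-hash comparison described in Sect. 3.3 can be sketched with the Digest module from Caml's standard library (an MD5 implementation). The interface strings below are illustrative stand-ins for compiled signatures, not alien's actual data:

```ocaml
(* Sketch of the linker's consistency check: two compilation units agree
   on an interface iff the MD5 digests of that interface match. *)
let provided = "val send : packet -> unit"   (* interface the running node offers *)
let required = "val send : packet -> unit"   (* interface the incoming byte code expects *)

let () =
  if Digest.string provided = Digest.string required
  then print_endline "interfaces match"
  else print_endline "interface mismatch"
```

As the footnote notes, comparing digests alone is not tamper-proof; it catches accidental mismatches, not forgery.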
4 The alien Loader
The Loader provides the core of alien’s functionality. It provides the interface to the operating system (through the language runtime) plus some essential functions to allow system startup and loading of active programs. Thus, it defines the “view of the world” for the rest of alien. Moreover, since security involves interaction with either the external system or with other active programs, the Loader provides the basis of security. As its name implies, another role of the Loader is to load active programs into the system. Therefore, to simplify implementation of the architecture, we have made the Loader non-loadable. We also expect this to improve efficiency in some implementations. The other side of this decision, though, is that we attempt to make the Loader as small as possible because we would like to have components of alien replaceable (which means loadable) whenever practical. Also ameliorating this inflexibility of the Loader is the fact that it is often possible to overlay functionality in the Loader with a different interface or sometimes even a different mechanism at a higher level. The Loader provides mechanisms rather than policy; policies are implemented in the Core Switchlet and can be changed by changing it. In addition to the functionality provided by the language runtime, there are three areas in which added capabilities are needed, as shown in Table 1. The first of these areas is a set of startup functionality. This consists of performing any initializations needed by either the runtime or alien itself to bring alien to a stable state. The second area is active program loading. Dynamic loading of code is obviously crucial to an Active Network node. By placing this functionality in the Loader, we are able to make the Core Switchlet loadable. The third area is what we call the system console. This provides a way for the operator to provide
commands to the system. This allows maintenance and diagnostic operations to be performed before the network is fully available or in the event of network failure.
Table 1. Loader functionality

startup routines: initialize system
active program loading: load active programs consistent with alien security
system console: console read loop
5 The Core Switchlet
Above the Loader is the Core Switchlet. It is responsible for providing the interface that active programs see. It relies upon the Loader for access to operating system resources and then layers additional mechanisms to add security, and often, utility. In providing an interface to active programs, it determines the security policies of the system. By including or excluding any function, it can determine what active programs can or cannot do. Additionally, it is loadable, so the administrator can change or upgrade its pieces as necessary. This can also allow changes in the security policy. The security policies of the Core Switchlet are enforced through a combination of module thinning and type safety. Type safety ensures that an active program can only access data or call functions that it can name. Module thinning assures that the system controls which data and functions the active program can name so that the security policy is enforced.
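The way module thinning and type safety combine to enforce such a policy can be sketched in Caml with a restricted signature. The module and function names here are invented for illustration, not taken from alien:

```ocaml
(* Full implementation of a hypothetical filesystem module, as the
   Core Switchlet itself would see it. *)
module Fs_full = struct
  let read_file name = "contents of " ^ name   (* stand-in for real I/O *)
  let write_file _name _data = ()              (* privileged operation *)
  let status () = "fs: ok"
end

(* Thinned signature: the view granted to an unprivileged active program.
   write_file is deliberately absent. *)
module type FS_READ = sig
  val read_file : string -> string
  val status : unit -> string
end

(* The same module re-exported under the thinner interface. An active
   program given Fs_for_programs cannot even name write_file, so type
   safety guarantees it can never call it. *)
module Fs_for_programs : FS_READ = Fs_full

let () =
  print_endline (Fs_for_programs.status ())
```

Swapping in a different signature at load time changes the security policy without touching the implementation, which matches the chapter's claim that the Core Switchlet determines policy by including or excluding functions.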
5.1 The Facilities of the Core Switchlet
In many ways, the interface that the Core Switchlet presents to active programs and libraries is like the system call interface that a kernel presents to applications. Through design of the interface the system can control access to underlying resources. With a well-designed interface, the caller can combine the functions provided to get useful work done. Table 2 shows the functionality provided by the Core Switchlet.

Language Primitives, Operating System Access, and Thread Access
The language primitives category encompasses those functions that one expects to find in a programming language such as +, boolean "and," and simple type conversions. These functions are implemented in the runtime and thus are part of the Loader. However, the Core Switchlet is responsible for maintaining the policy regarding which functions are available to active programs.
8
D. Scott Alexander and Jonathan M. Smith
Table 2. Core Switchlet functionality

  language primitives      policy for access to the basic functions of the language
  operating system access  policy for access to the operating system calls
  network access           policy and mechanism for access to the network
  thread access            policy for access to thread primitives
  loading support          policy and mechanism to support loading of active programs
  message logging          policy and mechanism for adding messages to the log file
Thus, for example, open_in, which opens a file for reading, is among the language primitives made available by the Loader, but the Core Switchlet might omit it if there were a policy forbidding active programs access to the disk. Similarly, operating system access functions are those which allow access to a system call. Thread access functions allow access to operations such as the creation or deletion of a thread, to mutual exclusion, and to condition variables. Again, these are implemented in the runtime, but the interface seen by active programs is thinned by the Core Switchlet in accordance with the system policies.

Network Access. Because we are implementing a network node, access to the network is particularly important. Generally this consists of allowing active programs to discover information about the interfaces on the machine and the attached networks, to receive packets, and to send packets. One element of this task which is particularly important is the demultiplexing of incoming packets. The Core Switchlet must be able to determine whether zero, one, or more than one active program is interested in an arriving packet. If more than one active program is interested, policy should dictate which active program or programs receive a copy of the packet. Security is an important element of this decision: an active program should be able to be certain that it will receive all packets that the policy entitles it to, and should not be able to obtain any packets that the policy forbids it. Without such security, denial-of-service attacks and information stealing are quite easy.

Loading Support. Loading support includes the loading functionality from the Loader, with thinning appropriate to control which active programs may be loaded by whom. Additionally, the Core Switchlet adds mechanism for tracking which active programs have been loaded and what functions those active programs wish to make available to other active programs.
This mechanism is important because it gives active programs a way to make use of functions found on a switch. In conjunction with a uniform naming scheme [14], it becomes possible for active programs moving through the network to make use of facilities that are present without failing if the facilities are not provided.
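The demultiplexing decision described above can be sketched compactly (a hypothetical Python illustration, not the alien implementation): each active program registers a filter predicate, and an arriving packet is copied to every registered program whose filter matches and whom the policy authorizes for that traffic.

```python
# Hypothetical sketch of packet demultiplexing under policy: zero, one, or
# several active programs may receive a copy of an arriving packet.

def demultiplex(packet, registrations, authorized):
    """Return the names of programs that receive a copy of the packet.
    registrations: list of (program_name, predicate) pairs
    authorized: policy callback deciding whether a program may see the packet
    """
    return [name for name, pred in registrations
            if pred(packet) and authorized(name, packet)]

regs = [
    ("bridge",  lambda p: True),                  # wants every frame
    ("webstat", lambda p: p.get("port") == 80),   # wants web traffic only
]
# Policy: "webstat" may never see traffic marked secure.
policy = lambda name, p: not (name == "webstat" and p.get("secure"))

assert demultiplex({"port": 80}, regs, policy) == ["bridge", "webstat"]
assert demultiplex({"port": 53}, regs, policy) == ["bridge"]
assert demultiplex({"port": 80, "secure": True}, regs, policy) == ["bridge"]
```

The policy callback is the point where the security property in the text is enforced: a program gets exactly the packets the policy grants it, no more and no fewer.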
The Architecture of alien
9
Message Logging. Message logging is a generalized facility which allows an active program to attempt to leave a message for human consumption. Because we expect policies limiting access to persistent storage to be common, we believe that it is important to provide such a facility. Our facility allows the active program to request that a string be logged. This simplicity is important because the facility may be implemented by appending the string to a file, by sending it to an output device such as a terminal, or by using a more powerful logging mechanism. A simple solution is easily mapped onto any of these means. Additionally, no guarantees are made about what will happen to the log message. If, for example, a policy exists limiting the number of messages per unit time produced by an active program, the Core Switchlet may silently discard messages after that limit has been reached.
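The contract just described, best effort delivery with a possibly rate-limited, silently discarding sink, is easy to capture in a sketch (illustrative Python; the class and limit are ours, not part of alien):

```python
# Sketch of the logging facility's contract: the caller hands over a string,
# and the node may append it to a file, print it, or silently drop it once a
# per-program rate limit is exceeded (limits here are illustrative).

class MessageLog:
    def __init__(self, sink, limit_per_window):
        self.sink = sink          # any callable: file append, console, syslog...
        self.limit = limit_per_window
        self.counts = {}

    def log(self, program, message):
        """Best effort: no guarantee the message is ever recorded."""
        n = self.counts.get(program, 0)
        if n >= self.limit:
            return                # silently discarded per policy
        self.counts[program] = n + 1
        self.sink(f"[{program}] {message}")

lines = []
log = MessageLog(lines.append, limit_per_window=2)
for i in range(5):
    log.log("probe", f"msg {i}")
assert lines == ["[probe] msg 0", "[probe] msg 1"]   # messages 2-4 dropped
```

Because the interface is just "submit a string," any of the backends named in the text can stand behind the same `sink` callable.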
6 The Library
The library is a set of functions which provide useful routines that do not require privilege to run. The proper set of functions for the library is a continuing area of research. Some of the things that are in the library for the experiments we have performed include utility functions and implementations of IP [15] and UDP [16].
7 Locating Functionality
When expanding our implementation, it is not always obvious in which layer the new functionality belongs. In this section, we present the principles we use to make this determination.

Our first principle is that if the functionality can be implemented in a library, it should be. Said another way, if the functions exposed by the Core Switchlet or available from other libraries provide the infrastructure needed to implement the new functionality, a library is warranted.

If the new functionality relies on some element of the runtime not made available to unprivileged code, then either the Loader or the Core Switchlet must be expanded. Because these elements define the common, expected interface available at the switch, we attempt to keep them small to minimize the required resources. Thus, our second principle is that we prefer to break off the smallest reasonable portion of the new functionality (consistent with security) that can be implemented in the privileged parts of the system. The remainder becomes a library. In our experience this also aids generality, as the privileged portion is often useful to other libraries developed later. For example, to implement IP, we built a small module inside the Core Switchlet which reads Ethernet frames from the operating system. It also demultiplexes the frames based on the Ethernet type field to increase generality. The remainder of IP, which processes headers, could then be made a non-privileged library.

Our third principle is that if this privileged functionality sets policy, it needs to go into the Core Switchlet. As discussed above, policy must be set in the Core Switchlet so that the loading mechanism can be used as needed to change
policy. Our final principle is that any functionality providing pure mechanism is placed in the Core Switchlet unless it is needed before the Core Switchlet is running.
8 Performance and Byte Code
While we have followed the principles outlined in Sect. 7 closely in our implementation, we did find that the performance penalty was too great in one instance. As we describe in [2], implementing SHA-1 (a hash algorithm) in Caml was too slow, so we resorted to a C implementation. We also resort to C to extend the runtime (e.g., to add access to Ethernet frames). Any C extension in Caml appears as part of the runtime and thus is part of the Loader.
9 Active Programs
Of course, the goal of the infrastructure is to be able to use code which is not known ahead of time or is not generally used. Thus, the success of the infrastructure is based on how well it handles an active program or group of interacting active programs sent by a user. The next two sections describe how each type of active program works in alien.

9.1 Active Extensions
Active extensions can be loaded either from the local disk or over the network via the TFTP [17] protocol. TFTP and the underlying UDP [16] and IP [15] services are all implemented as active programs. When an active extension is loaded, Caml first checks to ensure that the interfaces that it requires are satisfied by the set of (thinned) interfaces that alien is willing to provide to this extension. Next, the interface exported by this module is added to the symbol table. The extension is also given the opportunity to register name-to-function mappings in a table; this allows a module in alien to make calls into extensions even if the callee was loaded after the caller. The Active Bridge [18] is an example of a system that we built using active extensions. It has demonstrated bridging throughput of 57 Mbps [2] when bridging two 100 Mbps Ethernets. Please see the references for further details.
9.2 Active Packets
For active packets to be processed in alien, a set of libraries must be loaded to receive and process the packets. Our active packets are ANEP [19] encapsulated (and currently UDP encapsulated for convenience) as shown in Fig. 2. Thus, the first of the libraries receives an ANEP packet and performs header processing including determination of the execution environment for the packet. Also, some authentication can occur at this level as described in [14].
Fig. 2. An Active Packet: link layer header | ANEP header | code portion | data portion | function name
Assuming the active packet is to be executed in the alien environment, the next library is responsible for marshaling and unmarshaling of active packets. In alien, each active packet contains a code portion, a data portion, and a function name. The code is dynamically loaded from memory as an active extension would be. It is responsible for using Func.register to register a function under the function name listed. That function is then invoked with an argument which is an encoded form of the code, data, and function name. We have used a linked/procedure-call model for communication with other active programs. When programming one of our active packets, we assume that alien plus some set of active extensions will be on the node. If this is not the case, an error will occur during linking and the packet will be (silently) dropped. (The issue of error handling is left for future work.) When the code is loaded, it is linked against those extensions and alien, and makes use of resources via function calls. This implies that active packets have to trust the services they call to a substantial extent.
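The evaluation model just described can be paraphrased as a sketch (Python standing in for Caml bytecode loading; `register` stands in for Func.register, and the packet encoding is simplified to a dictionary): the packet's code portion registers a function under the packet's function name, and the node then invokes that name with the packet as argument.

```python
# Illustrative model of active packet evaluation, not the alien implementation:
# load the code portion, let it register a function, then invoke the function
# named in the packet. A failed lookup models a linking error, and the packet
# is silently dropped.

registry = {}

def register(name, fn):          # stands in for Func.register
    registry[name] = fn

def evaluate(packet):
    """Dynamically load the code portion, then invoke the named function."""
    exec(packet["code"], {"register": register})   # loaded as an extension would be
    fn = registry.get(packet["function"])
    if fn is None:
        return None              # linking failed: packet is (silently) dropped
    return fn(packet)

pkt = {
    "function": "hop",
    "data": 41,
    "code": "register('hop', lambda p: p['data'] + 1)",
}
assert evaluate(pkt) == 42
assert evaluate({"function": "missing", "data": 0, "code": ""}) is None
```

The trust relationship noted in the text is visible here: whatever `fn` does with the node's services happens with the packet's blessing, so the packet must trust the functions it links against.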
10 Conclusions
We have described the alien architecture for active networking. This architecture makes three contributions. First, it organizes the three interesting cases of crossing privilege with loadability: the privileged and immutable Loader, the privileged and loadable Core Switchlet, and the unprivileged and loadable libraries. Second, it shows how the use of a modern programming language (Caml), coupled with a set of design principles, can result in a usable active networking system which is flexible and fast while preserving security. Finally, using the active extension model as a basis, it has demonstrated active packet service as a library. This proof of concept makes alien the first active networking system to support both models, and makes the case that the distinction between the models is probably not important. The design principles we have proposed are generally useful, and can be applied in other active networking environments such as ANTS when issues such as concurrent multiprocessing and privilege are addressed in depth. The "take-home message" is in what goes where in the organization rather than in any details of the alien implementation.

alien suggests two promising directions for exploration. First, while alien provides effective control, and thus security, for objects it manages, there are a variety of resources it does not manage. In particular, the heap managed by
the Caml runtime and the multiplexing of the system hardware managed by (in the current instantiation of alien) Linux expose a number of denial-of-service attacks that alien can do nothing about. These examples suggest that promising areas for support of realistic systems will be schemes for providing better user isolation in the time domain, such as operating systems with support for Quality of Service. The second direction for exploration is global support for security properties which alien can enforce locally. Some results along these lines were presented in [14], which used the idea of granting cryptographic credentials for access to particular modules to extend module thinning to remotely executing code. Over the long term, however, the issue will become one of mapping security policies onto active network elements, and that will require thinking about scalable trust management in an active network.
References

1. D. L. Tennenhouse, J. M. Smith, W. D. Sincoskie, D. J. Wetherall, and G. J. Minden. A survey of active network research. IEEE Communications Magazine, Jan. 1997.
2. D. Scott Alexander. alien: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, Philadelphia, December 1998.
3. F. C. Knabe. Language Support for Mobile Agents. PhD thesis, CMU, Dec. 1995.
4. Xavier Leroy. The Caml Special Light System (Release 1.10). INRIA, France, November 1995.
5. R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press.
6. Richard L. Sites and Richard T. Witek. Alpha AXP Architecture Reference Manual. Digital Press, 2nd edition, 1995.
7. Don Anderson and Tom Shanley. Pentium Processor System Architecture. Addison-Wesley, 2nd edition, 1995.
8. Ron Rivest. The MD5 message-digest algorithm. RFC 1321, April 1992.
9. Bruce Schneier. Applied Cryptography, pages 436-441. Wiley, 2nd edition, 1996.
10. F. Rouaix. A web navigator with applets in Caml. Fifth WWW Conference, May 1996.
11. Ken Arnold and James Gosling. The Java Programming Language. Java Series. Sun Microsystems, 1996. ISBN 0-201-63455-4.
12. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A packet language for active networks. In Proceedings of the International Conference on Functional Programming (ICFP), September 1998.
13. Y. Yemini and S. daSilva. Towards programmable networks. In IFIP/IEEE Intl. Workshop on Distributed Systems: Operations and Management, October 1996.
14. D. Scott Alexander, William A. Arbaugh, Angelos D. Keromytis, and Jonathan M. Smith. A secure active network architecture: Realization in SwitchWare. IEEE Network, 12(3):37-45, May/June 1998. Special issue on Active and Programmable Networks.
15. Jon Postel. Internet Protocol. Internet RFC 791, 1981.
16. Jon Postel. User Datagram Protocol. Internet RFC 768, 1980.
17. Karen R. Sollins. The TFTP Protocol (revision 2). Internet RFC 1350, 1992.
18. D. Scott Alexander, Marianne Shaw, Scott M. Nettles, and Jonathan M. Smith. Active bridging. In Proceedings of the 1997 ACM SIGCOMM Conference, September 1997.
19. D. Scott Alexander, Bob Braden, Carl A. Gunter, Alden W. Jackson, Angelos D. Keromytis, Gary J. Minden, and David Wetherall. Active Network Encapsulation Protocol (ANEP). http://www.cis.upenn.edu/~angelos/ANEP.txt.gz, August 1997.
Designing Interfaces for Open Programmable Routers

Spyros Denazis (1), Kazuho Miki (2), John Vicente (3), and Andrew Campbell (4)

(1) Centre for Communications Systems Research (CCSR), University of Cambridge, UK; Industrial Research Fellow, Hitachi Europe Ltd., UK. [email protected]
(2) Hitachi Ltd., Japan
(3) Intel Corporation, USA
(4) Center for Telecommunications Research (CTR), Columbia University, NY, USA
{miki,jvicente,campbell}@comet.columbia.edu

Abstract. The ability to rapidly create and deploy new network services and architectures in response to new user demands is a driving force behind the emergence of programmable networks. The goal of open network control is being addressed in the IEEE P1520 Working Group (http://www.ieee-pin.org/) through the definition of a set of open network programming interfaces. These interfaces would allow service providers to manipulate the state of the network through high-level languages and abstractions in order to construct and manage new network services with quality of service support. In this paper, we provide an overview of the IEEE P1520 reference model and a detailed framework for the development of low-level, open programmable interfaces for IP-based router and switch networks.

Keywords: Open Programmable Routers, Router Interfaces, Differentiated Services
1 Introduction
Over the past several years, we have witnessed a growing amount of work in the area of open programmable networks [1,2,14]. The aim of this work is the design of new network architectures which facilitate rapid creation and deployment of new network services. Central to this goal has been the emergence of open interfaces, enabling control, management and composition of network resources through the introduction of new innovative services, which cannot otherwise be realized with today's proprietary (i.e., closed) network technologies. The need for programmable networks is becoming more apparent as open programmability has become a central theme of a number of standardization efforts and consortia [3,4,5]. As technology for open programmable networks matures, well-designed interfaces become more important for flexible customization, operation and extensibility. The development of open programmable interfaces has mainly been the result of academic projects, which have fairly specific objectives. In addition, open programmable interfaces have been constructed in an ad-hoc manner to support the introduction of services as a proof of concept for network programmability. As a result, a design framework for open interfaces in the context of programmable networks has not been addressed. Most existing proposals found in the literature have focused primarily on the control and management of ATM network elements, motivated by the proprietary limitations of existing switching technology [6,7,8]. Through the activities of the DARPA initiative for Active Networks [9], [3], there has been an attempt to enable programmability within router-based networks. In parallel to the Active Networks initiative, OPENSIG [10] has been investigating open signaling and programmable network architectures. The open interfaces of a programmable network architecture are structured in a layered fashion whereby the higher interface relies on the services of the interface below it, while it exposes its own services to the layer above it. Hence, an interface is characterized by its scope and the services it offers. For example, the scope of an interface may be such that it distinguishes between node and network-wide services that it can offer. In this paper, we define a set of router (node) interfaces for programmable router-based networks. We believe that in order to define a node interface, it is first important to describe a framework that assists in the design process and consequently the use and maintenance of the interface. The basic principles that underpin our proposed framework are driven by experiences with existing router technology.

(2) Kazuho Miki is a Visiting Scientist at CTR, Columbia University.
(3) John Vicente is a Visiting Scientist at CTR, Columbia University.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 13-24, 1999. Springer-Verlag Berlin Heidelberg 1999
The proposed model, terminology and interface definition may serve related initiatives (e.g., active networks) by offering a generalized, yet well-structured interface model in support of programmable network architectures. The contribution presented in this paper has been submitted to the IEEE P1520 for consideration. The paper is structured as follows. In Section 2, we present an overview of the IEEE P1520 reference model and terminology. Following this, in Section 3, we discuss a number of requirements for open router interfaces. Section 4 introduces our framework and its three basic components. Sections 5 and 6 present resource and service-specific abstractions, respectively. Section 7 presents an example scenario for the realization of the proposed interfaces in support of Differentiated Services [12,13,14]. Finally, we present some concluding remarks in Section 8.
2 P1520 Reference Model
The IEEE P1520 standardization effort addresses the need for a set of standard software interfaces for the programming of networks in terms of rapid service creation and open signaling [5]. The technology under consideration spans ATM switches, IP routers, and circuit or hybrid switches. The interfaces are structured in a layered fashion, each offering its services to the layer above. Each layer defines what is termed a level. Each level comprises a number of entities in the form of algorithms or objects representing logical or physical resources, depending on the level's scope and functionality. This approach gives rise to the reference model depicted on the left of Fig. 1.
Fig. 1: The P1520 Reference Model and mapping to IP routers. Left: the reference model levels (the Value-added Services Level, with algorithms for value-added communication services created by network operators, users, and third parties; the Network Generic Services Level, with algorithms for routing, connection management, and directory services; the Virtual Network Device Level, a software representation of physical elements; and the Physical Elements Level of hardware and name space) separated by the V, U, L, and CCM interfaces. Right: the mapping to IP routers, from applications invoking methods on objects below (differentiated services scheduling, customised routing, RSVP or other per-flow protocols, routing algorithms), through the software representation of routing resources, down to the controller hardware, other resources, and routing table data.
More specifically, we can distinguish the four levels as follows:

- The physical element (PE) level consists of entities such as hardware and the device architecture that actually reflect the supported capabilities.
- The virtual network device level (VNDL) logically represents resources in the form of objects (entities), isolating the upper layers from hardware dependencies or other proprietary interfaces.
- The network generic services level (NGSL) consists of entities in the form of distributed algorithms that bind (interconnect) the objects of the VNDL according to specific network functionality, e.g., routing or connection setup.
- Finally, the value-added services level (VASL) includes entities in the form of end-to-end algorithms that enhance the generic services of the NGSL, providing user-oriented features and capabilities to applications.
The four levels give rise to four interfaces, namely, the CCM (Connection Control and Management), L (lower), U (upper), and V (value-add) interfaces. The CCM interface is actually a collection of protocols that enable the exchange of state and control information at a very low level between the device and an external agent. The L-interface defines an API that consists of methods for manipulating local network resources abstracted as objects. The CCM and L-interfaces fall under the category of node interfaces. The U-interface mainly provides an API that deals with connection setup issues. As in the case of the L-interface, the U-interface isolates the diversity of connection setup requests from the actual algorithms that implement them. Finally, the V-interface (not shown in Figure 1) provides a rich set of APIs to write highly customized software, often in the form of value-added services. Additionally, the U- and V-interfaces constitute network-wide interfaces. The P1520 Reference Model (RM) provides a general framework for mapping programming interfaces and operations of networks, over any given networking technology. Mapping diverse network technologies and their corresponding
functionality to the P1520 RM is essential. The right side of Fig. 1 illustrates a mapping of the P1520 RM to IP routers. Given this mapping, it is important to establish an L-interface definition that abstracts router resources and functionality such that it satisfies the service requirements above it, while it is flexible enough to accommodate future service requirements.
3 L-Interface Requirements
The objective of the L-interface is to create programmable abstractions of the underlying resources of the router or switch, enabling third-party service providers, network administrators, network programmers or application developers to influence or extend router or switch control through the use of high-level APIs. The requirements of the L-interface design are driven from the perspective of the users of the L-interface. It is therefore relevant to detail and understand these as fundamental to the basis of the L-interface. We enumerate the following requirements:

1. Open programmability - This is an enabling requirement for the L-interface, where separation is achieved between hardware and software, fostering third-party service creation and open competition.
2. Operational support - Management and administrative functions must be improved or facilitated via the L-interface, where over slow time scales greater control and intelligence data gathering is achievable by network administrators and architects.
3. Service provisioning - Through open APIs, third-party service providers must have the ability to modify existing network services as well as the ability to introduce entirely new network services to the router or switch functionality.
4. Extensibility - The associated L-interface model must provide flexibility for extension without intrinsically creating a proprietary format, limiting the extensibility and virtualization process.
5. Programmable abstraction granularity - Support for granularity in router object programmability and service provisioning is essential to flexible customization.
6. Timescale flexibility - Access to and control of router resources through the L-interface must be achievable over different time scales (i.e., control, management and transport).
7. Resource partitioning - Through management and control plane operations, the L-interface should support partitioning of router resources, allowing network operators or service providers to deploy and operate their network architectures on the same physical infrastructure, confined to the allocated portion of the router(s) resources.

Finally, it is imperative that the L-interface requirements are not hindered or otherwise restricted at the abstracted resources level (VNDL) due to limitations of lower-level protocols (e.g., GSMP) or proprietary design features of a router kernel or hardware. The basic tenet argues that not only should the concept of open network programmability be supported by lower-level protocols/interfaces, e.g. CCM, but also new router architectures should be designed to support these requirements through the above interfaces. We view this as critical for the success and widespread acceptance of open programmable routers.
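Requirement 7 in particular lends itself to a concrete sketch. The following hypothetical Python illustration (the class and method names are ours, not part of any P1520 draft) shows how an L-interface might hand each operator a partition object confined to its allocated share of a link, so that several virtual architectures coexist on one physical router.

```python
# Hypothetical sketch of L-interface resource partitioning: each operator
# receives a partition and can only reserve within its own slice.

class LinkResource:
    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.allocated = 0

    def partition(self, share_mbps):
        """Carve out a slice of the link for one operator."""
        if self.allocated + share_mbps > self.capacity:
            raise ValueError("insufficient capacity")
        self.allocated += share_mbps
        return Partition(share_mbps)

class Partition:
    def __init__(self, share_mbps):
        self.share = share_mbps
        self.reserved = 0

    def reserve(self, mbps):
        """Reservations are confined to the partition, not the whole link."""
        if self.reserved + mbps > self.share:
            raise ValueError("exceeds partition")
        self.reserved += mbps

link = LinkResource(100)
isp_a = link.partition(60)
isp_b = link.partition(40)
isp_a.reserve(50)                 # fine: within ISP A's 60 Mbps slice
try:
    isp_b.reserve(45)             # rejected: ISP B only owns 40 Mbps
    overcommitted = True
except ValueError:
    overcommitted = False
assert not overcommitted
```

The point of the sketch is the confinement property: an operator programming against its partition cannot affect capacity allocated to another operator.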
4 Generalized L-Interface Model
The framework for designing the L-interface, or more generally node interfaces, comprises three components. The first is a two-layer model representing the L-interface separation structure for abstracting router resources. The second is a hierarchical decomposition approach to representing router resources in the form of a tree-like structure with their corresponding inter-relationships. Finally, the third component describes how the interface definition structure of each abstracted resource, namely each node of the tree, should support control, transport, and management administrative operations on the router resources. In what follows, we further elaborate on each of these model components.
4.1 Two-Layer Abstractions
Fig. 2: Two-layer model for abstracting resources
The process of defining the L-interface through the abstraction of node resources is conceived as a two-layer model, depicted in Fig. 2. At its core lies the process of abstracting resources that are considered generic in the sense that they are not used in a specific service context. At the outer circle lies the process of abstracting resources that have meaning only when they are used within a specific service context, e.g. Differentiated Services. To this end, we view router resources from two different perspectives, corresponding to an association with a particular layer of the abstraction model. The first viewpoint is to consider router resources as general-purpose facilities, the abstraction of which leads to general-purpose interfaces that may be used and combined simultaneously by different service domains. The second viewpoint is to identify resources in association with the functionality for which they are intended. In this context, resources of the latter represent a partition of the general-purpose resources eventually used for certain tasks. In addition, L-interfaces that are the result of the generic abstraction process will form the basis on which service-specific interfaces may be defined for a variety of purposes. This in turn may also become a part of the L-interface definition. The advantage of the two-layer model is that it allows for true interface openness and resource reusability under different service contexts. Thus, we are suggesting that such an abstraction model allows, for example, upper level
interfaces (e.g. the U-interface) to create or program completely new network services using generic abstractions, or to modify existing services using service-specific abstractions, which are themselves built on generic abstractions. This makes the L-interface flexible rather than static, in the sense that as new services are conceived, they can form their interface representation in a seamless fashion.
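The two-layer idea can be made concrete with a small sketch (illustrative Python; the classes are ours and are not part of any P1520 interface definition): a generic abstraction, a bounded queue with a drop policy, is service-neutral, while a service-specific abstraction, here a mock DiffServ-style priority structure, is composed on top of it.

```python
# Sketch of the two-layer model: a generic resource abstraction reused by a
# service-specific abstraction built on top of it.

class GenericQueue:
    """Generic layer: usable by any service domain."""
    def __init__(self, limit):
        self.limit = limit
        self.items = []
    def enqueue(self, pkt):
        if len(self.items) >= self.limit:
            return False          # tail drop
        self.items.append(pkt)
        return True
    def dequeue(self):
        return self.items.pop(0) if self.items else None

class DiffServQueue:
    """Service-specific layer: composes generic queues per code-point."""
    def __init__(self, limit):
        self.classes = {"EF": GenericQueue(limit), "BE": GenericQueue(limit)}
    def enqueue(self, pkt):
        return self.classes.get(pkt["dscp"], self.classes["BE"]).enqueue(pkt)
    def dequeue(self):
        # Strict priority: serve EF (expedited forwarding) before best effort.
        return self.classes["EF"].dequeue() or self.classes["BE"].dequeue()

q = DiffServQueue(limit=8)
q.enqueue({"dscp": "BE", "id": 1})
q.enqueue({"dscp": "EF", "id": 2})
assert q.dequeue()["id"] == 2     # EF served first despite arriving later
```

Because `DiffServQueue` only uses the generic interface, a different service domain could compose the same `GenericQueue` objects for an entirely different discipline, which is the reusability argument made above.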
4.2 Hierarchical Resource Decomposition

Abstracting resources requires knowledge of an IP router architecture and its corresponding physical resources, identification of those resources pertaining to our objective, followed by some form of classification. Consequently, this activity will result in an IP router reference model that may be used as a guiding model for resource abstraction.
Fig. 3: IP Router Architecture with example resource abstractions. A generalized IP router (processor, line cards, forwarding engines, connection module, route and traffic control) is decomposed into resources such as buffers, schedulers, packet classifiers, routing tables, bandwidth and CPU capacity at the generic abstraction layer, with service-specific abstractions (e.g., DiffServ PHB, EF PHB buffers, OSPF hop cost, OSPF routing table) layered above the L-interface, and the CCM interface below.
Fig. 3 depicts a generalized IP router architecture and an example decomposition of the router resources, superimposed onto the P1520 interface model. As shown, the architecture is composed of a number of distinct functional elements, specifically: the line cards, whereon input/output ports are instantiated; the forwarding engines, wherein forwarding of packets takes place, with routing and traffic control elements influencing the forwarding policy and traffic control services, respectively; a general-purpose processor, which executes router kernel and network-level services and is responsible for hosting a number of processes such as routing protocols and other housekeeping or special-purpose functions; and finally, the connection module, which is
used for transport interconnection among the other elements and may represent a switching fabric or a bus architecture. Each of these elements can be viewed as a container of resources that should eventually be reflected in the L-interface and thus become available to the user or consumer of the resource. Router capacity, not unlike the other resources, plays a part in the functional architecture of the router, where local computation capacity, network bandwidth and static configuration are key abstractions for managing proper utilization of local processing and network-level control and transport services. To this end, it constitutes the quantitative representation of the router, and as such it may be viewed as orthogonal to the actual router resource representation, as illustrated in Fig. 3. The importance of such a distinction will become clearer in later sections. Observing generic abstractions requires viewing resources irrespective of functional service domain. Hence, we would identify generic router resources, translate them into generic L-interfaces, and further provide methods for resource partitioning and methods to forge partitions according to the specifications of the caller of the method.
[Fig. 4 pairs Generic Abstractions — base abstractions (Capacity, Controller, Transporter) and functional abstractions (e.g., connection module, line or port, routing services, traffic control, capacity regions) — with Service-Specific Abstractions — component abstractions (e.g., queues, tables, classifiers, databases, schedulers, paths, flows or flow aggregates, threads, addresses, filters) and service binding abstractions (e.g., algorithms, protocols, profiles, code-points, policies, index types, entries) — rooted at the Resource Hierarchy Root and cutting across the control, management and transport planes.]
Fig. 4: Hierarchical Abstraction Model
The core router abstractions can be viewed hierarchically. Fig. 4 depicts this hierarchy and provides a layered mapping of the generic and service-specific abstractions. In this section, we focus on the generalized abstractions of the router, namely:
i) Base abstractions. The base abstractions serve as the major stems of the binding hierarchy for core router services and represent the highest-level router binding abstractions. They are the fundamental services provided by the router, specifically the transporter, controller and capacity.
20
Spyros Denazis et al.
ii) Functional abstractions. Base abstractions are composed of functional element abstractions. For example, switch fabric and line card resources are functional router elements that serve the transport abstraction for forward processing.
iii) Component abstractions. Below the functional layer, one or more component abstractions form router or switch functional abstractions. Through network service binding, static components (e.g., queues, schedulers, etc.) are realized through the creation or binding of tables, or of software components composed via programmable instantiation.
iv) Service binding abstractions. The service binding interface realizes, or binds, new or existing network service abstractions to the generic component resources (e.g., a scheduler) supporting the router functional elements (e.g., a line card port). These tightly-coupled interfaces cast service-specific abstractions onto the component implementations, thereby binding service-specific policies, algorithms, protocols and the like to local router component resources.
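As a purely illustrative sketch, the four layers above might be modelled as nested containers. All class and method names here are our own invention; the P1520 drafts do not prescribe this code:

```python
# Hypothetical sketch of the four-layer abstraction hierarchy.
# All names are illustrative; P1520 does not prescribe this code.

class ServiceBinding:
    """Service binding abstraction: binds a service-specific policy
    (e.g. an algorithm or protocol) to a component resource."""
    def __init__(self, name, policy):
        self.name, self.policy = name, policy

class Component:
    """Component abstraction, e.g. a queue, table or scheduler."""
    def __init__(self, name):
        self.name, self.bindings = name, []
    def bind(self, binding):
        self.bindings.append(binding)

class Functional:
    """Functional abstraction, e.g. a line card or switch fabric."""
    def __init__(self, name, components):
        self.name = name
        self.components = {c.name: c for c in components}

class Base:
    """Base abstraction: Transporter, Controller or Capacity."""
    def __init__(self, name, functionals):
        self.name = name
        self.functionals = {f.name: f for f in functionals}

# Build a tiny fragment of the hierarchy: a line-card port whose
# scheduler is bound to a (hypothetical) WFQ scheduling policy.
scheduler = Component("scheduler")
scheduler.bind(ServiceBinding("wfq", policy={"weights": [4, 2, 1]}))
line_card = Functional("line_card_port", [scheduler])
transporter = Base("Transporter", [line_card])

print(transporter.functionals["line_card_port"]
      .components["scheduler"].bindings[0].name)   # wfq
```

The point of the sketch is only the containment structure: service bindings attach policy to components, components compose functionals, and functionals compose base abstractions.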
4.3 Operational Aspects of the L-Interface
The structure of each L-interface adds one more dimension to the framework, necessary to account for the control, management and transport aspects of the services offered through the L-interface. Generally, each resource abstraction comprises data structures and methods, thus reflecting an object-oriented approach. In this context, we characterize data structures and methods according to the type of operations from which they originate or in which they are used. As a caveat, we generalize the operational aspects to make support for control, management and transport a flexible requirement (i.e., not all resources require all of these services). Finer granularity within each of these basic categories may be possible. For instance, configuration operations may be considered a sub-category of management. In this manner, it is possible to map control, management and transport services onto the hierarchical abstraction model. Finally, by allowing a resource abstraction interface to reflect operation types in an explicit way (e.g., standard method invocation naming formats), it can assist the consumer or the designer of the L-interface in the use or development of services.
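One way to make operation types explicit, as suggested above, is a method-naming convention. The prefix scheme below is our own assumption, used only to illustrate the idea of classifying an abstraction's methods by operation type:

```python
# Sketch: classifying an abstraction's methods by operation type using
# a naming convention. The prefix scheme itself is our assumption, not
# something mandated by P1520.

OPERATION_PREFIXES = {"ctl_": "control", "mgmt_": "management", "xfer_": "transport"}

def classify_methods(obj):
    """Group an object's public methods by operation type."""
    groups = {"control": [], "management": [], "transport": [], "other": []}
    for name in dir(obj):
        if name.startswith("_") or not callable(getattr(obj, name)):
            continue
        for prefix, kind in OPERATION_PREFIXES.items():
            if name.startswith(prefix):
                groups[kind].append(name)
                break
        else:
            groups["other"].append(name)
    return groups

class Queue:
    """A toy queue abstraction exposing one method per operation type."""
    def ctl_set_drop_policy(self, policy): ...
    def mgmt_read_depth(self): ...
    def xfer_enqueue(self, packet): ...

print(classify_methods(Queue()))
```

A consumer of the L-interface could use such a classification to discover, say, only the management-plane operations of a resource without consulting external documentation.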
5 Generic Resource Abstractions
Fig. 5 depicts the interface inheritance hierarchy for a Differentiated Services IP router. As shown, the base abstractions are the highest-level interfaces for the Transporter, Controller and Capacity abstractions of the router. The Transporter is primarily responsible for forwarding functional abstractions; the Controller abstraction serves routing and traffic control functional abstractions; and finally, the Capacity abstraction provides abstractions for local computation, QoS and traffic control, supporting, e.g., bandwidth scheduling. As mentioned previously, the Capacity branch of the router hierarchy is orthogonal to the Transporter and Controller base abstractions. It represents router resources which are quantitatively limited, and thus would provide methods that exert
partitioning and shaping of the resource capacity. For example, memory can be partitioned as a queue with a specific structure and size. An advantage of this approach is that it creates measurable, and hence comparable, specifications for the router resources. These, in turn, can be used to perform operations such as bandwidth management and admission control before actually committing router resources to specific data transportation duties. Similar approaches have been proposed in [8,10].
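This quantitative view — partitioning a finite capacity and refusing requests that would overcommit it — can be sketched minimally as follows; the class and method names are ours, not the paper's:

```python
# Sketch: a Capacity abstraction that partitions a finite memory
# resource into queues, performing admission control before committing.
# Class and method names are illustrative only.

class MemoryCapacity:
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.allocated = 0

    def partition_as_queue(self, size_bytes):
        """Carve out a queue of the given size, or refuse if the request
        would overcommit the resource (simple admission control)."""
        if self.allocated + size_bytes > self.total:
            return None
        self.allocated += size_bytes
        return {"kind": "queue", "size": size_bytes}

cap = MemoryCapacity(total_bytes=1024)
q1 = cap.partition_as_queue(768)   # admitted
q2 = cap.partition_as_queue(512)   # refused: would exceed capacity
print(q1, q2)
```

Because the capacity is expressed numerically, partitions are measurable and comparable, which is exactly what makes admission-control decisions like the second request's refusal possible.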
[Fig. 5 shows the interface inheritance hierarchy rooted at IP Router, with three base branches. Transporter: forwarding engine (classifier, marker, meter, shaper, dropper), connection module (queues, scheduler), line & port (queues, scheduler), forward mapping and forward paths. Controller: route controller (routing tables, forward policy database), traffic controller (traffic tables, flows, traffic policy databases), and transport configuration mapping. Capacity: computation (threads), storage (memory, disk space), configuration space (name space, address space), traffic descriptor and QoS parameters. Differentiated Services resource abstractions annotate the tree: PHB configuration (DSCP, scheduling and queuing parameters), traffic configuration (customer TCA, AF/MF filters, profiles, treatment monitoring data), code-point mapping (DSCP, PHB name, description, filter, name), traffic profiles (qualitative service level, quantitative parameters, traffic profile), and traffic treatment (in-profile and out-of-profile treatment).]
Fig. 5: Differentiated Services Abstraction Model
The functional and component resource layers compose the generic resources of the router from the base abstractions. The functional abstractions of the router may have one or more service implementation scenarios (e.g., routing: RIP, OSPF, multicast) and may require specific component resource abstractions (e.g., queue structures, tables, databases), whose implementations are either statically resident or created dynamically by way of a network-service-specific implementation. These interfaces abstract the generic resources, providing the ability to manipulate, update, read or modify the implementation of the component resource.
6 Service-Specific Abstractions
Service domains are realized, and can operate, in any of these elements by accessing, manipulating and consuming resources. It is therefore a requirement not to associate generic component resources with a particular network service domain. The binding of network services to router component resources occurs within the service binding layer of the abstraction model. Static or dynamic component abstractions are influenced by way of service-domain-specific abstractions. Algorithms, protocols, policies, code-points, etc., which are proprietary to the network service, are instantiated through the interfaces created by
this layer of the abstraction model. An important aim of this layer is to abstract the context of the service being instantiated such that the proprietary nature of the service is hidden, or at least mapped onto what appears to be a general router resource abstraction. As an example, consider the Differentiated Services functional domain illustrated in Fig. 5, and let us again assume that a generic resource is a memory module. We may wish to reserve a portion of the memory for use by Differentiated Services. In addition, we should be able to impose a structure on this portion according to the specifications of a Diff-serv table, namely a table with fields for the packet ID tuple and the DS field. A number of buffers is reserved, and a specific (queue) structure is imposed. This is further complemented with a buffer management scheme; finally, a portion of bandwidth is allocated and a code-point algorithm is invoked that implements the per-hop behavior. The purpose of the hierarchical abstraction model is to facilitate and guide the process of using the L-interface, as well as of extending it as new services are required from the router resources. This is achieved by starting from a generic abstraction of resources, resulting in a number of generic interfaces, which in turn can be used to create service-specific abstractions of resources that are then realized as specialized interfaces, e.g., Differentiated Services interfaces.
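The sequence just described — reserve buffers, impose a queue structure, attach a buffer management scheme, allocate bandwidth, and install a code-point algorithm — can be sketched as a single binding step. Every name and value below is hypothetical; the paper prescribes the steps, not this API:

```python
# Hypothetical sketch of instantiating a Diff-serv service over generic
# router resources, following the steps in the text. None of these
# names come from the P1520 drafts.

def bind_diffserv(total_buffers, link_bandwidth):
    """Return a description of a Diff-serv binding: reserved buffers,
    an imposed table/queue structure, a buffer-management scheme, a
    bandwidth share and a per-hop-behaviour algorithm."""
    binding = {
        "buffers": total_buffers // 4,                             # reserve a memory portion
        "structure": {"fields": ["packet_id_tuple", "ds_field"]},  # Diff-serv table layout
        "buffer_mgmt": "RED",                                      # assumed management scheme
        "bandwidth": link_bandwidth * 0.3,                         # allocated share
        "phb": "expedited_forwarding",                             # code-point algorithm
    }
    return binding

b = bind_diffserv(total_buffers=4096, link_bandwidth=100e6)
print(b["buffers"], b["phb"])
```

The design point is that the caller works against generic resource names (buffers, structure, bandwidth) while the service-specific choices (RED, a particular PHB) stay behind the binding.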
7 Scenario – Differentiated Services
The Differentiated Services architectural model [12,13] is positioned as a collection of service mechanisms that allow network service providers to offer "differentiated" service levels to alternative customer traffic aggregates. This is done by packet marking, traffic forwarding using Per-Hop Behaviors (PHB), and conditioning of traffic through traffic conditioning mechanisms and/or policies [13]. The policies and router service mechanisms are realized and enforced at boundary routers or edge devices of different provider Diff-serv domains, where Service Level Agreements (SLA) are deemed necessary. Fig. 5 illustrates an abstraction model for Diff-serv that maps onto our proposed generalized resource abstraction model (presented earlier), separating core generic router resources from those that are specific to the realization of Diff-serv mechanisms and policies. As such, the service implementation, or binding, of Diff-serv services to generic router resources supports the service programming or provisioning, administration and operational aspects required of the L-interface in a hierarchical manner. This is necessary because the user (e.g., programmer or administrator) of the L-interface may require operational services generically, to manage or provision services over (i) an entire functional service (e.g., traffic controller); (ii) a generic resource (e.g., classifier); (iii) a service-specific resource (e.g., shaper); or (iv) an entire network service (e.g., Diff-serv) covering multiple service-specific resources. The hierarchical model also applies when a 'new' network service (e.g., a routing service) is deployed, requiring service binding to the existing network infrastructure through collective instantiation of service-specific 'resources' onto generic router resources.
Diff-serv building blocks are instantiated at the lowest tier of the resource abstraction hierarchy tree through the three major Base Abstractions, namely: i) PHB and traffic conditioning mechanisms are employed through the Transporter; ii) PHB
configuration, traffic conditioning configuration and monitoring are structured through the Controller; and finally, iii) traffic profiles, code-point mappings, traffic treatment policy and parameters are captured under the Capacity base abstraction. Operational services (e.g., install, enable or disable a PHB configuration) are invoked through the methods of the Diff-serv L-interface abstractions. The object class definitions are structured in a manner consistent with our proposed L-interface abstraction model. We describe the following semantics of a generalized interface definition for resource abstractions; Table 1 shows an example of the general interface scheme applied to the Per-Hop Behavior object associated with Differentiated Services.
General semantics:
- name/ID: identifies the abstraction with a unique ID
- type: identifies the abstract data type
- structures: defines one or more structures supporting the local abstraction
- status: implementation status, either new (requiring instantiation) or resident
- parent abstraction: associates the local abstraction with a parent abstraction
- peer abstraction: defines peer abstractions as required within a hierarchical level
- methods: defines methods for operation, management or control of the abstraction

Table 1. Example: Differentiated Services Per-Hop Behavior service-specific abstraction

Per-Hop-Behaviour (PHB): the per-hop behavior mechanisms are used to forward different traffic types with differing behavior. These are implemented via parameterization and policies that affect queues and schedulers on the router's egress interfaces.

Interface definition specification:
  name/ID: PHB_configuration; type: database; status:
  Include parent abstraction: traffic_policy_database;
  Include peer abstractions: LP_interface(); LP_queue(); LP_scheduler();
  Structures:
    Struct { code-point; *LPqueue_number; struct scheduling {parameters}; struct queuing {parameters}; };
    Struct { *LPinterface_number; interfacestatus; EFstatus; AFstatus; MaxEFrate; MaxAFrate; };
  Methods: InstallPHBdB(); EnablePHB(); DisablePHB(); ReaddB(); WritedB(); UpdatedB();
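Rendered as code, the interface definition of Table 1 might look like the following Python transliteration. The field and method names follow the table; the in-memory storage behaviour is our own stub:

```python
# Python transliteration of the Table 1 PHB_configuration abstraction.
# Field and method names follow the table; the storage behaviour is a stub.

class PHBConfiguration:
    name_id = "PHB_configuration"
    type_ = "database"
    parent_abstraction = "traffic_policy_database"
    peer_abstractions = ("LP_interface", "LP_queue", "LP_scheduler")

    def __init__(self):
        self.db = {}          # code-point -> scheduling/queuing parameters
        self.enabled = False

    def InstallPHBdB(self, entries):
        self.db.update(entries)

    def EnablePHB(self):
        self.enabled = True

    def DisablePHB(self):
        self.enabled = False

    def ReaddB(self, code_point):
        return self.db.get(code_point)

    def WritedB(self, code_point, params):
        self.db[code_point] = params

phb = PHBConfiguration()
phb.InstallPHBdB({"EF": {"queue": 1, "scheduling": "priority"}})
phb.EnablePHB()
print(phb.ReaddB("EF"))
```

The install/enable/read cycle above mirrors the operational services named in the text (install, enable or disable a PHB configuration).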
8 Conclusion
In this paper, we have presented an overview of the IEEE P1520 reference model and discussed fundamental requirements for open router interfaces. We have proposed an L-interface framework, introduced through a two-layer model consisting of generic resource and service-specific resource abstractions. In addition, we discussed a hierarchical model for resource inheritance, and the operational aspects supported by the methods, structures and semantics of individual resources. We also presented a simple scenario using Differentiated Services as an illustrative example of the use of the L-interface.
References
1. IEEE Comm. Mag., Special Issue on Programmable Networks, Vol. 36, No. 10, October 1998.
2. IEEE Network, Special Issue on Active and Programmable Networks, Vol. 12, No. 3, May/June 1998.
3. Calvert, K., et al., "Architectural Framework for Active Networks", Version 0.9, Active Networks Working Group, August 1998. http://www.dyncorp-is.com/darpa/meetings/anets98jul/anets-arch.html
4. Multiservice Switching Forum (MSF). http://www.msforum.org/
5. Biswas, J., et al., "The IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications, Special Issue on Programmable Networks, Vol. 36, No. 10, October 1998. http://www.ieee-pin.org/
6. Buckley, W., "Virtual Switch Interface (VSI) Specification", MSF Contribution Document, MSF98.002, November 1998.
7. Newman, P., Edwards, W., Hinden, R., Hoffman, E., Liaw, F. C., Lyon, T., and Minshall, G., "Ipsilon's General Switch Management Protocol Specification Version 2.0", RFC 2297, Internet Engineering Task Force, March 1998.
8. Adam, C., Chan, M. C., Huard, J.-F., Lazar, A. A., and Lim, K.-S., "Binding Interface Base Specification: Revision 2", OPENSIG Draft, April 1997. http://comet.ctr.columbia.edu/xbind/documentation/
9. DARPA Active Network Programs. http://www.darpa.mil/ito/research/anets/index.html
10. OPENSIG Working Group. http://comet.columbia.edu/opensig/
11. van der Merwe, J. E., Rooney, S., Leslie, I. M., and Crosby, S. A., "The Tempest: A Practical Framework for Network Programmability", IEEE Network, Vol. 12, No. 3, pp. 20-28, May/June 1998. http://www.cl.cam.ac.uk/Research/SRG/dcan/
12. Blake, S., et al., "An Architecture for Differentiated Services", RFC 2475, December 1998.
13. Bernet, Y., et al., "A Framework for Differentiated Services", Internet Draft, October 1998 (work in progress); Nichols, K., et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.
14. Campbell, A. T., Kounavis, M. E., Vicente, J., Villela, Miki, K., and De Meer, H., "A Survey of Programmable Networks", ACM SIGCOMM Computer Communication Review, April 1999.
RCANE: A Resource Controlled Framework for Active Network Services Paul Menage University of Cambridge Computer Laboratory Pembroke Street, Cambridge, CB2 3QG, UK
[email protected]

Abstract. Existing research into active networking has addressed the design and evaluation of programming environments. Testbeds have been implemented on traditional operating systems, deferring issues regarding resource control. This paper describes the architecture, resource models and prototype implementation of the Resource Controlled Active Network Environment (Rcane). Rcane supports an active network programming model over the Nemesis Operating System, providing robust control and accounting of system resources, including CPU and I/O scheduling, and garbage collection overhead. It is thus resistant to many classes of denial of service (DoS) attack.
1 Introduction
Adding programmability to a network greatly increases its flexibility. However, with this flexibility comes greater complexity in the ways that network resources, including CPU, memory and bandwidth, may be consumed by end-users. In a traditional network, the resources consumed by an end-user at a network node are roughly bounded by the bandwidth between that node and the user; in most cases, the buffer memory and output link time consumed in storing and forwarding a packet are proportional to its size, and the CPU time required is likely to be roughly constant. Thus, limiting the bandwidth available to a user also limits the usage of other resources on the node. In an active network, hostile (or greedy, or careless) forwarding code could potentially consume all available resources at a node. Even in the absence of specific denial of service (DoS) attacks, the task of allocating resources according to a specified Quality of Service (QoS) policy is complicated by lack of knowledge about the behaviour of the user-supplied code. The resources consumed by untrusted code need to be controlled in two ways. The first is to limit the execution qualitatively – to limit what the code can do. This involves restricting either the language in which the code can be written, or the (possibly privileged) services which it can invoke. The second is to limit the code quantitatively – to limit how much effect its activities can have on the resources available to the system. This requires fine-grained scheduling and accounting.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 25-37, 1999.
© Springer-Verlag Berlin Heidelberg 1999
This paper discusses the design and implementation of a framework to permit such quantitative and qualitative control. Section 2 outlines the design of the framework. Section 3 discusses its implementation on the Nemesis Operating System. Section 4 presents experiments to demonstrate the validity of the approach. Section 5 surveys related approaches to active networking and resource control.
2 RCANE Design
This section provides an outline of the design of Rcane. The architecture follows the principles given in [1] to partition the system:
– The Runtime is written in native code and provides access to, and scheduling for, the resources on the node, and services such as garbage collection (GC).
– The Loader is written in a safe language (OCaml [2] in the current implementation of Rcane), as are all higher levels. The Loader is responsible for system initialisation and for loading/linking other code.
– The Core, loaded at system initialisation time, provides safe access to the Runtime and the Loader and performs admission control for the resources on the node.
– Libraries may be loaded both at system initialisation and by the actions of remote applications. They have no direct access to the Runtime or the Loader, except where permitted by the Core.

2.1 Discussion – Hardware or Software Protection?
Since an active network node is expected to execute untrusted code, there needs to exist a layer of protection between each principal and the node, and between principals. It is possible to utilise the memory protection capabilities of the node's hardware, allowing principals to execute programs written in arbitrary languages [3]. However, at the time-scales over which active network applications are likely to execute, this paradigm is too heavyweight. An alternative, taken by Rcane, is to require principals' code to be written in a safe, verifiable language. This allows much of the protection checking to be done statically at compile or load time, and allows much lighter-weight barriers between principals. In particular, it means that interactions between principals can be almost as efficient as a direct procedure call.

2.2 Sessions
A Session represents a principal with resources reserved on the node. Sessions are isolated, so that activity occurring in one session should have no effect on the QoS received by other sessions, except where explicit interaction is requested (e.g. due to one session using services provided by another session). Figure 1 shows part of the Session interface provided by the Core to permit control over a session and its resources. createSession() requests the creation
RCANE: A Resource Controlled Framework
27
of a new session. Credentials to authenticate the owner of the new session for both security and accounting purposes are supplied, along with a specification of the required resources and the code to be executed to initialise the session. destroySession() releases any resources associated with the current session. loadModule() requests that a supplied code module be loaded and linked for the session. linkModule() requests that an existing code module (possibly loaded by a different session) be made available for use by this session. The code may be specified simply by the interface which it exports, or by a digest of the code implementing the module, to prevent module-spoofing attacks. bindDevice() reserves bandwidth and buffers on the specified network device. Other functions concerning modification of resource requirements are not shown.
bool createSession (c : Credentials, r : ResourceSpec, code : CodeSpec);
void destroySession (void);
bool loadModule (l : LoadRequest);
bool linkModule (l : LinkRequest);
bool bindDevice (d : Device, bu : BufferSpec, bw : BandwidthSpec);
Fig. 1. Part of the Session interface

At system initialisation time two sessions are created:
– The System session represents activity carried out as housekeeping work for Rcane. It has full control over the Runtime. Many of the control-path services exported from the Loader and the Core are accessed through communication with the System session.
– The Best-Effort session represents activity carried out by all remote principals without resource reservations. Packets processed by the Best-Effort session supply code written in a restricted language and are given minimal access to system resources. Access to createSession() is permitted, to allow code to initiate a new session; further packets may then be processed by the newly created session.
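A rough client-side transliteration of this interface — in Python rather than the paper's OCaml, against a stub node that accepts every request — might look like the following. The argument values are invented for illustration:

```python
# Hypothetical client-side use of the Fig. 1 Session interface,
# transliterated to Python. The stub Node class stands in for the Core;
# a real node would authenticate credentials and perform admission control.

class Node:
    """Stub standing in for the Rcane Core; accepts every request."""
    def __init__(self):
        self.sessions = []

    def createSession(self, credentials, resources, code):
        self.sessions.append({"cred": credentials, "res": resources, "code": code})
        return True

    def bindDevice(self, device, buffers, bandwidth):
        return True

node = Node()
ok = node.createSession(
    credentials={"principal": "alice"},                             # invented credentials
    resources={"cpu_slice_us": 300, "period_ms": 40, "heap_kb": 256},
    code="init_module",                                             # code run at start-up
)
ok = ok and node.bindDevice("eth0", buffers=32, bandwidth=10e6)
print(ok, len(node.sessions))
```

In the real system the first packet would arrive via the Best-Effort session and invoke createSession() itself; the sketch only shows the shape of the calls.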
2.3 Resource Accounting
Resources used by sessions running on Rcane are accounted to the appropriate session, and charged to the principal who authorised the creation of the session. Pricing and charging policies will be system-dependent. Resource requests are processed by the System session and, if accepted, are communicated to the Runtime's schedulers. In general, data-path activity, e.g. sending packets, is carried out within the originating session. System modules in the Core are linked against entry points in the (unsafe) Runtime; these are then exported through safe interfaces to which the untrusted sessions can link directly. The Runtime performs a policing function on the use of the node's resources.
2.4 CPU Scheduling
Rcane uses the following abstractions to control CPU usage by sessions:
– A virtual processor (VP¹) represents a regular guaranteed allocation of CPU time, according to some scheduling policy. A session may have one or more VPs. All activities carried out within a single VP share that VP's CPU guarantee.
– A thread is the basic unit of execution, and at any time is either runnable (working on a computation), blocked (e.g. on a semaphore, or awaiting more resources to become available) or idle (in a quiescent state, awaiting the arrival of further work items).
– A thread pool is a collection of one or more threads. Each thread is a member of exactly one pool. Associated with each pool is a queue of packets and a queue of events. Each pool is associated with a single VP; its threads are only eligible to run when that VP receives CPU time.
Incoming packets (see Sect. 2.5) are routed to the associated pool and added to its packet queue. Events (functions for execution at a given time in the future) may be added to a pool's event queue. Whenever there is work to be done in a pool (either newly arrived packets, or events whose timeouts have passed), any idle threads in the pool are dispatched to process the work. When a running thread has finished its task, it returns to the idle state. This allows sessions flexibility in how they map their work onto threads. For tasks that must be processed serially (e.g. routing a stream of packets) a single thread might be bound into a pool, to perform all processing required for that pool. For network services where it is desirable to service multiple requests at a time, several threads can be bound into a single pool, and packets will be dispatched to an idle thread as they arrive.
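The dispatch model just described — idle threads in a pool picking up queued work when the pool's VP receives CPU time — can be sketched as follows. The "threads" here are simulated counters rather than real concurrency, and the names are ours:

```python
# Sketch of a pool dispatching queued packets to idle threads.
# Simplified: "threads" are simulated; no real concurrency, VPs or events.

from collections import deque

class Pool:
    def __init__(self, n_threads, handler):
        self.idle = n_threads          # threads awaiting work
        self.packets = deque()
        self.handler = handler

    def enqueue(self, pkt):
        self.packets.append(pkt)

    def dispatch(self):
        """Run idle threads over queued work, as when the pool's VP
        receives CPU time; each thread returns to idle afterwards."""
        handled = 0
        while self.packets and self.idle:
            pkt = self.packets.popleft()
            self.idle -= 1             # thread leaves the idle state
            self.handler(pkt)
            self.idle += 1             # ...and returns to it when done
            handled += 1
        return handled

seen = []
pool = Pool(n_threads=2, handler=seen.append)
for p in ("a", "b", "c"):
    pool.enqueue(p)
print(pool.dispatch(), seen)
```

A session choosing one thread per pool gets serial processing; binding several threads into the pool would let a real implementation service multiple packets concurrently.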
Alternatively, sessions wishing to perform both event-driven and packet-driven activities can choose between running two threads in separate pools (to prevent interference between the two activities), or saving resources by having a single thread in one pool (but risking occasional interference between packet and event activity). Similarly, for even better isolation between activities, the session could associate each activity with a separate VP (i.e. give each activity its own particular CPU guarantee).

2.5 Network I/O
Sessions running under Rcane can pass demultiplexing specifications to the Runtime, associating incoming flows of packets with specified pools and processing functions. To prevent crosstalk between the network activity of different principals, all packets are demultiplexed to their receiving pools by the Runtime at the lowest possible level. As little work as possible is carried out on those packets before
¹ For those familiar with the Nemesis Operating System, over which this work is based, this abstraction is distinct from the normal Nemesis notion of a VP.
demultiplexing. Once the VP associated with the receiving pool is given CPU time, one of the pool's idle threads can be used to invoke the flow's processing functions. This allows each session full control over decisions such as whether, and what kind of, authentication is used for packets on a given flow. For non-authenticated flows, a session can specify a function which processes the packet's payload immediately; should authentication be required, the session's favoured authentication routines may be invoked with the relevant authentication data from the packet. A session may request a guaranteed allocation of buffers for receiving packets from a given network device. Incoming packets demultiplexed to the session will be accounted to this allocation, and returned to it when packet processing is completed. Packets for sessions without a guaranteed allocation are received into buffers associated with the Best-Effort session. Thus, although such sessions can receive packets, they will be competing with other sessions on the node. Similarly, a session may request its own allocation of guaranteed transmission bandwidth and buffers for a specified network device, or may use the transmission resources of the Best-Effort session.

2.6 Memory
The memory managed by Rcane falls into four categories: network buffers (discussed in Sect. 2.5), thread stacks, dynamically-loaded code and heap memory. Network buffers and thread stacks are accounted to the owning session in proportion to the memory consumed. Charging for keeping code modules in memory is likely to be a system-specific policy; e.g., linking to a commonly used module might be less expensive than loading a private module. Heap memory presents more challenges. Since safe languages generally require GC to prevent malicious (or careless) programmers exploiting dangling pointers, Rcane needs to provide GC services. The framework must be able to support the following features:
– Efficient tracking of the memory usage of each session.
– Ability to revoke references from other sessions when a session is deleted.
– Prevention of crosstalk between sessions due to GC activity.
These three requirements suggest giving each session its own independently garbage-collected heap. Tracking the allocations made by a session is then straightforward; deciding to whom to refund garbage-collected memory is difficult to do efficiently without separate heaps. Sessions which have completed their tasks (or whose authorisation or credit has expired) are destroyed; if other sessions have pointers to their data, it is impossible to release the session's memory safely. Finally, deciding to whom to account the time spent on GC activity is difficult without separate heaps. Rcane uses an incremental garbage collector to prevent excessive interruptions to execution. Each session reserves a maximum heap size, and tunes the parameters of the GC activity – such as the frequency and duration of collection
slices – to allow it to trade off responsiveness against overhead. Charging can then be based on the size of the reserved memory blocks that comprise the heap, rather than the amount of live memory within those blocks, simplifying the accounting process.

2.7 Service Functions
The use of separate heaps and garbage collectors for each session requires Rcane to prevent the existence of pointers between different sessions' heaps. In general this does not present a problem, since applications will not generally rely on shared servers to perform data-path activities. However, in some situations it may be necessary or desirable to communicate with other sessions:
– When talking to the System session to request a change in reserved resources, or to make use of services provided by the System session (such as default routing tables).
– Some sessions may wish to export services to other sessions running on the node (e.g. extended routing tables, or access to proprietary algorithms).
In each of these cases, a client executing in one session requires a local reference to a service function implemented in a different session. This reference is opaque to the client, and enables the runtime to identify the service associated with the reference. Invoking the service involves the following steps:
1. Copying the function's parameters into the server session's heap.
2. Invoking the underlying function in the context of the server's heap.
3. Copying the results back into the client session's heap.
During both the invocation and return copying phases, the runtime notes when a copied value is itself a service, and creates a new reference (or reuses an existing reference) to the same service in the destination session. Services can thus be passed from session to session. Server-specified policy can limit such copying, to allow additional control over which sessions can utilise a service. Any work carried out by the server during the invocation is performed using the client's thread, and accounted to the client's CPU allocation. Figure 2 shows the interface provided for creation and manipulation of services.
create() takes an ordinary function and returns a service function – invoking the returned function will cause the session switch described above. Thus invoking a service appears the same as invoking an ordinary function. Other parameters to create() specify the maximum amount of memory to be copied when invoking the service and whether the service may be passed from one client to another. The memory limit is currently a rather crude method of preventing DoS attacks by clients on servers. Ideally, the server would be able to inspect the data before it was copied, but this could result in untracked pointers from the server’s heap to the uncopied data in the client’s heap. destroy() withdraws a service – clients attempting to invoke it in future will experience a Revoked exception.
RCANE: A Resource Controlled Framework
type α → β service
exception Revoked
α → β service create (func : α → β; limit : int; shared : bool);
void destroy (s : α → β service);

Fig. 2. The Service interface
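The interface of Fig. 2 might be rendered in Python roughly as follows. This is an illustrative sketch only — the real interface is the OCaml one shown above, and the size check used here to stand in for the copy limit is a made-up approximation.

```python
class Revoked(Exception):
    """Raised when a destroyed service is invoked."""

def create(func, limit, shared):
    state = {"revoked": False}
    def service(*args):
        if state["revoked"]:
            raise Revoked()
        # Hypothetical size check standing in for the copy-limit enforcement.
        if sum(len(repr(a)) for a in args) > limit:
            raise MemoryError("copy limit exceeded")
        return func(*args)   # invoking looks just like an ordinary call
    service._state = state
    service._shared = shared   # may the reference be passed between clients?
    return service

def destroy(service):
    service._state["revoked"] = True
```

As in Rcane, the wrapper makes a service call indistinguishable from an ordinary call until the service is destroyed, at which point callers see `Revoked`.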
3 Implementation
A prototype of Rcane has been implemented over the Nemesis Operating System [4]. The Runtime is based on the OCaml system from INRIA [2], with support for real-time CPU scheduling, multiple isolated heaps and access to Nemesis I/O. The Best-Effort session uses the PLAN interpreter [5] to provide a limited execution environment for unauthenticated packets, with PLAN wrappers around the Session interface to permit authentication and session creation. Rcane interoperates with PLAN systems running on standard (non-resource-controlled) platforms, allowing straightforward control of an Rcane system. Support for demand-loaded code in the style of ANTS [6] is also provided. In general, data path operations such as network I/O and CPU scheduling are implemented in native code in the runtime for efficiency. Most control path operations (such as bytecode loading and session creation) are implemented in OCaml for flexibility and ease of interaction with clients.
3.1 CPU Scheduling
CPU scheduling is accomplished using a modified EDF [7] algorithm similar to that described in [8]. Each VP’s guarantee is expressed as a slice of time and a period over which the time should be received (e.g. 300µs of CPU time in each 40ms period). Whenever the Rcane scheduler is entered, the following sequence of events occurs:

1. If there was a previously running VP, the elapsed time since the last reschedule is accounted to it.
2. The next VP to be run, and the period until its next pre-emption, are calculated. From this point onwards, all work carried out is on behalf of the new VP, and hence can be accounted to it.
3. If there are packets waiting on the owning session’s incoming channels (see Sect. 3.3), they are retrieved and transferred to the appropriate pool’s packet queues. Any idle pools with pending events are marked as runnable.
4. The next pool and thread to be run are selected.
5. If the selected thread is active in a heap that is currently in a critical GC phase (see Sect. 3.2), then the thread carrying out the critical GC is activated instead, until the phase has completed.
6. The selected thread is resumed.
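A minimal sketch of the accounting and EDF selection in steps 1–2 (the data structures here are assumptions for illustration; the real scheduler is part of the native-code runtime):

```python
class VP:
    def __init__(self, name, slice_us, period_us):
        self.name = name
        self.slice = slice_us        # e.g. 300us of CPU time ...
        self.period = period_us      # ... in each 40ms period
        self.deadline = period_us    # end of the current period
        self.remaining = slice_us    # guaranteed time left this period

def account(vp, elapsed_us, now_us):
    # Step 1: charge the elapsed time to the VP that just ran.
    vp.remaining -= elapsed_us
    # Refresh the allocation at the start of each new period.
    if now_us >= vp.deadline:
        vp.deadline += vp.period
        vp.remaining = vp.slice

def pick_next(vps):
    # Step 2: earliest-deadline-first among VPs with guaranteed time left.
    eligible = [vp for vp in vps if vp.remaining > 0]
    return min(eligible, key=lambda vp: vp.deadline) if eligible else None
```

A VP that has exhausted its slice drops out of the eligible set until its next period begins, which is how the guarantee is enforced.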
Paul Menage

3.2 Memory
The garbage collector is based on the OCaml collector. When tracing the roots of a heap, it is necessary to suspend all threads that might access that heap. To ensure that all appropriate threads are stopped during such critical GC activity, each thread has associated with it a stack of heaps. When a thread makes a service call through to a different session, a pointer to the server’s heap is pushed on to the thread’s heap stack. When returning, the server’s heap pointer is popped from the heap stack. The top heap pointer on each stack is the thread’s active heap. (For brief periods of time, while transferring control between two sessions, a thread will actually have both of the top two heaps marked as active.) Whenever critical activity is being carried out on a heap, all threads which are active in the heap are suspended, other than to carry out the GC work. The majority of the GC work can be carried out without suspending threads.

Tracking the threads which have access to each heap minimises the number of thread stacks which must be traversed to identify roots, and prevents QoS crosstalk between principals which are not interacting. Additionally, a thread executing a service call in a different session need not be interrupted (possibly whilst holding important server resources) due to critical GC work in its own heap. Since no pointers to the client’s heap can be carried through to the server, the code running in the server cannot access that heap, and so the thread need not be suspended. Upon returning from the service call it is suspended if the activity is still in progress. When a session is destroyed, any references to its exported services are marked as revoked; attempts to invoke them generate an exception.
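The heap-stack bookkeeping can be sketched as follows (illustrative; the class and function names are invented for the example):

```python
class Thread:
    def __init__(self, home_heap):
        self.heap_stack = [home_heap]   # bottom: the session's own heap

    def active_heap(self):
        # The top heap pointer on the stack is the thread's active heap.
        return self.heap_stack[-1]

    def enter_service(self, server_heap):
        # A service call pushes the server's heap ...
        self.heap_stack.append(server_heap)

    def leave_service(self):
        # ... and returning pops it again.
        self.heap_stack.pop()

def threads_to_suspend(threads, heap_in_critical_gc):
    # Only threads currently active in the collected heap must stop;
    # a thread away on a service call in another session keeps running.
    return [t for t in threads if t.active_heap() is heap_in_critical_gc]
```

This captures why a thread executing in a server session is unaffected by critical GC in its home heap: its active heap is the server’s, not its own.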
3.3 Network I/O
Rcane flows map directly to Nemesis I/O channels. A channel is a connection to a device driver associated with a particular set of flows, specified by a packet filter. The current implementation of Rcane supports channels for UDP packets and Ethernet frames, allowing interaction both on a local physical active network and on a larger virtual network tunnelled over UDP/IP. Link-level frames are classified on reception by the network device drivers via a packet filter, which maps the frame to the appropriate channel. In the case of sessions without guaranteed resources allocated on a given device, the frame is mapped to the Best-Effort session’s channel for that device. If the channel has free buffers available, the frame is placed in the channel – no protocol processing is performed at this point. If the channel has no free buffers, the packet is dropped. Thus, if a session is not keeping up with incoming traffic, its packets will be discarded in the device driver, rather than queueing up within a network stack as might happen in a traditional kernel-based OS. At some later point, when the appropriate VP is scheduled by Rcane to receive CPU time, the packets are extracted from the channels and demultiplexed to the appropriate thread pools for processing. Transmit scheduling is performed by the device drivers following a modified EDF algorithm, as described in [9].
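The classify-then-buffer behaviour can be sketched as follows (illustrative only; the packet-filter representation as predicates is an assumption):

```python
from collections import deque

class Channel:
    def __init__(self, buffers):
        self.free = buffers
        self.queue = deque()

    def deliver(self, frame):
        if self.free == 0:
            return False            # no free buffers: dropped in the driver
        self.free -= 1
        self.queue.append(frame)    # buffered; no protocol processing yet
        return True

def classify(frame, filters, best_effort):
    # First matching packet filter wins; frames for sessions without
    # guaranteed resources go to the Best-Effort session's channel.
    for match, channel in filters:
        if match(frame):
            return channel
    return best_effort
```

The key point mirrored here is early discard: an overloaded session loses frames at the driver, before any CPU is spent on them.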
4 Evaluation
This section presents the results of various test scenarios run to verify the QoS guarantees and resource isolation provided by Rcane.
4.1 CPU Isolation
To demonstrate the isolation of multiple VPs from one another, three separate sessions, each with a single VP, were started at 5 second intervals. In each case an OCaml bytecode module was loaded over the network to be used as the entry point for the session. Session A runs on best-effort time only. Session B requests a 1ms slice in each 4ms period. Session C begins running on best-effort time; after 3 seconds it requests a CPU guarantee of 400µs in each 2ms period, makes further changes to its allocation, and then exits. For this experiment, requests for guaranteed CPU allocation also specified that the session did not wish to receive an additional portion of the best-effort time. Figure 3 (a) shows the amount of CPU time actually received by each session over the course of each scheduler period. Initially session A receives all the CPU; later B arrives and receives a constant 25% of the CPU. When C arrives, it initially shares the remaining 75% best-effort time with A; then it switches to guaranteed CPU time (initially 20%, then 40%, then 10%). It can be seen that the guarantees requested from the system were accurately respected at fine timescales.
4.2 Network Transmission
Figure 3 (b) shows a trace of network output from three sessions, each attempting to transmit flat-out. Session D has no guaranteed bandwidth. E has an allocation of 33% (on a 100Mb/s link). F starts with a guarantee of 25%. After about 12s, it requests 45%, thus reducing the best-effort bandwidth available to D. After another 2s, it requests 65%. Now the link is saturated and there is no best-effort transmission time available. After a further 2s it returns to 25%, allowing D to begin transmitting again. It can be seen from the trace that the desired resource isolation is achieved.
4.3 Memory Isolation
Fig. 3. (a) Dynamically changing CPU guarantees. (b) Network output

To demonstrate the utility of running different principals’ sessions in their own heaps, two scenarios were considered. In (a), VPs G and H are running in the same session. Initially both are generating small amounts of garbage. After a period of time, G begins generating large amounts of garbage. Scenario (b) is the same, but with the two VPs running in separate sessions (and hence having separate heaps). Figure 4 shows the outcome of these scenarios. In (a), both VPs are initially doing small amounts of GC work. G is running best-effort; H has a guarantee of 1ms in each 4ms period. When G switches to generating large amounts of garbage, the time it spends garbage collecting increases substantially. However, as shown by the noisy region at the bottom right of the graph, H also ends up doing an irregular but substantial amount of GC work. Although H has its own independent CPU guarantee, sometimes critical GC activity (such as root tracing) is taking place when its thread is due to run; it must complete this GC work before normal execution can be resumed. In (b), H is unaffected by the extra GC activity caused by G, since it is running in a separate session and hence does not share its heap.
Fig. 4. Avoiding QoS crosstalk due to garbage collection
5 Related Work

5.1 Active Networks
Many approaches to loading user-supplied code onto a network node are built on an existing safe language such as Java (including ANTS [6] and Hollowman [10]) or OCaml [2] (including ALIEN [11] and the PLANet service loader [12]). An alternative is to start with a very restricted language, and rely on the limitations of the language to bound the resources consumed by the user’s code. This approach is taken by PLAN [5] and Smart Packets [13]. PLAN also extends the concept of the hop count found in IP to apply to recursive or remote invocations, bounding the resources that a packet can consume globally. The Active Networks Working Group NodeOS Interface Specification [14] aims to standardise on an API addressing similar issues to Rcane, although at a lower level of abstraction.

5.2 Resource Control and Isolation in Safe Languages
JRes [15] provides Java resource control using minimal runtime support. This provides portability, but with high accounting overheads. The J-Kernel [16] gives Java support for multiple protection domains and capabilities to allow revocation of services, but does not fully partition the JVM heap. The Java Sandboxes [17] project allows separate heaps in a modified JVM, by preventing stores of inter-heap references at run-time. This has serious efficiency consequences, and also fails to address the issue of QoS crosstalk due to critical GC activity.
5.3 Resource Control in Operating Systems
Nemesis [4] aims to provide reliable resource guarantees to applications. It is based on the principles that applications should perform as much of their own work as possible, without relying on shared servers for data-path activities, and that applications should have full control over their own resources. The Exokernel [18] takes a similar approach, but is motivated by performance gains rather than by the provision of QoS guarantees. Scout [19] seeks to associate resources with data paths rather than with users or applications.
6 Conclusions and Future Work
This paper has presented the design for Rcane, a Resource Controlled Active Network Environment, and its implementation over the Nemesis Operating System. Rcane supports the execution and accounting of untrusted code written in a safe language. Direct interference between principals is prevented through the use of a safe language. QoS interference is prevented through scheduling and accounting. Experiments showed that principals running on Rcane do experience isolation with respect to CPU time, network bandwidth and GC activity. Areas for future work include: a more developed charging and accounting model, resource control and transfer on a network-wide scale, and allowing principals more flexibility in specifying scheduling and memory usage policies.
Acknowledgements The author wishes to thank Jonathan Smith at the University of Pennsylvania, where part of this work was carried out, and Jonathan Moore and Michael Hicks for developing the PLAN infrastructure.
References

1. D. Scott Alexander. ALIEN: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, September 1998.
2. Xavier Leroy. Objective Caml. INRIA. http://caml.inria.fr/ocaml/.
3. Dickon Reed, Ian Pratt, Paul Menage, Stephen Early, and Neil Stratford. Xenoservers: Accountable Execution of Untrusted Programs. In Seventh Workshop on Hot Topics in Operating Systems (HOTOS-VII), March 1999.
4. I. M. Leslie et al. The Design and Implementation of an Operating System to Support Distributed Multimedia Applications. IEEE Journal on Selected Areas in Communications, 14(7):1280–1297, September 1996.
5. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A Packet Language for Active Networks. In Third ACM SIGPLAN International Conference on Functional Programming (ICFP), 1998.
6. David J. Wetherall, John Guttag, and David L. Tennenhouse. ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. In 1st IEEE Conference on Open Architectures and Network Programming (OPENARCH), April 1998.
7. C. Liu and J. Layland. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. Journal of the Association for Computing Machinery, 20(1):46–61, February 1973.
8. Timothy Roscoe. The Structure of a Multi-Service Operating System. Technical Report 376, University of Cambridge Computer Laboratory, August 1995.
9. Richard Black, Paul Barham, Austin Donnelly, and Neil Stratford. Protocol Implementation in a Vertically Structured Operating System. In 22nd IEEE Conference on Local Computer Networks (LCN), 1997.
10. Sean Rooney. Connection Closures: Adding Application-Defined Behaviour to Network Connections. Computer Communications Review, April 1997.
11. D. Scott Alexander, Marianne Shaw, Scott M. Nettles, and Jonathan M. Smith. Active Bridging. In ACM SIGCOMM Conference on Applications, Technologies, Architectures and Protocols for Computer Communication, September 1997.
12. Michael Hicks, Jonathan Moore, D. Scott Alexander, Carl Gunter, and Scott Nettles. PLANet: An Active Internetwork. In IEEE INFOCOM ’99, 1999.
13. Beverley Schwartz, Alden Jackson, Timothy Strayer, Wenyi Zhou, Dennis Rockwell, and Craig Partridge. Smart Packets for Active Networks. In 2nd IEEE Conference on Open Architectures and Network Programming (OPENARCH), 1999.
14. Active Networks NodeOS Working Group. NodeOS Interface Specification. Draft.
15. Grzegorz Czajkowski and Thorsten von Eicken. JRes: A Resource Accounting Interface for Java. In ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), November 1998.
16. C. Hawblitzel, C.-C. Chang, G. Czajkowski, D. Hu, and T. von Eicken. Implementing Multiple Protection Domains in Java. In 1998 USENIX Annual Technical Conference, June 1998.
17. Philippe Bernadat, Dan Lambright, and Franco Travostino. Towards a Resource-Safe Java. In IEEE Workshop on Programming Languages for Real-Time Industrial Applications (PLRTIA), December 1998.
18. Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr. Exokernel: An Operating System Architecture for Application-Level Resource Management. In 15th ACM Symposium on Operating Systems Principles (SOSP), 1995.
19. A. Montz, D. Mosberger, S. W. O’Malley, L. Peterson, and T. Proebsting. Scout: A Communications-Oriented Operating System. Technical report, Department of Computer Science, University of Arizona, June 1994.
The Protean Programmable Network Architecture: Design and Initial Experience Raghupathy Sivakumar, Narayanan Venkitaraman, and Vaduvur Bharghavan University of Illinois, Urbana-Champaign, IL 61801, USA {sivakumr,murali,bharghav}@timely.crhc.uiuc.edu http://timely.crhc.uiuc.edu
Abstract. This paper presents Protean, a programmable network architecture for future networks. Protean is an event-driven network architecture that allows service providers, applications, and even individual flows to customize the network services, while at the same time providing efficient data paths for flows that use default services. A key feature of Protean is the support for state management. A service that is invoked at one node has the ability to access and update non-local state, and the management of distributed network state is achieved by a core-based self-configuring infrastructure in Protean.
1 Introduction
The next generation Internet is expected to support very diverse environments (commercial heterogeneous wireline/wireless networks), applications (multimedia, WWW, telnet), and workloads (heterogeneous unicast and multicast streams with different quality of service requirements). The problem with supporting such diversity in a single network infrastructure is that different applications have very different requirements from the network. Consequently, it is clear that the network must play a more active role in supporting the needs of the applications and end users. To this end, there has been a lot of recent discussion regarding the design and deployment of active networks. Unlike traditional network architectures wherein the network provides only best effort datagram service and all the smarts reside in the end hosts, applications in an active network have the ability to inject specialized functionality into the routers of the network. In this paper, we present an overview of the active router architecture and state management in the PROTEAN (PROgrammable TEchnology for Active Networks) active network that is being developed at the University of Illinois. Protean is similar to other programmable network approaches in that it provides for the dynamic injection of services, advertisement of services, and a programmable abstraction of the network to service providers and even applications. However, Protean is distinct in terms of its focus on state management. While related work has typically focused on the mechanisms for injecting and executing customized services in the network, to our knowledge there has been very little study on how services can access and manipulate non-local state.

Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 37–47, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Of course, this is a critical issue in the practical deployment and use of active networks, since services must be able to access and update non-local network state to make intelligent decisions about packet handling. At the same time, the state management needs to be low overhead, and the state that is monitored needs to be extensible. Protean allows services invoked in a router to access and update non-local state, e.g. making reservations along an entire path or accessing the routing tables of other routers. In essence, services in Protean are provided with a ‘distributed shared memory’ abstraction of the network. This has three key advantages: (a) it makes writing services much easier, (b) it allows the network to arbitrate resources and access among competing services, and (c) it provides a uniform framework for the dissemination of both network state (such as link bandwidth, expected delays, and resource availability) and available services in the network. On the other hand, aggregating and maintaining ‘network state’ efficiently is a challenging task. We discuss the Protean architecture and state management in more detail in subsequent sections.

Section 2 presents the architectural framework of Protean and Section 3 describes the Protean state management. Section 4 presents an illustrative case study of the Protean architecture and Section 5 concludes this paper.
2 The PROTEAN Active Network Architecture
As shown in Figure 1, the Protean architecture has three key components: (a) the router architecture (services, virtual network contexts, etc.), (b) the state management architecture (network state monitoring, propagation and access; distributed services management), shown in two separate modules, and (c) the programming and runtime framework. We focus on the first two components in this paper.
2.1 The Protean Active Router Architecture
Fig. 1. (a) Protean Active Network Architecture: The rectangular boxes show components of the architecture while the shaded planes indicate levels of abstraction. (b) Protean Router Architecture: For each event in the data path of a packet, there exists a mapping from VNC-ids to event handlers. For a given packet, a maximal prefix match of the VNC-id of the packet is used to retrieve the appropriate event handler (A). The set of event handlers along the data path constitutes the Virtual Network Context of the flow (B)

The Protean router is based on an event-driven model. An event is a fundamental entity in the router architecture. Events are associated with event-handlers, and an (event, event-handler) pair is termed a service. A set of services associated with the data path of a packet in a router is contained in a virtual network context (VNC). A VNC in Protean consists of the data path a packet traverses from the point it enters a router till when it leaves the router, the probable events along the data path, the handlers for the events (services) and finally the state space for the VNC. A virtual network context is typically populated with a number of flows that reside within the context. The router architecture allows for the creation of customized VNCs by a service provider, application, user, or even individual flows. VNCs are hierarchically structured, and children VNCs inherit services from their parents by static scoping rules. Figure 1 shows the architecture of a typical router in Protean. Although the Protean router architecture is fundamentally similar to the programmable router approach [1,2], it differs from existing approaches in two key aspects: (i) the way virtual network
contexts are created and maintained, and (ii) the nature of the state space that is provided to virtual network contexts. Since Protean maintains a hierarchy of VNCs, individual flows are allowed to create their customized virtual network contexts either by inheriting from an existing VNC, by building a VNC from scratch (by injecting handlers for all events in the trigger points along the data path), or by a hybrid of the two approaches (inheriting an existing VNC, modifying existing services, adding new services, etc.). Also, a VNC in Protean has access not only to local state but also to non-local state. Moreover, the state space in Protean is itself programmable, allowing flows to inject their own state variables into their state spaces, which would from then on be maintained by the Protean state manager. All services in Protean are injected and executed at kernel level to improve efficiency, while state management is done at user level in order to limit the complexity of the kernel. The rest of this section describes what a virtual network context in Protean is and how one is set up by a flow.

2.2 The Protean Virtual Network Context
A virtual network context in Protean can be defined as a set of services. Each flow at a router is associated with a particular VNC and all data packets belonging to that flow are processed within this VNC. Each VNC is associated with a unique VNC-Id. Switches are made programmable by allowing flows to create their own customized virtual network contexts. The following is the list of components forming a virtual network context for a given flow:
– The data path for the flow within the router. While in conventional routers the data path is typically the same for all flows, flows in an active network might potentially want to traverse different data paths based on their requirements. For example, flow 1 may choose to be routed by the standard routing algorithm while flow 2 may choose to be routed by a QoS routing algorithm.
– Events, handlers and services. We use the term service to signify an (event, handler) pair. Events are of two types: (a) basic events, which are predefined by the network, and (b) user-defined events. Basic events in turn are of two types: those whose handlers are non-programmable, and those whose handlers are programmable. Note that only the top level for these services is non-programmable; within a virtual network context, any scheduler or resource allocation policy can be used among its flows/children contexts. User-defined events are not a part of the default network event set, and can thus only be triggered by user-defined event-handlers.
– The VNC state space. A VNC in Protean includes a programmable state space that the particular VNC has access to. While the state space contains some default state variables (routing table, CPU utilization, etc.), Protean allows flows to program the state space to include non-local state and even newly defined state variables. For example, if a flow needs to have access to the congestion in all of its next-hop routers, it can introduce new state variables in its virtual network context that indicate the levels of congestion in neighboring routers. The Protean state manager would then be responsible for monitoring this state and keeping it consistent. Section 3 describes in detail how this is achieved.

The Protean VNC can thus be expressed as follows:

VNC := (dataPath, (Service1, Service2, ...), stateSpace)

where services are of the form ((event1, eventHandler1), (event2, eventHandler2), ...) and the stateSpace is a union of the predefined default state variables and the state variables programmed by the particular flow. Each of the above components is made programmable in Protean, paving the way for a programmable router. The next section explains how a VNC is set up by a flow.
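The expression above can be rendered as a small Python sketch (all names, including DEFAULT_STATE and the string-valued handlers, are invented for illustration):

```python
DEFAULT_STATE = {"routing_table": {}, "cpu_utilization": 0.0}

class VNC:
    """A VNC bundles a data path, (event, handler) services, and a state
    space formed from default variables plus flow-programmed ones."""
    def __init__(self, vnc_id, data_path, services, extra_state=None):
        self.vnc_id = vnc_id
        self.data_path = data_path            # e.g. ["classify", "route", "schedule"]
        self.services = dict(services)        # event -> handler
        self.state = {**DEFAULT_STATE, **(extra_state or {})}

    def inherit(self, child_id, overrides=None):
        # A child VNC inherits its parent's services and may override some,
        # mirroring the hierarchical inheritance by static scoping rules.
        services = {**self.services, **(overrides or {})}
        return VNC(child_id, list(self.data_path), services, dict(self.state))
```

A flow could thus build its VNC from scratch, inherit one wholesale, or inherit and override only selected services.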
2.3 Programming a Protean Router
As mentioned before, at a given router, each flow is associated with its own virtual network context. Switches are made programmable by allowing flows to customize their virtual network contexts. While one extreme of customization would involve building from scratch each of the components that compose the VNC, the other extreme would be to inherit an entire VNC from the set of existing VNCs. Setting up a VNC with customized services is a one-time effort; if all the services selected to compose the VNC are already in the router,
then the overhead for creating a VNC is negligible; if some of the services need to be downloaded over the network, then the overhead for creating a VNC is significant. In Protean, flows inherit by default the VNC created by the closest ancestor. Thus, if a service provider creates its own virtual network context VNCi, all flows originating from hosts subscribing to that service provider would by default inherit VNCi in the absence of other VNCs belonging to closer ancestors in the organization hierarchy. VNC set up in Protean can thus be classified into two kinds:

– Default VNC setup. A flow that does not want to incur the overhead of setting up a customized VNC can start transmitting its packets without going through the VNC set up process. At each of the routers the flow goes through, it is associated with the VNC belonging to the closest ancestor of the flow in the organization hierarchy. The closest ancestor is identified by performing a max-prefix match of the flowId and the ids of the available VNCs at the router.
– Customized VNC setup. For flows that do want to set up their own customized VNCs, Protean offers three choices: (i) inherit an entire VNC from the available set of VNCs, (ii) customize only portions of an inherited VNC, or (iii) build the entire VNC using injected modules. It is important to note here that setting up a VNC does not involve an explicit set-up phase with a high overhead. Rather, all it involves is the creation of mappings between events on the data path and appropriate event handlers. Hence the VNC setup phase in Protean is implicit rather than explicit.

The Protean state management architecture plays a key role in enabling flows to customize existing VNCs.
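The max-prefix match used for default VNC selection can be sketched as follows (assuming dot-separated ids, as in the example ids like "1.1.1" used in Fig. 1):

```python
def closest_ancestor_vnc(flow_id, vnc_ids):
    """Return the VNC-id that is the longest full prefix of flow_id,
    i.e. the flow's closest ancestor in the organization hierarchy."""
    flow = flow_id.split(".")
    best, best_len = None, -1
    for vnc_id in vnc_ids:
        parts = vnc_id.split(".")
        # Count how many leading components match.
        n = 0
        while n < min(len(parts), len(flow)) and parts[n] == flow[n]:
            n += 1
        # vnc_id qualifies only if all of its components matched.
        if n == len(parts) and n > best_len:
            best, best_len = vnc_id, n
    return best
```

A flow with no matching ancestor would fall through to whatever root or default VNC the router provides.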
The state management architecture provides new flows with information about existing VNCs, including the various options available to build the components of the VNCs (data paths, trigger zones, services, etc.), from which the flow chooses portions of existing VNCs or an entire VNC.
3 The State Management Architecture
The goal of state management in Protean is to provide services access to non-local state in a consistent and available manner. At the same time, the state management needs to be scalable, low-overhead, and robust. These are contradictory goals, because the more non-local state a router caches, the more consistency management it needs to perform. Likewise, the more dynamic the network becomes, the more likely it is for cached non-local state to become outdated, leading to either lower availability or more consistency management overhead. In order to balance these issues, Protean adopts a 2-level approach for state management: (a) it hierarchically clusters the network, and a node in a cluster only maintains non-local state about the cluster, and (b) it creates and maintains a self-configuring core infrastructure in each cluster, which is responsible for aggregating the network state within the cluster and providing the ability for individual nodes to access and update this state.
In this paper, we focus on the intra-cluster state management infrastructure of Protean. Specifically, we describe how the core is formed, and how it propagates the cluster state to the nodes in the core. The mechanisms for inter-cluster state propagation and abstraction are still ongoing work, and are not discussed further. State management within the cluster has two key components: (a) generation of the core nodes, and (b) propagation of link state. We describe each in turn.
3.1 Generation and Maintenance of the Core
The core nodes of the network, together with tunnels that interconnect core nodes, form an infrastructure called the core network. The core network serves to maintain and propagate state in the active network. In Protean, only autonomous networks (or stub networks that hang off transit networks) use a core network for state management. For transit networks, which typically are bigger and serve a larger number of flows, the use of a core network might not be a scalable and effective option; in related work, we propose a scalable and low-overhead state management mechanism for transit networks [3]. In this paper, we focus on the use of core networks for state management. Core networks in Protean are constructed by approximating the minimum dominating set of the underlying network [4] and hence satisfy two properties: (a) each node is either a core node or has low-latency access to a core neighbor in order to access non-local state, and (b) the number of core nodes is minimized, reducing the consistency management overhead. Each core node establishes tunnels with all of its nearby¹ core nodes. Once the core nodes have been determined and tunnels established among nearby core nodes, we have an infrastructure that can capture the state of the cluster and propagate it among the core nodes. State is propagated to the nodes in the cluster via the state propagation mechanism described below. Also, nodes that are not in the core set access non-local state via a transparent mechanism that allows a non-core node to maintain a strongly consistent copy of the cached state with its dominating core node.
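One standard greedy approximation of a minimum dominating set is shown below for illustration; the exact algorithm Protean uses is described in [4], so this is only a sketch of the general idea.

```python
def dominating_set(adj):
    """adj: node -> set of neighbours. Returns a set of core nodes such
    that every node is a core node or adjacent to one."""
    uncovered = set(adj)
    core = set()
    while uncovered:
        # Greedily pick the node covering the most uncovered nodes
        # (counting the node itself and its neighbours).
        node = max(adj, key=lambda n: len(({n} | adj[n]) & uncovered))
        core.add(node)
        uncovered -= {node} | adj[node]
    return core
```

The result satisfies property (a) above by construction: every node is either chosen as a core node or is a neighbour of one.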
Propagation of State in the Core
The key remaining issue is how the cluster state is propagated among the core nodes. As a concrete example of ‘state’, we take the available link bandwidth, a particularly dynamic piece of state because it changes every time a new flow starts or an ongoing flow terminates. For this example, we assume that a monitoring process is available to compute the available bandwidth. Thus, the focus of this section is only on how this state is propagated, and what level of consistency we can expect among the core nodes.
We define the nearby core nodes of a node u as the core nodes that are at most 3 hops from u. It can be shown that every core node has at least one other core node within 3 hops of it [4].
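The greedy construction of an approximate minimum dominating set, in the spirit of the Guha–Khuller algorithm cited above [4], can be sketched as follows. This is an illustrative reconstruction under our own naming, not Protean's actual implementation:

```java
import java.util.*;

/** Sketch of greedy core election via an approximate minimum dominating
 *  set: every node ends up either in the core or adjacent to a core node. */
class CoreElection {
    /** adj[u] lists the neighbors of node u. */
    static Set<Integer> greedyDominatingSet(int[][] adj) {
        int n = adj.length;
        Set<Integer> core = new HashSet<>();
        boolean[] covered = new boolean[n];
        int numCovered = 0;
        while (numCovered < n) {
            // Pick the node that newly covers the most uncovered nodes.
            int best = -1, bestGain = -1;
            for (int u = 0; u < n; u++) {
                int gain = covered[u] ? 0 : 1;
                for (int v : adj[u]) if (!covered[v]) gain++;
                if (gain > bestGain) { bestGain = gain; best = u; }
            }
            core.add(best);
            if (!covered[best]) { covered[best] = true; numCovered++; }
            for (int v : adj[best])
                if (!covered[v]) { covered[v] = true; numCovered++; }
        }
        return core;
    }
}
```

On a star topology the greedy pass elects only the hub; on a three-node path it elects only the middle node, so every other node has a core neighbor within one hop.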
The Protean Programmable Network Architecture
43
The state propagation mechanisms in Protean are motivated by three observations: (a) each core node must maintain up-to-date local state; (b) if a node is aware of a non-local resource, it is a potential contender for that resource, so the state corresponding to a resource must not be propagated far into the cluster if the available resource is small; and (c) if a resource is fluctuating, the consistency overhead of maintaining the updated value is unacceptably high, so state corresponding to fluctuating resources must be kept local. In essence, the goal of the state management algorithm is to propagate stable, abundant resource state throughout the core nodes, and to restrict the propagation of unstable or scarce resource state. We achieve this goal by creating two types of waves: slow-moving increase waves that signal an increase in a resource, and fast-moving decrease waves that signal a decrease. The basic idea is that for fluctuating resources, the fast-moving decrease wave triggered by a resource decrease will quickly overtake and kill the slow-moving increase wave triggered by a previous resource increase. Conversely, a stable, high-bandwidth resource will eventually be advertised to all core nodes by virtue of the increase wave. Increase waves have a time-to-live, i.e. a maximum distance to which they can be advertised, which is a function of the available resource.

3.3 Using the Core to Perform State Management
Having described the core infrastructure to aggregate and propagate intra-cluster state, we are faced with five important issues: (a) how do services access the state, (b) what are the consistency semantics for distributed state, (c) what is the trade-off between providing read-only versus read-write access to state, (d) what are the trade-offs between intra-cluster and inter-cluster state management, and (e) how are services propagated in this infrastructure? While (d) is still part of ongoing research, we discuss the other issues below.

1. Event-handlers are instantiated as kernel-level modules, but the state manager process runs at user level. We have looked at two ways for event-handlers to access non-local state that is managed by the state manager: (a) via upcalls, and (b) via shared memory pages between the user-level and kernel-level modules. We have chosen the latter approach for its simplicity and efficiency, though the former is more scalable when the state manager manages large amounts of state.

2. The consistency semantics of each state element depends on several factors, most importantly the granularity at which waves are triggered. Between a non-core node and its core dominator, state is cached on demand at the non-core node with strong consistency. Among the core nodes, the consistency semantics is weak, and Protean currently provides no guarantee that the accessed state is correct. For guaranteed state updates, we expect that a service will use the available non-local state as a read-only resource and directly propagate a state update request to the node that controls the state element.
44
Raghupathy Sivakumar et al.
3. For read-write accesses to state, Protean supports two consistency models. The first supports weak consistency semantics, in which reads are served from the local copy of the state and writes are made to the local copy with a lazy update of the owner's copy (the primary copy of the state). The second supports strong consistency semantics, in which all reads and writes propagate to the owner's copy and block until the access/update is completed. These two models trade the "strictness" of consistency against the overhead of providing it. We believe that most services in the network do not require strong guarantees, so the weak consistency model will usually be sufficient. In both models, once an update is made, the state manager is responsible for propagating the updated state.

4. It is easy to see how service dissemination is achieved using the core. When a cluster node has access to an event handler, it advertises itself as a node which can be contacted to obtain a copy of the event handler. The directory service is thus an element of state, and it is updated at a core node whenever a core node either acquires or relinquishes a copy of an event handler. We also have efficient mechanisms for propagating services across clusters: we aggregate the services available in a cluster and nominate a ‘clusterhead’ that acts as the repository of these services. Service dissemination is then carried out hierarchically across different cluster levels.
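The two-wave scheme of Section 3.2 can be illustrated with a small model. The wave speeds (increase waves advance one hop per two ticks, decrease waves one hop per tick) and the TTL function are illustrative assumptions, not Protean's actual parameters:

```java
/** Sketch of Protean's two-wave state propagation: increase waves travel
 *  slowly with a resource-dependent TTL, decrease waves travel fast and
 *  kill a pending increase wave when they catch up with it. */
class WaveSim {
    /** TTL grows with the advertised resource, so scarce state stays local. */
    static int ttl(double availableBw, double quantumBw, int maxHops) {
        return (int) Math.min(maxHops, availableBw / quantumBw);
    }

    /** Increase wave (1 hop per 2 ticks) launched at tick 0; decrease wave
     *  (1 hop per tick) launched at tick startDec. Returns the hop at which
     *  the decrease wave overtakes and kills the increase wave, or -1 if
     *  the increase wave reaches its full TTL first. */
    static int overtakeHop(int incTtl, int startDec) {
        for (int t = 0; ; t++) {
            int incHop = Math.min(t / 2, incTtl);
            int decHop = t - startDec;            // decrease moves 1 hop/tick
            if (decHop >= incHop && t >= startDec) return incHop;
            if (incHop == incTtl && t / 2 >= incTtl) return -1;
        }
    }
}
```

A fluctuating resource (decrease soon after increase) is caught within a hop or two, while a stable resource's increase wave runs its TTL to completion.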
4 The Active Router - A Case Study
In this section, we present a case study of the Protean active network architecture that serves two purposes: (i) it acts as a proof of concept for the Protean router architecture, and (ii) it provides a way of evaluating the Protean architecture. The case study implements an Active Dropping Router that allows flows to dynamically reconfigure the dropping policies for their respective queues. It uses a prototype implementation of the Protean router architecture and illustrates the router-level behavior: specifically, it shows how services are instantiated in the router, and presents measurements that calibrate both the overhead of service instantiation and the improvement in functionality due to the introduction of application-specific services. More case studies of the Protean architecture are presented in [3]. We customize the "packet dropping" behavior of a Protean router. In a typical Protean event sequence, when a packet is ready to be queued in a designated queue at the output link, the "packet-level admission test" event is invoked. The event handler for this event determines whether the new packet can be enqueued without causing some packet to be dropped (e.g. if the buffer is full or above a threshold value). If the packet-level admission test fails, the "packet drop" event is invoked. The default event handler for packet drop uses the tail-drop policy: by default, the incoming packet is dropped when the queue is full. However, the event handlers for both the admission test and packet drop events can be modified. We focus on the packet drop
event, and compare application-specific services that can replace the default tail drop with head drop, random drop, or priority drop services. For example, an application may introduce head drop for its feedback queue (where more recent feedback takes precedence), random drop to ensure fairness among flows when multiple flows belonging to the application share the same queue, and priority drop for packet flows that have some application-specific structure built into them (e.g. MPEG flows, in which packets corresponding to I-frames, P-frames, and B-frames are in descending order of importance). In a conventional router, any change in the dropping policy would involve updating the kernel code to implement the new policy, recompiling the kernel, and finally shutting down the router to boot the modified kernel. In our active router, by contrast, we show that the dropping policy can be modified on the fly without taking the router offline. In terms of performance analysis, the latter part of this section analyses the active router's performance on three counts: (i) functional correctness, i.e. whether the active router performs according to its current configuration after the configuration has been changed on the fly; (ii) throughput, i.e. the throughput of the active router compared with a conventional router that has the new policies built into the kernel; and (iii) latency, where the latency suffered by packets in our active router is compared with the latency suffered by packets in a conventional router. We now discuss the precise mechanisms used to instantiate application-specific services in the Protean router. When an application wants to customize an event-handler for a VNC, it first notifies the active router about which event-handler it wants to instantiate. Event-handlers are identified via unique service names.
If the active router already has the event-handler, it updates the (event, event-handler) association for the corresponding event, adding one additional entry for the VNC id in the event-handler table of the event (see Figure 2). Otherwise, the router contacts the state manager, which is responsible for service dissemination. Service dissemination is achieved through a hierarchy of DNS-like state managers, which maintain the locally available services within a network cloud (and where those services are available), together with pointers for forwarding queries for services not available locally. Eventually, an active router seeking a service discovers where it can obtain the service and downloads the corresponding event-handler via an ftp-like bulk transfer. Event handlers are loadable modules that are dynamically loaded into kernel space. The pointer to the event handler in the event table is then updated. Instantiating a new service thus incurs a one-time seek-fetch-load overhead; subsequent service invocations occur in kernel space and are highly efficient. For this experiment, the initial configuration of the kernel uses a tail drop policy for all queues (as most routers in the Internet do). Each of the three queues in the router is then reconfigured on the fly with a different drop policy: random drop for oq1, head drop for oq2, and priority drop for oq3. The dropping schemes use two pieces of local state, the flow's queue and the incoming packet, to decide which packet to drop.

Fig. 2. Priority Dropping

Fig. 3. Random Dropping

We now present the performance evaluation through two sets of results, one showing the functional correctness of the "programmed router" and the other showing the overhead, in terms of latency, induced by the active network component in the router. For the first part, we present three graphs, showing the functional correctness of the router programmed with priority dropping, random dropping, and head dropping respectively. For the priority dropping policy, two MPEG streams (with the priorities of I, P, and B frames set to 2, 1, and 0 respectively) were used. The first MPEG stream (flow1) does not use priority dropping, while the second stream (flow2) programs the router with a priority dropping mechanism (by injecting the appropriate code). Graph 2 shows the difference between the number of I, P, and B frames received by the receivers of flows 1 and 2 respectively. To show the functional correctness of the random dropping mechanism, the test measured the fairness that two flows enjoy when they share the same queue in the bottleneck router. Graph 3 shows the number of packets that got through for the two flows when a tail drop mechanism (the default policy of the router) was used and when a random drop mechanism was used. The graph shows that fairness improves when the router is programmed to perform random dropping rather than tail dropping. Graph 4 illustrates the performance of two flows, one using tail drop and the other using head drop, in terms of the effectiveness of the packets that get through for both flows. For the purposes of this test, the effectiveness of packets increases with the sequence number. The graph shows the net effectiveness observed by the two flows under the two drop policies. As expected, the effectiveness for the flow using the head drop scheme is much higher than that of the other flow.
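The drop policies compared in these experiments can be sketched as victim-selection functions. The representation below (a queue of per-packet priorities, with the MPEG priorities I=2, P=1, B=0, and a returned index equal to the queue length meaning "drop the arriving packet") is an illustrative assumption, not the actual Protean event-handler code:

```java
import java.util.Random;

/** Sketch of the four drop policies: each returns the index of the
 *  victim packet when a packet arrives at a full queue. */
class DropPolicies {
    /** Tail drop: the arriving packet itself is dropped. */
    static int tailDrop(int[] queuePrio, int arrivingPrio) {
        return queuePrio.length;          // length == "drop the arrival"
    }
    /** Head drop: the oldest packet is dropped (favors recent feedback). */
    static int headDrop(int[] queuePrio, int arrivingPrio) {
        return 0;
    }
    /** Random drop: uniformly chosen victim (improves inter-flow fairness). */
    static int randomDrop(int[] queuePrio, int arrivingPrio, Random rng) {
        return rng.nextInt(queuePrio.length + 1);  // arrival may be the victim
    }
    /** Priority drop: drop the lowest-priority packet, counting the arrival. */
    static int priorityDrop(int[] queuePrio, int arrivingPrio) {
        int victim = queuePrio.length, low = arrivingPrio;
        for (int i = 0; i < queuePrio.length; i++)
            if (queuePrio[i] < low) { low = queuePrio[i]; victim = i; }
        return victim;
    }
}
```

For a full queue holding an I-, a P-, and a B-frame, priority drop sacrifices the B-frame; if only I-frames are queued, an arriving B-frame is dropped instead.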
Graph 5 shows the latency observed by a flow traversing a conventional router employing a priority drop mechanism, and the latency observed by a flow traversing a Protean active router that has been programmed to perform priority dropping. Since Protean event handlers are instantiated in the kernel, once the handler is installed the latency difference is close to zero. The separation between the two curves represents the time taken to instantiate the priority drop event handler in the Protean router, which was observed to be around 230 ms.
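The on-the-fly replacement of handlers described in this section can be modelled as a per-event handler table keyed by VNC id, with a fall-back to the default handler. The names are illustrative (actual Protean handlers are kernel modules loaded after the seek-fetch-load step):

```java
import java.util.*;

/** Sketch of the per-event (event, event-handler) association keyed by
 *  VNC id, so a flow can customize its drop policy without affecting
 *  other flows or requiring a reboot. */
class EventTable {
    interface EventHandler { int onPacketDrop(int[] queue, int arriving); }

    private final Map<String, Map<Integer, EventHandler>> table = new HashMap<>();
    private final Map<String, EventHandler> defaults = new HashMap<>();

    void setDefault(String event, EventHandler h) { defaults.put(event, h); }

    /** Install a VNC-specific handler, e.g. after a seek-fetch-load. */
    void install(String event, int vncId, EventHandler h) {
        table.computeIfAbsent(event, e -> new HashMap<>()).put(vncId, h);
    }

    /** Dispatch falls back to the default handler (tail drop, say). */
    EventHandler lookup(String event, int vncId) {
        EventHandler h = table.getOrDefault(event, Map.of()).get(vncId);
        return h != null ? h : defaults.get(event);
    }
}
```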
Fig. 4. Head Dropping

Fig. 5. Active Component's Latency

5 Summary
In this paper, we have described elements of the Protean active router and state management architecture and presented an illustrative case study of Protean. The key aspects of Protean are its event-driven architecture, the ability to create hierarchical virtual network contexts, and the ability to access and update non-local state from specialized services invoked at a router. At this point, Protean is still in the preliminary stages of design and development, and several important issues remain to be resolved. However, we believe that the architecture has some features of interest and may offer new perspectives on the key issues of state management and flexible service creation in active networks.
References

1. D. L. Tennenhouse and D. J. Wetherall. Towards an Active Network Architecture. Computer Communication Review, 26(2), April 1996.
2. J. M. Smith et al. The SwitchWare Active Network Architecture. IEEE Network, Special Issue on Active and Controllable Networks, 12(3):29-36, 1998.
3. R. Sivakumar, N. Venkitaraman, and V. Bharghavan. A Scalable Architecture for Active Networks. TIMELY Group Research Report, 1999.
4. S. Guha and S. Khuller. Approximation Algorithms for Connected Dominating Sets. Tech. Rep. 3660, Institute for Advanced Computer Studies, Dept. of Computer Science, Univ. of Maryland, College Park, June 1996.
A Dynamic Pricing Framework to Support a Scalable, Usage-Based Charging Model for Packet-Switched Networks

Mike Rizzo, Bob Briscoe, Jérôme Tassel, and Konstantinos Damianakis

Distributed Systems Group, BT Labs, Martlesham Heath, Ipswich IP5 3RE, England
{michael.rizzo,bob.briscoe,jerome.tassel,konstantinos.damianakis}@bt.com
Abstract. We describe a dynamic pricing framework designed to support a radical approach to usage-based charging for packet-switched networks. This approach addresses various scalability issues by shifting responsibility for accounting and billing to customer systems. The ultimate aim is to create an active multi-service network which uses pricing to manage supply and demand of resources. In this context, the role of the dynamic pricing framework is to enable a provider to establish ‘active tariffs’ and communicate them to customer systems. These tariffs take the form of mobile code for maximum flexibility, and the framework uses an auditing process to provide a level of protection against incorrect execution of this code on customer systems. In contrast to many active networks proposals, the processing load is moved away from routers to the edge of the network.
1 Introduction
As the Internet continues to grow and evolve into a global, multi-service network, the issue of how to charge fairly and sensibly for network services is becoming increasingly relevant. The flat-rate charging model, currently used by virtually all ISPs worldwide, relies heavily on characteristics of the present best-effort Internet which may no longer be valid in the near future. For example, the inability of the present Internet to offer differential services means that it does not make sense to speak of higher prices for better services. And the bandwidth limitations associated with dial-up connections provide a convenient cap on resource usage by any one individual, thereby protecting providers' routers from being hogged by a single user at the expense of other users. It is envisaged that the current best-effort Internet will gradually be replaced by a network that can offer differential levels of network service that are better suited to the individual needs of specific applications. For example, a video-on-demand application might make use of a high-bandwidth, low-jitter, reservation-based service, whilst email would continue to use the best-effort service. Furthermore, it is expected that access bandwidth for end-users will increase in order to enable the provision of high-quality multimedia services. In this scenario, the flat-rate charging model gives rise to the anomalous situation wherein

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 48-60, 1999.
© Springer-Verlag Berlin Heidelberg 1999
A Dynamic Pricing Framework
49
users that make heavy demands on network resources are charged the same amount as users that make lighter demands. Moreover, the service offered to lighter-demand users is likely to be impaired by the provision of services to heavier-demand users. This situation is likely to result either in convergence back towards a single-service network (where users will always request the best service possible), or in denial of service to users whenever network resources are operating at maximum capacity. It makes sense, therefore, to abandon flat-rate charging in favour of a usage-based model in which there is a relationship between price and resource usage. Indeed, such a model might also have been considered for the current best-effort Internet, were it not for the substantial increase in operational complexity involved. Whilst flat-rate charging is extremely easy to implement, usage-based charging requires that some form of usage accounting be carried out before a charge can be computed. It is generally accepted that the additional operational cost associated with such accounting is substantial, due to the increase in processing power required not only to cope with the accounting processes as such, but also to compensate for the blocking nature of the measurement process, which has a negative effect on throughput. Consequently, usage-based models are not yet considered viable, and many proposals have focused on compromise solutions based on aggregation [3,9,8,11]. As part of a project investigating radical approaches for operational support systems, we are investigating the possibility of lowering the operational cost of usage-based charging by shifting responsibility for billing to the users themselves. We propose that users measure their own traffic and compute their own bills using tariffs supplied by the provider.
This spreads the load so that each processing unit uses a near-negligible amount of resources for billing purposes, all the more so when one considers that most users' machines spend much of their time idle. It also allows network routers to focus on their principal function without sacrificing throughput. Using this approach it is possible to charge users differentially on the basis of both volume and quality of service. This gives providers a degree of control over resource usage, because tariff structures can be designed to give users an incentive to use the minimal amount of resources that meets their requirements. Furthermore, finer-grain control over supply and demand management of network resources can be achieved by price variation along the lines of established economic supply-and-demand principles. At times when resources are in short supply, demand is curbed by raising service prices. Conversely, at times when resources are under-utilised, demand is stimulated by lowering prices. If the decision-making involved in changing prices is (partially) automated, then the result is an intelligent active network which performs its own supply and demand management.
In this paper we limit ourselves to describing a general framework which can support this concept. It is beyond the scope of the paper to present arguments related to the desirability or extent of automated decision-making in this regard.
50
Mike Rizzo et al.
This approach immediately raises several questions, particularly with respect to trust, stability, security, and user acceptance. Some of these questions are briefly covered in sections 2 and 3, and are expanded on in another paper [6]. The principal focus of this paper, however, is the framework which enables a provider to communicate tariffs and price variations to its customers. Following a broad overview of our approach to charging in Section 2, Section 3 outlines the issues of specific concern to tariff representation, dissemination, and application. Section 4 describes a prototype that was developed to demonstrate the approach, and to gain experience in the choice of suitable implementation techniques. Section 5 follows with indications for further work. Finally Section 6 concludes with some general implications for active networks.
2 Background
We assume a packet-switched network in which a variety of network services are made available to users. The exact nature of these services, and the specific characteristics that form the basis upon which they might be differentiated, are not important for our purposes, and may vary from one provider to another. However we assume that, in general, service usage may be measured on the basis of packet counts, and may be classified using some notion of quality of service, irrespective of whether this is reservation-based [4] or class-based [2].
Fig. 1. Radical charging model: processes and flow of information
In the proposed charging model, the customer system is responsible for accounting for usage under the instruction of the provider. The provider supplies tariffs for each of the available services, along with other information pertaining to their application e.g. how frequently they should be applied. The customer’s system measures and categorizes both inbound and outbound traffic, applies the
appropriate tariffs for each category of traffic, and periodically sends accounting reports to the provider. The customer's system might also be responsible for making payments, although this may be delegated to some other entity. The various processes and data flows are depicted in Fig. 1. This model clearly places a lot of trust in customer systems. Our view is that this does not pose a problem, as long as the provider is able to check up on a sample of its customers from time to time. A random audit function may be employed by the provider's accounting process to make measurements pertaining to a particular customer at the provider end, and verify that the customer's accounting reports tally with the observations made. The model is not targeted solely at the edge of a packet network, but is intended to be applied recursively throughout the network. Thus an access provider might be the customer of a larger provider, which may in turn be the customer of a backbone provider. A multi-host edge customer might also employ a similar model within its network in order to recover costs. Whilst it is likely that charging for network use will be uni-directional at the edge of the network, this is not the case in general. The distinction between provider and customer becomes somewhat blurred as one approaches the core of the network. There is also a charging issue related to the direction of traffic: should a chargeable entity pay for packets sent, packets received, or both? In general there are four possible charges between two entities A and B:

– A charges B to send packets to it;
– A charges B to receive packets from it;
– B charges A to send packets to it;
– B charges A to receive packets from it.
An entity, therefore, can assume the roles of both provider and customer with respect to some other entity. We permit charging for any combination of the above, for maximum flexibility with respect to traffic direction when establishing charging policies and tariffs.
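Under this model, a customer system computes its own charge from directional packet counts, and the provider's random audit recomputes the charge from its own measurements. A minimal sketch, with illustrative rate parameters and method names of our own:

```java
/** Sketch of customer-side self-billing and the provider's audit check.
 *  Rates per packet in each direction cover the four possible charges
 *  between two entities (send/receive, in either role). */
class SelfBilling {
    /** Charge computed by the customer from its own traffic measurements. */
    static double charge(long pktsSent, long pktsRecvd,
                         double sendRate, double recvRate) {
        return pktsSent * sendRate + pktsRecvd * recvRate;
    }

    /** Provider-side random audit: recompute the charge from the provider's
     *  own measurements and compare with the reported bill, within a tolerance. */
    static boolean audit(double reported, long pktsSent, long pktsRecvd,
                         double sendRate, double recvRate, double tol) {
        return Math.abs(reported - charge(pktsSent, pktsRecvd, sendRate, recvRate)) <= tol;
    }
}
```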
3 The Tariffing Subsystem

Having outlined the general principles of our charging model, this section focuses on the role of the tariffing subsystem, which comprises:

– establishment and adjustment of tariffs on the provider side;
– dissemination of tariffs and adjustments to customer systems;
– application of tariffs by customer systems to local measurements.

The remainder of this section characterizes the requirements attached to this role, setting the scene for the subsequent section.
3.1 The Nature of Tariffs
We assume that a provider may set a separate tariff for each category of service that it offers, and that for a particular category of service, a tariff may change periodically. We do not exclude the possibility that a tariff may change frequently e.g. in response to changing traffic patterns. However, we distinguish between two kinds of change, namely replacing a tariff and adjusting a tariff. The former implies substitution of an old tariff by a new one, whilst the latter involves ‘tuning’ an existing tariff. We envisage that tariff adjustments will occur more frequently than tariff replacements. There is a clear distinction between ‘tariff’ and ‘price’. A tariff is responsible for determining a price with respect to a set of given contextual parameters. It is therefore possible for a price to change without there being a change to the tariff that determines it. For example, a traditional PSTN tariff might offer one price for peak hours, and another price for off-peak hours. Here price varies according to the time of day, but the tariff remains constant. Continuing with this example, a tariff adjustment might involve changing the off-peak price, or perhaps the times of day at which off-peak is considered to start. If an altogether different tariff structure is required e.g. due to the introduction of a new discount scheme, then a tariff replacement is required. It is a goal of the model to allow maximum flexibility in the structure of tariffs. Ideally it should be possible for tariffs to be modelled on complex rules. For example, a provider may wish to deploy a tariff for best-effort traffic which operates such that customers are penalized if their systems do not back off in the presence of congestion. It is also desirable to put as much intelligence as possible into tariffs, so as to avoid frequent transmission of tariff changes to customers. 
This is particularly relevant at times when network congestion is high, in which case a tariff should be capable of making price adjustments without having to receive explicit instruction from the provider.

3.2 Supply and Demand Management
Our model allows on-the-fly changes to prices, and can be used to support supply and demand management wherein prices fluctuate on the basis of current demand. This concept may come across as too radical to some. However, it is worth pointing out that many people are quite happy to take out variable-rate mortgages, or to invest in the stock market. And just as other people pay a fee for a fixed-rate mortgage, or are prepared to commit themselves to a safer long-term savings plan, it is quite conceivable that customers will be prepared to pay for their price to be kept fixed, or for price variations to be constrained in accordance with some pre-defined contract. The notion that there may be different charging schemes for a given service category leads us to the concept of a product. For example, a product A might offer best-effort service at a fixed price, whilst another product B might offer best-effort service at a variable price. It is envisaged that a provider will adjust product prices on the basis of observations it makes with respect to:
– the prices it is being offered by its own providers;
– competitors' prices;
– current resource utilisation;
– relative demand for different products, e.g. the price for a particular product might be lowered so as to entice users to switch to it.
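As an example of price variation, a variable-price product of the kind just described (product B) might tie its per-packet price to locally observed congestion, so that demand is curbed when resources are scarce. The linear form and the constants below are illustrative assumptions:

```java
/** Sketch of a variable-price product: the per-packet price rises with
 *  locally measured congestion, curbing demand when resources are scarce
 *  and stimulating it when they are under-utilised. */
class CongestionSensitivePrice {
    private final double basePrice;         // price per packet, uncongested
    private final double congestionFactor;  // how steeply price rises

    CongestionSensitivePrice(double basePrice, double congestionFactor) {
        this.basePrice = basePrice;
        this.congestionFactor = congestionFactor;
    }

    /** congestion in [0,1], e.g. measured as a function of packet drop. */
    double pricePerPacket(double congestion) {
        return basePrice * (1.0 + congestionFactor * congestion);
    }
}
```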
Price adjustments can be effected in one of three ways:

– A tariff may be able to adjust prices on the basis of observations made by local monitoring, without necessitating explicit communication from the provider. This requires foresight at the time the tariff is designed, and is limited to those price variations which depend exclusively on observations local to the customer system.
– The provider may tune a tariff by adjusting some of its parameters. This kind of adjustment is required when the decision depends on observations which cannot be made by customer systems, e.g. variations in the prices offered to the provider by its own providers, and the changes required can still be accommodated by the present tariff.
– The provider may replace a tariff. This is required when the present tariff cannot accommodate the changes that are required.

The first of these is by definition an automated decision. The second may be performed either manually or by an agent that issues adjustments on the basis of observations made by the provider system. The third is likely to be performed manually, as the replacement of a tariff represents a major change in business strategy. In particular, creation of a new tariff involves an element of design which can only sensibly be carried out by a human with expertise in economics. However, given the availability of a repertoire of tariffs, an agent might be employed to switch tariffs for a product automatically on the basis of a set of specified rules.

Given the possibility of frequent, on-the-fly changes, it is important that customers have some way of knowing what is going on. It is difficult to construct a customer user interface that can convey the workings of a tariff if the tariff is not known at the time the customer software is deployed. It is therefore desirable that the rules that define tariffs are accompanied by user interfacing suggestions that can be used by the customer system.
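The second mechanism, provider-issued tuning of an existing tariff's parameters, might be sketched as follows; the parameter name and the map-based adjustment message are assumptions of ours:

```java
import java.util.Map;

/** Sketch of tariff tuning: the provider ships new parameter values
 *  rather than a whole new tariff class. */
class TunableLinearTariff {
    private volatile double pricePerPacket;

    TunableLinearTariff(double pricePerPacket) {
        this.pricePerPacket = pricePerPacket;
    }

    double getCharge(long packetsIn, long packetsOut) {
        return (packetsIn + packetsOut) * pricePerPacket;
    }

    /** Apply a provider-issued adjustment message. */
    void adjust(Map<String, Double> params) {
        Double p = params.get("pricePerPacket");
        if (p != null) pricePerPacket = p;
    }
}
```

A tariff replacement, by contrast, would ship a new class entirely; tuning only changes the numbers inside the structure the customer already holds.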
4 Implementation
This section describes a prototype that we implemented to demonstrate the tariff subsystem outlined above. The key features of our design include:
This is intended for feedback purposes only. We envisage that the customer system will also employ an agent (potentially supplied by a regulator) to monitor tariffs for the purposes of verifying that they are within the bounds of the contract.
– using mobile code to represent tariffs and associated graphical user interface (GUI) components;
– use of a repeated multicast announcement protocol to communicate tariffs and tariff adjustments efficiently;
– using dynamic class loading and reflection in order to receive and tune tariffs.

The prototype comprises two applications, namely:

– a provider system which allows the provider to introduce, replace, and tune tariffs for a number of products;
– a customer system that enables customers to keep track of the charges being applied for the products they are using.

The provider system is intended to serve multiple instances of the customer system running on different hosts in a multicast-enabled network. A multicast protocol is used to communicate tariff data to customer systems.

4.1 Tariff Representation
Fig. 2. UML description of tariff definition framework
In order to maximize flexibility with respect to the definition of tariffs, we chose to represent tariffs using Java classes. This technique also proved useful for supplying custom-built GUI components to support visualisation of tariffs. Figure 2 illustrates the framework within which tariffs are defined. The Tariff interface acts as the base type for all tariffs. It defines a single operation, getGUI(), which returns a Java Swing component that can be incorporated into the customer’s GUI. The intention is that this GUI component will enable the customer to visualise the behaviour of the tariff using the most appropriate user interfacing techniques for that tariff. Interfaces derived from Tariff establish a set of tariff types, each of which is associated with a different set of measurement parameters. These parameters are identified by listing them in the signature of the getCharge() method. For example, the interface RSVPTariff defines getCharge() as receiving an RSVP TSPEC, allowing for the definition
A Dynamic Pricing Framework
of tariffs that compute price on the basis of the characteristics of an RSVP reservation [1]. Another interface, PacketCountTariff, defines getCharge() as receiving measurements of packets in, packets out, and current congestion (typically measured as a function of packet drop), allowing for the definition of tariffs that are dependent on packet counts and sensitive to congestion. Tariffs are defined by providing implementations of tariff interfaces. For example, PacketCountLinear implements PacketCountTariff to compute charges in proportion to packet counts. CongestionSensitiveLinear works on a similar basis, but adds a penalty charge if the customer does not stay within specified traffic limits in the presence of congestion. A tariff implementation may make use of other ‘helper’ classes to assist it in its operation, as well as one or more GUI component classes for customer visualisation purposes. A GUI may also be required to enable the provider to make tariff adjustments. A complete tariff description, then, consists of a set of Java classes, some of which are destined for the customer system and others for use by the provider system. The customer-side classes are bundled into a Java JAR file to facilitate loading by the provider system.
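As a sketch, the interfaces and one implementation described above might look as follows in Java (the interface and class names follow the text, but the exact method signatures and Swing details are our assumptions):

```java
import javax.swing.JComponent;
import javax.swing.JLabel;

// Base type for all tariffs: exposes a GUI component for customer-side visualisation.
interface Tariff {
    JComponent getGUI();
}

// A tariff type: the measurement parameters are listed in getCharge()'s signature.
interface PacketCountTariff extends Tariff {
    double getCharge(long packetsIn, long packetsOut, double congestion);
}

// A concrete tariff: charge is proportional to packet counts.
class PacketCountLinear implements PacketCountTariff {
    private final double ratePerPacket;

    PacketCountLinear(double ratePerPacket) {
        this.ratePerPacket = ratePerPacket;
    }

    public JComponent getGUI() {
        return new JLabel("Linear packet-count tariff");
    }

    public double getCharge(long packetsIn, long packetsOut, double congestion) {
        return (packetsIn + packetsOut) * ratePerPacket;
    }
}
```

CongestionSensitiveLinear would extend this pattern, using the congestion parameter to add a penalty when traffic limits are exceeded.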
4.2 Tariff Dissemination and Adjustment
In order to deploy a new tariff, the provider system first loads the tariff classes which it requires into its execution environment. It then loads the customer-side bundle, serializes it, signs it with a private key (to enable authentication by customers), and uses an announcement protocol to distribute it to customer systems. Upon receiving the bundle, each customer system verifies the signature, unpacks the bundle, and loads the classes into its execution environment using a purpose-built dynamic class loader. An instance of the received tariff class is created and installed in place of the previous tariff. If the tariff has a GUI component (obtained by calling the tariff object’s getGUI() method), then it replaces the GUI of the previous tariff. The change in GUI serves to notify the user that the tariff has changed. Tariff adjustment involves the remote invocation of an operation which is specific to the tariff currently in force. This means that a customer system cannot know the signature of this operation in advance of receiving the tariff, i.e. the operation will not be listed in any of the tariff interfaces known to the customer system. To get around this problem, use is made of the reflection feature supported by Java. To disseminate a tariff adjustment, the provider creates an instance of an Invocation object, which stores the name of the operation to be called, together with the parameters that are to be supplied to it. This object is then serialized, signed, and announced using the announcement protocol. When an adjustment is received and verified by a customer system, the Invocation object is de-serialized and applied to the current tariff by using reflection to invoke the described operation. In order to simplify the announcement protocol, adjustments are required to be idempotent and complete. Idempotency guarantees that a tariff will not be adversely affected if an adjustment is applied more than once. Completeness
implies that an adjustment determines the entire parameter set of a tariff object, so that an adjustment completely removes the effect of any previous adjustments.
4.3 Tariff Application
The customer system applies a tariff by invoking the getCharge() operation supported by that tariff once per second, and adding the returned value to the cumulative charge. The parameters supplied to getCharge() depend on the kind of tariff currently in force. For example, if the tariff is an implementation of PacketCountTariff, then measurements of inbound packets, outbound packets, and congestion over the past second are required. However, if the tariff is an implementation of RSVPTariff, then only a TSPEC describing the current reservation is required [3]. Each invocation of getCharge() also results in an update to the tariff-specific GUI; e.g. in CongestionSensitiveLinear, the usage parameters supplied to getCharge() are used to update the graphical displays of traffic and congestion.
4.4 Announcement Protocol
The announcement protocol is used to communicate serialized tariffs and adjustments from a provider system to multiple customer systems. The number of customer systems is assumed to be large, and a repeated multicast solution in the vein of SAP [10] is adopted. Each product supported by a provider is assigned a multicast channel for announcement purposes. Customer systems listen to the channels corresponding to the products that they are using. For each product channel, the provider repeatedly announces the current tariff and the most recent adjustment made to it (if any). Each announcement carries a version number, which is incremented each time the announcement is changed. Customer systems only process announcements when a version number change is detected. If a new customer joins a channel, it waits until it receives a tariff before processing any adjustment announcements. Furthermore, an adjustment is only applied if its announcement version is greater than that of the current tariff, thereby ensuring that a missed tariff announcement does not result in the application of a subsequent adjustment to an old tariff.
4.5 Illustration
Figure 3 shows the GUI for the customer system with the GUI component for the CongestionSensitiveLinear tariff embedded within it. The latter displays information about traffic and congestion levels, and indicates traffic limits which must be observed when congestion is above a specified threshold. The formula used to compute the current price is displayed in the bottom right corner. In the
[3] Mention of this tariff is intended purely for illustration purposes, and does not necessarily represent a realistic or sensible way to charge for RSVP reservations.
Fig. 3. Customer interface, with the tariff-specific GUI component embedded.
case depicted, congestion is above the threshold and the incoming traffic level is above its limit, with the result that a penalty of 0.2 is added to the price. The upper part of the customer GUI displays the current charge being applied (per second), and the total charge accumulated by the customer. The GUI also allows the user to specify the public key to be used for authentication purposes, and shows details of the multicast address being listened to for announcements. The provider system GUI consists of a set of product windows, each allowing control over a particular product. Figure 4 shows the provider-side window corresponding to the product being used by the customer system shown earlier. The lower part of the window contains the provider-side GUI component for the CongestionSensitiveLinear tariff. Using this interface, the provider can manually adjust the parameters associated with the current tariff. Any adjustments are communicated to customer systems using the announcement protocol, and are immediately reflected in customer-side GUI components. The upper part of the product window allows the provider to replace the tariff currently associated with that product. Each tariff is fully described by a policy, which contains such details as a tariff name, a descriptive string, and more importantly the location of the JAR file on the provider’s file system. Once a new policy has been selected, the ‘Activate’ button injects the corresponding tariff into the network, instantly replacing the existing tariff for that product.
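An adjustment such as the one triggered here travels as the Invocation object described in Section 4.2. A rough sketch of how it might be represented and applied via reflection (the Invocation name comes from the text; the method-matching logic and the DemoTariff class are illustrative assumptions, not the prototype's actual code):

```java
import java.io.Serializable;
import java.lang.reflect.Method;

// An adjustment: the name of the operation to invoke on the current tariff,
// together with the parameters to supply to it.
class Invocation implements Serializable {
    private final String operation;
    private final Object[] args;

    Invocation(String operation, Object... args) {
        this.operation = operation;
        this.args = args;
    }

    // Apply the adjustment to the current tariff using reflection; the
    // operation is not listed in any tariff interface known in advance.
    void applyTo(Object tariff) throws Exception {
        for (Method m : tariff.getClass().getMethods()) {
            if (m.getName().equals(operation) && m.getParameterCount() == args.length) {
                m.invoke(tariff, args);
                return;
            }
        }
        throw new NoSuchMethodException(operation);
    }
}

// Hypothetical tariff with one tunable parameter, for demonstration only.
class DemoTariff {
    public double rate = 0.0;
    public void setRate(double r) { rate = r; }
}
```

Because adjustments are required to be complete, a real adjustment would set every parameter of the tariff, not just one.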
Fig. 4. Product window (in provider GUI), with the tariff-specific GUI component in the lower part.
5 Further Work
To date we have focused primarily on the establishment of general principles related to dynamic pricing, and the technical infrastructure required to support these principles. However, we have not identified those specific configurations of the framework which are economically viable or socially acceptable. This can only be achieved by a combination of rigorous modelling and experimentation with user trials. We intend to continue to develop the existing infrastructure into a testbed for experimenting with different tariffing schemes and for conducting user trials in order to gain experience with relevant human factors issues. We are currently working on improving a number of aspects of our implementation, particularly with respect to the announcement protocol. Currently it makes a number of assumptions which are not valid in general. For example, it assumes that a tariff will fit in a single datagram. At best this leads to a packet fragmentation problem, but at worst it means that larger tariffs cannot be announced. There is also a problem in that well-known announcement addresses are expected to be known in advance by customers. This does not give the provider any flexibility with respect to channel assignments; e.g. the provider may wish to move a product to share a channel with another product if it observes that a large number of customers are using both products simultaneously. Last but not least, there are a host of timing issues which need to be addressed, e.g. working with multiple independent physical clocks.
6 Concluding Remarks
The dynamic pricing framework described in this paper demonstrates an active network approach to demand and supply management of network resources. This is relevant to the debate over whether overprovisioning is likely to be more cost-effective than rationing of resources in a multi-service network [12,5]. By lowering the operational cost of usage-based charging, and by providing an infrastructure
within which resource rationing mechanisms can be adjusted and fine-tuned as required, many of the arguments against resource rationing are invalidated. Additionally, our experience with the dynamic pricing framework has interesting implications for the general areas of active networks and mobile code. One important point relates to the fact that active networks need not rely solely on the processing capacity of provider equipment. In particular, for applications with high processing loads, it may be possible to shift much of this load right up to the edge of the network, using multicast technology for efficient deployment of mobile code. Furthermore, it may be possible to exercise some control over core network elements as a side-effect of mobile code deployed to the edge of the network. In this respect, the dynamic pricing framework provides an interesting contrast to mainstream thinking on active networks, where the emphasis is normally on deploying mobile code to network routers. Another point relates to the well-known security problem concerning the protection of mobile code from malicious or erroneous execution platforms. The sample-based auditing approach adopted in our charging model does not represent a complete solution to this problem, but is a reasonable compromise which can detect some cases of abuse whilst acting as a deterrent in general. The class loader used for deployment of mobile code in our implementation differs substantially from other approaches to dynamic loading of remote classes in Java, as exemplified by Bursell et al. [7]. Instead of loading each class individually from a remote class repository using a request-reply ‘pull’ protocol, we employ a ‘push’ approach in which a bundle of classes is delivered to the receiver in a single transaction. The receiver can then load all the classes without having to access the network.
This is useful in situations where the sender determines which classes the receivers should be loading, and has the advantages that the effects of network latency are minimized, and that multicast may be employed to push bundles to several receivers simultaneously.
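A minimal sketch of such a ‘push’ class loader, assuming the bundle has already been delivered, verified, and unpacked into a map of class names to bytecode (the class name BundleClassLoader is our invention):

```java
import java.util.HashMap;
import java.util.Map;

// A 'push'-style class loader: the whole bundle of class files arrives in one
// transaction (e.g. via multicast), and classes are then resolved locally
// without any further network access.
class BundleClassLoader extends ClassLoader {
    private final Map<String, byte[]> bundle; // class name -> class file bytes

    BundleClassLoader(Map<String, byte[]> bundle, ClassLoader parent) {
        super(parent);
        this.bundle = new HashMap<>(bundle);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = bundle.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name); // not in the pushed bundle
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Because findClass is only consulted after delegation to the parent loader fails, standard classes still resolve normally; only the pushed tariff classes come from the bundle.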
References
1. S. Berson, R. Lindell, and R. Braden. An architecture for advance reservations in the internet. Technical report, USC Information Sciences Institute, July 1998.
2. S. Blake et al. An architecture for differentiated services. Request for Comments 2475, Internet Engineering Task Force, December 1998.
3. Roger Bohn et al. Mitigating the coming internet crunch: multiple service levels via precedence. Technical report, University of California, San Diego, November 1993.
4. R. Braden et al. Integrated services in the internet architecture: an overview. Request for Comments 1633, Internet Engineering Task Force, June 1994.
5. Lee Breslau and Scott Shenker. Best-effort versus reservations: a simple comparative analysis. In Proceedings of SIGCOMM ’98, Vancouver, 1998.
6. Bob Briscoe et al. Lightweight, end to end usage-based charging for packet networks, 1999. http://www.labs.bt.com/projects/mware/charging.htm.
7. M. H. Bursell et al. A mobile object workbench. In Mobile Agents ’98, 1998.
8. David D. Clark. A model for cost allocation and pricing in the internet. In MIT Workshop on Internet Economics, March 1995.
9. Jon Crowcroft. Pricing the internet. In IEE Colloquium on Charging for ATM (ref. no. 96/222), pages 1/1–4, November 1996.
10. M. Handley. SAP: Session announcement protocol. IETF draft, November 1996.
11. Frank P. Kelly. Charging and accounting for bursty connections, pages 253–278. MIT Press, 1997.
12. Andrew Odlyzko. The economics of the internet: utility, utilization, pricing, and quality of service. In Proceedings of SIGCOMM ’98, Vancouver, 1998.
Active Information Networks and XML

Ian Marshall¹, Mike Fry², Luis Velasco¹, and Atanu Ghosh²

¹ BT Labs, Martlesham Heath, Ipswich, IP5 3RE
[email protected]
² UTS, Sydney, NSW 2007, Australia
[email protected]

Abstract. Future requirements for a broadband multimedia network are discussed and a vision of the future network is presented. Three key needs are identified: rapid introduction of new services, dynamic customisation of services by clients, and minimal management overhead. Application layer active networking, perhaps the most pragmatic and immediately realisable active network proposal, is a potential solution to all three. Combining the eXtensible Markup Language (XML) with application layer active networking yields strong benefits for networked services. A wide range of applications can be developed based on the flexibility of XML and the richness of expression afforded by the metadata. A system of network intermediaries based on caches, which are also active and driven by XML metadata statements, is described.
1 Introduction
The characteristics and behaviour of future network traffic will be different from the traffic observed today, generating new requirements for network operators. Voice traffic will become another form of data, most users will be mobile, the amount of traffic generated by machines will exceed that produced by humans, and the data traffic will be dominated by multimedia content. In the medium term the predominant multimedia network application will probably be based around electronic commerce capabilities. Operators will therefore need to provide a low cost service, which offers an advanced global trading environment for buyers and sellers of any commodity. The e-trading environment will be equipped with all the instruments to support the provision of a trusted trading space. Most important is the ability to support secure transactions over both fixed and mobile networks. Networks will thus need to be robust, contain built-in security features, and be sufficiently flexible to address rapidly evolving demands as other unforeseen applications become predominant. Existing networks are very expensive, and the deployment of new communication services is currently restricted by slow standardisation, the difficulties of integrating systems based on new technology with existing systems, and the overall system complexity. The biggest cost is management. The network of the future will
need to be kept as simple as possible by using as few elements as possible, removing duplication of management overheads, minimising signalling, and moving towards a hands-off network. The simplest (and cheapest) current networks are multiservice networks based on powerful ATM or IP switches. New transport networks are designed on the basis that nearly all applications will eventually use internet-like connectionless protocols. The difficulties of adding services such as multicast and QoS to the current internet demonstrate that even these simpler IP-based networks will require additional mechanisms to enhance service flexibility. The simple transport network will thus need a flexible service surround. The service surround will provide a trusted environment, with security features (for users, applications and hosts), QoS support, application-specific routing, and automatic registration and upgrade for devices connected to the transport network. It will also enable network computing facilities such as secure gateways, application layer routers, cache/storage facilities, transcoders, transaction monitors and message queues, directories, and profile and policy handlers. Such a service surround will likely be based on some form of distributed middleware, enabling the features to be modular and interoperable. The service surround must enable rapid introduction of new features by the operator. In order to minimise the management overhead, clients will directly control which features should be used for a particular session, without operator intervention. The network will thus need to know nothing of the semantics of the session. To achieve this, a middleware based on some form of active services or active networks will be required.
2 Active Networks (Tennenhouse)
Active networking was originally a proposal by Tennenhouse at MIT [TENN] to increase network flexibility by adding programmes to the packet header, intended to run on the network devices that the packet encounters. This is referred to as the capsule approach. There are a number of problems:
– The maximum transmission unit (MTU) size in the internet is typically 576 bytes. This will likely be upgraded to 1500 bytes in the near future; however, it is clear that if there is to be a programme embedded in every packet the programmes must be very small, even if the programme is not confined to the header. This severely restricts the flexibility that can be offered, although it has been shown that copy instructions to emulate multicast can be embedded in packet headers in some circumstances.
– It has been proposed that only those packets initiating flows should carry programmes. However, it is common for the packets in an individual flow (such as a document retrieval) to use multiple routes across the internet. This is a result of the routers having the freedom to use the best available route at any time, so as to maximise network resilience. Therefore, in order for a programme to be applied to all packets in a flow, either the route for all subsequent packets in the flow must be pinned, so that all packets flow through the node where the programme was loaded, or the programme must be copied to all nodes on valid routes. The second option is clearly impractical. The first option is currently not possible, and in any case creates an undesirable reduction in network resilience.
– The proposal envisages programmes being supplied by network clients. However, service operators will never permit third party programmes to run on their equipment without a strong guarantee that the programme will not degrade performance for other users. Such a guarantee requires the programmes to be written in a language in which behaviour is verifiable through pre-run checks, resource usage can be tightly controlled, and termination is guaranteed. The Safetynet project [WAK] at Sussex University in the UK is designing a promising language, but the research is still at a very early stage. Since it will be extremely hard to create interesting programmes in a language which is simple enough to enable resource control and termination guarantees, the flexibility offered by this approach is probably somewhat limited, even when the language is mature.
– The programmes are intended to be added to the switch control kernel in the router. All the known approaches to making the kernel extensible degrade performance. Packets which do not require router programming will thus suffer an unacceptable performance penalty.
– Undesirable interactions between programmes and network features are almost impossible to predict and control. For example, a mobile client may send programmes to several routers where they are used once, but then never receive the acknowledgement packets that would terminate the programmes, as the acks are routed to the client’s current location.
– Standards for the interface offered by active routers must be developed before any service based on this proposal could be offered. Appropriate standards are not even being discussed at present.
Despite the manifest difficulties inherent in this proposal, it has succeeded in highlighting an important requirement, and in stimulating discussion amongst a previously disparate community of researchers attempting to develop more immediately realisable means to meet the requirement.
The main threads are summarised in the next section.
3 Active Networks and Services
The first response to Tennenhouse was a somewhat different flavour of active networking, in which packets carry not programmes but transport layer header flags indicating the desirability of running a programme [ALEX97]. This approach attempts to resolve the issue of restricted programme size, and potentially gives network operators the freedom to choose an appropriate, tested programme of their own. However, the proposal makes no progress on the last three issues, and the range of flags cannot be large as the space available in the transport layer header is tiny. This proposal has influenced the IETF diffserv activity, which is enabling QoS in IP networks by adding flags to transport headers, one of which is an active tag. The second response came from the programmable network community, who had for some time been looking to make networks more active with respect to operators [LAZ] and had progressed to a concept called “switchlets” [ROO] in order to avoid requiring operator intervention for all programmable changes. Switchlets enable clients to control their own VPNs by downloading their own control software onto a designated subset of the switch. This is only a partial solution as it only provides
flexibility within a VPN for a single large customer. Programmable interfaces are being standardised in IEEE P1520 [BIS]. Smith and co-workers at the University of Pennsylvania and Bellcore are working on a proposal (SwitchWare [ALEX98]) which combines programmable packets, switchlets and a safe language (PLAN). This could be regarded as a realisable version of Tennenhouse, but only in the long term. We have proposed a third alternative, known as application layer active networking [ALAN], which is perhaps the most immediately realisable. Similar proposals [AMIR,PARU] were described as active services. In these systems the network is populated with active nodes referred to as service nodes, or dynamic proxy servers. These can be thought of as equivalent to the http caches currently deployed around the internet, but with a hugely increased and more dynamic set of capabilities. They are logically end systems rather than network devices. This approach relies on redirecting selected packets into an application layer protocol handler, where user space programmes can be run to modify the content or the communication mechanisms. Packets can be redirected using a single active packet tag in the transport layer header, or on the basis of the mime type in the application layer header. There is no need for additional flags or for any new standards (indeed many implementations use the ubiquitous http), and an arbitrarily large number of programmes, of arbitrary size, can be used. The programmes can be selected from a trusted data source (which may itself be a cache) containing only well tested or specified programmes, and can be run without impacting router performance or requiring operator intervention, since they do not affect the control kernel for normal packets. Programmes can be chosen to match the mime type of the content (in the application layer header), so again no additional data or standards are required.
Alternatively, a more detailed specification can be supplied in XML metadata, if desired. There is a small performance penalty associated with the redirect operation, but this is acceptable for most applications. The major outstanding issue is the interaction between dynamically loaded programmes, and this should be a priority for ongoing research. There is a further proposal [CAO] that allows servers to supply cache applets attached to documents, and requires proxies to invoke the cache applets. Although this provides a great deal of flexibility, it lacks important features like a knowledge sharing system among the nodes of the network (it only allows interaction between the applets placed in the same page). The functionality is also severely restricted by the limited tags available in HTML (HyperText Markup Language). Most importantly, the applets must be supplied by the content source and cannot necessarily be trusted. Clients do not have the option of invoking applets from trusted third party servers. Using mime-types (as in ALAN) provides more flexibility than HTML tags, but still restricts the range of applications that can be specified by content providers, as different operations are often required for content with identical mime types. It is therefore necessary to find a better way to specify new services. XML provides a very promising solution, since the tags are extensible and authors can embed many different types of objects and entities inside a single XML object. For example, policies describing resource and security requirements can be expressed and transferred with the object. In this paper we present a design for a modified ALAN based on XML, and describe how it could be used to provide a customer-driven QoS
routing capability. We also demonstrate the feasibility of using XML by implementing and measuring a simple example service.
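The mime-type based selection of a programme at an active node, as described above, might be sketched as follows (the class and programme names are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of mime-type based programme selection at an application layer
// active node: each mime type maps to the name of a well-tested programme
// held by a trusted source; unmapped types pass through untouched.
class ProgrammeSelector {
    private final Map<String, String> handlers = new HashMap<>();

    void register(String mimeType, String programmeName) {
        handlers.put(mimeType, programmeName);
    }

    // Returns the programme to run for this content, or null to indicate
    // that no redirect is needed and the content should pass through.
    String select(String mimeType) {
        return handlers.get(mimeType);
    }
}
```

The limitation noted in the text is visible here: two objects with the same mime type always select the same programme, which is what the XML metadata of the next section is intended to overcome.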
4 ALAN and XML
Our design is built in several layers and is based on existing technology. Figure 1 shows the architecture of the prototype. The first layer is a fully populated cache hierarchy, with caches placed at all domain boundaries and network bottlenecks. We envisage active nodes and caches being co-located since the optimal sites for most of the activities proposed for active networks are domain boundaries. In addition it is advantageous to maintain a cache of programmes required by the active services at an active node and a web cache is a convenient implementation. For the prototype, we have used squid v1.19 [WESS] for the cache. The second layer and the upper layers constitute the core of our system and will be discussed thoroughly within this paper. An Application Layer Active Network Platform (ALAN) implements the active services. One of these services is an XML parser that provides the functionality to handle metadata associated with objects.
Fig. 1. Architecture of the prototype. The bottom layer is the cache network (squid); above it sits the active network platform (ALAN), then the XML parser and the active services, which together form the XML active network.
The ALAN Platform is a Java RMI based system built by the co-authors from the University of Technology, Sydney in collaboration with BT-Labs to host active services. It provides a host program (Dynamic Proxy Server) that will dynamically
Active Information Networks and XML
65
load other classes (Proxylets) that are defined with the following interface: Load, Start, Modify, Stop [ALAN]. The platform provides a proxylet that analyses the HTTP headers and extracts the mime-types of the objects passing through the machine (HTTP Parser). After determining the mime-type of the object, the program chooses a content handler, downloads the appropriate proxylet from a trusted host to handle that mime-type, and starts the proxylet with several parameters extracted by the HTTP parser. Using this model, a wide range of interesting services can be provided. However, this original model cannot support the whole range of services we plan to implement. There is a need for additional data (not included in the HTTP headers) to manage interoperability among the services and to expand the flexibility and range of applications that can be developed. XML provides a mechanism to implement these improvements and appears a perfect complement to the architecture.
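The Load, Start, Modify, Stop life cycle and the DPS role described above might be expressed along these lines (a sketch; the signatures and the DynamicProxyServer bookkeeping are our assumptions, and the real ALAN interface may differ):

```java
import java.util.HashMap;
import java.util.Map;

// The four-operation proxylet life cycle: load, start, modify, stop.
interface Proxylet {
    void load(Map<String, String> parameters);   // initialise with parameters from the HTTP/XML parser
    void start();
    void modify(Map<String, String> parameters); // adjust a running proxylet
    void stop();
}

// A minimal Dynamic Proxy Server: tracks running proxylets by name.
class DynamicProxyServer {
    private final Map<String, Proxylet> running = new HashMap<>();

    void launch(String name, Proxylet p, Map<String, String> parameters) {
        p.load(parameters);
        p.start();
        running.put(name, p);
    }

    void shutdown(String name) {
        Proxylet p = running.remove(name);
        if (p != null) {
            p.stop();
        }
    }

    boolean isRunning(String name) {
        return running.containsKey(name);
    }
}
```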
Fig. 2. Functionality of an active node. The active cache machine runs the cache program and a Dynamic Proxy Server hosting the HTTP parser, content handler, and proxylets; proxylets are downloaded from a trusted proxylet server.
The BT co-authors built a simple XML parser in Java that works in collaboration with a new HTTP parser designed to extract all the metadata needed by the active applications. The original HTTP parser [ALAN] has been completely rewritten by BT in order to integrate the XML parser seamlessly into the processing. The functionality of an active node (figure 2) is as follows. Upon the arrival of an object at the node, the HTTP parser examines the header and obtains the object’s mime-type. If the object is an XML object, the XML parser is called and extracts the metadata. The metadata specifies which proxylets should be invoked, in which order, with which parameters, and under what circumstances. The parser then makes the appropriate calls to the DPS, which loads the proxylets.
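As an illustration, metadata naming the proxylets to invoke could be extracted with the JDK's DOM parser along these lines (the element and attribute names are hypothetical; the paper does not specify a schema):

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Extracts the proxylets named in an object's XML metadata, in document
// order (i.e. the order in which they should be invoked).
class MetadataParser {
    static List<String> proxyletsToInvoke(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        List<String> names = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("proxylet");
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }
}
```

In the prototype the resulting list would drive the parser's calls to the DPS, one load per named proxylet.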
5 QoS Routing
Modern networks must optimise the management of communication among nodes that are interconnected by diverse and alternate paths. These paths may be based on heterogeneous technologies with divergent properties, which makes smart path choice an essential feature for providing quality of service (QoS). QoS management can be based on in-band datagrams (e.g. class-of-service based routing) or on out-of-band reservations (flow-based routing); both approaches have their adherents in the network research community. QoS-based routing has been recognised as a missing piece in the evolution of QoS-based service offerings in the Internet and is the subject of a range of standardisation efforts in the IETF, including:
- Integrated Services (QoS) plus ISSLL
- Resource Reservation (RSVP)
- Traffic Engineering
- Differentiated Services (class-of-service based)
An interesting application that highlights many of the issues is the aircraft services application illustrated in figure 3. There are two basic services: communication with the flight deck, and e-commerce/WWW access for the passengers. The aircraft (Node 1) has a 64 kbit/s radio-based bi-directional link to the control tower and a VSAT-based downlink (2 Mbit/s). The control tower can use wide-area connectivity to establish a high-bandwidth link to the aircraft using ATM and satellite combined. Path selection is performed at the control tower by examining the meta-information of the packets, deciding which is the most appropriate path, and adding security if needed. There will also need to be QoS management in the plane to ensure that life-critical information from the flight deck is delivered onto the bandwidth-restricted downlink before any traffic originating from passengers. To illustrate the power of an active information network of the kind we have described, we have designed a QoS routing scheme suitable for the above application.
The scheme requires no new standards and could be implemented entirely in the user space of network nodes. For this application, QoS can be characterised in terms of bandwidth, latency, security, strength of guarantee, and uni- or bi-directionality. Current Internet routing protocols, e.g. OSPF and RIP, use "shortest path routing", i.e. routing optimised for a single arbitrary metric such as administrative weight or hop count. These routing protocols are also "opportunistic", using the current shortest path or route to a destination [CRAW]. Our aim is to enable route decisions to be made on multiple metrics and fixed for the duration of the flow. For example, a node could have a connection via landline with low latency, low bandwidth and high security, and a satellite connection with high bandwidth, high latency and low security. The choice of best route will be application specific. Given access to local route information obtained through link
state adverts, nodes can make QoS decisions if the application requirements are also available. In our design the application requirements are expressed in XML metadata. The XML metadata is rich enough to express the application-layer requirements and force a correct choice. Our scheme also allows return traffic to be correctly routed, using an appropriate proxylet at the head end of the satellite link.
[Figure 3 elements: Node 1 (aircraft), Node 2 (control tower), radio link, ATM link, satellite uplink and downlink, terrestrial link; Plymouth, Brussels.]
Fig. 3. The aircraft application is an excellent example of diverse communication paths based on different physical layers. The aircraft (Node 1) has a radio-based bi-directional link to the control tower, while the control tower can use COIAS wide-area connectivity to establish a high-bandwidth link to the aircraft using ATM and satellite combined. Path selection is performed at the control tower by examining the meta-information of the packets, deciding which is the most appropriate path, and adding security if needed. Users can use meta-information to mark their packets with their requirements for bandwidth, latency, degree of security and degree of guarantee. This meta-information helps the system classify the packets and make a smart path selection, so that packets are routed over the optimum path.
The policy syntax is based on the syntax for IPSEC security policies, where a set of fields is associated with a particular security association (or degree of security). This enables routing policies and security policies to be handled in the same way. For the routing policies the fields are: APP_TYPE, CONTENT_TYPE (MIME), BANDWIDTH, LATENCY, GUARANTEE, DUPLEX. The fields are associated with six tuples, each containing the value of the field required by the associated content and the priority (on a scale of 1-10) of obtaining that value. The QoS router will intercept all socket requests from application-layer processes, read all policies in the policy database relevant to that process, check the current path data and choose the output port which matches the most policy criteria. The criterion weighting is used to distinguish between alternates matching equal numbers of different criteria. Security criteria are regarded as mandatory: if no match is available the QoS router will either
directly request user intervention (in an end system) or will deny the request (in an intermediate node). In the latter case the proxylet will request input from the session source or the preceding active node (which may simply use an alternate route). In an initial implementation the QoS router would be a proxylet, invoked by direct calls from other proxylets requiring QoS routing services. We anticipate that, for performance reasons, if it proved popular it would rapidly be re-engineered as a layered protocol intercepting all socket calls. In a retrieval session the content source would supply an XML object with its policies. The XML object will be parsed at any active nodes in the path, where the routing policies will be extracted and any other necessary proxylets will be started. The QoS router will then be able to choose the best available route for the associated flow. Any downstream active nodes can perform further local route optimisations in a similar manner. For an interactive session the initiator would supply a session definition formatted as an XML object.
[Figure 4 components: DPS, Application/Proxylet, XML Parser, QoS Router, Path Data, Policy Data, TCP/IP and IPSEC.]
Fig. 4. Active QoS routing node.
The design of a QoS routing node is illustrated in figure 4. The path data is information about the QoS available on all local output addresses/ports. It is essentially a local routing table with added fields for measurements of delay, occupancy, loss rates etc. The measurements can be obtained by filtering link state adverts or using ping measurements. The policy data is a collection of policies regarding application requirements, extracted by the XML parser from XML metadata.
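The selection rule described above — pick the output port that satisfies the most policy criteria, using the priority weights to break ties — can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation (which would run as a proxylet); the port names, field values and weights are invented.

```python
# Each candidate output port advertises its measured properties (the
# "path data"); each policy names a field, a required value and a
# priority weight. The router picks the port matching the most
# criteria, with summed weights as the tie-breaker.

def choose_port(paths, policies):
    """paths: {port: {field: value}}; policies: [(field, required, weight)]."""
    best_port, best_key = None, (-1, -1)
    for port, props in paths.items():
        matched = [(f, w) for f, req, w in policies if props.get(f) == req]
        key = (len(matched), sum(w for _, w in matched))
        if key > best_key:
            best_port, best_key = port, key
    return best_port

# Invented path data for the aircraft example: a landline and a
# satellite link with complementary properties.
paths = {
    "landline":  {"BANDWIDTH": "low",  "LATENCY": "low",  "SECURITY": "high"},
    "satellite": {"BANDWIDTH": "high", "LATENCY": "high", "SECURITY": "low"},
}
# A flight-deck flow: low latency and high security matter most.
policies = [("LATENCY", "low", 10), ("SECURITY", "high", 9), ("BANDWIDTH", "high", 2)]
```

With these invented weights the flight-deck flow is steered onto the landline, while a bandwidth-only policy would select the satellite link.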
6 Implementation of an Example & Results
In order to obtain some preliminary performance measurements of the architecture, we implemented advert rotation driven by XML metadata policies. The objectives were to demonstrate the feasibility of XML policies and to expose the weak parts of the implementation so that future releases can be improved.
Studies show that advert banners which are dynamically rotated, so that the same HTML page shows a different advert on each request, are very popular. These dynamically created HTML pages are just slight modifications of an original template page: the changes usually consist of sets of graphics of the same size that appear consecutively in the same position on the page. To achieve this, the server executes a CGI program that generates the HTML text dynamically. This dynamic behaviour tends to make the content un-cacheable. It is preferable to make simple dynamic pages containing rotating banners cacheable, since this allows a distributed service, eliminates the critical failure points and improves the usage of the existing bandwidth.
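A cached rotator proxylet of the kind described could behave roughly like this. This is a Python sketch with hypothetical names; the actual proxylet is a Java program driven by the XML metadata.

```python
# Sketch: the static HTML template is cached once, and each request
# substitutes the next image from the rotation list, so the dynamic
# page never has to be re-fetched from the origin server.
import itertools

class AdRotator:
    def __init__(self, template, images):
        self.template = template          # static HTML with a placeholder
        self.cycle = itertools.cycle(images)
    def serve(self):
        # Each call produces the page with the next advert in rotation.
        return self.template.replace("{{AD}}", next(self.cycle))

rotator = AdRotator("<html><img src='{{AD}}'></html>",
                    ["ad1.gif", "ad2.gif", "ad3.gif"])
```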
[Figure elements: original server, redirected server, Dynamic Proxy Server with HTTP parser and XML Parser Proxylet, Advert Proxylet loaded from the trusted proxylet server on demand of the XML object, final object returned over HTTP.] The XML parser extracts the information embedded in the object: the URL of the object to be treated, the URL of the proxylet needed, and the commands for that proxylet.
Fig. 5. As the object is requested and passes through the active node, it is parsed by the HTTP parser and then by the XML parser. This analysis extracts the generic HTML page that will serve as a static template for the rotated adverts, the list of images to be rotated, and the rotation policy. The information is used as a parameter list in the invocation of a rotator proxylet, which downloads the objects as needed. Subsequent requests for the page are passed to the proxylet by the cache at the active node, and the proxylet executes the rotation policy.
Our experiment consisted of running an active node on a Sun SPARCstation 10 with 64 MB of memory running SunOS Release 5.5.1. The Java version for both programming and testing was JDK 1.1.6. The active node program was started and was already running the proxylets needed for HTTP and XML parsing. We conducted 20 experiments. For the first ten, the whole process ran each time: whenever a new advert rotation was requested, the advert proxylet was loaded. The subsequent ten utilised caching: when a new request arrived, a proxylet that was already running was used. We measured the times needed to accomplish the different tasks. The numerical results of these experiments are shown in figures 6 and 7 below. The analysis shows the times needed to
perform the processes and tasks during the normal operation of the system. The functionality of these processes is as follows:
- HTTP Parsing: time needed to analyse the HTTP header and determine the Mime-Type.
- XML Parsing: time needed to get the XML object, parse it and extract all the embedded metadata.
- URL Download: time needed to download the HTML.
- Proxylet Call: time needed to generate the query to load the proxylet in our active node.
- Proxylet Download: time needed to download and start the proxylet; this takes a long time because of the ALAN platform design.
- Advert Rotation: time needed to perform the demanded task, in this case the advert rotation.
Fig. 6. The graph shows the results of the experiments when the service was cached; the proportion of time spent downloading the proxylet has disappeared. The URL download time can vary depending on the object to be downloaded and the bandwidth to the server. In our testing all the objects were available on our LAN, so we can expect larger values for this part of the process in wide-area tests. However, this increase will only matter for the first request; thereafter the URL object is cached and locally available.
The most important variable is the time due to the additional processing of the proxylets. The XML Parser Proxylet and the Advert Rotator Proxylet appear to take most of the time. Nevertheless, the total delay is below one second. We can expect better results if a faster computer is used as a server, together with a non-interpreted language. However, the purpose of this paper was to demonstrate the feasibility of active caching nodes based on XML, and throughput was not a priority. This prototype shows that it is possible to provide active services with delays of just a few hundred milliseconds.
Fig. 7. The difference in delay between the experiments in which the proxylet had to be downloaded and started and those in which the proxylet was already downloaded and running in the active node.
7 Future Work
In the immediate future we intend to build and test the QoS routing service outlined in this paper. In addition, there are three major longer-term issues on which we are concentrating our efforts. First, it would be beneficial to specify behaviour and other non-functional aspects of the programmes requested in the metadata, using a type-safe specification language. One possibility is to use SPIN, which is C-like and has some useful performance- and time-related primitives. The use of a language of this kind would provide greater flexibility and interoperability, and go some way towards solving our second issue. Second, complex services will require several proxylets to be invoked. At present the question of how multiple proxylets interact, and how interactions such as order dependencies can be resolved, is open. We anticipate using metadata specifications (in the language from the first issue) to maximise the probability of avoiding problems. Third, the performance and scalability of the DPS is currently far from ideal. The Sydney co-authors [ALAN] are addressing these issues with a new implementation, and we anticipate that significant improvements will be available shortly.
8 Conclusions
Application layer active networks will play a crucial role in networked applications that can tolerate a delay of around a hundred milliseconds. They will extend the
functionality and versatility of present networks to cover many future customer needs. HTTP caches help reduce bandwidth usage in the Internet and improve responsiveness by migrating objects and services closer to the client; they are also ideally placed to evolve into the active nodes of the future. XML is a perfect complement to application-layer active networks based on HTTP caches, since it allows active nodes to be driven by enriched metadata requests and at the same time introduces mechanisms for sharing knowledge between nodes. We have implemented a prototype which demonstrates that XML offers greater flexibility and expressivity than HTTP tags or MIME types, without significant performance penalty. The performance of our system is not yet ideal; however, the results can easily be improved by using a non-interpreted language and a more powerful server.
References
[ALAN] M. Fry and A. Ghosh, "Application Layer Active Networking", Fourth International Workshop on High Performance Protocol Architectures (HIPPARCH '98), June 1998. http://dmir.socs.uts.edu.au/projects/alan/prog.html
[ALEX97] Alexander, Shaw, Nettles and Smith, "Active Bridging", Computer Communication Review, 27, 4 (1997), pp. 101-111
[ALEX98] D. S. Alexander et al., "A secure active network environment architecture", IEEE Network, 1998
[AMIR] E. Amir, S. McCanne and R. Katz, "An active service framework and its application to real time multimedia transcoding", Proc. SIGCOMM '98, pp. 178-189
[BIS] J. Biswas et al., "The IEEE P1520 standards initiative for programmable interfaces", IEEE Communications, Oct. 1998, pp. 64-72
[CAO] P. Cao, J. Zhang and K. Beach, "Active Cache: Caching Dynamic Contents (Objects) on the Web", Proc. Middleware '98 (Ambleside)
[CRAW] E. Crawley, R. Nair, B. Rajagopalan and H. Sandick, "A Framework for QoS-based Routing in the Internet", The Internet Society (1998). ftp://ftp.isi.edu/in-notes/rfc2386.txt
[ERIK] H. Eriksson, "MBone - The Multicast Backbone", INET 1993
[LAZ] A. Lazar, "Programming Telecommunication Networks", IEEE Network, Oct. 1997, pp. 2-12
[PARU] G. Parulkar et al., "Active Network Node Project", Washington University, St. Louis
[ROO] S. Rooney et al., "Tempest: A framework for safe programmable networks", IEEE Communications, Oct. 1998, pp. 42-53
[TENN] D. Tennenhouse and D. Wetherall, "Towards an active network architecture", Computer Communication Review, 26, 2 (1996)
[WAK] I. Wakeman et al., "Designing a Programming Language for Active Networks", HIPPARCH '98
[WESS] D. Wessels, "Configuring Hierarchical Squid Caches", AUUG '97, Brisbane, Australia
[XML] Extensible Markup Language (XML), W3C Recommendation, 10 February 1998. http://www.XML.com/aXML/testaXML.htm
Policy Specification for Programmable Networks
Morris Sloman and Emil Lupu
Department of Computing, Imperial College London SW7 2BZ, UK {m.sloman,e.c.lupu}@doc.ic.ac.uk
Abstract. There is a need to be able to program network components to adapt to application requirements: for quality of service, for specialised application-dependent routing, to increase efficiency, and to support mobility and sophisticated management functionality. There are a number of different approaches to providing programmability, all of which are extremely powerful and can potentially damage the network, so there is a need for clear specification of authorisation policies, i.e., who is permitted to access programmable network resources or services. Obligation policies are event-triggered rules which can perform actions on network components and so provide a high-level means of 'programming' these components. Both authorisation and obligation policies are interpreted, so they can be enabled, disabled or modified dynamically without shutting down components. This paper describes a notation and framework for specifying policies related to programmable networks and grouping them into roles. We show how abstract, high-level policies can be refined into a set of implementable ones and discuss the types of conflicts which can occur between policies.
1 Introduction
Networks have to become more adaptable to cater for the wide range of user devices, from powerful multimedia workstations to hand-held portable devices. A convergence is taking place between telecommunications and computing, so networks are increasingly being used to transport voice, video and fax as well as data traffic. Future personal digital assistants will include mobile phones, and Web-enabled mobile phones are beginning to appear. There is a need to reconcile the perspectives of the telecommunication and computing communities in new dynamically programmable network architectures that support fast service creation and resource management through a combination of network-aware applications and application-aware networks.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 73-85, 1999. Springer-Verlag Berlin Heidelberg 1999
It is necessary to be able to dynamically program the resources within a network to permit adaptive quality of service management, flexible multicast routing from multiple sources for applications such as video conferencing, intelligent caching and load distribution for Web servers, or to perform compression and filtering when traversing low-bandwidth wireless links. These types of application-specific functions need to be dynamically programmed within the network components in order to support flexible and adaptive networks. The main objective is to speed up the slow evolution of network services by building programmability into the network infrastructure itself [1]. There are a number of approaches to supporting programmable networks:
Active Networks – the packets traversing the network contain normal data plus programs which may invoke switch and router operations [2]. Example uses include setting up multicast routing groups or fusion of data from many different sensors into larger messages to traverse the network to the data sink. This is essentially programming at the IP level and is often limited to routing or filtering. It has inherent security risks, which can be alleviated by the use of 'safe' languages or by executing the programs in a controlled environment such as an associated processor rather than the main processor within a network component.
Mobile Agents – agents containing code and state information traverse multiple nodes within a network in order to perform functions on behalf of users, e.g., an email-to-voice converter which follows a mobile phone user [3]. This type of programming is generally associated with hosts or servers connected to the network rather than switches or routers, but could also be used to set up specific routing tunnels [4].
Management Interface – network components provide a management interface which facilitates a limited form of programming of components by invoking operations to change their behaviour [5]. This is really provided for the use of network managers, but some operations may be made available to managers of value-added, third-party service providers or even user applications. For example, there could be service creation and service operation interfaces to support various virtual network, multicast or multimedia services. The IEEE is standardising an Applications Programming Interface for Networks [http://www.ieee-pin.org/].
Management by Delegation – a means of downloading management code to be executed within network components to perform functions such as complex diagnostic tests on specific nodes [6]. This is an extension of the Management Interface approach, as it supports remote execution of code rather than just remote operation invocation. Code delegation is usually performed by network managers, but could be used to load specific filtering or compression code onto an access gateway on behalf of an application or user. The advent of Java has made it easier to implement portable 'elastic agents' into which code can be loaded dynamically.
Interpreted Policy – there has been recent interest in bandwidth management policies which specify who can use network resources and services based on time of day, network utilisation or application-specific constraints [7]. Most of the previous work on policy has been related to management of distributed systems and networks [8,9]. Authorisation policies specify what actions a subject is permitted or forbidden to perform on a set of target objects. Obligation policies specify what actions must be performed by a subject on a target. Policies can be used to modify the behaviour of network components and so can be considered a 'constrained' form of programming [8].
There is no single universal solution to the programmability of networks, and the various approaches can be used to perform complementary functions, although there is some overlap between them, as a particular functionality could be implemented using more than one approach. In addition, these are all very powerful facilities which can easily destroy the normal working of the network, so it is necessary to specify authorisation policies to define who can program specific components and what programming operations they can access. Obligation policies are event-triggered rules which result in actions being performed. This can be considered a 'constrained' form of programming, in that policies can be dynamically modified but can only call predefined actions. Policies can be used to define the event conditions and constraints for invocations on a management interface, or for loading or executing code in an elastic agent. Thus, policies are complementary to the other approaches described above. This paper focuses on the specification of policies for the adaptability and security needed in programmable networks. Section 1 outlines how objects can be grouped in domains in order to apply a common policy. Sections 2 and 3 discuss the policy notation and implementation, followed by some of the conflict detection and resolution issues in section 4. Section 5 introduces roles as a means of grouping policies which specify the rights and duties of managers. Policies for the configuration and management of network devices are not specified in isolation but derived from business objectives and requirements, so section 6 addresses the refinement of policies from an abstract description to implementable rules. Related work and conclusions are presented in sections 7 and 8.
1 Domains & Directories
In large-scale systems it is not practical to specify policies for individual objects, so there is a need to group the objects to which a policy applies. For example, a bandwidth management policy may apply to all routers within a particular region or of a particular type; an authorisation policy may specify that all members of a department have access to a particular service. Domains provide a means of grouping objects to which policies apply and can be used to partition the objects in a large system according to geographical boundaries, object type, responsibility and authority, or for the convenience of human managers [8,10]. A domain does not encapsulate the objects it contains but merely holds references to object interfaces. It is thus very similar in concept to a file system directory, but may hold references to any type of object, including a person. A domain which is a member of another domain is called a sub-domain of the parent domain. Objects and sub-domains may be members of multiple parent domains and may have different local names in each of them. For example, in Fig. 1, the two 'bean people' and sub-domain E are members of both the B and C domains, which therefore overlap. Details of domains are described in [8,10].
[Figure 1 panels: (left) Sub-Domains and Overlapping Domains, showing domains A-E with B and C overlapping; (right) Domain Hierarchy (without member objects).]
Fig. 1. Domains
Path names are used to identify domains, e.g., domain E can be referred to as /A/B/E or /A/C/E, where ‘/’ is used as a delimiter for domain path names. Policies
normally propagate to members of sub-domains, so a policy applying to domain B will also apply to members of domains D and E. Domain scope expressions can be used to combine domains into a set of objects for applying a policy, using union, intersection and difference operators; e.g., the scope expression @/A/B + @/A/C - @/A/B/E would apply to members of B plus C but not E, and @/A/B ^ @/A/C applies only to the direct and indirect members of the overlap between B and C. The '@' symbol selects all non-domain objects in nested domains. An advantage of specifying policy scope in terms of domains is that objects can be added to and removed from the domains to which policies apply without having to change the policies. However, objects have to be explicitly included in domains. It is not practical to specify domain membership in terms of a predicate based on object attributes, but a policy can select the subset of a domain's members to which it applies by means of a constraint in terms of object attributes (see section 2). We have implemented our own domain service but are re-implementing it on top of an LDAP (Lightweight Directory Access Protocol) directory service [11]. However, although LDAP supports the concept of an alias as a reference to an object in another domain, it does not permit objects to be members of multiple directories.
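The scope-expression semantics can be illustrated with a toy model. This Python sketch is not the authors' domain service: the domain contents are loosely modelled on Fig. 1 (the two 'bean people' shared by B and C), and the object in E is invented.

```python
# Toy model of domains and scope expressions: '@' on a domain collects
# all non-domain objects in it and its nested sub-domains, and scopes
# are combined with set union, difference and intersection.

DOMAINS = {                      # domain -> (sub-domains, direct objects)
    "/A":     (["/A/B", "/A/C"], []),
    "/A/B":   (["/A/B/E"], ["bean1", "bean2"]),
    "/A/C":   (["/A/B/E"], ["bean1", "bean2"]),   # overlap with B
    "/A/B/E": ([], ["obj1"]),                     # invented member of E
}

def members(domain):
    """All non-domain objects in `domain`, propagating into sub-domains."""
    subs, objs = DOMAINS[domain]
    out = set(objs)
    for s in subs:
        out |= members(s)
    return out
```

In this model the expression @/A/B + @/A/C - @/A/B/E becomes `(members("/A/B") | members("/A/C")) - members("/A/B/E")`, and @/A/B ^ @/A/C becomes a set intersection.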
2 Policy Notation
A precise notation is needed for system administrators and (technical) users to specify the network policies related to the applications or services for which they are responsible. This notation is the means of ‘programming’ the automated agents in network components which interpret policy but can also be used to specify higher level abstract policies or goals which are interpreted by humans or are refined into implementable policies [12,13,14]. Another reason to have a precise notation is that policies may be specified by multiple distributed administrators so conflicts between policies can arise. Our notation can be analysed by tools to detect and, in some cases, resolve conflicts. Implementable policies are directly interpreted by automated manager and access control agents, which are (potentially) distributed, so we do not use logical deduction in order to analyse the state of the system.
Authorisation policies define what activities a subject can perform on a set of target objects and are essentially access control policies to protect resources from unauthorised access. Constraints can be specified to limit the applicability of both authorisation and obligation policies based on time or on values of the attributes of the objects to which the policy refers.

x1 A+ @/NetworkAdmin {PolicyObjType: load(); remove(); enable(); disable()} @/Nregion/switches

Members of the NetworkAdmin domain are authorised to load, remove, enable or disable policies in Nregion/switches. The ';' separates permitted actions.

x2 A- n: @/test-engineers {performance_test()} @/routers when n.status = trainee

Trainee test engineers are forbidden to perform performance tests on routers. Note the use of a constraint based on subject state information.

x3 A+ @/Agroup + @/Bgroup {VideoConf(BW=2, Priority=3)} USAStaff - NYgroup when (16:00 < time < 18:00)

Members of Agroup plus Bgroup can set up a video conference (bandwidth = 2 Mb/s, priority = 3) with USA staff except the New York group, between 16:00 and 18:00. Note the use of a time-based constraint.

Obligation policies define what activities a manager or agent must or must not perform on a set of target objects. Positive obligation policies are triggered by events.

x4 O+ on video_request(bw, source) @/USGateway {router: bwreserve(bw); log(bw, source)} @/routers/US

This positive obligation is triggered by an external event signalling that a video channel has been requested. The object in the USGateway domain first performs a bwreserve operation on all objects of type router in the /routers/US domain and then logs the request (assumed to be to an internal log file), i.e., operations specified in a policy can be on external objects or internal to the agent. The ';' is used to separate a sequence of actions in a positive obligation policy.

x5 O- n:@/test-engineers {DiscloseTestResults()} @/analysts + @/developers when n.testing_sequence == in-progress

This negative obligation policy specifies that test engineers must not disclose test results to analysts or developers while the testing sequence being performed by that subject is still in progress, i.e., a constraint based on the state of subjects.

The general format of a policy is given below, with optional attributes in brackets. Some attributes of a policy, such as trigger, subject, action, target or constraint, may be comments (e.g. /* this is a comment */), in which case the policy is considered high-level and not directly interpretable.

identifier mode [trigger] subject '{' action '}' target [constraint] [exception] [parent] [child] [xref] ';'
The identifier is a label used to refer to the policy. The mode of the policy distinguishes between positive obligations (O+), negative obligations (O-), positive authorisations (A+) and negative authorisations (A-). The trigger applies only to positive obligation policies. It can specify an internal timer event using an at clause, or an every clause for repetitive events. An external event is defined using an on clause, as in x4 above, where the video_request event passes the parameters bw and source to the agent. These events are
detected by a monitoring service. The policy notation only specifies simple events as a generalised monitoring service can be used to combine complex event sequences to generate simple events [16]. The subject of a policy, defined in terms of a domain scope expression, specifies the human or automated managers to which the policies apply. The target of a policy, also defined in terms of a domain scope expression, specifies the objects on which actions are to be performed. Security agents at a target’s node interpret authorisation policies and manager agents in the subject domain interpret obligation policies. The actions specify what must be performed for obligations and what is permitted for authorisations. It consists of method invocations or a comment and may list different methods for different object types. An authorisation policy indicates the set of operations which are permitted or forbidden while the multiple actions in a positive obligation policy are performed sequentially after the policy is triggered. The constraint, defined by the when clause, limits the applicability of a policy, e.g. to a particular time period as in policy x3 above, or making it valid after a particular date (when time > 1/June/1999). In addition, the constraint could be based on attribute values of the subject (such as in policy x2 above) or target objects. In x2, the label n, prepended to the subject, is referenced in the constraint to indicate a subject attribute. An action within an obligation policy may result in an operation on a remote target object. This could fail due to remote system or network failure so an exception mechanism is provided for positive obligations to permit the specification of alternative actions to cater for failures which may arise in any distributed system. High-level abstract policies can be refined into implementable policies. In order to record this hierarchy, policies automatically contain references to their parent and children policies. 
In addition, a cross-reference (xref) from one policy to another can be inserted manually, e.g., so that an obligation policy can indicate the authorisation policies granting permission for its activities (see Section 6).
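To make the structure of this notation concrete, the elements described above (modality, subject and target domain scope expressions, actions, constraints, events, and the refinement links) can be modelled as a record. This is an illustrative sketch only, not the authors' implementation; all class and field names are ours:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Policy:
    # Modality: "O+" (positive obligation), "O-", "A+" (authorisation), "A-"
    modality: str
    subject: str               # domain scope expression, e.g. "/NetworkManagers"
    target: str                # domain scope expression, e.g. "/routers/edge"
    actions: List[str]         # method invocations to perform or permit
    event: Optional[str] = None                 # trigger (obligations only)
    when: Optional[Callable[[], bool]] = None   # constraint, e.g. a time period
    parent: Optional["Policy"] = None           # refinement hierarchy
    children: List["Policy"] = field(default_factory=list)
    xrefs: List["Policy"] = field(default_factory=list)  # manual cross-references

    def refine(self, child: "Policy") -> "Policy":
        """Record the parent/child links when an abstract policy is refined."""
        child.parent = self
        self.children.append(child)
        return child
```

The `refine` method records the parent/child references automatically, as the text describes; `xrefs` holds the manually inserted cross-references.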
3 Policy Implementation Issues
The policy service provides tool support for defining and disseminating policies to the agents that will interpret them. Policies are implemented as objects which can be members of domains, so that authorisation policies can be used to control which administrators are permitted to specify or modify policies stored in the policy service.

Fig. 2 Policy Enforcement. [The policy editor queries subjects and targets via the domain service and enables policies in the policy service; O+/O- policies are disseminated to manager agents, A+/A- policies to security agents at the target objects' domain; manager agents register with the monitoring service, receive event notifications and perform actions on target objects.]
Policy Specification for Programmable Networks
An overview of the approach to policy enforcement is given in Fig. 2. An administrator creates and modifies policies using a policy editor. He checks for conflicts, and if necessary modifies policies to remove the conflicts (see Section 4). Authorisation policies are then disseminated to target security agents as specified by the target domains, and obligation policies to manager agents as specified by the subject domains. Policies may be subsequently enabled, disabled or removed from the agents. Manager agents register with the monitoring service to receive relevant events generated from the managed objects. On receiving an event which triggers one or more obligation policies, the agent queries the domain service to determine target objects and performs the policy actions, provided no negative obligations restrain it.

Fig. 3 shows a policy agent which interprets obligation policies. It is application specific in that there can be agents for quality of service management which are different from those used for security management, for example. Each class of agent has predefined management functions which are accessible from the policies. These functions may result in operations on remote target objects or can be internal to the agent. The functionality of an agent could be dynamically modified using Management by Delegation techniques to load new code, but this has not been implemented in our prototype. More details on the syntax and implementation issues of the policy service can be found in [12,13,14].

Fig. 3 Obligation Policy Agent. [Policies are loaded, removed, enabled or disabled through a generic interface; a Java interpreter with a CORBA interaction service invokes application-specific, predefined management functions, which perform operations on target objects via an application-specific interface; events arrive from the monitoring service.]
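The enforcement cycle described above — an event notification triggers matching obligation policies, constraints are checked, targets are resolved via the domain service, and negative obligations restrain actions — can be sketched as follows. This is a simplified illustration (the prototype agents are Java/CORBA based, not Python); all names are hypothetical:

```python
class ObligationAgent:
    """Interprets obligation policies: on an event, find triggered O+
    policies, check constraints and negative obligations, then perform
    the actions on the targets resolved by the domain service."""

    def __init__(self, domain_service):
        self.policies = []                 # loaded obligation policies (dicts)
        self.domain_service = domain_service  # domain scope expr -> target objects

    def load(self, policy):
        self.policies.append(policy)

    def notify(self, event):
        performed = []
        triggered = [p for p in self.policies
                     if p["modality"] == "O+" and p["event"] == event]
        # Actions restrained by negative obligations for the same event.
        restrained = {a for p in self.policies
                      if p["modality"] == "O-" and p["event"] == event
                      for a in p["actions"]}
        for p in triggered:
            if p.get("when") and not p["when"]():
                continue                   # constraint not satisfied
            targets = self.domain_service(p["target"])
            for action in p["actions"]:
                if action in restrained:   # the negative obligation wins
                    continue
                performed.extend((action, t) for t in targets)
        return performed
```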
4 Policy Conflicts
In any large inter-organisational distributed network, policies are specified by multiple managers, possibly within different organisations. Objects can be members of multiple domains, so multiple policies will typically apply to an object. It is quite possible that conflicts will arise between multiple policies. There are two types of conflicts which we will consider – modality and semantic conflicts [15].
Modality Conflicts − are inconsistencies which may arise when several policies with modalities of opposite sign refer to the same subjects, actions and targets. These conflicts can therefore be determined by syntactic analysis of policies. There are three types of modality conflicts:
§ O+/O- subjects are both required and required not to perform the same actions on the target objects.
§ A+/A- subjects are both authorised and forbidden to perform the actions on the target objects.
§ O+/A- subjects are required but forbidden to perform the actions on the target objects.
Note that O-/A+ is not a conflict, but may occur when subjects must refrain from performing certain actions as specified by a negative obligation, even though they are permitted to perform the actions, as in policy x5 in Section 2. It is possible to resolve these conflicts automatically by assigning a priority to individual policies, but meaningful priorities are notoriously difficult for users to assign and may result in arbitrary priorities which do not really relate to the importance of the policies. Inconsistent priorities could easily arise in a distributed system with several people responsible for specifying policies and assigning priorities. Our approach has been to permit more specific policies to have precedence – a policy applying to a sub-domain overrides more general policies applying to an ancestor domain. Our tools analyse the policies within a domain to indicate conflicts for an administrator to resolve, and allow precedence to be enabled or disabled. We are investigating techniques for specifying other forms of precedence – in some situations negative authorisation policies should have precedence over positive ones, more recent policies over older ones, or perhaps policies applying to short time-scales over longer (background) ones.
Semantic Conflicts and Metapolicies − while modality conflicts can be detected purely by syntactic analysis, application-specific conflicts arise from the semantics of the policies. For example, a conflict may arise if there are two policies which increase and decrease bandwidth allocation when the same event occurs. Similarly, policies related to differentiated services, which define the queues to which specific types of packets should be allocated, must not assign the same packet to two different queues.
These conflicts for resources or conflicts of action are application specific and cannot be detected automatically without a specification of what constitutes a conflict, i.e. the conflicts are specified in terms of constraints on attribute values of permitted policies. We call these constraints metapolicies, as they are policies about which policies can coexist in the system, or about what the permitted attribute values for a valid policy are.
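The syntactic modality-conflict check described above can be sketched as follows: two policies conflict when their modalities form one of the pairs O+/O-, A+/A- or O+/A- and their subject, target and action sets overlap. This is illustrative code, not the authors' analysis tool:

```python
# Modality pairs that constitute a conflict; O-/A+ is deliberately absent.
CONFLICTING_PAIRS = {("O+", "O-"), ("A+", "A-"), ("O+", "A-")}

def modality_conflict(p, q):
    """p, q: dicts with a 'modality' string and the sets
    'subjects', 'targets' and 'actions'."""
    pair = (p["modality"], q["modality"])
    if pair not in CONFLICTING_PAIRS and pair[::-1] not in CONFLICTING_PAIRS:
        return False            # e.g. O-/A+ is not a conflict
    # A conflict needs overlapping subjects AND targets AND actions.
    return bool(p["subjects"] & q["subjects"]
                and p["targets"] & q["targets"]
                and p["actions"] & q["actions"])
```

Domain-based precedence could then be layered on top by suppressing a reported conflict when one policy's subject/target domains are sub-domains of the other's.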
5 Roles
Organisational structure is often specified in terms of organisational positions such as regional, site or departmental network manager, service administrator, service operator, or company vice-president. Specifying organisational policies for people in terms of role-positions rather than persons permits the assignment of a new person to the position without re-specifying the policies. The tasks and responsibilities corresponding to the position are grouped into a role associated with the position (which is essentially a static concept in the organisation). The position could correspond to a manager or a user of a network or services. A role is thus the position together with the set of authorisation and obligation policies defining the rights and duties for that position. Organisational positions can be represented as domains, and we consider a role to be the set of policies (the arrows in Fig. 4) with the Position Domain as
subject. A person or automated agent can then be assigned to or removed from the position domain without changing the policies, as explained in [17].

Fig. 4 Management Roles. [A role consists of the authorisation and obligation policies whose subject is the position domain and whose targets are the target domains and managed objects.]
Although the concept of role was originally defined to apply to people, it can also be used to group the authorisation and obligation policies that apply to a particular type of network component as a subject, e.g. an edge-router that interconnects the local network to the service provider, or a core-router providing a backbone service. It is possible that similar hardware and software is used for both core and edge routers, so assigning a particular router to a role will define the set of policies which are loaded onto that router. Another example is a mobile agent which is assigned to a visiting agent role when it is received at a network node. This could specify what resources it can access and what actions it must perform on arrival and departure. There are additional extensions to the concepts of roles described in [18,19]. These define inter-role relationships in terms of interaction protocols and concurrency constraints on the ordering of obligation actions. Furthermore, an object model for the specification of policy templates and role classes which uses inheritance to implement specialisation has also been defined. However, these issues will not be discussed further in this paper.
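The role concept above reduces to a grouping of policies whose subject is a position domain; occupants (people, automated agents, or routers) can then be assigned or withdrawn without touching any policy. A hypothetical sketch, with all names ours:

```python
class Role:
    """A role = a position domain (the subject) plus the authorisation
    and obligation policies defined for that position."""

    def __init__(self, name):
        self.name = name
        self.position_domain = set()   # current occupants: people, agents, routers
        self.policies = []             # authorisation & obligation policies

    def add_policy(self, policy):
        self.policies.append(policy)

    def assign(self, member):
        # Policies reference the position domain, not the member,
        # so no policy needs rewriting when the occupant changes.
        self.position_domain.add(member)

    def withdraw(self, member):
        self.position_domain.discard(member)
```

A visiting-agent role for mobile agents, or an edge-router role whose assignment determines which policies are loaded onto a router, would both be instances of this pattern.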
6 Policy Refinement
High-level abstract policies are often specified as part of the business process and express requirements on the communication network. These requirements are specified as management goals which cannot be directly interpreted by automated components and hence must be refined into functional policy specifications or be implemented manually by human managers. We express abstract policies in the same notation as implementable policies; however, the policy attributes (subjects, actions, etc.) may be written in natural language. For example, a high-level policy may be written as:
T1 O+ @/NetworkManagers {/* provide adequate video conference set up */} @/users/groupA when 14:00 < time < 15:00
Network managers must provide an adequate video conference set up for groupA users between 14:00 and 15:00.
In order to achieve this goal it is necessary to refine policy T1 into bandwidth management policies, authorisation policies, and further administrative policies to enable or disable special policies which might apply during these hours. For example:

Administrative policies
T2 O+ at 13:55 @/NetworkManagers { enable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
T3 O+ at 15:00 @/NetworkManagers { disable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
Network managers must enable at 13:55 (T2) and disable at 15:00 (T3) special bandwidth control and QoS monitoring policies.

Authorisation policies
T4 A+ @/Agroup {VideoConf (BW=2, Priority=3)} @/USAStaff when (14:00 < time < 15:00)
Group A users must be able to set up the video connections (similar to policy x3).
T5 A+ @/NetworkManagers { enable(); disable() } @/policies/BandwidthControl + @/policies/QoSmonitoring
Network managers are authorised to enable and disable bandwidth control and QoS monitoring policies.

Bandwidth Control
T6 O+ on req(bw,chanId) edgeRouter {reduceReservation(bw)} channels/chanId when bw < getReservation(chanId)
Edge routers should decrease the bandwidth reservation on a channel when the request is for less than the amount currently reserved.
T7 O+ on req(bw, chanId) edgeRouter {increaseReservation(min(bw, x))} channels/chanId when bw > getReservation(chanId)
Edge routers should increase bandwidth when the request is for more than the amount currently reserved. However, the amount reserved should not exceed x.

The refinement of abstract policies into implementable ones must be done by human managers. A positive obligation policy requires related authorisation policies giving subjects the necessary access rights to perform their tasks. Similarly, the refinement of an authorisation policy may include obligation policies defining the measures and counter-measures to be taken in case of security violations. Thus the refinement of a policy does not preserve the policy modality, nor does it necessarily apply to the same subjects or targets. For example, while network managers are responsible for ensuring that adequate quality of service is provided (policy T1), the edge routers are responsible for performing the bandwidth reservations (T6, T7). We currently maintain pointers from an abstract policy to the policies derived from it (omitted from the above examples for clarity), but we do not have tools to support the refinement process. We are investigating the use of requirements engineering tools and techniques for refinement and analysis of policies.
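The bandwidth rules T6 and T7 amount to the following decision on each reservation request. This is a sketch in ordinary code of the behaviour the two policies prescribe, not the policy interpreter itself; `cap_x` stands for the limit x in T7:

```python
def on_request(bw, reserved, cap_x):
    """Return the new reservation for a channel after a request for bw,
    per policies T6 (decrease) and T7 (increase, capped at x)."""
    if bw < reserved:
        return bw                # T6: reduce the reservation to the request
    if bw > reserved:
        return min(bw, cap_x)    # T7: increase, but never beyond x
    return reserved              # neither policy's constraint holds
```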
7 Related Work
There are a number of groups working on policies for network and distributed systems management [9,20,21]. Some of this work has been based on our early proposals for a policy notation. Another approach is to define policies using the full power of a general purpose scripting or interpreted language (e.g. TCL) and load this into network components. Bos [22] takes this approach to specify application policies for resource management for netlets, which are small virtual networks within a larger virtual network. There is considerable interest in the Internet community in using policies for bandwidth management. These approaches assume policies are objects stored in a directory service [7]. A policy client (e.g. a router) makes policy requests on a server, which retrieves the policy objects, interprets them and responds with policy decisions to the client. The client enforces the policy by, for example, permitting/forbidding requests or allocating packets from a connection to a particular queue. The IETF is defining a policy framework that can be used for classifying packet flows as well as specifying authorisations for network resources and services [23,24,25]. They do not explicitly differentiate authorisation and obligation policies. A simple policy rule defines a set of policy actions which are performed when a set of conditions becomes true; these conditions correspond to a combination of our events and constraints for obligation policies. Their policy may be an aggregation of policy rules. They have realised that policy conflicts can occur, but have not distinguished between modality and semantic conflicts, nor do they say how conflicts will be detected. Directories are used for storing policies but not for grouping subjects and targets. They use dynamic groups which can be specified by enumeration or by characterisation, i.e. a predicate on object attributes. We can achieve this by means of a constraint on policies within the scope of a domain expression, which is a defined set.
Defining a group in terms of an arbitrary predicate can be impractical. For example, determining the group of all Pentium II workstations with memory > 128 Mbytes would require checking millions of workstations on the Internet, which is not feasible. They have the concept of a role, defined as a label indicating a function that a network device serves. Roles enable administrators to group the interfaces of multiple devices for applying a common policy. This is similar to our domains, although it is not clear how it will be implemented. There is a restriction that their role can be associated with only a single policy (which can be as complex as necessary). We consider this very restrictive and unnecessary. In the IETF approach a policy enforcement point queries a decision point to find out which policies apply. Our notation, with explicit subjects and targets, permits us to propagate policies to where they are required, so we combine decision and enforcement at subjects for obligation policies and at targets for authorisation policies. Our policy service disseminates policies to the relevant distributed agents.
8 Conclusions
We have shown that our management policy and role approach is also very useful for programmable networks. A clear specification of authorisation policy is essential, whatever implementation techniques are used. The obligation policies can be
used to ‘program’ the network components or combined with other programming approaches to define the events and constraints for performing actions. In any large-scale system, conflicts between policies will occur. We distinguish between modality and semantic conflicts and indicate an approach for specifying what constitutes a semantic conflict as a metapolicy. Where possible, conflicts should be detected at specification or load-time (cf. type conflicts detected by a compiler), although some conflicts can only be detected at run-time. We have also shown the use of roles for specifying policies for network managers, service users and network components. We have a prototype toolkit which can be used to specify roles and policies; it also performs static analysis for conflicts. We are currently working on extending this to run-time analysis, and are investigating the applicability of requirements engineering approaches to refining high-level goals into detailed policy specifications. These approaches also offer more sophisticated consistency analysis tools which may be applicable.
Acknowledgements We gratefully acknowledge financial support from the Fujitsu Laboratories and British Telecom and acknowledge the contribution of our colleagues to the concepts described in this paper – in particular Nicholas Yialelis and Damian Marriott.
References
1. Wetherall, D., Legedza, U., Guttag, J.: Introducing New Internet Services: Why and How. IEEE Network, Special Issue on Active and Programmable Networks, July 1998.
2. Tennenhouse, D., Smith, J., Sincoskie, D., Wetherall, D., Minden, G.: A Survey of Active Network Research. IEEE Communications Magazine, 35(1):80-86, 1997.
3. Bieszczad, A., Pagurek, B., White, T.: Mobile Agents for Network Management. IEEE Communications Surveys, 1(1), 1998. www.comsoc.org/pubs/surveys.
4. de Meer, et al.: Agents for Enhanced Internet QoS. IEEE Concurrency, 6(2):30-39, 1998.
5. Lazar, A.: Programming Telecommunication Networks. IEEE Network, Sep/Oct 1997, 8-18.
6. Goldszmidt, G., Yemini, Y.: Evaluating Management Decisions via Delegation. In Hegering, H., Yemini, Y. (eds.): Integrated Network Management III, Elsevier Science Publishers (1993), 247-257.
7. 3COM: Directory Enabled Networking and 3COM's Framework for Policy Powered Networking. http://www.3com.com/, 1998.
8. Sloman, M.: Policy Driven Management for Distributed Systems. Journal of Network and Systems Management, 2(4):333-360, Plenum Press, 1994.
9. Magee, J., Moffett, J. (eds.): Special Issue of IEE/BCS/IOP Distributed Systems Engineering Journal on Services for Managing Distributed Systems, 3(2), 1996.
10. Sloman, M., Twidle, K.: Domains: A Framework for Structuring Management Policy. In Sloman, M. (ed.): Network & Distributed Systems Management. Addison-Wesley (1994), 433-453.
11. Wahl, M., Howes, T., Kille, S.: Lightweight Directory Access Protocol (v3), IETF RFC 2251, Dec. 1997. Available from http://www.ietf.org
12. Marriott, D., Sloman, M.: Management Policy Service for Distributed Systems. 3rd IEEE Int. Workshop on Services in Distributed and Networked Environments, Macau, 2-9, 1996.
13. Marriott, D., Sloman, M.: Implementation of a Management Agent for Interpreting Obligation Policy. IEEE/IFIP Distributed Systems Operations and Management Workshop (DSOM'96), L'Aquila, Italy, Oct. 1996.
14. Marriott, D.: Management Policy for Distributed Systems. Ph.D. Dissertation, Imperial College, Department of Computing, London, UK, July 1997.
15. Lupu, E., Sloman, M.: Conflicts in Policy-Based Distributed Systems Management. To appear in IEEE Trans. on Software Engineering, Special Issue on Inconsistency Management, 1999.
16. Mansouri-Samani, M., Sloman, M.: GEM: A Generalised Event Monitoring Language for Distributed Systems. IEE/BCS/IOP Distributed Systems Engineering, 4(2):96-108, 1997.
17. Lupu, E., Sloman, M.: Towards a Role-Based Framework for Distributed Systems Management. Journal of Network and Systems Management, 5(1):5-30, Plenum Press, 1997.
18. Lupu, E., Sloman, M.: A Policy-Based Role Object Model. 1st IEEE Enterprise Distributed Object Computing Workshop (EDOC'97), Gold Coast, Australia, Oct. 1997, 36-47.
19. Lupu, E.: A Role-Based Framework for Distributed Systems Management. Ph.D. Dissertation, Imperial College, Department of Computing, London, UK, July 1998.
20. Koch, T., et al.: Policy Definition Language for Automated Management of Distributed System. 2nd IEEE Int. Workshop on Systems Management, Toronto, June 1996, 55-64.
21. Wies, R.: Policies in Integrated Network and Systems Management: Methodologies for the Definition, Transformation and Application of Management Policies. Ph.D. Dissertation, Fakultät für Mathematik der Ludwig-Maximilians-Universität, München, Germany, 1995.
22. Bos, H.: Application Specific Policies: Beyond the Domain Boundaries. IFIP/IEEE Integrated Management Symposium (IM'99), Boston, May 1999.
23. Strassner, J., Ellesson, E.: Terminology for Describing Network Policy and Services. IETF draft, work in progress, Feb. 1999. Available from http://www.ietf.org
24. Strassner, J., Ellesson, E., Moore, B.: Policy Framework Core Information Model. IETF draft, work in progress, Feb. 1999. Available from http://www.ietf.org
25. Strassner, J., Schleimer, S.: Policy Framework Definition Language. IETF draft, work in progress, Nov. 1998. Available from http://www.ietf.org
A Self-Configuring Data Caching Architecture Based on Active Networking Techniques
Gaëtan Vanet and Yoshiaki Kiriha
C&C Media Research Laboratories, NEC Corporation
1-1, Miyazaki 4-Chome, Miyamae-Ku, Kawasaki, Kanagawa 216-8555, Japan
{vanet,kiriha}@nwk.cl.nec.co.jp
Abstract. This paper presents the design of a new Web cache architecture that uses active network capabilities to provide a solution for caching dynamic data throughout the network. In our proposal, objects, viewed at the smallest granularity, are cached together with a timestamp. But instead of considering dates individually, we define time classes which specify the level of an object's time-sensitiveness. Each intermediate node is specialised in a unique time class, according to its location within the network, and caches dynamic data belonging to the corresponding time class. Nodes are divided into two types (manager and cache) and are bound together to define a hierarchical time-sensitive cache tree, the timestamp tree. In our proposed cache architecture, the timestamp tree is automatically reconfigured according to users' access history, application load and network conditions. To achieve such a self-configuring cache architecture, we have designed five types of capsules.
1 Introduction
The Internet was created 20 years ago by universities and research centres to make their work easily accessible. At that time, the Internet was a medium for exchanging static data, accessed by a small number of users. This network was viewed as a world-wide database containing the current state of the art of the scientific community. But in recent years the Internet has become very popular, and millions of people browse the Web every day. To limit the overload of networks and application sites, cache technology was introduced a few years ago: accessed Web pages are stored in cache proxies to avoid permanent reloading from origin servers. The last few years have led designers of Web cache infrastructures to develop schemes to store static data, ignoring dynamic data. In fact, current cache solutions are unsuitable for caching dynamic data, which are always reloaded from their origin server. However, the appearance of new types of services (online auctions, stock quotes, sensor mixing, online video, …) raises a new problem to be studied.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 85-96, 1999. © Springer-Verlag Berlin Heidelberg 1999

All these new services require
dynamic data exchange, and the structure of the current Internet has not been designed to meet this requirement. Because of this, the key idea of our study is to provide a solution for caching dynamic data through a self-configuring architecture. Two cache solutions became popular a few years ago and are still widely deployed in the Internet: the Harvest [3] and the Squid [5,12] solutions. In both cases, network administrators must manually configure the cache proxies taking part in the cache architecture by defining the parent-sibling relationships [12]. These relationships are fixed and valid for all types of Internet applications. Configuring neighbour caches requires co-ordination of both parties and becomes a burden as membership grows. In practice, configurations are prone to human error and lack adaptability to changes in the network and in user accesses. Figure 1 shows a basic cache configuration of the Squid approach. In this scheme, proxy 4 is configured as the parent of each of the three other proxies. Proxies 2 and 3 are defined as siblings. When proxy 3 receives a request it cannot answer, it can forward the request to proxies 2 and 4 to get the answer. But proxy 1 is never contacted: whatever information it stores, it does not co-operate with proxies 2 and 3 unless a network manager manually defines this association. Obviously, this static configuration cannot fit the caching of dynamic data, where cache co-operation should be more flexible.
Fig. 1. Model of current cache architecture. [Users are served by cache proxies 1, 2 and 3; cache proxy 4 is the parent, connected to the origin server.]
The remainder of this paper is organized as follows. Section 2 presents some issues of dynamic data caching. Based on these facts, section 3 details our cache architecture. Then, section 4 explains the reconfiguration algorithm of our cache architecture and discusses the set of capsules we designed to achieve our proposal. Section 5 presents related work. Finally, section 6 ends with concluding remarks.
2 Issues of Dynamic Data Caching
Caching dynamic data is not trivial, and designers have to contend with various difficulties. One of them is the nature of dynamic objects themselves. Basically, caching dynamic data seems useless because the content of caches becomes out-of-date very easily and loses consistency with the origin server. However, one of the purposes of dynamic data caching is to reduce the load of application servers during a high-demand period. Even if the time span of data is limited, caching can provide good help as long as caches are appropriately located. The definition of the time to live of dynamic objects also presents some difficulty. This parameter cannot be the same for all cached objects [13]: since it depends on the type of application and users, the cache configuration should take this parameter into consideration. Another problem is the format of cached objects [9]. Current cache solutions store entire Web pages in cache proxies, but for dynamic data this scheme should change. Actually, "dynamic Web pages" are mainly composed of static data (explanations for customers or advertisements) and some dynamic data. These applications provide dynamic information but mainly exchange static data. These "dynamic applications" also give users the opportunity to select a few dynamic data items on their Web page. Storing whole pages is then impossible due to the huge number of possible combinations arising from the variety of users' interests. Thus, it is worth making the granularity of cached objects smaller, from a whole page down to a single object. The last problem is finding the most suitable place to cache dynamic data. In the case of static data, objects tend to be stored close to end users to reduce transmission delays as much as possible; the distance to the origin server is not taken into account. However, this scenario is not possible for caching dynamic data because of their short time to live.
As noted in [13], if an object changes more frequently than it is accessed, caching it is pointless. This property must be the basis of every dynamic data cache model. If we cache dynamic objects closer to their origin server, we obviously increase the number of hits. But if we cache items farther into the network, their time to live must be rather long for dynamic object caching to remain useful. So there is a tight relationship between the time to live of cached objects and the distance between the cached data and their origin server.
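This trade-off can be stated as a simple placement predicate. The function below is our illustrative reading of the rule, not part of the proposal itself; all parameter names are ours:

```python
def worth_caching(time_to_live, mean_access_interval, distance_delay):
    """An object is worth caching at a node only if it stays valid
    longer than the expected gap between accesses, and its lifetime
    also covers the delay of fetching it from that far away."""
    if time_to_live <= mean_access_interval:
        return False   # changes more often than it is accessed: pointless
    return time_to_live > distance_delay
```

Under this rule, very short-lived objects only qualify at nodes close to the origin server (small `distance_delay`), while longer-lived objects may be cached farther out — exactly the relationship described above.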
3 Basic Concept
We consider the situation where many caches are scattered throughout the Internet. Our approach addresses the problem of caching dynamic data in intermediate nodes rather than reloading them from their origin server. From this viewpoint, the following requirements must be satisfied. Firstly, we must ensure that dynamic data are stored according to their time-sensitiveness. Secondly, we must ensure that dynamic data are cached close to the shortest paths between users and origin servers; this requirement limits the latency if the cache infrastructure cannot answer. Finally, we must take the distance between the location of a cached object and its origin server
into consideration. Indeed, data with a short life span should be stored rather close to their origin servers, while data with longer time spans can be cached farther into the network.
3.1 Class Categorisation

Each dynamic object can be defined by a pair of "value" and "date": the value that an object had at a specific date. For a specific object, the number of such pairs becomes huge and the possibility of a cache hit becomes small. Therefore, instead of considering a single date as the basis of reasoning, we consider the difference between two dates: the date of the request and the requested date for the object. For instance, if we request, at t=10:00:00, the value an object had at t=9:59:55, this difference will be equal to 5 seconds. The more dynamic an object becomes, the smaller this difference will be. Considering such a time difference significantly reduces the number of possible pair combinations. If the difference in time-sensitiveness between a 4-second value and a 6-second value is not so important, it can be interesting to merge them into the same "category". Therefore, we define time classes as groups of objects having the same level of time-sensitiveness. For instance, if the time-sensitiveness is between 0 and 3 seconds, the object belongs to the first class (class1); if it is between 3 and 10 seconds, the object belongs to the second class (class2), and so on. The conversion "time to class" is done by intermediate nodes, based on a mapping table common to all Internet applications. Each node is specialised in a unique class and is not allowed to store objects belonging to other classes.
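The "time to class" conversion is a lookup in the shared mapping table. The sketch below uses the class boundaries given above (0-3 s for class 1, 3-10 s for class 2); the further boundaries are illustrative assumptions, not from the paper:

```python
import bisect

# Upper bounds (seconds) of each class, inclusive; boundaries beyond
# class 2 are illustrative assumptions.
CLASS_BOUNDS = [3, 10, 60, 600]

def time_to_class(request_time, requested_date):
    """Map the difference between the date of the request and the
    requested date for the object to a time class (1 = most
    time-sensitive)."""
    diff = request_time - requested_date
    return bisect.bisect_left(CLASS_BOUNDS, diff) + 1
```

With the paper's example, a request at 10:00:00 for the value at 9:59:55 gives a 5-second difference and therefore class 2.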
3.2 Manager and Cache Nodes

Like the Squid approach, ours is based on a hierarchical architecture where nodes are bound together to make a cache tree. These relationships are based on the location of nodes and the class of data the nodes mainly process. Each application delivering dynamic data maintains such a tree, with the server as the source. In our proposal, intermediate nodes are divided into two types: manager and cache nodes. These two entities are joined together to make up a group; each group is composed of one manager and many cache nodes. The size of groups is determined by the source of each cache tree (the application server) during the cache reconfiguration. This decision is made by taking into account the load of the application server and some historical knowledge. The load is estimated by considering, for example, the number of active connections, the number of requests per second or the load of the CPU. The function of the manager is to register, through pointers, all resources (cached objects) available in its group, so a group stores each object only once. These pointers are scattered dynamically, according to the number of requests the nodes have received. Cache nodes, for their part, store objects and their associated values. Each cached object is defined by a tuple: object identifier - value - validity time. The validity time is the date until which the object value is valid, and is specified by the definition of the time class.
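A cache node's store of (object identifier, value, validity time) tuples might be sketched as follows; the class-wide validity span and all names are our assumptions, not the authors' implementation:

```python
import time

class CacheNode:
    """Stores objects of a single time class as (value, validity_time)
    tuples; the validity span is fixed by the node's time class."""

    def __init__(self, time_class, class_ttl):
        self.time_class = time_class
        self.class_ttl = class_ttl   # validity span defined by the time class
        self.store = {}              # object_id -> (value, validity_time)

    def put(self, object_id, value, now=None):
        now = time.time() if now is None else now
        self.store[object_id] = (value, now + self.class_ttl)

    def get(self, object_id, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(object_id)
        if entry is None:
            return None                 # miss: forward to the group manager
        value, validity_time = entry
        if now > validity_time:
            del self.store[object_id]   # expired: value no longer valid
            return None
        return value
```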
Fig. 2. Cache configuration for a two-classes application. [Users access a class 1 group of manager and cache nodes close to the origin server and class 2 groups (nodes 1-12) farther out, towards the backbone.]
In our architecture, each node maintains a request history storing the number of requests, per time class, received since the last cache reconfiguration. This will be useful when updating the cache architecture. Furthermore, we have chosen to introduce the manager, as a pointer repository, to avoid the systematic multicast within groups that is widely implemented in current cache solutions [3,5,12]. This limits data replication and increases the frequency of cache hits.
3.3 The Timestamp Tree Our cache proposal uses two types of inter-node relationships, sibling and parent, to configure a hierarchical cache tree called the timestamp tree. All nodes belonging to the same group are defined as siblings. A parent is a node belonging to an upper class in the cache tree. In our case, cache tree levels are related to object classes: the more dynamic a class is, the higher it sits in the timestamp tree. All nodes have a unique parent; for first-class nodes, the parent is the origin server. An example of cache configuration is illustrated in figure 2: the origin server configures a cache tree comprising two cache classes for a stock quote service aimed at both professionals and ordinary people. Professionals are only interested in the latest quotations ( class1 objects ) while the others are satisfied with somewhat out-of-date data ( class2 objects ). Figure 3 illustrates the timestamp tree corresponding to figure 2; the class2 group containing nodes 9, 10, 11 and 12 is not included, in order not to overload the figure. Whenever a user request is received by node 7, the latter performs the “time to class” mapping explained in section 3.1. If the request is a class2 message that node 7 cannot answer, the request is forwarded directly to its group manager, node 6.
Gaëtan Vanet and Yoshiaki Kiriha
[Figure 3 here: the timestamp tree, with the origin server at the root, class 1 nodes 1 to 4 below it, and class 2 nodes 5 to 8 at the bottom; parent relationships link the levels and sibling relationships link nodes within a group, with manager and cache nodes distinguished.]
Fig. 3. TimeStamp Tree example
If the request is still unanswered, the message is sent directly to the origin site. Node 6 then creates a pointer recording that the object value is stored by node 7. If the request is a class1 message, node 7 forwards it to its parent, node 4, and waits for the answer.
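The forwarding rules of this example can be summarised in a small sketch (our own naming; hasValue always misses here to keep the sketch short):

```java
// Sketch of the forwarding rules of figures 2 and 3: a miss for the node's
// own class goes to the group manager, then to the origin server; a request
// for a more time-sensitive class climbs to the parent node.
public class TimestampTreeNode {
    final int nodeClass;             // class this node is specialised in
    final TimestampTreeNode parent;  // upper-class node, or null for the origin
    final TimestampTreeNode manager; // manager of this node's group, or null

    TimestampTreeNode(int c, TimestampTreeNode parent, TimestampTreeNode manager) {
        nodeClass = c; this.parent = parent; this.manager = manager;
    }

    boolean hasValue(String objectId) { return false; } // always a miss in this sketch

    /** Returns a label describing where the request goes next. */
    String route(int requestClass, String objectId) {
        if (requestClass == nodeClass) {
            if (hasValue(objectId)) return "answered locally";
            if (manager != null) return "forwarded to group manager";
            return "forwarded to origin server";
        }
        // More time-sensitive class than ours: climb towards the origin.
        return (parent != null) ? "forwarded to parent" : "forwarded to origin server";
    }
}
```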
4 Self-Configuring Cache Architecture
In order to meet the requirements of dynamic data storage, a cache architecture must be dynamic and self-organised, based on user requests and data time-sensitiveness. In our proposal, the reconfiguration of timestamp trees is invoked immediately after the detection of an overload by intermediate nodes or origin servers. The depth of the reconfiguration depends on the entity detecting this state.
4.1 Algorithm If an application server detects an overload, a cache reconfiguration must be done. During the first stage, the application server defines the rules of class membership as well as the maximum average inter-node distance within groups. The rules of membership are based on two criteria: the number of requests received by the node since the last reconfiguration, and the distance ( in number of hops ) between the node and the application server. The application server then sends this information to all nodes of the network, either through a flooding algorithm or based on a list of registered nodes maintained by the server. All nodes define their new position in the architecture. Afterwards, intermediate nodes exchange packets to define the content of all groups. Groups must be defined as wide as possible to limit the data replication problem. At the same time, the average distance between nodes must be as small as possible to limit the transmission delay between members. The balance between these two parameters is specified by the origin server, based on its historical knowledge. Managers are chosen so as to give the widest node groups while keeping the average distance between members lower than the limit defined by the origin server. After each reconfiguration, pointers and caches are re-initialised. If the overload is detected by an intermediate node, the latter notifies its neighbours about its current state, avoiding a complete reconfiguration of the cache architecture. As the node can no longer provide the cache function, the corresponding group must be updated. If this node was a cache node, pointers referring to it must be removed from the nearby nodes’ caches. If the node was a manager node, a local cache reconfiguration is performed, following the algorithm detailed above; this stage defines a new structure for the group and, obviously, a new manager. If a new node is inserted into the network, it does not belong to any existing timestamp tree. It has to wait for messages, sent by application servers or nearby nodes, to take part in a cache architecture. If a node does not receive any messages from its neighbours, it cannot participate in the cache architecture.
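The manager election step, choosing the node that yields the widest group while keeping the average member distance below the server-defined limit, could look roughly as follows; this is a greedy sketch under our own assumptions, as the paper does not give the exact algorithm:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch (all names and the greedy strategy are ours) of manager
// election: each candidate is scored by how many nodes it can group while
// the average distance to members stays below the server-defined limit,
// and the widest admissible group wins.
public class ManagerElection {
    /** dist[i][j]: hop distance between nodes i and j. Returns the index
     *  of the elected manager. */
    static int elect(int[][] dist, double maxAvgDistance) {
        int bestManager = -1, bestSize = -1;
        for (int m = 0; m < dist.length; m++) {
            List<Integer> members = new ArrayList<>();
            int total = 0;
            for (int n = 0; n < dist.length; n++) {
                if (n == m) continue;
                // Greedily admit nodes while the average distance stays legal.
                if ((total + dist[m][n]) / (double) (members.size() + 1) <= maxAvgDistance) {
                    members.add(n); total += dist[m][n];
                }
            }
            if (members.size() > bestSize) { bestSize = members.size(); bestManager = m; }
        }
        return bestManager;
    }
}
```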
4.2 Active Networking Based System Design The implementation of such a protocol is not easy, and current IP-based networking technology did not allow us to achieve this purpose: we would have had to change routers already deployed in the network, update them with our protocol, and change the structure of packets. However, through the use of active network technology [1,2,14], it becomes possible to implement our proposal. This technology allows the capabilities of routers to be extended beyond classical, basic IP forwarding and routing. It is a way to insert new functionality into the network without changing the entities composing it. Moreover, monitoring functions can be placed at the network level, whereas before they were only present at the application layer. In the case of dynamic data caching, this feature is really important because it reduces the overall latency of the cache infrastructure. This paper does not discuss the security problems raised by the incursion of malicious code into the network; however, the work of the University of Pennsylvania [1,2] gives a very suitable answer to this problem.
4.2.1 The Designed Capsules We designed five types of capsules to implement our proposal :
• A cache reconfiguration capsule for the application server to invoke a cache architecture reconfiguration ( cf. 4.2.2 ).
• A congestion notification capsule for a congested node to notify its neighbours about its state.
• A get capsule for nodes to request an object.
• A manager election capsule, exchanged between nearby nodes, to define the content of groups and the manager of each of them.
• A new position capsule for nodes to notify their position to the application server.
All these messages have the basic structure represented in figure 4. First of all, the content of the common header is rather similar to the IP header, but in the active network area it must contain a protocol identifier. The IP address of the previous node allows missing functions to be reloaded. The sender of the capsule is the node which performs the “time to class” mapping. The destination field can be the address of the origin server or of other nodes. Finally, the maximum hops number field specifies the time to live of the capsule in number of hops. The data field of the capsule can contain information such as the distance to the application server or the mapping table. The program part can perform arbitrary computations, store information in soft-state, and create and send packets back out into the network. Depending on the function of each capsule, some fields may be present or not.
[Figure 4 layout: a common header, Protocol Id. ( 4 bits ), IP address of previous node ( 32 bits ), IP address of sender ( 32 bits ), IP address of destination ( 32 bits ), Max. hops number ( 6 bits ), Header checksum ( 16 bits ), followed by the Data and Program parts.]
Fig. 4. Basic format of capsules
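A possible byte-level encoding of the common header of figure 4 is sketched below; the field widths come from the figure, but the packing order and the padding of the two sub-byte fields to full bytes are our own simplification:

```java
import java.nio.ByteBuffer;

// Speculative encoding of the common capsule header of figure 4 (field order
// and byte packing are our choice; the paper only gives the field widths).
public class CapsuleHeader {
    int protocolId;                    // 4 bits
    int prevNode, sender, destination; // IPv4 addresses as 32-bit ints
    int maxHops;                       // 6 bits
    int checksum;                      // 16 bits

    byte[] encode() {
        ByteBuffer b = ByteBuffer.allocate(16);
        // Pad the two sub-byte fields to full bytes for simplicity.
        b.put((byte) (protocolId & 0x0F));
        b.put((byte) (maxHops & 0x3F));
        b.putInt(prevNode).putInt(sender).putInt(destination);
        b.putShort((short) checksum);
        return b.array();
    }

    static CapsuleHeader decode(byte[] raw) {
        ByteBuffer b = ByteBuffer.wrap(raw);
        CapsuleHeader h = new CapsuleHeader();
        h.protocolId = b.get() & 0x0F;
        h.maxHops = b.get() & 0x3F;
        h.prevNode = b.getInt(); h.sender = b.getInt(); h.destination = b.getInt();
        h.checksum = b.getShort() & 0xFFFF;
        return h;
    }
}
```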
4.2.2 Example : Cache Reconfiguration Capsule Let us consider the cache reconfiguration capsule as an example. This capsule is sent by application servers to all active nodes of the network to notify them that a cache architecture reconfiguration must be done. The format of this capsule is described in figure 5.
[Figure 5 layout: the common header, a data part, Hops to server ( 6 bits ), Mapping table ( 21 bits per class ), Maximum inter-node distance ( 32 bits ), and the Program part.]
Fig. 5. Format of a cache reconfiguration capsule
The hops to server field, incremented hop by hop, computes the distance between the server and the destination node. The mapping table is used to convert the difference between the date of the request and the date requested for the object into a time class. The maximum inter-node distance defines the maximum allowed average distance between any two members of each group. These data are specified by the application server at the creation of the capsule. The program part is shown below. The different thresholds embedded in the program part of the capsule ( 50%, 4 hops, 8 hops, … ) are defined by the application servers and constitute the rules of class membership. The other configuration parameters ( the maximum average inter-node distance and the distance between the node and the application server ) are contained in the data part of the capsule. The execution of this program stores them in the node’s cache.
Example of program embedded in a cache reconfiguration capsule :

hops_to_server ++;
if Already_Received_Capsule then
    remove ( This )
else if ( hops_to_server < max_hop_number + 1 ) then {
    // The node has not received the reconfiguration capsule
    // yet and is located at a suitable distance from the
    // application server.
    forward ( This, Nearby_Nodes );
    // Get the request access history stored by the node.
    Object Class_Requests = getCache ( );
    // Class_Id is the class of the node.
    // Conditions related to class1 requests
    if ( Class_Requests[1] > 50% ) AND ( hops_to_server < 4 ) then
        Class_Id = 1;
    // Conditions related to class2 requests
    else if ( Class_Requests[2] > 50% ) AND ( hops_to_server < 8 ) then
        Class_Id = 2;
    ...
}

… to denote the function that needs to be executed on the data packets for the flow. In ANMAC we propose context-dependent function identifiers (<cdfi>) that enable the execution of a plug-in depending on some predicate or context. If we consider encryption of data within an intranet before it goes out to the Internet, we need to perform the encryption only at the last (internal) hop and not at each router along the route. Thus, a <cdfi> with a location predicate enables the selective execution of the encryption plug-in at the right location. Note that nesting <cdfi>’s allows pipelining the functions to be executed on a data stream, improving performance.
[Figure 3 layout: an IP header followed by function identifiers 1, 2, …, n.]
Fig. 3. Packet Format
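The context-dependent execution of a &lt;cdfi&gt; pipeline can be sketched as follows; the types and the lastInternalHop predicate are our own illustration of the location-predicate example above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch (our types) of a context-dependent function identifier:
// the plug-in named by the identifier runs only where its predicate holds,
// e.g. encryption only at the last hop inside the intranet.
public class Cdfi {
    final String functionId;
    final Predicate<NodeContext> predicate;

    Cdfi(String functionId, Predicate<NodeContext> predicate) {
        this.functionId = functionId; this.predicate = predicate;
    }

    static class NodeContext {
        final boolean lastInternalHop;
        NodeContext(boolean lastInternalHop) { this.lastInternalHop = lastInternalHop; }
    }

    /** Returns the ids of the plug-ins this node should execute, in order,
     *  which is how nesting cdfi's pipelines functions over a data stream. */
    static List<String> toExecute(List<Cdfi> pipeline, NodeContext ctx) {
        List<String> out = new ArrayList<>();
        for (Cdfi c : pipeline)
            if (c.predicate.test(ctx)) out.add(c.functionId);
        return out;
    }
}
```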
Samphel Norden and Kenneth F. Wong

4 The Plug-in Modules
As mentioned earlier, the dynamic load feature of the plug-in modules can be exploited to perform network management and control. The plug-in components of ANMAC that we plan to implement are given below.
– Feedback based Congestion Control: Congestion is an application-independent event and occurs within the network. This makes it suitable for active networks, especially since the time required for congestion notification information to propagate back to the sender limits the speed at which an application can decrease its sending rate or ramp up depending on the current state. ANMAC routers dynamically react to congestion. Each router maintains a congestion profile and the required QoS for a particular flow, which is calculated by the NOC using past history. The congestion control algorithm in the router (for example, RED) would use this profile to decide which packets to drop or shape. An alternative method is to inform the NOC, which then installs special packet filters to reroute QoS flows.
– Distributed Monitoring Entity Handler: This module collects the various data from the distributed DMEs and sends the data in a coherent, coordinated fashion. The NOC can query the plug-in for information and install filters for specific monitoring.
– QoS and Resource Control - Deferred Reservation: In ANMAC, admission control is performed by active routers rather than by any centralised arbitration logic. When a user requests a QoS connection, the nearest active router uses the resource control plug-in to decide whether a new connection should be accepted. ANMAC uses DRES (Deferred REServation), our new resource reservation protocol, which is described in more detail in [5]. DRES is a sender-oriented, soft-state, 2-phase resource reservation protocol that uses deferring (delaying) to increase call admissibility and lower latency, while having overhead competitive with RSVP and ATM signalling.
DRES uses the concept of tentative resource reservation (TTR) for unrejected but unadmitted flows. In an end-to-end resource request involving n hops, a request is not rejected immediately when there are insufficient resources at a single hop. Rather, the request continues to propagate through the network making TTRs, and multiple requests can be propagating concurrently. During call setup, resources can become available (e.g., through call teardown or call rejection) that convert TTRs into permanent resource reservations, allowing a call to be admitted.
– Security: Traditional security concerns in active networks are avoided by having the NOC install plug-in code in routers. Consider, for the moment, a specialised IP spoofing attack called TCP SYN-ACK flooding [4]. Once the router determines that spoofing is being done, it can inform the NOC, install a filter in its DME interface, and propagate the filter to the next-hop router, so that the neighbouring router can deal with the spoofed or forged
packet in a similar fashion. Thus, subsequent attacks will fail due to this collaborative filtering process. More details of our implementation environment are available at [5].
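The deferring idea behind DRES can be illustrated with a toy single-hop model; this is entirely our own simplification, since real DRES is a distributed two-phase protocol:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (our simplification) of DRES's deferred reservation idea: when a
// hop lacks capacity, the request is parked as a tentative reservation (TTR)
// instead of being rejected; capacity freed later converts waiting TTRs into
// permanent reservations so the call can still be admitted.
public class DresLink {
    int freeBandwidth;
    final Deque<Integer> tentative = new ArrayDeque<>(); // parked TTR demands (FIFO)

    DresLink(int capacity) { freeBandwidth = capacity; }

    /** Returns "reserved" or "tentative"; never rejects outright at one hop. */
    String request(int demand) {
        if (demand <= freeBandwidth) { freeBandwidth -= demand; return "reserved"; }
        tentative.add(demand); // defer instead of rejecting
        return "tentative";
    }

    /** Call teardown frees capacity and promotes waiting TTRs when possible. */
    void release(int amount) {
        freeBandwidth += amount;
        while (!tentative.isEmpty() && tentative.peek() <= freeBandwidth)
            freeBandwidth -= tentative.poll();
    }
}
```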
5 Conclusions
In this paper, we have proposed a new framework for performing network management using active networks. We show that providing routers with dynamic functionality yields a customizable interface that allows monitoring and management of the network at any level of granularity. We have described several dynamically loadable plug-in modules that tackle congestion, support QoS traffic and provide security. We have also shown the robustness of the framework by implementing mechanisms to prevent security attacks. We plan to implement and evaluate ANMAC in a QoS-enabled test-bed.
References
1. Decasper D., and Plattner B. “DAN: Distributed code caching for active networks”, Proceedings of INFOCOM’98, June 1998.
2. Dittia Z., Cox J.R. Jr., and Parulkar G. “The APIC Approach to High Performance Network Interface Design: Protected DMA and Other Techniques”, Proceedings of INFOCOM’97, Kobe, Japan, 1997.
3. Decasper D., Dittia Z., Parulkar G., and Plattner B. “Router plugins: a software architecture for next generation routers”, Proceedings of SIGCOMM’98, September 1998.
4. Cisco Systems. “Defining strategies to protect against TCP-SYN denial of service attacks”, http://www.cisco.com/warp/public/707/4.html.
5. Norden S. “ANMAC: A novel architectural framework for network management and control using active networks”, http://www.arl.wustl.edu/˜samphel/iwan.ps.
6. Labovitz C., Malan G.R., and Jahanian F. “Internet routing instability”, Proceedings of SIGCOMM’97, September 1997.
7. Stallings W. “SNMP, SNMPv2 and RMON: Practical network management”, Addison-Wesley, 2nd edition, 1996.
8. Schwartz B., Zhou W., Jackson A. W., Strayer W. T., Rockwell D., and Partridge C. “Smart packets for active networks”, 2nd Active Nets Workshop, March 1997.
An Active Network Approach to Efficient Network Management Danny Raz and Yuval Shavitt Bell Laboratories, Lucent Technologies 101 Crawfords Corner Road, Holmdel, NJ 07733-3030
[email protected] [email protected]

Abstract. Active networks is a framework where network elements, primarily routers and switches, are programmable. Programs that are injected into the network are executed by the network elements to achieve higher flexibility and to present new capabilities. This work describes a novel active network architecture which primarily addresses the management challenges of modern complex networks. Its main component is an active engine that is attached to any IP router to form an active node. The active engine we designed and implemented executes programs that arrive from the network and monitors and controls the router’s actions. The design is based on standards (Java, SNMP, ANEP over UDP), and can be easily deployed in today’s IP networks. The contribution of this paper is the introduction of novel architectural features such as: isolation of the active mechanism, the session concept, the ability of active sessions to control non-active packets, and blind addressing. Implementing these ideas, we built a system that enables the safe execution and rapid deployment of new distributed management applications in the network layer. This system can be gradually integrated into today’s IP networks, and allows a smooth migration from IP to active networking.
1 Introduction
The emerging next generation of routers exhibits both high performance and rich functionality, such as support for virtual private networks and QoS [11]. To achieve this, per-flow queueing and fast IP filtering are incorporated into the router’s hardware [11]. The management of a network comprised of such devices, and efficient use of the new functionality, introduces new challenges. Active networks is a framework where network elements, primarily routers and switches, are programmable [14]. Programs that are injected into the network are executed by the network elements to achieve higher flexibility for networking functions, such as routing, and to present new capabilities for higher layer functions by allowing data fusion in the network layer. This work suggests a novel active network architecture which primarily addresses the network management challenges. At its center, an active engine executes programs that are received through the network.

Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 220–231, 1999. © Springer-Verlag Berlin Heidelberg 1999

The active engine is
attached to an IP router, and together they form an active node. We introduce the notion of a session, which generalizes the soft state mechanism that appears in many active network architectures [10,12]. This enables long lasting applications that are typical to network management, to reside in the active node. For a scalable distributed operation we introduce a new addressing mode, blind addressing, in addition to the explicit mode available in IP. We explicitly allow sessions to access the router management information base (MIB) using SNMP. Overall, we introduce a modular solution that enables an easy deployment in the current IP networks. This paper describes a working network prototype. The active engine is written mostly in C, and is demonstrated on a network comprised of software routers running on FreeBSD PCs and a Cisco 2514 router with a PC as an adjunct active engine. We put an emphasis on standard APIs and tools. The mobile code is written in Java, it is encapsulated together with data using the standard ANEP headers [2] over UDP. The engine communicates with the router using SNMP which enables it to monitor and control the router’s operation. Our approach can handle the entire range of active networking, from capsules to programmable switches. Capsule applications carry their code and terminate after execution. Programmable switches are implemented by “well-known” session ids that may receive data and act on it. We also allow authorized sessions to intercept non-active packets and manipulate their data, change their routing, drop them, etc. Such authorized sessions can also change the MIB variables in the router using SNMP. Another feature of the architecture is the ability of nonactive packets to request a special service, e.g., routing, that is implemented by a transparent resident session that is mapped to this service at the router. Network management applications are traditionally centralized around some manager. 
The manager queries the managed objects, builds a view of the network, and sends alerts if a problem is detected. The manager can also try to take corrective actions by sending configuration commands to network entities. The recent trend in network management architectures is to rely on multiple levels of abstraction, e.g., CORBA, Java ORB, Java RMI, Styx, DCOM, and Directory Enabled Networks (DENs). As a result, the cost of management is obscured from the application programmer, and thus neglected. If this trend continues, management may consume increasing portions of network resources (bandwidth, buffer space). As is well put in the last chapter of [15]: “When CORBA is used the wrong way, the implemented applications, although they are functionally complete, can have performance and scalability problems.” We believe that our approach presents a better alternative to the current practice. It calls for the distribution of the management task in the network, enables shorter control loops, eliminates long-haul dissemination of redundant and unimportant information (“I’m OK” messages), and facilitates new exciting applications. The framework forces the programmer to be aware of efficiency issues and thus will result in more efficient code, not only due to its intrinsic capabilities to do so, but also due to the human change of focus. Other agent based approaches [8,9]
that enable distributed computing rely heavily on bandwidth-blind approaches such as Java RMI and thus do not result in efficient usage of network resources. The rest of the paper is organized as follows. Section 2 gives a short overview of the system we built. A detailed comparison between our design and existing systems appears in Section 3. Section 4 describes the system architecture and the flow of information. Section 5 briefly describes implementation examples. We discuss future work and give our concluding remarks in Section 6.
2 System Overview
Logically, an active node in our system is comprised of two entities: an IP router, and an adjunct active engine (AE). The IP router component performs the IP forwarding, basic routing, and filtering that are part of the functions performed by today’s commercial off-the-shelf (COTS) IP routers. The filtering is used to divert packets to the active engine. The active engine is an environment where user-written programs can be executed with close interaction with the router data and control variables1. Physically, the IP router and the active engine may either reside on different machines or co-reside inside the same box. This structure enables us to upgrade any COTS IP router to an active router simply by adding an adjunct active engine. The separation protects non-active traffic from the effects of erroneous operations of the active part of the network, and inflicts minimal additional delay on non-active traffic. It also makes gradual deployment of active nodes in current networks easy. A logical distributed task is identified by a globally unique number called a session id. When code associated with a non-existing session arrives, it is executed and creates a process that handles all the packets of that session. Such a process can either handle only a single data packet and terminate (capsule), or it can exist in the AE for a long period of time, handling many data packets, as required by many network management applications. To perform network layer tasks, sessions must have access to the router’s network layer data, such as topological data (neighbor ids), routing data, performance data (packets dropped, packets forwarded, CPU usage, etc.). We use SNMP as the interface between the router and the AE. Standard SNMP agents exist in all routers and enable a read/write interface to a standard management information base (MIB). In order to perform distributed tasks, an active node must have means to communicate with other active nodes.
Relying on full topology information being available at every node does not scale. To tackle this problem we support a topology-blind addressing mode that enables a node to send a packet to the nearest active node in a certain direction. This mode is useful for topology learning, robust operation, support of heterogeneous (active and non-active) environments, etc. We also support the explicit addressing mode, in which a packet is sent to a specific active node.
The AE can be perceived as an execution environment in the context of [6].
Overall, we built a system that enables the safe execution and rapid deployment of new distributed management applications in the network layer. This system can be gradually integrated into today’s IP networks, and allows a smooth migration from IP to active networking. To facilitate this, we introduce novel architectural features such as: isolation of the active mechanism, the session concept, the ability of active sessions to control non-active packets, and blind addressing.
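The difference between the two addressing modes can be sketched abstractly (our own model; in the real system a blind packet travels towards some distant destination and is intercepted by the first active node on the IP path):

```java
import java.util.List;

// Conceptual sketch (our model) of the two addressing modes: an explicitly
// addressed packet is consumed by the named node, while a blind packet is
// intercepted by the first active node on the path towards its destination,
// so the sender needs no knowledge of active-node locations.
public class Addressing {
    static class Node {
        final String name; final boolean active;
        Node(String name, boolean active) { this.name = name; this.active = active; }
    }

    /** path: routers between sender and destination, in forwarding order. */
    static String deliverBlind(List<Node> path) {
        for (Node n : path)
            if (n.active) return n.name; // nearest active node intercepts
        return "destination";            // no active node before the target
    }

    static String deliverExplicit(List<Node> path, String target) {
        for (Node n : path)
            if (n.name.equals(target)) return n.name;
        return "unreachable";
    }
}
```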
3 Related Work
Recently, research in active networking has been gaining popularity. Some of the research groups in this area are: MIT (ANTS) [16]; U. Kansas [10]; U. Penn. (SwitchWare) [1]; Georgia Tech. [4]; Columbia (NetScript) [17]; UC Berkeley (MeGa) [3]; Washington University (DAN) [5], and more. In this section we compare our architecture with the menagerie of existing active network architectures and with other agent based approaches. In our design, we separate the active engine, where active code is executed, from the router itself. This approach makes deployment easier and poses less threat to the non-active traffic in case the active engine breaks down. A similar approach was recently reported by Amir et al. [3]; however, they limit the scope of their active server to the application level and thus limit its capabilities. Bhattacharjee et al. [4] also suggested a similar approach, but for a very restricted active server that can support only a given set of functions. In most other works, the active part and the non-active part are not well separated. ANEP [2] is incorporated into several of the existing projects. It is already used in [10] and will be used in [16]. We can only speculate whether the current research projects will eventually converge to a unified environment using ANEP as the wire encapsulation. Interestingly, SNMP escaped the notice of all the active network projects but one [17]. We believe the use of SNMP is the most attractive option for integrating active network technology with existing routers. Most agent based systems [8,9] reside in the application layer and thus do not have access to network layer information. A recent first step in addressing the need of agents to interface with network layer information is presented by Zapf et al. [18]. They allow their application layer agents to access router information through an intermediate resident application in the router using an SNMP interface.
Another interesting work was presented by Hjálmtýsson and Jain [7]. They built a system where installed agents can manipulate data streams in a router. Though similar in flavor to this work, Hjálmtýsson’s work suggests a new router design, while we emphasize the use of legacy routers.
4 Architecture

4.1 Design Principles
Targeting specifically the network management domain, the following principles guided our design:
Generality and Simplicity Building applications should be easy for a large base of programmers. Thus, the system should not be limited to one language, and should support languages that are in general use. The node should also be general enough to support many levels of active networking, from capsules to programmable switches.
Modularity We separate the active node into modules with a clearly defined API between them. In particular, we chose to separate the forwarding mechanism of a regular router from the operating environment where the packets are executed. We also use, as much as possible, well accepted standards, such as Java, SNMP, and ANEP [2], as the API through which the modules exchange information.
Inter-operability and heterogeneity Most likely, active nodes will co-exist with non-active routers. Furthermore, incremental deployment of active nodes alongside co-existing routers seems a natural evolution path. In such a scenario it is very unlikely that an application running on an active node could explicitly know the addresses of its active neighbors. To this end we support “blind” addressing, in which the active node need not know the address or the location of other active nodes.
Long Lasting Sessions In many network management applications there is a natural need for an application to reside in a node for a long period of time (for example, to do monitoring and billing). This cannot be efficiently implemented with capsules. As we target the network management domain, we specifically design the system to support such applications. It is also very important for an application to have easy and standard access to the local information at a node, since in many applications the action taken by the packet depends on this information.
Cost Visibility Although we wish to abstract most of the technical details in order to simplify the development of applications, we think that the application must be aware of the costs, both in terms of node resources (CPU, memory, etc.) and in terms of global network resources (bandwidth and delay). Therefore, we do not use advanced distributed tools such as CORBA and Java RMI, which in general hide much of the actual cost from the user.
Safety and Security Non-active traffic should not be affected by the new active ability. Furthermore, an active application should not be able to affect any other application. The system should support security and robustness at all levels.
The above principles directed us to make the following architectural decisions: (1) The active node is composed of a regular router with a diverter, which detects and diverts active packets to the main separate component, the Active Engine; and (2) the AE is a separate entity (which may reside on a different card, or a separate machine) that performs most of the active node’s tasks. This simple modular structure supports inter-operability, and does not require that the specific address of the next active hop be known. The diverter part is fairly simple, and can be carried out using IP filtering, which is supported at the API level by most router vendors. This structure also allows easy incremental deployment in heterogeneous networks. Another advantage of this design is robustness; non-active traffic cannot be affected by active traffic.
Even if for some reason the active engine stops working, the router will still route non-active packets correctly. The second significant entity in our design is the session, which serves both as a mechanism that preserves soft state and as a rendezvous point for data fusion. Logically, a session is a distributed task performed in the network. A session has a unique network id; thus, different programs on various nodes can belong to the same session. These programs may exchange information using active data packets, and they can distribute (and/or update) their code by sending active programs. This notion of a session is general enough to support both long lasting processes and short term capsules. The fine details of the design are described in the following subsections.

4.2 Detailed Design
[Figure 1 here: sessions 1 through n and the manager, with its security module, run inside the Active Engine; the Active Engine communicates over SNMP and IP with the router, which contains the forwarding engine and the diverter.]
Fig. 1. The general architecture.

The main components of the system (see Fig. 1) are:

Diverter — A part of the router that enables it to divert packets to the AE based on their IP/UDP header. The new generation of high-performance IP routers [11] implements this option in the router hardware; edge routers and our prototype perform this function in software.

Active Manager — The core of the AE is the Active Manager. It generates the sessions, coordinates the data transfer to and from the sessions, and cleans up after a session terminates. While a session is alive, the Active Manager monitors the session's resource usage, and can decide to terminate its operation if it consumes too many resources (CPU time or bandwidth) or if it tries to violate its action permissions.

Security Stream Module — This module resides in kernel space below the IP output routine. Every connection that a session wishes to open must be registered with this module, to allow monitoring of network usage by sessions. The registration is done by our supplied objects, transparently to the application developer. The module is not yet fully implemented.
226
Danny Raz and Yuval Shavitt
Router Interface — This module allows sessions to access the router's Management Information Base (MIB). It is implemented as a Java object that communicates with the router using SNMP. In the future we plan to enhance performance by caching popular MIB variables.

The design allows multiple languages to be supported simultaneously, but since the current implementation handles only Java packets, we restrict the description to the details of the Java implementation. Implementations of other languages may require some adaptations according to the language specifics. In the following we describe the flow of packets through the system. Note that a non-active packet does not pass through the Active Engine: the diverter recognizes it as such, and the packet takes the fast track to its output port. All active packets include a default option that contains, among other fields, the unique session id of the packet and a content description (data, language). All diverted packets are sent to the Active Manager. If a packet does not belong to an existing session and contains code, it triggers the creation of a session; if it is a data packet, it is discarded. Session creation involves, among other steps, authentication (not implemented), creation of a control block for the session, creation of a protected directory to store session files, opening of a private communication channel through which the session receives and sends active packets, and execution of the code. Methods associated with the session object allow the Java program to easily send itself to another node, and to send and receive data. Newly arriving programs are passed to the session to allow it to perform code updating without losing its state. Four UDP port numbers (3322-5) are assigned to active network research. The first, the blind addressing port, is used to send active packets to an unspecified node in a certain direction, i.e., towards some distant destination.
The diverter in the first active node on the route to that destination intercepts the packet and sends it to the active engine. Therefore, the sender is not required to know the address of the next active node. The second UDP port number (the explicit active port) is used to send an active packet to a specific active node. Such a packet is forwarded through the fast track of all intermediate active nodes, and is not diverted until it reaches its destination. The active manager keeps track of the resource consumption of each session in the node (CPU time, bandwidth, disk space). A session that consumes excessive resources is aborted. A session may also be aborted due to lack of activity. Since we expect most network programming to be stable, we do not try to optimize the capsule model; thus, we are less concerned about program size, as programs are not going to be transmitted frequently. A mechanism to reassemble a program from a chain of UDP packets is currently implemented. 4.3
Security
Security and safety are of major concern in the deployment of active networks. A system is safe if no application can destroy or damage the appropriate execution
of other applications. In particular, the active engine as a whole should not affect the routing of non-active packets. A system is secure if all operations, including access to data, are authenticated, i.e., only authorized sessions can perform actions and/or access private data. Our architecture supports both security and safety, although it is not yet fully implemented. In any design, one faces the dilemma of choosing between the freedom to allow more sophisticated session behavior (e.g., setting MIB variables, diverting non-active packets) and the fear of a possible safety/security hole. Our approach allows multiple levels of security via authentication and session classification. Each session is authorized to use specific services (MIB access for read or write, diverting non-active packets) and resources (CPU time, bandwidth, memory). As it is important to ensure both safety and security in order to promote the use of active networks, one can initially be more restrictive in authorizing services, and gradually allow more sophisticated ones. Our first concern is to make sure that non-active packets are not affected by active packets. This is easily achieved by the logical (and sometimes physical) separation of the active engine from the router. The next step in safety is to ensure that a session cannot corrupt, or even gain access to, other sessions' data. We achieve this through the use of the Java SecurityManager. It allows us to control the session's running environment; in particular, we prevent sessions from using native methods and restrict their use of the file system. Malicious or erroneous overuse of system resources is of great concern. To this end, we intend to monitor the use of CPU time by sessions. We implemented tight control over the usage of the communication channel to the outside world: TCP connections can only be opened by a permitted session, using our supplied methods that monitor the bandwidth consumption.
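Such supplied methods can be imagined as stream wrappers that charge every byte a session writes against a per-session budget. The class below is our sketch under that assumption (the name and the hard budget cut-off are ours); in the prototype, usage information would also feed the Active Manager's abort decisions.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Illustrative per-session send monitor (our sketch, not the prototype's
 *  actual object): every byte written is counted, and a write is refused
 *  once the session's byte budget is exhausted. */
final class MonitoredOutputStream extends FilterOutputStream {
    private final long budgetBytes;
    private long sentBytes;

    MonitoredOutputStream(OutputStream out, long budgetBytes) {
        super(out);
        this.budgetBytes = budgetBytes;
    }

    @Override public void write(int b) throws IOException {
        charge(1);
        out.write(b);
    }

    @Override public void write(byte[] buf, int off, int len) throws IOException {
        charge(len);
        out.write(buf, off, len);
    }

    /** Bytes this session has sent so far (visible to the manager). */
    long sentBytes() { return sentBytes; }

    private void charge(long n) throws IOException {
        if (sentBytes + n > budgetBytes)
            throw new IOException("session exceeded its bandwidth budget");
        sentBytes += n;
    }
}
```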
An attempt to open a connection using the plain Java methods is blocked by controlling the IP layer in the active engine; an unauthorized connection is simply dropped. UDP packets can be sent only through the manager, which again can monitor the bandwidth usage. 4.4
Test Bed Description
We built a small heterogeneous network, depicted in Fig. 2. The network comprises both FreeBSD-based active routers and COTS routers (currently we use Cisco 2500 routers and a Lucent Technologies RABU PortMaster 3) with adjunct active components. The FreeBSD routers are PCs running FreeBSD, using routed for routing; in these PCs the active engine and the router co-reside on the same machine.

Performance. In building the prototype we did not aim at performance. Our first target was to build a proof-of-concept system that would enable us to test design ideas and applications; thus, many parts of the system were not optimized for performance. Nevertheless, we tested the capabilities of our prototype. The delay of an active packet through the system should therefore be treated as an upper bound on what can be accomplished, and the load measures as a lower bound.
Fig. 2. The prototype network architecture. (The figure shows the test-bed nodes tishrey, heshvan, kislev, razcisco, adar, tevet, and shvat, their router and active-engine components, and connections to the Internet.)
Our first experiment measured the delay of an active packet through one active node. To this end we used a session on heshvan that forwards every packet it receives to the destination indicated on the packet. Java applications on tishrey and kislev (see Fig. 2) exchange UDP packets using either an active port, which is diverted to heshvan's active engine, or a non-active port, which is forwarded by heshvan's router. The packets were about 500 bytes long, and the network carried no additional traffic, to prevent queueing delay from affecting the results. The average round-trip delay (RTD) for a packet was 1.37 ms without diversion and 11.20 ms with diversion; the 90% confidence intervals are 1.37±0.0047 ms (∼0.68% wide) and 11.20±0.022 ms (∼0.39% wide). Thus the average delay through heshvan's active engine was (11.20 − 1.37)/2 = 4.915 ms. Heshvan is a 200 MHz Pentium machine with 64 MB of memory; the active engine's performance is bounded by the memory access speed rather than by the computation power of the processor. We did not conduct a full-scale stress test, but we repeated the above experiment with ten sessions active on heshvan, and the delay through the active engine did not change.
5
Application Examples
To demonstrate the power of our system, we consider two problems that are both related to network management but represent basic building blocks usable in various applications. The first is bottleneck detection, a special case of collecting information or computing a function along a route between two nodes. The second is message dissemination to a large group of receivers; it is useful for automatic configuration of network elements, or for any other application that requires disseminating messages to a large population. In this section we briefly discuss the bottleneck detection application and refer the reader to [13] for more details. Bottleneck detection is an important problem in network management. It is a building block for higher-level applications, e.g., video conferencing, that require QoS routing. It is also an example of the general problem of gathering information along a given path between two network nodes.
In today's IP networks there is only one ad-hoc technique to examine one specific QoS parameter, namely the delay along a path: the well-known traceroute program, which enables a user at a host to obtain a list of all the routers on the route to another host, together with the elapsed time to reach each of them. The use of the traceroute program for network management has several drawbacks: it can only retrieve the hostnames and the delay along a path; it is extremely inefficient in its use of network resources; and it is slow.
Fig. 3. Three traceroute executions on a three-hop path: (A) the current program; (B) collect-en-route; and (C) report-en-route
In an active network, and specifically in our architecture, there are several options for gathering information along a given path between two network nodes, each optimizing a different objective. One option (collect-en-route, see Fig. 3(B)) is to send a single packet that traverses the route and collects the desired information from each active node. When the packet arrives at the destination node, it sends the data back to the source (or to any management station). This design minimizes the communication cost, since a single packet travels along each link in each direction. Another option (report-en-route, see Fig. 3(C)) is to send a single packet along the path; when the packet arrives at a node, it sends the required information back to the source and forwards itself to the next hop. This design minimizes the time of arrival of each part of the route information, at the price of a higher communication cost. Note that the traceroute program (see Fig. 3(A)) has time and communication complexities that are quadratic in the path length. Table 1 compares the three options. The use of general programs in the capsule enables the application programmer to query any available MIB variable from the router, rather than just the router's IP address; for example, for bottleneck detection we can collect statistics about TCP packet loss along a route to a certain host. It is easy to generalize the program to start the data collection from any active node in the network (not necessarily the originator), and to send the reports to any other node.
Algorithm used     No. of messages used   Time of data arrival from node i
traceroute         n(n+1)                 i(i+1)
collect-en-route   2n                     2n
report-en-route    n(n+3)/2               2i

Table 1. Performance comparison (time is measured in hop count).
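The closed forms in Table 1 can be checked by summing link traversals hop by hop on an n-hop path, where node i is i hops from the source. The following sketch is our own verification aid, not part of the prototype; time is measured in hop counts, as in the table.

```java
/** Counts messages (link traversals) and per-node data-arrival times for
 *  the three schemes of Table 1.  Our verification sketch, not prototype code. */
final class EnRouteCost {
    /** traceroute-style probing: reaching node i costs a probe of i hops
     *  out plus a reply of i hops back, and probes are sent sequentially. */
    static long tracerouteMessages(int n) {
        long m = 0;
        for (int i = 1; i <= n; i++) m += 2L * i;
        return m;                                  // equals n(n+1)
    }

    /** collect-en-route: one packet over n links, one report back over n. */
    static long collectMessages(int n) { return 2L * n; }

    /** report-en-route: the packet crosses n links, and node i sends a
     *  report i hops back to the source. */
    static long reportMessages(int n) {
        long m = n;
        for (int i = 1; i <= n; i++) m += i;
        return m;                                  // equals n(n+3)/2
    }

    /** traceroute: data from node i arrives after probes 1..i, 2j hops each. */
    static long tracerouteArrival(int i) {
        long t = 0;
        for (int j = 1; j <= i; j++) t += 2L * j;
        return t;                                  // equals i(i+1)
    }
    static long collectArrival(int n) { return 2L * n; }  // all data at once
    static long reportArrival(int i)  { return 2L * i; }
}
```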
6
Discussion and Future Work
Since the inception of the active network idea there has been a search for the "killer application", the one that will strongly require active network technology. We believe that network management is a domain where active network technology could prove very significant. Adaptive control, router configuration, element detection, network mapping, and security management (intruder detection, fighting denial-of-service attacks) are only some examples where active network technology can be successfully applied. Our architecture also supports solutions to problems that are not necessarily part of network management, such as search worms, smart mail, multicast, hop-by-hop flow control, etc. The additional delay seen by packets in active networks is an issue that has been addressed in the past; like others, we believe the small slow-down is compensated by the large savings that can be achieved in traffic volume. The analysis presented in Section 5 is a first step towards a more formal framework to address this point. In our prototype, non-active packets suffer only the negligible additional delay of the diverter. And although the active engine is not optimized in the current version, as the emphasis is on functionality, the delay through the AE is reasonable. Altogether, we presented a prototype implementation of a network management engine built using active network technology. We believe our approach can enable new and efficient ways to manage today's (and tomorrow's) networks.
References

1. D. S. Alexander et al. The SwitchWare active network architecture. IEEE Network, 12(3):29–36, May/June 1998.
2. D. S. Alexander et al. The active network encapsulation protocol (ANEP). URL http://www.cis.upenn.edu/~switchware/ANEP/docs/ANEP.txt, 1997.
3. E. Amir, S. McCanne, and R. Katz. An active service framework and its application to real-time multimedia transcoding. In SIGCOMM'98, Sept. 1998.
4. S. Bhattacharjee, K. Calvert, and E. W. Zegura. An architecture for active networking. In HPN'97, Apr. 1997.
5. D. Decasper and B. Plattner. DAN: Distributed code caching for active networks. In INFOCOM'98, Mar. 1998.
6. AN Working Group. Architectural framework for active networks. URL http://www.cc.gatech.edu/projects/canes/arch/arch-0-9.ps, August 31, 1998. Version 0.9.
7. G. Hjálmtýsson and A. Jain. Agent-based approach to service management towards service independent network architecture. In IFIP/IEEE IM'97, pages 715–729, May 1997. San Diego, CA, USA.
8. G. Karjoth, D. B. Lange, and M. Oshima. A security model for Aglets. IEEE Internet Computing, 1(4):68–77, July/August 1997.
9. J. Kiniry and D. Zimmerman. A hands-on look at Java mobile agents. IEEE Internet Computing, 1(4):21–30, July/August 1997.
10. A. B. Kulkarni, G. J. Minden, R. Hill, Y. Wijata, S. Sheth, H. Pindi, F. Wahhab, A. Gopinath, and A. Nagarajan. Implementation of a prototype active network. In OPENARCH'98, pages 130–143, Apr. 1998.
11. V. P. Kumar, T. V. Lakshman, and D. Stiliadis. Beyond best effort: Router architectures for the differentiated services of tomorrow's Internet. IEEE Communications Magazine, 36(5):152–164, May 1998.
12. E. L. Nygren. The design and implementation of a high-performance active network node. Master's thesis, Massachusetts Institute of Technology, Feb. 1998.
13. D. Raz and Y. Shavitt. An active network approach to efficient network management. Technical Report 99-25, DIMACS, May 1999.
14. D. L. Tennenhouse et al. A survey of active network research. IEEE Communications Magazine, 35(1):80–86, Jan. 1997.
15. A. Vogel and K. Duddy. Java Programming with CORBA. Wiley, 2nd ed., 1998.
16. D. Wetherall et al. ANTS: A toolkit for building and dynamically deploying network protocols. In OPENARCH'98, pages 117–129, Apr. 1998.
17. Y. Yemini and S. da Silva. Towards programmable networks. In IFIP/IEEE Intl. Workshop on Distributed Systems: Operations and Management, Oct. 1996.
18. M. Zapf, K. Herrmann, K. Geihs, and J. Wolfgang. Decentralised SNMP management with mobile agents. In IFIP/IEEE IM'99, May 1999.
Virtual Networks for Customizable Traffic Treatments

Jens-Peter Redlich, Masa Suzuki, and Steve Weinstein
C&C Research Laboratories, NEC USA, Inc.
4 Independence Way, Princeton, NJ 08540, USA
{redlich,masa,sbw}@ccrl.nj.nec.com
Abstract. Selective treatments are needed for different types of traffic and different user groups, even in the Internet. Virtual networks can help to partition physical network resources, where the resulting partitions may implement their own independent control and processing mechanisms. This allows for customizable traffic treatment that is available in neither Integrated Services nor Differentiated Services. This paper describes a networking strategy incorporating intelligent routers that implement the gateway between the end-user and the core (virtual) network. Intelligent routers classify end-user traffic and assign it to virtual networks provided by the core network, according to a programmable policy. Furthermore, an intelligent router may process transmitted data, according to the QoS needs of the application and the core network's resource allocation, as well as in support of higher-level application functionality. Open, CORBA-based interfaces allow for control of the intelligent router, including dynamic download of code that is used to extend the router's functionality on the fly, as new applications or user requirements need to be supported.
1
Introduction
A virtual network (VN) is a service concept. It is characterized by a set of communication capabilities for an ensemble of communication calls, flows, or sessions that have something in common with one another. The Internet as a whole may be characterized as a virtual network which provides a best effort data forwarding service, augmented with higher level services, such as name services (DNS), management (ICMP, SNMP) and others, with a protocol stack grounded on IP. Other virtual networks may use the same physical infrastructure, but provide additional services or QoS features. Access to these VNs is usually restricted, as the VN features are not universally available or are reserved for usage by authorized users. Efforts such as COPS [ref] and DIAMETER[ref] are beginning to define client/server protocols to exchange policy information needed for allocation of resources to virtual networks.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 232-239, 1999. Springer-Verlag Berlin Heidelberg 1999
Virtual Networks for Customizable Traffic Treatments
233
Virtual networks are needed to avoid, on the one hand, the cost and complexity of providing a dedicated physical network for each traffic type with its associated preferred treatment, and, on the other hand, building a very complex "all services" network. Figure 1 illustrates the three alternatives of overlaid physical networks, overlaid virtual networks, and an all-services network (as Broadband ISDN was envisioned to be [1]). Creating VNs is really a modularization strategy for an all-services network that is scalable (in services) and not overwhelmingly complex. Some potential benefits, in addition to the architectural benefit described above, are:

• Creation and deletion of special treatments (routes, processing, etc.) for aggregations of traffic, as needed.
• Using different, customized control algorithms in the different VNs.
• Providing built-in separations (enhancing privacy) between traffic aggregations with different sensitivities, purposes, or owners.
• Providing communications services customers with virtual private network (VPN) services in the Internet with the same and greater control capability they enjoy with VPN services in the switched telephone network.
There are, of course, some difficulties and potential disadvantages of VN architectures. First, there is the inefficient use of bandwidth, if pre-allocated in bulk for each VN rather than maintained as one fully shareable resource [2]. Second, it is a severe security challenge to let multiple entities control and manage pieces of the same physical resource [3]. The first concern is being addressed with techniques for quickly reallocating resources among VNs, and the second with operating system-like memory protection and CPU scheduling mechanisms, in conjunction with encryption, authentication, authorization and logging services.
Fig. 1. Three ways to realize an all-services communications capability, illustrated for voice and data traffic aggregations. (a) Physical overlay networks. (b) Overlaid Virtual Networks. (c) A single all-services network.
234
Jens-Peter Redlich et al.
2 Traffic Customization Precedents : Traffic Grooming, Virtual Private Networks, and Virtual Paths
The concept of virtual networks is already well established in public telecom networks, albeit with limited flexibility. Three forms of virtual networks are common:

1. Traffic grooming.
2. Virtual private networks.
3. Virtual paths.

Traffic grooming is a very old technique used in the telephone network to group traffic with similar characteristics, often related to destination, in order to utilize transmission facilities more efficiently. More recently, SONET add/drop multiplexers have been designed to route local traffic directly from one low-speed port to another, rather than multiplexing and de-multiplexing it in the high-speed pass-through stream. Voice and data traffic may be separated in digital cross-connects (DXCs) and sent through separate trunks to voice and data switches, and Digital Subscriber Line (xDSL) services separate data from voice traffic at the central office. Finally, traffic grooming is appearing in SONET/WDM ring networks as a way of reducing multiplexing costs by grouping similar traffic on particular wavelengths [5]. Virtual private networks (VPNs) are provided to large customers, such as companies linking different locations, who "...do not perceive that they are sharing a network with each other ... you think you have it, but you don't" [1]. A telephone DXC or switch can support the virtual links of several large customers, each of whom may be given some degree of control, particularly in reconfiguring cross-connects. A virtual private network is ordinarily changed infrequently and slowly, and is usually restricted to grouping traffic on an owner (source and destination address) basis. Virtual private networks will continue to be an important virtual network category, with more recent work focusing on finer resource-dividing strategies and more customized control of routing and other VPN-specific treatments [6].
Virtual paths (VPs) [1] are groupings of ATM virtual circuits (VCs) that travel between the same end switches, sometimes groomed so that a VP carries virtual circuits of a particular service class or a particular user or user group. The use of virtual paths allows reuse of virtual circuit identifiers (VCIs) for circuits using the same switches but having different virtual path identifiers (VPIs), and reduces the processing load on intermediate switches, which need only process VPIs. Virtual networks are useful in realizing both traffic grooming and VPNs. However, VPs are still just address-controlled routing mechanisms and do not allow the full range of aggregated traffic definitions and treatments that could be possible in virtual networks.
3
Intelligent Edge Router
Today's Internet applications are not prepared to use virtual networks explicitly, i.e., to select a virtual network according to the application's QoS requirements and to utilize its built-in processing and control features. New Internet signaling protocols, such as RSVP, are a first attempt to let applications explicitly request and configure resources from core network elements. Still, most applications are not RSVP-aware, and it will take some time until applications can assume ubiquitous support for RSVP in the network core. In addition, an application that signals directly to network elements lacks the global picture. It does not know about other applications or about other users. It does not know which services are most crucial to the end-user, as opposed to those that run just as background entertainment. Moreover, the application does not know about the importance of a user within a community such as a corporation. How should a Web surfer know that he should reduce his resource consumption because the marketing department has an IP telephony session with a big potential customer? To overcome these problems, we propose an architecture that decouples applications from the resource allocation mechanisms of the core network. In this architecture, the resource allocation function is provided by the router that connects an end-user LAN with the Internet core network. Because this router provides programmable functions in addition to its routing and forwarding functions, we refer to it as an intelligent router (IR). The concept of allocating resources to traffic on the basis of attributes of the traffic, rather than specific signaling requests, is not new; there are very limited realizations in, for example, firewalls that block traffic from certain addresses or applications. What is new here is the flexibility of classification and resource allocation.
We can quickly program new criteria for traffic classification, and set up a wide range of treatments for the VNs associated with these traffic classes. This is primarily supported by the intelligent router's built-in capability to dynamically download compiled program code (shared libraries) that is linked to the router at runtime. Furthermore, we propose an open programming interface for setting up, controlling, and managing VNs from possibly remote locations.
Fig. 2. An Intelligent Router (IR) is used to decouple applications on the LAN from the resource allocation mechanisms used inside the core network. A Virtual Network may include processing capabilities in addition to data forwarding service.
236
Jens-Peter Redlich et al.
Figure 2 shows our configuration. We assume that for all communication between any two hosts of the LAN, bandwidth and maximum delay meet all application requirements, or, if not, that there are mechanisms that can resolve those LAN resource allocation conflicts; these mechanisms are outside the scope of this paper. On the other side, the intelligent router is connected to an ISP, which may offer several virtual networks to its customers. Each of these virtual networks may have very specific QoS characteristics, pricing structures, and control interfaces. However, as explained above, these virtual networks may share the same physical hardware (e.g., a wire coming out of the wall), or they may actually use a composition of physical access facilities, perhaps belonging to different ISPs. In addition to the virtual networks that are provided by the ISP, the intelligent router may implement its own set of virtual networks, each one with unique, value-added functions and interfaces. The intelligent router is responsible for assigning the resources provided by the ISP to the various flows of IP packets that are emitted from the applications running on the LAN. This resource allocation process is governed by a policy, which is usually defined by the LAN administrator. For a small company's network, this policy may, for instance, require that:

- Traffic originating from the CEO has preference over traffic from staff members.
- FTP traffic during working hours has lowest priority (except for those people assigned to a high-priority software development project).
- HTTP traffic from summer interns is blocked completely (except traffic to allowed Web sites, such as the company's headquarters web server).
- Email has preference over FTP.
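Policies of this kind can be represented as an ordered, first-match rule table. The sketch below is our own illustration of such a classifier; the rule fields, the "*" wildcard convention, and the virtual network names are hypothetical, not the intelligent router's actual interface.

```java
import java.util.List;

/** Illustrative first-match policy table: each rule matches on user group
 *  and application (derived from source address and port, as described in
 *  the text) and names the virtual network the traffic is mapped to. */
final class PolicyTable {
    static final class Rule {
        final String userGroup, application, virtualNetwork;
        Rule(String userGroup, String application, String virtualNetwork) {
            this.userGroup = userGroup;
            this.application = application;
            this.virtualNetwork = virtualNetwork;
        }
        boolean matches(String user, String app) {
            return (userGroup.equals("*") || userGroup.equals(user))
                && (application.equals("*") || application.equals(app));
        }
    }

    private final List<Rule> rules;
    private final String defaultVn;

    PolicyTable(List<Rule> rules, String defaultVn) {
        this.rules = rules;
        this.defaultVn = defaultVn;
    }

    /** First matching rule wins; unmatched traffic gets the default VN. */
    String classify(String userGroup, String application) {
        for (Rule r : rules)
            if (r.matches(userGroup, application)) return r.virtualNetwork;
        return defaultVn;
    }
}
```

Because the table is evaluated top-down, the LAN administrator expresses exceptions (the high-priority project, the allowed Web sites) simply by placing more specific rules before more general ones.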
In order to make its decisions, the intelligent router must analyze the traffic it forwards. The source IP address can be used to determine the user associated with the traffic (assuming that an additional component maintains a mapping between user identities and the IP addresses of their machines). If UDP or TCP is used (which is likely), the port numbers can be used to identify the service/application associated with the traffic. For big servers, e.g., that of an Internet bank, the destination IP address can be used to determine the associated service. In addition, the application may use signaling, such as RSVP, to indicate its requirements; in this case, however, RSVP is terminated at the intelligent router and is used only to provide additional information about the associated flow of IP packets. An Internet telephony example helps illustrate how implicit or explicit application requirements are mapped to ISP resources, i.e., to the virtual networks provided either directly by the ISP or by the intelligent router. We assume that the Internet telephony application uses RSVP to signal its requirements for bandwidth and maximum delay to the network. This RSVP signaling is terminated at the intelligent router. Depending on the policies that the LAN administrator has defined for the associated user and for the Internet telephony service, the company's CEO may get an ATM switched circuit for his traffic, with bandwidth and delay requirements derived from the RSVP information. Staff members might get the "gold service" from the ISP's differentiated services virtual network, which in most cases shows good enough behavior for this type of traffic (but which is without any guarantees).
Summer interns may run similar applications, but since their traffic is assigned to the "best effort" virtual network, they may not be quite satisfied with the QoS they get most of the time. An alternative to using an ATM switched circuit for the CEO's traffic is setting up a path through the Internet core that guarantees a high service quality for the IP telephony session; this assumes, however, that the ISP supports RSVP in its core network. If the IP telephony application sends its traffic without any additional (RSVP) signaling, the intelligent router applies its default policies without specific knowledge of bandwidth and delay requirements. For instance, for the CEO's telephone call, the intelligent router can use RSVP to reserve nominal resources in the ISP's core network (if RSVP is implemented there), even if the CEO's IP telephony application itself is unable to use RSVP. We see here a compromise between intelligent end systems and an intelligent network, providing higher quality service even if the end system has not evolved to the latest stage. Last but not least, the intelligent router could temporarily change its policy in favor of a certain user, if this user either has the privilege to make such changes (service upgrades) or is willing to pay the ISP for the higher service level. The summer intern from the above example could purchase premium network support for his IP telephony application, even if he would usually have to use the "best effort" network. Assuming the availability of an infrastructure that allows for efficient and secure micro-payments, i.e., transactions below one dollar, the payment for the higher service could be provided either by the user himself or by his remote partner. Hence, as a courtesy to its customers, the summer intern's bank could provide him with high service quality when he accesses his banking account through the bank's Internet site.
The use of the intelligent router for decoupling an application’s request for a certain QoS from the network’s resource management has the additional advantage that different mechanisms, i.e. protocols, abstractions, etc., can be used in the LAN and in the Internet core network. As alluded to in the above examples, new protocols for letting an application express its QoS requirements can be introduced long before such support is available in the Internet core network. Moreover, the intelligent router may use the very efficient native resource management subsystems and signaling protocols of the underlying network in order to meet the application's requirements. Hence, an application may use native ATM without knowing about ATM at all.
4 Implementation Considerations and Conclusions
The implementation architecture is shown in Figure 3. The programming interface defined in CORBA IDL includes the following functionality:
- A programmable pattern factory that produces patterns, which are used for classifying traffic on the basis of bit-patterns in IP headers.
- Modification of parameters of schedulers for packet forwarding.
- Specification of parameters of token buckets for traffic delimiting.
- Specification of schedulers and traffic shapers to define VNs.
- Specification of rules for mapping incoming packets into VNs.
- Monitoring of packet counts in each operating VN.
238
Jens-Peter Redlich et al.
The capacity assigned to each contributing source of traffic for a particular VN is assigned as a fraction of the total throughput measured for that VN. The assignment is implemented with token buckets. The allocations among different VNs can also be modified. Another component, also shown in Figure 3, is a Domain Resources Manager that executes an algorithm for allocating capacity among different VNs. It does this on the basis of traffic congestion information from neighboring routers, obtained through their programming interfaces. A prototyping testbed is being constructed, using a software router running on a PC with Linux as its operating system. The router components are implemented as CORBA objects that can be remotely controlled. With these objects, virtual networks for different classes of traffic have been set up in an Ethernet environment. Initial results show that the added burden of using CORBA increases the forwarding delay by less than 1 ms. This system is intended to be compatible with future CORBA interfaces for network elements following the IEEE P1520 standard [14].
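The token-bucket delimiting described above can be sketched as follows. This is a minimal illustration, not the authors' router code; the parameter names and the bucket depth are assumptions. Each source's rate is derived as a fraction of the total throughput measured for its VN.

```python
class TokenBucket:
    """Classic token bucket: `rate` tokens/sec, at most `depth` tokens."""
    def __init__(self, rate, depth):
        self.rate, self.depth = rate, depth
        self.tokens, self.last = depth, 0.0

    def conform(self, size, now):
        """True if a packet of `size` bytes conforms at time `now`."""
        # Refill tokens for the elapsed interval, capped at the depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

def bucket_for(source_share, vn_throughput, depth=1500):
    # Each source gets a fraction of its VN's measured total throughput.
    return TokenBucket(rate=source_share * vn_throughput, depth=depth)
```

Non-conforming packets would then be dropped or demoted, depending on the VN's policy.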
Fig. 3. Implementation architecture of a LAN-based intelligent router (a CORBA interface exposes classification rules and schedulers for incoming traffic; a Domain Resources Manager (DRM) with a user interface coordinates with other routers; outgoing traffic leaves over Ethernet links).
A small group of everyday networking users in our organization is routed through this infrastructure. We can demonstrate fast creation of VNs and changes in the assignments of different users' traffic to these VNs. One application, for example, is on-demand upgrading of a user’s VN assignment on the basis of an incremental service charge. We are developing additional customized traffic treatments in this prototyping environment. We believe that implementations similar to ours will become a common platform for service flexibility in the Internet of the 21st century.
References
1. J. Boyle, "The COPS (Common Open Policy Service) Protocol", IETF draft draft-ietf-rap-cops-05.txt, January 18, 1999.
2. P. R. Calhoun, A. Rubens, "DIAMETER Base Protocol", IETF draft draft-calhoun-diameter-07.txt, Work in Progress, November 1998.
3. U. Black, ATM: Foundation for Broadband Networks, Prentice Hall, 1995, ISBN 0-13-297178-X.
4. J-F. Huard, A. Lazar, "A programmable transport architecture with QoS guarantees", IEEE Communications Magazine, October 1998.
5. S. Alexander, W. Arbaugh, A. Keromytis, J. Smith, "Safety and security of programmable network infrastructures", IEEE Communications Magazine, October 1998.
6. P. Ferguson, G. Huston, Quality of Service, Wiley, 1998, ISBN 0-471-24358-2.
7. E. Modian, A. Chiu, "Traffic grooming algorithm for minimizing electronic multiplexing costs in unidirectional SONET/WDM ring networks", 1998 Conference on Information Systems Sciences, Princeton.
8. J. Rooney, J.E. van der Merwe, S.A. Crosby, I.M. Leslie, "The Tempest: A framework for safe, resource-assured, programmable networks", IEEE Communications Magazine, October 1998.
9. R. Braden et al., "Integrated Services in the Internet Architecture: An Overview", IETF RFC 1633, June 1994.
10. S. Blake, "An architecture for Differentiated Services", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-arch-01.txt.
11. K. Nicholas et al., "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-header-02.txt.
12. M. Borden et al., "Management of PHBs", IETF Internet Draft, August 1998, available at http://search.ietf.org/internet-drafts/draft-ietf-diffserv-phb-mgmt-00.txt.
13. R. Braden et al., "Resource ReSerVation Protocol (RSVP), Version 1, Functional Specification", IETF RFC 2205, September 1997.
14. R. Atkinson, "Security Architecture for the Internet Protocol", IETF RFC 1825, August 1995.
15. R. Dighe, M. Suzuki, S. Weinstein, "The Global Internet: A New Perspective on Broadband Access to the Internet", Proc. IEEE Globecom'98, Sydney, November 1998.
16. J. Biswas, "IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications Magazine, October 1998.
Flexible Network Management Using Active Network Framework

Kiminori Sugauchi1, Satoshi Miyazaki2, Kenichi Yoshida1, Keiichi Nakane1, Stefan Covaci3, and Tianning Zhang3

1 Systems Development Laboratory, Hitachi, Ltd., 292 Yoshida-cho, Totsuka-ku, Yokohama 244-0817, Japan
{sugauchi,yoshida,nakane}@sdl.hitachi.co.jp
2 Corporate Information Systems Office, Hitachi, Ltd., New Marunouchi Bldg. 5-1, Marunouchi 1-chome, Chiyoda-ku, Tokyo 100-8220, Japan
[email protected]
3 GMD FOKUS, Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany
{covaci,zhang}@fokus.gmd.de

Abstract. The growing intelligence of communication equipment and the advances in communication services have created a demand for more complex and flexible network management functions. It is becoming difficult to handle such demand with the traditional, simple manager-"management agent" paradigm for network management systems. Mobile agent technology is regarded as one of the promising solutions for handling this demand. We evaluate the efficiency of a mobile-agent-based network management system, quantitatively as well as qualitatively, using our prototype SDH (Synchronous Digital Hierarchy) test management functions. The results show that mobile agent technology can be used effectively for network management.
1 Introduction
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 241-248, 1999. Springer-Verlag Berlin Heidelberg 1999

As networks spread over many companies and homes, many users want to use various types of network services. To satisfy their customers, it is important for network providers to offer various up-to-date network services. The network elements must be flexible enough to adapt themselves to such new services. In other words, network providers have to manage network resources to satisfy user-specific requests. In this context, network management systems manage each individual user's path as well as the network resources as a whole, and are required to be flexible enough that the network provider can configure the network to satisfy the various requests of each customer. In this research, we propose the use of the active network framework [9] to realize a flexible network management system. The mobile agent technology developed in the ACTS/MIAMI (Advanced Communications Technologies and Services/Mobile Intelligent Agents for Managing the Information Infrastructure [4]) project plays a
central role in our approach. In the ACTS/MIAMI project, a network management system and a service management system using mobile agent technology are developed. The network management in this project focuses on fault management, configuration management, and performance management. In fault management, mobile agent technology is used for alarm collection and correlation analysis. Our research is partially based on the results of the ACTS/MIAMI project. Here we try to implement the programmable network elements which are the key components of the active network. Other researchers have also proposed the use of mobile agent technology to realize flexible network management [1, 2, 3]. In this paper, we advance these ideas by implementing an SDH (Synchronous Digital Hierarchy) test management system prototype. The evaluation of the prototype reveals the following advantages of our approach: (1) The use of outbound control helps to solve the security problem [14], which is the most important research issue in realizing a secure active network. (2) The performance problem [13], which is another important criticism of the active network, can be partially avoided in our implementation. (3) The computing load for network management, which can be an important factor in managing large networks, is decreased by the mobile agent technology. The rest of the paper is structured as follows: Section 2 presents the idea of using the active network framework in network management. Section 3 describes an SDH test system prototype. Section 4 summarizes the findings.
2 Network Management Using Active Network Framework
In this section, we describe our approach toward a flexible network management system. Programmable network elements, which are the core elements of the active network, are used to realize the flexibility of the network management. The mobile agent technology developed in the ACTS/MIAMI project is used to implement the programmable network elements. The first part of this section describes how to implement programmable network elements using mobile agent technology. Then, we describe the problems in the Telecommunication Management Network (TMN) in detail. The last part of this section explains how to use programmable network elements to solve these problems.
2.1 The Programmable Network Element Using Mobile Agent Technology
The framework of the active network has been proposed to realize flexible network services [5, 6, 7]. The key idea of the active network is the use of programmable network elements. In the active network, each network element is controlled by software. The use of software makes the modification of network functions and the development of new services easy. The hardware provides the basic functions to support the software; these involve data transport and hardware resource control. The typical hardware considered is switching devices such as routers and ATM switches.
The software is used to implement high-level services that satisfy various requirements. We use mobile agent technology [8] to implement this software. Each mobile agent is a delegated, autonomous piece of software. It is downloaded to a network element and executes there locally to accomplish the user request. In other words, a mobile agent is a program that moves through the network elements and executes its own procedures autonomously. The typical services considered in our active network framework are dynamic routing and QoS.

Fig. 1. The architecture of a mobile agent (a manager system dispatches agents, each consisting of a transporting part and a procedural part, that move among the network elements)

Figure 1 shows how the mobile agents move through the network and perform their tasks. Here, a mobile agent consists of a transporting part and a procedural part. The transporting part controls the migration of the mobile agent. The procedural part performs the actual tasks for accomplishing the user request. In some cases, the transporting part uses the processing results of its procedural part. Some researchers call a similar mobile agent a “capsule”, and have used it to realize dynamic routing [13]. Note that the use of mobile agents, or a similar concept such as the “capsule”, releases the active network from the limitation of predefined protocols. The necessity of protocol predefinition in conventional network technology restricts the flexibility of the network. For example, introducing a new standard protocol tends to take months or years of standardization. Agent technology can make the network flexible by speeding up this process.
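The two-part agent structure described above can be sketched as follows. This is a hedged illustration, not the MIAMI platform's API: the itinerary stands in for the transporting part, the task callable for the procedural part, and the network is modeled as a simple dictionary of nodes.

```python
class MobileAgent:
    """Minimal sketch of a two-part mobile agent (illustrative only)."""
    def __init__(self, itinerary, task):
        self.itinerary = list(itinerary)  # transporting part: where to migrate
        self.task = task                  # procedural part: what to do locally
        self.results = []

    def run(self, network):
        # The transporting part moves the agent from node to node; at each
        # node the procedural part executes locally and records a result,
        # which later hops (or the manager) may use.
        for node_name in self.itinerary:
            node = network[node_name]
            self.results.append(self.task(node))
        return self.results
```

In a real agent platform the `run` loop would of course involve serializing the agent and shipping it to the next element, rather than iterating over an in-memory dictionary.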
2.2 Problems in Telecommunication Management
The conventional telecommunication management system is based on the TMN model [11]. A typical configuration of the TMN model consists of many operating systems, workstations, and network elements. Besides these, the TMN model has two major networks, i.e. the data communication network (DCN) and the telecommunication network. The telecommunication network transports user data. The DCN, on the other hand, is a special network designed to control the network system. To exchange control information, each network element communicates through the DCN. Management functions in the operating systems also use the DCN to operate the network elements. The DCN is separated from the telecommunication network physically or logically; SS7 is typically used for this purpose. An important characteristic of this architecture is outbound control. The key idea of outbound control is the separation of user data and control data. In this
architecture, user data cannot access the DCN. This alleviates the security problem, which we discuss later. On the other hand, the conventional approach to network management has two important problems:
• Lack of flexibility: The growing intelligence of communication equipment and the advances in communication services have created a demand for more complex and flexible network management functions. It is difficult to handle such demand with the conventional telecommunication management system, in which the management functions on the network elements are fixed. Thus, if a new management function is required, the network provider has to suspend network services to reconfigure the management function. The current keen competition between providers, however, makes such service suspension difficult.
• Load balancing of management tasks: In the SDH network, a path between two network elements is a logical unit of data transfer. An STM frame carries data for many paths. Today, a high-speed network such as a 10 Gbps SDH network can support at least 384 individual paths. The basic management task is executed on each path, and in some cases complex configurations have to support thousands of paths. In the initial configuration phase, the network provider has to check the function of each path. Since the number of paths is large, even a set of simple connectivity checks can place a large processing load on the network management system. It also results in heavy traffic on the DCN. To support a variety of services, a load balancing mechanism that alleviates this problem is necessary.
2.3 The Merits of Network Management Using Active Network Framework
Figure 2 shows the proposed configuration of the network management system. This configuration is based on the TMN model.

Fig. 2. Network Configuration of Proposed Architecture (operating systems, workstations, and network elements, each hosting management software, connected by the data communication network; the network element hardware also forms the telecommunication network)

Agent-based programmable network
elements play a central role in this configuration. Each operating system, workstation, and network element uses software agents to execute management tasks, and has a mechanism to support the agent system working on it. Separated from the user traffic, which is carried by the telecommunication network, mobile agents are carried by the DCN. This configuration has the following merits:
• Flexibility: Since new management functions are easily introduced into the network as new agents, the network management system becomes flexible. When the network provider starts a new network service, the provider creates a new management agent that supports the new service. The provider can then activate the new management agent without suspending network services.
• Load balancing for network management: Management functions based on mobile agents can move among network elements and operating systems. Thus it is not necessary to pre-install all the management functions on the network elements and the operating systems. Although the management functions required for each network service differ, the mobility of the agents allows the required management functions to be installed wherever they are needed. While a mobile agent executes a function, the operating system does not have to manage that function itself. This reduces the consumption of DCN bandwidth, the storage on the network elements, and the workload of the operating systems and network elements.
Note that our approach is free from two important criticisms of the active network, i.e. security and performance.
• Security: As the network becomes an important information infrastructure of society, keeping the network secure becomes a critical issue. Scott et al. [14] propose the use of program verification techniques to keep the active network secure.
However, no existing implementation of active networks uses these techniques, and security in general remains a future research issue. We use the active network framework only on the DCN, while conventional techniques are still used for the telecommunication network, which carries the user traffic. In other words, our architecture separates the DCN from the telecommunication network. Since users cannot access the DCN, the security problem is alleviated: even if a user tries to deploy an agent that might disturb network services, our approach does not permit such an agent to influence the network service.
• Performance: A typical implementation of an active network handles about 2,000 packets per minute [13, 14]. Since current hardware switches can handle more than 2,000,000 packets per minute, there is strong performance criticism of the active network. In our approach, the performance bottleneck lies not in the transporting phase but in the procedural phase of the management tasks. For example, various types of tests are performed in the configuration phase of a network installation. Such tests take minutes, so the transporting time is negligible. While the various management tasks are performed by the agent system, the user data can still enjoy the fast hardware switching performance.
3 Prototype and Evaluation Results
We have implemented a prototype network management system that focuses on the network test function. In this section, we first describe the test function. Then we describe the implementation of the prototype and the evaluation results.
3.1 The Network Management Function Using Mobile Agents
The network management task consists of various sub-tasks. The fault management task is one example; it consists of “Detection”, “Isolation”, “Restoration” and “Notification”. Here we focus on the test function, which is used in the Isolation and Restoration tasks. The test function is used not only for fault management but also for the installation of a new network. While for fault management the test function can concentrate on the faulty parts of the network, for a newly installed network it has to test all the paths before the network is deployed in service. It is not necessary to execute these tests instantaneously, but it is desirable to execute them within a short period so that network services can be provided quickly. If a network management system executes these tests centrally, it needs high-performance computing power and high reliability/availability of the DCN. By using mobile agents, test sequences are processed locally on the network equipment. The mobile agents inform the network manager only when the tests find problems, so the network manager can focus on the paths that fail the tests.
3.2 Software Structure of the Test Function Prototype
We developed a mobile-agent-based test management function for SDH network management. Figure 3 shows the configuration of the prototype. In the prototype, each network element is simulated by dedicated software on each computer, and the telecommunication network is a simulated network. The DCN is implemented as a TCP/IP network. The simulator used in this prototype emulates the specifications of actual SDH products. The prototype of the test functions was developed based on the specifications of the OSI systems management model. In the OSI management standards, seven test categories are defined (connectivity test, connection test, loopback test, and so on) [12]. We also developed two advanced test functions by combining the simple OSI test functions. One is the multiple connectivity test, which tests the path connectivity between two network elements: by moving along the path, the agents check the section connectivity of the specified paths. The other is the multiple loopback test, which tests many paths related to a specified network element: the agents execute loopback tests for multiple paths.
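The multiple connectivity test can be sketched as follows, under stated assumptions: the agent's hop-by-hop movement is modeled as iteration over adjacent element pairs, and the section-test predicate is a stand-in for the underlying OSI connectivity test. This is an illustration of the idea, not the prototype's code.

```python
def multiple_connectivity_test(path, section_ok):
    """Check a path's connectivity section by section.

    path: ordered list of network elements along the path.
    section_ok(a, b) -> bool: stand-in for the basic section test.
    """
    report = []
    for a, b in zip(path, path[1:]):      # the agent hops section by section
        report.append(((a, b), section_ok(a, b)))
    path_ok = all(ok for _, ok in report)
    # Only the combined result and the failing sections are reported back
    # to the manager, keeping DCN traffic low.
    failures = [section for section, ok in report if not ok]
    return path_ok, failures
```

The path result is simply the conjunction of the section results, which is what lets a single agent replace many per-section manager operations.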
Fig. 3. Prototype Configuration (an agent-based network manager, comprising an agent server, agent database, network configuration data and a GUI on an agent platform running in a Java Virtual Machine (JVM), connected to SDH simulators, each with its own agent platform and JVM)
3.3 Evaluation Results
Using the prototype, we evaluated the effectiveness of a network management system that uses mobile agent technology. The conclusions from the evaluation are summarized as follows:
1. Number of network operations: In the connectivity test, the connectivity of a single path is the basic item to be checked. In the SDH network, a single path is divided into many sections. A basic connectivity test is executed on each section; the result of the path test is then obtained by combining the results of the section tests. In the traditional test scenario, the manager application invokes 4 or 5 operations for one section test, so the number of management operations grows to 4 or 5 times the number of section tests. With mobile agents, the number of test operations is only two, because the test agents can move along the path autonomously: the manager simply sends a single agent for the path test and receives its result. In a traditional system, we need over a thousand operations; the autonomy of the agents reduces this number. Similarly, the multiple loopback test also helps to reduce the number of remote operations, even though a 10 Gbps network has at least 384 paths.
2. Performance: Although performance is a typical criticism of the active network, we confirmed that the performance bottleneck in our prototype does not lie in the transporting phase; the testing phase consumes more time checking the connectivity of the network.
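The operation counts above can be checked with simple arithmetic. The sketch below encodes the two scenarios as stated in the text (4-5 operations per section test traditionally, versus two operations total with a single autonomous agent); the section count used in the example is illustrative.

```python
def traditional_ops(sections, ops_per_section=5):
    """Manager-driven testing: 4-5 operations per section test."""
    return sections * ops_per_section

def agent_ops(sections):
    """Agent-based testing: send one agent, receive one result."""
    return 2  # independent of the number of sections

# e.g. a path decomposed into 300 sections:
# traditional_ops(300) -> 1500 operations (over a thousand),
# agent_ops(300)       -> 2 operations.
```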
4 Summary
We proposed the use of the active network architecture in implementing a network management system. We use mobile agent technology to realize the programmable network elements that give flexibility and load balancing capability to the proposed network management system. We also evaluated the efficiency of the prototype SDH test management system. The findings from the evaluation of the prototype are summarized as follows:
(1) The use of outbound control solves the security problem, which is the most important research issue in realizing a secure active network. (2) The performance problem, which is another important criticism of the active network, can be partially solved by our implementation. (3) The computing load for network management, which is an important issue in managing large networks, is decreased by the mobile agent technology. The flexibility of the management system enables the network carrier to introduce new network services without suspending its network.
Acknowledgment
We wish to thank Mr. Masanori Kataoka, General Manager of the Systems Development Laboratory, Hitachi, Ltd., and Prof. Radu Popescu-Zeletin, Director of GMD FOKUS, for giving us the chance to undertake this GMD–Hitachi collaborative research. We also thank Mr. Yusuke Yamamoto of the Telecommunication Division, Hitachi, Ltd. for discussions about the SDH specifications.
References
[1] T. Magedanz, T. Eckardt, “Mobile Software Agent: A New Paradigm for Telecommunication Management”, NOMS '96 (1996)
[2] Y. Kim et al., “Design Considerations of a Mobile Agent System for the Network Management Purposes”, APNOMS '98 (1998)
[3] M. Baldi et al., “Exploiting Code Mobility in Decentralized and Flexible Network Management”, 1st International Workshop MA '97 (1997)
[4] http://www.fokus.gmd.de/cc/ima/miami/
[5] S. Rooney et al., “The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks”, IEEE Communications Magazine, Vol. 36, No. 10, pp. 42-53 (1998)
[6] J. Huard et al., “A Programmable Transport Architecture with QoS Guarantees”, IEEE Communications Magazine, Vol. 36, No. 10, pp. 54-62 (1998)
[7] D. Wetherall et al., “Introducing New Internet Services: Why and How”, IEEE Network Magazine, pp. 12-19 (1998)
[8] V. A. Pham et al., “Mobile Software Agents: An Overview”, IEEE Communications Magazine, Vol. 36, No. 7, pp. 26-37 (1998)
[9] M. Gervais et al., “Enhancing Telecommunications Service Engineering with Mobile Agent Technology and Formal Methods”, IEEE Communications Magazine, Vol. 36, No. 7, pp. 38-43 (1998)
[10] M. Breugst et al., “Mobile Agents – Enabling Technology for Active Network Implementation”, IEEE Network Magazine, Vol. 3, No. 12, pp. 53-60 (1998)
[11] ITU-T Recommendation M.3010 (1992)
[12] ITU-T Recommendation X.737 (1995)
[13] Maria et al., “Active Network Support for Multicast Applications”, IEEE Network Magazine, Vol. 3, No. 12, pp. 46-52 (1998)
[14] D. Scott et al., “A Secure Active Network Environment Architecture: Realization in SwitchWare”, IEEE Network Magazine, Vol. 3, No. 12, pp. 37-45 (1998)
Managing Spawned Virtual Networks

Andrew T. Campbell1, John Vicente2, and Daniel A. Villela1

1 Center for Telecommunications Research, Columbia University
2 Intel Corporation

Abstract. The creation, deployment and management of network architecture is manual, ad hoc and slow to evolve to meet new service requirements, resulting in costly and inflexible deployment cycles. In the Genesis Project ([email protected]), Columbia University, we envision a different paradigm where new network architectures are dynamically created and deployed in an automated fashion based on the notion of "spawning networks", a new class of open programmable networks. Spawning networks support a virtual network operating system called the Genesis Kernel that is capable of profiling, spawning, architecting and managing distinct virtual network architectures on-the-fly. In this paper, we describe a kernel plug-in module called "virtuosity" for the management of multiple spawned virtual networks. Virtuosity exerts control and manages multiple spawned virtual network architectures by dynamically influencing the behavior of a set of resource controllers operating over management-level timescales.
1 Introduction
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 249-261, 1999. Springer-Verlag Berlin Heidelberg 1999
1 Daniel Villela is a CNPq-Brazil Scholar
2 John Vicente is a Visiting Researcher at Columbia University

The rapidly evolving nature of the application base, service demands and underlying network technology presents a significant challenge to the deployment of new network architectures. This challenge calls for new approaches to the way we design, develop, deploy and analyze next-generation network architectures in response to future needs and requirements. Currently, the creation and deployment of network architecture is a manual, time-consuming and costly process. To the network architect, the creation process is typically ad hoc in nature, based on hand-crafting small-scale prototypes that evolve toward wide-scale deployment. We envision [8] a different paradigm where a communication middleware platform is capable of profiling, spawning, architecting and managing distinct virtual network architectures on-the-fly. We call our vision Genesis and summarize here the Genesis Kernel, a virtual network operating system. We believe that the design, creation and deployment of new network architectures should be automated and built on a foundation of spawning networks, a new class of open programmable networks. Spawning networks represent a new approach to the field of programmable networking where the network environment is capable of
dynamically creating new network architectures on-the-fly. The Genesis virtual network kernel represents a next-generation approach to the development of programmable networks, building on our earlier work on open programmable broadband [15,16,7] and mobile networks [25]. The Genesis Kernel has the capability to spawn child network architectures that can support architectures alternative to those of their parents. We call a virtual network installed on top of a set of network resources a parent network. The parent virtual network kernel has the capability of creating “child networks”. A child network operates in isolation on a subset of its underlying parent network's resources and topology, supporting controlled access for a set of users with specific connectivity, security, QOS and isolation requirements. At the lowest level of the Genesis Kernel architecture [8], a transport environment delivers packets from source to destination end-systems through a set of open programmable virtual router nodes called routelets. A virtual network is characterized by a set of routelets interconnected by a set of virtual links, where the routelets and virtual links collectively form a virtual network topology. Each virtual network kernel can create a distinct programming environment that supports routelet programming and enables the interaction between the distributed objects that characterize the spawned network architecture. The programming environment comprises a metabus3 that partitions the distributed object space, supporting communications between objects associated with the same spawned virtual network. Each virtual network has its own metabus. A binding interface base [1] supports a set of open programmable interfaces on top of the metabus, which provide open access to the set of distributed routelets and virtual links that constitute a virtual network architecture.
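The parent/child relationship described above can be sketched as follows. This is an illustrative model only, not the Genesis Kernel's implementation: a virtual network is reduced to a set of routelets and virtual links, and spawning a child enforces the constraint that the child's topology is a subset of its parent's.

```python
class VirtualNetwork:
    """Toy model of a spawnable virtual network (illustrative only)."""
    def __init__(self, routelets, links):
        self.routelets = set(routelets)
        self.links = set(links)   # each link is a (routelet, routelet) pair
        self.children = []

    def spawn(self, routelets, links):
        """Spawn a child network on a subset of this network's resources."""
        routelets, links = set(routelets), set(links)
        # A child operates in isolation on a subset of its parent's
        # topology, so its routelets and links must come from the parent.
        assert routelets <= self.routelets
        assert links <= self.links
        assert all(a in routelets and b in routelets for a, b in links)
        child = VirtualNetwork(routelets, links)
        self.children.append(child)
        return child
```

Since a child is itself a `VirtualNetwork`, it can spawn grandchildren in turn, mirroring the nested, hierarchical structure of spawning networks.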
The metabus and binding interface base also support a set of life cycle services, enabling the profiling, spawning and management of child virtual networks. For full details of the Genesis Kernel see [8]. Within Genesis, resource management of spawned virtual networks is handled by virtuosity [9], a Genesis Kernel plug-in. The virtuosity architectural model (see [9] for complete architectural details) comprises a number of distributed elements. These elements are instantiated as part of the child virtual network kernel during the spawning phase [8] and are deployed as a set of distributed plug-in objects. Virtuosity leverages the benefits of the kernel's hierarchical model of inheritance and nesting, delivering scalable virtual network resource management. The Genesis virtual network resource management system is governed by four basic design goals: slow-timescale dynamic provisioning; capacity classes, which provide general-purpose ‘resource pipes’; inheritance; and autonomous virtual network control. In this paper, we present the elements of the virtuosity system, a next-generation architecture for virtual network resource management. In Section 2, we present the maestro, a central controller responsible for managing the global resource policy within the virtual network. In Section 3, we introduce the auctioneer, which implements an economic auctioning model for resource allocation. The arbitrator, presented in Section 4, represents an abstract virtual network capacity 'scheduler'. We summarize and conclude in Section 5.
Metabus is a per-virtual network software bus for object interaction.
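The nesting of spawned networks on parent resources described above can be sketched as a simple capacity-slicing hierarchy. This is an illustration only; the class and method names (VirtualNetwork, spawn_child, residual) are our own and not Genesis Kernel interfaces.

```python
# Illustrative sketch of the parent/child spawning hierarchy: each virtual
# network runs on a slice of its parent's capacity, and children may spawn
# networks of their own (nesting). Names (VirtualNetwork, spawn_child,
# residual) are our own, not Genesis Kernel interfaces.

class VirtualNetwork:
    def __init__(self, name, capacity, parent=None):
        self.name = name
        self.capacity = capacity   # aggregate link capacity, e.g., in Mb/s
        self.allocated = 0         # capacity already handed to children
        self.parent = parent
        self.children = []

    def residual(self):
        """Capacity still available for spawning or local traffic."""
        return self.capacity - self.allocated

    def spawn_child(self, name, capacity):
        """Spawn a child network on a subset of this network's resources."""
        if capacity > self.residual():
            raise ValueError("insufficient residual capacity to spawn child")
        child = VirtualNetwork(name, capacity, parent=self)
        self.allocated += capacity
        self.children.append(child)
        return child

root = VirtualNetwork("provider", capacity=1000)
isp = root.spawn_child("child-ISP", 400)
vpn = isp.spawn_child("grandchild-VPN", 100)   # children can spawn, too
```

A child operates in isolation on its slice, which is what the admission and auctioning machinery of the following sections manages dynamically.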
Managing Spawned Virtual Networks

2 Maestro - Distributed Virtual Network Control
At the core of the virtuosity resource management architecture is the maestro, a key controller that oversees the resources4 (i.e., the virtual links that interconnect routelets) of the managed virtual network domain. Virtuosity, through the maestro, manages and controls virtual network resources on a slow performance management [13] timescale that operates on a period of minutes to tens of minutes. We argue that this is a suitable timescale for virtuosity to operate on, while still allowing virtual networks to perform dynamic provisioning as needed. The maestro coordinates virtual network control through distributed virtuosity components performing virtual network monitoring, economic-based resource allocation and capacity-based scheduling, all of which operate or exert control on management-level timescales. Using the fundamentals of distributed management design, the maestro manages global resource policy within a virtual network and its (parent) allocated virtual network resources. In a fully distributed manner, the maestro maintains the global state of its virtual network. The maestro uses dynamic provisioning of virtual network resource capacity to meet the changing needs of its child networks (captured in child per-virtual network policy) and to react to changes in its global state. That is, a maestro may need to respond to dynamic changes in its own virtual network (e.g., changes in the resource needs of its local clients/users) and child network resource needs, as well as adjusting to changes imposed on it by the underlying parent network resource availability. In addition, the maestro coordinates and influences child network behavior through the integration of monitoring-based feedback and economic factors driven by subscriber service demands and cost potential. The maestro establishes resource policies, coordinates policy distribution and enforces policy through capacity scheduling and policing.
A delegate, acting as a proxy agent, serves the maestro by promoting decentralized coordination and localized communications. Delegates handle all local resource interactions and control mechanisms on the virtual network domain-specific routelets by interfacing with the other virtuosity elements supporting resource allocation and virtual network scheduling. The maestro interacts with its child networks to promote the efficient use of its global resources while ensuring that the resource needs of its child networks and its own virtual network users are being met. The maestro can influence the way in which resources are allocated to its child networks by setting optimal market pricing [20] and resource allocation strategies, e.g., under-provisioning its own virtual link resources but overbooking resources to child networks to maximize revenue for the controlled capacity traffic.
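The under-provisioning/overbooking strategy mentioned above amounts to selling child networks more nominal capacity than the link physically holds, betting on statistical multiplexing. A minimal sketch, with a made-up overbooking factor and demands:

```python
# Made-up overbooking arithmetic: the parent sells child networks nominal
# capacity up to `factor` times the physical link rate, betting that
# statistically multiplexed usage stays below the real capacity. The
# overbooking factor and demands are invented for illustration.

def overbooked_sales(link_capacity, factor, child_demands):
    """Admit child nominal capacities up to factor * link_capacity.
    child_demands: list of (child, demand). Returns (admitted, sold)."""
    ceiling = link_capacity * factor
    sold, admitted = 0, []
    for child, demand in child_demands:
        if sold + demand <= ceiling:
            sold += demand
            admitted.append(child)
    return admitted, sold

# A 100 Mb/s link overbooked at 1.5x can carry 150 Mb/s of nominal sales:
# A and B fit (120 total); C's extra 40 would exceed the ceiling.
admitted, sold = overbooked_sales(100, 1.5, [("A", 60), ("B", 60), ("C", 40)])
```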
4 Although we restrict the virtual network resource to link bandwidths, we feel that the virtuosity model can easily be extended to support other router resources, e.g., by partitioning router resources proportionally based on virtual network link aggregate demands.
Andrew T. Campbell et al.
2.1 Maestro Design

During the spawning phase of a child network, the maestro conducts a virtual network admission control test based on the resources requested by the child network topology. If the test is positive, the parent provider network admits the child network and allows it to become a participant in the auctioning process controlled by the auctioneer and governed by the child virtual network's policy. Admission is coordinated by the parent maestro using its virtual network hierarchy tree. The parent maestro receives a ReqSpec for admission from a child virtual network and determines whether sufficient resources are available within the context of its own available network resources to meet those new demands. If this is the case, the parent has sufficient residual capacity in its own right to accommodate the child's needs. Virtuosity implements a measurement-based virtual network admission control test. Admission is based on evaluating the ReqSpec target capacity class resource provisions (viz. rate_quantity) against aggregate capacity class policies and aggregate resource usage. By monitoring the available capacity along all of its virtual links, the maestro determines whether resources allocated along its virtual links are underutilized. Based on this measurement state and capacity threshold violations, it can allocate underutilized resources based on the capacity class, bandwidth and policy requested by the new virtual network. If capacity is available, the child network is immediately
Fig. 1. Maestro Object Model
admitted and the child network is allowed to participate in the auction process. In the case that the parent has insufficient resources to accommodate the new child network, it needs to renegotiate its provisioning needs with its own parent (and hence its provider) at the next level down its virtual network inheritance tree [8], [9]. The provisioning request enters the parent auctioning process following a successful admission control sequence, traversing the hierarchy tree through several levels until a provider is found that can accommodate the requested demands. Through slow timescale resource allocation, the maestro invokes the auctioning process on a periodic or static deadline basis. This period is driven (again) by slow timescale considerations, allowing the auctioning process to reach equilibrium and maintain constant services over longer timescales, e.g., tens of minutes. Resource pricing and quantity announcements to child networks are set such that the parent can achieve more effective utilization and revenue gain. The maestro uses two variables for resource auctioning. These are a price quote, Qij, and a rate quote, Rij (where i = virtual resource; j = capacity class), which the delegate element relays to the seller object of the resource allocation system for appropriate auctioning. The auctioning process (discussed in the next section) requires a recursive, distributed algorithm and global consensus in order to reach steady state [20]; this constrains the static or dynamic invocation process to a lower periodic bound for recurring resource allocations. Upon reaching auctioning equilibrium [20], the maestro receives the results of the auctioning process, calculates local resource policies and stores the resource allocation policy results for child networks in the policy cache.
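The measurement-based admission test of Section 2.1 can be sketched as a per-capacity-class headroom check. The ReqSpec field rate_quantity follows the text; the dictionary layouts of the policy and measurement state are assumptions for illustration, not Genesis types.

```python
# Sketch of the measurement-based admission test: for each capacity class in
# the child's ReqSpec, the requested rate_quantity must fit within the
# class's aggregate policy limit minus its measured aggregate usage.
# The dict layouts below are assumptions for illustration, not Genesis types.

def admit(reqspec, class_policy, measured_usage):
    """reqspec: {capacity_class: rate_quantity}. Returns (admitted, failures),
    where failures lists the capacity classes that lack headroom."""
    failures = []
    for cclass, rate_quantity in reqspec.items():
        headroom = class_policy.get(cclass, 0) - measured_usage.get(cclass, 0)
        if rate_quantity > headroom:
            failures.append(cclass)
    return (not failures), failures

# 100 Mb/s of controlled capacity requested against a 300 Mb/s class limit
# of which 250 Mb/s is measured in use: only 50 Mb/s headroom, so rejected.
ok, failed = admit({"controlled": 100}, {"controlled": 300}, {"controlled": 250})
```

A failed test returns the offending classes, mirroring the failure specification list the maestro CP reports per resource.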
With reference to Figure 1, we now illustrate the behavior of the admission control and resource allocation process from the perspective of the maestro system and its objects (viz., maestro control point (CP), resource allocation controller, admission controller, optimizer, analyzer, policy controller, policy cache, measurement cache, and delegate), embedded between the maestro CPs of the child and parent virtuosity systems. The process begins with the child maestro CP submitting an eventSpec (1) notifying the maestro CP that the child is requesting admission or an extended resource capacity request. This is immediately followed by a requestAdmission() (2) of the associated provisioning specification. The maestro CP then invokes multiple resourceAdmission() (3) calls on the admission controller for requested capacities on parent resources. The admission controller responds, notify() (6), after testing admission per resource against existing aggregate (4a) provisioning policies (i.e., child networks and local provisioning policy) and aggregate (4b) resource measurements (i.e., resource availability) to determine whether the child's requested increase or admission specification exceeds the composite (5) provisioned policies and resource availability. The admission test applies a rule-based policy, on a per capacity class and per-class resource usage basis, in the determination of available capacity. The result (in this case) is admission failure, along with the failure code and failure specification structure, specified per resource in list form. In turn, the maestro CP must then send an eventSpec (7) and requestAdmission() (8) to the parent maestro CP to request additional capacity resources to extend its currently allocated provision, on behalf of the child's request.
In this case, admission is successful upon completion of the parent's admission control, and the resource allocation (auctioning) procedure follows, with notification through the parent maestro CP (9) of the admissible and allocated provisioning specification.
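The escalation of a failed local admission to the parent maestro (steps (7)-(8)) can be sketched as a recursive check up the inheritance tree. This is a deliberately simplified model, not the Genesis implementation; networks are plain dictionaries here.

```python
# Simplified sketch (not the Genesis implementation) of admission escalation:
# a maestro lacking residual capacity requests additional provisioning from
# its own parent before admitting the child; networks are plain dicts here.

def request_admission(network, demand):
    """Try to satisfy `demand` units of capacity, escalating up the tree."""
    residual = network["capacity"] - network["allocated"]
    if demand <= residual:
        network["allocated"] += demand
        return True
    shortfall = demand - residual
    parent = network["parent"]
    if parent is None or not request_admission(parent, shortfall):
        return False                      # no provider found up the tree
    network["capacity"] += shortfall      # parent extended our provision
    network["allocated"] += demand
    return True

grandparent = {"capacity": 1000, "allocated": 0, "parent": None}
parent = {"capacity": 200, "allocated": 150, "parent": grandparent}
granted = request_admission(parent, 100)  # 50 local + 50 from grandparent
```

The request traverses as many levels as needed until a provider with residual capacity is found, or fails if none exists.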
Prior to commitment (11) to the parent allocation, the allocation is checked (10) by a local resource allocation controller and, if acceptable, stored (12) in the policy cache. The analyzer object is then invoked to assess the balance of global resource consumption (13a,b) across the virtual network resources (14a,b) and the capacity classes (14c). The analyzer results are used by the optimizer object to establish the optimal price (15a) and quantity (15b) values per capacity class per managed resource for appropriate auctioning. The auctioneer is event-notified (16a,b) for next provisioning interval synchronization and relayed the optimal per-class, per-resource (17a,b) auctioning variables. At this point, the child maestro CP is notified (17c) of successful admission and prepares itself for auctioning with the local auctioneer. It is anticipated that reaching auctioning equilibrium will take on the order of several minutes or longer as the number of competing child subscribers, parent resources and capacity classes increases. Nevertheless, we argue that the extended auctioning period is well in line with the necessity to maintain network stability through management timescale control; furthermore, we believe that this trade-off is offset by the resource efficiency gains that are achievable with slow timescale dynamic provisioning. Upon reaching auctioning equilibrium, auctioning results (18) are submitted (through the maestro CP) to the resource allocation controller, which then proceeds to distribute (19) resource allocations to the child network auctioning 'buyers'. It coordinates with the child's resource allocation controller to gain final commitment (20) on the allocation. If unsuccessful, the resource allocation process may recycle through these same steps until reaching firm commitments by all child network subscribers and their requested resource capacities.
If successful, the policy controller object establishes local policies (21), and the maestro CP updates (22) the policy cache with child network provisions and local resource policies. Finally, the delegate object passes (23) the required capacity scheduling and policing policies to manage and enforce the child allocations.
3 An Auction-Based Resource Allocation System
We propose a virtual network resource allocation process based on the supply and demand of virtual network services, where competing child virtual networks, working on behalf of a community of users and through appropriate specification, request resources and pay a provider of virtual network services for them. There are inherent behaviors and objectives that dictate the economics and, more importantly, the effective allocation, partitioning and utilization of such services. We argue that the provider (parent virtual network) and subscriber (child virtual network) behaviors, and correspondingly their objectives, serve as fundamentals that can be leveraged for resource maximization through the influence of economic variables. Network providers seek resource efficiency through the effective utilization of link resources, achieved through effective price-based load balancing and the addition of multiple virtual network subscribers. The competing nature that both the provider (parent) and subscriber (child) exhibit should, we argue, create the dynamics that lead to a more aggressive environment for achieving resource efficiency.
Fig. 2. Auctioneer Object Model
In this paper we consider a strategy known as Progressive Second Price (PSP) [20], which aims to provide high resource efficiency (e.g., cost, utilization) via a competitive, market-driven auctioning process. Auctioning occurs between a set of buyers (child virtual networks) and a seller (parent virtual network). Within a competitive bidding process, a successful allocation for a particular buyer may not be attainable if the buyer is unwilling to pay the market value for the resource capacity or does not offer provisioning alternatives. The auctioning process is designed to follow a bidding procedure, allowing, for example, the auctioning of best-effort classes to follow that of the more stringent capacity classes. In our extended auctioning model we introduce two key contract variables: contract_duration and contract_maxcost. These variables represent important provisioning options which allow child subscribers to make long-standing contracts with the parent provider; in this sense, they represent a way for a child to avoid the normal open-market competition of the auctioneer.
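For intuition, the allocation step of a PSP-style auction can be sketched in a single round: capacity is granted to the highest declared unit prices first. Real PSP [20] is an iterative, distributed game with opportunity-cost (second-price) payments, which this fragment does not model.

```python
# Single-round sketch of the allocation step in a second-price-style
# bandwidth auction, in the spirit of (but much simpler than) PSP [20]:
# capacity is granted to the highest declared unit prices first. Payments
# and the iterative bidding game of real PSP are omitted.

def allocate(capacity, bids):
    """bids: {buyer: (quantity, unit_price)} -> {buyer: granted quantity}."""
    granted, remaining = {}, capacity
    ranked = sorted(bids.items(), key=lambda item: item[1][1], reverse=True)
    for buyer, (quantity, _price) in ranked:
        granted[buyer] = min(quantity, remaining)
        remaining -= granted[buyer]
    return granted

# childA outbids the others and is served in full; childC is next;
# childB receives only the 10 units that remain.
grants = allocate(100, {"childA": (60, 5.0),
                        "childB": (60, 3.0),
                        "childC": (30, 4.0)})
```

A buyer unwilling to pay the market unit price, like childB here, ends up squeezed out of the shared capacity, which matches the behavior described above.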
3.1 Auctioneer Design

The auctioneer object architecture is illustrated in Figure 2. It comprises several objects: auctioning agent, seller, buyer, bid list, and auctioneerCP (auctioneer control point). The auctioneer is present at each routelet, auctioning resources to a number of child network subscribers for its virtual link resources. The auctioneerCP object acts as a proxy that exchanges necessary information with the maestro (through the delegate) to receive updated parameters for interval-based auctioning. The seller object represents the provider in the auctioning system. Its task is to specify the quantity of the resource that is available for auctioning and the price for the resource. The seller announces the current market price and availability for individual resources and capacity classes via the ask() method. These variables are
optimally determined based on what drives the provider market towards the desired revenue objective as well as resource gain efficiency, e.g., preferring controlled capacity to constant capacity. On the other hand, the buyer object plays the subscriber or child network role. Several buyers are allowed to be present within an auctioneer, bidding on behalf of the virtual network they represent. Buyer instantiation is a result of admission control: the buyer object is created and enabled so that it can participate in the resource allocation process. Once the buyer enters the auctioning process, it requests from the auctioning agent a position in the bid list, which contains updated information (bids, allocations, quantities) about each buyer participating in the ongoing process. The buyer can bid for a resource by specifying its required quantity and the price it expects to pay (an upper bound) for individual capacity classes. The buyer object seeks to minimize cost and maximize rate quantity for each capacity class. The auctioning agent maintains the bid list object and stores updated information (current state) about the auctioning process. By accessing the bid list, the buyer can retrieve its current allocation and find out whether it has been granted better conditions during the auctioning process. When equilibrium has been reached, the auctioning agent reports the results of the process to the auctioneerCP for forwarding to the maestro. By definition, equilibrium is achieved when no buyer can improve its allocation. This can be implemented using a timeout condition to signal that no more changes will occur. Once a buyer reaches a position that it cannot improve, it will cease bidding or notify the child virtual network maestro to seek a provisioning alternative and renegotiate for the resource. The process is considered to be in equilibrium when all buyers are satisfied. The dynamics of the auctioning process are illustrated in step-wise form in Figure 2.
A delegate, operating on behalf of the maestro, notifies the auctioneerCP that an event is taking place (1). It then updates the optimal price (Qij) and rate quantity (Rij) provided by the maestro for appropriate auctioning (2), refreshing the seller state (3). The seller object then announces (4a) the available resource quantity and associated pricing for buyer bidding. In parallel, buyers receive (4b) from their child virtual network maestros the desired strategy (allocation and cost) for their bidding (5). The auctioning agent mediates the auctioning process between the buyers and the seller, seeking successful auctioning equilibrium and optimal resource allocations. It maintains the bid list via the update() method (6) for allocations resulting from the buyers' bidding, according to their resource valuations and recent bids. The allocations can be retrieved at any time (7) by the auctioning agent to keep buyers informed. The auctioning agent then updates the results (new allocations and costs) at the auctioneerCP (8). Information about allocations is also available to buyers on request (getAllocation() method) (9). Delegates then receive the agreed price and rate allocations (10) and pass the information to the maestro for child networks' policies according to their provisioned share of the parent resource. The maestro seeks closure by communicating the allocations (or denials) to child virtual network maestros through its delegates for final consideration. If the agreed allocations are not satisfactory for any subscriber child network, the child maestro may invoke the auction process (again) with alternate specifications, and the previous child
allocations and policies are voided by the parent maestro. Further details about the PSP auctioning model can be found in [20].
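The equilibrium-by-timeout rule of the auctioneer design can be sketched as a fixed-point loop: the auction ends when a full round passes with no buyer changing its bid, or when a round bound (the timeout) is reached. Buyer strategies here are arbitrary callables, an assumption for illustration.

```python
# Fixed-point sketch of the equilibrium/timeout rule: the auctioning agent
# declares equilibrium when a complete round passes with no buyer changing
# its bid; max_rounds acts as the timeout mentioned in the text. Buyer
# strategies are arbitrary callables here -- an assumption for illustration.

def run_auction(strategies, max_rounds=50):
    """strategies: list of callables mapping current bids -> new bid.
    Returns (bids, reached_equilibrium)."""
    bids = {i: None for i in range(len(strategies))}
    for _ in range(max_rounds):               # timeout bound
        changed = False
        for i, strategy in enumerate(strategies):
            new_bid = strategy(bids)
            if new_bid != bids[i]:
                bids[i] = new_bid
                changed = True
        if not changed:                        # no buyer can improve
            return bids, True
    return bids, False                         # timed out

# Two buyers with fixed valuations settle after one round of changes.
bids, reached = run_auction([lambda b: 10, lambda b: 8])
```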
4 Virtual Network Capacity Scheduling
With the arbitrator, we introduce the notion of virtual link capacity-based scheduling driven by a parent-child hierarchy of virtual network provisioning policies. The arbitrator receives virtual network policy from the maestro over a slow timescale provisioning interval upon completion of the resource allocation process. The virtual link arbitrator manages the access and control of the parent link packet scheduler based on policy-driven virtual network capacity. Leveraging ideas similar to flow quality of service semantics [5] (e.g., deterministic, statistical and best effort), we abstract the QOS class differentiation concepts and apply them to capacity-based 'provisioning' classes. The intent here is to provide more provisioning flexibility and control to the child virtual network, maximizing resource efficiency and QOS control, and to reduce the burden on the parent of managing the child domain and maintaining low-level QOS service level agreements (SLAs); instead, the provider (parent) and subscriber (child) service models are based more on provisioning SLAs with flexible capacity classes. Virtual network SLA maintenance is therefore kept strictly on a virtual link capacity basis, while parent policing and regulation treatment can be removed from the delay-sensitive models (to be managed autonomously by the child network services) and focused more on virtual link bandwidth sharing.
4.1 Arbitrator Design

The capacity arbitrator is based on a set of virtual network capacity classes and class weight policies that are distributed to the arbitrator component by the delegate on behalf of the maestro. Virtual network classes represent differentiated policy for provisioning capacity. Class weights are calculated (by the maestro) based on the rate_allocation, percentage and price variables specified in the AllocSpec. The capacity classes and weights are translations of the resource allocations negotiated on individual child virtual resources during the auctioning process and are used as the virtual link scheduling policies. Capacity classes and weights are used by the arbitrator to differentiate child virtual network allocations and the ordering of packet delivery to the parent link resource. During the spawning process, each virtual network is assigned a unique virtual network identifier to distinguish its traffic from other child network traffic. A capacity class identifier function is introduced into the arbitrator architecture, prior to the child's link scheduler function, to recognize the QOS behavior treatment (e.g., best-effort, controlled load, expedited forwarding, etc.) associated with the child-specific QOS architecture. This function interworks with a switch vector to assign each packet a stamp that associates it with a particular capacity class prior to its arrival at the routelet port packet link scheduler, as illustrated in Figure 3. We refer to this procedure as capacity class mapping. Policy mappings are formed during the provisioning process and distributed during the resource allocation process to the child virtual networks. If, for example, a customer or child network supports only
best-effort IP traffic classes within a spawned child network and provisions for constant capacity, the switch vector would stamp all traffic with a constant capacity classification. On the other hand, if the customer supports Integrated Services classes (viz. guaranteed delay, controlled load and best-effort) within a spawned child network and provisions for all three capacity classes, then the class identifier would stamp the traffic with the corresponding capacity classifications, by default. QOS class and capacity class mappings are considered part of the provisioning policy for child networks and are stored within the policy cache of the supporting maestro.
Fig. 3. Arbitrator
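The capacity class mapping performed by the switch vector can be sketched as a lookup from (virtual network identifier, QOS class) to capacity class. The mapping entries below are invented examples, not a mandated translation.

```python
# Hypothetical "switch vector" for capacity class mapping: each (virtual
# network id, QOS class) pair is stamped with the capacity class negotiated
# in that network's provisioning policy before packets reach the parent
# link scheduler. All mapping entries below are invented examples.

SWITCH_VECTOR = {
    ("vnA", "guaranteed"):      "constant",
    ("vnA", "controlled-load"): "controlled",
    ("vnA", "best-effort"):     "best-effort",
    ("vnB", "best-effort"):     "constant",   # vnB provisioned constant only
}

def stamp(packet):
    """Attach a capacity class stamp to a packet (a dict) before scheduling."""
    key = (packet["vn_id"], packet["qos_class"])
    packet["capacity_class"] = SWITCH_VECTOR.get(key, "best-effort")
    return packet

p = stamp({"vn_id": "vnB", "qos_class": "best-effort"})
```

Note how vnB, provisioned for constant capacity only, has even its best-effort QOS traffic stamped into the constant capacity class, as in the example in the text.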
A capacity classifier is used to identify virtual networks and their capacity classes. The classifier queues incoming stamped packets (from the output of the routelet port link schedulers) to the appropriate capacity queue structures (viz. constant, controlled, and best-effort). Individual capacity queues are created for each child virtual network within an allocated capacity queue structure. Each virtual network queue is then assigned an appropriate weight, based on the policy previously negotiated and distributed by the maestro. Within the provisioned interval, the arbitrator manages virtual network scheduling based on the capacity class priority and weights, allowing child networks (and local user traffic) to queue available packets to the parent's output port. The capacity arbitrator leverages space (i.e., available resource bandwidth), time (i.e., provisioning interval length) and capacity-class abstractions to
manage scheduling of its own user traffic and packets from child virtual networks onto the parent virtual link. The capacity scheduler services the capacity queues in priority order, using weighted round-robin for queues of the same capacity class. As illustrated in Figure 3, child network traffic is scheduled by the child's packet link scheduler associated with the routelet output port and, similarly, by the parent network routelet port. The introduction of the virtuosity arbitrator into the output port architecture merges both child and parent QOS-scheduled traffic and provides coarse capacity scheduling of the composite traffic based on the allocated provisioning policies. It is important to note that the illustration represents the default virtual network resource management implementation, but it is not the only parent option for managing child network traffic. Alternatively, the parent may override5 the arbitrator function and integrate child network traffic through its local routelet port link scheduler. Also illustrated in Figure 3, the monitor is central to the arbitrator, performing monitoring and policing on individual parent resources. Policing assures that child virtual networks are not consuming parent virtual network resources above and beyond their allocation of the virtual link capacity. Policing actions (e.g., dropping, tagging or degrading to the best-effort capacity class) are driven by virtual network policy.
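The capacity scheduler's service discipline, strict priority across capacity classes with weighted round-robin among queues of the same class, can be sketched as follows. Class names, weights and the per-round packet budget are illustrative; weights are assumed to be positive integers.

```python
# Sketch of the capacity scheduler's service discipline: strict priority
# across capacity classes (constant > controlled > best-effort), weighted
# round-robin among same-class virtual network queues. Class names, weights
# and the per-round packet budget are illustrative; weights are assumed to
# be positive integers.

from collections import deque

CLASS_ORDER = ["constant", "controlled", "best-effort"]

def schedule(queues, weights, budget):
    """queues: {cclass: {vn: deque of packets}}; weights: {vn: int};
    budget: packets to emit this round. Returns the emitted packets."""
    out = []
    for cclass in CLASS_ORDER:                 # strict class priority
        vns = queues.get(cclass, {})
        while len(out) < budget and any(vns.values()):
            for vn, q in vns.items():          # WRR within the class
                for _ in range(weights.get(vn, 1)):
                    if q and len(out) < budget:
                        out.append(q.popleft())
    return out

queues = {"constant":    {"vnA": deque(["a1", "a2"])},
          "best-effort": {"vnB": deque(["b1"])}}
emitted = schedule(queues, {"vnA": 1, "vnB": 1}, budget=3)
```

Here vnA's constant capacity traffic drains completely before vnB's best-effort traffic is served, reflecting the strict class priority described above.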
5 Conclusion
We are implementing "spawning networks", a new class of open programmable networks. The Genesis Kernel lies at the heart of spawning networks, capable of profiling, spawning, architecting and managing distinct virtual network architectures on demand. In this paper, we have described a kernel-level plug-in module called "virtuosity" for the management of multiple spawned virtual networks. The virtuosity framework comprises a maestro, which performs distributed virtual network control; an auctioneer, which leverages economic models based on auctioning to perform resource allocation; and finally, an arbitrator, which performs policy-based virtual network capacity scheduling.
Acknowledgement

This work is supported in part by the National Science Foundation (NSF) under CAREER Award ANI-9876299 and with support from COMET Group industrial sponsors. In particular, we would like to thank the Intel Corporation, Hitachi Limited and Nortel Networks for supporting the Genesis Project. John B. Vicente (Intel Corp.) would like to thank the Intel Research Council for their support during his visit to the Center for Telecommunications Research, Columbia University. Daniel A. Villela would like to thank the National Council for Scientific and Technological Development (CNPq-Brazil) for sponsoring his scholarship at Columbia University (ref. 200168/98-3).

5 This also suggests that virtuosity, or at least key components of virtuosity, are not required and may be substituted. The architectural selection and realization are based on the parent resource management policy set during the profiling phase and programmatically composed during the spawning phase of the life cycle process.
References

[1] Adam, C.M., et al., "The Binding Interface Base Specification Revision 2.0", OPENSIG Workshop on Open Signalling for ATM, Internet and Mobile Networks, Cambridge, UK, April 1997.
[2] Biswas, J., et al., "Application Programming Interfaces for Networks", IEEE P1520 Working Group Draft White Paper, www.ieee-pin.org
[3] Blake, S., et al., "A Framework for Differentiated Services", draft-ietf-diffserv-framework-01.txt.
[4] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W., "An Architecture for Differentiated Services", draft-ietf-diffserv-arch-02.txt, October 1998.
[5] Campbell, A.T., Coulson, G., and Hutchison, D., "A Quality of Service Architecture", ACM SIGCOMM Computer Communication Review, Vol. 24, No. 2, pp. 6-27, April 1994.
[6] Session on "Enabling Virtual Networking", Organizer and Chair: Andrew T. Campbell, OPENSIG '98 Workshop on Open Signaling for ATM, Internet and Mobile Networks, Toronto, October 5-6, 1998.
[7] Campbell, A.T., De Meer, H., Kounavis, M.E., Miki, K., Vicente, J., and Villela, D.A., "A Review of Programmable Networks", ACM Computer Communications Review, April 1999.
[8] Campbell, A.T., De Meer, H., Kounavis, M.E., Miki, K., Vicente, J., and Villela, D.A., "The Genesis Kernel: A Virtual Network Operating System for Spawning Network Architectures", IEEE 2nd International Conference on Open Architectures and Network Programmability (OPENARCH'99), New York, pp. 115-127.
[9] Campbell, A.T., Vicente, J., and Villela, D.A., "Virtuosity: Performing Virtual Network Management", International Workshop on Quality of Service (IWQoS), London, June 1999.
[10] DARPA Active Network Program, http://www.darpa.mil/ito/research/anets/projects.html, 1996.
[11] Duffield, N., et al., "A Performance Oriented Service Interface for Virtual Private Networks", draft-duffield-vpn-QOS-framework-00.txt. Work in progress.
[12] The Genesis Project: Programmable Virtual Networking, http://comet.columbia.edu/genesis, 1998.
[13] Keshav, S., and Sharma, R., "Achieving Quality of Service through Network Performance Management", Proc. of NOSSDAV'98, Cambridge, July 1998.
[14] "The Integration of Real-Time Control with Management in Broadband Networks", Proceedings of the Workshop on Broadband Communications, Estoril, Portugal, January 20-22, 1992, pp. 193-204.
[15] Lazar, A.A., "Programming Telecommunication Networks", IEEE Network, Vol. 11, No. 5, September/October 1997.
[16] Lazar, A.A., and Campbell, A.T., "Spawning Network Architecture", White Paper, Center for Telecommunications Research, Columbia University, http://comet.columbia.edu/genesis, January 1998.
[17] Van der Merwe, J.E., and Leslie, I.M., "Switchlets and Dynamic Virtual ATM Networks", Proc. Integrated Network Management V, May 1997.
[18] Van der Merwe, J.E., Rooney, S., Leslie, I.M., and Crosby, S.A., "The Tempest - A Practical Framework for Network Programmability", IEEE Network, November 1997.
[19] Multiservice Switching Forum (MSF), http://www.msforum.org/
[20] Semret, N., and Lazar, A.A., "Design, Analysis and Simulation of the Progressive Second Price Auction for Network Bandwidth Sharing", Technical Report CU/CTR/TR 487-98-21.
[21] OPENSIG Working Group, http://comet.columbia.edu/opensig/
[22] Rajan, R., Martin, J.C., Kamat, S., See, M., Chaudhury, R., Verma, D., Powers, G., and Yavatkar, R., "Schema for Differentiated Services and Integrated Services in Networks", draft-rajan-policy-QOSschema-00.txt, October 1998. Work in progress.
[23] Rooney, S., Van der Merwe, J.E., Crosby, S.A., and Leslie, I.M., "The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks", IEEE Communications Magazine, October 1998, pp. 42-53.
[24] Touch, J., and Hotz, S., "The X-Bone", Third Global Internet Mini-Conference in conjunction with Globecom '98, Sydney, Australia, November 1998.
[25] Valko, A.G., Campbell, A.T., and Gomez, J., "Cellular IP", INTERNET-DRAFT, draft-valko-cellularip-00.txt.
Active Organisations for Routing

Steven Willmott and Boi Faltings

Laboratoire d'Intelligence Artificielle, Department Informatique
Swiss Federal Institute of Technology, IN (Ecublens)
CH-1015 Lausanne, Switzerland
{willmott,faltings}@lia.di.epfl.ch
Abstract. Communications networks require increasingly complex resource management to stay up and running. This is particularly true in networks which aim to provide some guaranteed quality of service (either explicitly as in ATM and other connection-oriented architectures or implicitly as in a smoothly running IP network). The resulting increased complexity of routing procedures needs to be handled in a coordinated and flexible manner. This control could well be provided by customisable control programs in the network which rely on the computational capabilities provided by active nodes. Not only will control programs need to act independently and autonomously but they will also need to coordinate their actions with each other to ensure that decisions in the network are taken in a coordinated and consistent manner. This paper presents a framework for organising groups of control programs for routing tasks in a network. The organisation is able to adapt its own structure over time as the state of the network changes. Keywords: Routing, organisation, coordination, distributed artificial intelligence.
1 Introduction
Resource management and routing are network management problems which require careful control in today's communications networks. Despite numerous predictions of a bandwidth glut ([11] among others), bandwidth use still needs to be carefully managed. The increased volumes of data flowing across modern networks mean that mismanagement can very quickly result in bottlenecks and potentially catastrophic cell loss. The problems are particularly acute for networks which aim to provide any kind of quality guarantees:
1. Connection-oriented network architectures such as TDM, SDH, SONET and ATM aim to guarantee Quality of Service (QoS) on a connection by connection basis. Making route calculations involves taking into account large amounts of link state information.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 262-273, 1999. © Springer-Verlag Berlin Heidelberg 1999
2. In packet-based networks such as IP, routing is based on shortest path algorithms and there is typically a single main route available for each source-destination pair (the estimated shortest). One of the principal aims of packet network operators is to keep the call rejection rate low whilst ensuring that accepted customers experience high levels of service. The advent of flow identification, as proposed for IPv6 [RFC2460], and per-flow or per-application routing (which may be possible with active network technology) would enable far more flexible resource allocation. Good allocation strategies can in both cases dramatically reduce the amount of over-capacity required to ensure smooth running.

This paper discusses the use of customisable control programs running in active network nodes to control network routing. To accomplish this, control programs need not only local control and information but also to coordinate with each other throughout the network. We present notions of organisation drawn from work in Distributed Artificial Intelligence (DAI) and Management Science as an approach to this problem. The main focus of this paper is on bandwidth adaptive organisations which change structure to match network resource availability.
2 Organised Routing?
Active networks provide the means to insert (possibly arbitrary) control programs and decision logic into individual network nodes. This supports the key aim behind much of today's active networks research: enabling custom user control programs to be added into the network on a user by user or application by application basis. Arguably, for many applications there will also be an additional need for broader control to ensure that these control programs deliver a coherent final result across the network. Additionally, the computations carried out at individual nodes will require (possibly non-local) network state and policy information for execution. Routing and resource allocation are perhaps chief amongst the applications requiring wider coordination and state information. Routing algorithms often need to display non-local characteristics, such as prevention of routing loops and avoidance of bottlenecks. It directly follows that resource allocation processes running in network nodes need to be coordinated in their actions and have a clearly defined way of accessing distributed network state information - they need to be organised.
3 Active Nodes
Active networks are an enabling technology for the deployment of more intelligent and flexible network management schemes [12]. In the context of network routing problems active nodes in a network need to provide the following facilities:
– A computational environment which executes control programs operating on the routing process (we assume this environment can execute arbitrary control code, but there should be scope for restricting this).
– Access for control programs to a restricted set of primitives controlling node and link resources (a virtual instruction set).
– A mechanism for updating the control programs present in the computation environment. Ideally this mechanism would be of the programmable switch type [12], acting as a "back door" for uploading new control programs into nodes.

These together provide for logical (or actual) mobility of control programs, information, routing policies and inter-controller relationships between nodes in the network. The work presented here is being applied in two domains: ATM networks and IP packet networks. The following two subsections outline the types of active nodes required for each.

3.1 Active ATM Nodes
In the case of an ATM network, controllers (the routing processes) do not directly manipulate the packet flow, since routing decisions are made on a connection by connection basis. Once routes have been chosen they are set up in the switching tables of intermediate ATM nodes. Controllers instead control the application of route decision policy and how this is coordinated with other switches, making decisions on a per-demand basis. "Active" ATM nodes for our purposes therefore need to provide access to the primitives which control route selection for connections and to the signalling processes used for connection set-up. In ATM networks there seems to be less scope for the active network "capsule" approach of passing code to nodes in packet headers. In general, parameters and settings for a whole flow are declared at connection set-up time, leaving much less flexibility for actions to be applied to individual packets.¹

¹ Active packet headers might however be used profitably for making VBR and ABR services more controllable.

3.2 Active Packet Nodes

In packet networks, routing algorithms have direct influence at the packet forwarding level. In fact a strong branch of active networks research (advocated in [13] for example) is based upon the idea that the packet is the fundamental unit for control and to control. It is, however, difficult to see how effective routing and resource control could be achieved at the individual packet level (although for other network management functions this may well be the best level for control). Abstraction away from individual packets by aggregation of traffic into flows, groups of flows or other groups is often seen as essential for useful resource planning (see, for example, the efforts to provide flow identification in IPv6 [RFC2460]). Routing tables in IP networks route packets in real time but stay relatively static; these tables are then updated by routing protocols such as RIP [RFC1058] and OSPF [RFC1131], which run in the background. IP routing tables therefore correspond to packet aggregation based on destination address. In active networks there may be several useful criteria for aggregation in the real-time routing mechanism (rather than simply by destination), such as by source and destination, application or packet priority. An active packet node in the packet network, for our purposes in the routing context, would need to provide:
– Access to the on-line routing mechanism, principally so that it can be updated by routing mechanisms operating in the background (i.e. not on a packet by packet basis).
– A mechanism for flow identification.
– Per-flow routing capabilities.
The last two points allow much finer control of routing and are now often seen as essential for good resource management and QoS support in packet networks. Active networks are an important enabling technology for these two properties (and, of course, the first).
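The facilities listed above can be illustrated with a toy per-flow routing table. The class and method names, and the (source, destination, flow label) classification key, are assumptions chosen for illustration, not a proposal for any specific active-node API.

```python
# Illustrative sketch of an active packet node's per-flow facilities:
# flow identification (a classifier key) plus a per-flow routing table
# that background control programs update, while the real-time path is
# a single table lookup per packet. All names are hypothetical.

class ActivePacketNode:
    def __init__(self, default_next_hop):
        self.default_next_hop = default_next_hop
        self.flow_table = {}  # flow key -> next hop

    @staticmethod
    def flow_key(packet):
        # Flow identification: aggregate packets by source,
        # destination and an IPv6-style flow label.
        return (packet["src"], packet["dst"], packet.get("flow_label", 0))

    def install_route(self, key, next_hop):
        # Called by a background control program, not per packet.
        self.flow_table[key] = next_hop

    def forward(self, packet):
        # Real-time path: one lookup, falling back to the default route.
        return self.flow_table.get(self.flow_key(packet), self.default_next_hop)

node = ActivePacketNode(default_next_hop="R1")
node.install_route(("A", "B", 7), "R2")
print(node.forward({"src": "A", "dst": "B", "flow_label": 7}))  # R2
print(node.forward({"src": "A", "dst": "C"}))                   # R1
```

The design point is that the expensive decision (which route a flow gets) is taken in the background, outside the per-packet forwarding path.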
4 Building Organisations
There is a large body of work in both Management Science and DAI on how to apply organisational theory to distributed computational systems. [5] gives an AI perspective, [3] and [9] give interesting Management Science viewpoints and a collection of papers covering market based systems can be found in [2]. This work is complemented by the general trend in the network research community towards decentralisation and the use of hierarchies [10] and delegation [6]. The PNNI framework under development by the ATM Forum [1] is perhaps the most advanced and best known use of organisations in network architecture to date. A useful organisation needs to provide the following: – Information Organisation: Dividing up and providing access to information. For routing, this corresponds to representing the information required for making routing decisions (the link states and topology etc.). – Control Organisation: Ensuring distributed control decisions lead to coherent actions being carried out throughout the network. In routing, this corresponds to: 1) de-limiting where routing decisions are taken (hop by hop? at the source?) and 2) avoiding bottlenecks, congestion, oscillations, etc. and making sure the correct reservations are made. Organisations are made up of two types of component: units and relations. Units represent, for example; company departments, employees, sites or (here) areas of a network. Relations describe the relationships between units (such as superiority, parent, child, peer etc.) and define the organisational structure. The
examples given in this paper are all spatially distributed hierarchies. However, much of the discussion is relevant to other types of distribution (functional, by authority), and organisation ([3] for example discusses heterarchies, hierarchies and markets).
5 Static Organisations
In a static organisation the composition of units and existing relations between them remain fixed over time. The following two sections describe a static routing organisation in terms of controllers (units) and structure (formed by relations).

5.1 Control Programs
Control programs represent organisational units and perform management tasks in local areas of the network. For the routing task, control programs require the following types of information:
– Routing policies and algorithms.
– Continuously updated local information about the network state. This forms the basic input for solving routing tasks.
– Responsibilities to other controllers elsewhere in the network (for example their superiors). This corresponds to control programs knowing their place in the organisation.
Both the local information and the algorithms/policies may change over time to reflect the changing network state and management control. Controllers have knowledge of a limited area of the network, and information about what lies outside this area is obtained via the organisational links. Control programs in a static organisation perform functions at two levels:
– Local: Executing local routing tasks given demands for routes and their local state information.
– Organisational: Cooperating with other controllers in the organisation to execute non-local routing tasks. These may be tasks arising in the controller's own area for which non-local information or control is needed, or tasks arising elsewhere (which need information or action in the control program's own sphere of influence).

5.2 Control Structure
Relations are applied to compose many local control programs into an organisational structure which spans the whole network. Figure 1 shows a hierarchical structure with three levels of organisational units. The lowest level controllers (one level above the individual network nodes) each have a local viewpoint and hold information about the network state in that area. Actions are coordinated in accordance with the relations between the units, e.g. peer to peer (B2 communicates directly with B4 to find out about connectivity to node L) or hierarchically (controller C1 mediates between controllers in level B, all the Bx, to perform routing tasks). This control structure then stays fixed over time (although it may be updated for the physical addition/removal of nodes, for example).

Fig. 1. The network nodes on the left are clustered at two levels; the right-hand side shows the resulting control hierarchy.
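The two coordination styles in Fig. 1 (peer to peer and hierarchical mediation) amount to finding the lowest controller whose area covers both endpoints of a routing task. The controller names follow the figure; the data structure and lookup below are an illustrative assumption, not the paper's implementation.

```python
# Sketch of the hierarchical control structure of Fig. 1: a routing
# task between two nodes is handled locally if a low-level controller
# covers both endpoints, and is otherwise mediated by a controller
# higher up (e.g. C1 mediating between B2 and B4).

class Controller:
    def __init__(self, name, nodes=(), children=()):
        self.name = name
        self.children = list(children)
        # A controller's area covers its own nodes plus its children's.
        self.nodes = set(nodes)
        for child in self.children:
            self.nodes |= child.nodes

def responsible_controller(root, a, b):
    # Descend while some child covers both endpoints; the last
    # controller reached mediates the routing task.
    current = root
    while True:
        covering = [c for c in current.children
                    if a in c.nodes and b in c.nodes]
        if not covering:
            return current
        current = covering[0]

b2 = Controller("B2", nodes={"E", "F", "G"})
b4 = Controller("B4", nodes={"K", "L"})
c1 = Controller("C1", children=[b2, b4])
print(responsible_controller(c1, "E", "F").name)  # B2: purely local task
print(responsible_controller(c1, "E", "L").name)  # C1: mediates B2 and B4
```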
6 Adaptive Organisations
There are many different organisational structures (even if restricted to hierarchies) and no single organisation is appropriate for all tasks. There are clear arguments for allowing organisations to adapt over time; this is particularly the case when the environment they operate in is dynamic or there may need to be ad-hoc re-organisation (due to failures, for example). There has been some preliminary work on adaptive organisations in the Distributed Artificial Intelligence (DAI) literature, with [8] and [7] the most useful examples.² The key requirement behind adaptive organisations is that the controllers in the organisation have some representation of their place in the organisation. The controllers can then apply a set of adaptation rules to decide when to change this representation (and inform other controllers of the changes). The following sections present an organisation which adapts to bandwidth availability over time to illustrate the idea and utility of adaptive organisations.

6.1 Resource Summarisation at Different Levels of Abstraction
[4] introduces a clustering scheme for structuring network graphs. The network is divided up into equivalence classes according to connectivity at a specified available bandwidth. The regions created are called Blocking Islands.³ Figure 2 gives an idea of the structures created by this clustering approach. A single network graph including nodes A to M and connected via links of varying residual capacities is clustered into regions at 6 Mbits/sec and 9 Mbits/sec.

² We refer the reader to [15] for further references.
³ Please note that the clustering techniques and their applications are subject to patent protection.
Fig. 2. Two layers of blocking islands cluster a network at different levels of abstraction. The light grey regions (BI-1(9) - BI-6(9)) cluster equivalence classes of nodes reachable at 9Mbits/sec. The larger regions (BI-7(6) - BI-9(6)) cluster groups of nodes which can be reached at 6Mbits/sec. Dashed lines represent communication links with less than 6Mbits/sec free capacity, solid lines represent links with 6Mbits/sec or more free.
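The clustering idea can be sketched as a union-find pass over the links: nodes joined by a link with at least β free capacity fall into the same region. The example graph and capacities below are made up for illustration; the real scheme is defined in [4].

```python
# Sketch of blocking-island clustering: partition the nodes into
# equivalence classes connected at bandwidth level `beta`.
# `links` is a list of (u, v, free_capacity) tuples.

def blocking_islands(nodes, links, beta):
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v, free in links:
        if free >= beta:
            parent[find(u)] = find(v)  # union the two regions

    islands = {}
    for n in nodes:
        islands.setdefault(find(n), set()).add(n)
    return list(islands.values())

def route_exists(islands, a, b):
    # Two nodes in the same island are guaranteed at least one route
    # whose links all lie inside that island.
    return any(a in isl and b in isl for isl in islands)

nodes = ["A", "B", "C", "D"]
links = [("A", "B", 10), ("B", "C", 5), ("C", "D", 8)]
print(route_exists(blocking_islands(nodes, links, 6), "A", "B"))  # True
print(route_exists(blocking_islands(nodes, links, 6), "B", "C"))  # False
```

Running the same pass for several values of β yields the layered abstraction of Fig. 2: lowering β from 9 to 6 merges regions, raising it splits them.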
The sets of blocking islands generated for one bandwidth requirement are unique, identify bottlenecks (the inter-regional links) and highlight the existence and location of routes at a given bandwidth level. If two nodes are clustered in the same blocking island at a given bandwidth level there must exist at least one route between them; furthermore, all links which form part of the path lie inside this blocking island. Applying this clustering technique several times for different bandwidth requirements represents bandwidth bounded connectivity in the network at different levels of abstraction. Changes in the available bandwidth on the links can cause splitting or merging of regions.

6.2 A Bandwidth Adaptive Routing Organisation
The hierarchy generated by the resource summarisation in Section 6.1 can be used to build an organisation. Control programs in the network nodes gather and hold information for each of the regions (blocking islands above). The simplest mapping is to designate one piece of control code responsible for each region both in the abstract and ground space (node-level). Control programs retain links to their neighbouring (peer) and parent/child regions (or at least to the controllers of these regions). Through the clustering scheme in the section above,
this structure is then related directly to the bandwidth available and changes over time as resources are allocated and de-allocated. This organisation is applied to performing routing tasks in the network. More specifically, we are currently applying this to allocating CBR demands in ATM networks based on a source routing model.⁴ Controllers at the lowest level of abstraction perform routing tasks on real network nodes whilst controllers at higher levels coordinate the efforts of their subordinates (at lower levels of abstraction). The useful properties of the clustering scheme identified in [4] apply to any convex metric. However, bandwidth appears to be the most useful of these, primarily because many other QoS parameters (such as delay, jitter etc.) depend heavily upon available bandwidth [14]. Having the organisation adapt to the available bandwidth means that bandwidth information for routing decision making is already implicit in the information structure (before any routing algorithm has even been executed).

6.3 Control Programs
Control programs in an adaptive organisation are generalised versions of those used in static organisations (see Section 5). Controllers now require an additional meta-level of information:
– Metrics for evaluating the need to change the organisation structure. These metrics form the update rules of the organisation.
In a bandwidth adaptive organisation these metrics and update rules are based upon bandwidth availability in the network (and in this work arise out of the clustering techniques described in Section 6.1). As a result of this additional meta-level, control programs may now act at three levels:
– Local: Executing local routing tasks given demands for routes and their local state information.
– Organisational: Cooperating with other controllers in the organisation to execute non-local routing tasks.
– Meta-Organisational: Updating the organisation structure by applying the update rules.
It is important to note that the update rules for this new meta-level must be embedded within the individual control programs themselves. Updates of the organisational structure cannot realistically be controlled by external processes, since network failures could cause all local adaptation to stop, and in large networks external adaptation is a complex problem in itself if solved centrally.

⁴ Since this paper's main aim is to discuss organisation issues, the routing schemes related to the organisational structures are not discussed here; these are described more precisely in [15].
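A minimal sketch of such an embedded update rule follows, under the simplifying assumption that a controller only inspects the free capacities of its internal and boundary links. This trigger is deliberately crude: a real controller would re-run the clustering of Section 6.1 to confirm that the region actually splits, since alternative internal paths may keep it connected.

```python
# Illustrative meta-organisational update rule: each controller decides
# locally whether its region should split, merge with a neighbour, or
# stay as it is, relative to the region's bandwidth level `beta`.
# Thresholds and link lists are hypothetical example data.

def adaptation_action(internal_free, boundary_free, beta):
    if any(f < beta for f in internal_free):
        return "split"   # region may no longer be connected at beta
    if any(f >= beta for f in boundary_free):
        return "merge"   # a neighbouring region became reachable at beta
    return "stable"

# A controller of a 9 Mbit/s region reacting to capacity changes:
print(adaptation_action([12, 10], [4, 5], beta=9))  # stable
print(adaptation_action([12, 7], [4, 5], beta=9))   # split
print(adaptation_action([12, 10], [9, 5], beta=9))  # merge
```

The point of embedding the rule in the controller itself, as argued above, is that the decision needs no external process and survives failures elsewhere in the network.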
6.4 Control Structure
Figure 3 shows two clusterings of controllers for a grid network. The first (left hand side) is the starting state with the controllers covering initially defined regions. For a static organisation this configuration persists and remains fixed over time. The right hand side shows an example adapted control structure for the same grid network. Nodes clustered at the top level (sharing the darkest regions, such as B and C) have high bandwidth connectivity available and regions only connected at low levels (such as A and B) are only reachable at low bandwidth.
Fig. 3. A grid network with a starting organisation as shown on the left. Over time the structure changes for a bandwidth adaptive organisation (as shown on the right). For a static organisation the initial state would be preserved.

Finding a route between neighbouring nodes B and C is quick in the adaptive organisation since both nodes lie in the same local area of control. The same task takes longer in the static organisation because the two nodes happen to be clustered only at the highest level of abstraction. This difference reflects the representation of the ready availability of resources between nodes B and C in the adaptive organisation, which is something the static organisation does not capture. The situation is reversed for communications between nodes A and B: the adaptive organisation clusters these at the top level of the hierarchy, whereas the static organisation can make a decision at the most local level. The extra effort required in the adaptive organisation reflects the fact that A and B are connected only by paths which are resource critical, which may mean that they should be dealt with by an entity with a broader view of the network. Routing traffic on a critical link may have wider consequences for the rest of the network (for instance it may unnecessarily disconnect two regions of the network).
7 Active Organisations for Routing
An organisation in an active network forms part of the environment in which control programs on a given node execute; it defines:
– What information is available at execution time (information organisation),
– What the wider context for the execution of the program is (control organisation).
The relationships with other entities in the network may restrict the possible outcomes of computation, and may determine, counteract or cause non-local effects. In our work on the routing problem, network state information is managed within the organisation. Each controller holds local information and is able to query other controllers to obtain non-local information. The organisation also defines how a routing problem is solved: resource allocation decisions are finally taken by those controllers responsible for the lowest (link) level, but where reservations are made is partly controlled from higher up in the hierarchy. The coordination structure provided by the hierarchy ensures that local resource reservations hang together to give complete routes and that load is evenly distributed to avoid congestion. The adaptive organisations described in the previous sections are also "active" in the sense that their control programs are able to represent the state of the organisation explicitly and update the organisation itself. Updates of code and information, as well as relationships etc., do not only come from network operators or users of individual applications but from within the network. Controllers on each node have some autonomy and influence over their own status in the organisation. Controllers may also have influence over controllers on other nodes (at lower levels of abstraction).
7.1 Importance of Active Network Developments
The requirements laid out in Section 3 clearly show the need for active network technology to support the work presented in this paper. Essentially, control programs need to be logically or actually mobile. As the organisational structure, network state and management policy (specific routing algorithms for example) change, the control programs in the network also need to adapt (or be replaced) dynamically. The adaptivity of the organisation requires considerable flexibility in the network nodes.
8 Status of Work
The work on adaptive organisations outlined in this paper is still in its preliminary stages. The node execution environments, control programs, communication mechanisms and adaption algorithms have recently been completed. What is still lacking is a generator for traffic scenarios and extensive testing. Current work is based on an ATM network model but under certain assumptions should also be applicable to packet-based networks (see Section 3).
9 Conclusions
Increasingly intelligent network management schemes, particularly for resource management, are vitally important for the smooth running of future networks. This is not only true in connection-oriented networks, such as ATM, but also in packet-based networks, where careful resource management is required to improve the ratio between potential load and available capacity (e.g. minimising the amount of over-capacity required). Coordinating the actions of on-line control programs throughout the network goes hand in hand with this need for better resource management, leading to interesting questions for active networks research:
– How to facilitate this coordination?
– How to prevent the potentially wide diversity of injected programs interacting catastrophically in the network (even if none of them violates security restrictions)?
We introduce notions on organisations from the field of Distributed Artificial Intelligence and discuss the use of organisations for routing tasks. The paper contains two main threads of argument:
1. Organisations are important in ensuring that active networks behave coherently when executing many different user/system injected control programs. Programs executing at nodes require both information organisations (to be able to perform useful tasks) and control organisations (to ensure coherent behaviour).
2. Organisational structures in networks are heavily dependent upon active network techniques to provide flexible computation at network nodes and mechanisms for the dynamic update of control programs. This is particularly true of adaptive organisations.
To help illustrate these points the paper also presents a bandwidth adaptive organisation scheme based on control programs which update themselves, their network information state and their organisation structure dynamically. The control programs coordinate with each other to ensure coherent execution of the resource management tasks.
Acknowledgements

The authors would like to extend their thanks to the other partners in the SPP-ICC IMMuNe project (of which this work is part). Funding for IMMuNe from the Swiss National Science Foundation⁵ is also gratefully acknowledged. Thanks also go to Monique Calisti and Christian Frei for helpful comments on earlier drafts.

⁵ Project Number SPP-ICC 5003-45311.
References
1. ATM-FORUM. P-NNI V1.0 - ATM Forum approved specification, af-pnni-0055.000. ATM Forum, 1996.
2. S. H. Clearwater. Market Based Control: A Paradigm for Distributed Resource Allocation. World Scientific, Singapore, 1996.
3. M. S. Fox. An Organisational View of Distributed Systems. IEEE Transactions on Systems, Man and Cybernetics, SMC-11(1):70-80, 1981.
4. C. Frei and B. Faltings. A Dynamic Hierarchy of Intelligent Agents for Network Management. In Workshop on Artificial Intelligence in Distributed Information Networks (held at IJCAI'97), 1997.
5. L. Gasser. DAI Approaches to Coordination. In N. M. Avouris and L. Gasser, editors, Distributed Artificial Intelligence: Theory and Praxis, pages 31-51. Kluwer, 1992.
6. G. Goldszmidt and Y. Yemini. Distributed Management by Delegation. In Proceedings of the 15th International Conference on Distributed Computing Systems (ICDCS'95), pages 333-341, Los Alamitos, CA, USA, May 30-June 2, 1995. IEEE Computer Society Press.
7. F. Guichard and J. Ayel. Logical Reorganisation of DAI Systems. In Proceedings of the ECAI-94 Workshop on Agent Theories, Architectures and Languages (ATAL'94), pages 118-128. Springer Verlag (Lecture Notes in Artificial Intelligence 890), August 1994.
8. T. Ishida, L. Gasser, and M. Yokoo. Organization Self-Design of Distributed Production Systems. IEEE Transactions on Knowledge and Data Engineering, 4(2):123-134, April 1992.
9. T. W. Malone. Modeling Coordination in Organisations and Markets. In A. H. Bond and L. Gasser, editors, Readings in Distributed Artificial Intelligence, pages 151-158. Morgan Kaufmann, 1988.
10. M. R. Siegl and G. Trausmauth. Hierarchical Network Management: A Concept and its Prototype in SNMPv2. Computer Networks and ISDN Systems, 28(4):441-452, February 1996.
11. J. M. Smith. Programmable Networks: Selected Challenges in Computer Networking. IEEE Computer Magazine, 32(1):40-42, January 1999.
12. David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, and Gary J. Minden. A Survey of Active Network Research. IEEE Communications, 35(1):80-86, January 1997.
13. David L. Tennenhouse and David J. Wetherall. Towards an Active Network Architecture. Computer Communication Review, 26(2), April 1996.
14. Z. Wang and J. Crowcroft. Quality-of-Service Routing for Supporting Multimedia Applications. IEEE Journal on Selected Areas in Communications, 14(7), 1996.
15. S. N. Willmott, C. Frei, B. Faltings, and M. Calisti. Organisation and Coordination for On-line Routing in Communications Networks. In A. L. G. Hayzelden and J. Bingham, editors, Software Agents for Future Communication Systems. Springer Verlag, 1999.
A Dynamic Interdomain Communication Path Setup in Active Network Jyh-haw Yeh, Randy Chow, and Richard Newman Dept. of Computer and Information Science and Engineering University of Florida Gainesville, FL 32611, USA {jhyeh,chow,nemo}@cise.ufl.edu
Abstract. An internetwork is composed of many administrative domains (ADs) with different administrative and security policies for protecting their own valuable resources. A network traffic flow between end-to-end stub ADs through intermediate transit ADs must not violate any stub or transit domain policies. Packets may be dropped by routers that detect a policy violation. It is therefore necessary for a communication session to set up a communication path in which all constituent routers are willing to serve the session, so that data packets can be delivered safely without being discarded. Moreover, such a communication path cannot always guarantee successful packet delivery if the intermediate routers or links are prone to failure or congestion. This paper proposes a dynamic interdomain communication path setup protocol to address these issues. The protocol is dynamic in the sense that the path determination strategy is distributed and a path can be reconfigured to bypass a failed or congested router or link. These two dynamic features require the intermediate network nodes to perform some computation and to make some decisions. The implementation of the protocol relies on the computational capability of an active network, in which active nodes can provide computational capabilities in addition to traditional communication. Thus, the design of the protocol is based on the assumption of an active network architecture. The protocol will be a useful tool for all connection-oriented applications in active networks.
1 Introduction
An internetwork consists of many heterogeneous domains managed under different administrative authorities. For secure interdomain resource sharing, an administrative policy must be defined for each individual authority to specify eligible traffic flows between end-to-end domains and among transit domains. Each domain must have a mechanism to enforce its policy by either serving or dropping packets flowing through them so that valuable resources are not abused
The research is partially supported by NSA under contract number MDA-904-98C-A892.
Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 274–285, 1999. c Springer-Verlag Berlin Heidelberg 1999
by unauthorized accesses. The enforcement mechanism requires authentication and authorization of each packet according to the local domain policy. In a firewall system [1], network traffic filtering is performed on a per-packet basis independently by firewalls in each domain. The authorization process must be performed for every packet in every firewall along the path from source to destination. This is time-consuming, since each router needs to consult its local domain policy to authorize each packet, especially when there are complicated domain policies and various types of requested service. However, a domain policy will most likely allow the same access privilege for all packets in a communication flow, and every packet in a communication flow normally has the same requested type of service. Because of these two properties, many proposed interdomain access control protocols [2,3,4] and policy routing protocols [5,6,7,8] change the authorization process from a per-packet basis to a per-flow basis by building a secure and authorized communication path before transmitting application data. The path setup in these protocols establishes a sequence of routers en route to the destination in which each router agrees to provide the services. For the purposes of authentication and data integrity, a secret session key is generated and distributed to each router in the path. After the path setup, all data packets flow through the same path to the destination. Each data packet carries a MAC (Message Authentication Code) signed by the session key. Instead of consulting the local domain policy for authorization, each router verifies the MAC using the session key. If the verification succeeds, the service is granted to the packet. In this way, each router performs the (expensive) authorization process only once, during path setup, for the entire communication session, and performs only an efficient MAC verification on each packet.
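The per-flow scheme described above (one expensive authorization at path setup, then a cheap MAC check per packet) can be sketched with a standard HMAC. The key distribution and packet layout are assumed for illustration; the papers cited do not prescribe a particular MAC construction.

```python
# Sketch of per-packet MAC verification with a per-session key:
# after path setup, a router checks the MAC instead of consulting
# its domain policy for every packet.
import hashlib
import hmac

def sign_packet(session_key: bytes, payload: bytes) -> bytes:
    # Computed by the sender for each data packet.
    return hmac.new(session_key, payload, hashlib.sha256).digest()

def router_accepts(session_key: bytes, payload: bytes, mac: bytes) -> bool:
    # Constant-time comparison replaces the per-packet policy lookup.
    expected = hmac.new(session_key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)

key = b"per-session secret distributed at path setup"
pkt = b"application data"
mac = sign_packet(key, pkt)
print(router_accepts(key, pkt, mac))          # True
print(router_accepts(key, b"tampered", mac))  # False
```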
IDPR (InterDomain Policy Routing) [5] is a typical policy routing protocol that entails establishment of a communication path using a static path determination strategy. Static path determination means that only the source router determines the path. Each router must maintain a consistent routing information database (RID), containing the connectivity of the internetwork and the associated domain policy of every node, for the computation of feasible paths. The large storage size requirement and the difficulty of maintaining consistency of RIDs among routers make this approach inefficient. This paper proposes a dynamic interdomain communication path setup protocol using a dynamic path determination strategy to eliminate the necessity of RIDs. In contrast to the static approach, the dynamic approach shifts the responsibility of path determination from one node to a set of nodes. Another dynamic feature of this protocol is that a path can be changed to bypass a failed or congested router/link while the connection remains active. Details of this protocol are described in Section 4. The dynamic nature of the proposed protocol requires additional computation and functionalities at the intermediate network nodes. This requirement can be achieved by using the emerging active network architecture [9,10,11,12] in which a network can be treated as a computing
276
Jyh-haw Yeh et al.
engine, as well as a communication network. Therefore, the proposed protocol is designed under the assumption of an active network architecture. Section 3 briefly describes the architecture of an active network.
2 Static Path Setup - Policy Routing
IDPR is a routing protocol designed for connection-oriented interdomain applications. It is composed of three primary protocols:

1. Policy Update Protocol: This handles reliable flooding of link state updates throughout the internetwork.
2. Path Setup Protocol: This installs and maintains routing information at intervening routers.
3. Packet Forwarding Protocol: This forwards data packets along a previously established path.

In IDPR, each router has a RID for storing the network topology of ADs and their associated transit policies. To provide a consistent view of the internetwork among all routers, the Policy Update Protocol maintains consistency among RIDs by using reliable flooding of link state updates throughout the internetwork. Another important component of IDPR, Path Computation, computes the routing path in accordance with source and transit domain policies. It is not a protocol in that each domain can implement its own version of Path Computation. Note that because of the consistency among all RIDs, the routing path computed by the source router is feasible since it most likely will be agreed on by all transit routers. Thus, an interdomain communication protocol can be implemented using underlying IDPR policy routing facilities to determine a feasible routing path. Having the ability to compute a feasible routing path in a source router, the path setup can confidently commence without the fear of rejection. In the Path Setup protocol, the source router sends out a Setup packet containing the computed path. Each transit router receiving this Setup packet checks its local transit policy to determine whether to accept or reject this path. In case of acceptance, the router creates an entry for this session in its Forwarding Information Database (FID). The purpose of this FID is packet forwarding. An entry in the FID consists of a path ID, the previous and next router in the path, and possibly a session key.
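The per-router handling of a Setup packet just described might look like the following sketch; the FID field names and the policy-check callback are assumptions for illustration, not structures taken from the IDPR specification.

```python
forwarding_db = {}  # FID: (path ID, router ID) -> forwarding state for the session

def handle_setup(path, my_id, session_key, transit_policy_allows):
    """Accept or reject the source-computed path; on acceptance, install an
    FID entry and return the next hop to forward the Setup packet to."""
    if not transit_policy_allows(path):
        return None  # reject: the proposed path violates the local transit policy
    i = path.index(my_id)
    forwarding_db[(tuple(path), my_id)] = {
        "prev_hop": path[i - 1] if i > 0 else None,
        "next_hop": path[i + 1] if i + 1 < len(path) else None,
        "session_key": session_key,  # used later to verify per-packet MACs
    }
    return forwarding_db[(tuple(path), my_id)]["next_hop"]
```

Note that the transit router only validates a path computed elsewhere; it never chooses the next hop itself, which is exactly the static property criticized below.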
After an entry in the FID is built, the router forwards the Setup packet to the next router in the path, and the process is repeated for the subsequent routers. Once a path has been set up by the path setup protocol, the packet forwarding protocol forwards the user data packets along the path. Each router in the path uses the path ID and session key to check the authenticity and integrity of received data packets. For data packets with correct verification, the router forwards them to the next router as recorded in the FID. In this way, all data packets for a session can flow through the established path to the destination.

IDPR Path Setup is static because the path is determined ahead of time, solely by the source. Each transit router can only accept or reject
the proposed path; it plays no role in the path computation. This violates the following general philosophy: the one providing the service should make the decision. Therefore, a dynamic path setup protocol is proposed in Section 4 in which, in contrast to IDPR, each router determines the next segment (router) of the path.
3
Active Network Architecture
Active networks are a novel approach to network architecture in which each switch in the network provides a computational environment, so that customized computation can be performed on the fly on the messages flowing through it. In essence, the network becomes programmable for any specific application. Currently, a new network protocol generally requires a lengthy standardization process before it can be deployed. However, under an active network architecture, the deployment of a new network protocol can be immediate. Traditional data networks passively transport bits from one end system to another. The network does not care much about the contents of the data payloads it carries, and they are transferred between end systems without modification. Computation is limited within such a network, e.g., header processing in packet-switched networks. The exponential growth of the Internet has brought diverse applications that may require intermediate network nodes to perform some computation on application data. For example, Web browsing can be enhanced if the intermediate nodes support Web page caching, and a path setup protocol needs to encrypt, decrypt, and validate packets at the intermediate nodes for safe key distribution. These two unique features of active network technologies, that routers are programmable for customized computation and that new protocols are easily deployed, are ideal for the development and implementation of the dynamic path setup protocol proposed in this paper. Interdomain path setup is not only an application of active networks; it is also an essential tool for all connection-oriented applications in active networks. There is a strong synergy between the two. The active network research group at MIT has identified two approaches to an active network architecture, discrete and integrated, depending on whether programs and user data are transported separately or in an integrated fashion [9,10].
The proposed dynamic path setup protocol follows the integrated approach. To program a network, the integrated approach changes the passive packets of traditional network architectures to active capsules, which are programs with user data embedded. There are many details and issues in the design of an active network that are beyond the scope of this paper. For communication path setup, we will concentrate only on the programming with capsules for protocol implementation. The encoding of capsules has yet to be standardized by the active network research community. Thus, a functional description of capsules in our protocol is given rather than detailed program codes.
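In that spirit, an integrated-approach capsule can be described functionally as user data bundled with the program that processes it at each node. Every name below is an illustrative assumption, since no capsule encoding has been standardized.

```python
from dataclasses import dataclass, field

@dataclass
class Capsule:
    # A capsule is a program with user data embedded (integrated approach).
    session_id: int
    program: object          # callable executed at every active node
    payload: bytes = b""
    state: dict = field(default_factory=dict)  # mutable in-capsule state

def active_node_receive(capsule, node_env):
    # An active node forks a process to run the carried program against its
    # local environment, instead of merely forwarding the packet.
    return capsule.program(capsule, node_env)
```

The Setup, Repair, Auth-Req, and Data Capsules of Section 4 can all be read as instances of this pattern with different carried programs.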
To implement a path setup protocol in an active network, each router should be equipped as an active node. Since the underlying active network is a distributed computing engine, the determination of a feasible routing path can be decentralized. Thus, each router no longer needs to maintain a large routing information database to keep track of the internetwork topology as in the Policy Routing approach.
4 Dynamic Path Setup
The proposed dynamic path setup protocol in active networks uses active capsules that contain control information for iterative negotiation of the next router from source to destination through some qualified intermediate nodes. Once a path has been set up, the protocol is also responsible for the liveness of the connection by providing a path repair mechanism for the reconfiguration of the path upon failure of a link or a router. Before the detailed description of the protocol given in Sections 4.2 and 4.3, some data structures and packet types used in this protocol are described in the following subsection.

4.1 Soft State and Packet Types
In active network terminology, a “soft state” for a communication session specifies the current status of the session, and is maintained in each participating node. For dynamic path setup, the soft state consists of four data structures in each node of the path. These data structures, listed below, either specify some useful information or record the status of the session. Note that these four data structures in each node can be uniquely identified by the session ID.

A Security Association (SA) contains the session ID, session key, encryption algorithms and all other information concerning security.

A Qualified Neighbor List (QNL) in a node contains the available neighbors that are willing to provide services for the session.

A Node Traversed (NT) is a sequence of nodes that have already been traversed by the Setup Capsule (the Setup Capsule is described later). During path setup, the Setup Capsule carries an NT that is updated by each node. The NT field can be used to avoid routing loops. After the path has been built, the NT contains the path and each node should have a copy of it.

A Path Status (PS) is a two-bit register that keeps track of the up/down status of the previous and next routers in the path.

Packets in this protocol are divided into two categories: Capsule and Message. A Capsule carries a program for which a receiving router will fork a dedicated process to execute the program. Control information carried within a message is usually expected by a waiting process. There are four different capsules and eight different messages used in this protocol. All capsules and messages should carry a session ID for routers to identify the session.
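The four soft-state structures could be grouped per session roughly as below; the concrete Python types are assumptions, since the text describes the structures only functionally.

```python
from dataclasses import dataclass, field

@dataclass
class SoftState:
    sa: dict = field(default_factory=dict)   # Security Association: key, algorithms, ...
    qnl: list = field(default_factory=list)  # Qualified Neighbor List
    nt: list = field(default_factory=list)   # Node Traversed; becomes the path
    ps: int = 0b00                           # Path Status: one bit each for prev/next router

sessions = {}  # session ID -> SoftState, one entry per participating node

def would_loop(session_id, my_id):
    # The NT doubles as loop detection: a node already listed must refuse.
    st = sessions.setdefault(session_id, SoftState())
    return my_id in st.nt
```

Keying everything on the session ID matches the requirement above that all four structures be uniquely identifiable per session.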
Setup Capsule: A Setup Capsule is generated by the source host. It tries to build a routing path to the destination. A Setup Capsule contains the SRS (source requested service) and an NT structure.

Repair Capsule: A node generates a Repair Capsule if it detects a failed or congested router/link. This capsule is used to find a detour route to bypass the failure or congestion. The Repair Capsule contains the SRS and the two segments of the original NT separated by the failed or congested router/link.

Auth-Req Capsule: An Auth-Req Capsule is generated by a Setup Capsule or a Repair Capsule in each router and broadcast to all neighbor nodes to collect authorization information from them. An Auth-Req Capsule should contain the SRS.

Data Capsule: A Data Capsule carries the application data and a program for processing the data.

Yes/No Message: Upon receiving the Auth-Req Capsule, each neighbor router checks its local policy. A “Yes/No” message is then returned indicating whether the policy allows the SRS or not. Based on these “Yes/No” messages from its neighbors, a node builds a QNL allowing the node to choose the next router to which to forward the Setup Capsule or Repair Capsule.

Grant Message: A Grant message is generated by the destination router if a path is found. It is sent all the way back to the source through the path just found. The Grant message carries the final NT (path) and the security association. Upon receiving the Grant message, each router stores the NT and the security association in local storage for future use.

Negative Message: A Negative message is generated and sent back to the previous router in the NT if a router cannot find any feasible neighbor to which to forward the Setup or Repair Capsule. This occurs either when there is no qualified neighbor or when all qualified neighbors send back Negative messages.

Alive Message: After a path has been built, each router periodically sends an Alive message to its two adjacent neighbors in the path to confirm continued path viability.

Error Message: An Error message is generated if a Data Capsule violates the authenticity and integrity checks based on the security association.

Repair Done Message: A Repair Done message is generated if a detour path is found. It is sent back to the failure detecting router through the detour path found. This message carries the new NT (path) and SA. Each router keeps the new NT and SA in local storage for future use.

Tear Down Message: A Tear Down message requests routers iteratively to release the memory allocated for a session. If the path cannot be repaired, all routers should receive the Tear Down message. On the other hand, if the path is repaired, only the routers not in the new path should receive the Tear Down message.

NT Update Message: This message updates the NT stored in the routers after a detour path has been built.

4.2 Path Setup
The network is treated as a computing engine in the active network architecture. For any network application, this computing engine needs some input programs
from the source host and generates outputs for the application. In order to build a communication path, the program should instruct the engine to find a routing path to the destination in which all participating routers are willing to provide the requested service. In the proposed path setup protocol, the input program is a Setup Capsule. The source host prepares and sends the Setup Capsule to the network engine. The Setup Capsule contains an empty NT at the beginning, and adds one router each time it traverses a router. When a router receives a Setup Capsule, it executes the following procedures.

1. Routing Loop Prevention: It checks the NT to see whether there is a routing loop. A routing loop exists if its own ID is already in the NT. In such a case, a Negative message is sent back to the previous router and this process is terminated.

2. Destination Router Process: It checks whether the destination host resides in the current router’s subnet. If it does, the router generates the security association (SA) and sends it along with the NT to the source host via the previous router in the NT in a Grant message.

3. Neighbor Information Collection: If the node is not the destination router, it broadcasts an Auth-Req Capsule containing the source requested service (SRS) to all neighbor routers. Each neighbor router executes the Auth-Req Capsule by comparing its local policy and the SRS. A “Yes” message is sent back if the policy allows the SRS; otherwise, a “No” message is returned. A QNL is built of all neighbor routers responding with a “Yes.”

4. Next Router Selection Process: If the QNL is not empty, this procedure adds the current router to the NT. It selects one neighbor router from the QNL at a time until the QNL is empty. For each selected neighbor router, two steps are performed. (1) Forward the Setup Capsule to the selected neighbor router. (2) Put the Setup Capsule process to sleep and wait for Negative/Grant messages.
If a Negative message is received, select another neighbor router and go to step (1). If a Grant message is received, save the SA and NT in local storage and forward the Grant message to the previous router, or to the source host if the current router is the source router. If all neighbor routers in the QNL are selected and no Grant message is received, a Negative message is sent back to the previous router, the current router is deleted from the NT, and this process is terminated. The path determination strategy in this protocol is dynamic because each router decides the next segment (router) of the path. The scenario for this strategy is depicted in Figure 1.
4.3 Path Repair
As described earlier, another dynamic feature of the protocol is to bypass a failed or congested router or link during data transmission. In order to achieve this capability, another input program for instructing routers to repair the path must be installed in each router before transmitting the user data. This path
Fig. 1. The scenario for dynamic path determination

repair program can be carried in the Setup Capsule or another dedicated capsule. It is activated in each router when the router receives a Grant message, i.e., when the path is set. The path repair program basically has three procedures.

1. User Data Forwarding: After a path has been built, the path repair program expects Data Capsules from the source host. It checks each capsule’s authenticity and integrity based on the security association. If the checking is successful, the data processing program in the Data Capsule is called and executed. The path repair program resumes control and forwards the Data Capsule after the called program has completed.

2. Failed Router/Link Detection: The path repair program periodically sends an Alive message to the previous and next routers in the path. A bit in the two-bit register PS is set if the corresponding router’s Alive message is received within a default time threshold T. If both bits are set, PS is reset at the end of T. By examining the PS, a router can keep track of the up/down status of its adjacent routers in the path. Another potential mechanism to detect failure is the receipt of an Error message from the next router in the path after forwarding a Data Capsule to it. Consider the situation when the next router went down and came up again within the threshold T, so that the Up/Down protocol did not detect the failure. The soft state of this session would have been lost in the next router and it could not recognize the forwarded Data Capsule. An Error message would be returned by the next router.

3. Path Repair: If a failed router or link is detected by the Up/Down protocol, the failure detecting router will select another neighbor in its QNL and send a Repair Capsule to it. The scenario for issuing a Repair Capsule in this case is shown in Figure 2.
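The PS-register bookkeeping used for failure detection can be sketched as below; the bit assignments and method names are assumptions.

```python
class PathStatus:
    PREV, NEXT = 0b01, 0b10  # one bit per adjacent router in the path

    def __init__(self):
        self.ps = 0b00

    def alive_received(self, from_next_router):
        # Set the corresponding bit when an Alive message arrives within T.
        self.ps |= self.NEXT if from_next_router else self.PREV

    def end_of_window(self):
        """At the end of each threshold window T: report adjacent routers whose
        Alive message never arrived, then reset PS for the next window."""
        down = [name for name, bit in (("prev", self.PREV), ("next", self.NEXT))
                if not self.ps & bit]
        self.ps = 0b00
        return down
```

A router whose neighbor appears in the `down` list would then issue a Repair Capsule as described in procedure 3.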
Fig. 2. The scenario for a successful path repair

The Repair Capsule is the same as the Setup Capsule except that it finds a detour path from the failure detecting router to a reconnecting router. A reconnecting router can be any router closer to the endpoint than the failed router in the original path. If the failed router or link is detected by receiving an Error message, the Repair Capsule is sent to the next router to reconnect the path. In both cases, the Repair Capsule carries two segments of the original NT. The first segment of NT starts from the source router to the failure detecting router. This segment of NT is treated the same way as the NT in the Setup Capsule: the router ID is inserted into the list when the capsule travels through a router. The second segment of NT contains the remaining routers of the original NT with the failed router marked. It is used to determine whether a new path has been found. A new NT can be computed from these two segments of NT when the path repair is completed.

When a router receives a Repair Capsule, it compares its ID to the two segments of NT. There are four possible results of the comparison.

(1) If its ID is the marked router in the original NT, the router simply rebuilds its soft state to reconnect the path for the session.

(2) If its ID is in the first segment of NT, a routing loop exists and a Negative message is sent back to the previous router in the first segment of NT.

(3) If its ID is in the second segment of NT, the router is a reconnecting router and a detour path has been found. The sequence of routers following the reconnecting router in the second segment of NT is appended to the first segment of NT to form a new NT for the new path. Then the reconnecting router sends a Repair Done message containing the new NT to the failure detecting router through the new path.

(4) If its ID does not appear in either of the two segments of NT, the same procedure as for the Setup Capsule is applied to the Repair Capsule. That is, the router broadcasts the Auth-Req Capsule, builds the QNL, selects a router from the QNL, and forwards the Repair Capsule to the selected router. If the QNL is exhausted without finding a detour path, a Negative message is sent to the neighbor that sent the Repair Capsule.

For successful path repair, the consistency of the soft state NTs among all routers in the new path should be maintained as follows.
(1) Upon receiving the Repair Done message, the failure detecting router issues an NT Update message containing the new NT to all prior routers in the new path.
(2) After sending a Repair Done message, the reconnecting router should issue an NT Update message containing the new NT to all posterior routers in the new path and a Tear Down message to all prior routers in the second segment of NT.

In the Up/Down protocol, two routers may detect a failed router/link simultaneously. Only the one nearer to the source in the path issues the Repair Capsule. The one nearer the destination should expect a Repair Capsule or a Tear Down message within a default time threshold T. If nothing is received within T, the path repair was not successful and a Tear Down message is issued to the routers posterior to it in the path.

If the failure detecting router receives a Negative message, the path repair attempt was not successful. The router should try to send another Repair Capsule to another neighbor router in its QNL until the QNL is empty. If all neighbor routers in the failure detecting router’s QNL return Negative messages, the correct action is either to send a Tear Down message all the way back to the source or to send a Negative message back to the previous router in the NT. The first choice stops searching for another path and informs the source that there is no path at this time; the source must perform path setup to recreate a path. The second choice continues to search for another detour path starting from the previous router in the NT. Which choice to select should depend on the upper layer application.

In this protocol, there are three major programs running in each participating router. These programs are the Setup Capsule, the path repair program, and the Repair Capsule. Table 1 briefly summarizes the differences among them.
| capsule/program     | objective                                  | relationship                                                | running routers                       |
| Setup Capsule       | set up a path                              | issued by the source at the beginning                       | all routers that receive this capsule |
| path repair program | detect failures and maintain the path      | activated after the Setup Capsule is finished               | all routers in the path               |
| Repair Capsule      | set up a detour route to bypass a failure  | issued by the path repair program when it detects a failure | all routers that receive this capsule |

Table 1. Comparison among Setup Capsule, path repair program, and Repair Capsule
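The four-way comparison a router applies to a Repair Capsule's two NT segments (Section 4.3) can be condensed as follows; the return labels and the exact splice point for the new NT are interpretive assumptions.

```python
def classify_repair(my_id, first_segment, second_segment, failed_router):
    """Decide a router's role upon receiving a Repair Capsule that carries the
    two segments of the original NT (the failed router is marked separately)."""
    if my_id == failed_router:
        return ("rebuild", None)       # (1) failed router is back: rebuild soft state
    if my_id in first_segment:
        return ("loop", None)          # (2) routing loop: Negative to previous router
    if my_id in second_segment:
        # (3) reconnecting router: splice the segments into the new NT and
        # send a Repair Done message back along the detour just found.
        i = second_segment.index(my_id)
        return ("reconnect", first_segment + second_segment[i:])
    return ("forward", None)           # (4) fresh router: run the Setup-style search
```

Only case (3) produces a new NT; cases (2) and (4) feed back into the same backtracking search used during the original path setup.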
5 Conclusion
A dynamic interdomain path setup protocol is presented in this paper. We assume that the underlying network architecture is an active network, a novel network architecture in which each intermediate network node is able to perform some customized computation. The protocol is different from others in
that it utilizes a distributed path determination strategy and has an automatic path repair mechanism for handling failures. We believe that the philosophy behind this strategy is better suited to the context of interdomain communication, i.e., the one providing the service should make the decision. Furthermore, the automatic path repair mechanism can detect failures much more quickly, since they are always discovered first by the nearest router. The nearest router can initiate the path repair process immediately upon detecting a failure. An active network environment facilitates both the computational and communication aspects of the protocol. Many active network applications will rely on such a connection setup protocol to establish active nodes for their prospective computation.

The reconfiguration of a communication path upon failures is an important protocol design issue. To repair a failed communication session, a fast path repair process is crucial. The process should be efficient in failure detection and recovery. The proposed protocol achieves fast failure detection. However, failure recovery requires a fast detour path setup and relies on good QNL selection criteria. Each router in the path setup protocol has no knowledge of the QNL in the selected next router. If the QNL is empty, a Negative message may be returned and cause a rewinding of the search. This kind of path setup rewinding should be limited by the use of good selection criteria. One way to decrease the possibility of path setup rewinding is to increase the information available to each router, for example, the history of previous path setups or the QNLs of the neighbors. Making this information available to each router may, however, slow down the path setup in another respect. Therefore, future work on this protocol should include finding good QNL selection criteria. Multiple failures are another issue not addressed by the protocol.
There may be multiple routers detecting different failures more or less simultaneously. It is not a good idea to have multiple path repair processes running at the same time. Simultaneous repairs may result in redundant work and even incorrect path repair due to interference. This is also an open area to be addressed.
References

1. W. R. Cheswick, S. M. Bellovin: Firewalls and Internet Security. Addison-Wesley, 1994.
2. D. Estrin, G. Tsudik: Visa scheme for inter-organization network security. Proc. of the 1987 Symposium on Security and Privacy, 174-183, 1987.
3. H. Park, R. Chow: Internetwork Access Control Using Public Key Certificates. Proc. of IFIP SEC 96, 12th International Security Conf., 237-246, May 1996.
4. J. Yeh, R. Chow, R. Newman: Interdomain Access Control with Policy Routing. Proc. of the Sixth IEEE Computer Society Workshop on Future Trends of Distributed Computing Systems, 46-52, Oct 1997.
5. M. Steenstrup: Inter-Domain Policy Routing Protocol Specification: Version 1. RFC 1479, July 1993.
6. E. C. Rosen: Exterior Gateway Protocol (EGP). RFC 827, Oct 1982.
7. Y. Rekhter, T. Li: A Border Gateway Protocol 4 (BGP-4). RFC 1654, July 1994.
8. Y. Rekhter: Inter-domain Routing Protocol (IDRP). J. Internetworking Res. Experience, Vol. 4, pp. 61-80, 1993.
9. D. L. Tennenhouse, D. J. Wetherall: Towards an Active Network Architecture. Computer Communication Review, Vol. 26, No. 2, Apr 1996.
10. D. L. Tennenhouse, S. J. Garland, L. Shrira, M. F. Kaashoek: From Internet to ActiveNet. Request for Comments, Jan 1996.
11. D. J. Wetherall, J. V. Guttag, D. L. Tennenhouse: ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols. IEEE OPENARCH, Apr 1998.
12. S. Bhattacharjee, K. Calvert, E. Zegura: An Architecture for Active Networking. High Performance Networking, Apr 1997.
Active Network Challenges to TMN Bharat Bhushan and Jane Hall GMD FOKUS, Berlin, Germany {bhushan,hall}@fokus.gmd.de
Abstract. Data and telecommunications communities have been witnessing two new developments in recent years: emerging active networking concepts and the revision of TMN. In the light of these developments, this paper investigates the extent to which TMN is suitable for managing future networking technologies and includes recommendations about where the TMN standards could be evolved to better accommodate active networking technologies. This paper is timely because public telecommunications operators want to learn more about the usefulness of active networks but are sceptical about what they offer. Solutions to the challenges posed by active networks will shape the course that active networks take, and management of active networks is one of the most difficult challenges. TMN wields authority in the field of telecommunications management and can be a key instrument for the management of active networks. Keywords: Active Network, Network Element Management, TMN, Configuration Management, Telecommunications Networks.
1 Introduction
The TMN standards were developed when what can be termed "conventional" networking technologies were dominant. Since then considerable changes have taken place in the telecommunications market, in network usage, and in the network and value-added services provided as well as in the applications deployed, together with forecasts of even greater and more diverse usage to come. Such changes suggest that current telecommunications network architectures and management will be confronted with many new demands being made upon them. An increasing number of services, all with differing QoS requirements, together with large numbers of users, very large numbers of physical and logical entities, and more demanding customer requirements imply a much greater complexity to be managed and increasing dependence on telecommunications management systems to provide the support that providers need. Research into active networking technologies has proceeded in an attempt to improve the flexibility of networks by supporting more dynamic types of networking [1,2,3].

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 285-299, 1999. Springer-Verlag Berlin Heidelberg 1999

Two basic approaches exist: the discrete approach and the integrated approach. In the
discrete approach, code is loaded on demand and cached for later execution (out-band service deployment). In the integrated approach, each active package carries its own code, i.e., contains a program fragment (in-band service deployment and subsequent processing). Active networking technologies are being made possible by the advances in software and hardware technologies, and in particular in distributed object-oriented engineering. Combined with modern distributed systems tools, the aim is to provide a greater degree of flexibility, reconfigurability, programmability and management to aid dynamic and rapid service creation to meet the demands of a competitive open service market. However, such technologies cannot be deployed successfully without the related management systems being available to provide the support required by telecommunications operators to meet the increasingly sophisticated demands of a customer-oriented market.

This paper therefore investigates the extent to which the TMN standards1 are appropriate for active networking technologies in order to assess the kinds of changes that need to be introduced into the standards in order to meet the challenges of the future.

This paper is structured as follows. Section two introduces related work on managing active networking technologies. Section three investigates the suitability of current TMN standards for managing future networking technologies such as active networks. Section four presents potential active network implementations and shows how they could be managed with TMN. The conclusions summarise the findings of the paper.
2
Related Work on Managing Active Networking Technologies
The concept of active networks has not been limited to networking technology; it has found its application in network management too. This section surveys the efforts being made to apply the concepts and technologies of active networks to network management.

BBN Technology has been developing a programming language and a system architecture which allow packets to carry diagnostic programs. Small packets are encapsulated within an Active Network Encapsulation Protocol [9]. Routing protocols and table updates could be implemented in capsules, as could network management functions, such as those provided by SNMP or CMIP. Smart Packets aim to enhance network management by bringing management closer to the node being managed [10]. They extract the management functionality from nodes and construct it with (special-purpose) programming languages, thus making network control more agile. The heterogeneity of platforms used for running the system is a major problem in fault diagnosis. The Smart Packet technology will allow the diagnostics programs to customise themselves according to the platforms on which they run. With the help of Smart Packet technology, the diagnosis of new protocols and services will be possible before special tools are developed. 1
¹ The TMN standards considered by this paper include the TMN functional, information and physical architectures [4] and the principal M.3000 series documents most relevant to the investigation [5] [6] [7] [8].
Active Network Challenges to TMN
Bharat Bhushan and Jane Hall
The Xbind platform [11] has been developed with the aim of creating, deploying and managing advanced multimedia services. The Xbind platform also enables the development of mechanisms for distributed network resource allocation, real-time multi-vendor switch control, broadband signalling, and multimedia transport. An implementation of qGSMP [12] on Xbind has proved the platform to be practicable.

NetScript [13] is a programming language and an agent-based middleware environment for building and operating active networked systems. With NetScript agents, the management functions of intermediate nodes can be programmed and configured as application or user requirements change, and these agents can be dispatched to remote networks. NetScript agents can monitor remote and strategically important network nodes; in this application, NetScript agents function as high-level filtering programs that watch the network traffic in real time. The NetScript environment provides its users with a universal abstraction of programmable network devices, and the NetScript language itself is a dynamic language. These two features of NetScript can be used to create powerful and programmable SNMP agents.

The Darwin project [14] addresses the problem of runtime resource management for advanced network services and applies the concept of active networking to resource management. This approach proposes customised resource management mechanisms to support value-added service applications. These mechanisms allow applications and service providers to tailor resource management, which in turn adjusts service quality, to suit their needs. The Darwin system architecture is being used to implement a management technique in which QoS is provided for a specific service or application [15]; this is in contrast to the fixed QoS framework used conventionally. The DIRM (Dynamic Integrated Resource Management) project [16] is investigating the area of dynamic QoS management.
This project aims to integrate many existing QoS management systems into a set of high-level APIs, which will allow applications to control the QoS of their communications over RSVP. The IEEE P1520 project [17] is developing a reference model for a future network architecture in which the developers and administrators of value-added services will be able to access the network (for controlling and deploying services) through a standardised programming interface. The reference model will allow the developer to access and control three different networking technologies, namely ATM networks, the Internet, and SS7 (Signalling System 7) networks, through a single and unified interface. The objective of the reference model is to open up the management and signalling interfaces used to access network nodes and combine them into a single, standardised, high-level programmable interface.

Active nodes built on the above-mentioned active network-based management applications can allocate resources to the various virtual networks, undertake configuration and reconfiguration functions, and perform fault, routing and flow control functions. They can take decisions on their own and can report back to the management system. The work mentioned above is oriented towards the Internet and the SNMP area. Active networking technology has not yet been examined from the TMN perspective; that is what this paper sets out to do. Research into the impact of each of the two approaches (i.e., the discrete and integrated approaches) on TMN is a sizeable piece of work. Therefore, this paper considers the active network concept as a whole (i.e., including both the discrete and integrated approaches) and investigates the impact of
the concept on TMN. Those aspects of active networks that have the most impact on TMN and are relevant to public telecommunications operators are considered for investigation. From the research into active networks it is evident that the use of software as the basic structural foundation of networking is continually spreading. In order to assess the impact of this on TMN, this paper gives special attention to the software-related aspects of active networks.
3 The Adequacy of TMN in a Changing Environment
Future networking technologies that are likely to be deployed in telecommunications networks will themselves need to be managed. This section investigates the TMN standards to determine where they do not easily apply to managing emerging and future networking technologies such as active networks.
3.1 Architecture

The TMN architecture is based on standardised interfaces, protocols and messages for the exchange of management information [4]. Generic information models and standard interfaces are regarded as the means for performing general management. The functional architecture is based on function blocks which exchange information over reference points. Everything is therefore related to a function block, and all the functional components are located within one of the function blocks. Although the OS (Operations System) physical architecture „must provide the alternatives of either centralizing or distributing the OS functions and data“ (section 6.3 of M.3010), it is pointed out that: „More study is required on how communications between distributed OS functions may be accommodated under the TMN architecture.“ Distributed OS functions were not fully supported because it was not clear enough at the time how this could best be achieved. It was also less necessary then, as the cost of memory and processing encouraged centralising OS functions in a scarce resource. Continuous decreases in these costs have since rendered this argument invalid. In addition, advances in software engineering have made feasible a greater distribution of OS functions than was originally envisaged in the TMN architecture. Such distribution would make management systems more flexible and would be more appropriate for a variety of application areas, including the management of active networking technologies.

The TMN architecture is a hierarchical architecture based on logical layers. In M.3010 (section 5.1.2) it is stated that the „element management layer manages each network element on an individual or group basis and supports an abstraction of the functions provided by the network element layer.“ The hierarchical managing/managed approach is reflected in the functional architecture.
According to M.3010 (section 2.1.2), the OSF (Operations System Function) processes information „for the purpose of monitoring/coordinating and/or controlling telecommunication functions“ and the NEF (Network Element Function) communicates with the TMN „for the purpose of being monitored and/or controlled.“ The idea at the time, realised in commercial implementations of the standards, was of a hierarchical approach
with the NEF at the bottom and no NE (Network Element) interacting for management purposes on a peer-to-peer basis with another NE. To manage active nodes in the same way is to ignore the rich and diverse functionality that can be supported by active nodes. In particular, the peer-to-peer interaction typical in a distributed software system such as that represented by active networks cannot be easily accommodated in such a hierarchical approach.

Recommendations: First, examine the logical layered architecture from the perspective of future networking technologies and investigate alternative approaches that take into account the possibilities of technological and software advances and that would provide a framework for more flexible and efficient management functionality. Second, distributed OSs should be investigated and incorporated into the standards, and no longer just considered „for further study“. It should be possible to distribute the TMN OS functions and data to a greater extent than is currently allowed for.
3.2 Manager / Agent Paradigm

TMN is based upon the OSI manager/agent model of the CMIP protocol. In M.3010 (section 3.2) it is stated that the manager role „issues management operation directives and receives notifications“ and that the agent role is „to respond to directives issued by a Manager.“ The interactions are simple and the agent can only respond to the manager’s commands. Apart from issuing notifications, the agent cannot initiate its own interactions and it cannot interact with other agents. The roles are clearly demarcated and, although the possibility of management processes taking on both manager and agent roles during a single association is acknowledged in M.3010 (section 3.2), it is pointed out that such a case „requires further study.“ Synchronisation issues and concurrent bidirectional requests were also left for further study, which resulted in an approach where roles are assigned to management processes within a given context and remain fixed for that association, with no possibility of concurrent interactions.

This is a shortcoming for more complex management functionality, where it could be advantageous for management processes to be able to take on both roles during an association and where negotiation, and not just command-and-response interactions, is required. This feature could be useful in complex networks of the conventional type, for example over an X interface for peer-to-peer inter-domain management, as well as in the management of active networks, where active nodes could take on the role of both manager and agent in a management interaction. The roles of manager and agent need to be more dynamic and may have no intrinsic significance in future networking technologies, where well-defined manager and agent roles for an entire association could restrict the management processes in carrying out their tasks.
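The fixed-role interaction pattern criticised above can be made concrete with a minimal sketch. This models only the role assignment, not CMIP itself; all class names are invented for illustration.

```python
# Minimal sketch of the fixed manager/agent roles described above
# (an illustration of the interaction pattern, not of the CMIP protocol).

class Agent:
    def __init__(self, mib):
        self.mib = mib
    def respond(self, directive):
        # The agent may only respond to directives; it cannot issue
        # directives of its own or interact with other agents.
        op, attr = directive
        if op == "GET":
            return self.mib.get(attr)
        raise NotImplementedError(op)

class Manager:
    def issue(self, association, directive):
        # Only the manager side initiates request/response interactions.
        return association.agent.respond(directive)

class Association:
    """Roles are assigned at association setup and remain fixed."""
    def __init__(self, manager, agent):
        self.manager, self.agent = manager, agent

assoc = Association(Manager(), Agent({"operationalState": "enabled"}))
state = assoc.manager.issue(assoc, ("GET", "operationalState"))
```

The `Agent` class has no method for initiating a request, which is exactly the structural restriction that a balanced protocol, allowing both roles within one association, would remove.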
Recommendations: First, investigate the manager/agent interaction of CMIP in the light of the active network technology paradigm and propose alternatives based on different interaction models and cooperative management solutions. The idea of centralising intelligence in a manager which initiates all request/response interactions with agents is no longer always appropriate and to maintain such centralised control could hamper the future effectiveness of conventional management systems. Second,
in connection with this, the possibility of a balanced CMIP enabling both manager and agent roles to be adopted during a single association should be examined. This would remove the restrictions currently being experienced in limiting a management process to adopting only one of the two roles in an association.
3.3 Management Information Model

TMN uses the OSI management information modelling concepts, including the structure of management information and GDMO (Guidelines for the Definition of Managed Objects). MOs (Managed Objects) are specified according to this model using static definitions that are fixed at compile time. M.3020 (section 3.3.13) states that a „management information schema specifies the information model of a managed system as seen over a particular interface by a particular managing application or system. The information model contains all the object classes that can and will be provided by that managed system to the managing application or system. In particular, it defines the naming structure for those object classes within the managed system. The management information schema defines all possible communication of information between the managing application or system and the managed system.“

This represents an approach to management information modelling based on MIBs that are expected to exist for some time without requiring modification. Managed object definitions to manage networks and network elements have therefore been standardised with the intention of being valid for years, whereas an active node can itself make changes to the MIB. The operations that an MO supports are defined in the specification for the managed resource and no extensions or modifications are possible. When changes occur they are provided in a new version of software that supersedes the previous version in a regulated upgrade. All the management information must be in a schema; the bounds of the information are fixed. An MO definition contains a given set of attributes and actions and cannot take advantage of developments requiring dynamic changes to MO specifications while the system is running. Conditional packages are available as run-time features, but they must already exist at compile time; further conditional packages have to wait for a new compilation.
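The compile-time fixity described above can be sketched as follows. This is illustrative only (GDMO is a specification language, not Python, and the attribute names are invented): the attribute set of a managed object class is frozen in its definition, so an attribute that an active node grows at run time has no slot in the schema.

```python
# Sketch of a managed object whose schema is fixed at class-definition
# ('compile') time, as in the static MIB approach discussed above.

class ManagedObject:
    """MO with an attribute set frozen in its definition."""
    SCHEMA = frozenset({"administrativeState", "operationalState"})

    def __init__(self):
        self.attributes = {a: None for a in self.SCHEMA}

    def set(self, name, value):
        if name not in self.SCHEMA:
            # Dynamic extension is rejected: the schema cannot change
            # while the system is running.
            raise KeyError(f"{name} is not in the information model")
        self.attributes[name] = value

mo = ManagedObject()
mo.set("administrativeState", "unlocked")
try:
    mo.set("activeCodeVersion", 2)   # attribute introduced by an active node
    extension_accepted = True
except KeyError:
    extension_accepted = False
```

The rejected `set` call is the Python analogue of an active node presenting management information for which no GDMO definition was compiled into the agent.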
Work on on-line extensions to MIBs, which would also be more appropriate for managing active networking technologies, is currently being undertaken but has not yet been included in the TMN standards. The static approach to information modelling is a rather inflexible paradigm for future networking technologies as, for example, the attributes and actions of an active node can be extended dynamically as active nodes incorporate new actions, behaviour, and attributes. This can be enabled with a distributed and extensible software environment which provides the user with a higher-level abstraction of the proprietary management interface to resources. Active nodes can change their functionality: they can load and execute programs that change their behaviour and they can recognise different protocols dynamically. Current network management is not based on concepts that support such characteristics, because the state of the network is reflected by the information that is obtainable from the proprietary management interface and,
in effect, this idea bypasses the software environment that provides a higher-level abstraction of the proprietary management interface.

When the TMN standards were developed, the emphasis in networking technologies was more on the hardware and equipment comprising the network and less on the networking software. Over the course of time the software supporting networking technologies has become more significant, with more networking functionality being executed in software. Such software-based equipment, which can also support more extensive management functionality, is not really accommodated in the standards. When managing software, different assumptions can apply, as software can be changed dynamically during the lifetime of the resource. A different approach to the functionality of the network and its extensibility can be adopted in a way not possible with a hardware-oriented approach. The appropriateness of management information models will clearly be challenged in such an environment.

If active network elements and active networks are to be comprehensively monitored and controlled, a different understanding of a network element and network may be needed. Active networks represent a distributed system with a significant software component and so need to be managed like a distributed system, i.e., there is additional software to manage. This represents a new challenge to conventional network and network element management, which has tended to concentrate on managing hardware. TMN needs to encompass the management of networking software as an integral part of network and network element management.

Recommendations: First, investigate how dynamic specifications can be incorporated into management information modelling and how shared management knowledge can take account of dynamic updates. Second, review the definition of a network and network element in the light of the greater significance of software at the network and network element layers.
4 Active Networking Technology Implications for TMN
This section undertakes a closer examination of the management of active networks, investigating in more detail TMN features compared with corresponding active networking features and looking at the implications of active networking technologies for TMN information modelling.
4.1 On-the-Fly Network Resource and Networking Service Creation

Active networking offers extensibility for the provisioning and management of virtual networks. Packets carrying new management services can enhance management systems operating within NEs and create network resources on the fly by partitioning existing resources. Examples of network resources are routing tables, buffer space and bandwidth. Examples of network services are connection admission control, congestion notification and control, resource management, and traffic shaping. In order to highlight the need for and applicability of rapid creation of network resources and services, an example of the dynamic provisioning of virtual networks over a
single ATM transport network and their efficient management is given here. Virtual networks as described in [18] are based on the programmability of networks. Multiple virtual networks within a single physical network may also be needed to support the needs of different users and applications. To meet these needs, active control of network resources and services is used to provide virtual networks and to guarantee an optimum level of QoS. Each of the virtual networks is a set of resources allocated to a type of network traffic and can be controlled by a control system tailored to the specific needs of applications, allowing application-specific customisation of network control. Besides allocating resources, restrictions may also be imposed on the network traffic operated by a virtual network. When the provisioning of a virtual network takes place, efficient management will be required. Active networking allows the virtual network provider to modify the parameters of network resources and services according to dynamically changing user and transport protocol requirements, to associate names with resources and to perform accounting management.

Another application of active networks, to multi-transport-protocol stacks, is given here. Current trends in transport networks suggest that multimedia applications impose stricter QoS requirements than data-oriented applications and require different types of transport protocol compared with a single-medium application. The programmable transport architecture described in [19] addresses this need. It allows applications to choose from and bind to many different protocol stacks according to the applications’ transport requirements. The architecture includes a control and management front-end that carries out dynamic resource provisioning, accounting and QoS control.
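The stack-selection idea behind such a programmable transport architecture can be sketched as follows. The stack names and requirement keys are invented for illustration; a real front-end would negotiate with signalling and resource-provisioning components rather than scan a list.

```python
# Sketch of binding an application to a protocol stack according to its
# transport requirements, as in the programmable transport idea above.

STACKS = [
    {"name": "reliable-bulk", "reliable": True,  "max_jitter_ms": None},
    {"name": "low-jitter-av", "reliable": False, "max_jitter_ms": 10},
]

def bind(requirements):
    """Return the first stack satisfying the application's requirements."""
    for stack in STACKS:
        if requirements.get("reliable") and not stack["reliable"]:
            continue                      # application needs reliability
        jitter = requirements.get("max_jitter_ms")
        if jitter is not None and (stack["max_jitter_ms"] is None
                                   or stack["max_jitter_ms"] > jitter):
            continue                      # stack cannot bound jitter enough
        return stack["name"]
    raise LookupError("no stack meets the requirements")

data_app = bind({"reliable": True})        # data-oriented application
video_app = bind({"max_jitter_ms": 20})    # multimedia application
```

The two calls illustrate the point in the text: a data-oriented application and a multimedia application end up bound to different stacks from the same front-end.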
In the examples of virtual networks and multi-transport-protocol stacks, the key idea of network programmability is implemented by middleware (e.g., the front-end in a multi-transport-protocol stack). Middleware is generic and portable enough to support different types of user requirement, but at the same time it cannot be managed as a self-contained software package. It should be managed together with the other networking components, and its management should therefore be incorporated into the network management system. One of the prerequisites for the operation of active networks is the on-the-fly creation of resources and services, and middleware plays a key role in meeting this prerequisite.

On-the-fly creation of resources and services imposes two requirements on TMN. The first requirement concerns network service creation. Using active network technologies, new network services can be dynamically added to the managed network. Viewing this situation from within a TMN environment, if an agent is to manage continually added services, the Q adaptor should also be updated in order to interface with newly created services. In fact, the TMN physical architecture can be implemented in a variety of physical configurations (section 4.1 of M.3010). M.3010 (section 2.1.5) states that „the Q adapter is used to connect as part of TMN those non-TMN entities that are NEF-like and OSF-like“. If non-TMN entities that are NEF-like are connected to a TMN system, Q adaptors are difficult to modify rapidly because they are part of the NEF and updating them can require substantial work on the management information model. In this use of Q adaptors, it should be researched how Q adaptors can allow the NEF to dynamically interface with newly created services. If non-TMN entities that are
OSF-like are connected to a TMN system, Q adaptors may be replaced by new Q adaptors in their entirety, which may not require substantial work. In this use of Q adaptors, it should be researched how Q adaptors can allow the OSF and MF (sections 2.1.1 and 2.1.4 of M.3010) to dynamically interface with newly created services. It should also be researched how this replacement may affect other TMN activities.

The second requirement concerns resource partitioning. New types of resource can be dynamically created (or partitioned) from existing ones. This will require the information model to be changed (new GDMO classes to be created, agents to be recompiled, and so on) to represent the newly created types of resource (sections 2.2.2, 2.2.1.3, and 3 of M.3010). But the current TMN information model does not allow new GDMO classes to be created on the fly.

Recommendation: Substantial operations take place in active network elements, resulting in the rapid creation of new types of network resource and network service. TMN functional components should facilitate the development of management systems that allow operators and administrators to customise a TMN system in order to manage newly created services and resources. Support for customisation should be provided in both the managing system and the managed system. An application should be able to exercise control over dynamically changing network resources and entities providing network services. A managed system (e.g., an agent interacting with a virtual network) should be able to customise event reports according to the unpredictable changes that occur in the managed network elements.
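The resource-partitioning requirement above asks for object classes to be created on the fly. The following sketch shows, in Python terms only, what that would mean: a new managed-object class minted at run time from a description of a freshly partitioned resource. GDMO offers no analogue of this; there, classes are fixed when the agent is compiled. The registry and attribute names are invented.

```python
# Sketch of run-time creation of a managed-object class for a newly
# partitioned resource, in contrast to compile-time-fixed GDMO classes.

REGISTRY = {}   # stands in for the agent's table of known classes

def create_mo_class(name, attributes):
    """Mint a new MO class at run time and register it."""
    cls = type(name, (object,), {
        "ATTRIBUTES": tuple(attributes),
        "__init__": lambda self: setattr(
            self, "values", {a: None for a in attributes}),
    })
    REGISTRY[name] = cls
    return cls

# A new resource type appears: a partition of an existing buffer resource.
BufferPartition = create_mo_class(
    "BufferPartition", ["parentBuffer", "sizeKb", "owner"])
instance = BufferPartition()
instance.values["sizeKb"] = 256
```

Python's built-in `type()` makes this trivial, which is precisely the kind of dynamic class creation the current TMN information model has no mechanism for.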
4.2 TMN Information Modelling for Active Networking

This section looks in more detail at TMN information modelling. It investigates the information modelling aspects of active networking, discussing how network information can be modelled for active networking and showing how this modelling can influence M.3100. Since TMN information modelling is relevant to the managed system side of a TMN system, only the static nature of a managed system, in particular the MIB, is discussed in this section.

Dynamic Changes in Transmission Schemes versus the Static Nature of Network Resource Representation. Active networking architectures, with the help of virtual machines, view networks as a large „seamless“ distributed system. These virtual machines are software modules that provide high-level programmable interfaces to the applications and are able to deploy new protocols dynamically (e.g., the active networking architecture PLANet [20] and the supporting technologies ANTS (Active Network Transport System) [21] and PLAN (Programming Language for Active Networks) [22]). In this large distributed system, the management of individual intermediate network elements is loosely coupled with the end systems, where the management system may execute. The intermediate managed elements can take „minor“ management decisions (e.g., which packet to route, which packet to block, finding the best path, etc.) independently. An active network-based architecture can allow users to dynamically invoke network services (e.g., address resolution and routing) and dynamically change the configuration of routers and switches, thus modifying a given path.
Figure 1 illustrates three different aspects concerning dynamic changes occurring in the transmission scheme within a network element and the static nature of the representation of network element resources. Parts A and B of Figure 1 show the telecommunication functions and resources of a network element and their representations in the NEF. Part C illustrates dynamic changes occurring in an active network element. M.3100 (section 3.5.1) defines the cross-connection MO class as follows: „A point to point cross-connection can be established between: one of CTP (Connection Termination Point) Sink, CTP bi-directional, TTP (Trail Termination Point) Source, TTP bi-directional, or .....“. Once a point-to-point type of cross-connection has been established between two end points of a given type, the type of cross-connection does not change and the cross-connection MO remains in the MIB until explicitly deleted. In an active networking environment, the type of one of the end points used in an already established cross-connection may change in the NE as a result of the allocation of new resources, a change in the transmission scheme or a change in the routing path (see below for an example of multicast). For example, a cross-connection between a CTP sink and a CTP source may change to a cross-connection between the same CTP sink and a new type of GTP (Group Termination Point) (see part B of Figure 1), or a simple cross-connection may change to a multipoint cross-connection. That is, a new type of GTP or a new type of cross-connection is created in the network element, but the MIB has no new type of GTP or cross-connection to represent it. This implies that a change occurred in the telecommunication functions of the network equipment but its representation within the NEF remained unchanged (see part A of Figure 1).

[Figure 1: Dynamic Nature of Switches and its Effect on Information Modelling Fragments at TMN NE. Part A shows the NEF representation of telecom functions and resources: cross-connections between termination points (vpCTP, vpTTP, vcCTP, vcTTP) with connect/disconnect operations. Part B shows the actual telecom functions and resources in the active network element (switch, router), with CTP/TTP sink, source and bi-directional termination points and GTPs linked by a cross-connection (e.g., a virtual path). Part C shows the dynamic changes, occurring outside TMN, in which a cross-connection {CTP sink – CTP source} becomes a multipoint cross-connection {CTP sink – new type of GTP} as new VP/VC resources are created and deleted in the network element.]
The above change should be reflected dynamically in the NEM (see part A of Figure 1). Two parts of the NEM that should be updated by the changes occurring in the network element (see Figures 5, 8 and 18 of M.3100) are the containment tree of the NE MIB and the behaviour of the cross-connection MO that changed. With conventional TMN, the status and configuration of the NEM OSF can be changed, but the „old“ cross-connection MO will have to be deleted and a new one instantiated with a new pair of termination points. However, objects of a new class cannot be dynamically instantiated because of the absence of an object class definition representing the new type of resource in the GDMO specification.

An example of a multicast server illustrating the above-mentioned dynamic change in connection schemes is given here. In multicast, cell replication is done within the network by the network nodes at which a connection splits into two or more branches. In multicast server operation, all end systems wishing to transmit onto a multicast group set up a point-to-point connection with a device called a multicast server. The multicast server receives the cells from the end systems across these point-to-point connections. It then serialises and replicates the cells and retransmits them to multiple end systems across a point-to-multipoint connection. The multicast server can also connect to all end systems of a multicast group across point-to-point bi-directional connections and can replicate the cells before transmission. In another multicast scheme, all end systems of a multicast group connect with each other across point-to-multipoint connections; hence all nodes operate as transmitters and receivers for one another. This scheme needs no multicast server at all. There are therefore at least three multicast schemes and, depending upon the requirements of applications and users, one scheme can be replaced by another.
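The first of these schemes, the multicast server, can be sketched as follows. This is a toy model of cell flow only, not of any ATM signalling interface; the end-system names are invented.

```python
# Sketch of the multicast-server scheme described above: senders reach the
# server over point-to-point connections; the server serialises the cells
# and replicates them to the group over a point-to-multipoint connection.

class MulticastServer:
    def __init__(self, group):
        self.group = list(group)              # receivers on the p-t-mp leg
        self.delivered = {m: [] for m in group}

    def receive(self, cells_by_sender):
        # Serialise: merge the per-sender point-to-point streams into one
        # ordered stream, then replicate it to every group member.
        serialised = [c for sender in sorted(cells_by_sender)
                        for c in cells_by_sender[sender]]
        for member in self.group:
            self.delivered[member].extend(serialised)
        return serialised

server = MulticastServer(["es1", "es2", "es3"])
stream = server.receive({"es2": ["b1"], "es1": ["a1", "a2"]})
```

Every group member receives the same serialised stream, which is the property the server-based scheme buys at the cost of the extra device and registration complexity discussed below.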
In theory, multicast schemes may seem easy to operate, but in practice they are complex and inflexible and make the internetworking of existing protocols with ATM difficult. All existing end systems must register information about a newly joined end system. The complexity of the multicast operations can be greatly reduced, and the advantages of internetworking over ATM fully gained, by combining autoconfiguration with active network technology. Autoconfiguration features can be implemented as a middleware module, utilising the node-resident processing facilities. This will enable more agile control of multicast operation and will greatly facilitate the administration and operation of network nodes. (Also refer to [23] and [24] for an error control scheme in active networks.) If this scenario is viewed from within a TMN system environment, it appears that in order to complete the task of administration and operation, the managing and managed TMN systems should update themselves according to the changes that occurred in the network element and middleware functions. The administrative, operational and availability states of network nodes should be updated to ascertain the normal functioning and readiness of cross-connections.

In summary, management information related to networks in M.3100 is depicted in a tightly tiered structure. The entire body of management information is organised in a tree-shaped structure, with the network layer information at the top and the network element layer information at the bottom. The MIB should be able to change its status and configuration as changes occur in an individual network element. The changes
should also be reflected in the upper layers of the tree structure. In the current form of the TMN functional and information architectures, there is a disparity between the management activities that take place in the managed element and the status and configuration of the MIB. This disparity may not pose a problem from the network layer viewpoint, but problems of naming may surface at the network element layer.

Recommendation: The dynamic nature of active networks raises two questions about the information modelling used by the TMN information architecture. First, how will the naming scheme change as new types of resource are dynamically created? Second, how will consistency and completeness be maintained as new services are dynamically created at a resource? Research is needed into how to dynamically construct the names of new types of resource, how to keep the status of the MIB and the managed network consistent, and how to maintain completeness (i.e., to check whether the services offered by a particular resource are really available). Addressing these issues will make OA&M (Operation, Administration and Maintenance) more flexible and easily scalable for rapidly changing networks.

Dynamically Enhancing Functions of NEs and their Effect on the NLM Viewpoint. Active networks allow the operator to dynamically (module-wise) build up the functions of remote network elements and to enhance their functionality. For example, switchlets can be used to download a specialised software module which implements an algorithm (e.g., a tree-search algorithm) and can enhance the function of an ordinary (or non-active) repeater to that of a self-learning (active) bridge [25]. In an ELAN (Extended LAN), a self-learning bridge can change the logical interconnection between two LANs (or partition the ELAN), thus changing the logical topology of the ELAN.
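The repeater-to-bridge upgrade can be sketched as follows. This is a toy model of the module-download idea, not of the switchlet mechanism of [25]; the node API and the learning logic are invented for illustration.

```python
# Toy sketch of the module-download idea above: an ordinary repeater floods
# every frame to every other port, until a downloaded module upgrades it to
# a self-learning bridge that filters on learned addresses.

class ActiveNode:
    def __init__(self, ports):
        self.ports = ports
        self.forward = self._repeat        # default behaviour: repeat
        self.table = {}                    # address -> port, when learning

    def _repeat(self, frame, in_port):
        return [p for p in self.ports if p != in_port]

    def load_module(self, behaviour):
        """Download a new forwarding behaviour onto the node."""
        self.forward = behaviour.__get__(self)   # bind it as a method

def bridge_forward(self, frame, in_port):
    self.table[frame["src"]] = in_port     # learn where the source lives
    dst_port = self.table.get(frame["dst"])
    if dst_port is not None and dst_port != in_port:
        return [dst_port]                  # filtered, bridge-style
    return self._repeat(frame, in_port)    # unknown destination: flood

node = ActiveNode(ports=[1, 2, 3])
flooded = node.forward({"src": "A", "dst": "B"}, in_port=1)
node.load_module(bridge_forward)           # repeater becomes a bridge
node.forward({"src": "A", "dst": "B"}, in_port=1)   # learns A -> port 1
learned = node.forward({"src": "B", "dst": "A"}, in_port=2)
```

After the module is loaded, the node's externally visible behaviour changes without any restart, which is exactly the kind of run-time functional upgrade the network layer information model would then need to reflect.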
As another example, the functionality of a router can be enhanced to look for a suitable path to route PDUs (Protocol Data Units) over the best available bandwidth when networks become congested [26]. This will change the configuration of the end-to-end connection that passes through the router. These dynamic changes, and many of the same sort in a network element, will have an effect on the following aspects of the network layer information model.

Physical and Logical Information of the Network Object Class. The Network object class (section 3.1 of M.3100) represents the interconnected (logical and physical) telecommunications and management objects capable of exchanging information. These objects may be owned by a specific provider or associated with a specific network service. A network element may be dynamically enhanced to provide a type of service that is different from what it provided during the configuration of the network (or the initialisation of the network layer MIB). In this situation, the Network, ConnectionR1 and TrailR1 object classes should be able to update themselves (e.g., alteration of the containment relationships in the MIB, modification of the transmission function) under the changed configuration of the network. However, TMN does not allow dynamic changes because it has „traditionally been based on a static model, with a fixed location of function, a high degree of central intelligence, and single protocol“ [27].

Recommendation: A network architecture built on active network technologies will be able to change its composition and configuration according to user demand. In
Active Network Challenges to TMN
order to manage this situation, research is needed to find the means of reconstructing relationships among network elements at the network layer. Relationships among network elements should change automatically as the functionality of a network element that is interconnected to a network is upgraded (e.g., a repeater is enhanced to function like a bridge). Information on the topological interconnection and configuration of network elements should be able to reflect the dynamically changing network status. Object classes may also emit notifications as a result of a change in the configuration; these new notifications should therefore also be defined in the current version of the TMN information model.
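To make the recommendation concrete, the following sketch (our own illustration; the class and capability names are hypothetical, not drawn from M.3100) shows a network-layer MIB that records a dynamically downloaded function and emits a configuration-change notification, so that relationships need not remain as configured at MIB initialisation:

```python
# Hypothetical sketch: a network-layer MIB that records dynamically
# downloaded functions and notifies managers of configuration changes,
# e.g. a repeater enhanced into a self-learning bridge.

class NetworkLayerMIB:
    def __init__(self):
        self.elements = {}        # element name -> set of capabilities
        self.links = {}           # element name -> set of neighbour names
        self.notifications = []   # emitted configuration-change events

    def register(self, name, capabilities, neighbours=()):
        self.elements[name] = set(capabilities)
        self.links[name] = set(neighbours)

    def upgrade(self, name, new_capability):
        """Record a dynamically downloaded function and notify managers."""
        self.elements[name].add(new_capability)
        # Relationships are recomputed on change, not frozen at
        # initialisation as in a purely static model.
        self.notifications.append(("configurationChange", name, new_capability))

mib = NetworkLayerMIB()
mib.register("rep1", {"repeater"}, neighbours={"lanA", "lanB"})
mib.upgrade("rep1", "self-learning-bridge")
```

The point of the sketch is only that the upgrade both updates the element's recorded capabilities and produces a notification a manager can act on.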
5 Conclusions
This paper has investigated the impact of managing active networking technologies on TMN. Combined with advances in distributed software technologies, which they can leverage, active network technologies will continue to evolve along with other network technologies that will emerge, no doubt becoming cheaper, easier to use and more powerful. The paper has attempted to pose questions that may need to be answered in the face of such developments. It is intended to provide some suggestions about possible future technologies and their impact on TMN, in order to stimulate thought about TMN and its evolution at a time of constant and far-reaching change in the telecommunications industry. The main conclusion of the paper is that certain management and communication aspects of TMN (functional, physical, and informational architecture) are "static" when used for the management of active networks, which provide more agile and dynamic functionality. We have considered only these "static TMN" and "dynamic active networking" aspects and made recommendations on how TMN could be evolved to overcome the problems associated with the paradigms of today. A new paradigm for management that can effectively manage active networking technologies and their evolution is required. In particular, the use of distributed object-oriented technologies, middleware, object-oriented distributed processing environments, and also agent technologies needs to be considered for future TMN evolution. In other words, the consideration of how to manage future networking technologies, such as active networks, can act as a contributing trigger in the trend towards adopting open distributed environments for telecommunications management.
Acknowledgements This work has been carried out within EURESCOM Project P812 and the authors wish to thank their colleagues in this project for their constructive discussion of the ideas presented here.
Bharat Bhushan and Jane Hall
References
1. J. Biswas et al., "The IEEE P1520 Standards Initiative for Programmable Network Interfaces", IEEE Communications, 36 (10), October 1998, pp. 64-70.
2. K.L. Calvert et al., "Directions in Active Networks", IEEE Communications, 36 (10), October 1998, pp. 72-78.
3. D.L. Tennenhouse, "A Survey of Active Network Research", IEEE Communications, 35 (1), January 1997, pp. 80-86.
4. Principles for a telecommunications management network, ITU-T Recommendation M.3010, 1996.
5. TMN interface specification methodology, ITU-T Recommendation M.3020, 1995.
6. Generic network information model, ITU-T Recommendation M.3100, 1995.
7. TMN management services and telecommunications managed areas: overview, ITU-T Recommendation M.3200, 1997.
8. TMN management functions, ITU-T Recommendation M.3400, 1997.
9. D.S. Alexander, B. Braden, C.A. Gunter, A.W. Jackson, A.D. Keromytis, G.J. Minden, and D. Wetherall, Active Network Encapsulation Protocol (ANEP), Active Networks Group Request for Comments, Status: DRAFT, July 1997.
10. B. Schwartz et al., "Smart Packets for Active Networks", January 1998. http://www.bbn.com
11. A.A. Lazar, K.S. Lim, and F. Marconcini, "Realizing a Foundation for Programmability of ATM Networks with the Binding Architecture", IEEE Journal on Selected Areas in Communications, 14 (7), September 1996, pp. 1214-1247.
12. C.M. Adam, A.A. Lazar, and M. Nandikesan, "QoS Extensions to GSMP", COMET Group, Department of Electrical Engineering and Center for Telecommunications Research, Columbia University, New York, USA, Technical Report 471-97-05, April 1997. http://comet.ctr.columbia.edu/xbind/qGSMP
13. Y. Yemini and S. da Silva, "Towards Programmable Networks", IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM '96), L'Aquila, Italy, October 1996.
14. P. Chandra et al., "Darwin: Customizable Resource Management for Value-Added Network Services", Proceedings Sixth IEEE International Conference on Network Protocols (ICNP'98), Austin, October 1998.
15. E. Takahashi et al., "A Programming Interface for Network Resource Management", Proceedings OPENARCH'99, New York, March 1999.
16. Dynamic Integrated Resource Management (DIRM) - BBN Distributed Systems Project funded by DARPA/ITO. http://www.dist-systems.bbn.com/projects/DIRM/
17. J. Biswas et al., Application Programming Interfaces for Networks, 1998. http://www.iss.nus.sg/IEEEPIN
18. S. Rooney et al., "The Tempest: A Framework for Safe, Resource-Assured, Programmable Networks", IEEE Communications, 36 (10), October 1998, pp. 42-53.
19. J.-F. Huard and A.A. Lazar, "A Programmable Transport Architecture with QoS Guarantees", IEEE Communications, 36 (10), October 1998, pp. 54-62.
20. M. Hicks et al., "PLANet: An Active Internetwork", Proceedings IEEE INFOCOM '99, New York, 1999. http://www.cis.upenn.edu/~switchware/
21. D.J. Wetherall, J.V. Guttag, and D.L. Tennenhouse, "ANTS: A Toolkit for Building and Dynamically Deploying Network Protocols", Proceedings IEEE OPENARCH'98, San Francisco, April 1998.
22. M. Hicks et al., "PLAN: A Packet Language for Active Networks", Proceedings of the International Conference on Functional Programming (ICFP '98). http://www.cis.upenn.edu/~switchware/
23. G. Parulkar et al., "An Error Control Scheme for Large-Scale Multicast Applications", Proceedings IEEE INFOCOM '98, April 1998, San Francisco.
24. U. Legedza, D. Wetherall, and J. Guttag, "Improving the Performance of Distributed Applications Using Active Networks", Proceedings IEEE INFOCOM '98, April 1998, San Francisco, pp. 590-599.
25. D.S. Alexander et al., "Active Bridging", SIGCOMM '97, Cannes, September 1997, Computer Communication Review, 27 (4), October 1997, pp. 101-111.
26. S. Bhattacharjee, K. Calvert, and E.W. Zegura, "An Architecture for Active Networking", High Performance Networking (HPN '97), White Plains, NY, April 1997. http://www.cc.gatech.edu/projects/canes/pubs.html
27. A. Manley and C. Thomas, "Evolution of TMN Network Object Models for Broadband Management", IEEE Communications, 35 (10), October 1997, pp. 60-65.
Survivability of Active Networking Services
Amit Kulkarni, Gary Minden, Victor Frost, and Joseph Evans
Department of Electrical Engineering and Computer Science
University of Kansas, Lawrence, KS 66045
{kulkarn,gminden,frost,evans}@ittc.ukans.edu
Abstract. Active networking enables the rapid creation and deployment of innovative services in the network. This paper describes an architecture to ensure survivability of services in an active network through the dynamic reconfiguration of service components. We enhance the primary-backup protocol used in traditional distributed systems with active networking features that enable programmable selection of the service location and dynamic reconfiguration of the system if the primary service provider fails.
1 Introduction
One of the goals of active networking is to enable the development and deployment of new, secure and robust services and protocols in the network. The nodes of an active network are programmable, enabling creative, application-specific protocols to be installed in the network. SmartPackets carry code for the protocols to the active nodes, which provide a platform for their execution. Examples of application-specific protocols are audio bridging [1], sensor fusion applications [2], booster protocols [3] and services for improving application performance in wired/wireless networks like active filtering and active merging [4]. In traditional networks, where the primary network services, e.g. routing and addressing, are well-known and fixed, a number of custom protocols and approaches [5,6] have been proposed and implemented to ensure survivability of these services during network outages. But an active network enables new services to be deployed at very short notice in the network. For example, in the MAGIC-II project [7], active networking is used to implement an active merging service that provides application-specific merging of client requests to reduce the bandwidth demand of terrain visualization applications like TerraVision [8] over wireless links. The active merging service is deployed into the network at application startup and deinstalled when the application terminates. Making services survivable in a dynamic environment like an active network using traditional techniques would require
This research is partially funded by the Defense Advanced Research Agency (DARPA) under contract F19628-95-C-0215.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 299-306, 1999. Springer-Verlag Berlin Heidelberg 1999
considerable effort on the part of the network administrator. Survivability issues in active networks are being tackled by the Survivable Active Networks project [9] and the NESTOR project [10]. The Survivable Active Networks project focuses on techniques to prevent, detect, isolate and recover from various threats and malicious attacks on active network elements. NESTOR attempts to develop technologies for self-configuring and self-managing network systems that can withstand failure. In contrast, we focus on providing a survivable infrastructure for services deployed in the active network. Survivability is achieved in any system through the replication of components using either the state machine approach or the primary-backup approach [11]. The state machine approach replicates the service state at all servers and presents client requests to non-faulty servers. In the more popular primary-backup approach, one server is designated as the primary and all other servers are backups. Clients make requests to the primary server only. If the primary fails, one of the backups takes over, i.e. a failover occurs. These approaches require extensive manual coordination and intervention, e.g. determining locations of the primary and backup servers, configuring setup files, and resetting the configuration during recovery from failure. In this paper, we modify the primary-backup protocol to demonstrate how a conventional protocol can be extended in an active networking environment with novel features such as automatic setup, programmable selection of service location and dynamic reconfiguration after system failure. Automatic setup is achieved by allowing the primary to install its own backup server. The primary sends ferret packets that execute a service-specific algorithm at the active nodes they visit to determine if the node can host the backup server. The primary chooses a backup from the set of hosts identified by the ferret packets. 
This enables dynamic selection of the best available location for the backup at the current time instead of statically chosen locations. If the primary fails, the system dynamically reconfigures itself with the backup taking over as the new primary server and selecting its own backup server. If the backup fails, the primary automatically selects a new backup. This process can continue ad infinitum whenever there is failure of the primary or backup server.
2 Protocol Operation
In the description of the fault-tolerant protocol below, the server is an in-network proxy implementing a specific service. Clients are assumed to be unaware of the location of the service and hence the protocol also implements a discovery phase. The assumption of location transparency is particularly relevant to active networking because services can be deployed dynamically in an active network. The protocol also makes the following assumptions:
1. The system is 1-fault tolerant, i.e. the probability of both the primary and backup failing in some interval of interest is very small.
2. A crash implies that either the server fails or the node hosting the server fails.
3. Link failures in the network do not partition the network.
4. SmartPackets transmitted over a link are not lost, duplicated or corrupted.
5. SmartPackets sent over a link are received in the proper sequence in a finite time.
6. SmartPackets take a maximum round trip time δ to traverse a distance of interest (n) and back, measured in hops from the sending node.
7. The underlying routing protocol guarantees that SmartPackets always use the shortest route to their destination.
Maintaining adequate link redundancy can satisfy assumption 3. Assumptions 4 and 5 are guaranteed by an underlying reliability protocol. In the following sections, we describe the different phases of the protocol.
2.1 Selection of Backup
In the setup phase, the network administrator injects the server as a SmartPacket into the network. The SmartPacket routes itself to a destination chosen by the administrator, where it starts executing as the primary server. The primary server searches for the location of its backup by flooding its immediate neighborhood with ferret packets that contain a service-specific algorithm to determine the suitability of a neighboring node to host the backup server. A ferret packet applies the service-specific algorithm at every node it visits to evaluate whether the node can host the backup server. Examples of the selection criterion are:
1. Distance from the primary, e.g. the node closest to the primary is chosen as backup.
2. Buffer space (memory) or processing power.
3. Special node functionality, such as its function as a gateway.
4. Number of adjoining active nodes (i.e. degree of connectivity).
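The criteria above can be combined into a single service-specific scoring function; the sketch below is our own illustration (the attribute names, weights and memory threshold are assumptions, not from the paper) of the kind of algorithm a ferret packet might carry and apply at each node it visits:

```python
# Illustrative bid-scoring function a ferret packet might apply at
# each visited node; all names and weights are hypothetical.

def score_node(node, primary_distance_hops):
    """Higher score = better backup candidate; -1 means unsuitable."""
    if not node["active"] or node["free_memory"] < 64:    # assumed floor (kB)
        return -1
    score = 0
    score += max(0, 10 - primary_distance_hops)   # prefer nodes near primary
    score += node["degree"]                       # degree of connectivity
    if node["is_gateway"]:                        # special node functionality
        score += 5
    return score

candidates = {
    "n1": {"active": True, "free_memory": 512, "degree": 3, "is_gateway": False},
    "n2": {"active": True, "free_memory": 256, "degree": 2, "is_gateway": True},
    "n3": {"active": True, "free_memory": 32,  "degree": 5, "is_gateway": False},
}
bids = {name: score_node(node, hops)
        for (name, node), hops in zip(candidates.items(), [1, 2, 1])}
# The primary would pick the best non-negative bid.
best = max((n for n in bids if bids[n] >= 0), key=bids.get, default=None)
```

In the protocol, each candidate would send its score back to the primary as a bid rather than computing the winner locally.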
Fig. 1. Primary sends ferret packets to solicit bids
Each ferret packet contains a replica of the server code so that a fully functional server can be created at the backup location once it is identified. Flooding is controlled by restricting ferret packets to visit at most n nodes, by marking nodes as VISITED, and by carrying lists of visited nodes. Candidate backup servers instantiated at neighboring nodes send in bids to the primary (see Fig. 1). The primary collects bids using two "small state" locations, BidStatus and BidDrop, which are named caches at the active nodes accessible to SmartPackets belonging to the same service
application. Bid packets deposit bids in BidDrop if BidStatus has the value OPEN and return to the originating node to indicate a BidConfirm status to the candidate backup servers. Bid packets arriving after the bidding closes return with a BIDCLOSED status. If the BidStatus location is not available, the Bid packet returns with a NOBIDLOCATION status, indicating primary failure. After a timeout interval δ, the primary makes its selection from the available bids and sends a Confirm packet to the selected backup location. Reject packets are sent to all other candidate backup servers, which terminate upon receipt of the packet. If the primary fails during the selection process, candidate backup servers receive a NOBIDLOCATION status. Candidate backup servers then elect a primary from amongst themselves by implementing a bid-flooding protocol in which each backup server floods its bid message in its n-hop neighborhood. Each bid message deposits its bid at the location of a candidate backup server. After a time interval δ, each backup server checks the bids it has received. If its own bid is not the best bid, it terminates itself. Thus after time δ, the backup server with the best bid becomes the primary. There can be multiple primaries in the system at this point, but they are separated by at least n hops and therefore serve different (and maybe overlapping) domains.
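The bid-flooding election can be modelled in a few lines. The sketch below is our own abstraction (packet transport and timing are elided): each candidate survives only if its own bid is the best among the bids that reached it within δ, which also shows how disjoint neighborhoods can leave more than one primary standing:

```python
# Minimal model of the bid-flooding election; names are ours.

def elect(own_bids, bids_seen_by):
    """Return the candidates whose own bid is the best they have seen.

    own_bids[c]      = candidate c's bid value
    bids_seen_by[c]  = bids that reached c within the interval δ
    (ties are broken in favour of self in this toy model)
    """
    return sorted(c for c, seen in bids_seen_by.items()
                  if own_bids[c] >= max(seen))

own_bids = {"a": 7, "b": 9, "c": 4}

# Overlapping neighborhoods: every candidate sees every bid,
# so exactly one primary survives.
everyone = {c: list(own_bids.values()) for c in own_bids}
sole = elect(own_bids, everyone)

# Separated neighborhoods: "c" never sees the better bids, so two
# primaries survive, each serving its own domain.
disjoint = {"a": [7, 9], "b": [7, 9], "c": [4]}
split = elect(own_bids, disjoint)
```

This mirrors the text's observation that multiple primaries can coexist, provided they are at least n hops apart.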
2.2 Normal Operation Since clients do not know a priori the location of the primary and backup servers, the primary server advertises its presence using beacon packets broadcast periodically with an interval τ. Beacon packets store information about the location of the primary server, the distance in hops to the server, the location of the backup server and the time when the information will expire, in the cache of the active nodes they visit. Beacon packets are restricted to visit at most n nodes. A client request is a SmartPacket that routes itself through the active network seeking information about the primary server. When it reaches an active node that has valid information in its cache, it obtains the location of the primary server and routes itself to that location. If the beacon information has expired, the request packet assumes that the primary has failed and routes itself to the location of the backup. The primary server also sends state update packets to the backup when new requests arrive and when pending requests are satisfied. The state update packets return to the primary with a SUCCESS status if they are able to update the backup’s state. Beacon packets are programmed to check the status of the backup (which lies within n hops from the primary) and return with its status if they reach the backup’s location. The backup uses the beacon packets as indication that the primary is functional. The backup sends a PrimaryTest packet to the primary if it does not receive a packet from the primary for time τ. The PrimaryTest packet returns to the backup with a FAILED status if the primary has failed. The backup then takes over as the new primary. The time to recover from failure is thus not more than τ + δ. Similarly, if the primary receives a FAILED status from a packet it sent, it begins selection of a new backup in time not exceeding τ + δ. When failover occurs, the backup sends out ferret packets as in the setup phase to select its own backup. 
It then starts sending beacon packets to the neighboring nodes in its n-hop neighborhood. The beacon packets overwrite cache information at the nodes, enabling future client requests to be routed to the new primary.
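The beacon-cache lookup performed by a client request packet can be sketched as follows; this is our own minimal model (field names and the return-None convention are assumptions):

```python
# Toy model of the beacon cache consulted by client request packets:
# a valid entry routes the request to the primary; an expired entry
# makes the request assume failover and route to the backup.

def route_request(cache_entry, now):
    """Return the address a client request should route itself to."""
    if cache_entry is None:
        return None                       # no beacon info: keep searching
    if now <= cache_entry["expires"]:
        return cache_entry["primary"]     # beacon info still valid
    return cache_entry["backup"]          # expired: assume primary failed

entry = {"primary": "node7", "backup": "node3", "expires": 100}
primary_addr = route_request(entry, now=90)    # beacon still valid
backup_addr = route_request(entry, now=150)    # beacon expired
```

After failover, the new primary's beacons overwrite such entries, so subsequent lookups return the new primary's address.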
2.3 Response Times The failover time for this protocol is τ + δ and the time taken by a primary to detect backup failure is τ + δ, as described in section 2.2. The reconfiguration time for the system is the time taken by a primary to select and appoint a backup server. This is the sum of the time taken by the ferret packets to search and locate suitable backup locations, and the time taken by the Confirm message to reach the selected backup. Since the maximum round-trip time over n hops is δ, the maximum reconfiguration time is δ+δ=2δ. If the primary fails during the backup selection process, the existing candidate backups detect failure after an interval δ when they send PrimaryTest packets, which take a maximum time δ to return with a FAILED status. The bid flooding protocol takes a maximum time δ to determine a new primary, which gives the maximum total response time as δ+δ+δ=3δ. The response times for the various scenarios are summarized in Table 1. Table 1. Response times
Event                                                                     Response time
Maximum time taken by backup to become primary after primary failure      τ + δ
Maximum time taken by primary to detect backup failure                    τ + δ
Maximum time taken by primary to select and appoint backup                2δ
Maximum time taken to elect new primary if primary fails
  during backup selection process                                         3δ
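The bounds in Table 1 follow directly from τ (the beacon interval) and δ (the maximum n-hop round-trip time); a small helper of our own makes the arithmetic explicit:

```python
# Worked check of the response-time bounds from section 2.3;
# the function and key names are ours.

def response_bounds(tau, delta):
    return {
        "failover": tau + delta,              # backup becomes primary
        "detect_backup_failure": tau + delta,
        "reconfiguration": 2 * delta,         # ferret search + Confirm
        "election": 3 * delta,                # detect + flood bids + decide
    }

bounds = response_bounds(tau=5.0, delta=2.0)
```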
3 Proofs of Service Properties
Since active networks permit deployment of user-supplied protocols and services, it is necessary to make some formal statements about their properties. In this section, we identify a few properties possessed by this protocol and attempt to prove their validity. The next step is to transform these properties into a formal description to enable mechanized checking, which is beyond the scope of this paper.
Property 1: The primary chooses exactly one backup after the selection process.
Proof: Bid packets from candidate backups arriving after the bidding closes return with a BIDCLOSED status, causing those backup candidates to self-terminate. The primary makes its selection from the available bids and sends a Confirm packet to only one candidate backup. All candidates receiving the Reject packet terminate.
Property 2: At most one server acts as the primary in its n-hop neighborhood.
Proof: This implies that two servers within n hops of each other cannot both be primary servers. During normal operation, the backup does not change its status until a PrimaryTest packet sent to the primary returns with a FAILED status. If the primary dies during the backup selection process, it is possible for two primaries to exist. However, the bid-flooding protocol ensures that only one primary exists in each n-hop neighborhood.
Property 3: The protocol is deadlock-free.
Proof: A deadlock occurs if the primary and the backup are both waiting for messages from the other. This cannot happen because packets sent by one to the other always
return with a valid status. The backup detects failure and takes over if no beacon message arrives within time τ and a PrimaryTest message returns, after a time of at most δ, with a FAILED status. The primary detects failure and selects a new backup if the latest state update or beacon message returns with a FAILED status.
Property 4: A client request always attempts to locate an active primary server.
Proof: When a client request reaches an active node containing beacon information, it retrieves information about the location of the primary server and tests its validity. We consider the proof on a case-by-case basis:
Case I: The information is valid but the primary has failed since the last update. The request traversing towards the primary realizes that the primary has failed, either because there is no route to the primary or because the cache set up to deposit the request at the primary location does not exist. The request then retrieves the location of the backup server from the beacon cache and re-directs itself towards the backup.
Case II: The information is invalid and the primary has failed. If the information is invalid, the request retrieves the location of the backup server from the beacon cache and travels towards the backup.
Case III: The request reaches an active node that lies in the overlap of the neighborhoods of two primary servers.
Fig. 3. Overlapping Domains
This can occur if the primary fails before selecting a backup and there are two new primaries that are more than n hops apart but share some active nodes (see Fig. 3). Assuming that primary A deposits its beacon information at node 1 and keeps it valid, node 1 will be part of primary A's neighborhood. Similarly, node 2 can be part of primary B's neighborhood. The request flip-flops between the two servers as it travels from node 1 to node 2 until the routing takes it out of the region and it eventually makes progress towards only one of the two servers. Another scenario is when the routing is such that the next hop from node 1 towards primary A is node 2 and vice versa. The request would then bounce between the two nodes infinitely. This is impossible because if one of the nodes, say node 1, is closer to primary A than to primary B, then it will have been captured by primary A. If node 2 is on the route from node 1 to primary A, then that implies node 2 is closer to primary A than to primary B, and yet primary B captures node 2. This contradicts assumption (7), which guarantees that only shortest routes are followed.
4 Summary
In this paper, we presented an architecture for ensuring survivability of active networking services that reside inside the network. Our goal was to demonstrate that traditional survivability protocols such as the primary-backup protocol can be extended in an active network with features such as automatic setup of services, programmable selection of the backup location and support for dynamic reconfiguration in the event of system failure. We modified the protocol to support the above features through the use of code replication and distribution, and through the use of ferret packets. Additionally, we proved some interesting properties of the service to serve as a basis for verifying the algorithm in a formal specification model. Correctness of the specification can then be checked mechanically to ensure the correctness of the services deployed in active networks.
References
1. Legedza, U., Wetherall, D., and Guttag, J.: "Improving the performance of distributed applications using active networks," Proc. of IEEE INFOCOM, San Francisco, April 1998.
2. Yeadon, N.: Quality of Service for Multimedia Communications, Ph.D. Thesis, Lancaster Univ., May 1996.
3. Bakin, D., Marcus, W., McAuley, A., and Raleigh, T.: "An FEC booster for UDP applications over terrestrial and satellite wireless networks," Intl. Satellite Mobile Conference, Pasadena, CA, June 19, 1997.
4. Kulkarni, A. and Minden, G.: "Active networking services for wired/wireless networks," Proc. of IEEE INFOCOM, New York, 1999.
5. Garcia-Luna-Aceves, J. and Murthy, S.: "A loop-free path-finding algorithm: specification, verification and complexity," Proc. of IEEE INFOCOM, Boston, MA, 1995.
6. Garcia-Luna-Aceves, J.: "A fail-safe routing algorithm for multihop packet radio networks," Proc. of IEEE INFOCOM, Miami, Florida, 1986.
7. Frost, V., Minden, G., Evans, J. and Niehaus, D.: MAGIC-II - A large scale internetwork supporting high speed distributed storage, http://www.ukans.magic.net.
8. Leclerc, Y. and Reddy, M.: "TerraVision II: using VRML to browse the world," in Data Visualization, St. Louis, MO, October 1997.
9. Sekar, R.: Survivable Active Networks, Telcordia Technologies, Iowa State University and AT&T Research, http://rcssgi.cs.iastate.edu/seclab/projects/survivable.htm.
10. Yemini, Y., Konstantinou, A., and Florissi, D.: NESTOR: Network Self Management and Organization, http://www.cs.columbia.edu/dcc/nestor.
11. Mullender, S.: Distributed Systems, 2nd Edition, Addison-Wesley, 1993.
5
Appendix: Program Pseudo-Code

proc ProxyServer (S:State, IsPrmy:boolean);
  location: Address;
  backupServ: Address;
  BidStatus: SmallState;
  BidDrop: SmallState;
  routeTo(location);

CodeForBackup:
  if (IsPrmy == false) {
    prepare Bid;
    send(Bid, SourceAddress);
    waitfor(Bid);
    if (status == BIDCONFIRM) {
      try {
        waitfor(Confirm or Reject);
        if (Reject) exit;
      } catch (TimeoutException) {
        // send PrimaryTest packet to
        // check if primary is alive
        send PrimaryTest packet;
        waitfor(PrimaryTest);
        if (status == FAILED) {
          for all interfaces {
            send(BidPkt, *); }
          wait(TimeOutInterval);
          Winner = analyzeBids();
          if (Winner != self) exit;
          else IsPrmy = true; }
      }
    } elseif (status == BIDCLOSED) {
      exit;
    } elseif (status == NOBIDLOCATION) {
      // elect new primary
      for all interfaces {
        send(BidPkt, *); }
      wait(TimeOutInterval);
      Winner = analyzeBids();
      if (Winner != self) exit;
      else IsPrmy = true; }
  }
  while (IsPrmy == false) {
    try {
      waitfor(stateUpdateMsg or beacon);
    } catch (TimeoutException) {
      send PrimaryTest packet;
      if (status == FAILED) {
        IsPrmy = true; break; }
    }
  }

CodeForPrimary:
  send(ferretPkt, *);
  while (BidStatus == OPEN) {
    // backup has not responded;
    for all interfaces {
      send(ferretPkt, *); }
    waitfor(TimeOutInterval);
    if (BidDrop != null) {
      // there are available bids
      BidStatus = FALSE;
      // select best bid
      backupServ = analyzeBids();
      send Reject packets;
      send(confirmPkt, backupServ); }
  }
  while (true) {
    send(beaconPkt, *);
    if request arrives {
      S' = S + reqState;
      send stateUpdateMsg;
      process request; }
    waitfor(Msg);
    if (status == FAILED) {
      identify new backup; }
  }
endproc ProxyServer
A Secure Plan Michael Hicks and Angelos D. Keromytis Distributed Systems Lab CIS Department, University of Pennsylvania 200 S. 33rd Str., Philadelphia, PA 19104, USA {mwh,angelos}@dsl.cis.upenn.edu
Abstract. Active Networks promise greater flexibility than current networks, but threaten safety and security by virtue of their programmability. In this paper, we describe the design and implementation of a security architecture for the active network PLANet [HMA+99]. Security is obtained with a two-level architecture that combines a functionally restricted packet language, PLAN [HKM+98], with an environment of general-purpose service routines governed by trust management [BFL96]. In particular, we employ a technique which expands or contracts a packet’s service environment based on its level of privilege, termed namespace-based security. As an application of our security architecture, we outline the design and implementation of an active-network firewall. We find that the addition of the firewall imposes an approximately 34% latency overhead and as little as a 6.7% space overhead to incoming packets.
1 Introduction
Active Networks offer the ability to program the network on a per-router, peruser, or even per-packet basis. Unfortunately, this added programmability compromises the security of the system by allowing a wider range of potential attacks. Any feasible Active Network architecture therefore requires strong security guarantees. We would like these guarantees to come at the lowest possible price to the flexibility, performance, and usability of the system. This paper presents the design and implementation of a security architecture for PLANet [HMA+99], an active internetwork based on PLAN, the Packet Language for Active Networks [HKM+98]. Our approach is to partition the problem into two levels: language-based security for PLAN programs, complemented by namespace-based security for more general router services, governed by trust management. We briefly discuss PLAN and its role in this architecture, but focus more attention on service security. We present both architecture and implementation, and conclude with some applications of our approach, including a simple firewall that ‘filters’ active packets. [HK99], an extended version of this paper, contains more detailed motivation and performance analysis.
This work was supported by DARPA under Contract #N66001-96-C-852, with additional support from the Intel Corporation.
Stefan Covaci (Ed.): IWAN’99, LNCS 1653, pp. 307–314, 1999. c Springer-Verlag Berlin Heidelberg 1999
Fig. 1. PLANet's security architecture.
2 Architecture
Our security architecture is illustrated in Figure 1. The solid boxes define the two levels of the architecture: the contents of the central box define the PLAN level which is usable without need of credentials, while the remaining area forms the service level. This architecture falls along functional boundaries: all PLAN programs, by their nature, are safe (as defined below) and so may run unauthenticated, while, in general, service routines are unsafe, and must be partitioned by level of trust, visualized by the dotted boxes. We augment the PLAN level with a fixed set of 'core services' which are known to be functionally safe. This architecture is designed to guard against the standard threats to computational resources and their contents [AAKS99]. In particular, we defend against attacks that would deny service, seek to obtain unauthorized content, and misrepresent (spoof) identity. We explain PLAN's role in defending against these attacks below.

2.1 PLAN
PLAN [HKM+98] is a small functional language with syntax similar to ML [Ler,MTH90]. To express remote computation, it includes a primitive OnRemote (among others) that evaluates an expression at a remote node. Invoking OnRemote will result in a newly spawned packet. By design, the language has properties that prevent some attacks. PLAN is resource- and expression-limited, thus preventing CPU and memory denial-of-service attacks. For example, all PLAN programs are guaranteed to terminate (as long as the services they call also terminate), since PLAN does not provide a means to express non-fixed-length iteration or recursion. Additionally, PLAN programs are isolated from one another since there is no means of direct communication among them, and because the language's
strong typing and garbage collection prevent indirect means, such as through pointer swizzling or buffer overflows. Finally, a network resource bound counter, similar to IP’s “Time to Live” (TTL) field, is used to bound network resources.
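The effect of such a resource bound counter can be illustrated with a small sketch. This is not PLANet code; it is a hypothetical Python illustration (all names are ours) of how a finite per-packet budget, debited on every hop and divided among spawned children, bounds total network resource consumption no matter what the program does:

```python
# Hypothetical sketch (not PLANet code): a per-packet resource bound,
# analogous to IP's TTL, caps the total network work a program and its
# children can trigger.

class ResourceBoundExceeded(Exception):
    pass

class Packet:
    def __init__(self, resource_bound):
        self.resource_bound = resource_bound

def charge_hop(packet, cost=1):
    """Debit the packet's budget for one hop; drop it when exhausted."""
    if packet.resource_bound < cost:
        raise ResourceBoundExceeded("packet dropped: resource bound exhausted")
    packet.resource_bound -= cost
    return packet.resource_bound

def spawn_child(parent, share):
    """A child receives part of the parent's remaining budget, so
    spawning cannot multiply the total resources available."""
    share = min(share, parent.resource_bound)
    parent.resource_bound -= share
    return Packet(share)
```

Because every OnRemote-style hop and every child packet pays from the same finite budget, the total work is bounded a priori, independent of the program's logic.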
3 Service Security via Trust Management
Because of their general-purpose nature, service routines may perform actions which, if exploited, could be used to mount an attack. A radical approach to this problem would be to prevent the installation of any service routine that could potentially harm the node. However, this would preclude the addition of service routines—for example, network management operations—that should be available to trusted users. We therefore employ security mechanisms which allow authorized programs to access potentially unsafe service routines.

3.1 Trust Management
In determining the form of these security mechanisms, we arrived at some basic requirements. First, the mechanisms should be simple to understand and employ. Second, security policies should be modifiable as needed, while the system is operating. Furthermore, policy mechanisms should be flexible enough to anticipate future application needs. Finally, security mechanisms must scale to support increasing numbers of principals and their trust relations.

To meet these requirements, our service security relies on trust management [BFL96,BFIK99]. Trust management assigns some level of privilege (or trust) to a user, or principal, of the system. In particular, if a running PLAN program wishes to invoke a privileged service routine or alter a service parameter, the principal associated with the packet must be authenticated, and then the operation must be authorized. If either step fails, the operation is denied. We consider the question of policy and mechanism for authorization below; details about our particular implementation of authentication and authorization are presented in the next section.

3.2 Policy and Mechanism
Before applying trust management, we must consider what sorts of policies we would like to express, and what particular mechanisms we shall use to enforce them. For our system, we want our policies to express which services, beyond the core services, are available to particular users. We also find it convenient to indicate which services should be unavailable to a particular user; this is motivated in Section 5. For simplicity and scalability, we choose to map sets of principals to sets of services. We also need to manage delegation policies with regard to these mappings. For example, we might specify that the services in set s may be accessed not only by principal p, but also by those principals authorized by p. In keeping with our requirements, this policy should scale to include many nodes, principals, and services, and be alterable on the fly.
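The shape of such a policy can be sketched as follows. This is an illustrative Python model of our own devising, not the paper's implementation: grants map sets of principals to sets of services, and a delegation from p to q lets q reach what p was granted:

```python
# Illustrative policy sketch (names are ours): sets of principals map
# to sets of services, with delegation chains resolved recursively.

class Policy:
    def __init__(self):
        self.grants = []        # (frozenset of principals, frozenset of services)
        self.delegations = {}   # delegator -> set of delegatees

    def grant(self, principals, services):
        self.grants.append((frozenset(principals), frozenset(services)))

    def delegate(self, delegator, delegatee):
        self.delegations.setdefault(delegator, set()).add(delegatee)

    def services_for(self, principal, _seen=None):
        """Services granted directly, plus those reachable via delegation."""
        seen = _seen if _seen is not None else set()
        if principal in seen:           # guard against delegation cycles
            return set()
        seen.add(principal)
        services = set()
        for principals, svcs in self.grants:
            if principal in principals:
                services |= svcs
        # A delegatee of p may use what p was granted.
        for delegator, delegatees in self.delegations.items():
            if principal in delegatees:
                services |= self.services_for(delegator, seen)
        return services
```

Both grants and delegations are plain data, so they can be altered on the fly without touching any service code.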
Furthermore, we want to specify not only whether a service routine may be invoked, but how it may be used. For example, a resident state service which allows packets to leave state on the routers might apportion different amounts of space to different users. We should also be able to specify general resource usage parameters, such as CPU and memory use.

To enforce security policy we require strong principal authentication, and use a policy manager on every node; more details are given in the next section. In our system, packets must authenticate themselves at some point before accessing privileged services; at this time, the appropriate services are added to (or subtracted from) the packet’s current service symbol table. We call this approach namespace-based security. Since PLAN is strongly typed and looks up services on an as-needed basis, programs are incapable of invoking code outside of this updated table. Additionally, we allow those services which may require policy-based parameterization to query the policy manager as necessary during their execution. For example, the resident state service mentioned above would query the local policy to determine how much memory the current principal is allowed to occupy.

We feel there are some compelling advantages to this approach. First, namespace-based policies are simple to formulate and easy to change. Second, because namespace-based security is centrally administered, individual service routines may be written without concern for security, and policies may change dynamically without worry of inconsistency. Furthermore, unauthenticated programs may access the core services without additional performance penalty. Finally, because namespace-based security is not by itself sufficient, we allow services to formulate their own usage policies.

There is still some work to be done in our current system. Namespace-based security only applies to PLAN service routine calls, not to calls between service routines.
This is slightly more difficult, but entirely possible, since Caml, our service implementation language, provides a mechanism which may be used to implement namespace-based security: module thinning. The use of module thinning has been explored for active networks in [Ale98] and for mobile agent systems in [LOW98]. Also, while we have experimented with mechanisms for enforcing resource usage, we have yet to arrive at ones that are sufficiently lightweight. Relevant details may be found in [Hic98].
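The namespace idea itself is simple enough to sketch. The following is a minimal Python illustration (service names and policies are invented, not PLANet's): after authentication, the packet's service table is rebuilt by adding ("thickening") and removing ("thinning") entries, so an unlisted service cannot even be named, let alone invoked:

```python
# Minimal sketch of namespace-based security as described in the text.
# The services and their bodies are invented for illustration.

DEFAULT_SERVICES = {
    "eval": lambda chunk: ("evaluated", chunk),
    "get_routes": lambda: ["10.0.0.0/8"],
}

PRIVILEGED_SERVICES = {
    "netman_reboot": lambda: "rebooting",
}

def build_environment(default, thicken=(), thin=()):
    """Return the service symbol table used for this packet."""
    env = dict(default)
    for name in thicken:
        env[name] = PRIVILEGED_SERVICES[name]
    for name in thin:
        env.pop(name, None)
    return env

def invoke(env, name, *args):
    """Service lookup goes through the (possibly thinned) table only."""
    if name not in env:
        raise KeyError("service not in namespace: " + name)
    return env[name](*args)
```

Because lookup happens only through the per-packet table, no per-call permission check is needed inside the services themselves.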
4 Implementation

4.1 Authentication
Before a PLAN program may invoke a trusted service, its associated principal must be determined; this is the process of authentication. Authentication is typically done in a public-key setting by verifying a digital signature in the context of some communication (e.g., a packet). In PLAN, one obvious link between communication and authentication is the chunk. A chunk (or code hunk) may be thought of as a function that is waiting to be applied. In PLAN, chunks are first-class—they may be manipulated as
data—and consist internally of some PLAN code, a function name, and a list of values to be used as arguments during the application. A chunk is typically used as an argument to OnRemote to specify some code to evaluate remotely. A chunk may also be evaluated locally by passing it to the eval service, which resolves the function name in the current environment, performs the application, and returns the result.

We have added a service called authEval which takes as arguments a chunk, a digital signature, and a public key. authEval verifies the signature against the binary representation of the chunk and, if successful, evaluates the chunk. There are two key advantages to this approach. First, a principal signs exactly the piece of code he wants to execute, and may hold extra privilege only while executing that piece of code. Second, only those programs which require authorization incur the extra time and space overheads. However, there is no protection against replay attacks, and public-key operations are notoriously slow. Furthermore, authentication is only unidirectional (principal to node), thus providing less confidence to the caller. We mitigate these problems by using a variant of the mutual authentication protocol described in [AAKS98].
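The authEval idea can be sketched compactly. This is an illustrative Python model, not the PLANet service: an HMAC over a shared key stands in for the public-key signature, and the chunk is serialized as exactly the bytes that will be evaluated, so the signer commits to precisely the code being run:

```python
import hashlib
import hmac

# Sketch of authEval (simplified: an HMAC stands in for the public-key
# signature the real service verifies; names and encoding are ours).

def serialize_chunk(fn_name, args):
    """Binary representation of the chunk: function name plus arguments."""
    return repr((fn_name, args)).encode()

def sign_chunk(key, fn_name, args):
    return hmac.new(key, serialize_chunk(fn_name, args), hashlib.sha256).digest()

def auth_eval(env, key, fn_name, args, signature):
    """Verify the signature over the chunk, then evaluate it in env.

    Privilege applies only to this chunk: the caller signed exactly
    the code being run, and nothing else."""
    expected = sign_chunk(key, fn_name, args)
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("bad signature: chunk not evaluated")
    return env[fn_name](*args)
```

Any tampering with the function name or arguments changes the serialized bytes, so the verification fails and the chunk is never evaluated.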
4.2 Authorization
As our policy manager, we have chosen the Query Certificate Manager (QCM) [GJ98], which provides comprehensive security credential location and retrieval services, employing a distributed ACL. While in this paper we make use of QCM, our architecture is designed so that other policy managers can be used instead. In particular, we are also experimenting with the KeyNote [BFIK99] trust-management system.

QCM is used to specify the services to be added to or subtracted from the default service environment by associating certain thicken and thin sets of services with a principal or set of principals. Once a principal has been authenticated, these sets are used to modify the default environment. The resulting service environment is then used during subsequent chunk evaluation. As an optimization, we can cache this environment for future reference, thus avoiding repeated invocations of QCM and reconstructions of the environment.

A key advantage of using QCM is that it can be used for more than just specifying sets of principals on a per-node basis. In particular, sets described in a distributed manner impose no additional query complexity. For example, a node A may define a set which partially resides at another node B:

  l = {p1, p2, ..., pn} union B$m

If the authorization service on A makes a membership test on set l, QCM will automatically query B if necessary. QCM may also make use of certificates, which are signed assertions about set relationships, to short-circuit remote queries. These may be passed as additional arguments to authEval, or may be obtained during node-node authentication. This allows QCM to implement both push- and pull-based information retrieval.
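The lazy, distributed membership test can be illustrated as follows. This is our own Python sketch of the behavior described above, not QCM itself: node B is consulted only when the local part of the set does not already decide the query.

```python
# Sketch of a QCM-style distributed set: l = {p1, ..., pn} union B$m,
# tested lazily. The network call to B is simulated; a counter shows
# when B is actually consulted.

class RemoteSet:
    """Stands in for `B$m`: membership requires a query to node B."""
    def __init__(self, members):
        self.members = set(members)
        self.queries = 0
    def contains(self, p):
        self.queries += 1          # one simulated network round-trip
        return p in self.members

class DistributedSet:
    """Local principals plus a remotely held part."""
    def __init__(self, local, remote):
        self.local = set(local)
        self.remote = remote
    def contains(self, p):
        if p in self.local:        # decided locally, no remote query
            return True
        return self.remote.contains(p)
```

A certificate asserting, say, that p3 is in B$m would play the role of adding p3 to the local part, short-circuiting the remote query.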
5 A Simple Active Firewall
As an application of our architecture, we implemented a simple active firewall. Typically, firewalls filter certain types of packets, such as all TCP connection requests on certain port numbers. Usually such packets are easily identified by their protocol headers. However, in PLANet, and indeed in any active-packet system, there is no quick way to assess a packet’s functionality. Our approach is that rather than filter packets at the firewall, we associate with them a thinned service environment from which any potentially harmful services have been removed. The packets may then be evaluated inside the trusted network using only those services. While this may seem to contradict our premise, stated in Section 2, that the default environment should consist only of ‘safe’ services, in the context of a trusted intranet we would expect the default privilege allowed to local packets to exceed that of foreign packets. Furthermore, we would not want to impose the overhead of authentication and authorization on local packets in the general case.

To thin the environment of foreign packets, our firewall associates them with a guest identity that has the appropriate policy. To do this, the firewall encapsulates each packet in a small wrapper which calls authEval with the original chunk, using the guest identity. In general, this would require the firewall to sign all incoming packets. However, because the guest environment provides less privilege than the default environment, we can conceivably avoid the cryptographic cost: any authenticating principal whose environment is thinned and not thickened can be ‘taken at his word.’

In the base PLANet implementation, a two-hop ping takes 2.13 ms for a minimally-sized packet (80 bytes) and 3.06 ms for a maximally-sized one (1500 bytes). Changing the middle node to the ‘signing firewall’ adds 37% and 32% to the round-trip times, respectively, raising them to 2.91 and 4.03 ms.
Between 1/3 and 1/2 of this overhead is attributable to signing and verification, depending on the packet size. For the firewall, the remaining overhead is due to encapsulation (which requires extra marshalling and copying), while for the end-host it is due to decapsulation and additional interpretation costs. Parallelism and special-purpose hardware can further reduce cryptographic costs and improve latency and throughput. If we eliminate the cryptographic operations, the end-to-end ping times drop to 2.55 and 3.41 ms for minimal and maximal payloads, respectively, reducing the firewall-induced overhead to 20% and 11%. A smarter PLAN interpreter would also considerably improve overall performance.

The firewall also imposes a fixed 101-byte space overhead due to the extra code and signature attached to incoming packets. This translates to 126% and 6.8% space overhead for the minimal and maximal payload packets, respectively. One way of mitigating this overhead is for PLAN to support code caching and language-level remote references. Since all PLAN values are immutable, the contents of a remote reference may be safely cached without the need for a coherence protocol.
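The wrapping step itself can be sketched in a few lines. This is an illustrative Python model (service names and the `guest` thinning set are invented, not PLANet's): an incoming chunk is encapsulated so that, when evaluated inside the trusted network, it only ever sees the thinned guest environment.

```python
# Sketch of the active-firewall wrapper described above. A chunk is
# modeled as a function of the service environment it runs in; the
# unsafe service names below are invented for illustration.

GUEST_THIN = {"resident_state", "get_routes"}   # hypothetical unsafe services

def make_guest_env(default_env):
    """The thinned environment foreign packets are evaluated in."""
    return {name: f for name, f in default_env.items() if name not in GUEST_THIN}

def firewall_wrap(chunk):
    """Encapsulate an incoming chunk so it runs under the guest policy,
    regardless of which environment the interior node supplies."""
    def wrapped(default_env):
        return chunk(make_guest_env(default_env))
    return wrapped
```

Since the wrapper only ever subtracts privilege, the 'taken at his word' optimization applies: no signature is needed on the wrapped packet.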
6 Related Work
Research in the area of security for active networks is in its early stages. The SANE [AAKS98] architecture is part of the SwitchWare Project [AAH+98] at the University of Pennsylvania. SANE is currently used in conjunction with the ALIEN architecture [Ale98]. Security is achieved in ALIEN through a combination of module thinning and type safety. Similar approaches have been taken in [LR99,BSP+95,vE99]. Other language-based protection schemes can be found in [BSP+95,CLFL94,HCC98,LOW98,Moo98]. The main difference between this work and SANE lies in the fact that we can depend on a provably safe language (PLAN) for those packets that do not require special privileges. Furthermore, programming constructs available in PLAN (e.g., chunks) considerably ease the task of implementing security abstractions. A working group within the Active Networks project has been defining a common security meta-architecture [Mur98]; however, this architecture has not yet become concrete enough for implementation. Secure PLAN is currently being extended to support validation and verification [NL96,Nec97] of active extensions.

We have demonstrated that our architecture addresses possible threats while still preserving the flexibility and usability of the system. This architecture is based on language safety, authentication, and trust management. We demonstrated the practicality and acceptable performance of our approach experimentally, in the context of an active firewall.
References

AAH+98. D. S. Alexander, W. A. Arbaugh, M. Hicks, P. Kakkar, A. D. Keromytis, J. T. Moore, C. A. Gunter, S. M. Nettles, and J. M. Smith. The SwitchWare Active Network Architecture. IEEE Network Magazine, special issue on Active and Programmable Networks, 12(3):29–36, 1998.
AAKS98. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, and J. M. Smith. A Secure Active Network Environment Architecture: Realization in SwitchWare. IEEE Network Magazine, special issue on Active and Programmable Networks, 12(3):37–45, 1998.
AAKS99. D. S. Alexander, W. A. Arbaugh, A. D. Keromytis, and J. M. Smith. Security in Active Networks. In Secure Internet Programming [VJ99].
Ale98. D. S. Alexander. ALIEN: A Generalized Computing Model of Active Networks. PhD thesis, University of Pennsylvania, September 1998.
BFIK99. M. Blaze, J. Feigenbaum, J. Ioannidis, and A. Keromytis. The Role of Trust Management in Distributed Systems Security. In Secure Internet Programming [VJ99].
BFL96. M. Blaze, J. Feigenbaum, and J. Lacy. Decentralized Trust Management. In Proceedings of the 17th Symposium on Security and Privacy, pages 164–173. IEEE Computer Society Press, Los Alamitos, 1996.
BSP+95. B. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. Fiuczynski, D. Becker, S. Eggers, and C. Chambers. Extensibility, Safety and Performance in the SPIN Operating System. In Proceedings of the 15th Symposium on Operating Systems Principles, pages 267–284, December 1995.
CLFL94. J. S. Chase, H. M. Levy, M. J. Feeley, and E. D. Lazowska. Sharing and Protection in a Single-Address-Space Operating System. ACM Transactions on Computer Systems, November 1994.
GJ98. Carl A. Gunter and Trevor Jim. Policy-Directed Certificate Retrieval. http://www.cis.upenn.edu/~qcm, 1998.
HCC98. C. Hawblitzel, C. Chang, and G. Czajkowski. Implementing Multiple Protection Domains in Java. In Proceedings of the 1998 USENIX Annual Technical Conference, pages 259–270, June 1998.
Hic98. Michael Hicks. PLAN System Security. Technical Report MS-CIS-98-25, Department of Computer and Information Science, University of Pennsylvania, April 1998.
HK99. Michael Hicks and Angelos D. Keromytis. A Secure PLAN. Technical Report MS-CIS-99-14, Department of Computer and Information Science, University of Pennsylvania, May 1999.
HKM+98. Michael Hicks, Pankaj Kakkar, Jonathan T. Moore, Carl A. Gunter, and Scott Nettles. PLAN: A Packet Language for Active Networks. In Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming Languages, pages 86–93. ACM, 1998.
HMA+99. Michael Hicks, Jonathan T. Moore, D. Scott Alexander, Carl A. Gunter, and Scott Nettles. PLANet: An Active Internetwork. In Proceedings of the Eighteenth IEEE Computer and Communication Society INFOCOM Conference, pages 1124–1133. IEEE, 1999.
Ler. Xavier Leroy. The Caml Special Light System (Release 1.10). http://pauillac.inria.fr/ocaml.
LOW98. J. Y. Levy, J. K. Ousterhout, and B. B. Welch. The Safe-Tcl Security Model. In Proceedings of the 1998 USENIX Annual Technical Conference, pages 271–282, June 1998.
LR99. X. Leroy and F. Rouaix. Security Properties of Typed Applets. In Secure Internet Programming [VJ99].
Moo98. J. Moore. Mobile Code Security Techniques. Technical Report MS-CIS-98-28, University of Pennsylvania, May 1998.
MTH90. Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. The MIT Press, 1990.
Mur98. Security Architecture for Active Nets, June 1998. Draft available at http://www.ittc.ukans.edu/~ansecure/0079.html.
Nec97. George C. Necula. Proof-Carrying Code. In Proceedings of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 106–119. ACM Press, New York, January 1997.
NL96. George C. Necula and Peter Lee. Safe Kernel Extensions Without Run-Time Checking. In Second Symposium on Operating Systems Design and Implementation, pages 229–243. Usenix, Seattle, 1996.
vE99. T. von Eicken. J-Kernel: a capability-based operating system for Java. In Secure Internet Programming [VJ99].
VJ99. Jan Vitek and Christian Jensen. Secure Internet Programming: Security Issues for Mobile and Distributed Objects. Lecture Notes in Computer Science. Springer-Verlag, New York, NY, USA, 1999.
Control on Demand

Gísli Hjálmtýsson¹ and Samrat Bhattacharjee²

¹ AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932
² College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332
Abstract. Control on demand is a paradigm for network programmability at the network transport level. Previous work on active and programmable networking at this level either achieves flexibility by inserting significant software in the critical forwarding path, or achieves efficiency by sacrificing functionality, relegating programmability to control-plane connection management. In contrast, control-on-demand takes the middle ground, acting both in the control plane and in the data plane, yet without adding software in the critical forwarding path. Rather than applying essential programs to every datagram, our approach is to apply the installed programs asynchronously from data forwarding. This way we avoid essential processing in the critical forwarding path, applying the (user-)installed service logic for service enhancement only. By retaining the current forwarding model, control-on-demand is consistent with current trends in router architectures with increasingly optimized and hardware-enhanced forwarding engines. Applying the service logic asynchronously barely impacts router performance and robustness, making control-on-demand viable in practice in the near future. The main contributions of this paper are the control-on-demand paradigm and the interface between application (service/user) programs and the forwarding engine. User programs execute in an execution environment, and use this interface to program the facilities of the forwarding engine and to access the data-path. We describe our prototype control-on-demand IPv6 router, and discuss abstractions and mechanisms we have developed to support control-on-demand, most notably featherweight flows. We discuss two applications we have experimented with to demonstrate the potential of asynchronous enhancement controls.
1 Introduction
The explosive growth in networking and computing has increased the need to introduce increasingly complex network services at an accelerated rate. Whereas traditional networks were designed and customized for a single network service model, the rapid
Work done while at AT&T Labs – Research, Florham Park, NJ.
Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 315-331, 1999. Springer-Verlag Berlin Heidelberg 1999
change in infrastructure technologies calls for a more adaptable service model optimized for flexibility. However, the high quality of the telephony network is partially due to customization at all levels of the network. Active and programmable networking enables service-specific network customization when (and where) needed, while retaining flexibility. The essential characteristic of programmable networks is that the network interface is widened to allow service (application) semantics to be provided to the network. Current networks offer only preinstalled service semantics for a single service model, accepting only data across the network interface. The interface of an active and programmable network accepts both data and the logic to interpret that data, thereby enabling service-specific treatment of the data inside the network, effectively allowing the service model to be customized for each application.

The time scale and granularity appropriate for introducing new service logic remain a research topic. In particular, practicality may dictate conservative change policies to enhance network stability and manageability. Similarly, the usefulness of very specialized service logic, applicable only to a small set of flows, may not sufficiently justify the added network complexity and operational risk. Moreover, the value of protocol updates on network elements is questionable, as the lifetime of network hardware has become a fraction of the lifetime of protocols. Even more important is the time scale that the installed programs operate on. Current research efforts on active and programmable networking range from coarse time-scale service management [1] and software updates of network elements, to connectivity management on the time scale of private network provisioning or connection duration [2,3,4], to active involvement in the correct forwarding of every datagram [5,6,7].
Whereas the in-data-path approaches that insert essential software in the critical forwarding path threaten robustness and performance, pure control-plane programs operating only at provisioning and call-setup time scales offer, in comparison, very restricted functionality because of their inability to act in the data-path. Control-on-demand operates on a time scale finer than call control, but coarser than per-packet time scales. The installed service-specific programs act on flows, rather than individual packets, and control the router facilities to adaptively optimize their use to maximize flow utility. To this end the control programs exploit local information, forwarding statistics and service semantics. In addition, the control programs can opportunistically peek into the data-path, either by subscribing to parts of each data packet, or by peeking at parts of the packets in a flow's queue at any given time. This way the programmable model of control-on-demand is significantly richer than that of strict control-plane programming, without reducing router performance or robustness.

In this paper we describe control-on-demand and how its service model provides sufficient richness to solve interesting problems previously solved only by acting fully in the data path, yet is sufficiently restricted and efficient to be viable for practical use in the near term. We give two examples of applications showing the potential of asynchronous enhancement control: the first, selective discard as congestion adaptation of a video stream; the second, adaptive smoothing of media streams. The first example illustrates how asynchronous application of the control programs achieves
the benefits of in-data-path processing. The second shows how a judicious separation of the smoothing work into two time scales allows the performance-critical work to be delegated to the forwarding engine, while executing the (service-specific) smoothing policy asynchronously from, and at a larger time scale than, data forwarding.

The rest of the paper is organized as follows. In Section 2 we motivate the service model of control-on-demand and contrast it with other active and programmable networking service models. Section 3 discusses related work. We present the nodal architecture in Section 4, and discuss our IPv6 prototype router in Section 5. In Section 6 we then discuss the architecture, and particularly the programmable interface, in more detail. Section 7 addresses security. In Section 8 we discuss three supporting mechanisms that we have developed as part of this work. In Section 9 we discuss applications of control-on-demand. We then conclude.
2 The Programmable Model
The programmable model of control-on-demand is motivated by the desire to enrich the network service model while exploiting fast-path (hardware) optimizations. In particular, our goal is to avoid perturbing the critical forwarding path, to ensure that those (current) services for which current network models are satisfactory remain unaffected. In contrast to approaches where each node is either programmable or not [6], or where services are either active or passive [5], control-on-demand allows services to exhibit degrees of activity, ranging from needing only basic forwarding to fully acting in the data path. This way a service can balance the "activity" cost against the potential improvement in service utility.

In particular, some potentially important applications of programmable networking, including advanced group management in VPNs and floor control in teleconferencing, require only control-plane programmability. Clearly, existing store-execute-and-forward models can support these types of applications. In so doing, however, they send all datagrams of such flows through slow-path processing. In contrast, control-on-demand supports such control-plane programmability without any impact on forwarding performance.

Similarly, the enhancement semantics of control-on-demand remain valid across the network hierarchy. In larger networks the ratio of processing to bandwidth differs in different parts of the topology. Currently backbone routers have ample bandwidth but limited per-packet processing resources, whereas closer to the network edge this ratio is reversed. Using control-on-demand, a service-specific control policy may act aggressively in the data-path and the control plane close to the edge, but act only in the control plane on backbone routers.

Our approach is consistent with current trends in forwarding technologies, where switches and increasingly high-performance routers implement forwarding in hardware.
Adding software in the critical forwarding path goes against this trend. In particular, essential software in the forwarding path has a significant negative impact on forwarding performance, and threatens robustness and interoperability. Beyond
forwarding facilities, important hardware facilities including multiple queues and advanced scheduling support are increasingly available in network nodes. To achieve viable performance, the model must support programmability while allowing services to exploit these hardware facilities.

The programmable model of control-on-demand is inherently efficient. Most other work on active networking changes the existing store-and-forward model into store, execute and then forward. In contrast, control-on-demand leaves the store-and-forward model unchanged. The dynamically installed service-specific controllers execute asynchronously from data forwarding. In particular, a data packet may be forwarded before the controller gets a chance to run and peek at the packet. This separation is similar to the separation of control plane and data plane in connection-oriented networks, and in line with other work on open signaling and programmable ATM control [2,3,4,8]. Control-on-demand can be viewed as an evolution and generalization of the control plane. It goes further than prior work on control-plane technologies, however, by allowing the service-specific controllers to act (albeit asynchronously) on the data being forwarded.

The separation into two threads of execution is ideally suited to exploiting hardware facilities. For example, in stream smoothing there are two natural time scales: a fine scale for scheduling individual packets to a target rate, and a coarser one for setting the target. Control-on-demand can exploit these different time scales by having the installed program logic set the target, delegating the fine-scale scheduling to the underlying (hardware) scheduler. Whereas there is potential value in having the installed program act on each and every datagram, many applications proposed for active networking lend themselves nicely to such a separation.
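The two-time-scale split can be sketched as follows. This is an illustrative Python model of our own (all names, rates and thresholds are invented): the installed control program runs occasionally and sets a target rate from coarse observations, while fine-grained pacing to that rate is assumed to be delegated to the forwarding engine's scheduler.

```python
# Sketch of two-time-scale smoothing: a coarse-scale controller sets a
# target rate; the fine-scale per-packet pacing is left to hardware.

class FlowScheduler:
    """Stands in for the forwarding engine's per-flow rate shaper."""
    def __init__(self):
        self.target_bps = 0
    def set_rate(self, bps):
        self.target_bps = bps   # the only knob the controller touches

def smoothing_controller(scheduler, queue_len_pkts, drain_bps,
                         low=10, high=50):
    """Coarse-time-scale policy: nudge the target rate to keep the
    flow's queue between `low` and `high` packets."""
    if queue_len_pkts > high:
        scheduler.set_rate(int(drain_bps * 1.25))   # queue too long: drain faster
    elif queue_len_pkts < low:
        scheduler.set_rate(int(drain_bps * 0.8))    # queue short: smooth harder
    else:
        scheduler.set_rate(drain_bps)
```

The controller runs asynchronously and infrequently; missing an invocation merely leaves the previous target rate in force, so forwarding is never blocked on it.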
A most essential characteristic of IP, and key to its success, is the softness of state inside the network, and the nature of this soft state. Apart from the routing database, cached state, most notably the forwarding cache, is used purely for performance enhancement and is not essential for correctly delivering packets to their destination. In particular, this state can be lost, or removed at the router’s discretion, without affecting the validity of state elsewhere in the network. Control-on-demand retains this enhancement characteristic, as the installed program logic is not essential for correct forwarding. Instead, the installed program logic is executed asynchronously to forwarding in an effort to enhance service quality. Consequently, the installed program can be applied to the data stream on a best-effort basis. In particular, the router may refrain from executing the installed program during overload, or remove it altogether at its discretion. Of course, the more predictable the CPU scheduling, the better the enhancement.
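The soft-state property above can be made concrete with a small sketch (ours, and deliberately oversimplified: the controller is invoked synchronously here, whereas in control-on-demand it runs asynchronously): forwarding never depends on the installed controller, so the router can skip it under load or evict it entirely.

```python
# Sketch of "soft enhancement state": the controller is optional, so
# skipping or evicting it never affects packet delivery.

class Flow:
    def __init__(self, controller=None):
        self.controller = controller   # optional enhancement, may be evicted
        self.forwarded = 0

def forward(flow, packet, overloaded=False):
    """Forwarding happens unconditionally; the controller is a
    best-effort extra that the router may skip under overload."""
    flow.forwarded += 1
    if flow.controller is not None and not overloaded:
        flow.controller(packet)
    return packet

def evict_controller(flow):
    flow.controller = None   # at the router's discretion; forwarding unaffected
```

Contrast this with store-execute-and-forward models, where losing the installed program would make correct forwarding impossible.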
3 Related Work
While the number of projects on active and programmable networking is rapidly growing, this work mostly draws on the active networking work in IP described in [5,6,7], and the work on control plane programmability in ATM networks in [2,3,4].
In [5] Tennenhouse and Wetherall propose an active networking model carrying typed packets, called capsules, that contain the data together with the code necessary for its correct processing and forwarding at every node. An appeal of this model is that every capsule carries the code essential for its own correct processing and delivery. Capsules thus retain the property of IP datagrams that each capsule is independent of other capsules. For the most part, however, active networking is about establishing, maintaining and sharing service-specific state inside the network over multiple datagrams. Control-on-demand, in contrast, retains the enhancement nature of Internet state, generalizing the soft enhancement state to include a program. In ANTS [9] the capsule is optimized to contain only a request for a program to be downloaded and installed, with the first packet simply awaiting the completion of this installation before it is processed and forwarded. This procedure amounts to a flow setup, with substantial service provisioning and thus significant flow setup time (the datagram triggering the installation implicitly signals a flow setup). Control-on-demand similarly uses implicit “signaling” to trigger program installation, but with more efficient signaling mechanisms.

In [4] Van der Merwe and Leslie describe an architecture for programmable ATM networks. The X-bind effort at Columbia [2,3] has explored similar issues, developing a programmable control architecture for connection management in ATM. In contrast to control-on-demand, both of these operate strictly in the control plane and are thus not applicable to problems requiring interaction with the data path. The interfaces and flow-level management capabilities in control-on-demand draw on both of these projects.
Both the X-bind and Cambridge switchlets, however, assume a significant amount of supporting infrastructure, whereas the only extra-nodal infrastructure needed for control-on-demand is one for retrieving control programs when needed. We simply use web technology, URLs and a simple HTTP daemon, for this. This work builds on and extends [1], describing an architecture for an active approach to network and service management. It also builds on prior work at AT&T Labs on introducing code into a running C++ program [10]. Other related work includes the PLAN work on language and security aspects of programmable networking [6], the Darwin project at CMU on application aware networking [11], and the CANE project at the Georgia Institute of Technology, adapting telephony's advanced intelligent networking (AIN) [12] to packet networks by offering a menu of software functions, one of which is for forwarding [6]. The authors of [13] share many of our concerns with prior work on active networking. Their architecture for accessing and installing dynamic code could be used with control-on-demand. Currently we simply use HTTP, with the router running an access policy for scrutinizing code servers.
Gísli Hjálmtýsson and Samrat Bhattacharjee

4 The Nodal Architecture
The nodal architecture for control-on-demand is depicted in Figure 1. The essential characteristic of this architecture is a strong separation between the service generic forwarding engine and the user installed control programs executing in an execution environment. In particular the data-path does not go through the execution environment. The architecture supports multiple execution environments, each managed by a meta-controller (Figure 1). The meta-controller is responsible for installing and maintaining the dynamically installed control programs. Assuming that the code is larger than will fit in one protocol unit (packet/frame), installation means locating, authenticating, downloading, verifying, (possibly compiling) and running the code implementing the requested control policy. Control programs are (autonomous) typed objects. The manager maintains a cache of available programs, to avoid the downloading and installation of frequently requested ones. Once installed and associated with the requesting flow, the forwarding engine and the flow specific control interact directly. A control policy may be assigned and reassigned at any time. Ensuring separation in the processing of service specific programs requires mechanisms not found in common operating systems. Such mechanisms are however provided for example in the Nemesis operating system [14]. Although the architecture supports arbitrary execution environments, in our prototyping the execution environment is simply the native code of the nodal hardware.

Figure 1: The Nodal Architecture for Control-on-demand. An opaque interface separates the control part from the forwarding engine. Note that the data-path does not go through the controller.
Although the type of execution environment and its properties may play a significant role in various aspects of programmable networking in general, and in safety and security in particular, we see these as complementary to the architectural issues that are the focus of our work. Candidate execution environments could be based on the Java virtual machine. The forwarding engine performs basic and multicast forwarding. In addition it may have some quality of service enhancing facilities increasingly found in modern routers, such as multiple queues and scheduling capabilities. The capabilities of individual network nodes will vary. One of the difficult parts of this work is to provide an interface to these facilities abstract enough to hide differences in implementation, yet rich enough to efficiently exploit them. In particular, the interface abstractions defined herein apply to both IP routers and ATM switches (assuming, though, frame-based forwarding facilities for an ATM switch), to the level that a single flow may be forwarded across both ATM and IP platforms.
4.1 The Control Semantics

Control-on-demand is flow oriented. In a connection oriented network like ATM this simply means that the control programs are assigned to (multicast) connections. In IPv6 we exploit the flow label. For IPv4 networks we use a filter definition and a flow classifier to group multiple packets into flows [RSVP ref]. There are several motivations for taking a flow oriented approach. One is to amortize the "investment" of installing the on-demand controller. Another is that many important applications of programmable networking apply only at the flow or connectivity level, for example group management or floor control in a multicast. To further reduce the on-line (real-time) demands, the control programs are applied asynchronously to the data path. Since the control is for enhancement only, it is not essential that it be applied to every packet. Most applications designed for the Internet gracefully adapt to changing network conditions, and rely only on the end-systems for correctness. Thus enhancement services inside the network are never essential; they do, however, increase in utility when applied more consistently. With these semantics the on-demand control is applied asynchronously to data forwarding, even when fine grained job scheduling (CPU) is available. Since the only essential work performed at the network nodes is forwarding, control-on-demand is inherently efficient; during overload the node may invoke only those programs that reduce the congestion. Therefore, a control-on-demand node has at least the same throughput as a bare forwarding engine.
5 Prototype Implementation in IPv6
We have prototyped the control-on-demand architecture, with the objective of furthering our understanding of the architectural issues and verifying the viability of the paradigm in general and the interfaces in particular. Our implementation consists of three parts: mechanisms to enable communication between applications and controllers, the implementation of a control-on-demand node, and service specific program prototypes for a number of applications. The emphasis of this prototype has been on polishing the abstractions and interfaces and verifying their usefulness (functional verification) by applying them to a number of important problems. Our prototype is implemented using a Linux IPv6 router, running kernel version 2.1.43.
5.1 The control-on-demand Router

We use the flow label of IPv6 to explicitly identify a flow and interpret it as signaling a request for special treatment. For every labeled flow we maintain state and a separate queue. The state at least caches the outgoing port(s), but in general consists of the attributes assigned to the flow. For our purposes this state includes a controller reference, a queue reference, the time of flow initialization, and some flow statistics. Figure 2 shows how a flow transitions through four states during its life-span.

A datagram arriving with a currently unknown flow identifier is identified as the beginning of a new flow at the current node. A new flow state is created, and its state is set to initialize. If the datagram contains the cc-extension, a copy of the datagram is forwarded to the meta-controller after the original datagram is routed and forwarded according to the IPv6 routing tables. The outgoing interface is recorded and cached as part of the flow's state. Subsequent datagrams of the flow are routed based on its flow identifier, effectively pinning the routes for labeled flows. If the datagram does not contain a cc-extension, no other flow processing is requested and the state transits to ignore. In that case subsequent datagrams are simply forwarded.

Figure 2: State transitions in controller "life" (states: NULL, Initialize, Active, Ignore; events: Request, Success, Failure, end-of-flow).

For flows requesting control-on-demand the environment manager, running in user space, investigates the extension header, determining the controller requested. In particular, the meta-controller consults its cache of controllers to see if the requested controller is already locally available. If not, the meta-controller retrieves and installs it using the code reference. On successful installation of the on-demand control, the flow state becomes active, indicating to the router that the flow controller is ready to act on arriving datagrams. In addition the controller reference in the state is updated. If the installation fails, the flow state is set to ignore (in principle controlled by the default policy). The meta-controller may change the state from active to ignore at any time, for example if a controller fails for some reason. While the flow controller is being installed, flow datagrams are simply forwarded. In addition, some general (per flow) statistics are compiled. The flow specific controller gains access to this information.
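The four-state flow life-cycle above can be sketched as a small state machine. This is our own illustration in Python, not the prototype's kernel code; the state and event names follow Figure 2:

```python
# Hypothetical sketch of the flow states described above (NULL, Initialize,
# Active, Ignore); transitions follow Figure 2, but the class itself is an
# illustration, not the prototype's implementation.

class FlowState:
    NULL, INITIALIZE, ACTIVE, IGNORE = "null", "initialize", "active", "ignore"

    def __init__(self):
        self.state = FlowState.NULL

    def on_first_datagram(self, has_cc_extension):
        # A datagram with an unknown flow label creates new flow state.
        self.state = FlowState.INITIALIZE
        if not has_cc_extension:
            # No control requested: subsequent datagrams are simply forwarded.
            self.state = FlowState.IGNORE

    def on_install_result(self, success):
        # The meta-controller reports the outcome of controller installation.
        if self.state == FlowState.INITIALIZE:
            self.state = FlowState.ACTIVE if success else FlowState.IGNORE

    def on_controller_failure(self):
        # The meta-controller may demote a flow from active at any time.
        self.state = FlowState.IGNORE
```

Note that while a flow sits in the initialize state, forwarding continues unaffected, reflecting the enhancement-only semantics.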
Upon flow termination, determined either by inactivity or via an explicit notification from the flow controller, the meta-controller removes the flow state and reclaims resources allocated to the flow. Our prototype implements the essentials of the interfaces described below. In particular the subscribe/publish interface supporting the frame peeking is fully prototyped. The perturbation to the fast path for packets whose flow label is not set is two single word equality tests, but an additional test is needed on the IPv6 input processor to decide the state of a known flow. We use hop-by-hop extension headers to implement the message interface.
6 Details of the Architecture
A key issue in this work is defining the interfaces and primitives necessary and sufficient to support the control-on-demand paradigm. There are four major parts to the interface: the meta-control interface, a message exchange interface, a facility access interface, and a subscribe/publish interface. The interface between individual controllers and the forwarding engine consists of all but the meta-control interface.
6.1 The Facility Access Interface
The facility access interface provides access to the resources of the forwarding engine. The interface is used by the meta-controller to assign controllers to flows, and by the flow controllers to manipulate their data flow. The only primitive used by the meta-controller assigns a controller to a flow. Controller assignment may change dynamically. In particular, the meta-controller may assign a null controller to a flow at its discretion. The facility access interface may reflect specific capabilities of the forwarding engine, such as scheduling capabilities, but hides the particular forwarding technology (e.g., ATM vs. MPLS or IP forwarding). The flow controllers however do learn the local flow topology. This is given as a set of ports participating in the flow. (We use a bit-vector implementation for the set of ports. Another implementation might instead use a list of port references.) We assume that the forwarding engine does basic group management (e.g., adding a leaf). In addition the flow controller is notified of topology changes.

Figure 3: The Facility Access Primitives

Meta-controller actions:
  Assign(flow identifier, controller reference)
Notifications:
  topology-change<set of inputs, set of outputs>
Actions on flows (implicit argument: flow identifier):
  i) Reservations
    reserve-buffer(packets, bytes)
    reserve-bandwidth(bandwidth, set of ports)
    set-schedule(ordered list of {byte number, rate} pairs)
    set-attribute(list of {attribute, value} pairs)
  ii) Forwarding control
    iblock(subset of input ports): blocks input on the subset of ports specified.
    oblock(subset of output ports): blocks output on the subset of ports specified.
    delay(D-time, subset of output ports): schedules arriving packets at least D-time units after arrival.
Actions on packets (implicit argument: packet reference):
  release-at(time, subset of output ports): schedules packet for departure on a set of output ports.
  block(subset of output ports): blocks packet on the set of output ports specified.
  discard(): discards the packet.
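The bit-vector port set mentioned above, together with the oblock primitive, can be illustrated with a short sketch. The names iblock/oblock come from Figure 3; the classes themselves are our assumption of how a port-set implementation might look, not the prototype's code:

```python
# Hedged sketch of a bit-vector port set and output blocking (oblock).
# Each port occupies one bit; blocking a port masks it out of the
# forwarding topology without touching the data path.

class PortSet:
    def __init__(self, bits=0):
        self.bits = bits

    @classmethod
    def of(cls, *ports):
        bits = 0
        for p in ports:
            bits |= 1 << p
        return cls(bits)

    def contains(self, port):
        return bool(self.bits & (1 << port))

class FlowTopology:
    def __init__(self, output_ports):
        self.outputs = output_ports      # current forwarding topology
        self.blocked_out = PortSet()

    def oblock(self, ports):
        # Remove the given ports from the flow's output topology (Figure 3).
        self.blocked_out.bits |= ports.bits

    def forward_ports(self):
        # Ports a packet is replicated to: topology minus blocked outputs.
        return PortSet(self.outputs.bits & ~self.blocked_out.bits)
```

With a bit-vector representation, blocking and unblocking reduce to single word-sized mask operations, which is presumably why the prototype prefers it over a list of port references.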
This enables the controller to exert flow level forwarding control (connectivity management) as if it were only in the control plane, by specifying properties on a subset of the ports. The controller may block the flow for input on a subset of ports, causing all packets on those ports to be discarded. Similarly, the controller can block the flow on a set of output ports, removing those ports from the forwarding topology. These mechanisms allow the controller to do group management and to exercise floor control. In contrast to an in-data-path solution, these primitives support flow level "connectivity" management without being in the data-path.

Since the controller is activated asynchronously to the forwarding, there is a window of opportunity, namely from the time a packet arrives until it is forwarded, within which the control must run in order to see the packet. The controller may impose a fixed packet delay to increase the size of this window, thus relaxing real-time constraints on controller scheduling. This also reduces the overhead of context switching, by letting the controller work on multiple packets each time it runs.

Other primitives acting at the flow level are primitives for resource reservations and lower level policy selection. The controller may reserve buffers, specifying both a maximum number of bytes and a maximum number of packets. The latter allows the controller to efficiently limit the total number of packets buffered at the node, for example for active retransmission. The interface has a primitive to reserve bandwidth on a set of ports. The controller may interact with the underlying scheduler by providing it with a list of {start byte, rate} pairs, {bi, ri}. The semantics of this schedule is that after bi bytes, packets are forwarded at rate ri. If the controller is not activated for a long time, the rate simply remains unchanged. This supports flow level smoothing while minimizing the coupling between the controller and the forwarding, and allows us to do smoothing on a best-effort basis. The last primitive, set-attribute, supports assignment to named attributes. It is used for scheduling property selection, queue priority and more. Until there is strong convergence in reservation models, the reservation interface remains under revision.

The controllers may decide the fate of individual packets (see the subscribe/publish interface below). The release-at primitive supports scheduling of a packet for release on a subset of output ports. The primary use of this primitive is to schedule a retransmission of a packet (currently in the buffer) to react to downstream losses, but it also allows the controller to explicitly schedule a particular packet. A packet may be blocked on a set of output ports, enabling the controller to filter a stream. Lastly, a packet may be discarded.

Figure 4: The Message Exchange Primitives

send-to-meta-control(flow, type, data)
  type (one of): activate, inform, control
  data (type dependent):
    activate: policy type, code (reference), data
    inform: list of attributes.
    control: list of {attribute, value} pairs.
send-flood(flow, set of output ports, data)
send-next(flow, set of output ports, data)
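The {bi, ri} schedule semantics described above ("after bi bytes, forward at rate ri", with the last rate persisting) can be sketched as a simple lookup. This is our own illustration of the stated semantics, not the prototype's packet scheduler:

```python
def current_rate(schedule, bytes_sent):
    """Return the rate in effect after bytes_sent bytes, given a schedule
    of (start_byte, rate) pairs sorted by start_byte. Per the set-schedule
    semantics, after b_i bytes packets are forwarded at rate r_i, and the
    last rate remains in effect if the controller is not re-activated."""
    rate = schedule[0][1]
    for start_byte, r in schedule:
        if bytes_sent >= start_byte:
            rate = r
        else:
            break
    return rate
```

Because the forwarding engine only consults the schedule, the controller can fall arbitrarily far behind without stalling the data path: the last rate simply stays in force.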
6.2 The Message Exchange Interface

The message exchange interface provides a mechanism for asynchronous communication between controllers. All application (service) specific signaling is performed using this message exchange. Since these messages are arbitrary in size and content, and may in particular include program code, it is important that these controller-to-controller messages are not sent on a shared signaling channel. The message exchange interface supports exchanges between application(s) and meta-controllers, and exchanges between flow specific controllers (service specific signaling). The interface has three primitives: a) send-to-meta-control, b) send-flood, and c) send-next. Whereas the first is the primitive used by applications to interact with meta-controllers, the second and third are used by the application and the flow specific controllers to perform application level signaling. All of the primitives take a flow identifier as an argument. In addition, a meta-control message sent using the first primitive has two parameters: a request type, which is one of activate, inform, or control, and request data, which is request type specific. A meta-control message is sent to all meta-controllers of the flow unchanged (i.e., intermediate controllers cannot change it). The primary meta-control message is an activate message, used by applications to install and activate a control policy. The request data for an activate message contains three mandatory parameters: a flow identifier (which may be provided implicitly), a policy type name, and a policy implementation (reference), optionally followed by arbitrary policy specific parameters. The
policy implementation parameter either contains the code, or is a globally valid network reference, a URL for example, from where the policy implementation may be retrieved. The policy specific parameters are provided to the flow controller on execution. If an activate message specifies a flow identifier already associated with the same control policy, the meta-controller performs a "refresh," reinstalling a new version of the policy implementation. For an inform message, the data contains a flow identifier, followed by a list of attributes whose values are returned. If the attribute list is empty, a list of all attributes defined for the particular flow is returned. Similarly, the control message contains a flow identifier, followed by a list of attribute value pairs. The send-flood and send-next primitives are used for service specific (signaling) messages between installed controllers, and for (signaling) exchanges between meta-controllers. The meta-control messages are distinguished from the others by setting the flow identifier to zero. Both primitives take two additional parameters: a set of output ports, and service specific data. The message is output on the ports specified, and is either "flooded" in the case of send-flood or sent to the "next" flow specific controller(s) only. The layout of the service specific data is at the service/application's discretion and is not specified. The data layout for the meta-control exchanges is analogous to that of send-to-meta-control, containing a flow identifier, request type and request specific data. The only request type currently defined is migrate, taking as data a wrapped object (see the meta-control interface, Section 6.4).
6.3 The Subscribe/Publish Interface

The subscribe/publish interface allows flow specific controllers to subscribe to (request) events and information published (on request) by the forwarding engine. The three primitives of this interface are shown in Figure 5.

Figure 5: The Subscribe/Publish Primitives

subscribe-stats(flow identifier)
subscribe-peek(flow identifier, offset, length)
  offset - offset peeking begins
  length - number of bytes to peek at (0 indicates all)
subscribe-ignore(flow identifier)

The controller may subscribe to simple flow statistics, such as the number of packets and bytes transmitted since the last invocation, or the number of bytes (packets) currently in the queue. If the flow identifier is set to 0, the controller receives nodal statistics about queue length and packet loss rate. The second primitive, subscribe-peek, implements frame peeking, allowing the controller to subscribe to receive (peek at) a portion of the packet payload. Subscribe-peek does not cancel a subscription to statistics. The last primitive cancels all subscriptions. The controller gains access to its subscriptions when it is activated. A published peek event contains a packet reference, which in turn may be used by the controller to manipulate the packet through the interface. One of the benefits of this interface is that the flow controller may dynamically change the volume of data that goes through it. For example, a flow controller that performs selective discard during congestion can subscribe only to the flow statistics until a queue builds up, at which time it starts peeking at the application level framing to selectively discard the less important packets. Even during congestion the volume going through the controller is minimal.
This way the controller can manipulate the data-flow without being in the data-path.
6.4 The Meta-Control Interface
The controller has a meta-control interface that is used by the meta-controller for management of the dynamically installed control policies. Every controller must implement this interface.

Figure 6: The Meta-control Interface Primitives

create(data): create a controller.
clone(): clone a controller. Clone restarts at run().
delete(): destroy a controller.
run(): execute the flow controller. Is the only entry point.
wrap(): wraps the controller (and its state). Returns a wrapped controller (byte string).
unwrap(a wrap): reinstantiates and runs a controller.
go-next(set of output ports): migrates one hop.
go-flood(set of output ports): migrates to all.

The interface primitives include operations to create a flow controller, clone an existing controller, and destroy a flow controller. The create primitive has one (untyped) argument for initialization data. This is the controller specific data provided in the activate message to the meta-controller. A clone is an exact replica of the cloned controller (as in fork), including its state, but is executed using the run method. The delete primitive destroys a controller and reclaims resources allocated to it. A controller may be wrapped (for shipping or storage), and later (re)created (unwrapped). The execution of a new flow controller always starts by executing the method run. A flow controller may migrate, either jumping one hop (go-next) or being "flooded" (go-flood) to all nodes in the flow downstream of the specified set of output ports.
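The wrap/unwrap pair amounts to serializing a controller together with its state into a byte string and later resuming it via run(). A loose sketch, using Python's pickle as a stand-in for whatever encoding the prototype actually uses (the Controller class is purely illustrative):

```python
import pickle

class Controller:
    """Toy flow controller; run() is its only entry point, as in Figure 6."""
    def __init__(self, data):
        self.data = data          # policy specific initialization data
        self.invocations = 0      # mutable state that must survive wrapping

    def run(self):
        self.invocations += 1

def wrap(controller):
    # Returns a byte string capturing the controller and its state,
    # suitable for shipping (migration) or storage.
    return pickle.dumps(controller)

def unwrap(wrapped):
    # Reinstantiates the controller from a wrap and resumes it via run().
    c = pickle.loads(wrapped)
    c.run()
    return c
```

A migrate message (Section 6.2) would carry exactly such a wrapped byte string as its data, letting the next-hop meta-controller unwrap and resume the controller.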
7 Security
Although security is not part of our research, the architecture does have some helpful security implications. As the installed programs are executed for enhancement only, a router may choose not to execute a particular program. Running programs may be interrupted. Thus, just as in shared computing environments, local program termination is not essential for system correctness. As the forwarding model is not changed, global termination (correct delivery) is unaffected. Before installing code the meta-controller consults a nodal security policy manager. In our prototype the policy simply limits the set of servers from which it will fetch programs. Frame peeking gives the service specific programs read access through copying a portion of the datagram. The per packet frame peeking does not lock a packet from being forwarded or discarded by the forwarding engine. Write access is limited to the packet payload only. Non-interference (in name/address space) among the control programs is assured in our case through standard user level protection, and by limiting each user process to accessing and manipulating only the state of the flow it is assigned to. Resource consumption is managed by resource limits and scheduling.
8 Supporting Mechanisms
8.1 Exploiting the IPv6 Flow Label

Control-on-demand is flow oriented. Whereas in a connection oriented network this would simply mean that controllers are associated with connections, in an IP network it means that a sequence of packets is identified (classified) as belonging to the same flow. In our prototype we exploit the IPv6 flow label to explicitly identify flows. To define a flow, an end-system simply sets a (locally) unique flow label on a packet intended for that flow. Assignment of a non-zero flow label declares the intent to use this flow for "something special," and enables the end-system to refer to the flow for later attribute assignment. A router processes such a packet by creating an entry in a flow cache, which in our current prototype is simply a copy of the corresponding entry (in the regular cache) for the destination address. Subsequent packets with the same flow label are routed using this flow cache, effectively pinning the route of the flow. To assign a controller to the flow, we use hop-by-hop extension headers (see below). A new controller may be assigned to the flow at any time. Our use of the IPv6 flow label is consistent with its intended use, and does not change the softness of the state within the network. Since control-on-demand is enhancement control, and is never assumed, a router can discard the cached flow state at its discretion. In particular, policies used for invalidating state in the regular cache may be applied to the flow cache. Thus, explicit flow release on termination is not necessary.
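The soft flow cache described above can be sketched as a table keyed on the flow, with discretionary expiry standing in for the router's cache invalidation policy. The key structure, field names and timeout are our assumptions for illustration:

```python
import time

class FlowCache:
    """Illustrative soft-state flow cache. Entries pin the route (cached
    next hop) for labeled flows; since control-on-demand is enhancement
    only, entries may be discarded at the router's discretion, so no
    explicit release is needed when a flow terminates."""

    def __init__(self, max_idle=30.0):
        self.entries = {}
        self.max_idle = max_idle   # assumed idle timeout, in seconds

    def install(self, src, flow_label, next_hop, now=None):
        now = time.time() if now is None else now
        self.entries[(src, flow_label)] = {"next_hop": next_hop,
                                           "last_seen": now}

    def lookup(self, src, flow_label, now=None):
        # Route subsequent packets of the flow via the cached entry.
        now = time.time() if now is None else now
        entry = self.entries.get((src, flow_label))
        if entry is not None:
            entry["last_seen"] = now
        return entry

    def expire(self, now=None):
        # Discard idle entries, mirroring regular-cache invalidation.
        now = time.time() if now is None else now
        stale = [k for k, e in self.entries.items()
                 if now - e["last_seen"] > self.max_idle]
        for k in stale:
            del self.entries[k]
```

Because a discarded entry is simply re-created on the next labeled packet, losing cache state never breaks forwarding, only the enhancement.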
8.2 Frame Peeking

Although the semantics of control-on-demand are designed to reduce the real-time (on-line) processing requirements on the router control processor, acting in the data path ultimately requires viewing the data being transported. To enhance the efficiency of this we use a new primitive for frame peeking - a mechanism that enables a controller to peek at a portion of a datagram rather than the full frame. The primitive takes two parameters: a) an offset within the payload from where peeking starts, and b) a length, indicating the number of bytes to peek at. This primitive is motivated by the observation that most often the installed program only needs to peek at very few bytes in the application level header (see MPEG stream thinning below). Enabling the controller to peek only at parts of each packet/frame, as opposed to having to be fully in the data path, enhances efficiency in two ways: first, by reducing the bandwidth across the boundary between the forwarding engine and the installed programs, and second, by allowing the datagrams to remain in the buffers of the forwarding engine. In a software router, like our prototype, this benefit is primarily the reduction in data copying from kernel space to user space, where the controllers execute. In a more optimized high performance router, leaving the datagrams in the forwarding buffers provides an additional performance benefit. In particular, if the underlying forwarding hardware is an ATM switch, frame peeking reduces the packet reassembly needed and avoids segmentation completely, further enhancing performance.
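The essence of the primitive is that only the requested window of the payload crosses the forwarding/controller boundary. A one-function sketch of that semantics (our illustration, following the subscribe-peek parameters in Figure 5):

```python
def frame_peek(payload: bytes, offset: int, length: int) -> bytes:
    """Copy only the requested window of the payload to the controller.
    Per the subscribe-peek primitive, offset is where peeking begins and
    length is the number of bytes to peek at; length 0 means peek at all.
    The original datagram stays in the forwarding engine's buffer."""
    if length == 0:
        return payload
    return payload[offset:offset + length]
```

For MPEG stream thinning (Section 9.1), length is 1: a single byte per packet suffices to identify the frame type.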
Frame peeking and its benefits are not specific to control-on-demand and could benefit store-execute-and-forward approaches similarly by reducing the data copying across the abstraction boundary. However, frame peeking is performed at the point of buffering. In many routers this happens at the output queue, preventing frame peeking from being used as a basis for routing decisions.
9 Example Applications of Control-on-demand
To further verify the viability of our approach we have implemented numerous applications of control-on-demand. We discuss two of them below, and show how asynchronous application of the control program is sufficient to achieve the same results.

1.  Selective_Discard()
2.  While (1) {
3.      Sleep(some time);
4.      (Qinfo, matches, signature) := query(Q, offset, peekLen);
5.      if (Qinfo.Bytes < highWaterMark) continue;
6.      response r = new response(signature, ||matches||);
7.      for (i = 0; i < ||matches||; i++) {
8.          byte frame_type := matches[i];
9.          r[i] := NOOP;
10.         if (frame_type != I-frame and Qinfo.Bytes > lowWaterMark) {
11.             r[i] := DISCARD;
12.             Qinfo.Bytes -= packet->length;
13.         } // endif
14.     } // endfor
15.     respond(r);
16. } // endwhile
Figure 7: Code segment implementing selective discard for MPEG
9.1 Stream Thinning as Congestion Adaptation

We have implemented selective discard for MPEG video streams. MPEG is hierarchically coded using three types of frames: I-frames, P-frames and B-frames. Whereas the loss of an I-frame can affect all frames until the next I-frame (a group of pictures), losing a small number of P- and B-frames only degrades quality marginally. The MPEG selective discard policy discards the less important P- and B-frames in an attempt to protect the I-frames. Our approach to congestion adaptation using control-on-demand is based on the observation that during congestion queues are long and thus queuing time is significant. When run, the video stream controller executes a query (Figure 7, line 4) and blocks. When invoked, the query returns flow and queue statistics, and a list of entries for each packet currently in the queue, each entry containing the result of the frame peeking query. Most of the time the queue is small (or empty) and the controller goes to sleep. If however the queue exceeds a high water mark, indicating congestion, the controller prepares a vector of operations, one for each packet, discarding all B- and P-frames until a low water mark is crossed. On completion the controller returns this vector using the respond primitive (Figure 7, line 15).
This example also nicely illustrates the effectiveness of frame peeking for stream thinning. We used this scheme on a short MPEG movie, encoded into 26546 packets, a total of 13.86 MB. Most datagrams are of size 560 bytes, with a few of size 170 bytes, and an average of 522 bytes per datagram. Peeking at the one byte needed to identify the MPEG frame results in 27 KB (0.2%) of the data being copied to user space. Blocking a packet also takes only one byte of response, keeping the data copying at 0.2% of what an in-data-path solution would do. Enlarging the packets (the MTU in our testbed is 1500 bytes) would lower this ratio further. We conclude that frame peeking significantly reduces the data copying across the interface.
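The copy-reduction figure above follows directly from the stream parameters: one peeked byte per packet against the full stream size.

```python
# Worked check of the 0.2% figure using the numbers reported above.
packets = 26546          # packets in the test movie
total_bytes = 13.86e6    # total stream size, 13.86 MB
peeked = packets * 1     # one byte peeked per packet

ratio = peeked / total_bytes
# peeked is about 26.5 KB copied to user space; ratio is about 0.0019,
# i.e. roughly 0.2% of the data flow.
```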
9.2 Application-Specific Traffic Shaping

As another example demonstrating the effectiveness of control-on-demand, we have implemented a work-ahead smoothing service. The service performs application-specific traffic shaping to reduce the burstiness of a variable-bit-rate video stream, while avoiding underflow and overflow of the client playback buffer. The online smoothing service has two natural time scales: a medium time scale for computing a transmission schedule consisting of target transmission rates, and a fine time scale for coordinating the transmission of individual packets based on the schedule. The smoother subscribes to flow statistics, including packet sizes and timestamps, but does not need to copy the packet contents. When activated by the CPU scheduler, the smoothing controller executes code to generate a list of {byte, rate} pairs, which are given to the packet scheduler (using the set-schedule primitive). At the data-path level, the packet scheduler switches to rate ri after transmitting bi bytes. Hence, while the controller applies service-specific criteria to set the schedule, the performance-critical work of scheduling packets for transmission is delegated to the forwarding engine. For the smoothing to be effective, the CPU scheduler must run the smoothing controller before the schedule is depleted (it is acceptable to run it too early). Assuming that each schedule is used for transmitting several packets, this requirement is not hard to meet. Hence, using control-on-demand, application specific smoothing can be realized in an asynchronous manner with negligible impact on forwarding performance.
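How the controller might derive a {byte, rate} schedule from observed frame sizes can be sketched as follows. The averaging window here is entirely our assumption; the paper does not specify the smoothing algorithm, only that the controller emits (start byte, rate) pairs for set-schedule:

```python
def make_schedule(frame_sizes, window=3):
    """Illustrative work-ahead smoother: average the sizes of the next
    `window` frames to pick a constant target rate (bytes per frame
    interval) for that stretch of the stream, emitting (start_byte, rate)
    pairs in the form expected by the set-schedule primitive."""
    schedule = []
    start = 0
    for i in range(0, len(frame_sizes), window):
        chunk = frame_sizes[i:i + window]
        rate = sum(chunk) / len(chunk)   # smoothed bytes per interval
        schedule.append((start, rate))
        start += sum(chunk)              # rate changes after these bytes
    return schedule
```

The packet scheduler then works through this list on the data path, switching rates as byte thresholds are crossed, while the controller only wakes up often enough to keep the schedule from running dry.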
10 Conclusion
Control-on-demand is a new paradigm for active and programmable networks. Its service model provides sufficient richness to act in the data path, yet is efficient enough to make it practical. Control-on-demand does not adopt the store-execute-and-forward model, but retains the store-and-forward model unchanged. In particular, control-on-demand does not add any software in the critical forwarding path. The programs installed on demand are executed on a best-effort basis, at the discretion of the router, asynchronously from data forwarding. Control-on-demand state is enhancement state (including the programs) and is not needed for correct forwarding, thus maintaining one of the critical features of the Internet.
330
Gísli Hjálmtýsson and Samrat Bhattacharjee
The service model naturally allows the installed control programs to exploit lower-level facilities, in particular hardware facilities. In addition, through frame peeking, which allows the control programs to look at a fraction of the datagram payload, the controllers may adjust the number of bytes "peeked at" and thus control the degree to which they act in the data path. Our results show that the savings in bandwidth between controller and forwarding engine are significant. For stream thinning, we showed an example where, by peeking only at the one byte needed to determine the payload type, data copying was reduced to a mere 0.2% of the data flow. We have implemented a control-on-demand prototype on an IPv6 router. Through experimentation with the application of control-on-demand to a number of problems we have verified the functionality of the architecture and interfaces. We exploit the IPv6 flow label to facilitate flow-level state sharing and implement flow pinning, while retaining the softness of the flow state. We conclude that control-on-demand is sufficiently rich for a range of applications, yet at the same time efficient enough to be of practical value.
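The stream-thinning figure above can be made concrete with a small sketch. This is our own illustration, not the authors' code: we assume a frame-type tag in the first payload byte and an average packet size of about 500 bytes, which is where a 1-byte peek yields the quoted 0.2% copying fraction. All names and the frame encoding are hypothetical.

```java
// Illustrative frame-peeking thinning decision: only one peeked byte per
// packet crosses from the forwarding engine to the controller.
class ThinningController {
    // Hypothetical frame-type values; the actual payload encoding is not
    // specified here.
    static final byte I_FRAME = 0, P_FRAME = 1, B_FRAME = 2;

    private boolean thinning = false;       // set when downstream is congested

    void setThinning(boolean on) { thinning = on; }

    /** Decide from the single peeked byte whether the packet is kept. */
    boolean keep(byte frameType) {
        if (!thinning) return true;
        return frameType != B_FRAME;        // drop only disposable frames
    }

    /** Fraction of the data flow copied to the controller via peeking. */
    static double copiedFraction(int peekedBytes, int avgPacketBytes) {
        return (double) peekedBytes / avgPacketBytes;
    }
}
```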
Control on Demand
331
Agent Based Security for the Active Network Infrastructure

Stamatis Karnouskos, Ingo Busse, and Stefan Covaci

German National Research Center for Information Technology, Research Institute for Open Communication Systems (GMD-FOKUS), Kaiserin-Augusta-Allee 31, D-10589 Berlin, Germany
http://www.fokus.gmd.de/ima/
Abstract. Security in Active Networks is still in its infancy! This paper presents a new Agent-Based Security architecture for the Active Network Infrastructure (ABSANI). It explains why agents, in combination with Java, are considered the appropriate basis for a security architecture, and how this can be applied to Active Networks. An agent-based Active Node architecture is introduced and ABSANI is placed within that approach. Subsequently, the basic components of ABSANI are analyzed, arguing for the benefits they offer. Finally, an application scenario of Place-Oriented Virtual Private Networks is demonstrated.
1 Introduction
This approach integrates multiple parallel-evolving technology domains (Agents, Java, CORBA). We try to combine the benefits of Agent Technology and, where needed, of CORBA, in order to apply them successfully to the Active Networks domain. We briefly present these areas, show how each one can benefit the others, and explain where and why our approach stands today in relation to the ongoing research.
1.1 Active Network Technology

In recent years a variety of approaches have been pursued to provide a flexible, programmable network infrastructure that could "change its behavior at the drop of a dime". Active Network (AN) technology aims to move dynamic computation into the network, making it more intelligent not just at its end-points but also at the intermediate nodes. An Active Network is a group of network nodes (switches and routers, called Active Nodes hereafter) that support the deployment and execution of user applications (embedded in the user communications) without interrupting network operation. In this way, an Active Network is in a position to offer dynamically customized/programmed network services (e.g. connections) to its customers/users, and even enables users to inject their own applications to support their communication needs. Programmable networks open many new possibilities for innovative applications that are unimaginable with traditional data networks.

Stefan Covaci (Ed.): IWAN'99, LNCS 1653, pp. 330-344, 1999. Springer-Verlag Berlin Heidelberg 1999

This dynamic network programmability can be achieved by two different approaches:

I. In-band programming of the network nodes (also widely known as the capsule approach). The program is integrated into every packet of data sent to the network (the program is injected on the same path as the data). When these capsules arrive at an Active Node, the node evaluates the programs and adapts its functionality. The programs within the capsules are typically very small due to the size limitation of the packets and the transport overhead imposed by the capsule programs. Active Network programmability based on capsules is therefore limited. This is a definite drawback, especially in connection-oriented communication environments, where active node re-configuration/programming (triggered by the reconfiguration of network connections) is needed much less frequently than the processing of packet payload. It is neither necessary nor efficient to equip each data packet with a computation capability, as this adds too much overhead to the processing of packets. Capsules therefore have very low utility in such a context.

II. Out-of-band programming of the network nodes. Here the programs are injected into the node in a different session from the actual data packets that they affect. A user sends the program to the network node (switch/router), where it is stored; later, when data arrives, the program is executed to process that data. The data can carry information (e.g. special tags) that lets the node decide how to handle it or which program to execute. Within this approach, which makes a clear separation between data packets and programming packets, nodes can be programmed via injection of new program code into the active nodes, where injection can typically be done by specific packets (e.g.
mobile agents) that are evaluated at the network nodes. Our architecture supports exactly this approach. Finally, into this category also falls the notion of remote manipulation (binding) of the node's resources through a set of well-defined interfaces [1]. This is not considered a pure AN approach, since it provides high-level configurability and remote manipulation rather than programmability of the node. The difference between remote manipulation and active code injection is similar to the difference between an RPC-based and a Mobile Agent (MA)-based software design paradigm, where MAs can help to increase flexibility and robustness. In addition, the MA approach allows for load balancing of the active network services.
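The out-of-band model described in II can be sketched in a few lines: programs are installed in a separate session (e.g. by a mobile agent), and data packets carry only a tag that selects which pre-installed program processes them. This is an illustrative sketch of the idea, not part of the described architecture; all names are our own.

```java
// Minimal model of out-of-band node programming: tag-based dispatch to
// programs installed in a separate session from the data.
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class OutOfBandNode {
    // Programs installed via an injection session, keyed by tag.
    private final Map<String, UnaryOperator<byte[]>> programs = new HashMap<>();

    /** Injection session: install (or replace) a program under a tag. */
    void install(String tag, UnaryOperator<byte[]> program) {
        programs.put(tag, program);
    }

    /** Data path: run the program named by the packet's tag, else forward unchanged. */
    byte[] process(String tag, byte[] payload) {
        UnaryOperator<byte[]> p = programs.get(tag);
        return p == null ? payload : p.apply(payload);
    }
}
```

The point of the separation is that the (expensive, security-critical) install step happens once per program, while the per-packet cost is a single table lookup.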
1.2 Agent Technology

Software agents are a rapidly developing area of research. The research community has still not found a clear answer to the most popular question, "What is an agent?", and the debate goes on. A general answer could be: agents are software components that act alone or in communities on behalf of an entity and are delegated to perform tasks under some constraints or action plans. However, agents come in a myriad of different types, depending on their nature and their environment.
Examples are: collaborative agents, autonomous/proactive agents, interface agents, mobile agents, reactive agents, hybrid agents, intelligent/smart agents, mental/emotional agents, etc. This categorization is not unique and depends on which attributes an agent exhibits to the greatest degree. Of course there can be mixed agents, i.e. an intelligent agent can also be mobile. In our Active Network infrastructure a variety of agents can be used, e.g.:
• Intelligent agents that reside on the node and "intelligently" configure the node's resources for optimal performance.
• Mobile agents that can be "dumb" but execute trivial tasks in all nodes of the Active Network infrastructure.
• Collaborative agents that work in teams and take care of security within an Active Network domain, e.g. automatic certified security updates on the AN nodes, or elimination of denial-of-service attempts by blocking the source of the attack at the nearest AN node.
Mobile agent systems provide the AN infrastructure with many advantages. MAs break with the client/server model and eliminate its limitations. They make networks more robust, since the hold time for connections is reduced to the time required to move an agent; the agent carries credentials, so the connection is not tied to constant user authentication; load balancing can be achieved, as there is no request flow across the connection to "guide" the agent and return results; and there have even already been standardization efforts defining interoperable interaction between agent systems [2].
2 Motivation
Security in Active Networks is still in its infancy! Active node programming is typically a security-critical activity, and in such a programmable network the security implications are far more complex than in current environments. Although there has been some research concerning the security of ANs, little or no effort has been made to create a dynamic, extensible, configurable and interoperable security architecture. ANs demand that this security architecture be as programmable and evolvable as possible. Extensive and expensive authentication measures are necessary to protect the active node resources from malicious intrusions. Such security measures cannot be applied on a per-packet basis due to their time and space requirements. Our solution is an Agent-Based Security Architecture for Active Networks. With this approach we do not seek a one-sided technological answer to the AN security problem but the integration of parallel-evolving technologies. ABSANI aims at integrating cutting-edge technologies in order to produce a high-security architecture and to deal with the advanced security threats that Active Network technology introduces. There is no need to re-invent the wheel in the security approach we take. By building upon existing security schemes we make sure that our architecture is open and interoperable. We understand that these are parallel developing domains into which much research effort has been invested in recent years and which will keep on evolving fast. By integrating state-of-the-art components we make sure that our architecture stays up to date and advances/adapts to current needs as its components evolve. That not
only benefits its internal/external security but also its lifetime. Within the ABSANI architecture we try to encompass the flexibility and special characteristics of agent technology.

Fig. 1. Security Threats to Agent-Based Applications (the figure shows threats against agencies and hosted agents, such as resource abuse, masquerade, repudiation and denial-of-service attacks; threats against inter-agency communication, such as eavesdropping, alteration and record/replay; and threats against the user, such as eavesdropping, alteration and masquerade)
We use the agent-based approach to program an Active Node. In such an environment the author of the MA code, the user, the owner of the hardware and the owner of the execution platform can be different entities governed by different security policies in a heterogeneous environment. As Fig. 1 shows, security in such an environment is an extremely sensitive issue. The hosts have to be protected from malicious agents, and the agents themselves have to be protected from malicious hosts or from other malicious agents that could attack them. Moreover, the communication paths between the AN nodes have to be protected with state-of-the-art security techniques. Both the agent community and the AN community work on these topics. Our open security architecture ensures that future solutions in the agent security domain will be applicable to our approach, thereby strengthening the node's protection system.
3 The Active Network Architecture

Fig. 2. Active Network Infrastructure (the figure shows AN nodes, each consisting of an Agent AN add-on on top of a legacy router, interconnected through plain legacy routers; a user's AN or legacy router attaches to this network)
The Active Network Infrastructure is seen as a network of co-existing AN nodes and legacy nodes. The user initiates agents that traverse the network and configure the Active Nodes. In Fig. 2 the user has initiated an agent to change the behavior of AN Node #2 and AN Node #3. The agent visits the target node and executes. Then, having fulfilled its tasks, it moves to the next AN node via the legacy router, where it executes again. Our notion of an Active Node architecture embeds agent technology (illustrated in Fig. 3). As we can see, agents can empower current routers and transform them into Active Nodes. The resources of the node can be accessed and controlled by visiting agents, according to the node's policy schemes.
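The traversal just described can be sketched as a mobile agent carrying a task and an itinerary of active nodes, executing at each one in turn while legacy routers merely forward it. This is an illustrative model of the behavior in Fig. 2, not the actual agent platform API; all names are our own.

```java
// Toy model of an agent traversing the AN nodes of its itinerary.
import java.util.List;
import java.util.function.Consumer;

class MobileAgent {
    private final List<String> itinerary;   // AN nodes to visit, in order
    private final Consumer<String> task;    // what the agent does at each node

    MobileAgent(List<String> itinerary, Consumer<String> task) {
        this.itinerary = itinerary;
        this.task = task;
    }

    /** Visit each AN node in turn and execute the task there. */
    void run() {
        for (String node : itinerary) task.accept(node);
    }
}
```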
Fig. 3. Active Node Architecture (an Agent AN add-on, containing several execution environments and an agent platform with CORBA and SNMP interfaces, sits on top of an abstraction layer over the legacy router's resources and routing hardware)
4 The Security Architecture
Security cannot be an afterthought! It has to be integrated with the node's core function, not implemented at the end as an extra, optional or explicitly called service. The new security architecture for ANs proposed hereafter is based on mobile agent technology. Wherever we detect significant benefits we make use (Fig. 4) of the Common Object Request Broker Architecture (CORBA) [3], which is today an established standard that enhances the original RPC-based architectures by allowing relatively free and transparent distribution of service functionality. Currently no standard exists that handles interoperability between different agent platforms and the usability of CORBA services by agent-based components. By further developing this architecture we hope to provide feedback to future standardization efforts.

Fig. 4. Technology view of ABSANI (the Active Node security services are realized by an agent system and CORBA on top of the node resources)
The architecture consists of Places that interact with the core of the architecture (Fig. 5). The communication takes place mainly between the enforcement engines and the Resource Managers. In detail, the architecture consists of the following components:
Fig. 5. Overall architecture view (each Place contains an Enforcement Engine, a Resource Manager, and Credential, Policy and Component DBs together with a Cache and an Audit facility; a dedicated Management Place and node-level counterparts of these components, including a Node-Enforcement Engine and Node-Resource Manager, serve the node as a whole)
4.1 Place

A Place is a context within an agent system(1) in which an agent is executed. This context can provide services and functions such as access to local resources. A place is associated with a location, which consists of a place name and the address of the agent system within which the place resides. A Place can be used in different ways. Places are:
• Dynamically assigned to agents as they enter the node. The criteria can vary, e.g. all agents coming from a specific user, or all agents belonging to a specific policy scheme. A policy manager and a resource manager are assigned to the Place and are given the general security guidelines, which can never be bypassed. If an agent has sufficient credentials then it can fully interact with the components,
(1) An agent system is a platform that can create, interpret, execute, transfer, and terminate agents. An agent system is uniquely identified by its name and address. One or more Places reside within an agent system.
e.g. change the Place's policy, ask for more resources, insert elements into the component database, etc.
• Statically assigned per entity (e.g. user, enterprise). Here, static resources are given to the Place and the local Resource Manager manages them. In this way it is possible for an enterprise to set up a network of Places on various nodes, creating a Place-Oriented Virtual Private Network (PO-VPN). This offers several advantages, e.g. secure communication between company-trusted places.
4.2 Policy DB

The Policy Database is responsible for maintaining all policy schemes. By separating the Policy DB from the Enforcement Engine we introduce a dynamic way of modifying policy within the node. We use an already existing language to define the policies stored in the database. The security policy defines the access each piece of code has to resources. Signed code can run with different privileges based on the key that signed it; thus users can tune their trade-off between security and functionality (within the allowed limits, of course). We make use of the principle of least privilege. This principle states that only the minimally powerful authority should be used to authorize a request for access, so that any mistakes by privileged users lead to the least possible damage. Following this thought, a principal with the authority to do many different things should be able to indicate which one of those authorities is to be used in a specific request. For example, an administrator wants to back up the node's databases. He holds two keys: the Supervisor_Key (allowed to do anything within the DB) and the Read_Key (allowed only to read the DB). He should use the second key to back up the DB; then, even if something goes wrong, no modification of or damage to the DB can occur. Any attempt to describe the security policy in terms of each individual principal's authority to access each individual object is neither scalable nor understandable for those instituting the policy. It has therefore been proposed to group principals and objects into sets with common attributes, where the attributes, rather than the individual identities, are used in making security decisions. So we have role-based policies, group policies, clearance labels, domains, etc. We are also experimenting with the KeyNote Trust Management System [4] in order to realize flexible policies. In any case, policy files are human-readable and understandable.
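The least-privilege rule and the role-based grouping above can be sketched as follows. This is an illustrative model, not the Policy DB's actual interface: each key is bound to a role, each role to a set of permitted actions, and a request is authorized using only the authority of the key the principal presents. Role and action names are our own.

```java
// Illustrative role-based authorization honoring least privilege: the
// check uses only the authority of the presented key.
import java.util.Map;
import java.util.Set;

class PolicyCheck {
    private final Map<String, String> keyToRole;          // key -> role
    private final Map<String, Set<String>> roleToActions; // role -> permitted actions

    PolicyCheck(Map<String, String> keyToRole, Map<String, Set<String>> roleToActions) {
        this.keyToRole = keyToRole;
        this.roleToActions = roleToActions;
    }

    /** Authorize the action using only the role bound to the presented key. */
    boolean authorize(String key, String action) {
        String role = keyToRole.get(key);
        return role != null && roleToActions.getOrDefault(role, Set.of()).contains(action);
    }
}
```

The administrator's backup example falls out directly: presenting the Read_Key authorizes reading the DB but never writing it, even though the same person also holds the Supervisor_Key.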
4.3 Credential DB Credentials of principals/code & components are stored in this database. A principal is an entity that can make a request for access that is subject to authorization. Security relies not only to the authentication of the entity but also to the
activities it wants to perform. The credentials combine a description of the identity of the principal with attributes associated with the principal and the actions it wants to perform, in order to decide whether it is allowed to do what it asks for. Scenario: a principal may want to execute code that is not trusted (while the principal itself is trusted). At a strict node security level this should be denied. The Enforcement Engine therefore checks a) whether the principal is trusted and allowed to perform the desired action, and b) whether the code it wants to execute is trusted. X.509v3 and SPKI certificates [5] are used as credentials in a heterogeneous environment, with a key used as the primary identification of a principal. The credentials include a hash of the content, the list of signers and their signatures, certificates, and other information associated with the specific action or agent. Credentials can be associated with various components such as agents, code, policies, etc. Credentials are used to:
• Verify that the component was created/distributed/authenticated by the claiming principals.
• Verify that the component has not been altered after it was signed.
• Partially fulfil the non-repudiation requirement, so that the originator of the code cannot deny it.
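The first two uses of credentials, origin and integrity verification, can be sketched with the standard java.security API. The concrete algorithm (SHA-256 with RSA) is our choice for illustration, not one mandated by the architecture; certificate handling and the full credential structure are omitted.

```java
// Sketch of credential verification: confirm a component was signed by the
// claiming principal's key and has not been altered since signing.
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

class CredentialCheck {
    /** Sign the component bytes (done by the originating principal). */
    static byte[] sign(byte[] component, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(component);
        return s.sign();
    }

    /** True only if the signature matches both the content and the signer's key. */
    static boolean verify(byte[] component, byte[] sig, PublicKey signer)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(signer);
        s.update(component);
        return s.verify(sig);
    }
}
```

A failed verification covers both threat cases at once: either the content was altered after signing, or the claimed signer did not produce the signature.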
4.4 Component DB

The Component Database can be considered a general database of active code, protocols, etc. It can also be used for caching agents' code, but its use extends far beyond simple caching. As we will demonstrate, it is an integral part of this architecture that strengthens overall security. Security is by nature an overhead on communication and execution, incurred in order to protect the system. We accept that. Yet there are ways to minimize this overhead (under certain conditions) while fortifying the security of the node.
The multiple re-visit by the same agent scenario: An agent performs multiple visits to the node. Each time, we verify the agent's credentials, place it within a specific policy framework, check it while it executes, and authorize every call it makes to other objects or resources it wants to use. Obviously, if this agent is a frequent visitor, it is wasteful to re-apply the same actions again and again. A caching scheme must be used, and this caching can be done at different levels. We can cache the agent's code, the agent's credentials and the components that the agent needs, and monitor the agent's use of resources and associate it with a specific agent code. Then, the next time the agent comes to the node, we have to verify neither its user nor its code. Also, as it has executed before, we know approximately what its behavior and needs are. Furthermore, we have its verified, checked and authorized code stored in the Component DB. Thus we take the code of the agent (which we trust) from our Component DB, and only the data of the newly arrived agent, avoiding the repetition of time-consuming authorizations. Of
course this is a policy matter and can be changed, but the node should have the means to provide this flexibility, and to do that we need the Component DB.
The common component usage scenario: As before, agents visit our node. In this case the distinguishing characteristic is not that the agents' code is the same (as in the previous scenario), but that they make use of similar components. For example, in order to execute an action the agents need some special protocol or some special cryptographic module. We (the Node Manager or even the Place Manager) can provide such components in the Component DB, signed and tested in the specific environment. An agent can then call those components to perform its actions. As all components are signed, the agent can decide whether it is safe to use them or not. Such a DB serves multiple purposes: the agent can be lighter, as there is no need to carry everything it needs; node security is strengthened, as the node executes components that have been thoroughly tested by the Node Provider; and all actions are faster, as the overhead due to security actions is minimized.
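The re-visit scenario can be sketched as a content-addressed store: verified code is keyed by a hash of its bytes, so a returning agent whose code hashes to a known entry skips re-verification and only its fresh data needs checking. This is an illustrative sketch, with SHA-256 as our assumed hash; the actual Component DB structure is not specified at this level.

```java
// Illustrative Component DB cache: code is admitted only after full
// verification, then recognized on later visits by its content hash.
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class ComponentDB {
    private final Map<String, byte[]> verifiedCode = new HashMap<>();

    /** Content hash used as the cache key. */
    static String hash(byte[] code) {
        try {
            return Base64.getEncoder()
                         .encodeToString(MessageDigest.getInstance("SHA-256").digest(code));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    /** Store code only after the full (expensive) verification has succeeded. */
    void admit(byte[] code) {
        verifiedCode.put(hash(code), code.clone());
    }

    /** True if this exact code has been verified before and can be reused. */
    boolean isKnown(byte[] code) {
        return verifiedCode.containsKey(hash(code));
    }
}
```

Keying by content hash rather than by agent name means a modified agent body never matches a cached entry, so the cache cannot be used to smuggle altered code past verification.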
4.5 Resource Manager

A Resource Manager is available to handle the resources:
• Place Resource Manager: handles the resources dedicated to a specific Place. It can also be contacted directly by the agents residing in the associated Place, for example when more resources are needed.
• Node Resource Manager: handles the local node resources. It is contacted via the Node Security Manager or via the Place Resource Manager (Fig. 6). It is also the gateway to the resources of other nodes. An interface specifies how this security architecture interacts with the Resource Manager.

Note that the resources available to a certain Place are transparent to the agent. This means that local resources can be extended via CORBA in order to access resources on other AN nodes. This helps with the Place-Oriented Virtual Private Network (PO-VPN), as we will explain later.
4.6 Cache

The Cache is another essential part of the architecture, introduced to improve performance. Security checks are time- and compute-consuming processes. To avoid duplicating security checks all the time, we maintain a cache. Caches exist in all Places and are accessible via the Security Enforcer only. Security checks that have been performed by the Enforcement Engine are stored in the cache with a time limit. If the time limit has expired, the security checks are performed again; otherwise the cached security check is considered valid and is used by the system.
The Policy DB can be dynamically updated via the Enforcement Engine at any time. This raises the problem that the cache may contain outdated information. We solve this problem by deleting, each time the policy for an entity changes, the cached security checks associated with that key/person, partially or completely. The next time a security check is requested it will not be found in the cache and will be performed from the beginning. This method speeds up our architecture while keeping cached decisions consistent with the current policy.

Fig. 6. Component Communication View (agents in a Place interact with the Place-level Enforcement Engine, Resource Manager, Credential/Policy/Component DBs, Cache and Audit, which in turn communicate with their node-level counterparts)
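The cache's two expiry mechanisms, a time limit per entry and explicit invalidation when an entity's policy changes, can be sketched as follows. This is an illustrative model with names of our own choosing, not the actual cache interface.

```java
// Illustrative security-check cache: entries expire after a TTL, and a
// policy change for an entity invalidates that entity's cached checks.
import java.util.HashMap;
import java.util.Map;

class SecurityCheckCache {
    private record Entry(boolean allowed, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new HashMap<>();
    private final long ttlMillis;

    SecurityCheckCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Record the outcome of a full check performed by the Enforcement Engine. */
    void put(String principal, String action, boolean allowed) {
        cache.put(principal + "|" + action,
                  new Entry(allowed, System.currentTimeMillis() + ttlMillis));
    }

    /** null means "not cached or expired: perform the full check again". */
    Boolean lookup(String principal, String action) {
        Entry e = cache.get(principal + "|" + action);
        if (e == null || System.currentTimeMillis() > e.expiresAtMillis()) return null;
        return e.allowed();
    }

    /** Called whenever the policy for this principal changes. */
    void invalidate(String principal) {
        cache.keySet().removeIf(k -> k.startsWith(principal + "|"));
    }
}
```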
4.7 The Node Management Place

A special dedicated Place, the Management Place, is responsible for changing the node's general behavior (policy, DBs, etc.). Agents that execute in this environment are "privileged" agents and are under the highest security controls. As they are able to modify the node databases and its security scheme, extra care has to be taken. In general, this environment should be restricted to Node Administrators only. Normal users can change the behavior of the Places assigned to them, but they are not able to contact or execute within the isolated and highly protected Management Place. Provisioning and configuration are done only via the Management Place.
4.8 Auditing

Experience has shown that 100% security is difficult, if not impossible, to realize, due to the multiple factors that interfere. Collecting the data generated by network activity provides a useful tool for analyzing the existing security and also for tracing back (if possible) the originators of a security breach. Audit data include any attempt to reach a different security level or to change entries in the system's databases. Intrusion attempts can also be detected via the audit: e.g. when we see repeated failures in attempts to use a component or service, we can adapt our policy and behavior to prevent a possible intrusion. The more detailed the audit process, the better various activities can be debugged and protected from repeated errors or misconfigurations.
4.9 Enforcement Engine

The Enforcement Engine enforces the policy on the Node and on the Places. An Enforcement Engine must satisfy three important rules:
• It is always invoked. The Enforcement Engine should not need to be called explicitly. Each action should be evaluated and allowed only if it complies with the policy.
• It is tamperproof. The information the Enforcement Engine relies on must not be alterable in any way by unauthorized third parties. This calls for signed objects that no one can alter.
• It is verifiable. The Enforcement Engine boots from trusted, unchanged base code; its abilities can then be expanded.

The Node Administrator can use a GUI to edit the Node Policy and Credential Databases before the system runs. Place Administrators can alter their Policy and Credential DBs via the Agent Interface.
5 The Language Decision
One approach is to design a new language tailored to the needs of active networking and of our system. The difficulties would be: i) designing from scratch a new language with the full set of desired features (e.g. safety, performance); ii) if we did not manage to address all the features users require, it would be impossible for users to implement the mobile code they want; iii) it would require a huge amount of work to keep the language up to date with all needs; iv) it would be used by a limited number of people (AN people only), and therefore bugs and errors would be seldom, if ever, reported. The other approach is to use an existing language. Java is a very popular language designed especially for mobile code and with security in mind. Multiple research (and other) domains use this language; therefore bugs and errors are found and reported quickly. The language is a commercial product and advances as, day by day,
new features/libraries are added. Java is also a safe language. The basic security concepts in Java are based on the following components: the language design, the byte code verifier, the class loader and the security manager. In the following, each part is presented and investigated with regard to how it can support a security model for the agent platform. First, regarding language design: several mechanisms inherent to Java provide protection against incorrect programs, notably a strictly typed language; careful control of casts; the lack of pointer arithmetic; automatic memory management, including garbage collection, to avoid memory leaks and dangling pointers; and checks of array references to ensure that they are within the bounds of the array. Even though the compiler performs thorough type checking, there is still the possibility of an attack via the use of a "hostile" compiler. Since the agency does not load the source code of an agent but already compiled code in the form of class files, there is no way of determining whether the bytecodes were produced by a trustworthy compiler or by an adversary attempting to exploit the agency. Therefore a class verifier is called for. The class verifier [6][7] of Java is used to check every class that is loaded into the Java virtual machine over the network. Before any loaded code is executed, the class is scanned and verified to ensure that it conforms to the specification of the Java virtual machine. The class verifier operates in four passes. The first pass checks that the class file conforms to the class file format. The second pass performs all verification that can be done without looking at the bytecode; this includes, for example, a check whether final classes or methods are subclassed or overridden, respectively.
The third pass is a data-flow analysis of each method, ensuring that there will be no stack over- or underflow, that registers always hold a value when accessed, that methods are called with appropriate arguments, that types are used correctly, and that the opcodes have appropriately typed arguments on the stack and in the registers. This pass is also referred to as the bytecode verifier. The fourth pass is done at run time. It ensures, for example, that a method exists when it is called, i.e. it guarantees that the symbolic references resolve. The class loader [8] first checks the local codebase of an agency. If a class is available locally, it is loaded from the local codebase rather than over the network. This prevents the system classes, with their access control checks, from being replaced. In addition, the class loader sets the protection domain. The security manager [9] is contacted whenever sensitive system resources, such as the file system or the network, are accessed. A check method is called in order to determine whether the calling entity has the required access permissions. To distinguish between the access of foreign classes and the access of system classes, the call stack is analyzed: the class loader of each call on the stack is determined, and the effective permissions are the intersection of the permissions of each protection domain contained in the class loaders.
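The stack-inspection rule just described, in which the effective permissions are the intersection of the permissions of every protection domain on the call stack, can be sketched in a few lines of Java. This is a minimal illustration with invented permission strings and a simplified domain representation, not the real security manager machinery:

```java
import java.util.*;

// Illustrative sketch only: the permission strings and the representation of a
// protection domain as a Set<String> are our simplification for this example.
public class StackInspection {

    // The effective permissions of a call are the intersection of the
    // permissions granted to every protection domain on the call stack.
    static Set<String> effectivePermissions(List<Set<String>> stackDomains) {
        if (stackDomains.isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> result = new HashSet<>(stackDomains.get(0));
        for (Set<String> domain : stackDomains) {
            result.retainAll(domain); // set intersection
        }
        return result;
    }

    public static void main(String[] args) {
        // A trusted system class (many permissions) invoked on behalf of an
        // untrusted agent (few permissions): only the common subset survives.
        Set<String> systemDomain = Set.of("file.read", "file.write", "net.connect");
        Set<String> agentDomain  = Set.of("file.read");
        System.out.println(effectivePermissions(List.of(systemDomain, agentDomain)));
    }
}
```

The intersection ensures that an untrusted agent cannot gain extra rights by routing a sensitive call through a trusted system class.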
Stamatis Karnouskos et al.

6 Design Goals Fulfilled
This security architecture has been designed with the following guidelines in mind: • Simplicity. The model is as simple as possible to understand and administer. The simpler it is, the better the security architecture functions and evolves. • Scalability. Our security architecture can be applied to everything from small systems with few agents and nodes up to large intra- and inter-enterprise ones. To ensure this we: i) provide flexible and advanced policy and access controls (role-based security etc.); ii) support various domains that enforce different policies; iii) manage the distribution of data and cryptographic keys across the network without human intervention. • Flexibility. This is probably the most significant driving force in the design of this security architecture. Flexibility is maximized for end-users and administrators, but not at the cost of safety and security. The choice of access-control policy, the choice of audit policy, and security functionality profiles are some examples. • Interoperability. The architecture uses CORBA for interoperability reasons. CORBA guarantees consistent security schemes among heterogeneous systems where different ORBs from various vendors are deployed. We raise this security to a higher level so that the agent world can use these advantages. We also use Grasshopper [10], a MASIF [11] compliant agent platform. • Performance. The trade-off between performance and security is a perennially controversial issue within the research community. Security is by its nature overhead, yet different users have different needs, so we cannot simply provide a homogeneous security facility. Security should be user- or even task-specific. The super-user who enforces a specific security scheme (in a Place or a node) should also be able to decide on the trade-off between security and performance (within limits, of course).
For better performance we have included caches within each execution place, as well as Component_DBs where code can reside. We also hope that the performance of Java itself will improve in the future. • Object-Orientation. The interfaces are purely object-oriented. We believe this promotes the system's integrity and hides the complexity of the security mechanisms behind simple interfaces. These interfaces can be changed or enhanced at any time without affecting the way the architecture or its users use them. This approach also offers survivability, as well as the ability to advance and adapt to future needs. • Access Control. Access control aims at preventing an agent from accessing unauthorized resources. In our security architecture, calls to resources are intercepted, and the Security Manager is then called in order to decide whether the call complies with the
Policy. If so, the action is allowed; otherwise it is denied and an error code is returned. Primarily, the following goals must be satisfied: Safety - a safe system limits the possibility that an agent will write to another agent's namespace and thereby bring it into an unstable, false, or unintended state. Privacy - agents should not be able to access the address space of another agent and read its data. • Safety. The use of a safe language such as Java provides some guarantees concerning safety. • Conditional Access. Most traditional operating systems simply deny or allow access. With our security architecture we are able to allow conditional resource access. For example, an agent can request more memory in order to execute additional tasks. The PlaceResourceManager contacts the NodeResourceManager and requests, e.g., more memory. The Enforcement Engine checks whether this request complies with the current policy and, if so, more memory is dynamically assigned to the Place for a certain time.
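The conditional-access flow above can be sketched as follows. The class names (NodeResourceManager, Enforcement Engine) follow the paper's terminology, but the method signatures and the per-Place memory-quota policy are our own illustrative assumptions:

```java
// Illustrative sketch, assuming a simple per-Place memory quota as the policy;
// class names follow the paper, but the API shown here is invented.
class EnforcementEngine {
    private final long maxMemoryPerPlace;

    EnforcementEngine(long maxMemoryPerPlace) {
        this.maxMemoryPerPlace = maxMemoryPerPlace;
    }

    // The policy check: does the Place stay within its memory quota?
    boolean complies(long currentBytes, long requestedBytes) {
        return currentBytes + requestedBytes <= maxMemoryPerPlace;
    }
}

class NodeResourceManager {
    private final EnforcementEngine engine;

    NodeResourceManager(EnforcementEngine engine) {
        this.engine = engine;
    }

    // Called by a PlaceResourceManager on behalf of an agent; returns the new
    // allocation, or -1 as an error code when the policy denies the request.
    long grantMemory(long currentBytes, long requestedBytes) {
        if (engine.complies(currentBytes, requestedBytes)) {
            return currentBytes + requestedBytes;
        }
        return -1;
    }
}

public class ConditionalAccessDemo {
    public static void main(String[] args) {
        NodeResourceManager node = new NodeResourceManager(new EnforcementEngine(64_000_000L));
        System.out.println(node.grantMemory(32_000_000L, 16_000_000L)); // granted: 48000000
        System.out.println(node.grantMemory(32_000_000L, 64_000_000L)); // denied: -1
    }
}
```

The point of the indirection is that the grant decision is not hard-coded in the resource manager: the enforcement engine can apply a different policy per Place or per domain.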
7 Place-Oriented Virtual Private Networks - An Application Scenario

An application scenario is introduced here in order to show the flexibility and advantages ABSANI offers. We introduce the concept of Place-Oriented Virtual Private Networks (PO-VPNs). VPNs offer enterprises the opportunity to construct their own network and administer it in the way that best suits their needs. The ABSANI architecture has been designed with this goal in mind as well: the offering of Places which can be leased to third-party entities and managed by them. This of course assumes partitioning or multiplexing of the available resources. An enterprise can obtain, for its needs, a Place in strategically located active nodes, thus constructing a PO-VPN. As it can manage all policies and resources on the assigned Place, it has complete control (within the limits set by the node operator) over the PO-VPN. In effect, this looks like a distributed agency spread over various nodes. A possible scenario is that an enterprise wants to create a PO-VPN. As various providers would offer services at various prices, it could be beneficial to choose among the best offers not overall (as a bundled package) but partially (selecting specific services). What exactly do we mean by that? Suppose one provider offers high-speed processing power (fast CPUs) but limited storage capacity at its node, while a second offers a better price on a huge amount of storage but low processing power. The user should be able to combine both, e.g. execute the code on the fast node and have the results stored on the slow node with the extended storage. This is of course difficult to realize at the moment, but we would like to leave the possibility open, as it could be a future evolutionary step. In any case, such scenarios are supported by the security architecture we have presented here.
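The per-service provider selection sketched above can be illustrated with a toy example. The provider names and prices are invented, and real offers would of course cover more dimensions than a single price:

```java
import java.util.*;

// Toy illustration of per-service provider selection for a PO-VPN;
// the providers, prices, and the cheapest() helper are hypothetical.
public class ProviderSelection {

    // Picks the provider with the lowest price for one service.
    static String cheapest(Map<String, Double> offers) {
        return Collections.min(offers.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Double> cpuOffers     = Map.of("ProviderA", 10.0, "ProviderB", 15.0);
        Map<String, Double> storageOffers = Map.of("ProviderA",  8.0, "ProviderB",  5.0);

        // Selecting per service combines ProviderA's fast CPUs with
        // ProviderB's cheaper storage, instead of taking one bundle.
        System.out.println(cheapest(cpuOffers));     // ProviderA
        System.out.println(cheapest(storageOffers)); // ProviderB
    }
}
```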
8 Summary and Conclusions
An agent-based security architecture has been presented. ABSANI uses mobile agent technology and the benefits that derive from it in order to apply them in the active network domain. With an agent-based active node architecture, we have demonstrated how agents can empower current passive routers and transform them into active nodes. We have placed the security architecture within these AN nodes and have shown that benefits such as simplicity, scalability, flexibility, interoperability, performance, and safety are addressed successfully. The agent community has invested a great deal of effort in making mobile code secure and flexible, and the objectives of active networks can be achieved via our approach. It provides a dynamic, extensible, configurable, and interoperable way to secure active networks. Moreover, with the use of Java we can guarantee a high level of safety. Furthermore, by combining approaches (in a Lego-like way) we enhance not only the interoperability of our architecture but also its lifetime. ABSANI offers a security scheme that deals successfully with the current needs of secure active networking, and it will continue to evolve rapidly as long as agent technology keeps advancing.
References

1. C. M. Adam, J.-F. Huard, A. A. Lazar, K.-S. Lim, M. Nandikesan, E. Shim, "Proposal for Standardization of ATM Binding Interface Base 2.1", submitted to P1520, January 1999.
2. Mobile Agent System Interoperability Facility, http://www.fokus.gmd.de/research/cc/ima/masif/
3. OMG Web Site, http://www.omg.org/
4. The KeyNote Trust-Management System, http://www.cis.upenn.edu/~angelos/keynote.html
5. Simple Public Key Infrastructure, http://www.ietf.org/html.charters/spki-charter.html
6. F. Yellin, "Low Level Security in Java", 1997, deleted from www.javasoft.com.
7. B. Venners, "Security and the Class Verifier", JavaWorld, October 1997, http://www.javaworld.com/javaworld/jw-10-1997/
8. B. Venners, "Security and the Class Loader Architecture", JavaWorld, September 1997, http://www.javaworld.com/javaworld/jw-09-1997/
9. B. Venners, "Java Security: How to Install the Security Manager and Customize your Security Policy", JavaWorld, November 1997, http://www.javaworld.com/javaworld/jw-11-1997/
10. IKV++ GmbH, Grasshopper, http://www.ikv.de/products/grasshopper