Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5764
Sergey Balandin Dmitri Moltchanov Yevgeni Koucheryavy (Eds.)
Smart Spaces and Next Generation Wired/Wireless Networking 9th International Conference, NEW2AN 2009 and Second Conference on Smart Spaces, ruSMART 2009 St. Petersburg, Russia, September 15-18, 2009 Proceedings
Volume Editors Sergey Balandin Nokia Research Center Itamerenkatu 11-13, 00180 Helsinki, Finland E-mail:
[email protected] Dmitri Moltchanov Yevgeni Koucheryavy Tampere University of Technology Department of Communications Engineering Korkeakoulunkatu 10, 33720 Tampere, Finland E-mail: {moltchan, yk}@cs.tut.fi
Library of Congress Control Number: 2009933595 CR Subject Classification (1998): C.2, B.8, C.4, D.2, K.6, D.4.6, K.6.5 LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN 0302-9743
ISBN-10 3-642-04188-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-04188-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12753468 06/3180 543210
Preface
We welcome you to the joint proceedings of the 9th NEW2AN (Next-Generation Teletraffic and Wired/Wireless Advanced Networking) and the Second ruSMART conferences, held in St. Petersburg, Russia, during September 15-17, 2009. This year NEW2AN featured significant contributions to various aspects of networking. The presented topics encompassed several layers of communication networks, from the physical layer to transport protocols. In particular, issues of QoS in wireless and IP-based multi-service networks were dealt with. Cross-layer optimization and traffic characterization were also addressed within the program. It is also worth mentioning the emphasis placed on wireless networks, including, but not limited to, cellular networks, wireless local area networks, personal area networks, mobile ad hoc networks, and sensor networks. The Second Conference on Smart Spaces, ruSMART 2009, was targeted at attracting the attention of academic and industrial researchers to the emerging area of smart spaces, which creates completely new opportunities for making fully customized applications and services for users. The conference is a meeting place for leading experts from top affiliations around the world, with particularly active participation and strong interest from Russian attendees, who have a good reputation for high-quality research and business in innovative service creation and application development. The NEW2AN/ruSMART 2009 call for papers attracted 82 papers from 22 countries, resulting in an acceptance rate of 39%. With the help of the excellent Technical Program Committee and a number of associated reviewers, the best 32 high-quality papers were selected for publication. The conference was organized in seven single-track sessions. We wish to thank the Technical Program Committee members of both conferences and the associated reviewers for their hard work and important contribution to the conference.
The Technical Program of both conferences benefited from two keynote speakers:
- Aaron J. Quigley, University College Dublin, Ireland
- Manfred Schneps-Schneppe, Ventspils University College, Latvia
This year the conferences were supported by sponsor packages provided by NOKIA, Nokia Siemens Networks, Ubitel (Russia) and Baltic IT Ltd (Russia), and were organized in cooperation with the ITC (International Teletraffic Congress), IEEE, and the Popov Society. The support of these organizations is gratefully acknowledged. Finally, we wish to thank the many people who contributed to the organization. In particular, Jakub Jakubiak (TUT, Finland) carried a substantial load of the submission and review handling, maintained the website, and did an excellent job compiling the camera-ready papers and liaising with Springer. Many thanks go to
Natalia Avdeenko and Ekaterina Antonyuk (Monomax Meetings & Incentives) for their excellent local organization efforts and the conference’s social program preparation. We believe that the 9th NEW2AN and the Second ruSMART conferences provided an interesting and up-to-date scientific program. We hope that the participants enjoyed the technical and social conference program, Russian hospitality and the beautiful city of St. Petersburg.
July 2009
Sergey Balandin Dmitri Moltchanov Yevgeni Koucheryavy
Organization
NEW2AN International Advisory Committee
Ian F. Akyildiz: Georgia Institute of Technology, USA
Nina Bhatti: Hewlett Packard, USA
Igor Faynberg: Alcatel Lucent, USA
Jarmo Harju: Tampere University of Technology, Finland
Andrey Koucheryavy: ZNIIS R&D, Russia
Villy B. Iversen: Technical University of Denmark, Denmark
Paul Kühn: University of Stuttgart, Germany
Kyu Ouk Lee: ETRI, Korea
Mohammad S. Obaidat: Monmouth University, USA
Michael Smirnov: Fraunhofer FOKUS, Germany
Manfred Sneps-Sneppe: Ventspils University College, Latvia
Ioannis Stavrakakis: University of Athens, Greece
Sergey Stepanov: Sistema Telecom, Russia
Phuoc Tran-Gia: University of Würzburg, Germany
Gennady Yanovsky: State University of Telecommunications, Russia
NEW2AN Technical Program Committee
TPC Chair: Dmitri Moltchanov, Tampere University of Technology, Finland
Mari Carmen Aguayo-Torres: University of Malaga, Spain
Ozgur B. Akan: METU, Turkey
Khalid Al-Begain: University of Glamorgan, UK
Sergey Andreev: State University of Aerospace Instrumentation, Russia
Tricha Anjali: Illinois Institute of Technology, USA
Konstantin Avrachenkov: INRIA, France
Francisco Barcelo: UPC, Spain
Sergey Balandin: Nokia, Finland
Thomas M. Bohnert: SAP Research, Switzerland
Torsten Braun: University of Bern, Switzerland
Chrysostomos Chrysostomou: University of Cyprus, Cyprus
Georg Carle: University of Tübingen, Germany
Ibrahim Develi: Erciyes University, Turkey
Roman Dunaytsev: Tampere University of Technology, Finland
Eylem Ekici: Ohio State University, USA
Sergey Gorinsky: Washington University in St. Louis, USA
Markus Fidler: NTNU Trondheim, Norway
Giovanni Giambene: University of Siena, Italy
Stefano Giordano: University of Pisa, Italy
Ivan Ganchev: University of Limerick, Ireland
Vitaly Gutin: Popov Society, Russia
Martin Karsten: University of Waterloo, Canada
Andreas Kassler: Karlstad University, Sweden
Maria Kihl: Lund University, Sweden
Tatiana Kozlova Madsen: Aalborg University, Denmark
Yevgeni Koucheryavy: Tampere University of Technology, Finland (Chair)
Jong-Hyouk Lee: Sungkyunkwan University, R. Korea
Vitaly Li: Kangwon National University, R. Korea
Lemin Li: University of Electronic Science and Techn. of China, China
Leszek T. Lilien: Western Michigan University, USA
Saverio Mascolo: Politecnico di Bari, Italy
Maja Matijašević: University of Zagreb, FER, Croatia
Paulo Mendes: INESC Porto, Portugal
Ilka Miloucheva: Salzburg Research, Austria
Edmundo Monteiro: University of Coimbra, Portugal
Seán Murphy: University College Dublin, Ireland
Marc Necker: University of Stuttgart, Germany
Mairtin O'Droma: University of Limerick, Ireland
Jaudelice Cavalcante de Oliveira: Drexel University, USA
Evgeni Osipov: Lulea University of Technology, Sweden
George Pavlou: University of Surrey, UK
Simon Pietro Romano: Università degli Studi di Napoli "Federico II", Italy
Alexander Sayenko: Nokia Siemens Networks, Finland
Dirk Staehle: University of Würzburg, Germany
Sergei Semenov: Nokia, Finland
Burkhard Stiller: University of Zurich and ETH Zurich, Switzerland
Weilian Su: Naval Postgraduate School, USA
Veselin Rakocevic: City University London, UK
Dmitry Tkachenko: IEEE St. Petersburg BT/CE/COM Chapter, Russia
Vassilis Tsaoussidis: Demokritos University of Thrace, Greece
Christian Tschudin: University of Basel, Switzerland
Andrey Turlikov: State University of Aerospace Instrumentation, Russia
Kurt Tutschku: University of Vienna, Austria
Alexey Vinel: SPIIRAS, Russia
Lars Wolf: Technische Universität Braunschweig, Germany
NEW2AN Additional Reviewers
B. Alhaija, A. Baryun, C. Callegari, S. Diamantopoulos, R. Garroppo, P. Godlewski, J. Granjal, C. Hoene, D. Iacono, J. Jakubiak, I. Komnios, J. H. Lee, E. Leitgeb, D. Milic, P. Mitoraj, M. Okada, C. Pelizzoni, J. Peltotalo, V. Pereira, M.-D. Perez Guirao, S. Poryazov, T. Staub, M. Waelchli
ruSMART Executive Technical Program Committee
Sergey Boldyrev: Nokia Research Center, Helsinki, Finland
Nikolai Nefedov: Nokia Research Center, Zurich, Switzerland
Ian Oliver: Nokia Research Center, Helsinki, Finland
Alexander Smirnov: SPIIRAS, St. Petersburg, Russia
Vladimir Gorodetsky: SPIIRAS, St. Petersburg, Russia
Michael Lawo: Center for Computing Technologies (TZI), University of Bremen, Germany
Michael Smirnov: Fraunhofer FOKUS, Germany
Dieter Uckelmann: LogDynamics Lab, University of Bremen, Germany
Cornel Klein: Siemens Corporate Technology, Germany
Maxim Osipov: Siemens CT, Embedded Linux, Russia
ruSMART Technical Program Committee
Juha Laurila: Nokia Research Center, Switzerland
Sergey Balandin: Nokia Research Center, Finland
Alexey Dudkov: University of Turku, Finland
Didem Gozupek: Bogazici University, Turkey
Kim Geunhyung: Dong Eui University, Korea
Reto Krummenacher: STI Innsbruck, Austria
Prem Jayaraman: Monash University, Australia
Michel Banâtre: IRISA, France
Sergei Bogomolov: LGERP R&D Lab, Russia
Gianpaolo Cugola: Politecnico di Milano, Italy
Dimitri Konstantas: University of Geneva, Switzerland
Markus Taumberger: VTT, Finland
Bilhanan Silverajan: Tampere University of Technology, Finland
Aaron J. Quigley: University College Dublin, Ireland
ruSMART Additional Reviewers Y. Koucheryavy
Table of Contents
I ruSMART
Invited Talk ITU G.hn Concept and Home Automation . . . . . . . . . . . . . . . . . . . . . . . . . . Manfred Schneps-Schneppe
1
Session I Extending Context Spaces Theory by Predicting Run-Time Context . . . . Andrey Boytsov, Arkady Zaslavsky, and Kåre Synnes
8
Cross-Domain Interoperability: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . Jukka Honkola, Hannu Laine, Ronald Brown, and Ian Oliver
22
A Topology Based Approach for Context-Awareness in Smart Environments . . . . . Antonio Coronato and Giuseppe De Pietro
32
Anonymous Agent Coordination in Smart Spaces: State-of-the-Art . . . . . Alexander Smirnov, Alexey Kashevnik, Nikolay Shilov, Ian Oliver, Sergey Balandin, and Sergey Boldyrev
42
Session II On-the-Fly Situation Composition within Smart Spaces . . . . . Prem Prakash Jayaraman, Arkady Zaslavsky, and Jerker Delsing
52
Empower Mobile Workspaces by Wireless Networks and Wearable Computing . . . . . Michael Lawo, Otthein Herzog, Michael Boronowski, and Peter Knackfuß
66
Multimodal Interaction with Intelligent Meeting Room Facilities from Inside and Outside . . . . . A.L. Ronzhin and V.Yu. Budkov
77
Session III Ubi-Check: A Pervasive Integrity Checking System . . . . . . . . Michel Banâtre, Fabien Allard, and Paul Couderc
89
Towards a Lightweight Security Solution for User-Friendly Management of Distributed Sensor Networks . . . . . Pentti Tarvainen, Mikko Ala-Louko, Marko Jaakola, Ilkka Uusitalo, Spyros Lalis, Tomasz Paczesny, Markus Taumberger, and Pekka Savolainen
97
Cross-Site Management of User Online Attributes . . . . . Jin Liu
110

II NEW2AN
Teletraffic Issues Analysis and Optimization of Aggregation in a Reconfigurable Optical ADD/DROP Multiplexer . . . . . J.M. Fourneau, N. Izri, and D. Verchère
120
Teletraffic Capacity Performance of WDM/DS-OCDMA Passive Optical Network . . . . . Mohammad Gharaei, Catherine Lepers, Olfa Affes, and Philippe Gallion
132
Estimation of GoS Parameters in Intelligent Network . . . . . . . . . . . . . . . . . Irina Buzyukova, Yulia Gaidamaka, and Gennady Yanovsky
143
Multi-Skill Call Center as a Grading from “Old” Telephony . . . . . . . . . . . Manfred Schneps-Schneppe and Janis Sedols
154
Traffic Measurements, Modeling, and Control A Real-Time Algorithm for Skype Traffic Detection and Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Adami, C. Callegari, S. Giordano, M. Pagano, and T. Pepe
168
HTTP Traffic Measurements on Access Networks, Analysis of Results and Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vladimir Deart, Vladimir Mankov, and Alexander Pilugin
180
The Video Streaming Monitoring in the Next Generation Networks . . . . . A. Paramonov, D. Tarasov, and A. Koucheryavy
191
The Poisson Cluster Process Runs as a Model for the Internet Traffic . . . . . Jerzy Martyna
206
Peer-to-Peer Systems Proactive Peer-to-Peer Traffic Control When Delivering Large Amounts of Content within a Large-Scale Organization . . . . . . . . . . . . . . . . . . . . . . . . Chih-Chin Liang, Chia-Hung Wang, Hsing Luh, Ping-Yu Hsu, and Wuyi Yue
217
ISP-Driven Managed P2P Framework for Effective Real-Time IPTV Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seil Jeon, Younghan Kim, Jonghwa Yi, and Shingak Kang
229
Fault-Tolerant Architecture for Peer to Peer Network Management Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maryam Barshan, Mahmood Fathy, and Saleh Yousefi
241
Security Issues Public Key Signatures and Lightweight Security Solutions in a Wireless Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dmitrij Lagutin and Sasu Tarkoma
253
On the Operational Security Assurance Evaluation of Networked IT Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Artur Hecker and Michel Riguidel
266
Trust Management Using Networks of Volunteers in Ubiquitous Computing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mijeom Kim, Mohan Kumar, and Sukju Jung
279
A Fast and Efficient Handover Authentication Achieving Conditional Privacy in V2I Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaeduck Choi, Souhwan Jung, Younghan Kim, and Myungsik Yoo
291
Wireless Networks: Ad Hoc and Mesh Robust and Efficient Routing in Wireless Mesh Networks with Unreliable Nodes . . . . . Apostolia Papapostolou, Vasilis Friderikos, Tara A. Yahiya, and Hakima Chaouchi
301
Video Compression for Wireless Transmission: Reducing the Power Consumption of the WPAN Hi-Speed Systems . . . . . Andrey Belogolovy, Eugine Belyaev, Anton Sergeev, and Andrey Turlikov
313
Link Stability-Aware Ad Hoc Routing Protocol with Multiple Sampling Rates . . . . . Myungsik Yoo, Junwon Lee, Younghan Kim, and Souhwan Jung
323
Wireless Networks: Capacity and Mobility A Quality of Service Based Mobility Management in Heterogenous Wireless Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tara A. Yahiya and Hakima Chaouchi
334
A Novel Cooperative Relaying Scheme for Next Generation Wireless Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hong Zhou
346
Frequency Allocation Scheme to Maximize Cell Capacity in a Cellular System with Cooperative Relay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wooju Lee, Kyongkuk Cho, Dongweon Yoon, Sang Kyu Park, and Yeonsoo Jang
358
Achieving Secondary Capacity under Interference from a Primary Base Station . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shadi Ebrahimi Asl and Bahman Abolhassani
365
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
377
ITU G.hn Concept and Home Automation Manfred Schneps-Schneppe Ventspils University College Inzenieru st 101, Ventspils, Latvia
[email protected] For a long time, many in the buildings industry have been looking forward to the day when home automation systems would become fully integrated with the communication and human-interface practices and standards widely employed in information technology. Now that time is coming, in the form of the ITU G.hn Recommendations.
1 SmartHouse Concept The most important approach in the field of home automation is the European "SmartHouse Code of Practice" [1]. Figure 1 shows the key role of the residential gateway (home gateway) in a SmartHouse framework.
Fig. 1. Role of the RG in a SmartHouse framework
The SmartHouse consists of a large and wide-ranging set of services, applications, equipment, networks and systems that act together in delivering the "intelligent" home, addressing security and control, communications, leisure and comfort, environment integration and accessibility. Table 1 below illustrates the relationship between the technical bandwidth requirements and the SmartHouse services offered. The G.hn approach should enable the ability to manage home electronic systems from one main control point and make the household run more smoothly, feel better and save energy [2], taking into account many home automation requirements, particularly:
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 1–7, 2009. © Springer-Verlag Berlin Heidelberg 2009
Table 1. Examples of SmartHouse services and associated bandwidth
Interoperability. A good example of interoperability is having the lights turn off and the thermostats set back when you press a "goodbye" button on a keypad, or when a motion sensor notices that you have exited a room. Remote Access. Remote access capabilities allow you to monitor your home's environment and alter the settings of the lights, thermostats and other gear if necessary, all from your laptop or cellphone. Expandability. It is important that a home automation system can be easily expanded, both vertically to incorporate additional products and horizontally to support additional rooms. Manufacturers can support vertical and horizontal expandability by designing their systems to speak a common network language, like IP (Internet Protocol), and by offering wireless products that can communicate with a home's existing network of wired products. Upgradeability. Software is the driving force of an automation system. The more sophisticated the software is, the more the system can do (see the HGI figure below).
Variety of Interfaces. In the age of universal graphical user interfaces (GUIs), is there any reason to keep using each home automation manufacturer's proprietary graphics instead of Web-browser-style operator interfaces? Energy Savings. One of the hottest topics in the consumer media is energy conservation. Automation systems can help save energy by turning off electronic devices automatically.
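The interoperability example above (one "goodbye" event driving several subsystems) can be sketched as a tiny publish/subscribe rule in Python. The event bus, device names and actions here are our own illustration, not part of any home automation standard:

```python
# Toy event bus: pressing "goodbye" fans out to all subscribed devices.
handlers = {}

def on(event):
    """Decorator registering a handler for a named event."""
    def register(fn):
        handlers.setdefault(event, []).append(fn)
        return fn
    return register

def fire(event):
    """Invoke every handler subscribed to the event, collecting results."""
    return [fn() for fn in handlers.get(event, [])]

@on("goodbye")
def lights_off():
    return "lights: off"

@on("goodbye")
def thermostat_setback():
    return "thermostat: setback to 16 C"

print(fire("goodbye"))  # ['lights: off', 'thermostat: setback to 16 C']
```

The point of the sketch is that new devices can subscribe to the same event without any existing device knowing about them, which is the essence of the interoperability requirement.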
2 ITU Recommendations G.hn G.hn is the common name for the "next-generation" home network technology standard, a worldwide ITU concept addressing wired home networks. The goal of the G.hn technology is to unify connectivity of digital content and media devices by providing a network over the three popular types of wiring found in homes today: coax cable, phone lines, and AC power wiring, and to supply data rates of up to 1 Gbit/s. The ITU G.hn standard's goal is to specify the physical (layer 1) and data link (layer 2) layers for wired home networks (Fig. 2). This work culminated in Recommendation ITU G.9960, which specifies G.hn's Physical Layer. The G.hn Data Link Layer is divided into three sub-layers: 1) the Application Protocol Convergence (APC) layer, which receives Ethernet frames from the application layer and encapsulates them into G.hn MAC Service Data Units; 2) Logical Link Control (LLC), which is responsible for encryption, aggregation, segmentation and automatic repeat-request; and 3) Medium Access Control (MAC), which schedules channel access according to a TDMA method to avoid collisions. G.hn specifies a Physical Layer based on fast Fourier transform (FFT) orthogonal frequency-division multiplexing (OFDM) modulation and a Low-Density Parity-Check FEC code. OFDM systems split the transmitted signal into multiple orthogonal carriers. These carriers are modulated using quadrature amplitude modulation (QAM).
Fig. 2. G.hn node protocol stack
Fig. 3. Home phoneline networking data frame is based on Ethernet standards
The G.hn Physical Layer is divided into three sub-layers: 1) the Physical Coding Sub-layer (PCS), for generating PHY headers (Fig. 3); 2) Physical Medium Attachment (PMA), for scrambling and FEC coding/decoding; and 3) Physical Medium Dependent (PMD), for bit loading/stuffing and OFDM modulation. The PMD sub-layer is the only sub-layer in the G.hn stack that is "medium dependent" (i.e., some parameters may have different values for each medium: power lines, phone lines and coaxial cable). The rest of the sub-layers (APC, LLC, MAC, PCS and PMA) are "medium independent". G.hn proponents are now working to make G.hn the universal wired home networking standard worldwide. G.hn profiles for home automation are still under study (Fig. 4).
Fig. 4. Example of a G.hn network
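To make the Data Link Layer sub-layer responsibilities above more concrete, here is a toy Python sketch of the APC encapsulation and LLC segmentation steps. The frame layout, header fields and the 540-byte segment size are illustrative assumptions, not the normative G.hn formats:

```python
# Toy sketch of two G.hn Data Link Layer steps (illustrative, not normative):
# APC wraps an Ethernet frame into an MSDU; LLC splits it into segments.

MAX_SEGMENT = 540  # illustrative LLC segment size in bytes

def apc_encapsulate(eth_frame: bytes) -> bytes:
    """APC: wrap an Ethernet frame into a MAC Service Data Unit (MSDU)."""
    # Toy header: a 4-byte tag plus a 2-byte length field.
    return b"MSDU" + len(eth_frame).to_bytes(2, "big") + eth_frame

def llc_segment(msdu: bytes, seg_size: int = MAX_SEGMENT) -> list:
    """LLC: split an MSDU into fixed-size segments for retransmission."""
    return [msdu[i:i + seg_size] for i in range(0, len(msdu), seg_size)]

frame = bytes(1400)  # a dummy 1400-byte Ethernet payload
segments = llc_segment(apc_encapsulate(frame))
print(len(segments))  # 3 segments of at most 540 bytes each
```

The MAC sub-layer would then schedule these segments into TDMA transmission opportunities; that scheduling step is omitted here.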
3 Home Gateway Initiative: RG Software Development Approach 1 There are currently several competing service delivery architectures. The OSGi Alliance is one of the earlier ones, and the Home Gateway Initiative is the newest. The Home Gateway Initiative (HGI) is an open forum launched by a number of telephone companies (Belgacom, BT, DT, FT, KPN, TeliaSonera, NTT, Telefonica, Telecom Italia) in December 2004 with the aim of releasing specifications of the home gateway. Many manufacturers (including Microsoft) have joined the alliance. The initiative drives the development of residential gateways supporting the delivery of services, taking as a basis the work undertaken within existing bodies (such as ITU-T, the Broadband Forum, DLNA, the OSGi Alliance, etc.). The goals of the initiative are to produce and pass downstream requirements for a residential gateway enabling end-to-end delivery of services (Fig. 5), to work with manufacturers in order to leverage volumes, to validate with manufacturers against use cases and requirements, and to ensure interoperability.
Fig. 5. Home gateway environment according to the HGI concept
4 OSGi Alliance: RG Software Development Approach 2 The OSGi Alliance (formerly known as the Open Services Gateway initiative) was founded by Ericsson, IBM, Motorola, Sun Microsystems and others in 1999. Among its members are more than 35 companies from quite different business areas. The Alliance and its members have specified a Java-based service platform that can be remotely managed (Fig. 6). (We have been working with a Java-based service platform for a long time; see http://abava.blogspot.com/.)
Fig. 6. OSGi Service Gateway Architecture
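The OSGi platform itself is Java-based, but the core idea of Fig. 6 (bundles that contribute services to a shared registry and can be started and stopped at run time) can be sketched in a few lines of Python. All class and method names below are our own illustration, not the org.osgi API:

```python
# Toy model of the OSGi service-registry idea: bundles register services
# on start and withdraw them on stop. Names are illustrative only.

class ServiceRegistry:
    def __init__(self):
        self._services = {}

    def register(self, interface: str, implementation):
        self._services[interface] = implementation

    def unregister(self, interface: str):
        self._services.pop(interface, None)

    def get(self, interface: str):
        return self._services.get(interface)

class ClockBundle:
    """A 'bundle' contributing a clock service while it is active."""
    def start(self, registry: ServiceRegistry):
        registry.register("clock", lambda: "12:00")

    def stop(self, registry: ServiceRegistry):
        registry.unregister("clock")

registry = ServiceRegistry()
bundle = ClockBundle()
bundle.start(registry)
print(registry.get("clock")())  # 12:00
```

The remote-management aspect of the real platform corresponds to invoking start/stop on bundles from a management channel, which is outside this sketch.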
5 Conclusion The right time to develop the G.hn profiles for home automation is now. According to Wikipedia [4], three home networking organizations that had promoted previously incompatible technologies (CEPCA, HomePNA and the Universal Powerline Association) announced that they had agreed to promote G.hn as the single next-generation standard for wired home networking, and to work to ensure coexistence with existing products in the market. The Continental Automated Buildings Association (CABA) and the HomeGrid Forum signed a liaison agreement to support the HomeGrid Forum's efforts in conjunction with ITU-T G.hn to make it easy for consumers worldwide to connect devices and enjoy innovative applications using existing home wiring. Case studies. The paper is illustrated by case studies (from Latvia): Case 1. A KNX (Konnex) standard-based demonstration room was arranged by the Urban Art company (Riga) in cooperation with the University of Latvia. Time-switch programs enable users to heat or ventilate each room individually, while a remote control is also available for operating light switches and monitoring windows and doors for home security. The web server enables users to operate the intelligent room from any PC or smart phone, requiring only a common operating system and a browser. Another new feature is the alarm function, which sends e-mail or text message warnings to four predefined recipients.
Case 2. The Abavanet company (Ventspils) has developed a series of M-Bus-controlled devices for automatic water supply measurement (wireline and wireless). Case 3. The MikroDators company (Ogre) has produced wireless measuring devices for the hot water supplier Rigas Siltums.
References
1. SmartHouse Code of Practice, CENELEC, CWA 50487 (November 2005)
2. http://www.electronichouse.com/article/10_features_to_look_for_in_a_home_automation_system/
3. Home Gateway Technical Requirements (2008), http://www.homegateway.org/
4. http://en.wikipedia.org/wiki/G.hn
Extending Context Spaces Theory by Predicting Run-Time Context Andrey Boytsov, Arkady Zaslavsky, and Kåre Synnes Department of Computer Science and Electrical Engineering Luleå University of Technology, SE-971 87 Luleå {Andrey.Boytsov,Arkady.Zaslavsky,Kare.Synnes}@ltu.se
Abstract. Context awareness and prediction are important for pervasive computing systems. The recently developed theory of context spaces addresses problems related to sensor data uncertainty and high-level situation reasoning. This paper proposes and discusses componentized context prediction algorithms and thus extends the context spaces theory. This paper focuses on two questions: how to plug in appropriate context prediction techniques, including Markov chains, Bayesian reasoning and sequence predictors, into the context spaces theory, and how to estimate the efficiency of those techniques. The paper also proposes and presents a testbed for testing a variety of context prediction methods. The results and ongoing implementation are also discussed. Keywords: context awareness, context prediction, context spaces theory, pervasive computing, Markov model, Bayesian network, branch prediction, neural network.
1 Introduction Pervasive computing is a paradigm where computing systems are integrated into everyday life and the environment in a non-intrusive, graceful and transparent manner. For example, it can be a smart home or office, where doors are automatically opened and light is automatically turned on right before a person enters the room [18]. Or it can be a smart car, which suggests the fastest way to the destination, assesses its own condition and proposes a maintenance plan. Many pervasive computing systems are now being introduced into our lives. Context awareness and context prediction are relatively new research areas, but they are becoming an important part of pervasive computing systems. This paper analyzes various context prediction techniques and proposes a plug-in approach to context prediction for pervasive computing systems at run-time. This approach is proposed as an extension of the context spaces theory [17, 19]. This paper focuses on how to apply the various available context prediction techniques and how to estimate their efficiency. This paper also proposes validation of the pluggable context prediction techniques using the "Moonprobe" model and application scenario, which imitates the movement of a vehicle over a sophisticated landscape. The article is structured as follows. Necessary definitions are included in Section 2. Section 3 provides a brief overview of the context spaces theory – its essence,
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 8–21, 2009. © Springer-Verlag Berlin Heidelberg 2009
addressed problems, proposed solutions and current research challenges. In Section 4 we propose context prediction methods and develop algorithms for their adaptation to the theory of context spaces. In Section 5 we introduce the "Moonprobe" model and application scenario, a testing framework that we developed to estimate context prediction efficiency. Section 6 summarizes the paper and proposes future work.
2 Definitions This paper addresses many important issues of pervasive computing systems. Generally, pervasive systems comprise many small and inexpensive specialized devices. If a device is used specifically to obtain information from the environment, that device is usually referred to as a sensor. Each device performs its own functions, but to perform them effectively it usually needs to communicate with other devices and process the information that it obtains from them. One of the most important characteristics of a pervasive computing system is its context. In earlier works on context awareness, different definitions of context were proposed. A comprehensive overview of those efforts was presented by Dey and Abowd [6]. They define context as "any information that can be used to characterize the situation of an entity." In fact, every piece of information that a system has is a part of that system's context. The system is context aware if it can use context information to its benefit. Reasoning about context is the process of obtaining new knowledge from the current and possibly predicted context. A context model is a way of representing context that is used for further reasoning. We assume that context-aware pervasive computing systems have the following features:
- Sensors supply data that will be used by the pervasive system for further reasoning. Examples of sensors may include light sensors, altitude sensors, accelerometers, etc.
- Sensors transfer all the data to the reasoning engine via a sensor network.
- The reasoning engine, which can possibly be a distributed system, performs reasoning, makes context predictions and produces output that can be used for modifying the system's behavior.
Many important issues of pervasive computing systems are left out of the scope of this paper, for example, security, privacy, the need for distributed computations and other related problems.
3 Context Spaces Theory
The theory of context spaces [17, 19] is a recently developed approach to context awareness and reasoning which addresses the problems of sensor uncertainty and unreliability. It also deals with situation reasoning and with representing context in a structured and meaningful manner. Context spaces theory is designed to enable context awareness in a clear and insightful way, using spatial metaphors to represent context as a multidimensional space. To understand context spaces theory we need to introduce several new
10
A. Boytsov, A. Zaslavsky, and K. Synnes
terms. Any kind of data that is used to reason about context is called a context attribute. A context attribute usually corresponds to a domain of values of interest, which are either measured by sensors directly or calculated from other context attributes. It can be either a numerical value or a value from a pre-defined set of non-numerical options. A context state is the set of all relevant context attributes at a certain time. The set of all possible context states constitutes the application space. The application space can therefore be viewed as a multi-dimensional space where the number of dimensions equals the number of context attributes in the context state. The state of the system is represented by a point in the application space, and the behavior of the system is represented by a trajectory moving through the application space over time. A situation space represents a real-life situation and can be defined as a subspace of the application space: if the context state is in the subspace representing situation S, then situation S is occurring. A situation S has a level of occurrence certainty, which depends on the probability of the context state being within S and on where within S it lies. See Fig. 1 for a simple illustration; we will keep referring to this example to illustrate context prediction approaches throughout the article.
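These spatial concepts can be made concrete with a small sketch (not the authors' implementation; the attribute names, regions and weights below are invented for illustration):

```python
# Illustrative sketch of the basic context spaces concepts; all names,
# regions and weights are assumptions, not taken from the paper.

# A context state: one value per context attribute (a point in application space).
state = {"altitude": 120.0, "speed": 4.2, "fuel": 0.8}

# A situation space: for each relevant attribute, an accepted region
# (numeric intervals here) with a weight expressing its contribution.
danger = {
    "altitude": ((0.0, 50.0), 0.5),   # low altitude contributes to danger
    "speed":    ((5.0, 999.0), 0.5),  # high speed contributes to danger
}

def confidence(situation, state):
    """Level of occurrence certainty: weighted fraction of the situation's
    attribute regions that contain the current context state."""
    return sum(w for attr, ((lo, hi), w) in situation.items()
               if lo <= state[attr] <= hi)

print(confidence(danger, state))                              # 0 (state outside both regions)
print(confidence(danger, {"altitude": 10.0, "speed": 6.0}))   # 1.0
```

A real implementation would use the membership and confidence functions of [17]; this only illustrates how a situation space is a subspace of the application space with an associated confidence level.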
Fig. 1. Context spaces theory
Several methods were developed within context spaces theory for reasoning about context. Bayesian reasoning [23] or the Dempster-Shafer algorithm [24] is used to obtain the overall confidence that a situation is occurring. Algebraic operations on situations and some logic-based methods were developed for reasoning in terms of situations. Context spaces theory was implemented in ECORA [20] – the Extensible Context Oriented Reasoning Architecture. ECORA is a framework for the development of context-aware applications; it provides its functionality as a set of Java classes to be integrated into prototypes of context-aware systems. The research presented in this paper develops context prediction methods that can be used to extend context spaces theory and benefit ECORA-based applications.
Extending Context Spaces Theory by Predicting Run-Time Context
4 Context Prediction for Context Spaces Theory
The ability to predict future context will benefit the runtime performance of context-aware applications; use cases for context prediction can be found in [16]. Predicted context can be used for early warnings, power management, system reconfiguration, etc. Context prediction is one of the research challenges of context spaces theory. In practice, most current context prediction approaches infer various context features at run-time and use the results of this inference for prediction. The inference itself is a machine learning task, for which neural networks [12], Markov chains [5] and Bayesian networks [23] may be sufficiently effective tools. Here we analyze possible context prediction techniques and develop algorithms for two-way mapping between context spaces theory and each analyzed technique.
Sequence predictors for context prediction. Many context prediction approaches were inspired by research on UNIX next-command prediction [8, 9], where the authors suggested representing each UNIX command as a symbol and the whole command flow as a symbol sequence, and proposed a set of algorithms for sequence prediction. Sequence prediction techniques were later used in a variety of systems, from smart home architectures to prediction of movement in cellular networks. Modifications of the LZ78 algorithm [26] were used in numerous papers [1, 4, 7, 10, 15, 22]. The paper [15] is worth noting: it presents a general application-independent architecture for context prediction and suggests that the Active LeZi algorithm can be used to predict future situations (note that the concept of a situation in [15] differs from the concept of a situation in context spaces theory). The Active LeZi algorithm itself is described in [10]. According to [15], Active LeZi provides up to 99% accuracy for 1-step prediction.
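As a concrete illustration of sequence prediction, the sketch below implements a much simpler stand-in for Active LeZi (all names are assumptions): an order-1 predictor that counts symbol transitions and predicts the most frequent successor of the current symbol.

```python
# Illustrative sketch of sequence prediction (a simplified stand-in for
# Active LeZi, not the algorithm of [10]; names are assumptions).
from collections import Counter, defaultdict

class Order1Predictor:
    def __init__(self):
        self.counts = defaultdict(Counter)  # symbol -> Counter of successors
        self.last = None

    def observe(self, symbol):
        """Feed the next symbol of the sequence into the model."""
        if self.last is not None:
            self.counts[self.last][symbol] += 1
        self.last = symbol

    def predict(self):
        """Most frequent symbol seen after the current one, or None."""
        successors = self.counts.get(self.last)
        return successors.most_common(1)[0][0] if successors else None

p = Order1Predictor()
for s in "ABABABAC":
    p.observe(s)
print(p.predict())   # None: nothing has been seen after 'C' yet
p.observe("A")
print(p.predict())   # 'B': the most frequent successor of 'A'
```

Active LeZi improves on such fixed-order predictors by growing the context length incrementally as the LZ78 parse tree deepens.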
Sequence prediction techniques are feasible in context spaces theory as well. For prediction purposes, context can be represented as a sequence of situations: when a situation starts occurring, a special symbol is added to the sequence, and when it is over, another special symbol is added. Sequence prediction techniques can then be applied to predict the next symbol in the sequence and, therefore, the occurrence of a situation. For example, the context state flow in Fig. 1 can be described by the following sequence: (SS3-in)(SS4-in)(SS3-out)(SS4-out)(SS1-in). Sequence prediction algorithms should be adjusted to the fact that situations have levels of confidence. This can be done by introducing a threshold on the level of certainty: we consider a situation to be occurring if its level of confidence is above the threshold, and to be over if its level of confidence drops below the threshold. In [15] Active LeZi showed very good performance in a variety of cases. Incorporation of the Active LeZi algorithm into context spaces theory is in progress, and its efficiency is currently being investigated.
Neural networks for context prediction. Some context prediction techniques use neural networks to learn from user behavior. For example, [14] describes the “Neural Network House” project, where special equipment predicts expected occupancy patterns of rooms in the house, estimates hot water usage, estimates the likelihood of a room being entered soon, etc. In [14] the authors used feedforward neural networks trained
with back propagation. Projects related to indoor movement prediction considered the neural network approach as well. In [25] the authors addressed the problem of predicting the next room a person is going to visit: a neural network took the current room number as input and produced the most probable next room number as output, with logs of user movement used to train the network. For prediction the authors chose a multi-layer perceptron with one hidden layer and the back propagation learning algorithm. Adapting neural networks to context spaces theory can follow several approaches. For example, the neural network can accept the current situation (with its level of confidence) and predict the next situation to occur. Alternatively, it can accept the whole set of current situations and return the whole set of situations expected over a period of time. It can also accept context coordinates and infer expected future coordinates or expected future situations.
Markov models for context prediction. Markov models have proved to be a feasible approach to context prediction. In [2, 3] the authors used Markov chains to predict which web site a user is likely to visit next. Context prediction using Markov chain and Bayesian network based techniques was considered in [13], where the authors addressed the active device resolution problem: the user has a remote control for interacting with a set of devices, but there are always several devices in the user’s proximity. Some papers use hidden Markov models to learn from user behavior. For example, in [11] hidden Markov models are used to predict the next room a person is likely to visit: the layout of rooms in the house, represented as a graph, is handled as a hidden Markov model, and the history is used to obtain the probabilities of moving from one room to another. Context in our approach can be viewed as a Markov model as well.
For that we need to represent the context as a graph where every node represents a situation. However, Markov model states have to be mutually exclusive, which is not generally the case in context spaces theory. To make situations mutually exclusive, we need to find all intersections of situations and extract them as new situations. To do this we developed the following algorithm.
Algorithm MM4CS.
Step 1. Create three entities:
Entity 1: a set of situations. Initially it contains all defined situations. From now on the initial situations will be referred to as S1, S2, S3, …, SN, also called the old situation spaces. New situations may appear as a result of situation decomposition; they will be referred to by the situation algebra expressions they were obtained from, for example S1∩S2∩!S3 or S10∩S8. For simplicity, when it does not matter whether situations are initial or newly acquired, we will refer to them as A, B, C, etc. The subspace of the application space that includes the whole application space except situation A will be referred to as !A.
Entity 2: a set of situation pairs. Elements of the set will be written as (A, B), where A and B are situations. The expression (A, *) is a joint name for all elements containing situation A. A pair of a situation with itself, such as (A, A), is not allowed. Situation pairs are unordered, i.e. (A, B) and (B, A) identify the same element. Initially the set contains all pairs of the starting situations, i.e. (Si, Sj) for every i, j ∈ [1..N], i ≠ j.
Entity 3: a dependency graph. Initially it has two sets of vertices, each vertex representing a situation. A vertex will be referred to as (A)j, where A is the situation and the superscript j is the set number (1 or 2). The dependency graph will be used for prediction in terms of old situation spaces. Initially it has only the vertices (Si)1 and (Si)2 and undirected edges between (Si)1 and (Si)2 (i ∈ [1..N]). Special temporary vertices, referred to as e.g. (A)j:B, can also appear in the graph at run time.
Step 2. Take any element from the situation pair set, say (A, B). Check whether situation spaces A and B intersect in the application space; this can be done using the situation algebra provided in [17]. If there is no intersection between A and B, just remove the element (A, B) from the situation pair set. If there is an intersection, perform the following substeps:
Substep 2.1. Remove the element (A, B) from the set.
Substep 2.2. Iterate through the situation pair list and perform the following changes:
Substep 2.2.1. Every (A, *) element is replaced with two new elements: (A∩B, *) is created in all cases, and (A∩!B, *) is created if the subspace A∩!B is not empty.
Substep 2.2.2. Every (B, *) element is replaced similarly: (A∩B, *) is created if it was not already created in substep 2.2.1, and (!A∩B, *) is created if the subspace !A∩B is not empty.
Substep 2.3. Update the dependency graph:
Substep 2.3.1. Vertex (A)2 is removed and several new vertices are created instead: (A∩B)2:A in all cases, and (A∩!B)2 if the subspace A∩!B is not empty. The newly created vertices receive edges to every vertex that (A)2 had an edge to.
Substep 2.3.2. Vertex (B)2 is removed and several new vertices are created instead.
Vertex (A∩B)2:B is created in all cases, and (!A∩B)2 is created if the subspace !A∩B is not empty. The newly created vertices receive edges to every vertex that (B)2 had an edge to.
Substep 2.3.3. Vertices (A∩B)2:A and (A∩B)2:B are merged into a new vertex (A∩B)2. All edges leading to either (A∩B)2:A or (A∩B)2:B are redirected to (A∩B)2.
Substep 2.4. In the situation set, remove situations A and B and introduce all non-empty situations among A∩B, !A∩B and A∩!B.
Step 3. If the situation pair list is not empty, go to step 2. If it is empty, add a new situation to the situation list: !S1∩!S2∩!S3∩…∩!SN, and add the corresponding vertex (!S1∩!S2∩!S3∩…∩!SN)2, which has no edges, to the dependency graph. Another option is to omit the !S1∩!S2∩!S3∩…∩!SN situation and consider the system to be in transition when no situation is triggered. ■
The resulting list of situations will be referred to as the new situation spaces. The resulting dependency graph has the old situation spaces in one set of vertices and the new situation spaces in the other; an edge means that the old situation space has the corresponding new situation space as part of its decomposition. The algorithm has to be applied to the system only once, before run time. When the algorithm completes, a mutually exclusive set of situations is obtained. Now this
new set of situation spaces can be represented as the states of a Markov model, and any Markov model based context predictor is applicable to it. For illustration see Fig. 2: there are two possible Markov models for the system described in Fig. 1, depending on which option is taken in step 3 of the algorithm.
Fig. 2. Markov model for Fig.1
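The decomposition computed by MM4CS can be sketched compactly if, purely for illustration, situation spaces are modelled as finite sets of discrete application-space cells rather than the geometric subspaces and situation algebra of [17] (an assumption of this sketch): group cells by the exact set of situations covering them. This yields the same mutually exclusive new situation spaces and the same dependency information as the pairwise procedure.

```python
# Sketch of mutually exclusive situation decomposition. Assumption: situation
# spaces are finite sets of cells; the real algorithm works on subspaces.

def decompose(situations):
    """situations: name -> set of application-space cells.
    Returns (parts, membership): parts maps a label such as 'S1&S2' to its
    cell set (a new situation space); membership maps each old situation to
    the part labels composing it (the edges of the dependency graph)."""
    cells = set().union(*situations.values())
    parts = {}
    for cell in cells:
        # The signature is the exact set of situations containing this cell.
        signature = frozenset(n for n, s in situations.items() if cell in s)
        label = "&".join(sorted(signature))
        parts.setdefault(label, set()).add(cell)
    membership = {n: sorted(l for l in parts if n in l.split("&"))
                  for n in situations}
    return parts, membership

S = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {5}}
parts, membership = decompose(S)
print(sorted(parts))        # ['S1', 'S1&S2', 'S2', 'S3']
print(membership["S1"])     # ['S1', 'S1&S2']
```

Here 'S1&S2' plays the role of S1∩S2, and 'S1' alone the role of S1∩!S2; the !S1∩!S2∩!S3 part of step 3 is omitted, matching the second option of the algorithm.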
Reasoning in terms of old situation spaces can easily be done using the new situation spaces. Suppose we need a prediction for situation Si from the old situation spaces. It can be obtained as follows.
Step 1. Find the vertex (Si)1. It is connected to some other vertices, say (D1)2, (D2)2, (D3)2, …, (Dk)2.
Step 2. Take the predicted levels of confidence for those vertices, obtained directly from the Markov predictor.
Step 3. Infer the level of confidence of the situation: P(Si) = P(D1 ∪ D2 ∪ D3 ∪ … ∪ Dk), where P(Si) is the predicted level of confidence in situation Si and ∪ is the operator for uniting situations, which can be computed by means of the situation algebra provided in [17]. P(Si) is the desired result. However, these three steps have to be performed for every prediction and therefore introduce some run-time overhead.
So, Markov models are a feasible approach to context prediction for context spaces theory. However, some pre-processing is needed to make them applicable, and some run-time computation overhead is introduced by the transition from new situation spaces back to old ones.
Bayesian networks for context prediction. A Bayesian network approach to context prediction was considered in [21]. That research addressed the problem of predicting the next room a person is likely to visit. Context was represented as a dynamic Bayesian network, where the current room depended on several previously visited rooms, and the duration of stay in the current room depended on the room itself, the time of day and the day of the week. In [21] the dynamic Bayesian network learned from the history of indoor movements. One more case of context prediction using Bayesian networks was described in [13], where the authors addressed the active device resolution problem; the paper [13] was already discussed above.
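Because the new situation spaces produced by MM4CS are mutually exclusive, the union in step 3 of the procedure for old situation spaces reduces to a simple sum of the parts' predicted probabilities. A sketch (all names and numbers are illustrative assumptions):

```python
# Sketch: predicted confidence of an old situation from the Markov
# predictor's output over the mutually exclusive new situation spaces.

def predict_old(situation, dependency, predicted):
    """dependency: old situation name -> list of new-space labels (the edges
    of the dependency graph); predicted: new-space label -> probability
    returned by the Markov predictor. Mutual exclusivity turns the union
    P(D1 U ... U Dk) into a sum."""
    return sum(predicted[part] for part in dependency[situation])

# Dependency graph and predicted probabilities for a tiny example.
dependency = {"S1": ["S1", "S1&S2"], "S2": ["S2", "S1&S2"]}
predicted = {"S1": 0.2, "S2": 0.1, "S1&S2": 0.5, "none": 0.2}
print(predict_old("S1", dependency, predicted))  # P(S1) = P(S1 part) + P(S1&S2 part)
```

This is the per-prediction overhead mentioned above: a graph lookup plus a sum over the parts of the queried situation.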
Context within context spaces theory can also be represented by a dynamic Bayesian network. In its simplest form this allows establishing correlations between situations over time (and the situations do not have to be mutually exclusive in this case). Dynamic Bayesian networks can also help if there are additional assumptions about factors that influence the context but are not included directly in the application space.
Branch prediction for context prediction. The paper [18] considered applying branch prediction algorithms to context prediction in pervasive computing systems. Branch prediction algorithms are used in microprocessors to predict the flow of a program after a branch instruction. In [18] the authors consider another example: branch prediction is used to predict the movement of a person around a house or office. Branch prediction algorithms can be applied to context spaces theory as well. Once again, context should be represented in terms of situations, and the situations should be represented as a graph. One option is to use the mutually exclusive situation decomposition that was presented for Markov model predictors and then predict movement through the graph over time using branch prediction techniques. Another option is to use the same approach as for sequence predictors: when a situation starts or wears off, a special symbol is added to the sequence, and branch predictors are used to predict the next symbol.
Summary. The features of the different approaches to context prediction for context spaces theory are summarized in Table 1.
Table 1. Context prediction approaches summary
Approach | Benefits and shortcomings | Summary
Sequence predictors | Good results are proven in practice. Low adaptation effort is needed. | The most promising approach.
Neural networks | Large variety of options to choose from and low adaptation effort needed. However, additional thorough testing is needed for every application to determine which neural network type is the most feasible. | A quite promising approach.
Markov chains | Applicable, but the approach requires splitting the application space into a set of non-overlapping situations, which is not natural for context spaces theory and requires significant pre-processing. | The efficiency of this approach needs to be evaluated.
Bayesian networks | The approach is able to estimate the influence of different factors, whether or not they are yet part of the context. Low adaptation effort is needed. | A very promising approach.
Branch predictors | The algorithms are simple. However, prediction results are influenced by only a small part of the history and the algorithms can handle only simple patterns. | In the general case, the least applicable approach.
Common challenges and future work directions. As shown, every context prediction method has to overcome some specific challenges to be applied to context spaces theory. But there is one common challenge that has not yet been addressed: all of the described context prediction methods deal with a discrete set of possible events or situations, so to apply them, context must be represented as a set of situations. The flow of any prediction algorithm then depends on how the situation spaces are defined within the application space: different situation spaces will result in different algorithm behavior, even if the application space is essentially the same, and this effect can be significant. So the question arises: what is the best way to define situations inside the context space for prediction purposes? The answer may depend on the exact prediction algorithm or application. Situations should be both meaningful for the user and suitable for prediction. This question needs further research.
5 Testbed for Context Prediction Methods
To test the context prediction methods identified above, a testbed is needed. The testbed application should have the following features:
1. The widest possible range of context reasoning and context prediction methods should be applicable.
2. The testbed should be able to estimate the effectiveness of those techniques.
3. The testbed should be configurable and flexible.
4. The architecture of the testbed must make it easy to replace the context reasoning or context prediction method.
5. Subject to the rules above, the testbed model should still be as simple as possible.
For validating context prediction methods we introduce the “Moonprobe” model and application scenario. “Moonprobe” simulates a vehicle traversing a complicated landscape in a 2D environment. The aim of the vehicle is to reach a destination without crashing or running out of fuel on the way. The model was developed using the XJ Technologies AnyLogic modeling tool [27]. The “Moonprobe” testbed models a vehicle affected by a group of forces (see formula (1)):
Fres = Fgr + N + Fen (1)
where: Fres – the resulting force, the vector sum of all forces affecting the moonprobe; Fgr – the gravity force, which is constant and always affects the probe; Fen – the engine force, set by the probe controlling device; the engine can be used to dampen a fall or to keep moving forward, and it has a limited fuel capacity – if the probe runs out of fuel, the engine force is constantly zero; N – the support reaction force, which affects the probe only while it is on the ground. The acceleration of the probe is given by formula (2):
A = (Fgr + N + Fen)/M (2)
where M is the mass of the vehicle.
In a very simplified manner the model can be described by the following set of equations (see formula (3)).
dX/dt = V
dV/dt = g + (N + Fen)/M (3)
dFl/dt = −FCR · |Fen|
where t is time, X is the probe coordinate vector, V is the probe velocity vector, M is the mass of the probe (a known system parameter), N is the support reaction vector (directed perpendicular to the ground), Fl is the remaining fuel level, g is the gravity vector, FCR is the fuel consumption rate (a known system parameter), and Fen is the engine force vector (its value and direction are set by the probe control system). In some cases the probe can leave the ground (see Fig. 3); when the probe lands, it experiences a ground collision.
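One way to simulate these dynamics is a simple Euler step, sketched below. The parameter values, the ground model and the assumption that fuel consumption is proportional to engine force are all illustrative, not taken from the AnyLogic model.

```python
# Sketch of a single Euler integration step for the probe dynamics of
# formula (3). All parameter values are illustrative assumptions.

M, G_Y, FCR, DT = 100.0, -1.62, 0.01, 0.1  # mass, gravity, fuel rate, time step

def step(x, v, fuel, f_en):
    """One Euler step: dX/dt = V, dV/dt = (Fgr + N + Fen)/M, dFl/dt = -FCR*|Fen|."""
    if fuel <= 0.0:
        f_en = (0.0, 0.0)                      # out of fuel: engine force is zero
    # Support reaction acts only on the ground and cancels the net downward force.
    n_y = max(0.0, -(M * G_Y + f_en[1])) if x[1] <= 0.0 else 0.0
    a = (f_en[0] / M, (M * G_Y + n_y + f_en[1]) / M)
    v = (v[0] + a[0] * DT, v[1] + a[1] * DT)
    x = (x[0] + v[0] * DT, x[1] + v[1] * DT)
    fuel = max(0.0, fuel - FCR * (abs(f_en[0]) + abs(f_en[1])) * DT)
    return x, v, fuel

x, v, fuel = (0.0, 100.0), (0.0, 0.0), 1.0     # probe above the ground, at rest
x, v, fuel = step(x, v, fuel, (0.0, 0.0))
print(v[1] < 0.0)   # True: with no thrust the probe starts falling under gravity
```

A crash check against the speed threshold and a piecewise-linear ground profile would complete a minimal simulation loop.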
Fig. 3. Probe leaves the ground
Ground collisions are considered dangerous for the probe. The collision effect is estimated by the projection of the probe speed perpendicular to the ground; if it exceeds a threshold, the probe crashes. The engine can be used to slow down the landing and thereby avoid a crash. The “Moonprobe” model deals with the problems that every pervasive system faces: sensor uncertainty, sensor outages, determining environment characteristics, fusing heterogeneous sensor data, etc. Currently the moonprobe uses around 20 context parameters to reason about situations. The moonprobe has a set of situations to be aware of, which can be classified into several major groups.
Group 1. Sensor outages. Any sensor can break down. Scenarios of sensor outages are configurable, and the behavior of a failed sensor is tunable as well; the most useful failure modes appear to be the following:
(1) The sensor starts to show random values.
(2) The sensor keeps returning the same value regardless of the characteristic under measurement.
(3) The sensor’s error increases.
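The three failure modes can be sketched as a single faulty-sensor function (names, ranges and noise levels are illustrative assumptions, not the testbed's configuration):

```python
# Sketch of the three configurable sensor failure modes listed above.
import random

def read(true_value, mode=None, stuck_at=0.0, noise=0.1):
    """Return a sensor reading of true_value under an optional failure mode."""
    if mode == "random":            # (1) random values
        return random.uniform(-100.0, 100.0)
    if mode == "stuck":             # (2) same value regardless of the input
        return stuck_at
    if mode == "degraded":          # (3) increased measurement error
        return true_value + random.gauss(0.0, 10 * noise)
    return true_value + random.gauss(0.0, noise)   # healthy sensor

random.seed(42)
print(read(50.0))                   # healthy: a value close to 50
print(read(50.0, mode="stuck"))     # 0.0
```

Detecting which mode a sensor has entered, from the readings alone, is exactly the reasoning task that Group 1 situations represent.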
Fig. 4. Moonprobe system architecture
Fig. 5. "Moonprobe" system working
The time between a sensor outage and its detection can be used as a metric to estimate the quality of reasoning about that situation.
Group 2. Progress towards the destination point. The probe needs to estimate whether it is going the right way (considering that not all sensors might be working).
Group 3. Safety conditions. The probe needs to estimate the level of crash risk.
The architecture of the model is as follows (see Fig. 4). The system consists of several components.
• “Environment and probe” component. It implements the physical model of the system and monitors what happens to the probe (crashes, sensor outages, etc.).
• “Sensors” component. Sensors measure data from the probe at certain moments of time and introduce some errors.
Fig. 6. Situation estimations in “Moonprobe”
• “Controlling engine” component. It takes sensed data and provides control information (e.g. desired engine power).
• “Main” component. It provides overall assessment of experimental results.
All context prediction methods described in Section 4 can be applied to the moonprobe scenario. Context prediction techniques can be incorporated into the “Controlling engine” component, where context prediction can be used to find optimal values of engine power that minimize crash risk and reduce fuel consumption. The development of the “Moonprobe” testbed is now complete. Screenshots of the running “Moonprobe” are depicted in Fig. 5 (complete) and Fig. 6 (partial). Model evaluation has shown that the “Moonprobe” model is indeed capable of estimating prediction results and that context prediction and context awareness algorithms are pluggable into it. Exact measurements of the results of different context prediction methods are in progress.
6 Conclusion and Future Work
Context spaces theory is a promising approach which addresses many problems of context awareness in pervasive computing. Adding accurate and reliable context prediction methods is one of the research challenges of that theory. In this paper a set of applicable context prediction techniques was proposed. Two-way mapping algorithms were developed to apply these context prediction methods to context spaces theory, and the efficiency of the context prediction methods was addressed. A
special testbed was developed to evaluate the effectiveness of context prediction methods, and the feasibility of the proposed testbed has been demonstrated. As a result, context spaces theory was enhanced with context prediction algorithms whose applicability and feasibility were demonstrated. By extending context spaces theory with context prediction techniques, it can address not only the problems of uncertain sensors and situation reasoning but the context prediction problem as well. This will benefit ECORA-based applications and widen the applicability of ECORA to a new class of tasks – tasks which require prediction capability. The proposed approach incorporates pluggable context prediction algorithms, so for every application the most suitable context prediction approach can be chosen. Future work will concentrate on accuracy metrics and analytical comparison of the efficiency of context prediction methods. Another interesting challenge for future work is how to define the situation spaces in a manner that both grants usability and enhances context prediction.
References
1. Ashbrook, D.: Learning significant locations and predicting user movement with GPS. In: ISWC 2002: Proceedings of the 6th IEEE International Symposium on Wearable Computers, Washington, DC, USA. IEEE Computer Society, Los Alamitos (2002)
2. Albrecht, D.W., Zukerman, I., Nicholson, A.E.: Pre-sending documents on the WWW: A comparative study. In: IJCAI 1999 – Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (1999)
3. Bestavros, A.: Speculative data dissemination and service to reduce server load, network traffic and service time in distributed information systems. In: Proceedings of the 1996 International Conference on Data Engineering (1996)
4. Bhattacharya, A., Das, S.K.: LeZi-update: An information-theoretic approach to track mobile users in PCS networks. In: Mobile Computing and Networking, pp. 1–12 (1999)
5. Cappe, O., Moulines, E., Ryden, T.: Inference in Hidden Markov Models. Springer, Heidelberg (2005)
6. Dey, A.K., Abowd, G.D.: Towards a better understanding of context and context-awareness. Technical report (1999)
7. Das, S.K., Cook, D.J., Bhattacharya, A., Heierman III, E.O., Lin, T.-Y.: The role of prediction algorithms in the MavHome smart home architecture. IEEE Wireless Communications 9(6), 77–84 (2003)
8. Davison, B.D., Hirsh, H.: Toward an adaptive command line interface. In: Advances in Human Factors/Ergonomics: Design of Computing Systems: Social and Ergonomic Considerations, pp. 505–508. Elsevier Science Publishers, San Francisco (1997); Proceedings of the Seventh International Conference on Human-Computer Interaction
9. Davison, B.D., Hirsh, H.: Probabilistic online action prediction. In: Proceedings of the AAAI Spring Symposium on Intelligent Environments (1998)
10. Gopalratnam, K., Cook, D.J.: Active LeZi: An incremental parsing algorithm for sequential prediction. In: Proceedings of the Florida Artificial Intelligence Research Symposium (2003)
11. Gellert, A., Vintan, L.: Person Movement Prediction Using Hidden Markov Models. Studies in Informatics and Control 15(1) (March 2006)
12. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, Englewood Cliffs (1999)
13. Kaowthumrong, K., Lebsack, J., Han, R.: Automated selection of the active device in interactive multi-device smart spaces (2002)
14. Mozer, M.C.: The neural network house: An environment that adapts to its inhabitants. In: Proceedings of the AAAI 1998 Spring Symposium on Intelligent Environments, pp. 110–114. AAAI Press, Menlo Park (1998)
15. Mayrhofer, R., Radi, H., Ferscha, A.: Recognizing and predicting context by learning from user behavior. Radiomatics: Journal of Communication Engineering, special issue on Advances in Mobile Multimedia 1(1) (May 2004)
16. Nurmi, P., Martin, M., Flanagan, J.A.: Enabling proactiveness through Context Prediction. In: Proc. Workshop on Context Awareness for Proactive Systems, pp. 159–168. Helsinki University Press, Helsinki (2005)
17. Padovitz, A.: Context Management and Reasoning about Situations in Pervasive Computing. PhD Thesis, Caulfield School of Information Technology, Monash University, Australia (2006)
18. Petzold, J., Bagci, F., Trumler, W., Ungerer, T.: Context Prediction Based on Branch Prediction Methods (July 2003)
19. Padovitz, A., Loke, S.W., Zaslavsky, A.: Towards a theory of context spaces. In: Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, pp. 38–42 (2004)
20. Padovitz, A., Loke, S.W., Zaslavsky, A.: The ECORA framework: A hybrid architecture for context-oriented pervasive computing. Pervasive and Mobile Computing 4(2), 182–215 (2008)
21. Petzold, J., Pietzowski, A., Bagci, F., Trumler, W., Ungerer, T.: Prediction of Indoor Movements Using Bayesian Networks. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA 2005. LNCS, vol. 3479, pp. 211–222. Springer, Heidelberg (2005)
22. Roy, A., Das Bhaumik, S.K., Bhattacharya, A., Basu, K., Cook, D.J., Das, S.K.: Location aware resource management in smart homes, pp. 481–488 (2003)
23. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Englewood Cliffs (2003)
24. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
25. Vintan, L., Gellert, A., Petzold, J., Ungerer, T.: Person Movement Prediction Using Neural Networks. In: First Workshop on Modeling and Retrieval of Context, Ulm, Germany (September 2004)
26. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)
27. How to Build a Combined Agent Based / System Dynamics Model Logic in AnyLogic (2008), http://www.xjtek.com/file/161
Cross-Domain Interoperability: A Case Study Jukka Honkola, Hannu Laine, Ronald Brown, and Ian Oliver Nokia Research Center P.O. Box 407 FI-00045 NOKIA GROUP
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. We describe a case study of the behaviour of four agents using a space-based communication architecture. We demonstrate that interoperability may be achieved by the agents merely describing information about themselves using an agreed upon common ontology. The case study consists of an exercise logger, a game, a mood renderer attached to an audio player, and phone status observing agents. The desired scenario emerges from the independent actions of the agents.
1 Introduction

Interoperability between devices and the software and applications that run on those devices is probably the major goal for many developers of such systems. This can be achieved in a number of ways: particularly through open, agreed standards, or often through monopoly. Whichever particular ‘interoperability’ route is taken, there will always be a plurality of standards relating to how such systems communicate [6][1]. For example, the Universal Plug and Play interoperability standard is plagued with manufacturer- and device-specific variations which complicate, and to some extent nullify, the long and complex standardization process employed to avoid this. Technologies such as the Semantic Web [2] provide enablers for solutions to these problems. Within the Semantic Web exist information representation formats such as the Resource Description Framework (RDF) [10] and the Web Ontology Language OWL [9], which themselves build upon enablers such as XML. Using these technologies we can address and provide solutions1 to the interoperability problem. At one level there are existing solutions such as Web Services (SOAP, WSDL, UDDI, etc.) and various stacks of middleware for processing representation formats and for preserving and reasoning about the semantics of messages. At the other, there are solutions based around more declarative mechanisms, such as TripCom [11] and the space-based solutions, for example JavaSpaces [4], for the transport and processing of messages and information to assist in interoperability. We do not believe that interoperability will be achieved through standardization committees, nor through standard, globally accepted semantics and ontologies, but rather by the unification of semantics in a localized manner, as described in previous work [8].
1 Plural, never singular!
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 22–31, 2009. c Springer-Verlag Berlin Heidelberg 2009
In this paper we describe an approach to solving the interoperability problem through a combination of context gathering, reasoning and agents within a space-based infrastructure, taking advantage of technologies such as RDF. We demonstrate the principles of this approach through the ad hoc integration, or mash-up, of a number of distinct applications.
2 The M3 Concept

The M3 system consists of a space-based communication mechanism for independent agents. The agents communicate implicitly by inserting information into the space and querying the information in the space. The space is represented by one or more semantic information brokers (SIBs), which store the information as an RDF graph. The agents can access the space by connecting to any of the SIBs making up the space over whatever connectivity mechanisms the SIBs offer. Usually, the connection will be over some network, and the agents will be running on various devices. The information in the space is the union of the information contained in the participating SIBs. Thus, an agent sees the same information content regardless of the SIB to which it is connected. The high-level system architecture is shown in Figure 1.
Fig. 1. A diagram of the general system architecture: Agents, M3 spaces and SIBs
The agents may use five different operations to access the information stored in the space:

– Insert: Insert information into the space
– Remove: Remove information from the space
– Update: Atomically update the information, i.e. a combination of insert and remove executed atomically
– Query: Query for information in the space
– Subscribe: Set up a persistent query in the space; changes to the query results are reported to the subscriber
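These five operations can be pictured with a minimal in-memory sketch of a space holding RDF-style triples. All class and method names here are illustrative stand-ins, not the actual M3/SIB interface; the subscription is modeled as a persistent query whose callback fires whenever its result set changes.

```python
# Hedged sketch of the five M3 space operations over an RDF-style
# triple store. All names are illustrative, not the real M3/SIB API.

class Space:
    def __init__(self):
        self.triples = set()          # the shared RDF graph
        self.subscriptions = []       # [pattern, last results, callback]

    def insert(self, *triples):
        self.triples.update(triples)
        self._notify()

    def remove(self, *triples):
        self.triples.difference_update(triples)
        self._notify()

    def update(self, insert_set, remove_set):
        # Atomic combination of insert and remove.
        self.triples.difference_update(remove_set)
        self.triples.update(insert_set)
        self._notify()

    def query(self, s=None, p=None, o=None):
        def match(x, v):
            return v is None or x == v
        return {t for t in self.triples
                if match(t[0], s) and match(t[1], p) and match(t[2], o)}

    def subscribe(self, pattern, callback):
        # Persistent query: callback fires whenever its results change.
        self.subscriptions.append([pattern, self.query(*pattern), callback])

    def _notify(self):
        for sub in self.subscriptions:
            pattern, old, callback = sub
            new = self.query(*pattern)
            if new != old:
                sub[1] = new
                callback(new)

space = Space()
seen = []
space.subscribe(("game1", "m3:lives", None), seen.append)
space.insert(("game1", "m3:lives", "4"))
# Atomically replace the lives value; the subscription fires again.
space.update({("game1", "m3:lives", "3")}, {("game1", "m3:lives", "4")})
```

After the two modifications, `seen` holds the two successive result sets of the persistent query, mirroring how a subscribing agent would be told about changes it did not make itself.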
In addition to the five access operations there are Join and Leave operations. An agent must have joined the space in order to access the information in the space. The join and leave operations can thus be used to provide access control and encrypted sessions, though the exact mechanisms for these are still undefined. In its basic form the M3 space does not restrict the structure or semantics of the information in any way. Thus, we do not enforce nor guarantee adherence to any specific ontologies, neither do we provide any complex reasoning2. Furthermore, information consistency is not guaranteed. The agents accessing the space are free to interpret the information in whatever way they want. We plan, though, to provide a mechanism to attach agents directly to the SIBs. These agents have a more powerful interface to access the information and can, for example, be guaranteed exclusive access to the information for a series of operations. Such agents may perform more complex reasoning, for example ontology repair or translation between different ontologies. However, they may not join any other spaces but are fixed to a single SIB and thus a single space. The M3 spaces are local and dynamic in nature, in contrast to the Semantic Web, which embodies Tim Berners-Lee’s idea [2] of a “giant global graph”. The locality and dynamicity—we envision that the spaces will store very dynamic context information, for example—pose different challenges than the internet-wide Semantic Web. For example, in order to provide true interoperability for local ubiquitous agents, the space (i.e. the SIBs) will have to provide a multitude of connectivity options in addition to HTTP: plain TCP/IP, NoTA [7], Bluetooth, etc. Furthermore, the space should be fairly responsive. While we do not aim for a real-time or near real-time system, even half-minute response times for operations are unacceptable.
Responsiveness is one of the factors behind the fundamental decision not to enforce any specific ontologies and to allow the agents to interpret the information freely, as it lessens the computational burden of the infrastructure. Another, and more important, reason is that we explicitly want to allow mashing up information from different domains in whatever way the agents see best. Strict ontology enforcement would make this kind of activity extremely difficult, as every new way of mashing up the information would require approval from some ontology governance committee. However, as mentioned above, we still plan to provide means for ontology enforcement for cases where the space provider explicitly wishes to restrict the ways the information is used, as there are bound to be situations where this is the best approach. The information content in an M3 space may be distributed over several SIBs. The distribution mechanism assumes that the set of SIBs forming an M3 space is totally routable but not necessarily totally connected. The information content that the agents see is the same regardless of the SIB to which they are connected.

2.1 Applications in M3 Spaces

The notion of application in an M3 space differs radically from the traditional notion of a monolithic application. Rather, as a long-term vision, we see applications as possible scenarios which are enabled by certain sets of agents. Thus, we do not see an
2 The current implementation of the concept understands the owl:sameAs concept.
email application running in an M3 space, but we could have a collection of agents present which allow for sending, receiving, composing and reading email. For this kind of scenario-based notion of application, we would also like to know whether the available agents can successfully execute the scenario. The envisioned model of using this system is that the user has a set of agents which are capable of executing certain scenarios. If a user needs to perform a new scenario that the current set of agents is not capable of executing, she could go and find a suitable agent from some directory by describing the desired scenario and the agents she already has. Thus, we need some formal or semi-formal way of describing agent behavior both with respect to the M3 space and to the environment. While there exists research addressing behavior in multi-agent systems, for example by Herlea, Jonker, Treur and Wijngaards [5], this kind of ad hoc assembly of agents in order to execute a certain scenario seems to be largely unaddressed in current research. Slightly similar problems have been addressed in, e.g., web service orchestration research [3], but these efforts still seem to concentrate on design-time analysis rather than run-time analysis. In the shorter term, our vision is that sets of existing applications would be enhanced by being able to interoperate, thus allowing the execution of (automatic) scenarios that would have been impossible, or would have required extensive work to implement, without the M3 approach.
3 The Case Study

The case study consists of four agents, Exercise logger, SuperTux game, Phone line observer, and Mood renderer, running on several devices. The exercise logger and phone line observer run on Nokia S60 platform phones, and the SuperTux game and mood renderer run on Nokia N800 internet tablets. The M3 space infrastructure runs on a Linux laptop.

3.1 Application Development Vision

The case study setup illustrates a “legacy enhancement” approach to application development in M3 spaces. Conceptually, we added M3 space capability to existing applications (SuperTux, Mood renderer, SportsTracker, Telephony), even though SportsTracker and Telephony were implemented as simple standalone applications for practical reasons. The starting point for the case study was a single person (as defined in the Person ontology, Section 3.2) who would be exercising and tracking the workouts with an exercise logger, playing computer games and listening to music. The goal was to demonstrate the benefits of interoperability between existing applications, that is, to be able to execute a scenario where the SuperTux game would award extra lives for exercising, a mood renderer embedded in a media player would play suitable music depending on the game state, and the game and media player would react accordingly if the person receives a call.

3.2 Ontologies

The ontologies used in this case study have been defined informally, and are not enforced in any way. The different components of the case study assume that they are used correctly. A pseudo-UML diagram of the complete ontology is shown in Figure 2,
Fig. 2. An informal description of used ontologies. The using agents are named on top of the “class” box.

Fig. 3. An example RDF graph during the execution of the scenario
and an example instantiation of the ontology when a game is being played and the phone line is idle is shown in Figure 3. As a general note, the needs of the scenario studied drove the modeling of the ontologies. Most of the choices of what to include and what to exclude were based on the need for the information in executing the chosen scenario. A broader study might need to revisit the ontologies and generalize and expand them. The agents use different parts of the combined3 case study ontology. The Workout Monitor agent understands the Workout and Person ontologies, the SuperTux agent understands all ontologies, the Telephony Observer agent understands the OperatorAccount and Person ontologies, and the Mood Renderer understands the Game and OperatorAccount ontologies.
3 A combination of the Person, OperatorAccount, Game, and Workout ontologies.
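The instantiated graph of Figure 3 can be spelled out as plain subject–predicate–object triples. The node identifiers below (person1, workout1, etc.) are invented stand-ins for the figure’s anonymous nodes (?1–?4), and the pairing of a few literals with their predicates is inferred from the figure layout, so this is a hedged reading rather than the authoritative data set.

```python
# Fig. 3 read out as triples. Node names stand in for the figure's
# blank nodes; the m3:totaltime literal pairing is inferred from layout.
graph = [
    ("person1",  "rdf:type",           "m3:person"),
    ("person1",  "m3:username",        "SportsTrackerUser"),
    ("person1",  "m3:mobileaccount",   "account1"),
    ("account1", "rdf:type",           "m3:operatoraccount"),
    ("account1", "m3:phonelinestatus", "idle"),
    ("workout1", "rdf:type",           "m3:workout"),
    ("workout1", "m3:user",            "person1"),
    ("workout1", "m3:starttime",       "2008-10-29T07:02:30"),
    ("workout1", "m3:stoptime",        "2008-10-29T08:05:12"),
    ("workout1", "m3:totaltime",       "3738786"),
    ("workout1", "m3:totaldistance",   "2053132"),
    ("game1",    "rdf:type",           "m3:supertux"),
    ("game1",    "m3:player",          "person1"),
    ("game1",    "m3:sessionstart",    "2008-10-30T11:30:02"),
    ("game1",    "m3:lives",           "4"),
]

# Example traversal: the phone line status of the player of game1,
# following m3:player -> m3:mobileaccount -> m3:phonelinestatus.
player = next(o for s, p, o in graph if s == "game1" and p == "m3:player")
account = next(o for s, p, o in graph
               if s == player and p == "m3:mobileaccount")
status = next(o for s, p, o in graph
              if s == account and p == "m3:phonelinestatus")
```

The traversal at the end is exactly the kind of cross-domain hop the mood renderer performs: starting from game state (Game ontology) and ending at phone line status (OperatorAccount ontology), linked through the shared Person node.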
Person. The person ontology describes the information about a person necessary for this case study. A person can have a name (m3:username) and an operator account (m3:mobileaccount).

Operator Account. The operator account ontology contains information about the phone line status of the account. In a more realistic case, there could also be information about the phone number, used talking time, amount of data transferred, etc.

Workout. The workout ontology describes a generic distance-oriented workout. Thus, we include information about the length of the workout, the distance traveled, the person who has done the workout, etc., but no information related to, for example, weightlifting, where we would probably want to know the number of repeats and the weights instead of time used and distance. The choice of what to include in the ontology was influenced by the information available in the SportsTracker application.

Game. The SuperTux game ontology includes information about the gaming session start time, stop time, lives left and player. The session start is defined to be the time when the game executable was started, showing a welcome screen. The lives value is inserted when a player starts a game, new or saved, from the welcome screen. When the player quits the game completely, that is, exits the welcome screen, the session stop time is inserted. The ontology could also include information about the current score, amount of coins, field being played, etc., which the mood renderer could use.

3.3 Agents

Next we describe the operation of the agents making up the case study. It should be noted that the agents communicate only implicitly, through the M3 space, by reading the information inserted into it by other agents.

Workout Monitor. The workout monitor reads SportsTracker workout files and writes a summary of the workouts to the M3 space. We currently read the information related to the general description of the workout and insert that into the M3 space.
Any path information is ignored, as we have no use for it in the current scenarios. The workout information is related to the person doing the workouts. If a person with the same name as the SportsTracker username is found, we attach the workout in question to that person; otherwise we create a new person. We use a SportsTracker-internal workout identifier to identify workouts, so that we do not insert duplicate workouts if the monitor is run again with the same workouts as before.

SuperTux. SuperTux is a classic jump’n’run side-scroller game which has taken strong inspiration from the original Super Mario games for Nintendo. SuperTux is open source and available under the GPL. The ontologies related to the SuperTux game are the Person ontology, the Workout ontology and the SuperTux ontology. Each SuperTux game is assigned player information. If the person class instance of the player cannot be found in the M3 space, the
game creates one and inserts player information according to the Person ontology. The person class instance also binds the player to the workout information inserted into the M3 space. Prior to starting the game, SuperTux queries workout information from the M3 space. The first query searches for all instances of the Workout class. For each workout class instance found, a new query is performed in order to find out the details of the workout. If there are long and recent enough workouts performed by the player, the game awards two extra lives for a game level. The game also creates an instance of itself and inserts information, such as a reference to the person class of the player and the session start and stop times, into the M3 space. In addition, the game updates how many lives the player has left while playing a game level. Every time the player loses or gains a life in the game, the game removes the old lives information and inserts the new information into the M3 space.

Telephony status observer. The telephony voice line status is part of an OperatorAccount class. Operator accounts are bound to a person, i.e. it is assumed that each account belongs to a person4. The voice line status has only two states, namely active and idle. The phone line status monitor subscribes to voice line status changes provided by Symbian’s CTelephony API. When the user dials a phone number to establish a voice connection, or when there is an incoming call to the phone, the CTelephony API provides the current voice line status to the observer application. The observer application transforms the CTelephony states (e.g. EStatusRinging, EStatusDialling, EStatusIdle, ...) to the corresponding values defined by the OperatorAccount ontology and updates the information in the M3 space.

Mood renderer. A mood renderer conceptually would have interfaces to sensory, i.e. mood rendering, devices of a smart space environment in order to coordinate their controls according to a mood determined by its knowledge processing.
Many questions arise about such knowledge for determining a mood and how to render it for a given environment. The mood renderer of the case study is a very simple start on the problem, which side-steps many issues of practical cases while showing the essence of knowledge processing for the given concept. The mood renderer in this case study is based on music as the only rendering device, and thus on the simple assumption that music, more precisely some audio track, maps to a determinable mood, and further, that the mapping stems solely from the state of a particular game being played and applies invariantly with regard to time. This dodges questions about personal tastes and about more than one person contributing to the smart space mood. In choosing a track to play, volume is not controlled, but play, pause and stop are used where appropriate. Mood rendering is based solely on two sources of information in the smart space environment. The first is an operating SuperTux game and the second is the phone status. Note, then, that this mood renderer is solely reactive to information about the dynamic smart space environment to which it belongs—for simplicity, it does not contribute information even though it clearly impacts the environment. Given the simplifications stated, and given that the mood renderer understands the ontologies of the SuperTux game and the telephony status observer, it follows their two independent
4 Or a group of persons—we do not prevent that.
contributions to the smart space environment and combines them to render a mood appropriate for the state of game play, while maintaining the courtesy of pausing an audio track, if one is being played, while any phone status of the smart space is active. The mood renderer is implemented as two persistent queries for smart space information. The first looks for an active game in order to render its mood, and the second looks for any phone with an active status. When an active game is found, a third persistent query, particular to this game, looks for its level of m3:lives and renders the mood as CALM for no lives, before the real play starts, and then, when m3:lives exists, as either “UP-BEAT” or “DOWN-BEAT” according to a threshold configuration on the number of lives. When the game ends, audio rendering is stopped. The mood renderer’s behavior can be described as follows:

– Start with renderer.setTrack(NONE), renderer.setMode(STOP), and gameSelected=FALSE.
– Subscribe to the call-status attribute for all phone-status observers. Monitor subscription results: for any attribute value of “active”, if currentMode is PLAY, then do renderer.setMode(PAUSE); otherwise, if currentMode is PAUSE, do renderer.setMode(PLAY), i.e. any playing pauses during any phone call, or resumes when there is no phone call if paused during a game.
– Subscribe to a SuperTux game which has a session-start and no session-end, or a session-start after the session-end. Monitor subscription results, and, whenever fulfilled, first ensure that rendering is stopped and end subscriptions for any earlier game. Then render the music/mood mapping for the selected game using the next subscription.
– Subscribe to the session-end and lives attributes of the selected game. When fulfilled, set renderer.setMood(mood), with the mood according to lives as follows:
  • no lives: CALM, i.e. game started, but not yet played.
  • lives above threshold: UP-BEAT
  • lives below threshold: DOWN-BEAT
  When the session-end of the game is after its session-start, stop rendering and end the subscriptions about this game.
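The bulleted behavior can be condensed into a small reactive sketch. This is a hedged illustration: the MoodRenderer class, the default threshold, and the event-handler methods are invented stand-ins for the real subscription callbacks, not the actual implementation.

```python
# Hedged sketch of the mood renderer's reaction logic. The class,
# threshold value and handler names are illustrative; in the real
# system these would be driven by M3 persistent-query notifications.

CALM, UP_BEAT, DOWN_BEAT = "CALM", "UP-BEAT", "DOWN-BEAT"
STOP, PLAY, PAUSE = "STOP", "PLAY", "PAUSE"

class MoodRenderer:
    def __init__(self, threshold=3):   # threshold is an assumed config
        self.threshold = threshold
        self.mode = STOP
        self.mood = None

    def on_game_lives(self, lives):
        # Map game state to a mood, as in the bulleted rules above.
        if lives is None:
            self.mood = CALM           # game started, not yet played
        elif lives > self.threshold:
            self.mood = UP_BEAT
        else:
            self.mood = DOWN_BEAT
        if self.mode != PAUSE:         # don't override an active call
            self.mode = PLAY

    def on_game_end(self):
        # session-end after session-start: stop rendering.
        self.mode = STOP
        self.mood = None

    def on_phone_status(self, status):
        # Pause during any call; resume afterwards if we were playing.
        if status == "active" and self.mode == PLAY:
            self.mode = PAUSE
        elif status == "idle" and self.mode == PAUSE:
            self.mode = PLAY

r = MoodRenderer()
r.on_game_lives(None)        # welcome screen: CALM
r.on_game_lives(4)           # above threshold: UP-BEAT
r.on_phone_status("active")  # call comes in: pause
r.on_phone_status("idle")    # call ends: resume playing
```

Note that the sketch preserves the asymmetry discussed in the conclusions: the renderer resumes automatically when the call ends, whereas the game (not shown) would stay paused until the player acts.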
4 Conclusions

The idea of agents interoperating by communicating implicitly through M3 spaces worked well for this case study. We were able to execute the scenario we had in mind and add the phone agent to it without modifying the existing ontology. However, this work was performed inside a single team in a single company. This kind of close cooperation between the people implementing the agents obviously makes things easier. Most of the work required in integrating the different agents into a coherent scenario was related to the mood renderer observing the state of the SuperTux game. The ontology did not directly provide information about all the state changes in the game. For example, when a level in the game starts being played, the mood renderer deduces this by observing the appearance of the lives attribute in the M3 space. This, however, is something that will probably be a very common case, as ontologies cannot and
should not try to describe all possible information that some agent may need. It should suffice that the information is deducible from the explicitly stated information. Another aspect of local pervasive computing environments that was not really addressed in this case study is the handling of resources that can be used by only one agent at a time. In this case the mood renderer represents such a resource (the audio player), and implicitly it is usually the game that uses it, as the mood renderer deduces the mood based on the game state alone. When the phone rings, the mood renderer will stop the music and continue after the call is finished. However, in a more complex case there needs to be a (preferably) clear policy for each resource on who can use it. We currently think that this can be handled at the ontology level, with suitable coordination structures and an agent acting as a gatekeeper. In general, however, this is a subject for further study. The basic principles of agent-dependent interpretation of information and implicit communication proved useful when the phone line status information was added to the case study. When we describe only the information, and not the actions to be taken by the agents, each agent can select a suitable course of action based on its own detailed knowledge. In this particular case, the correct course of action for the SuperTux game is to pause when the phone line is active, but not to automatically resume the game when it goes idle, as the player is not necessarily ready to continue immediately after ending the call. On the other hand, the mood renderer should continue playing the music immediately after the distraction (in this case a call) has ended. If we expanded the case study to more persons, each having a mobile phone, the difference between the mood renderer and SuperTux regarding the behavior when a call arrives would be even greater.
Sensible behavior for the mood renderer would be to stop the music whenever anyone in the same (acoustic) location receives a call, while the game should react only to calls to the player of the game. Of course, in some cases it might be desirable to have a coordinated response from a group of agents to certain events. We believe that this is best handled outside the system, e.g. by requirements in the scenario descriptions or ontology documentation. In any case, such coordinated responses would not apply to all agents but rather only to certain groups of agents involved in executing a given scenario. An obvious next step related to the interoperability hypothesis is to have external research partners extend the demo with agents implemented outside Nokia. This would clarify the required level of ontology and agent behavior description, as the communication would then better approximate the real-world case where the agents are implemented by multiple vendors.
Acknowledgements

The authors would like to thank Vesa Luukkala for helping with ontology modeling, and Antti Lappeteläinen for helpful comments.
References

1. Bergamaschi, S., Castano, S., Vincini, M.: Semantic Integration of Semi-structured and Structured Data Sources. SIGMOD Record 28(1), 54–59 (1999)
2. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American (May 2001)
3. Foster, H., Uchitel, S., Magee, J., Kramer, J.: Compatibility verification for web service choreography. In: Proceedings of the IEEE International Conference on Web Services, July 6-9, pp. 738–741. IEEE, Los Alamitos (2004)
4. Freeman, E., Hupfer, S., Arnold, K.: JavaSpaces Principles, Patterns and Practice. Addison-Wesley, Reading (1999)
5. Herlea, D.E., Jonker, C.M., Treur, J., Wijngaards, N.J.E.: Specification of Behavioural Requirements within Compositional Multi-agent System Design. In: Garijo, F.J., Boman, M. (eds.) MAAMAW 1999. LNCS, vol. 1647, pp. 8–27. Springer, Heidelberg (1999)
6. Lewis, G.A., Morris, E., Simanta, S., Wrage, L.: Why Standards Are Not Enough To Guarantee End-to-End Interoperability. In: Proceedings of the 7th IEEE International Conference on Composition-Based Software Systems (ICCBSS), Madrid, Spain, February 25–29. IEEE, Los Alamitos (2008)
7. Network on Terminal Architecture (November 2008), http://www.notaworld.org
8. Oliver, I., Honkola, J.: Personal Semantic Web Through A Space Based Computing Environment. In: Second IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, August 4-7. IEEE, Los Alamitos (2008)
9. Web Ontology Language, http://www.w3.org/2004/OWL/
10. Resource Description Framework, http://www.w3.org/RDF/
11. Teymourian, K., Nixon, L., Wutke, D., Krummenacher, R., Moritsch, H., Kühn, E., Schreiber, C.: Implementation of a novel semantic web middleware approach based on triplespaces. In: Second IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, August 4-7, pp. 518–523. IEEE, Los Alamitos (2008)
A Topology Based Approach for Context-Awareness in Smart Environments

Antonio Coronato and Giuseppe De Pietro

Institute of High Performance Computing and Networking ICAR-CNR, Via P. Castellino 111, 80131, Naples, Italy
[email protected],
[email protected]

Abstract. This paper presents logic models and mechanisms to achieve context awareness and future context awareness. The proposed approach relies on handling the topology of the physical environment. We describe methods and tools to specify and implement topology services that can provide mechanisms both for rapid prototyping of application services and for handling context-awareness information. Finally, we report some case studies and applications of the proposed approach1.
1 Introduction
Ambient intelligence and smart environments are emerging disciplines and technologies that bring intelligence to our common physical environments. Such intelligent environments rely on the use of a large variety of sensors and sensor networks, smart objects and devices, artificial intelligence technologies, etc. The resulting environments are systems capable of knowing and understanding the physical and virtual context of users and objects and of responding intelligently to such knowledge [3]. This paper proposes logic models and implementation mechanisms to achieve context-awareness in smart environments. All solutions are based on the physical topology of the environment. The logic model relies on the classification of locations into two classes: semantic (rooms, corridors, laboratories, floors, etc.) and physical (the area sensed by a short-range RFID reader, the one sensed by a temperature sensor, the one covered by a Wi-Fi access point, etc.). Since the physical locations are the ones sensed by sensors and positioning systems, whereas context-aware services and applications need semantic locations (e.g., in which room is the user? Where is he going? What is the temperature of the laboratory?), we have defined a logic model that joins semantic locations and physical locations by means of a virtual concept that we have named Atomic Location. In addition, we have defined logic models to represent properties like the contiguity and distance of locations. This makes the environment capable of inferring several kinds of situations.
1 This work has been partly supported by the project OLIMPO within the POR Misura 3.17 funded by the Regione Campania.
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 32–41, 2009. c Springer-Verlag Berlin Heidelberg 2009
The logic models are only part of the contribution of this paper. In addition, we provide a set of formal tools that enable the designer to specify the topology and some dynamic aspects in a rigorous and verifiable way, and a set of ontologies and rules that enable the developer to implement intelligent mechanisms for context-awareness. The choice of formal tools is due to the fact that smart environments and ambient intelligence are increasingly applied to critical scenarios (e.g., healthcare, emergency situations, disaster recovery, etc.). As is well known from the literature, formal tools are an effective means to prevent faults [3]. The rest of this paper is organized as follows. Section 2 presents the logic models. Section 3 describes the formal tools that we apply to specify the topology. Section 4 presents the ontology on which our context service relies and the rules that make it able to infer context information. Finally, Section 5 concludes the paper.
2 Logic View
The approach presented in this paper relies on a topology model that we have defined to provide a unique and uniform representation of location information, independently of the particular positioning system. The model is based on the concepts of physical and semantic locations. These notions are not new in the literature [2], but we have partially re-elaborated them [5]. A physical location specifies the position of a mobile entity and is characterized by different granularities and scopes, depending on the particular positioning system adopted. A semantic location, instead, specifies the meaning of a location and usually groups several physical locations. As an example, GPS coordinates represent physical locations, whereas a semantic location can be a building, an office inside a building, a meeting room, an area in front of a wall monitor, etc. The proposed model describes physical locations as the proximity to well-known points. The technique of proximity is applicable when the environment is equipped with sensors to reveal the presence of mobile users, or with positioning systems to detect the position of mobile devices. In such cases, a sensor/positioning device, which covers a specific area, defines a physical location corresponding to the covered area. In the model, the covered area is called a SensedArea. Let us show an example with two specific positioning systems. The former relies on Wi-Fi technology to locate Wi-Fi-enabled mobile devices. The latter uses RFID identification systems to locate RFID-tagged mobile users. The corresponding sensed areas are:

– Wi-FiSensedArea, which is identified by the physical area covered by a specific Wi-Fi AP, i.e. the region in which the AP is able to grant access to a Wi-Fi mobile device;
– RFIDSensedArea, which is identified by the physical area covered by a specific RFID reader, i.e. the region in which the RFID reader is able to detect an RFID-tagged user.
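The proximity technique can be sketched as two small lookups. All device and location identifiers here are invented for illustration, and the AtomicLocation indirection of the full model is collapsed into a direct area-to-location mapping for brevity.

```python
# Hedged sketch of proximity-based localization. All identifiers are
# invented examples; the real model links areas to semantic locations
# through AtomicLocations rather than a direct mapping.

# Physical locations: the SensedArea covered by each positioning device.
sensed_area_of_device = {
    "rfid-reader-17": "RFIDSensedArea-17",   # fine-grained
    "wifi-ap-2":      "WiFiSensedArea-2",    # coarse-grained
}

# Semantic locations group physical locations.
semantic_location_of_area = {
    "RFIDSensedArea-17": "MeetingRoom-A",
    "WiFiSensedArea-2":  "Floor-1",
}

def semantic_position(device_id):
    """Proximity: being detected by a device places the entity in the
    device's sensed area, which maps to a semantic location."""
    area = sensed_area_of_device[device_id]
    return semantic_location_of_area[area]

# An RFID-tagged user read by reader 17 is placed in MeetingRoom-A;
# a Wi-Fi device associated to AP 2 is only placed on Floor-1.
room = semantic_position("rfid-reader-17")
floor = semantic_position("wifi-ap-2")
```

The two lookups make the granularity difference concrete: the RFID reading answers "which room?", while the Wi-Fi association can only answer "which floor?".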
In addition to these, a set of semantic locations, like Building, Floor, Room and Corridor, has been defined. Finally, the concept of AtomicLocation is used to represent the linking element between physical and semantic locations. Its physical dimension corresponds to that of the SensedArea of the finest positioning system in the environment. In the proposed example, an AtomicLocation corresponds to an RFIDSensedArea, because an RFID reader covers a smaller area than a Wi-Fi AP. As shown in Figure 1, the concept of AtomicLocation is used to build a grid of locations in the physical environment (AtomicLocation Perspective). Some RFID readers are positioned so as to match AtomicLocations; not all AtomicLocations are covered by an RFID reader (RFIDSensedArea Perspective). In contrast, each Wi-Fi AP spreads a signal over many AtomicLocations (Wi-FiSensedArea Perspective). Finally, semantic locations in turn group many atomic locations and correspond to well-defined physical spaces (SemanticLocation Perspective). As a result, relationships between sensed areas and semantic locations are built by means of atomic locations. This makes the model able to adapt easily
Fig. 1. Logic Models
A Topology Based Approach for Context-Awareness in Smart Environments
35
Fig. 2. Contiguity relationships between locations
to changes such as the introduction of new positioning systems or the definition of new semantic locations. The introduction of a new positioning system may obviously require changing the physical dimension of an atomic location, but the location model still remains applicable. This model provides mechanisms for conferring semantics on raw information. As an example, in addition to the localization problem already discussed, let us focus on the setup of a sensor, for example a temperature sensor. Let us also suppose that the environment is equipped with a topology service that handles the topology model. When the sensor is installed, the topology service is updated with information on i) the new resource; ii) its location; and iii) its covered area. From then on, all information coming from that sensor is associated to the covered area; that is, the sensed temperature is the temperature of the covered area. Another aspect, which is relevant to our aims, is the contiguity of locations. We say that two locations (L1 and L2) are contiguous if it is possible to move from L1 to L2 directly, and vice versa. The relation of contiguity among locations is logically represented by bi-directional graphs such as the one shown in figure 2. In the graph, the weight of each arc, d(x,y), represents the distance between the locations Lx and Ly, or the minimum time a mobile entity needs to move from the starting location to the destination. This model is useful for two main reasons. First, the system can predict the future position of a moving user. For example, in a museum the visitor who is in a room will next be in one of the contiguous rooms. As a result, the system can prepare its resources depending on the future position. Second, this can help in detecting and avoiding inconsistencies. As a matter of fact, a tracking system can easily detect a large set of false positives using the concepts of distance and time.
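To illustrate the two uses of the contiguity graph just described, the following is a minimal sketch (not from the paper; all names are illustrative). Each edge weight d(x,y) is interpreted as the minimum travel time between two contiguous locations; a sighting that violates contiguity or the minimum travel time is flagged as a likely false positive.

```python
from collections import defaultdict

class ContiguityGraph:
    """Bi-directional graph of contiguous locations with travel-time weights."""

    def __init__(self):
        self.edges = defaultdict(dict)

    def connect(self, l1, l2, min_time):
        # Contiguity is bi-directional, so add both arcs with the same weight.
        self.edges[l1][l2] = min_time
        self.edges[l2][l1] = min_time

    def possible_next(self, location):
        # First use: a mobile user can only move to a contiguous location.
        return set(self.edges[location])

    def is_consistent(self, prev_loc, new_loc, elapsed):
        # Second use: a sighting is a likely false positive if the new
        # location is not contiguous to the previous one, or was reached
        # faster than the minimum travel time allows.
        if new_loc == prev_loc:
            return True
        min_time = self.edges[prev_loc].get(new_loc)
        return min_time is not None and elapsed >= min_time

g = ContiguityGraph()
g.connect("L1", "L2", 5)
g.connect("L2", "L3", 7)
print(sorted(g.possible_next("L2")))   # ['L1', 'L3']
print(g.is_consistent("L1", "L3", 2))  # False: L3 is not contiguous to L1
```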
3 Specification View
In this section we briefly describe some formal tools adopted to specify the topology of a smart environment and several dynamic aspects.
3.1 Ambient Calculus
Ambient Calculus was proposed by Luca Cardelli and Andrew Gordon to describe the movement of processes and devices. It is a framework that derives from the π-calculus [9]. In Ambient Calculus, all entities are described as processes. An important kind of process is an ambient, which is a bounded place where computation happens. An ambient delimits what is and what is not contained and executed within it. Each ambient has a name and can be moved as a whole, into and out of other ambients, by performing in and out operations. Ambient Calculus allows a description of complex phenomena in terms of the creation and destruction of nested ambients, and the movement of processes into and out of these ambients. In Ambient Calculus, n[P] represents an ambient, where n is the name of the ambient and P is the process running inside it. The process "0" is the one that does nothing. Parallel execution is denoted by a binary operator "|" that is commutative and associative. "!P" denotes the unbounded replication of the process P within its ambient. The following expression n[P1 | … | Pj | m1[…] | … | mk[…]] describes an ambient with name n that contains j processes P1, …, Pj and k ambients m1, …, mk. Figure 3 reports the equivalent graphic representation.
Fig. 3. Ambient Calculus: Graphical Representation for Ambients
Some processes can execute an action that changes the state of the world around them. This behavior of processes is specified using capabilities. The process M.P executes an action described by the capability M, and then continues as the process P. There are three kinds of capabilities: for entering (in), for exiting (out), and for opening up (open) an ambient. Figure 4 shows these three capabilities. The process "in m.P" instructs the ambient surrounding "in m.P" to enter a sibling ambient named m. The reduction rule, which specifies the change in the state of the world, is: n[in m.P] | m[Q] → m[n[P] | Q] (a)
The action "out m.P" instructs the ambient surrounding "out m.P" to exit its parent ambient named m. The reduction rule is: m[n[out m.P] | Q] → n[P] | m[Q] (b)
A Topology Based Approach for Context-Awareness in Smart Environments
37
Fig. 4. Ambient Calculus: Capabilities
The action “open m.P” provides a way of dissolving the boundary of an ambient named m located at the same level as open. The rule is: open m.P | m[Q] → P | Q (c)
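To make the reduction rules concrete, the following is an illustrative sketch of rule (a) only (rules (b) and (c) are analogous). The encoding is ours, not the authors': each ambient is a dict with a name, a list of child ambients, and a list of pending capabilities such as ("in", "m").

```python
# Minimal encoding of ambients and of reduction rule (a):
#   n[in m.P] | m[Q]  ->  m[n[P] | Q]

def amb(name, children=(), caps=()):
    """Build an ambient with a name, child ambients, and pending capabilities."""
    return {"name": name, "children": list(children), "caps": list(caps)}

def step_in(siblings):
    """Apply rule (a) once to a list of sibling ambients; True if it fired."""
    for n in siblings:
        if n["caps"] and n["caps"][0][0] == "in":
            target = n["caps"][0][1]
            for m in siblings:
                if m["name"] == target:
                    n["caps"].pop(0)          # consume the "in m" capability
                    siblings.remove(n)        # n leaves the current level...
                    m["children"].append(n)   # ...and moves inside m
                    return True
    return False

# n[in m.0] | m[0]
top = [amb("n", caps=[("in", "m")]), amb("m")]
step_in(top)
# now: m[n[0]]
print([a["name"] for a in top])                 # ['m']
print([a["name"] for a in top[0]["children"]])  # ['n']
```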
Output and input actions are denoted by "⟨M⟩" and "(x).P" respectively, to release and capture capabilities. Finally, the restriction operator (νn)P creates a new (unique) name n within a scope P. The new name can be used to name ambients and to operate on ambients by name.

3.2 Ambient Logic
Ambient Logic is a temporal (modal) logic that allows the expression of properties with respect to a specific location (ambient). Logical formulas in Ambient Logic include i) those defined for propositional logic, like negation (¬), conjunction (∧), and disjunction (∨); ii) universal (∀) and existential (∃) quantifications, as in first-order logic; iii) eventually (◇) and always (□) temporal operators, as in temporal logics; and, in addition to classic logics, iv) spatial operators such as somewhere and everywhere. The satisfaction relation P |= A means that the process P satisfies the closed formula A (i.e. A contains no free variables). More specifically, the formula P |= A@L means that the process P satisfies the closed formula A at the location L. In the temporal modality, the relation P → P′ indicates that the ambient P can be reduced to P′ using exactly one reduction rule. Similarly, in the spatial modality, the relation P ↓ P′ indicates that the ambient P contains P′ within exactly one level of nesting.
3.3 Real-Time Temporal Logic
Despite their name, temporal logics (e.g. Ambient Logic) do not provide mechanisms for reasoning about "time" in a quantitative sense; i.e. they do not enable the expression of real-time constraints. For this reason, several extensions to temporal logics have been proposed [1], which fall into two categories: explicit time logics and implicit time logics. For the rest of the paper, we will refer to Real-Time Temporal Logic (RTTL), which is an explicit time formalism that relies on the use of a special variable T to indicate the value of time at any state, and uses first-order quantification over that variable to make real-time assertions. With this logic, the formula: ∀t ≥ T0 : □(α ∧ t = T)
means that for any instant after T0, α is always true.

3.4 Application of Such Formal Tools
This sub-section shows how to apply the formal tools presented so far. First, a structural view of the topology should be provided. To achieve this, we use Ambient Calculus, each location being an ambient. For instance, we can model a room as an ambient of Ambient Calculus: the room contains resources, which are still described as ambients, and is located on a floor, which is also an ambient. Next, we can describe the movements that mobile entities (users and resources) can perform within the environment by means of the in and out capabilities (users are modeled as ambients as well). Finally, constraints regarding the movements within the environment and the temporal duration of movements can be specified using Ambient Logic and Real-Time Temporal Logic respectively.
4 Implementation View
Our implementation relies on the use of formal ontology and reasoners. In the literature one can find many definitions of ontology. For the purposes of this work, we are interested in ontology as a formal tool for modeling domains [6]. The main elements to take into account in an ontology are classes, sometimes called concepts, which represent entities of the domain being modeled, and relations among classes. A class can have subclasses, which represent concepts more specific than the superclass, and instances, which are elements of the real world classified as members of the class. In practical terms, developing an ontology includes:

1. defining the classes in the ontology;
2. arranging the classes in a taxonomic (subclass–superclass) hierarchy;
3. defining the relations;
4. figuring out relations among classes and entities.
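The four steps above can be sketched in a few lines. The following is a hypothetical, minimal ontology for the topology domain; all class and relation names are illustrative and are not claimed to match the authors' actual ontology.

```python
# Step 1 and 2: classes arranged in a taxonomic (subclass-superclass) hierarchy,
# encoded as child -> parent (None marks the root concept).
CLASSES = {
    "Location": None,
    "SemanticLocation": "Location",
    "PhysicalLocation": "Location",
    "AtomicLocation": "Location",
    "Floor": "SemanticLocation",
    "Room": "SemanticLocation",
    "RFIDSensedArea": "PhysicalLocation",
}

# Step 3: relations, defined together with the mathematical properties
# they satisfy.
RELATIONS = {
    "contains": {"transitive"},
    "maps": set(),             # only an approximate correspondence
    "isContiguous": {"symmetric"},
}

# Step 4: relations among instances (illustrative facts).
FACTS = [
    ("Floor1", "contains", "Room101"),
    ("Room101", "contains", "Atomic_1"),
]

def is_a(cls, ancestor):
    """Walk up the taxonomic hierarchy to test subsumption."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = CLASSES.get(cls)
    return False

print(is_a("Room", "Location"))        # True
print(is_a("RFIDSensedArea", "Room"))  # False
```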
In the proposed methodology, ontology is used to define, in a formal and unambiguous way, all the entities (resources, services, users, etc.) involved in the application domain being designed, and the relations among them. Relations are formally defined as primitives [6] by specifying their meaning and what mathematical properties (reflexive, transitive, commutative, etc.) they satisfy. Figure 5 presents an example of ontology.
Fig. 5. Ontology
It is quite obvious that a Building contains one or more Floors, which in turn contain Rooms, Corridors, Laboratories, etc. These are semantic locations that are related to physical locations through atomic locations. Thus, physical locations like the RFID Sensed Area, WiFi Sensed Area, Temperature Sensed Area, etc., as well as semantic locations, map some atomic locations. It is important to note that other concepts, not reported in the figure, are related to atomic locations. Indeed, each sensor (e.g. RFID reader, WiFi access point, temperature sensor, etc.) is physically installed in a specific atomic location. In the same way, a User moves from atomic location to atomic location. The difference between the relations contains and maps is that the latter does not imply a perfect correspondence between the locations. In other words, we can say, for example, that a WiFi sensed area maps a specific atomic location with a certain degree of approximation, due to the variability of electromagnetic fields. On the contrary, a floor completely contains a room, without uncertainty. The relation isContiguous has already been defined in section 2. On top of the ontology it is possible to specify several rules that enable the system to infer context information. For example, rules can represent the logic and reasoning mechanisms the environment utilizes to choose the location information with the most proper level of granularity when a mobile object is located
Fig. 6. Rules
by more than one positioning system. To achieve this aim, we report in figure 6 two SWRL rules we have realized, built on the previous ontology. In detail, the first rule enables the environment to conclude that, if a mobile entity is sensed by an RFID reader, then it is located in the atomic location associated with the sensed area covered by that RFID reader. The second rule enables the environment to conclude that, if a mobile entity is sensed by a Wi-Fi access point and is not sensed by an RFID reader, then it is located in the atomic location associated with the sensed area covered by that Wi-Fi access point. When a mobile entity is sensed by both an RFID reader and a Wi-Fi AP, the two rules shown above enable the environment to choose the atomic location with the finest granularity. As a matter of fact, since the mobile entity is sensed by an RFID reader, the condition of the second rule is not satisfied, whereas the condition of the first rule is; thus, the environment concludes that the mobile entity is located in the atomic location associated with the sensed area covered by the RFID reader. In addition to this, it is possible to conceive other rules (not reported for the sake of brevity) that equip the environment with mechanisms to:

1. assign semantics; as an example, the temperature sensed by a temperature sensor, which is located in an atomic location, is automatically considered as the temperature of the Temperature Sensed Area associated to that atomic location;
2. predict the future position of a user; from the current position the user can move only to a semantic location that is contiguous to the current one;
3. avoid inconsistent location detections; if a user is located in a location "far" (considering the physical distance and the minimum time a user needs to move) from the last one in which he has been correctly located, the location system is faulty.
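The effect of the two rules can be re-stated procedurally. The paper expresses them in SWRL over the ontology; the sketch below is an illustrative re-implementation of the same logic (the identifiers are hypothetical), showing that RFID sightings take precedence because they are finer-grained.

```python
def locate(entity, rfid_sightings, wifi_sightings, area_to_atomic):
    """Return the atomic location with the finest available granularity.

    rfid_sightings / wifi_sightings map an entity to the sensed area
    that currently detects it; area_to_atomic maps each sensed area to
    its associated atomic location.
    """
    if entity in rfid_sightings:
        # Rule 1: sensed by an RFID reader -> its atomic location wins.
        return area_to_atomic[rfid_sightings[entity]]
    if entity in wifi_sightings:
        # Rule 2: Wi-Fi is used only when no RFID reader senses the entity.
        return area_to_atomic[wifi_sightings[entity]]
    return None

areas = {"rfid_7": "Atomic_12", "wifi_2": "Atomic_30"}

# Sensed by both systems: the finer-grained RFID location is chosen.
print(locate("alice", {"alice": "rfid_7"}, {"alice": "wifi_2"}, areas))
# -> Atomic_12
```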
5 Conclusions
In this paper we have presented logic models and implementation mechanisms to achieve context-awareness in smart environments. All solutions are based on the physical topology of the environment. In addition, we have presented a framework of formal tools that enables designers to specify, in a rigorous and verifiable way, several aspects of a topology, both static and dynamic. This choice has been motivated by the observation that smart environments and smart technologies are increasingly applied to critical scenarios, for which it is good practice to specify requirements and functionalities in an unambiguous way. Such solutions appear to be effective in providing smart environments with intelligent facilities.
References

1. Heitmeyer, C., Mandrioli, D.: Formal Methods for Real-Time Computing. John Wiley & Sons, Chichester (2004)
2. Pradhan, S.: Semantic locations. Personal Technologies 4(4), 213–216 (2000)
3. Cook, D.J., Augusto, J.C., Jakkula, V.R.: Ambient intelligence: Technologies, applications, and opportunities. Pervasive and Mobile Computing (2008) (in press, accepted manuscript), ISSN 1574-1192, doi:10.1016/j.pmcj.2009.04.001, http://www.sciencedirect.com/science/article/B7MF1-4W2W5PH-1/2/7ac602bb16ca409e2f70c13aa24ae0d5
4. Saha, G.K.: Software fault avoidance issues. ACM Ubiquity 7(46), 1–15 (2006)
5. Coronato, A., De Pietro, G., Esposito, M.: A Multimodal Semantic Location Service for Intelligent Environments: An Application for Smart Hospitals. Journal of Personal Communications (to appear)
6. Guarino, N.: Formal Ontology, Conceptual Analysis and Knowledge Representation. International Journal of Human-Computer Studies, Special Issue on Formal Ontology, Conceptual Analysis and Knowledge Representation (1995)
7. Cardelli, L., Gordon, A.D.: Ambient Logic. Theoretical Computer Science, Elsevier (2000)
8. Cardelli, L., Gordon, A.D.: Anytime, anywhere: Modal logics for mobile ambients. In: 27th ACM Symposium on Principles of Programming Languages (2000)
9. Milner, R.: The Pi Calculus and Its Applications (Keynote Address). In: Proc. of the 1998 Joint International Conference and Symposium on Logic Programming. MIT Press, Cambridge (1998)
Anonymous Agent Coordination in Smart Spaces: State-of-the-Art

Alexander Smirnov1, Alexey Kashevnik1, Nikolay Shilov1, Ian Oliver2, Sergey Balandin2, and Sergey Boldyrev2

1 SPIIRAS, 39, 14 line, 199178, St. Petersburg, Russia
{smir,alexey,nick}@iias.spb.su
2 Nokia Research, Itämerenkatu 11-13, 00180, Helsinki, Finland
{ian.oliver,sergey.balandin,sergey.boldyrev}@nokia.com
Abstract. The development of new technologies brings people new possibilities such as smart spaces. Smart spaces can provide a better user experience by allowing a user to connect new devices flexibly and to access all the information in the multi-device system from any of the devices. The paper describes work-in-progress performed under a joint project of SPIIRAS and Nokia. The project is aimed at the analysis of existing anonymous agent coordination languages and their adaptation to the coordination of smart space devices.

Keywords: smart space, coordination, anonymous agents.
1 Introduction

Modern device usage is moving towards so-called "smart spaces", where a number of devices can use a shared view of resources and services [1, 2]. Smart spaces can provide a better user experience by allowing a user to bring in new devices flexibly and to access all the information in the multi-device system from any of the devices. Examples of smart spaces can be found in [3, 4, 5]. One of the essential features assumed by such an environment is the information sub-system, which should provide a permanent robust infrastructure to store and retrieve information of different types from the multitude of different environment participants. Further development of Smart Spaces methodologies and techniques is key to creating attractive use case studies and building the developer eco-system in the future. In the previous project [6] a good grounding for implementing some of the key ideas of Smart Spaces has been prepared. In the earlier definitions of Smart Space no coordination between agents has been considered yet. However, in reality there is a need for some coordination between the agents, e.g. for resolving conflicts of simultaneous access to a shared data resource. The main focus of this paper is analyzing the state-of-the-art in the area of agent coordination languages and mechanisms of conflict resolution at the time of agent coordination.

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 42–51, 2009. © Springer-Verlag Berlin Heidelberg 2009
The analysis of the state-of-the-art is performed taking into account the earlier developed reference model of the smart space [6] and additional requirements such as agent anonymity and compatibility with the developed approach.
2 Reference Model of Smart Space

The general reference model of the profile-based context-aware smart space is illustrated by Fig. 1. Each node (a logical unit capable of performing some actions depending on information units stored in accessible information storages) can be located at several physical devices, or several nodes can be located at one physical device. A node can access a predefined limited set of information storages to acquire information units from them. The nodes exchange information units (IUs) / fragments, which are stored in the distributed storage accessible by the nodes of the smart space.

Fig. 1. Reference model of the information recycling-enabled smart space
An IU represents a logical expression: "subject"-"predicate"-"object" = [true | false]. The subject is usually an actor (a human or node that performs or is supposed to perform a certain action, e.g. a multimedia centre). The predicate is an action that has to be performed (e.g., "playing music"). The object points to the target with which the action has to be performed (e.g., a song being played). An information fragment is an IU extended by a "precondition" statement, which defines when the appropriate information unit is valid, and might also include a "post-condition" statement. The post-condition describes what is supposed to be done when the precondition is met and the information unit holds. Each information fragment may consist of several preconditions, information units and post-conditions. The ontology defines the possible capabilities of the nodes in the smart space and sets some constraints on these capabilities in the form of IUs and information fragments, which are stored in the information storages. The information storages can be physically located at one or several physical devices, or several information storages can be located at one physical device. IUs can
be transferred between information storages in order to become accessible to the appropriate nodes. To estimate the efficiency of the information allocation, a cost function has been introduced. The "cost" of an IU transfer can be calculated as the sum of the IU transmission and IU receiving costs. These costs are different for different information storages. However, the cost of IU transmission from a given information storage is constant and does not depend on the destination information storage. Analogously, the cost of IU receiving for a given information storage is constant and does not depend on the transmitting information storage. To increase the efficiency of the smart space operation, personification is required. For this purpose the users and devices have to be described by a set of characteristics required for interaction with other nodes. User nodes communicate with user profiles. The user profiles are stored in information storages accessible by user nodes and contain the following information:

- User identification and auxiliary information (user ID, name, personal data, contact information such as e-mail, mobile phone, etc.)
- User privacy policies (e.g. the calendar can be shared with certain configurator nodes, the phonebook cannot be shared)
- User preferences (detectable rules of the user's preferred behaviour, e.g., microphone loudness, light brightness, etc.). Models and algorithms for the detection of user preferences are out of the scope of the project.

Node profiles help to communicate with various nodes. The node profile is stored in an information storage accessible by the configurator node and contains the following information:

- Node identification (its ID and name)
- Possible statuses of the node (e.g., light is on, 30% brightness)
- Node capabilities (what the node can do), described via a domain ontology

The context describes the current status of the smart space. It is represented by the set of statuses of the nodes constituting the smart space.
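The cost function described above is simple to express: since the transmission cost depends only on the source storage and the receiving cost only on the destination storage, a transfer cost is just their sum. The sketch below uses made-up storage names and cost values for illustration.

```python
# Per-storage constant costs (illustrative values, not from the paper).
TX_COST = {"storage_A": 2, "storage_B": 5}   # cost of transmitting an IU
RX_COST = {"storage_A": 1, "storage_B": 3}   # cost of receiving an IU

def transfer_cost(src, dst):
    """Cost of moving an IU: source transmission cost + destination receiving cost."""
    return TX_COST[src] + RX_COST[dst]

def cheapest_source(sources, dst):
    """Among storages holding a replica, pick the cheapest one to transfer from."""
    return min(sources, key=lambda s: transfer_cost(s, dst))

print(transfer_cost("storage_A", "storage_B"))              # 2 + 3 = 5
print(cheapest_source(["storage_A", "storage_B"], "storage_A"))
```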
The context forms a part of the information environment where the nodes interact [6]. In the current project the information environment is a smart space. The urgency of the information contained in the context can be defined in relation to all nodes taking part in the information exchange. In the course of time the context changes, and thus a new context appears. The context history stores these contexts. The history of contexts enables: (i) information recycling (in what conditions an information unit/fragment was actual), (ii) finding behaviour patterns for users (user preferences), and (iii) finding behaviour patterns for nodes (anticipation of node statuses). User nodes can use additional modules (e.g., user nodes can apply to the calendar in order to get information about the current user's appointments). This interaction can be done via the exchange of the same information units. E.g., having access to the current smart space context, the calendar can generate an information unit stating that the user is going to give a presentation in the room where he/she is now. One of the major principles of the smart space organisation is that the participating agents do not know about the presence of each other. There are no direct links between them (Fig. 2). Each agent can put facts (information units) into the smart
space (shared information storage), which are accessible by other participating agents, who can read and modify them. But it remains unknown which agent put each particular information unit (an agent only knows which information units were put by itself). In this study the agents are considered to be non-antagonistic, i.e. they do not put information with an intention to disturb/destroy the smart space. The issues of security are out of the scope of this study. The problem of simultaneous access to shared information is similar to that in database management systems. It is necessary to make sure that the stored information units cannot be simultaneously modified and read, or modified and deleted, etc. The mechanism used for this in database management systems is referred to as transactions. In smart spaces such a mechanism has to be developed and described using an ontology.
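The anonymous-communication principle can be sketched as follows. This is an illustrative toy, not the project's implementation: agents put subject-predicate-object facts into a shared storage and query it by pattern, and the storage deliberately records no producer identity.

```python
class SharedStorage:
    """Toy shared information storage holding (subject, predicate, object) facts."""

    def __init__(self):
        self._facts = set()   # no record of which agent put which fact

    def put(self, s, p, o):
        self._facts.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        # None acts as a wildcard in any position.
        return {f for f in self._facts
                if (s is None or f[0] == s)
                and (p is None or f[1] == p)
                and (o is None or f[2] == o)}

space = SharedStorage()
space.put("media_centre", "playing", "song_42")   # put by agent 1
space.put("room_light", "brightness", "30%")      # put by agent 2

# Agent 2 can read agent 1's fact without knowing agent 1 exists.
print(space.query(p="playing"))
```

Conflicting simultaneous access to such a storage is exactly the problem that the transaction-like mechanism mentioned above has to solve.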
Fig. 2. Communication of anonymous agents
As a result, the requirements and associated problems have been formulated as criteria for the state-of-the-art review (Table 1).

Table 1. Requirements and associated problems

Problem to be solved     Requirement
Agent anonymity          Anonymity in terms of "agents' knowing each other"
Conflict resolution      Avoiding problems arising from simultaneous access (read/delete/update) to shared information
Coordination             Compatibility with the approach
Control flow             Information-driven control has to be provided
Interoperability         Interoperability has to be supported by ontologies
Information transfer     Compatibility with information transfer via information units (RDF triples)
Information consistency  Checking of relationships between information units
3 State-of-the-Art: Results Studied research efforts consider the agent coordination problem from different aspects. These include: agent coordination languages [7-22], coordination of entities
[23-27], and conflict resolution at the time of agent coordination [28-30]. This review made it possible to cover the whole problem set (see Introduction). Below, a summary of the technologies and techniques used for problem resolution is presented.

3.1 Agent Anonymity

All considered works can be split into two major groups: (i) agents communicate directly and (ii) agents communicate through a certain information space (tuple space, dataspace, shared space of information, etc.). In the former case the direct communication between agents assumes certain knowledge of each other. The latter case is very similar to the smart space philosophy, where agents share information by putting it into the shared storage. The Encapsulation Coordination Model (ECM) [18] also assumes anonymous communication through specific communication means, but this approach is not compatible with the smart space approach.

3.2 Conflict Resolution

In this section the conflict is considered as a problem of simultaneous access to shared information, similar to that in database management systems. Three major techniques for solving this problem have been considered. Database transactions guarantee that database operations are processed reliably, without data corruption. They also provide for information consistency by checking records in the related tables. Semaphores are a technique preventing simultaneous access to the same piece of information by two or more entities. They are implemented via variables and have a number of disadvantages (e.g., a semaphore can be locked and not released, and deadlocks may occur). An elaboration of semaphores is the technique of monitors. Monitors are usually integrated in compilers or interpreters to enable transparent locking and unlocking of jointly used resources. In the current project the technique of semaphores is planned to be used with elements of the transaction and monitor mechanisms.
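A minimal sketch of this planned direction, under our own assumptions (the names and structure are illustrative, not the project's code): a semaphore-like lock guards each access to the store, and a monitor-style context manager releases it automatically, avoiding the classic pitfall of a semaphore that is locked but never released.

```python
import threading
from contextlib import contextmanager

class GuardedStore:
    """Shared store whose units are accessed only under a lock."""

    def __init__(self):
        self._units = {}
        self._lock = threading.Lock()   # one monitor for the whole store

    @contextmanager
    def unit(self, key):
        # Monitor-style access: the lock is acquired on entry and released
        # on exit even if an error occurs inside the block.
        with self._lock:
            yield self._units.setdefault(key, {})

store = GuardedStore()

def writer():
    # Another agent updating the same unit; it cannot interleave with a
    # concurrent read because both go through the same lock.
    with store.unit("iu1") as u:
        u["value"] = "updated"

t = threading.Thread(target=writer)
t.start()
t.join()

with store.unit("iu1") as u:
    result = u["value"]
print(result)   # updated
```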
3.3 Control Flow

Two types of control flows are usually identified: event-driven control and information-driven control. Event-driven control assumes the following behaviour: "do something if a certain event occurs". This is the most common way of creating information systems, which assumes the production, detection, consumption of, and reaction to events [31]. For example, as soon as the temperature in the room decreases below a given level, the heating is switched on. Information-driven control assumes the presence of intelligence. E.g., an agent may decide to "do something if certain knowledge becomes available". To make this work, the idea of information must be treated in such a way as to give semantic properties (meaning, content) a role in the explanation of system behaviour. On this interpretation, some of the behaviour of information-driven control systems is causally explained by the statistical correlations that exist between internal states and the external conditions about which they carry information [32, 33]. Simple examples include computing systems that support instructions like: if ⟨condition⟩ then ⟨action⟩
else ⟨alternative action⟩. However, the way a condition works may change conditionally under other controls. For example, the way A influences B can depend on C [34]. The major difference is that in information-driven control a performed action is not necessarily related to the new knowledge. This new knowledge can be used by an agent to clarify some existing knowledge that, in turn, becomes the reason for the action [35]. The smart space approach corresponds to information-driven control, due to the fact that actions are performed by the agents depending on the information available in the shared storage.

3.4 Information Transfer

The information transfer formalisms in the reviewed approaches can be divided into two major groups: (i) formalisms specifically developed for particular tasks/projects, and (ii) commonly used formalisms. The first group includes:

- tuple-based formalisms, e.g. {"a string", 15.01, 17, "another string"} [9];
- XML dataspaces [8];
- rule-based predicates [21], e.g. adjacent(X,Y), location(robot, X), location(car, X).

The second group includes the RDF formalism, i.e. the whole document can be seen as a set of subject-predicate-object triples [27]: {:John :Loves :Mary}. Since in the smart space approach the formalisms are planned to be used by various devices, the usage of specifically developed formalisms is not reasonable. Among the commonly used formalisms, the rule-based predicates and tuples can be mentioned. The latter are well compatible with the RDF used in smart spaces, since a three-element tuple is basically an RDF triple. As a result, the techniques used in approaches assuming tuple-based information exchange can be considered compatible with the smart space approach.

3.5 Coordination

Three techniques of coordination are used:

- Negotiation
- Organizational structuring
- Multiagent planning

Generally speaking, all three techniques can be called negotiation. However, there is some difference.
While pure negotiation assumes information exchange and agents following certain behaviour strategies, the organizational structuring techniques try to
provide this by supplying an a priori organization with long-term relationships between agents [17] (e.g., master/slave). Multiagent planning assumes the existence of a certain planning engine that produces action plans for all agents to follow, taking into account their capabilities and preferences. Since the smart space approach assumes information-driven control and consists of anonymous agents, the negotiation mechanisms cannot be directly implemented. The reason is that negotiation assumes direct communication between agents, which is not possible in the case of agents' anonymity and interaction through an information space. However, certain scenarios might involve elements of other coordination techniques such as multiagent planning (e.g., in a conference room the planning node may create presentation schedules for participants).

3.6 Interoperability

In most systems the communication between agents is provided by a common understanding of messages, which in turn is supported by ontologies. In other systems interoperability is achieved via using the same limited specified language, but this is not applicable to the smart space approach. At the technical level, interoperability can be achieved using techniques like web services, which have a unified description language (WSDL) and a protocol for structuring message exchange (SOAP) (e.g., [36]). There are the following approaches to determining the degree of interoperability between two systems. The paper [37] introduces the Levels of Conceptual Interoperability Model with five levels:

- "System Specific Data": no interoperability between two systems. Data is used within each system in a proprietary way with no sharing.
- "Documented Data": data is documented using a common protocol, such as the Object Model Template, and is accessible via interfaces.
- "Aligned Static Data": data is documented using a common reference model based on a common ontology.
- "Aligned Dynamic Data": the use of the data within the federate/component is well defined using standard software engineering methods such as UML.
- "Harmonized Data": semantic connections between data that are not related in the execution code are made obvious by documenting the conceptual model underlying the component.

The use of ontologies is perfectly compatible with the smart space approach (the "Aligned Static Data" interoperability level). Although different projects use different knowledge representation formalisms (RDF, plain XML, others), this is not a critical difference, and the technologies/techniques used in such projects can be easily adapted.

3.7 Information Consistency

Two techniques for providing information consistency have been found. The first is used in database transactions: transactions preserve the consistency of the information in the information space (tuple space, XML dataspace, etc.); e.g., if a record is deleted, the other records depending on it will be deleted automatically. The other technique
Anonymous Agent Coordination in Smart Spaces: State-of-the-Art
49
assumes the usage of reasoners compatible with the information representation formalism (RDF). This makes it possible to avoid information inconsistencies in the shared storage. Since in the presented approach a certain degree of inconsistency of the information inside the smart space is acceptable, these problems will not be considered further.
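As a concrete illustration of the transaction-style cascading deletion mentioned above, the following sketch removes dependent records automatically when their parent is deleted; the class, method, and record names are illustrative assumptions, not part of any cited system.

```python
class TupleStore:
    """Minimal shared store that deletes dependent records automatically."""

    def __init__(self):
        self.records = {}      # record id -> value
        self.depends_on = {}   # record id -> id of the record it depends on

    def insert(self, rec_id, value, parent=None):
        self.records[rec_id] = value
        if parent is not None:
            self.depends_on[rec_id] = parent

    def delete(self, rec_id):
        """Delete a record and, recursively, every record depending on it."""
        self.records.pop(rec_id, None)
        dependents = [r for r, p in self.depends_on.items() if p == rec_id]
        for r in dependents:
            del self.depends_on[r]
            self.delete(r)

store = TupleStore()
store.insert("room", {"name": "A101"})
store.insert("meeting", {"topic": "demo"}, parent="room")
store.insert("slot", {"time": "10:00"}, parent="meeting")
store.delete("room")   # "meeting" and "slot" are removed with it
```

A reasoner-based technique would instead detect such dangling dependencies after the fact; the cascade keeps the store consistent at every step.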
4 Conclusion

The preliminary list of techniques can be formulated as follows. For resolving the problem of simultaneous access to information, the techniques of semaphores and monitors will be used; the particular implementation issues are a subject of future research. For resolution of conflicts of interests, the techniques of organizational structuring and multiagent planning are to be applied. For interoperability support, the technology of ontology management together with RDF is to be applied. The presented results are work in progress. Future work will include the development of an anonymous agent coordination/behavior strategy. For this purpose it is planned to choose the "driver" techniques and technologies defined as the basis, with their further adaptation to the smart space approach and testing through a case study.

Acknowledgments. This paper results from a joint project between SPIIRAS and Nokia Research.
References

1. Oliver, I., Honkola, J.: Personal Semantic Web Through A Space Based Computing Environment. In: Middleware for Semantic Web 2008 at ICSC 2008, Santa Clara, CA, USA (2008)
2. Oliver, I., Honkola, J., Ziegler, J.: Dynamic, Localised Space Based Semantic Webs. In: WWW/Internet Conference, Freiburg, Germany (2008)
3. Oliver, I.: Design and Validation of a Distributed Computation Environment for Mobile Devices. In: European Simulation Multiconference: Modelling and Simulation 2007, Westin Dragonara Hotel, St. Julian's, Malta, October 22-24 (2007)
4. Oliver, I., Nuutila, E., Seppo, T.: Context gathering in meetings: Business processes meet the Agents and the Semantic Web. In: The 4th International Workshop on Technologies for Context-Aware Business Process Management (2009)
5. Jantunen, J., Oliver, I., Boldyrev, S., Honkola, J.: Agent/Space-Based Computing and RF Memory Tag Interaction. In: The 3rd International Workshop on RFID Technology: Concepts, Applications, Challenges (2009)
6. Smirnov, A., Krizhanovsky, A., Shilov, N., Lappetelainen, A., Oliver, I., Boldyrev, S.: Efficient Distributed Information Management in Smart Spaces. In: Proceedings of the Third International Conference on Digital Information Management (ICDIM 2008), London, UK, November 13-16, pp. 483–488 (2008); [Electronic Resource], IEEE Catalog Number: CFP08DIM-CDR, ISBN: 978-1-4244-2917-2
7. 3APL: An Abstract Agent Programming Language, http://www.cs.uu.nl/3apl/
8. Cabri, G., Leonardi, L., Zambonelli, F.: XML Dataspaces for Mobile Agent Coordination. In: Proceedings of the 2000 ACM Symposium on Applied Computing, pp. 181–188 (2000)
9. Carriero, N., Gelernter, D.: Linda in context. Communications of the ACM 32(4), 444–458 (1989)
10. Castellani, S., Ciancarini, P., Rossi, D.: The ShaPE of ShaDE: a coordination system. Technical Report UBLCS, Dipartimento di Scienze dell'Informazione, Università di Bologna, Italy (1995)
11. Ciancarini, P., Knoche, A., Tolksdorf, R., Vitaly, F.: PageSpace: An Architecture to Coordinate Distributed Applications on the Web. In: Fifth International World Wide Web Conference, Paris (1996)
12. Barbuceanu, M., Fox, M.: COOL: A Language for Describing Coordination in Multi-Agent Systems. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS 1995), pp. 17–24 (1995)
13. Dastani, M., Dignum, F., Meyer, J.: Autonomy and Agent Deliberation. In: Proceedings of the First International Workshop on Computational Autonomy - Potential, Risks, Solutions, Melbourne (2003)
14. Agent Communication Language, Foundation for Intelligent Physical Agents (FIPA), Ver. 1.0, Part 2 (1997)
15. Hindriks, K., de Boer, F., van der Hoek, W., Meyer, J.-J.C.: Agent programming in 3APL. Autonomous Agents and Multi-Agent Systems 2(4), 357–401 (1999)
16. Katasonov, A., Terziyan, V.: Semantic Agent Programming Language (S-APL): A Middleware Platform for the Semantic Web. In: 2nd IEEE International Conference on Semantic Computing, Santa Clara, USA, pp. 504–511 (2008)
17. Schumacher, M., Chantemargue, F., Hirsbrunner, B.: The STL++ Coordination Language: A Base for Implementing Distributed Multi-Agent Applications. In: Third International Conference on Coordination Languages and Models, Amsterdam, pp. 399–414 (1999)
18. Schumacher, M.: Objective Coordination in Multi-Agent System Engineering. LNCS (LNAI), vol. 2039. Springer, Heidelberg (2001)
19. Tolksdorf, R.: Coordination in Open Distributed Systems. PhD thesis, Technische Universität Berlin (1994)
20. Tolksdorf, R.: Laura - a service-based coordination language. Science of Computer Programming 31(2-3), 359–381 (1998)
21. Rao, A.: AgentSpeak(L): BDI agents speak out in a logical computable language. In: Perram, J., Van de Velde, W. (eds.) MAAMAW 1996. LNCS, vol. 1038, pp. 42–55. Springer, Heidelberg (1996)
22. Wells, G.: Coordination Languages: Back to the Future with Linda. In: Proceedings of the Second International Workshop on Coordination and Adaptation Techniques for Software Entities (WCAT 2005), Glasgow, Scotland, pp. 87–98 (2005)
23. Sixth Framework Programme: Context-aware Business Application Service Co-ordination in Mobile Computing Environments (2004-2007), http://www.ist-cascom.org
24. Herrero-Perez, D., Martinez-Barbera, H.: Decentralized Coordination of Automated Guided Vehicles. In: Padgham, Parkes, Müller, Parsons (eds.) Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems, Estoril, Portugal, pp. 1195–1198 (2008)
25. Simperl, E., Krummenacher, R., Nixon, L.: A Coordination Model for Triplespace Computing. In: Murphy, A.L., Vitek, J. (eds.) COORDINATION 2007. LNCS, vol. 4467, pp. 1–18. Springer, Heidelberg (2007)
26. Sixth Framework Programme: Triple Space Communication (2006-2009), http://www.tripcom.org
27. UBIWARE Platform (2007-2010), http://www.cs.jyu.fi/ai/OntoGroup/UBIWARE_details.htm
28. Wikipedia dictionary (2009), http://en.wikipedia.org/wiki/ACID
29. Dijkstra, E.: Cooperating Sequential Processes. Technical Report EWD-123, Technological University, Eindhoven, Netherlands (1965)
30. Hoare, C.: Monitors: an operating system structuring concept. Communications of the ACM 17(10), 549–557 (1974)
31. Chandy, K.: Event-Driven Applications: Costs, Benefits and Design Approaches. California Institute of Technology (2006)
32. Dretske, F.: The Explanatory Role of Information. Philosophical Transactions: Physical Sciences and Engineering 349(1689), 59–69 (1994)
33. Sloman, A.: What Sort of Control System Is Able To Have A Personality? In: Creating Personalities for Synthetic Actors: Towards Autonomous Personality Agents, pp. 166–208. Springer, Heidelberg (1995)
34. Sloman, A.: Semantics in an intelligent control system. Philosophical Transactions of the Royal Society 349(1689), 43–58 (1994)
35. Sloman, A., Prescott, A., Shadbolt, N., Steedman, M.: Semantics In An Intelligent Control System. Discussion. Philosophical Transactions of the Royal Society of London: Physical Sciences and Engineering 349(1689), 43–58 (1994)
36. Watkins, E., Ardle, M., Leonard, T., Surridge, M.: Cross-middleware Interoperability in Distributed Concurrent Engineering
37. Tolk, A.: The Levels of Conceptual Interoperability Model. In: Fall Simulation Interoperability Workshop, Orlando, Florida (2003)
On-the-Fly Situation Composition within Smart Spaces

Prem Prakash Jayaraman 1, Arkady Zaslavsky 2, and Jerker Delsing 2

1 Caulfield School of Information Technology, Monash University, 900 Dandenong Road, Melbourne, Victoria 3145
[email protected]
2 Department of Computer Science and Electrical Engineering, Lulea University of Technology, Lulea, Sweden
{arkady.zaslavsky,jerker.delsing}@ltu.se
Abstract. Advances in pervasive computing systems have made smart computing spaces a reality. These smart spaces are sources of the large amounts of data required for context-aware pervasive applications to function autonomously. In this paper we present a situation-aware reasoning system that composes situations at runtime based on the information available from smart spaces. Our proposed system, R-CS, uses on-the-fly situation composition to compute temporal situations that best represent the real-world situation (contextual information). The proposed situation composition algorithm depends on underlying sensor data (hardware and software). These sensory data are prone to errors such as inaccuracy, aging, and ambiguity. R-CS therefore incorporates sensor data error estimation techniques into the proposed dynamic situation composition based reasoning system. R-CS is built as an extension of Context Spaces, a reasoning system based on a fixed set of situations. We implement the R-CS dynamic situation composition algorithms over Context Spaces and validate the proposed R-CS model against Context Spaces' fixed-situation reasoning model.

Keywords: Context Modelling, Smart Spaces, Reasoning Under Uncertainty.
1 Introduction

Pervasive systems depend on contextual information to perform tasks in smart environments. Advances in pervasive computing research have made smart spaces a possibility. A smart space is a smart environment that has computing embedded into it and can provide information that can be used to model the real world in the virtual computing world. We use the term smart spaces to represent such environments in the context of this paper. This contextual information originates from individual or multiple sensory sources. The sensor data by itself has no meaning, but when fused with appropriate context it provides a virtual view of the physical smart space. We define "context" as "that which surrounds, and gives meaning to, something else" [1]. In smart spaces (pervasive environments), our definition of context is realized by fusing data obtained from multiple sensors within the environment.

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 52–65, 2009. © Springer-Verlag Berlin Heidelberg 2009
In this paper we use a situation-based reasoning approach. A situation, as defined by [16], is "the combination of circumstances at a given moment". We use contextual elements (attributes) to define these circumstances. For example, in a smart room, based on sensor inputs (context attributes) for light, noise and location, we can infer the occurrence of the situations Meeting and Presentation. Hence a situation is a virtual-world representation of a real-world smart space. The situation-based reasoning model Ranked Context Spaces (R-CS) proposed in this paper evolves from Context Spaces (CS) [2, 14]. CS is a theoretical approach towards modelling context using situations. Each situation is a composition of context attributes with well-defined regions. The CS model uses a fixed-set definition of situations. This is acceptable in most cases, but in smart spaces where new sensory inputs appear and disappear, the present CS system does not scale. We extend the CS fixed situation space representation to dynamic situation space composition, where the situation space computed at runtime is a collection of context attributes with their corresponding regions that best defines the current situation within the smart space. To achieve this we use the simple multi-attribute rating technique (SMART) [4] instead of the multi-attribute utility theory (MAUT) [5] used by CS. The paper also proposes and implements algorithms in R-CS to incorporate sensor inaccuracies, which improves the reasoning accuracy. By incorporating dynamic situation composition into the CS model, our context model, which is a multi-dimensional virtual-world representation of the physical smart space, grows and shrinks with changing context attributes. We use the term multi-dimensional to denote the multiple context attributes used to compose a situation.
The key contributions of the paper include proposing and implementing R-CS, which incorporates dynamic situation composition and a novel sensor inaccuracy estimation algorithm into CS. We argue that the error estimation and fixed situation set representation of smart spaces currently used by CS are insufficient for changing pervasive environments. R-CS aims to develop and propose solutions to these challenges identified in CS. This paper's contribution extends our previous work [15]. The rest of the paper is organized as follows. Section 2 presents related work in the area of context modelling. Section 3 provides background on Context Spaces theory. Section 4 presents our proposed dynamic situation composition system R-CS and the proposed sensor inaccuracy estimation algorithm implemented in R-CS. Section 5 validates the proposed R-CS approaches by comparing the outcome of situation reasoning to the CS approach. Section 6 concludes the paper.
2 Context Reasoning: An Overview

Reasoning technologies generally used in context-aware pervasive systems include rule-based logic, Bayesian networks and neural networks. Pervasive environments need models that can handle reasoning under varying degrees of uncertainty. Context modelling and reasoning have been explored and presented by a number of researchers. Context Toolkit [11] is a framework for interfacing between devices and software entities that provide contextual information. Five context abstractions, namely Widgets, Interpreters, Aggregators, Services and Discoverers, can be used by context-aware application developers to prototype applications. The
widgets acquire context information from sensors, the interpreters interpret it into high-level information, and the aggregators collect related context information for a specific entity. Services execute behaviours, and discoverers maintain a list of services available to the application. Applications use discoverers to find specific components, or a set of components that match certain criteria. Context models based on logic and ontologies have been presented in [12, 13]. In this paper we look at a situation-based reasoning approach. A situation is a higher-level abstraction of the physical world which is inferred from low-level context obtained from groups of sensors (context attributes). A situation comprises a number of context attributes, each having a well-defined region within the specific situation space. The major challenge in situation reasoning is reasoning under uncertainty. One of the methods used for reasoning under uncertainty is Bayesian reasoning. In [6], a Bayesian technique for location estimation based on inaccurate sensor values is presented, and [7] uses this approach for computing the location of devices indoors. Applying the Bayesian approach has the limitation of requiring prior knowledge of probabilities, which may not always be available. Fuzzy-based systems [3] have been used in control systems that use a fuzzy rule set to achieve specific results. Though fuzzy systems are useful in controlling system processes, they are less effective in reasoning about situations [8], especially when situations have overlapping regions, which makes reasoning under uncertainty harder. Neural network based approaches for classifying context have several shortfalls when used with quickly deployable context-aware applications. Neural networks require an extensive training phase over a sufficient number of patterns. Training can result in local minima, i.e., achieving less than optimal solutions [9]. Neural networks are considered "black boxes", as it is difficult to trace a problem back to its cause.
A review of context modelling approaches shows that most have shortfalls which make it difficult to provide a generic approach to context modelling under changing situations and application domains. Most of the work in the literature has not addressed the problem of dynamic smart spaces, where new input data may influence the reasoning process. CS is one such context model, providing a generic framework for context-aware applications to reason based on situation modelling. As mentioned in the previous section, CS still does not address changing smart spaces. In the next section we present an overview of Context Spaces (CS) theory and introduce the heuristics incorporated in CS. We then identify the shortfalls of CS and propose additional heuristics that are incorporated into R-CS to aid reasoning under uncertainty in smart spaces.
3 Context Spaces

The term context has a number of definitions; we provided one such definition in Section 1. We elaborate that definition further, re-defining context as that which refers to a particular centre of interest, providing added information about whom, where and when. We use context as a representation scheme for data that can be used to compose situations which can be reasoned upon. We use Figure 1 to depict the relation between context and situations. The contextual information is dependent on
underlying sensor data (software and hardware), whose relations are computed by adding context to them at the context layer. The situations layer is a composition of related context attributes, each having a well-defined region, that best describes a situation (e.g., Meeting, Presentation). CS uses the following definitions to represent the context-situation pyramid in the CS model [2, 14].
Fig. 1. Context – Situation Pyramid (layers, bottom to top: Sensory Originated Data, Context, Situations)
A "context attribute" defines data obtained from an individual sensor or a group of sensors that is used in situation reasoning [2, 14]. By itself a context attribute is a single piece of data that has no meaning until we relate context to it (e.g., temperature in general vs. temperature of a human body). A "situation space" is a collection of context attributes that define a situation; each context attribute within the situation has a well-defined region. A "region" of a context attribute is the set of acceptable values that fall within the specific situation definition. CS uses a crisp representation of regions; e.g., for a situation "Moderate Operating Environment of Machine A" the context attribute temperature would be within the region 25–28 °C. A "context state" represents the current value of a context attribute. CS uses multi-attribute utility theory to compute the confidence in a situation's occurrence [2, 14]. The confidence is the measurement parameter used to determine the occurrence of a situation. To compute the confidence, CS uses two primary heuristics:
1. Inaccuracies of sensors, using error probability.
2. Contribution and relevance factors of each context attribute ai in the situation space S.
These heuristics provide the sensor error probability and the relevance and contribution values of the context attributes. The relevance and contribution are values given to a context attribute based on its context state (current value) in the situation space S. A contribution function computes the contribution (between 0 and 1) of context attribute ai over the region Ai defined in the situation space S. The relevance factor gives the attribute's importance in the situation S. The situation space definition of CS has a fixed set of context attribute regions (A1, A2, ..., An) whose importance (relevance, given by a weight w) satisfies
∑_{i=0}^{n} w_i = 1   (1)
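The weighted-sum confidence computation that these heuristics feed into can be sketched as follows; the situation definition, weights, and sensor values are invented for illustration, and crisp region membership is used as in CS.

```python
def crisp_contribution(value, region):
    """Return 1.0 if the sensed value falls inside the region (lo, hi), else 0.0."""
    lo, hi = region
    return 1.0 if lo <= value <= hi else 0.0

def confidence(situation, state):
    """Weighted sum of per-attribute contributions; weights sum to 1 (equation (1))."""
    return sum(w * crisp_contribution(state[a], region)
               for a, (w, region) in situation.items())

# Illustrative situation "Moderate Operating Environment": weights sum to 1.
situation = {
    "temperature": (0.6, (25.0, 28.0)),
    "noise":       (0.4, (0.0, 40.0)),
}
state = {"temperature": 26.5, "noise": 55.0}
print(confidence(situation, state))  # temperature fits, noise does not -> 0.6
```

A situation is then deemed to occur when this confidence exceeds an application-chosen threshold.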
We argue that in real-world smart environments this may not always be the case. Hence we propose the use of an extended situation space set, and compute a temporal situation space at runtime, at every time instance t, from the available set of context attribute regions. The probabilistic error computation used in CS needs pre-determined probabilities and does not learn from other information such as sensor data freshness or noise. Our proposal incorporates data freshness to estimate the sensor inaccuracy in addition to the probabilistic approach employed by CS, aiming to decrease the sensor inaccuracy caused by aging data.
4 Situation Composition On-the-Fly

The idea of dynamic situation composition arises from our argument that fixed-situation-based reasoning does not cater to changing smart spaces, where new context information appears and disappears. Hence we need a model that can adapt to changing situations and compute situation definitions dynamically using the available context attributes. Our proposal is to compose situations dynamically based on changing real-world situations. We extend the situation space presented in CS from a finite set of highly relevant attributes to a universal situation space of all possible (though not infinitely many) attributes. At any given instance t, R-CS builds a situation space at runtime based on the sensory input available within the smart space that has an equivalent context attribute definition in the universal situation space. Since our situation space definition is an extended set of all possible attributes, we use the simple multi-attribute rating technique (SMART) [4] instead of the multi-attribute utility theory (MAUT) used in CS [5]. MAUT is an approach that uses a single utility value to fuse multiple attribute values; it lacks ranking and attaching importance to these attributes. Hence we use SMART, which applies a ranking approach to rank the attribute regions based on their relevance and contribution factors. We also incorporate data quality parameters such as data freshness into R-CS, which is important when reasoning in applications based on sensor-originated data. Since our temporal situation space is a subset of the universal situation space, our context model that best represents the smart space grows and shrinks in size (attributes).

4.1 Situation Composition Based on Smart Space Information

In [2] a CS situation space is defined as a representation of a real-world situation. Our proposal is to define a generic universal situation space SU: a representation of a situation with all possible combinations of context attributes and corresponding weights.
In contrast to the Context Spaces definition of weights in equation (1), we define the weight wi, denoting the importance of the attribute region to the situation, to be a value between a lower and a higher scale (application specific). CS [2, 14] uses the MAUT approach to compute utility; we propose the use of SMART [4], which employs MAUT but adds ranking to it. The ranking of a context region is dynamic and depends on both its relevance and its contribution to the situation. Based on this ranking, a temporal situation space ST is computed at runtime.
Definition 1: Situation Space (SU) – A universal situation space which has all possible sets of context attributes and corresponding regions with their importance:

SU = {w1 A1, w2 A2, ..., wn An}, where min ≤ wi ≤ max   (2)

Definition 2: Situation Space (ST) – A temporal situation space at time T which represents the real-world situation with delta changes, that is, a list of context attributes and regions that best defines the situation at time T:

ST = {w1 A1, w2 A2, ..., wm Am}, where m ≤ n, ∑_{i=0}^{m} w_i = 1, and ST ⊆ SU   (3)
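A minimal sketch of composing ST from SU under these definitions: attributes with currently available sensor input are ranked by weight (a stand-in for the full SMART ranking), the top m are kept, and their weights are renormalized to sum to 1. The attribute names, weights, and cut-off m are illustrative assumptions.

```python
def compose_temporal(universal, available, m):
    """Pick the m highest-weighted attributes with live sensor input
    and renormalize their weights so they sum to 1 (Definition 2)."""
    candidates = [(a, w) for a, w in universal.items() if a in available]
    candidates.sort(key=lambda aw: aw[1], reverse=True)
    chosen = candidates[:m]
    total = sum(w for _, w in chosen)
    return {a: w / total for a, w in chosen}

# Universal space S_U: weights on an application-specific min..max scale.
S_U = {"light": 5.0, "noise": 3.0, "projector": 4.0, "people": 2.0}

# Only three sensors are currently reporting; keep the top two.
S_T = compose_temporal(S_U, available={"light", "noise", "people"}, m=2)
print(S_T)   # {'light': 0.625, 'noise': 0.375}
```

Re-running the composition at every time instance t is what makes the model grow and shrink with the smart space.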
The temporal situation space ST is a dynamic representation of the situation at time t and changes as the situation evolves. The temporal situation space is composed of the highest-ranked attributes, where the ranking is determined by the context attributes' contribution and relevance values. By using dynamic temporal situation composition, R-CS enforces dynamic ranking of attributes at runtime, hence moving away from fixed situation space definitions to situation space definitions computed dynamically. The universal situation space at any point in time has n context attributes. Hence we propose a partition algorithm that partitions the situation space based on the relevance of the context attributes. This partitioning is application specific. The reason we chose a partitioned situation space is to reduce the complexity of iterating through all the attributes when confidence can be reached by considering only the first few partitions.

4.2 Partitioning Parameter δ

Relevance, the importance of an attribute region in the situation space SU, is one of our key parameters for computing the partitioned situation space. We define a partition parameter δ called the partitioner.

Definition 3: A partition δ is defined as a set of conditions C that satisfy a predicate P, i.e., {C | P(C)}.

The partition parameter partitions the universal situation space based on the predicate P. We use this partitioned space to compute the temporal situation space, which takes into consideration the attributes' contributions. E.g., the contribution of the context attribute temperature for the situation "Machine Operating GOOD" will be 1 if the temperature has a value between 17 and 19, and 0 if it is outside the region. The temporal situation space is computed as a function of the relevance (weight) and the contribution. The algorithm iterates through every partition of the universal situation space until a confidence threshold is reached. The confidence threshold is defined as the difference in the confidence of the situations being reasoned about; the higher the threshold, the higher the certainty of occurrence of one situation over the other. To handle weight distribution when situations overlap, we propose a weight redistribution algorithm. Situation overlap occurs when the attribute regions of non-coexisting situations overlap with each other. The
weight redistribution is performed over the temporal space, hence preserving the weights of the context regions in the universal situation space. When reasoning about two situations which are non-coexistent, the weights (importance) associated with the overlapping attribute regions are recomputed using equation (4):
(Max(wi) − Min(wi)) / (Max(wi) + Min(wi))   (4)
where wi are the weights assigned to attribute regions in the temporal situation space ST. Once the weights of the overlapping attribute regions are computed, the entire situation space's weights are normalized to compute the new weights. Figures 2 and 3 illustrate the pseudo-code to re-compute the weights of overlapping regions and the code to partition the universal situation space.
Re-Initialize-Weights(TemporalSituation S, TemporalSituation S')
  for each attribute-region in S
    if (attribute-region(S) = attribute-region(S'))
      attribute-region.overlap = true
    end if
  end for
  for each attribute-region in S
    compute reduce-weight = (max(S-region-weight) - min(S-region-weight)) /
                            (max(S-region-weight) + min(S-region-weight))
    if (attribute-region.overlap = true)
      newweight = oldweight + reduce-weight
      attribute-region.setweight(newweight)
      add attribute-region to returnSituationSpace
    end if
  end for
  return returnSituationSpace
End
Fig. 2. Pseudo-code to recompute the weights of overlapping attribute regions
Function Partition(UniversalSituationSpace, δ)
  Declare PartitionedSituation
  for each attribute region in UniversalSituationSpace
    for each predicate in δ
      if importance satisfies predicate
        add attribute region to PartitionedSituation
        add partition number to attribute region
      end if
    end for
  end for
  return PartitionedSituation
end Function
Fig. 3. Pseudo-code to partition the universal situation space SU
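The routines of Figures 2 and 3 can be rendered as runnable Python roughly as follows; the names are illustrative, and since the pseudo-code leaves the combination of the old weight and the reduction factor of equation (4) ambiguous, a multiplicative reduction followed by normalization is assumed here.

```python
def partition(universal, predicates):
    """Split the universal space into partitions by weight predicates (cf. Fig. 3).
    Each attribute lands in the first partition whose predicate its weight satisfies."""
    parts = [dict() for _ in predicates]
    for attr, weight in universal.items():
        for i, pred in enumerate(predicates):
            if pred(weight):
                parts[i][attr] = weight
                break
    return parts

def redistribute(weights, overlapping):
    """Reduce the weight of overlapping attribute regions by the factor of
    equation (4), then renormalize the whole temporal space (cf. Fig. 2)."""
    hi, lo = max(weights.values()), min(weights.values())
    reduce_by = (hi - lo) / (hi + lo)
    adjusted = {a: (w * reduce_by if a in overlapping else w)
                for a, w in weights.items()}
    total = sum(adjusted.values())
    return {a: w / total for a, w in adjusted.items()}

# Illustrative data: two relevance partitions, then one overlapping region.
S_U = {"light": 0.9, "noise": 0.6, "people": 0.3}
high, low = partition(S_U, [lambda w: w >= 0.5, lambda w: w < 0.5])
S_T = redistribute({"light": 0.5, "noise": 0.3, "people": 0.2},
                   overlapping={"noise"})
print(high, low, S_T)
```

The reasoning loop would walk the partitions in order of relevance, stopping as soon as the confidence threshold is reached.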
4.3 Incorporating Sensor Inaccuracy into Dynamic Situations

The sensor inaccuracy model used by CS is primitive and has a crisp boundary. The contribution using the error function is computed using the formula
ci = Perror × RegionContributioni   (5)
The RegionContribution is computed using a crisp boundary and is determined by checking whether the sensor value falls within the attribute region: if it does not, a value of 0 is returned, and if it does, a value of 1 is returned. Perror provides the probability that the sensor reading is accurate. A crisp region does not suit all real-world cases. Moreover, CS does not adjust the sensor value based on the error probability; e.g., if there is a 5% error, then the value can have a ±0.05 variation from the outer boundaries. This is not considered in CS while computing the contribution. Hence we propose the use of non-crisp boundaries, as illustrated in Figure 4.
Fig. 4. Comparison between CS Crisp boundary and R-CS membership
To convert the crisp region into a value based on bordering regions, we employ the principle used by fuzzy membership functions [3]. E.g., if we define a situation HOT whose temperature ranges from 34 to 40, a real value (context state) of 40.05 can still fall under the condition HOT if we allow some tolerance in the outer regions of the context attribute. Our technique focuses on values that are inside and at the border of the defined region. Hence we re-define the context attribute's region with two ranges, namely an outer range and an inner range. The inner range represents values which return a contribution factor of 1, while the outer range is a ± band around the inner range that returns a contribution based on where the sensor value falls in the outer range. Our approach tests the sensor value against the inner and outer regions after applying the error probability to the sensed value. To compute sensor inaccuracy we also introduce data freshness. By data freshness we mean how recent the data is; we do not get into quality of context [10], as that is a topic of research by itself. Incorporating the probabilistic sensed value and sensor data freshness, our new contribution function is given in (6). We implement these inaccuracy estimation algorithms in our proposed system. The freshness parameter is application specific.
ci = RegionContributioni × σi, where σi = Perror × Freshness   (6)
σi is the composite error computed from the probability of error and the data freshness. To improve sensor accuracy, more data quality related parameters can be added to σ.
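Equation (6) together with the inner/outer region idea can be sketched as follows; the linear fall-off across the outer band and the linear freshness decay are illustrative assumptions, as are all parameter values.

```python
def region_contribution(value, inner, tolerance):
    """1 inside the inner range, linear fall-off across the +/- tolerance
    outer band, 0 beyond it (non-crisp boundary, cf. Fig. 4)."""
    lo, hi = inner
    if lo <= value <= hi:
        return 1.0
    dist = (lo - value) if value < lo else (value - hi)
    return max(0.0, 1.0 - dist / tolerance)

def freshness(age_seconds, max_age):
    """Linear freshness decay: 1 for brand-new data, 0 at max_age and beyond."""
    return max(0.0, 1.0 - age_seconds / max_age)

def contribution(value, inner, tolerance, p_error, age_s, max_age):
    """Error- and freshness-aware contribution, equation (6)."""
    sigma = p_error * freshness(age_s, max_age)
    return region_contribution(value, inner, tolerance) * sigma

# Situation HOT: inner region 34..40 C with a 0.1 C tolerance band;
# the sensor is 95% accurate and the reading is 1 s old (10 s max age).
print(contribution(40.05, (34.0, 40.0), 0.1,
                   p_error=0.95, age_s=1.0, max_age=10.0))
```

A reading of 40.05 thus still contributes (it sits halfway across the tolerance band), but its contribution is discounted by the composite error σ.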
5 Simulation and Evaluations

R-CS has been implemented over the CS model for sensor fusion and reasoning by incorporating the proposed algorithms for situation partitioning, re-computation of weights and improved error estimation. We present our evaluation by validating the proposed approaches against the CS approach.

5.1 Sensor Inaccuracy

We compare CS-based confidence generation and R-CS-based confidence generation using our proposed probability-incorporating, freshness-based error computation algorithm. To test the effect of the improved error estimation algorithm, we ran simulations to reason about a Presentation situation. The situation definition is shown in Table 1. Table 2 provides the output of using the probabilistic error correction and data freshness technique: the R-CS system computes a confidence of 1 for the attribute noise, taking its error rate into consideration, while it returns 0 for the people attribute, since the freshness threshold is greater than 3. The result of our simulation run is shown in Figure 5. The X-axis is the number of the simulation run and the Y-axis is the confidence computed using CS and R-CS. The

Table 1. Presentation Situation Definition
Presentation: attributes LIGHT, NOISE, PROJECTOR, PEOPLE; regions ">= 10", "> 20", "= ON", "> 5", "= 2", "= Presentation Mode", "= Inside"
Situation: Meeting: attributes A1, A2, A3, A4, A5, A6; regions "! DIM", "> 4", "! ON", "> 1", "! Presentation Mode", "= Inside"
Table 1. Situational map of the meeting room. The rows encode situational information: the number of participants (0, 1, 2–10, 2 < δ ≤ 10), participant positions (all participants nearby the meeting table; a participant near the plasma/screen; a participant near the window; participants distributed throughout the room), the time of day (morning, day, evening, night), and the states of the actuator devices (light groups L1–L4 on, projector on, plasma panel on, curtains closed, screen lowered). The right column lists the commands generated automatically when a specific combination of actuator states and events occurs: 1. Turn on light group L4; 2. Open the curtains; 3. Lower the screen; 4. Turn off light group L1; 5. Open the curtains; 6. Turn off all the lights; 7. Lift the screen; 8. Close the curtains.
… The black cells, corresponding to the turned-on devices and other features of the room, indicate the current situation. The right column lists examples of commands automatically generated when the corresponding combination of events and equipment states occurs. These commands were collected by matching real user commands against the state of the situational map. For some commands the triggering conditions were slightly changed and adapted for automatic mode. For instance, the voice command to turn off the light is spoken by the user inside the room before he or she goes out; in automatic mode this action is performed when all participants have left the room. For this reason, commands 6-8 in Table 1, which turn off the light, lift the screen, and close the curtains, are performed when the number of participants is zero. Figure 2 now lets us explain how the meeting room is controlled in automatic mode. A change of situation caused by user behavior, time of day, or actuator switching invokes the map-analysis procedure. If a predefined situation is detected, a corresponding command for activation or deactivation of actuators is sent. The interaction between the software modules distributed over several computers follows a client-server architecture over TCP/IP; commands and notifications of actuator states are collected and processed in a queue. Scenarios for controlling the multimedia appliances were based on special dialogues of speech interaction between a user and the meeting room. The multimodal applications “SPIIRAS inquiry” and “St. Petersburg map” were adapted from similar systems
84
A.L. Ronzhin and V.Yu. Budkov
realized in a multimodal kiosk [11]. The voice control system for the TV set and radio implements commands to select a channel by its number or title and to change the sound and picture settings. In the “Smart board” application, voice commands are intended for selecting the color and width of the pen, brush, or other instruments for handwritten sketches on the touchscreen. Only examples of voice commands useful for interaction with the intelligent applications are mentioned above; moreover, most of the commands can also be activated by gestures on the touchscreen. In particular, in map-based applications direct pointing to a graphical object is often preferable, while speech commands are used to operate on objects [6]. Besides the current state of the dialogue, the spatial position of the user should be taken into account. In contrast to control of the actuator devices, voice commands aimed at the multimedia applications are perceived only within a space near the plasma panel (no farther than 1.5 meters). This limitation helps to decrease the number of false voice commands caused by background noise and parallel user conversations, which increases the accuracy of distant speech recognition. Special attention should be paid to the influence that controlled devices exert, during transitional stages, on the audio-visual processing systems. For instance, the speech recognition and sound source localization systems should be notified before the gears of the screen and curtains are started, because their noise degrades system performance. A similar situation arises when the multimedia projector and air conditioning are turned on; since these devices work continuously, a system of spatial filtering of the sound signal should suppress their noise before the speech recognition phase.
Opening the curtains can significantly change the luminosity in the room and cause the video tracking systems to fail, for example by losing existing objects or detecting false objects in the room. The same problem occurs when the light is turned on or off, so the video processing systems should be notified in advance about luminosity changes in the meeting room. Verification of the technological framework and experimental detection of potential conflicts in room control, caused by uncoordinated operation of equipment or unpredictable user behavior, have been conducted. Analysis of the extended situational map, as well as discourse information from real dialogues and personalized user data, will allow us to extract behavior templates and preferences of the main user groups, scenarios of human-machine interaction, and the most important commands that should be performed automatically to improve the efficiency of meetings and lectures in the intelligent room.
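The automatic control loop described above, in which a change in the situational map triggers map analysis that may emit actuator commands, might be sketched as follows; the state encoding and the rule set are purely illustrative:

```python
def analyze_map(state):
    """Map-analysis procedure: called whenever user behavior, time of day,
    or an actuator switch changes the situational map; returns the commands
    to send to the actuators (rules modeled on Table 1)."""
    commands = []
    if state.get("participants", 0) == 0:
        # commands 6-8: performed when everyone has left the room
        commands += ["turn off all lights", "lift the screen", "close the curtains"]
    elif state.get("projector") == "on" and not state.get("curtains_closed", False):
        # hypothetical rule: darken the room for projection
        commands += ["close the curtains", "turn off light group L1"]
    return commands

print(analyze_map({"participants": 0}))
```

In the real system these commands would be queued and dispatched to the actuator modules over the TCP/IP client-server architecture described in the text.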
4 Control of Meeting Room Facilities from Outside
Remote control of the room facilities and observation of the current situation from outside were realized via a web interface adaptive to the hardware and software features of a client device. A user can change the state of four light groups, the curtains, the screen, and the multimedia projector, as well as get a picture from any of five cameras located inside the room. The formatting of the web page containing the controls and a camera picture is automatically adapted to the peculiarities of the client's mobile device and browser. Table 2 lists the Nokia mobile devices, together with the characteristics taken into account when adapting the web-page format, that were used in the experiments.
Multimodal Interaction with Intelligent Meeting Room Facilities
85
Table 2. The list of Nokia devices verified for control of the room

Device     | Operating system                                   | Screen resolutions | Browser resolutions                | Auto-orientation | Touchscreen
Nokia N73  | S60 3rd Edition (initial release), Symbian OS v9.1 | 240x320, 320x240   | 234x277, 234x302, 314x200          | no               | no
Nokia N95  | S60 3rd Edition, Feature Pack 1, Symbian OS v9.2   | 240x320, 320x240   | 234x277, 234x302, 314x200          | yes              | no
Nokia 5800 | S60 5th Edition, Symbian OS v9.4                   | 360x640, 640x360   | 360x493, 360x640, 502x288, 640x360 | yes              | yes
Nokia E61  | S60 3rd Edition (initial release), Symbian OS v9.1 | 320x240            | 314x200, 314x220                   | no               | no
Nokia E60  | S60 3rd Edition (initial release), Symbian OS v9.1 | 352x416            | 346x346, 346x386                   | no               | no
Nokia E90  | S60 3rd Edition, Feature Pack 1                    | 800x352            | 794x284, 800x352                   | no               | no
Nokia N810 | Maemo Internet Tablet OS 2008 Edition              | 800x480            | 696x362, 800x480                   | no               | yes
The window size available for viewing the web page varies significantly owing to the different screen sizes and browser options of the tested devices. The maximal browser resolution is available in full-screen mode, when the service buttons are hidden. On several devices the screen orientation can be changed manually or automatically, which further increases the number of possible browser resolutions. As a result, 14 different browser resolutions occur across the listed devices. Moreover, cursor motion is discrete, which requires sufficiently large buttons so that they can be selected with a joystick or cursor-control buttons. A touchscreen improves the control speed, but usability is still better with large buttons. All these factors were taken into account during development of the web page for controlling the room facilities. The base layout of the web page includes buttons for toggling actuator states and a window with a room picture presenting the current state and providing visual feedback when actuators are switched; the check-box buttons and the room picture do not overlap. For each browser resolution, a specific web-page layout was designed in order to make maximal use of the browser screen. First of all, the placement of the check boxes and the picture window relative to each other is changed, and the font size of captions is adjusted to the screen resolution. Three examples of web-page layouts are presented in Figure 3 at real scale. The placement of elements on the web page is specified using CSS style sheets. The screen resolution and orientation are checked every 500 ms using JavaScript; when the screen parameters change, the corresponding page layout is automatically selected and generated for the client.
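A server-side counterpart of this layout selection could look like the sketch below; the layout names and the nearest-resolution matching rule are assumptions for illustration (the real system maps each of the 14 browser resolutions to a dedicated CSS layout):

```python
# predesigned layouts keyed by browser resolution (illustrative subset)
LAYOUTS = {
    (234, 302): "layout_240x320_portrait",
    (314, 200): "layout_320x240_landscape",
    (640, 360): "layout_5800_landscape",
    (800, 480): "layout_n810_fullscreen",
}

def select_layout(width, height):
    """Pick the predesigned layout closest to the resolution reported by the
    client-side script (which polls the screen parameters every 500 ms)."""
    return min(LAYOUTS.items(),
               key=lambda item: abs(item[0][0] - width) + abs(item[0][1] - height))[1]

print(select_layout(640, 360))   # layout_5800_landscape
```

The nearest-match rule tolerates small reporting differences between browsers while still switching layouts when the orientation flips.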
Fig. 3. Examples of web-page layouts
To increase the size of the controls and the room picture, a pop-up menu overlaying the picture window was designed in the last version of the web page. As a result, the room picture window uses the full browser resolution, and the menu appears when the user needs to change the state of the actuators or choose another camera, as shown in Figure 4. In the initial mode the menu is hidden, and a small icon located in the upper right corner of the browser indicates its existence. Since the cameras send pictures with a 4:3 aspect ratio, on a wide screen the room picture would be shown incorrectly. To avoid this problem the proportions of the picture are preserved, and the released space is used for a talking head model, which increases the naturalness of interaction. The talking head pronounces notifications coming from the room server and confirms the execution of user commands and other changes occurring in the room. Further research will focus on introducing multimodal interfaces into the mobile devices, since controlling a dozen devices with different functions via a web browser on a mobile phone is not very convenient owing to the small screen. Taking into account the limited computational resources of mobile devices, a distributed model of audio-visual processing will be used: the signals recorded by the mobile devices will be transferred to the room server, where speech recognition, head tracking, and other technologies will be applied to them. Such an approach will allow the user to communicate with the room using speech, gestures, and other natural modalities.
Fig. 4. Examples of web-page layouts with pop-up menu and talking head model
5 Conclusion
The developed intelligent meeting room is a distributed system comprising a network of intelligent agents (software modules), actuator devices, multimedia equipment, and audio-visual sensors. The main aim of the room is to provide meeting or lecture participants with the required services based on analysis of the current situation. The room's awareness of the spatial position of the participants, their activities, their role in the current event, and their preferences helps to predict the intentions and needs of the participants. Context modeling, context reasoning, and knowledge sharing remain the most important challenges of ambient intelligence design. Providing easy-to-use and well-timed services that stay invisible to the user is another important feature of ambient intelligence. In the developed intelligent room all the computational resources are located in adjacent premises, so the participants see only microphones, video cameras, and the equipment for output of visual and audio information. Implementation of a multimodal user interface capable of perceiving speech, movements, poses, and gestures of the participants in order to determine their needs provides a natural and intuitively understandable way of interacting with the intelligent room. Development of a network of intelligent meeting rooms makes it possible to organize videoconferences between spatially distributed participants, facilitates collaboration, provides access to higher knowledge and competence, reduces costs for transport and staff, and increases the quality of education thanks to automatic, immediate monitoring of every student during lessons. Using various combinations of multimodal interfaces and the equipment of the intelligent meeting room, fundamental issues of human-machine interaction are studied, and applied models in security, medicine, robotics, logistics, and other scientific areas are now being investigated.
Acknowledgments.
The intelligent meeting room of SPIIRAS was equipped in 2008 with the financial support of the Russian Foundation for Basic Research. The full spectrum of mobile devices was donated by Nokia to study issues of remote user interaction with a smart space.
References
1. Ducatel, K., Bogdanowicz, M., Scapolo, F., Leijten, J., Burgelman, J.-C.: ISTAG - Scenarios of Ambient Intelligence in 2010. European Commission Community Research (2001)
2. Aldrich, F.: Smart Homes: Past, Present and Future. In: Harper, R. (ed.) Inside the Smart Home, pp. 17–39. Springer, London (2003)
3. Gann, D., Venables, T., Barlow, J.: Digital Futures: Making Homes Smarter. Chartered Institute of Housing, Coventry (1999)
4. Masakowski, Y.: Cognition-Centric Systems Design: A Paradigm Shift in System Design. In: 7th International Conference on Computer and IT Applications in the Maritime Industries, pp. 603–607 (2008)
5. Degler, D., Battle, L.: Knowledge management in pursuit of performance: the challenge of context. Performance Improvement 39(6), 25–31 (2007)
6. Tzovaras, D. (ed.): Multimodal User Interfaces: From Signals to Interaction. Springer, Heidelberg (2008)
7. Chai, J., Pan, S., Zhou, M.: MIND: A Context-based Multimodal Interpretation Framework. Kluwer Academic Publishers, Dordrecht (2005)
8. Ronzhin, A.: Topological peculiarities of the morpho-phonemic approach to representation of the vocabulary of Russian speech recognition. Bulletin of Computer and Information Technologies (9), 12–19 (2008)
9. Wallhoff, F., Zobl, M., Rigoll, G.: Action segmentation and recognition in meeting room scenarios. In: International Conference on Image Processing (ICIP 2004), pp. 2223–2226 (2004)
10. Lobanov, B., Tsirulnik, L., Železný, M., Krňoul, Z., Ronzhin, A., Karpov, A.: Audio-Visual Russian Speech Synthesis System. Informatics, Minsk, Belarus 20(4), 67–78 (2008)
11. Karpov, A., Ronzhin, A.: An Information Enquiry Kiosk with a Multimodal User Interface. In: 9th International Conference on Pattern Recognition and Image Analysis (PRIA 2008), Nizhny Novgorod, Russia, pp. 265–268 (2008)
Ubi-Check: A Pervasive Integrity Checking System
Michel Banâtre, Fabien Allard, and Paul Couderc
INRIA Rennes / IRISA
http://www.irisa.fr/aces
Abstract. Integrity checking is an important concern in many activities, such as logistics, telecommunications, or even day-to-day tasks such as checking for someone missing in a group. While the computing and telecommunication worlds commonly use digital integrity checking, many activities in the real world do not benefit from automatic mechanisms for ensuring integrity. RFID technology offers promising perspectives for this problem, but also raises strong privacy concerns, as RFID systems are usually based on global identification and tracking. In this paper we present an alternative approach, Ubi-Check, based on the concept of coupled physical objects, which enables integrity checking relying only on local interactions, without the support of a global information system.
1 Introduction
Integrity checking is an important concern in many activities, both in the real world and in the information society. The basic purpose is to verify that a set of objects, parts, components, or people remains the same along some activity or process, or remains consistent with a given property (such as a part count). In the real world, it is a common step in logistics: objects to be transported are usually checked by the sender (for conformance to the recipient's expectation) and, at arrival, by the recipient. When a school takes a group of children to a museum, the people responsible for the children regularly check that no one is missing. Yet another common example is checking for our personal belongings when leaving a place, to avoid losing them. While important, these verifications are tedious, vulnerable to human error, and often forgotten. Because of these vulnerabilities, problems arise: e-commerce clients sometimes receive incomplete packages, and valuable and important objects (notebook computers, passports, etc.) get lost in airports, planes, trains, hotels, etc., sometimes with dramatic consequences. While there are very few automatic solutions to improve the situation in the real world, integrity checking in the computing world is a basic and widely used mechanism: magnetic and optical storage devices and network communications all use checksums and error-checking codes to detect information corruption, to name a few. The emergence of ubiquitous computing and the rapid penetration of RFID devices enable similar integrity checking solutions to work for physical objects. S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 89–96, 2009. © Springer-Verlag Berlin Heidelberg 2009
90
M. Banâtre, F. Allard, and P. Couderc
The purpose of this paper is to present the design of such a system and one of its applications. The paper is organized as follows: in the next section, we detail the problem. Then we present the design and implementation of the Ubi-Check system. Finally, some related works and perspectives are discussed.
2 The Problem
Let’s focus on a typical application scenario, which will help identify the key issues of the problem. Consider someone at the airport who is about to cross the security gate. He is required to take off his jacket and his belt, to put his mobile phone and music player in a container, to remove his notebook computer from his bag, and maybe other objects... All that in a hurry, with other people in the queue doing the same. Obviously, personal objects are vulnerable to getting lost in this situation: objects can get stuck inside the scanner, can stack up on each other at the exit of the scanner, and it is easy to forget something while being stressed to catch a flight. Another vulnerability is taking someone else's object, such as a notebook computer of the same model. The vulnerability is introduced because: 1. objects belonging to a common set have to be separated from each other on some occasions; 2. getting them back together is not checked by a reliable process. Consider what happens in a computer network: digital objects are fragmented into “packets” which can be transported independently of each other in the network. When they arrive at a destination point, the packets are assembled together to rebuild the original object, which is checked for integrity. For this purpose, packets include additional information enabling error detection. Of course, networks are more complex than this simple view, with multiple encapsulation and fragmentation levels, but for the analogy with real objects and people the basic principle is sufficient: we can consider a set of physical objects as “data” which are going to be transported and eventually separated on some occasions. At some point where the set of physical objects is assumed to be complete, integrity checks will take place. For instance, in our airport security gate scenario, the integrity check would be performed on leaving the zone.
Our goal is to propose an integrity checking system that could be integrated at strategic places to warn people when missing objects are detected, or when they are carrying someone else's object. Such a system would turn some areas into smart spaces where people would not have to worry about losing objects, which is interesting for trains, hotels, etc. Such a system is only interesting if it can be realistically deployed, given the constraints of the real world. To this end, some important requirements have to be considered:
Ubi-Check: A Pervasive Integrity Checking System
91
1. ease of use and as low as possible impact on existing processes
2. low cost for the user
3. scalability and reliability
4. ease of on-site integration
5. privacy respect
We think it important to insist on the two latter requirements. Integration issues can lead to the death of emerging technologies or experimental systems: the cost of integrating something new into an operational infrastructure is very high, and dependence on, or impact on, existing information systems should be as low as possible for a chance of acceptance. Privacy concerns raise strong resistance to RFID technology [8]. As we will see, a core idea of Ubi-Check is to ensure anonymous operation and no dependence on databases.
3 System Design
The Ubi-Check system is based on the principle of coupled objects. Coupled objects are a group of physical objects that are logically associated together, meaning that they carry digital information referencing the other objects of the set, or representing their membership in the group. An important property is that this information is physically stored on the objects. Typically, it will be stored on RFID memory tags embedded in the objects. In our application scenario, this means that users of the Ubi-Check system would have their important objects equipped with tags. Ideally, those tags could be embedded into the object at build time by the manufacturer, but user-installed tags could of course be added for objects not ready for the service. There are then two procedures in the system: a first one consisting of associating all the objects of a group (i.e., the objects of a person), and a second one
Fig. 1. Group creation and checking
where integrity will be checked at the appropriate places. Figure 1 sums up the process, which we detail in the following.
3.1 Group Creation
At this step, the user presents himself in a small area within the range of an RFID writer, and a group is initialized for all his tagged objects: a signature is computed from the individual identifiers of the tags. The identifiers can be those attributed at tag construction, or generated for the Ubi-Check system. The latter case is better for protecting the user's privacy, because new identifiers can be used each time a group is created, thereby reducing the risk of users being tracked through their objects. A user could create a new group for each trip, for example. As we can see, a group is made of a set of identifiers. We need to store the group representation somewhere, such that the group integrity can be checked at the appropriate places. Storage in a database is a straightforward solution, but it would require each checkpoint to access this database through a communication infrastructure, which raises several issues:
– Deployment and operating cost
– Reliability and scalability, because the checking operation would be dependent on the availability of the communication infrastructure and the remote database, and the communication load would increase linearly with the number of users of the service
– Privacy, as group representations associated with individuals would be stored in a central database. However, depending on the nature of the object identifiers and the group representation that is used, this issue may be mitigated.
These issues conflict with our design goals, which motivates an alternative solution that does not depend on a remote service to operate. In the concept of coupled objects previously mentioned, the logical association between the objects (or the group membership) is part of the physical objects themselves. This can be easily implemented with RFID tags, which, in addition to the identifier part, can provide a programmable memory of up to a few kilobits. The group representation will be stored in this memory.
Because the size is limited and the integrity check should be fast, the group is represented by a signature computed by a hash function. A good discussion of hash functions in the context of RFID is [3]. This approach enables fully autonomous operation of both the association points and the checkpoints. An additional property can be stored in a particular object, considered the owner of the group: the cardinality of the set. The owner tag would typically be associated with an object that the user always keeps with him, such as his watch. We will see in the next section how it is used.
3.2 Checking Phase
Once a group is formed, the user can move away with his objects. He can separate from his objects, but if he passes a checkpoint without the complete set, a warning is shown.
The checkpoint is made of an RFID reader controlling two antennas arranged close to each other (typically separated by one meter), in order to detect objects crossing the checkpoint. A time frame Δt is set to allow a group of objects to cross the gate. In practice, we have used an interval of 2 to 3 seconds with good success for typical pedestrian flow. A cyclic buffer logs every identifier i passing the gate, the timestamp ti of the event, and the signature Si read from the tag. The integrity check is triggered for each group that reaches the end of its time frame, that is, for all i such that t0 ≤ ti ≤ t0 + Δt and Si = X, where t0 is the timestamp at which the first identifier of the group with signature X is read. The hash code H(i0, ..., in) is checked against X, and three cases have to be considered:
1. H(i0, ..., in) = X, meaning that the set is complete. In Ubi-Check this is shown as a green status at the exit of the checkpoint.
2. H(i0, ..., in) ≠ X, and one of the identifiers has the owner status described in the previous section. This means that at least one object of the group is missing. Ubi-Check reports a warning about missing objects; their number is known, as the owner tag includes the cardinality of the set.
3. H(i0, ..., in) ≠ X, and no identifier has owner status. Ubi-Check reports that the user is carrying one or more objects that do not belong to him.
3.3 Implementation and Experimentation
A prototype of the system has been implemented: it uses FEIG HF 13.56 MHz readers, controlled by a standard embedded PC, and standard HF read/write tags. A single unit can play both the role of group creator (association) and checkpoint. Figure 2 shows a simple setup with an LCD display used to report status. The performance of the readers allows a typical capture rate of 10 tags/s, which in practice translates into checking up to 2 users per second. As can be expected with RFID technology, especially in this context of free mobility and on-the-fly tag acquisition, mis-reads are possible. Their occurrence is highly dependent on the placement of the tags in the objects and on the nature and environment of the objects. While some advice can be given to users for optimal tag placement, metal proximity and other perturbations cannot always be avoided. However, as the system is based on a multiplicity of tags, read failures typically lead to false warnings, not missed warnings, since failing to read all the tags of a user is very unlikely. In case of a false warning, the user can pass the checkpoint again. The system was experimented with at the “Fête de la science” in November 2008, a two-day event where the general public was invited to experiment with recent scientific and technological developments. The feedback was very positive regarding the relevance of the service. However, the reading reliability is currently not robust enough to allow operational deployments in environments
Fig. 2. A simple Ubi-Check set up
with sustained and continuous flow of people, such as airports. But in other contexts with relaxed constraints, such as in hotels, the system performance could be adequate.
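The group-creation and checking logic of Section 3 can be sketched as follows; the choice of SHA-256, the truncation to fit tag memory, and the tuple layout are our illustrative assumptions (the paper only requires a hash code function over the tag identifiers):

```python
import hashlib

def group_signature(identifiers):
    """Signature written to every tag of a group; sorting makes the result
    independent of the order in which tags are read at a checkpoint."""
    digest = hashlib.sha256()
    for ident in sorted(identifiers):
        digest.update(ident.encode())
    return digest.hexdigest()[:16]   # truncated to fit a small tag memory

def check_group(read_tags):
    """Integrity check over the tags read within one time frame.
    read_tags: list of (identifier, stored_signature, is_owner) tuples."""
    stored = read_tags[0][1]
    recomputed = group_signature([ident for ident, _, _ in read_tags])
    if recomputed == stored:
        return "complete"                        # case 1: green status
    if any(owner for _, _, owner in read_tags):
        return "missing objects"                 # case 2: owner tag present
    return "carrying someone else's objects"     # case 3: no owner tag

# association: three tagged objects, the watch acting as the owner tag
ids = ["tag-phone", "tag-laptop", "tag-watch"]
sig = group_signature(ids)
tags = [(i, sig, i == "tag-watch") for i in ids]

print(check_group(tags))                     # complete
print(check_group([tags[0], tags[2]]))       # missing objects
```

Sorting the identifiers before hashing reflects the free-mobility setting: tags may be read in any order as a user walks through the gate.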
4 Related Works
RFID is a hot topic with many issues, given its broad application domain and emerging success in security, accountability, tracking, etc. However, the Ubi-Check service and its underlying coupled-objects principle differ from many RFID systems in which the concept of identification is central and tied to database-supported information systems. In some works, the tag memory is used to store semantic information, such as annotations, keywords, or properties [1,7]. Ubi-Check is in line with this idea: RFID tags are used to store group information in a distributed way over a set of physical artifacts. The concept of using a distributed RFID infrastructure as pervasive memory storage is due to Bohn and Mattern [2]. Maintaining group membership information in order to cooperate with “friend devices” is a basic mechanism (known as pairing or association) in personal area networks (PANs) such as Bluetooth or Zigbee. Some PAN-based personal security systems for luggage have been proposed [5], which enable the owner to monitor some of his belongings, such as his briefcase, and trigger an alarm when the object is out of range. A major drawback of active monitoring is the energy
power which is required, as well as potential conflicts with radio regulations that exist in some places, in particular in airplanes. Still in the context of Bluetooth, RFID has also been used to store PAN addresses in order to improve discovery and connection establishment time [9]. This can be seen as storing “links” between physical objects, as in Ubi-Check, but without the idea of a fragmented group. Yet another variant is FamilyNet [6], where RFID tags are used to provide intuitive network integration of appliances. Here there is a notion of group membership, but it resides on information servers instead of being self-contained in the set of tags, as in Ubi-Check. Probably the closest concept to Ubi-Check is SmartBox [4], where abstractions are proposed to determine common high-level properties (such as completeness) of groups of physical artifacts using RFID infrastructures.
5 Conclusion
We presented a service enabling smart spaces to check that people do not leave a protected area without all their belongings, or with objects of other people. The system is based on completely autonomous checkpoints and on a logical group distributed over a set of physical artifacts. The strong points of this solution are its independence from any remote information system or network support, and its respect for users' privacy, as it is anonymous and does not rely on global identifiers. As we have seen, RF reading reliability has to be improved for some application scenarios. We have also examined other scenarios, such as checking at home the integrity of a complex medical prescription with a group of medications prepared by a pharmacist. In further research we are investigating other variations of the coupled-objects concept to improve trust and accountability in logistics and e-commerce.
References
1. Banâtre, M., Becus, M., Couderc, P.: Ubi-board: A smart information diffusion system. In: Balandin, S., Moltchanov, D., Koucheryavy, Y. (eds.) NEW2AN 2008 / ruSMART 2008. LNCS, vol. 5174, pp. 318–329. Springer, Heidelberg (2008)
2. Bohn, J., Mattern, F.: Super-distributed RFID tag infrastructures. In: Markopoulos, P., Eggen, B., Aarts, E., Crowley, J.L. (eds.) EUSAI 2004. LNCS, vol. 3295, pp. 1–12. Springer, Heidelberg (2004)
3. Feldhofer, M., Rechberger, C.: A case against currently used hash functions in RFID protocols, pp. 372–381 (2006)
4. Floerkemeier, C., Lampe, M., Schoch, T.: The smart box concept for ubiquitous computing environments. In: Proceedings of sOc 2003 (Smart Objects Conference), Grenoble, May 2003, pp. 118–121 (2003)
5. Kraemer, R.: The bluetooth briefcase: Intelligent luggage for increased security (2004), http://www-rnks.informatik.tu-cottbus.de/content/unrestricted/teachings/2004/
M. Banâtre, F. Allard, and P. Couderc
6. Mackay, W., Beaudouin-Lafon, M.: FamilyNet: A tangible interface for managing intimate social networks. In: Proceedings of SOUPS 2005, Symposium On Usable Privacy and Security. ACM, New York (July 2005)
7. Noia, T.D., Sciascio, E.D., Donini, F.M., Ruta, M., Scioscia, F., Tinelli, E.: Semantic-based Bluetooth-RFID interaction for advanced resource discovery in pervasive contexts. Int. J. Semantic Web Inf. Syst. 4(1), 50–74 (2008)
8. Peris-Lopez, P., Castro, J.C.H., Estévez-Tapiador, J.M., Ribagorda, A.: RFID systems: A survey on security threats and proposed solutions. PWC, 159–170 (2006)
9. Salminen, T., Hosio, S., Riekki, J.: Enhancing Bluetooth connectivity with RFID. In: PERCOM 2006: Proceedings of the Fourth Annual IEEE International Conference on Pervasive Computing and Communications, Washington, DC, USA, pp. 36–41. IEEE Computer Society, Los Alamitos (2006)
Towards a Lightweight Security Solution for User-Friendly Management of Distributed Sensor Networks

Pentti Tarvainen1, Mikko Ala-Louko1, Marko Jaakola1, Ilkka Uusitalo1, Spyros Lalis2, Tomasz Paczesny3, Markus Taumberger1, and Pekka Savolainen1

1 VTT Technical Research Centre of Finland, Kaitoväylä 1, P.O. Box 1100, FI-90571 Oulu, Finland
[email protected] 2 Centre for Research and Technology Thessaly (CERETETH), Technology Park of Thessaly, 1st Industrial Area, GR 385 00, Volos, Greece
[email protected] 3 Warsaw University of Technology (WUT), Pl. Politechniki 1, 00-661 Warszawa, Poland
[email protected]

Abstract. Wirelessly networked sensors and actuators are a cornerstone of systems that try to introduce smartness into a wide range of outdoor and indoor environments. The robust and secure operation of such systems is of central importance and will definitely play a key role in their adoption by organizations and individuals. However, the commonly applied security mechanisms based on current standards are either resource-consuming, inconvenient to use, or both. In this paper, a generic, lightweight and user-friendly approach to implementing secure network management and confidentiality in Distributed Sensor Networks (DSNs) is proposed.

Keywords: information security, distributed sensor network, wireless network, network management.
1 Introduction
A new era of applications programming is coming, triggered by the appearance of the new generation of wireless embedded microcontroller and sensor-actuator devices. Besides mobile phones and PDAs (Personal Digital Assistants), programmable nodes with sensing and actuating capabilities are already embedded in household appliances and cars; in recent years, compute-sense-actuate units have also come as separate, small, multipurpose ad-hoc wireless devices (aka motes) which can be deployed in a straightforward way. With the increasing power-efficiency of embedded systems and ad-hoc wireless technologies, we can expect wireless distributed monitoring and control systems and applications to spread from high-end to more commodity use.
Corresponding author.
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 97–109, 2009. c Springer-Verlag Berlin Heidelberg 2009
Such systems, henceforth referred to as Distributed Sensor Networks (DSNs), are expected to have a strong connection to our everyday life, and at least economic losses are expected in case of failures during their operation. The additional value of wireless DSNs lies largely in their flexible deployment and communication capability. However, this also opens up most of the security vulnerabilities of such systems. There are known attack methods to compromise the ad-hoc communication channel between the nodes of a network. There are also precaution mechanisms built into the communication standards to support the generation of network keys as well as the encryption and decryption of messages. But increased security usually translates to increased cost, both in terms of development effort and run-time resource consumption. At the same time, the system must be maintenance-free, or at least easily manageable even by non-technical users. To make matters worse, most of the security aids provided to the application developer are specific to the networking technology being used, i.e., there is the risk of having to re-define security-related protocols and implementations as part of the system’s evolution. Hard-coded application security and non-portable security logic (or, alternatively, low security and reliability) can turn out to be an obstacle to the penetration of DSN technology into the mass market. Motivated by these challenges, this paper presents a lightweight and portable security solution for DSNs designed by the authors in the POBICOS project [1]. The solution enables the user to manage such a system in a straightforward way and achieves sufficient data confidentiality at the application level. The rest of this paper is organized as follows: Section 2 briefly discusses related work. Section 3 provides a short introduction to DSNs and their inherent vulnerabilities, security threats and requirements.
Section 4 proposes two novel approaches to managing security in DSNs, providing practical mechanisms and security protocols for implementation in a typical DSN context. Section 5 discusses implementation issues of the security abstraction and, finally, Section 6 summarizes the paper and outlines future work.
2 Related Work
Existing security methods are available for many of the threats present in sensor networks, but they are usually too resource-consuming to be used in DSNs [2,3,4]. Actuator nodes, when present in the DSN, introduce increased risk. However, most of the research done in the field of DSNs does not consider that actuator nodes could exist in the network. Security-related work that takes actuator nodes into account has been published in [5,6,7]. DSN security issues and a public-key-cryptography-based key establishment scheme for DSNs are given in [7]. A cluster-based communication architecture around each actuator of a DSN could be used to reduce the key management overhead. To suit this demand, a scalable, energy-efficient routing algorithm of this kind is introduced in [5]. 1
In this paper, the term security is used to refer to information security as well as protection against unauthorized access and control.
Sensor network standards such as ZigBee [8] and WirelessHART [9] are based on the IEEE 802.15.4 standard [10]. For example, ZigBee defines two different modes for security-related operations: the “commercial mode” and the “residential mode”. In the “commercial mode”, three different types of encryption keys are used to ensure network security. Hence, the memory required for the ZigBee Trust Center grows with the number of devices in the network, and the mode also requires implementing key establishment with a specific protocol and entity authentication. Due to these issues, the “commercial mode” is not suitable for the purposes of DSNs. In the “residential mode”, only one network key can be used. All devices encrypt and decrypt information using this shared key. This approach requires few resources from the ZigBee Trust Center: the memory required for the Trust Center does not grow with the number of devices in the network. This mode can therefore be exploited in the DSN security approach. However, the network protocol is not encrypted during the first steps of the delivery of the network key, so the “residential mode” lacks security until the network key has spread to all the devices in the network. Moreover, in the “residential mode” the IEEE 802.15.4 based sensor networks are vulnerable to “insider attacks”, i.e. a malicious node knowing the network key can attack the system from inside.
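The insider problem of a single shared network key can be seen in a toy model (illustrative only; real ZigBee uses AES-CCM*, not this hash-based keystream): every node encrypts with the same key, so any node holding it can read traffic exchanged between two other nodes.

```python
import hashlib

def keystream_xor(network_key, nonce, data):
    """Toy stream cipher: XOR data with a SHA-256-derived keystream
    (a stand-in for illustration, NOT the actual ZigBee frame security)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(
            network_key + nonce + counter.to_bytes(4, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

network_key = b"\x01" * 16       # the one 128-bit "residential mode" key
nonce = b"frame-0001"
ct = keystream_xor(network_key, nonce, b"reading from node A to node B")

# Any insider holding the shared key decrypts traffic it was never
# addressed in -- the same call a legitimate receiver would make:
assert keystream_xor(network_key, nonce, ct) == b"reading from node A to node B"
```

This is exactly why the paper later pairs the One Network Key Approach with an identity-based alternative for insider protection.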
3 Distributed Sensor Networks (DSNs)
Distributed sensor networks consist of large numbers of spatially distributed, self-organizing, low-power, low-cost sensor nodes deployed to monitor and possibly control certain physical variables such as temperature, humidity or luminosity. The capabilities of these sensor nodes include computing, sensing, wireless communication and, optionally, actuating. Each node is also responsible for routing in a multi-hop fashion, effectively constituting an ad-hoc network with the other nodes. Earlier it was common to deploy large numbers of similar sensor nodes in a sensor field to monitor a single physical variable such as temperature [2]. Nowadays DSNs are becoming more common in the domain of private homes as well, in the spirit of the Internet of Things or cooperating objects. Not only does this cause significant heterogeneity; in many application scenarios DSNs also handle confidential data that has to be kept secure and private from outsiders. This creates the need for security and privacy mechanisms [3,11]. New approaches are required that allow DSNs to be managed intuitively while maintaining the security and privacy of the user’s home and data. However, the implementation of these features on networks without infrastructure, consisting mostly of low-resource nodes, is much more troublesome than in infrastructure-supported, high-resource networks. The amount of overhead must be kept small to minimize the energy-consuming radio transmission time among the sensor nodes, memory space for key storage is limited, and the computational power used for security functions has to be kept small. In addition, the security mechanisms have to be scalable to support networks with varying numbers of nodes [6]. These constraints limit how extensive the security features used can eventually be.
Due to the wireless nature of their communication, DSNs are vulnerable to several security attacks such as eavesdropping, denial-of-service (DoS), tampering, selective forwarding, sinkhole attacks, wormhole attacks, and Sybil attacks. DSNs also have specific security requirements such as data confidentiality, data authenticity, data integrity, and data freshness. Detailed descriptions of these threats and requirements can be found, for example, in [2,4,12,13].
4 Security Protocols and Mechanisms for the DSN
At the network level, our proposed security solution for DSN Devices offers two levels of security: (1) a One Network Key Approach (Figure 1) and (2) an Identity-based Cryptography Approach based on the utilization of public and private key pairs (Figure 2). These two approaches are alternatives, selectable by the application developer. The security of the DSNs is based on the utilization of a security card (S-Card) as a network-specific trust centre, bearing responsibility for all the security-related configuration actions. The S-Card is like a safe deposit box of the network, holding the network’s sensitive configuration data. The S-Card is network-specific, so it does not interfere with networks (e.g. next-door neighbours’) other than the one it belongs to. User access to the S-Card is secured by a personal security code. Delivery of the network keys into the devices can be performed by means of the S-Card in close proximity. A Trust Center distributes network keys for the purposes of the DSN and end-to-end application configuration management. The Trust Center software runs on the S-Card.

4.1
One Network Key Approach
In the following we focus on the functionality of the S-Card in the case of the One Network Key Approach (Figure 1). In this approach, the DSN Device communicates securely with other devices using one 128-bit network key and a network identification code. For purposes of trust management, the device accepts an active network key originating from the Trust Center via unencrypted key transport in close proximity. The Trust Center maintains a list of the devices and network keys of all the devices in the network and controls the policies of network admittance. The network key can be updated periodically. Furthermore, it is assumed that “insider attacks” are not possible within the limitations of the One Network Key Approach, i.e. no rogue device is assumed to get hold of the network key. For user authentication, the DSN security solution exploits the S-Card with an integrated keyboard and a user-configurable security code. In addition, each S-Card contains an extended security code which can be used to reset devices in the network in case the security code or the S-Card is lost. In the following steps, close proximity is used in communication between the S-Card and the DSN Devices during key distribution.
[Figure 1 shows two DSNs, each with its own S-Card running the Trust Center and holding its network key, network ID, security code and extended security code. Network keys and IDs are delivered to devices in close proximity. An outside attacker without NWK Key1 is blocked, while an inside attacker holding NWK Key2 is not.]

Fig. 1. One Network Key Approach
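The S-Card lifecycle detailed in the following steps can be condensed into a toy sketch (our simplification; the paper does not specify the key-generation function, so hashing the 64-bit hardware address is an assumption, as are all class and method names):

```python
import hashlib
import os

class SCard:
    """Toy S-Card: derives credentials from a 64-bit hardware address and
    rekeys the DSN when a member leaves or is banned."""

    def __init__(self, security_code):
        self.hw_addr = os.urandom(8)              # 64-bit hardware address
        self.security_code = security_code        # chosen by the user at activation
        self.extended_code = os.urandom(4).hex()  # shipped in the manual
        # Network ID and 128-bit network key generated from the HW address
        self.network_id = hashlib.sha256(b"id" + self.hw_addr).digest()[:4]
        self.network_key = self._fresh_key()
        self.members = set()                      # the Member Table

    def _fresh_key(self):
        return hashlib.sha256(b"key" + self.hw_addr + os.urandom(8)).digest()[:16]

    def add_device(self, device_addr, typed_code):
        """NFC join: credentials are released only if the security code matches."""
        if typed_code != self.security_code:
            return None
        self.members.add(device_addr)
        return self.network_key, self.network_id, set(self.members)

    def ban_device(self, device_addr):
        """Drop a device, then push a fresh key to every remaining member."""
        self.members.discard(device_addr)
        self.network_key = self._fresh_key()
        return self.network_key                   # delivered reliably to members only

card = SCard("1234")
assert card.add_device("node-1", "0000") is None  # wrong security code: rejected
card.add_device("node-1", "1234")
card.add_device("node-2", "1234")
old_key = card.network_key
card.ban_device("node-1")
assert card.network_key != old_key and card.members == {"node-2"}
```

The rekey-on-ban step is what keeps a departed or banned device from continuing to decrypt traffic under the old shared key.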
A new S-Card is taken into use with the following steps:

1. The user acquires an S-Card from a genuine dealer and brings the S-Card home.
2. The S-Card comes with a manual containing the extended security code for the S-Card. The user stores the extended security code in a safe place.
3. The user activates the S-Card by pressing a button on it and is prompted to type in a security code of choice.
4. The user types in a security code and is asked to confirm it by retyping.
5. The S-Card generates a network key and network ID from the S-Card’s 64-bit hardware address. As a result, the S-Card now contains the user’s security code, the extended security code, the network key and the network ID.
6. The user takes the S-Card into use and activates the DSN Devices in his/her home.

A user can manage several DSNs by acquiring additional S-Cards, each of which can be configured to use the same security code. Each S-Card contains the security credentials of the corresponding network in addition to the user-specified security code. The user can move DSN Devices between different DSNs by using a single security code and the S-Cards of each network respectively.

Adding a new device to the network can be performed as follows:

1. The user brings the S-Card close to a DSN Device.
2. A Near Field Communication (NFC) connection between the S-Card and the DSN Device can now be established.
3. The user enters the security code (in the case of a new S-Card the security code is entered twice).
4. The user is verified by the security code known by the S-Card (the S-Card is personal to the user).
5. The S-Card sends the user’s security code, the S-Card’s extended security code, the network key and the network ID to the DSN Device. In addition, the S-Card sends an up-to-date list of the other DSN members, the Member Table, to the new DSN Device.
6. If the user’s security code and the S-Card’s extended security code correspond to the ones previously transmitted to the DSN Device, the network key, network ID and Member Table are accepted by the Device. If this is the first security code that the device hears, it learns it and stores it in its memory along with the S-Card’s extended security code, network key, network ID and Member Table.
7. Having the required security credentials, the DSN Device attempts to join the network.
8. At this point, the new DSN Device gets its DSN address, which is calculated from the underlying network’s network address at the hardware layer (HAL).
9. The new DSN Device broadcasts, exactly once and with unreliable transport messaging, a new Member Table after it has added its own address to it.
10. The S-Card and all the other DSN Devices in the network update their Member Tables with the one that contains the new Device.

After the steps above, the DSN is ready to operate in a secure manner, i.e. the air interface is secured against “outsider attacks”.

Autonomous Device leaving from the network happens as follows:

1. When in range of the DSN, the S-Card monitors the network state by receiving updated Member Tables that are broadcast when changes in the network membership occur. For this purpose, the S-Card utilizes the same communication technology as normal DSN Devices, i.e. it is a member of the DSN when not in NFC mode.
If the S-Card notices that a single DSN Device has dropped out of the Member Table permanently, a global network key change is issued in the network by the S-Card.
2. The S-Card receives an updated Member Table from the DSN indicating that a DSN Device has left the network.
3. The S-Card waits for a constant time period during which it pings the missing device periodically. If the missing DSN Device reappears, i.e. answers the ping, no operations are needed and the S-Card leaves the waiting mode.
4. The constant time period ends and the DSN Device has not reappeared in the network.
5. The S-Card generates a new network key from its 64-bit hardware address and issues a global network key change by sending the new network key to each node explicitly using reliable communications.
6. Once every device in the DSN has acknowledged the new network key transport message, the S-Card changes its own network key to the newly generated
one. If a single DSN Device is not in range of the network during the network key change, it is dropped out and must be added to the network again as specified in section “Adding a new device to the network”.

Removing a Device from the network is performed as follows:

1. The user wants to remove a malfunctioning Device X from the DSN via the S-Card.
2. The S-Card initiates a global network key change as specified in section “Autonomous Device leaving from the network”, in such a way that the new network key is sent to all devices in the DSN except Device X, which is to be left out of the DSN.
3. After the network key change, the S-Card broadcasts a new Member Table from which Device X has been dropped.
4. The remaining devices in the network now communicate with the new network key encryption and the malfunctioning Device X is left out of the DSN.

If the user forgets his/her security code, the extended security code of the S-Card can be used to reset a DSN Device into a state where it accepts a new security code, i.e. the step described in section “Adding a new device to the network” can be performed. The device reset must be performed on every device separately over an NFC connection. Similarly, the S-Card’s user-configurable security code is reset using the extended security code. If the user has lost the extended security code of the S-Card, a genuine dealer must be contacted. If the user loses the S-Card of the DSN, a new S-Card must be acquired from a genuine dealer. With the new S-Card, the user can reset all his/her DSN Devices by typing the extended security code of the lost S-Card while in NFC connection mode. This action must be performed on every Device respectively. After this, the new S-Card can be used to reconfigure the network as specified earlier. If both the S-Card and the corresponding extended security code are lost, the user is required to have the memories of his/her DSN Devices flashed by a dealer or technician.

4.2
Identity-Based Cryptography Approach
In the Identity-Based Cryptography Approach, the users’ or devices’ identity information, such as unique names, can be used as public keys for encryption and/or signature verification [14,15,16]. Identity-based cryptography reduces the complexity and management cost of public-key cryptography systems by eliminating the need for a key directory and public-key certificates. Identity-based cryptography allows any party to generate a public key from a known identity value. A trusted third party, called the Private Key Generator (PKG), included in the Trust Center, generates the corresponding private keys. To operate, the PKG first publishes a master public key and retains the corresponding master private key. Given the master public key, any party can compute a public key corresponding to an identity ID by combining the master public key with the identity
[Figure 2 shows two DSNs, each with an S-Card running the Trust Center and the PKG and holding a master public/private key pair plus the security code and extended security code. The master public key is delivered in close proximity; a device’s public key is derived from the master public key and its Node ID, and its private key from the master private key and its Node ID. Messages are encrypted with the public key and decrypted with the private key, blocking both outside and inside attackers.]

Fig. 2. Identity-based Cryptography Approach
value. To obtain the corresponding private key, the DSN Device with identity ID contacts the PKG, which uses the master private key to generate the private key for identity ID. As a result, parties may encrypt messages (or verify signatures) with no prior distribution of keys between individual participants. In [17] the authors propose assigning a descriptive identity to nodes, meaning the identity represents the function of the node in the network. If there is more than one node with the same function, the identity includes additional information to differentiate those nodes, e.g. location or serial information. When DSN Device A wants to send information to another DSN Device B, it first obtains the identity information of B from its own local database, and then uses this information, together with the master public key, to encrypt the information. Each DSN Device is also granted a private key associated with its public identity information and uses that key to decrypt information destined to it. The Trust Center generates the key pair for each DSN Device in the network. The private key is deployed by means of close proximity (Figure 2). The working steps in more detail are as follows:

1. The Trust Center generates a master public key, the Master-key, and transfers it to the DSN Device in close proximity.
2. A DSN Device joins the network and authenticates with the Trust Center using its identity information, the Node ID. The membership approval can be done manually by means of close proximity.
3. Once approved, the DSN Device is granted a private key associated with its identity information, by means of close proximity.
4. Devices publish their Node ID in cleartext along with their Node ID and address encrypted with their private key. If Device A wants to communicate with Device B, it decrypts B’s encrypted contact info using the master public key and the published Node ID of B. If the decrypted Node ID matches the published Node ID, A knows that the decrypted address corresponds to B. The information sent to B can be encrypted using A’s private key, and B can decrypt it using the same (symmetrical) process, based on A’s published (or piggybacked) contact info.
5. The encrypted message passes through multiple hops to reach the destination. The message is secure since only the designated DSN Device can decrypt and read the information using its private key.

In a system of N nodes, N key pairs and N node IDs need to be retrieved and maintained, so the implementation would not be overly complex. The Trust Center would also need only slightly more resources to generate and distribute the key pairs.
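The PKG workflow can be sketched with a toy key derivation (a structural illustration only: a real identity-based scheme such as Boneh-Franklin requires pairing-based cryptography, whereas the HMAC-derived keys below are symmetric, so only the holder of the master secret can re-derive them):

```python
import hashlib
import hmac
import os

class PKG:
    """Toy Private Key Generator: per-identity keys derived from a master
    secret held on the S-Card (NOT real identity-based encryption)."""

    def __init__(self):
        self.master_secret = os.urandom(32)   # never leaves the Trust Center

    def extract(self, node_id):
        """Private key for an identity; handed over in close proximity
        after membership approval."""
        return hmac.new(self.master_secret, node_id.encode(),
                        hashlib.sha256).digest()

def sign(private_key, message):
    """Authentication tag computed with a device's identity-derived key."""
    return hmac.new(private_key, message, hashlib.sha256).digest()

pkg = PKG()
key_a = pkg.extract("kitchen-temp-sensor")   # descriptive identity as in [17]
tag = sign(key_a, b"temp=21.5C")

# The PKG (or any entity holding the master secret) can re-derive A's key
# from its public identity and check the tag -- no per-device key directory
# or certificate is needed, which is the property the approach relies on.
assert hmac.compare_digest(
    tag, sign(pkg.extract("kitchen-temp-sensor"), b"temp=21.5C"))
assert tag != sign(pkg.extract("impostor"), b"temp=21.5C")
```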
5 Implementation of the Security Abstraction
At the network level, the DSN is wireless and can be built, for example, on top of star, tree, and mesh topologies. Security-related issues are hidden from end-users as far as possible. At the node level, the DSN security solution exploits a stack architecture made up of a set of layers (Figure 3). Each layer performs a specific set of services for the layer above. The architecture builds on this foundation by providing a Network and Security Adaptation Layer and a framework for a Core Middleware (CMI) Layer. The Core Middleware (CMI) Layer includes a Communications Module (CommM) Layer that allows frame security and device authentication to be based on network and cryptography keys. Furthermore, this layer is responsible for the processing steps needed to securely transmit outgoing frames, securely receive incoming frames, and securely establish and manage network and cryptography keys. In addition, the layer is responsible for network management, device authentication, and software data encryption. The Network and Security Adaptation Layer acts as “glue” between the hardware and the core middleware. The layer includes Best-effort Transport, Reliable Transport, and Fragmentation services. It is responsible for (1) the processing steps needed to securely transmit outgoing frames, and (2) securely receiving incoming frames. The underlying network technology (i.e. the Medium Access Control (MAC) Layer and the Physical (PHY) Layer) below the Network and Security Adaptation Layer may vary. The basic principle in our security approach is that the underlying layers adopt the security mechanisms of the sensor network technology used, if available. In addition, a software encryption scheme is introduced to support platforms with no prior security mechanisms.
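The principle of adopting hardware security where available and falling back to software encryption could look roughly as follows (a hypothetical sketch; the class names only loosely follow Figure 3, and the XOR "encryption" is a placeholder, not the actual scheme):

```python
class Radio:
    """Underlying network technology; hw_crypto mirrors e.g. a ZigBee radio
    processor that encrypts frames in hardware."""
    def __init__(self, hw_crypto):
        self.hw_crypto = hw_crypto

    def send(self, frame):
        return frame   # HW encryption, if present, happens below this line

class AdaptationLayer:
    """Network and Security Adaptation Layer: glue between the hardware
    and the core middleware."""
    def __init__(self, radio, network_key):
        self.radio = radio
        self.network_key = network_key

    def transmit(self, payload):
        if self.radio.hw_crypto:
            return self.radio.send(payload)                # radio encrypts
        return self.radio.send(self._sw_encrypt(payload))  # software fallback

    def _sw_encrypt(self, payload):
        # Placeholder software encryption (repeating-key XOR, for
        # illustration only -- not a secure cipher).
        key = self.network_key
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))

layer = AdaptationLayer(Radio(hw_crypto=False), network_key=b"\x2a" * 16)
assert layer.transmit(b"hello") != b"hello"    # payload leaves encrypted
```

The point of the layering is that the middleware above calls `transmit` identically on every platform; only the adaptation layer knows whether encryption happened in hardware or software.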
[Figure 3 shows the node-level stack: the Application on top of the Core Middleware (CMI) Layer, whose Communications Module (CommM) exposes the DatagramTransportI, ReliableTransportI and NetworkMgmtI interfaces over data and management service access points; below it the Network and Security Adaptation Layer, and underneath the Medium Access Control (MAC) and Physical (PHY) layers defined by the underlying network. The S-Card, including the Trust Center with the One Network Key Approach or the Identity-based Cryptography Approach, communicates over the air interface.]

Fig. 3. DSN security architecture at node level
In the proposed security solution for DSNs, the actual physical network technology below the Network and Security Adaptation Layer may vary, and the underlying layers adopt the security mechanisms of the technology used. As an example, in the following we focus on the implementation of the security abstraction in the case of ZigBee-based networks. To avoid the weakness of the “residential mode”, the DSN security solution exploits the proposed S-Card approach for close-proximity delivery of the network keys to all the devices in the network. After the delivery is completed, the DSN Devices are able to communicate over the air interface in a secure manner. The proposed One Network Key Approach is able to protect the DSN against “outsider attacks”, but not against “insider attacks”. However, there are ways to achieve this, although the system will become somewhat more complex as the number of keys and the complexity of the algorithms grow. A solution against “insider attacks” is to provide node and message authentication by using the proposed Identity-based Cryptography Approach. The Datagram Transport and Reliable Transport use the HW encryption provided by the ZigBee radio processor (instead of the optional SW encryption), so the encryption takes place underneath the Network and Security Adaptation Layer. The encryption key is initialized during joining into the network. The functionality of the Network Management is as follows:

– Network key changes (for hardware encryption) are implemented by configuring the proper key and then committing a ZigBee stack restart.
– A global network key change sends the new key to every network member explicitly; after this, every node reboots into the new security scheme after a coarse synchronization period.
Table 1. Commands and events of the NetworkMgmtI interface

command int joinNetwork (idtype NetworkID, keytype NetworkKey, addrtype *Addr)
  Attaches the node to a DSN, which is identified with NetworkID and encrypted with NetworkKey. After a successful join, the node's new device address is in Addr.

command int leaveNetwork ()
  Removes the node from the current DSN.

command int getNetworkMembers (addrtype **Addr, int *len)
  Returns the current local knowledge of other network members into Addr in list form. The total count of the other members is stored in len.

event void networkMembersChanged (addrtype **Addr, int *len)
  Updates the node's knowledge of other network members.

command int banNodeFromNetwork (addrtype Addr)
  Used to ban a node from the network.

command void getMyNetworkAddr (addrtype *Addr)
  Writes the node's DSN address in Addr.

command int isReachable (addrtype *Addr)
  Queries the liveliness of a node.

event void reachableResp (int result)
  Denotes the result of a liveliness query.

command int setNetworkIDLocally (idtype NetworkID)
  Used for setting the network ID locally. The command is needed both in the "initial" joining into the DSN and during a network change.

command int setNetworkKeyLocally (keytype NetworkKey)
  Used for setting the network key locally. As above, the command is needed both in the "initial" joining into the DSN and during a network change.

command int refreshNetworkKeyGlobally (keytype NetworkKey)
  Used for changing the network key within the whole network. The command enables changing the network key for all currently networked nodes with a single S-Card authentication operation.

event void networkKeySet (keytype NetworkKey)
  Denotes a remote network key change.
– Banning a node drops the banned node from the network Member Table, broadcasts this, and then sends a new network key to every remaining network member explicitly (as above).
– The lengths of the DSN ID and the single network key should be mapped to the ones used in ZigBee. Therefore, the most straightforward solution is to adopt in the DSN exactly the same lengths of ID and network key as in ZigBee.

The interfaces (i.e. DatagramTransportI, ReliableTransportI, and NetworkMgmtI) of the Communications Module (CommM) include the commands and events needed in the implementation of the protocols described in Section 4. As an example, Table 1 describes the commands and events of the NetworkMgmtI interface when using the One Network Key Approach for securing the DSN.
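As a sketch, the commands of Table 1 can be mapped onto an abstract interface roughly like this (a Python stand-in with simplified types and a trivial local implementation; the real interface is defined over the CommM module, and the events would arrive via callbacks):

```python
from abc import ABC, abstractmethod

class NetworkMgmtI(ABC):
    """Commands of Table 1; events (networkMembersChanged, networkKeySet,
    reachableResp) would arrive via registered callbacks, omitted here."""

    @abstractmethod
    def join_network(self, network_id, network_key):
        """Attach to the DSN identified by network_id; returns the address."""

    @abstractmethod
    def leave_network(self):
        """Remove the node from the current DSN."""

    @abstractmethod
    def get_network_members(self):
        """Return the current local knowledge of other network members."""

    @abstractmethod
    def ban_node_from_network(self, addr):
        """Ban a node; in the real protocol this triggers a global rekey."""

    @abstractmethod
    def set_network_key_locally(self, network_key):
        """Install a network key on this node (initial join or key change)."""

    @abstractmethod
    def refresh_network_key_globally(self, network_key):
        """Push a new network key to every currently networked node."""

class LocalNode(NetworkMgmtI):
    """Trivial in-memory implementation, enough to exercise the interface."""
    def __init__(self):
        self.network_id = self.network_key = self.addr = None
        self.members = []

    def join_network(self, network_id, network_key):
        self.network_id, self.network_key = network_id, network_key
        self.addr = b"\x00\x01"   # derived from the HW layer in the real system
        return self.addr

    def leave_network(self):
        self.network_id = self.network_key = self.addr = None

    def get_network_members(self):
        return list(self.members)

    def ban_node_from_network(self, addr):
        self.members = [m for m in self.members if m != addr]

    def set_network_key_locally(self, network_key):
        self.network_key = network_key

    def refresh_network_key_globally(self, network_key):
        self.network_key = network_key

node = LocalNode()
assert node.join_network(b"ID01", b"k" * 16) == b"\x00\x01"
node.set_network_key_locally(b"j" * 16)
assert node.network_key == b"j" * 16
```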
6 Summary and Future Work
Two approaches to guaranteeing security in Distributed Sensor Networks (DSNs) have been described. In both approaches, the security of the DSNs is based on the utilization of a security card (S-Card) as a network-specific trust centre, bearing responsibility for all the security-related configuration actions. The proposed One Network Key Approach is able to protect the DSNs against “outsider attacks”, while the proposed Identity-based Cryptography Approach is able to protect the DSNs against both “outsider attacks” and “insider attacks”. The next phase of this work is to implement and validate the approaches in practice.

Acknowledgments. We would like to thank Dr. Jaroslaw Domaszewicz and Mr. Aleksander Pruszkowski from the Warsaw University of Technology for their significant contributions to the POBICOS project [1], in which this work originates. The POBICOS project is co-funded by the European Commission under EU Framework Programme 7 and the consortium partners of the POBICOS project.
References

1. POBICOS: WWW pages of the POBICOS project, Platform for Opportunistic Behaviour in Incompletely Specified, Heterogeneous Object Communities (2009), http://www.ict-pobicos.eu/index.htm
2. Baronti, P., Pillai, P., Chook, V.W.C., Chessa, S., Gotta, A., Hu, Y.F.: Wireless sensor networks: A survey on the state of the art and the 802.15.4 and ZigBee standards. Computer Communications 30(7), 1655–1695 (2007)
3. Wang, Y., Attebury, G., Ramamurthy, B.: A survey of security issues in wireless sensor networks. IEEE Communications Surveys & Tutorials 8(2), 2–23 (2006)
4. Walters, J.P., Liang, Z., Shi, W., Chaudhary, V.: Wireless sensor network security: A survey. Auerbach Publications, CRC Press (2006)
5. Hu, F., Siddiqui, W., Sankar, K.: Scalable security in wireless sensor and actuator networks (WSANs): integration re-keying with routing. Computer Networks 51(1), 285–308 (2007)
Towards a Lightweight Security Solution
109
6. Czarlinska, A., Luh, W., Kundur, D.: Attacks on sensing in hostile wireless sensor-actuator environments. In: Proceedings of Global Telecommunications Conference, GLOBECOM 2007, November 26-30, pp. 1001–1005. IEEE, Los Alamitos (2007)
7. Yu, B., Ma, J., Wang, Z., Mao, D., Gao, C.: Key establishment between heterogenous nodes in wireless sensor and actor networks. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds.) APWeb Workshops 2006. LNCS, vol. 3842, pp. 196–205. Springer, Heidelberg (2006)
8. ZigBee: ZigBee specification. Technical Report 053474r17, ZigBee Alliance, Inc. (January 17, 2008)
9. HART: WirelessHART (2009), http://www.hartcomm2.org/index.html
10. IEEE: Std. 802.15.4-2006, Part 15.4: Wireless medium access control (MAC) and physical layer (PHY) specifications for low-rate wireless personal area networks (WPANs). Technical report, IEEE Inc. (September 8, 2006)
11. Boyle, D., Newe, T.: Security protocols for use with wireless sensor networks: A survey of security architectures. In: Proceedings of the 3rd International Conference on Wireless and Mobile Communications, ICWMC 2007, Washington, DC, USA, p. 54. IEEE Computer Society, Los Alamitos (2007)
12. Savola, R., Abie, H.: On-line and off-line security measurement framework for mobile ad hoc networks. Accepted to the Journal of Networks Special Issue, 13 (June 2009)
13. Savola, R.: Current and emerging information security challenges for mobile telecommunications. In: Proceedings of the 10th Int. Symposium on Wireless Personal Multimedia Communications (WPMC 2007), December 3-6, pp. 599–603 (2007)
14. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakely, G.R., Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg (1985)
15. Nguyen, S.T., Rong, C.: ZigBee security using identity-based cryptography. In: Xiao, B., Yang, L.T., Ma, J., Muller-Schloer, C., Hua, Y. (eds.) ATC 2007. LNCS, vol. 4610, pp. 3–12. Springer, Heidelberg (2007)
16. Cocks, C.: An identity-based encryption scheme based on quadratic residues. In: Honary, B. (ed.) Cryptography and Coding 2001. LNCS, vol. 2260, pp. 360–363. Springer, Heidelberg (2001)
17. Boneh, D., Franklin, M.K.: Identity-based encryption from the Weil pairing. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213–229. Springer, Heidelberg (2001)
Cross-Site Management of User Online Attributes

Jin Liu

Nokia Siemens Networks (Beijing)
No. 14, Jiu Xian Qiao Road, Beijing 100016
[email protected]

Abstract. People spend time on web 2.0 sites to contribute content and make connections with each other. On these sites a user wants to selectively reveal parts of his attributes to other users, and he also wants to learn more attributes of other users. A user's online attributes are often distributed across multiple sites, since most users visit more than one web site. Currently only the attributes within a specific web site can be queried on that site. This paper proposes a new solution, based on federated identity management, to enable an end user to query the cross-site attributes of another person as easily as possible.

Keywords: online attributes; federated identity management; cross-site; web 2.0 mashup.
1 Introduction

The prosperity of web 2.0 sites shows how much people enjoy contributing content and making connections with other people. Often the user creates a personal account at a site, either to receive better services or because the site requires it. Associated with each account are the user's attributes at the site, which fall into two categories. One is what the user claims about himself, and the other is what the site claims about the user. When a user creates an account, he is usually asked to input his birthday, interests, occupation and so on. Unless the site verifies this information, the data does not mean anything to other users. The other kind of user data, claimed by the site, is sometimes called 'reputation'. Examples of such attributes are visit history, activity statistics, the user's social connections on the site, evaluations by other users and so on. In this paper, we focus on this category of data and use the generic term 'online attributes' to denote what the site claims about the user.

Online attributes are important assets for the subject user and the web site. The subject user sees his attributes as accumulated activities verifiable by a third party (the web site), instead of something from his own mouth. For the sites, especially social network sites with which users interact intensively, the users' attributes are among their core competencies. To retain existing subscribers and attract new users, web sites encourage their users to contribute content and connect with other users. For other users of the web site, the public attributes of the subject user have a significant influence on whether the contributed content of the subject user (posts, comments etc.) is worth reading.

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 110–119, 2009. © Springer-Verlag Berlin Heidelberg 2009
Although a single site can provide an attribute query service for that site, anything beyond this is not an easy task for either the site or the users. Since an internet user frequently visits more than one web site, his attributes are distributed across multiple sites. The problem arises when the query user wants to know the cross-site attributes of the subject user. This cross-site attribute management is what this paper seeks to address, by introducing identity management technologies.

Federated identity, or the 'federation' of identity, describes the technologies that enable the portability of identity information across otherwise autonomous security domains. The ultimate goal of identity federation is to enable users of one domain to securely and seamlessly access data or systems of another domain, without the need for completely redundant user administration. Typical use cases involve cross-domain web-based single sign-on, cross-domain user account provisioning and cross-domain entitlement management. It can drastically improve the end-user experience by eliminating the need to log in redundantly, through cross-domain single sign-on.

The rest of this paper is organized as follows. Section 2 describes the problem scenario in detail and the state-of-the-art solutions. The proposed framework and its implementation considerations are described in Section 3. Finally, we conclude in Section 4.
2 The Problem and Current Solutions

Figure 1 shows the generic issue arising when querying cross-site online attributes.
Fig. 1. Generic issue: user A can get user B's attributes on web site 1, but not those on web sites 2 and 3
Suppose user A is a frequent visitor of web site 1, and user B often visits web sites 1, 2 and 3, on each of which he has some online attributes stored. Now when user A browses web site 1 and finds user B interesting, A would like to know more about B. Often web site 1 can provide a service to display B's public attributes on the site. However, there is no way for A to know B's
public attributes on web sites 2 and 3. Given the huge number of web sites, it is impossible for A to go to web sites 2 and 3 to query B's attributes: A has no knowledge at all of which sites B may visit, not to mention that B may register different user names on web sites 2 and 3.

A concrete example is a hotel booking web site. Suppose Alice is going to book a hotel, and she notices there are several comments about the hotel by other users. She is not sure about the trustworthiness of the comments and thus it is hard for her to judge the quality of the hotel. Ideally, if Alice found that one comment author, Bob, is an active member of an online photo sharing web site, and she saw the travel photos taken by Bob, she would be more convinced that Bob's comment is a trustworthy reference for her decision.

The state-of-the-art solutions do not provide a satisfactory mechanism for end users to query the cross-site attributes of another user. Internet search engines are often used to try to find information about an identity. The querier has to open a new browser window and type the subject user's name into the search engine's page, then guess which search results may be relevant by his own judgment. The inconvenience continues as the querier has to open each result link to see whether that link contains useful information about the subject user. Clearly, one disadvantage is that user IDs on different sites cannot easily be linked, since different sites have different user IDs for the same end user. Furthermore, not all information will be indexed by search engines, and search results are not guaranteed to be valid information sources.

Another solution is to resort to a personal homepage or its variations. In case the subject user has a personal page which lists the owner's identities on other sites, the querier can obtain the subject user's identities at other service providers and use this information to obtain user attributes.
However, such an arrangement requires the querier to go to each service provider separately to obtain the user attributes, thereby interrupting his browsing experience. It is also hard for a personal homepage to authenticate the visitor and selectively display information according to the visitor's identity.

OASIS SAML [1] defines AttributeQuery and AttributeStatement to enable a relying party to query the attributes of a subject from a SAML asserting party. While AttributeQuery and AttributeStatement are useful building blocks for communication between back-end servers, they do not consider the end-user browsing experience, which is the target issue this paper attempts to address.

Online reputation has been thoroughly studied [2-5]. Those works investigate how to build a reputation system that generates and uses fair reputation values. The results are complementary to this paper: what this paper tries to solve is not the reputation/attribute values themselves, but how to aggregate and propagate the attributes to users on other systems in a secure and yet convenient way.
3 Framework for Cross-Site Online Attribute Management

The typical function of the Identity Provider (IdP) is to authenticate the end user and provide identity assertions/claims to the relying site, as illustrated in Figure 2.
Fig. 2. Standard working diagram of IdP
To solve the cross-site attribute query issue, we propose to reverse the data flow direction and enhance the IdP to act as the attribute hub for its users. Figure 3 illustrates the proposed cross-site online attribute management, which involves the attribute owner, the web sites, the IdP and the querier.

Fig. 3. Cross-site online attributes management
The attribute owner, an end user willing to reveal some of his online attributes in a controlled way, gives the web sites permission specifying which attributes can be transferred to the IdP. He also instructs the IdP, directly or indirectly, about which attributes can be queried by which querier.

The web sites maintain the user's online attributes and transfer them to the Identity Provider upon the user's request. Other operation modes, such as periodic transfer to the IdP or actively pushing whenever there is an update, are also possible as long as the subject user permits. The interface between the web site and the IdP can be SAML, Web Services or other protocols.

The Identity Provider traditionally provides an identity assertion service to relying parties, but in our proposal the IdP receives identity and attribute assertions from web sites. The IdP returns the attributes to the querier under the privacy policy of the attribute owner.

The querier simply sends the query request to the IdP, directly or through the web site he is browsing, and the available attributes of the target user are returned. The querier may be authenticated by the IdP to determine which attributes he can see.

3.1 Implementation Considerations

To build a complete cross-site attribute management solution, provisioning and querying are the main operations that need to be considered.

3.1.1 Provision User Attributes to IdP
Attributes of subject users must be provisioned to the IdP from the web sites before any cross-site query request can be processed. An implementation example uses the OASIS SAML AttributeQuery and AttributeStatement messages, as shown in Figure 4.
Fig. 4. The IdP obtains user attributes from a web site
1. The end user logs in to his IdP.
2. The IdP authenticates the user through username/password or other methods.
3. After a successful authentication, the user's personal portal web page is returned. Within the page there are 'Get my attributes from the web sites' links.
4. The user clicks the 'Get my attributes' link for the sites from which he wants the IdP to retrieve his attributes.
5. The IdP sends a SAML AttributeQuery message to web site 1. The message is used to get the requested attributes for this user. To retrieve all attributes available from the web site, the IdP sends an AttributeQuery that contains no <saml:Attribute> element. An example SAML AttributeQuery takes the form below (simplified):

   <samlp:AttributeQuery>
     <saml:Issuer>Identity_Provider_name</saml:Issuer>
     <saml:Subject>
       <saml:NameID>user_name</saml:NameID>
     </saml:Subject>
   </samlp:AttributeQuery>
6. Web site 1 receives the AttributeQuery and responds by providing an AttributeStatement to the IdP. Optionally, the site can authenticate the end user before sending the results. An example SAML AttributeStatement takes the form below (simplified):

   <saml:Issuer>Web_site_1</saml:Issuer>
   <saml:AttributeStatement>
     <saml:Attribute FriendlyName="nickname">
       <saml:AttributeValue>Clever_boy</saml:AttributeValue>
     </saml:Attribute>
   </saml:AttributeStatement>
7. The IdP sends a SAML AttributeQuery message to web site 2.
8. Web site 2 receives the AttributeQuery and responds by providing an AttributeStatement to the IdP. The <saml:Issuer> field value is Web_site_2.
9. The IdP stores the attributes from the two web sites as the user's profile data.
10. A successful status is returned to the end user, containing the attributes the IdP just retrieved.

3.1.2 Query User Attributes
Figure 5 illustrates the message flow of how an end user queries attributes of another user (the subject user) from the web page he is browsing.
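The AttributeQuery of the provisioning steps above can be assembled with Python's standard library. This is only a sketch of the simplified message: a real SAML 2.0 request also carries an ID, an IssueInstant and usually a signature, all omitted here:

```python
import xml.etree.ElementTree as ET

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"

def build_attribute_query(issuer, subject_name_id):
    """Build a simplified SAML AttributeQuery; omitting <saml:Attribute>
    asks for all attributes available for the subject."""
    query = ET.Element(f"{{{SAMLP}}}AttributeQuery")
    ET.SubElement(query, f"{{{SAML}}}Issuer").text = issuer
    subject = ET.SubElement(query, f"{{{SAML}}}Subject")
    ET.SubElement(subject, f"{{{SAML}}}NameID").text = subject_name_id
    return ET.tostring(query, encoding="unicode")
```

Calling `build_attribute_query("Identity_Provider_name", "user_name")` yields a namespace-qualified version of the simplified query shown in step 5.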
Fig. 5. End user queries attributes of another user
1. The end user browses the site as usual.
2. While the site generates the page, it recognizes that the page contains usernames for which the IdP has previously retrieved attributes. The site places a 'Query IdP' hyperlink icon beside each such username to indicate that more information is available at the IdP, as shown in Figure 6.
Fig. 6. "Query IdP" icon beside the user name on a web page
3. The web page is returned to the browser.
4. The user is interested in one username and clicks on the 'Query IdP' hyperlink.
5. A query request containing <subject user, web site> is sent to the IdP.
6. The IdP authenticates the querier and checks that the request matches the subject user's privacy policy.
7. The IdP sends the subject user's attributes to the querier in a web page, as illustrated in Figure 7.

The IdP in Figure 5 returns the query results directly to the end user; an alternative is for the IdP to send the attributes via the web site.
You are querying attributes of Bob@site_1
From site_1: Member since Jan. 2002; Reputation as seller:
From site_2: Registered on Feb. 2003; Total uploaded photos: 800; Average rank of uploaded photos: 8.5/10
From site_3: Membership grade: golden; Most active topics: computer hardware, hardware driver, latest software

Fig. 7. Illustrative web page from the IdP to the query user, containing the attributes of Bob on multiple sites
3.1.3 Mashup Extensions
The attribute-query scenario can be implemented in the mashup style [6], where the web site generates the structure of the page (together with the "Query IdP" icons), but the actual content is aggregated only at the client user's end. This means that the client user is presented in his browser with personal information about other users in such a way that the web site does not learn the presented personal information.

Taking this concept further, we arrive at a solution where a web site is able to provide a highly personalized page and yet is not able to learn most of the sensitive personal information that constitutes the final content of that page. For the sake of users' privacy, this is a highly preferable situation. This way, for example, the site can provide a page in which the user is presented with a list of his friends, each of them decorated with a personal photo, their current context (work/home/vacation, for example) and even their current location, all without the SP learning any of this private information, not even the list of friends of the user or their current context and location.

3.2 Advantages

The proposed solution brings benefits to all four parties involved.

The attribute owner can reveal his online attributes (e.g. reputation, posts) to other users in a secure way and yet with little effort. No matter how many web sites he has
accounts on, the user has a single place (the IdP) to finely control which of his online attributes can be seen by whom.

The web sites can provide a fluent browsing experience for their users, because the end user no longer needs to interrupt his session at the web site to query for another user's information. Moreover, by exposing registered users' attributes to a person who might not be registered yet, a site can enlarge its reach.

The end user as querier can now get cross-site information about another user very easily while surfing the internet. The information is no more than a click away from the web page he is viewing. In addition, the information comes from a trusted source, the IdP.

The benefits to the IdP are twofold compared to a traditional IdP. The IdP now provides a valuable service to the end user directly instead of acting only as a backend server. The IdP also plays a more important role as a hub of users' online attributes.

3.3 Privacy Considerations

In the proposed framework, protecting the user's privacy means that any attribute transfer or storage must obtain the consent of the attribute owner. The attribute transfer/storage involves three parts:

1. From owner to web sites. Since this part is the same as in the current internet, existing solutions and concerns also apply here.
2. From web sites to IdP. The web site should establish trust with the peer party to which it will send the user's attributes, and enable the attribute owner to select a subset of his attributes to be transferred.
3. From IdP to the querier. The main privacy policy enforcement point is at the IdP. The policy specifies which attributes can be revealed to which querier, and the default policy is to return nothing. Before the IdP returns any attribute to the querier, the IdP should authenticate the querier first, unless the owner says his attributes can be seen by anybody.

Table 1 shows an example privacy policy of a user which contains three rules. Rule 1 is that his basic attributes, e.g.
online status and hobby, can be exposed to his friends group, which contains User_A and User_B. A second rule could be that his recent photo works are revealed to User_C and User_D, who are members of an online forum. The last (default) rule is to return no attributes. By checking the ID of the querier against the user list defined in each rule, the IdP can determine which group the querier belongs to and decide whether to return the attributes.

Table 1. Attribute owner privacy policy example

Rule                 User list of the rule    Attributes returned
1: friend            User_A, User_B           basic attributes (e.g. online status, hobby)
2: selected member   User_C, User_D           recent photo works
3: stranger          all others               no attributes
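A rule list like Table 1 can be evaluated first-match-wins; the data layout and function name below are illustrative assumptions, not part of the proposal:

```python
def attributes_for_querier(policy, owner_attributes, querier_id):
    """Evaluate an ordered privacy policy in the spirit of Table 1.
    Each rule lists the allowed queriers ("*" meaning everybody) and
    the attribute names it reveals; the first matching rule wins,
    and the default is to return nothing."""
    for allowed, attr_names in policy:
        if allowed == "*" or querier_id in allowed:
            return {k: owner_attributes[k] for k in attr_names
                    if k in owner_attributes}
    return {}

# Hypothetical encoding of Table 1's three rules.
policy = [
    ({"User_A", "User_B"}, ["online_status", "hobby"]),  # rule 1: friends
    ({"User_C", "User_D"}, ["recent_photos"]),           # rule 2: selected members
    ("*", []),                                           # rule 3: strangers get nothing
]
```

Keeping rule 3 explicit makes the default visible, but dropping it changes nothing: falling off the end of the loop also returns the empty result.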
4 Conclusion

User online attributes are important assets for web sites and end users. Cross-site attribute query is a tough challenge in terms of usability and privacy. In this paper, we propose a model based on an augmented Identity Provider. The implementation aspects are carefully considered, covering both the user interface presented to end users and the backend operations between the Identity Provider and the web sites. As next steps, we plan to study alternative approaches for the IdP to obtain user online attributes from web sites. We believe the proposed framework is a promising solution to the cross-site online attribute management challenge, which is becoming more and more evident.
References
1. Organization for the Advancement of Structured Information Standards: Security Assertion Markup Language (SAML) v2.0 (2005), http://www.oasis-open.org/specs/#samlv2.0
2. Jøsang, A., Ismail, R., Boyd, C.: A survey of trust and reputation systems for online service provision. Decision Support Systems 43(2) (2007)
3. Chen, M., Singh, J.P.: Computing and Using Reputations for Internet Ratings. In: Proceedings of the Third ACM Conference on Electronic Commerce (2001)
4. Dellarocas, C.: Immunizing Online Reputation Reporting Systems Against Unfair Ratings and Discriminatory Behavior. In: ACM Conference on Electronic Commerce (2000)
5. Resnick, P., Zeckhauser, R.: Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay's Reputation System. In: Baye, M.R. (ed.) The Economics of the Internet and E-Commerce. Advances in Applied Microeconomics, vol. 11. Elsevier Science, Amsterdam (2002)
6. Merrill, D.: Mashups: The new breed of web app (2006), http://www.ibm.com/developerworks/library/x-mashups.html
Analysis and Optimization of Aggregation in a Reconfigurable Optical ADD/DROP Multiplexer

J.M. Fourneau¹, N. Izri¹, and D. Verchère²

¹ PRiSM, Université de Versailles-St-Quentin, 45, Av. des Etats-Unis, 78035 Versailles, France
² Alcatel-Lucent Bell Labs France
Abstract. We analyze the performance and optimize the design of a Reconfigurable Optical ADD and DROP Multiplexer (ROADM in the following). The analysis is based on stochastic bounds, the derivation of an extension of the Pollaczek formula for batch queues, and the numerical analysis of discrete-time Markov chains.
1 Introduction

Wavelength Division Multiplexing (WDM in the following) is now established as a successful technique to provide high bandwidth for backbone, metropolitan and local area networks. However, future optical networks will also require the WDM layer to deliver advanced functionalities such as dynamic reconfigurability, automatic wavelength provisioning and better connectivity between sources and destinations. For this reason the Reconfigurable Optical Add and Drop Multiplexer (ROADM) appears as a practical solution for cost-effective advanced optical networks, and multiple architectures for ROADMs have emerged in recent years (see [14] and [11] and references therein). Briefly, a ROADM can be viewed as a device consisting of three parts: a demultiplexer, a multiplexer and an optical Wavelength Selective Switch (WSS in the following) sandwiched in between (see Fig. 1). Here we denote as a channel a wavelength on a particular link. The first f channels entering the ROADM are in transit and they do not functionally enter the optical WSS (they may physically enter the switch to be optically regenerated). The remaining channels enter the WSS, where they are switched. ADD and DROP links are also connected to the WSS.

The design and optimization of a ROADM-based network for a given routing matrix is still an open problem, and the literature is quite poor on this topic. Here we consider a possible modification of an existing OADM to increase its capacity and flexibility using a partial O/E/O conversion. The WSS receives L links and we have W wavelengths per link. The SDH frames carry packets going to the same destination. In the typical architectures, the frames are associated with an origin and a destination (an OD pair in the following). This assumption is

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 120–131, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Schematic Diagram of a ROADM
modified here due to the aggregation process we now describe. Some channels are in a new mode, denoted aggregation. SDH frames carried on these channels leave the WSS on the DROP link. Then the packets (IP, IP/MPLS, ATM or Ethernet) are extracted from the SDH frame and are sent into a buffer, waiting for a new SDH frame going to their destination. Note that all the packets carried by an aggregation frame enter the buffer again. Similarly, all the packets in a drop frame leave the network. The SDH frame which is freed of its packets can receive packets waiting in this buffer. Clearly, we expect to improve the occupancy of the SDH frames when the traffic is not sufficient to fill all the frames for a particular OD pair. As the traffic for that pair is mixed with the traffic for the same destination, we expect that the frames are used much more efficiently after aggregation. However, aggregation has a drawback: the time necessary to extract the packets and put them into the electronic buffer waiting for the next frame.

We study aggregation channels to improve the CARRIOCAS testbed [1], which is developed by Alcatel and several French organizations. CARRIOCAS is a multi-service network supporting distributed applications with stringent end-to-end delay requirements. The optical switches therefore have to be configured to balance the filling ratio of the OTUs of each wavelength connection against the end-to-end transport delay of each OTU between its OD pair (i.e. the ingress PXC and egress PXC configured for the connection). One must decide, for a given matrix of traffic between OD pairs, how many channels must be in aggregation mode and which ones. Clearly, we do not need all the channels to be in aggregation: a packet using a channel in aggregation takes less time to be included into an SDH frame, but it takes longer to reach its destination.
Indeed, every aggregation step needs a delay which is larger than the transit time between the demultiplexer and the multiplexer. Thus we have a tradeoff between the occupancy of the SDH frames and the transport time between some OD pairs. We may expect that the first channels in aggregation have a high impact on the occupancy and decrease the delay, while additional channels in aggregation only slightly change the occupancy and increase the end-to-end delay. Note also that the end-to-end delay on an aggregation channel is now random, because the packets enter queues where they wait for an SDH
Fig. 2. The aggregation channel in a ROADM
frame. This is a clear difference from the usual architecture, where packets, once emitted in an SDH frame, have a constant transport delay.

SDH frames were used until July 2008 on the CARRIOCAS pilot network, but the interfaces of the Photonic Cross-Connects (PXC) were upgraded to enable Ethernet packet frames to be directly mapped over Optical Transport Units (OTU). This has the advantage of reducing the wrapping overhead of the aggregation operations. It is therefore proposed to change SDH frames into Optical Transport Units (OTU). This change does not modify the assumptions or the dimensioning results of the ROADM model.

The rest of the paper is organized as follows. First, in Section 2, we describe the switch we model and we illustrate the pros and cons of aggregation. Section 3 is devoted to the main results of the analysis. First we prove, using the stochastic comparison approach, that the population in the buffer is stochastically smaller when we aggregate more channels (even when we add the traffic associated with these channels). This explains why insertion into an aggregation channel requires less time. Then we establish an analytical formula for the expected number of packets waiting in the buffer. In Section 4, we present a numerical analysis of such a buffer and report some numerical experiments using Xborne [5]. We also present a very simple case study to emphasize some points of the method.
2 A Stochastic Model for a ROADM

Let us now describe more precisely the mechanism and the delays needed for the operations involved in the aggregation process. Remember that we have L fibers and W wavelengths. Therefore the switch must accommodate LW channels: f of them are in transit mode, while the remaining channels are in ADD/DROP mode (say d) or in aggregation mode (say a). Clearly we have f + a + d = LW.
Let us now consider again the aggregation part depicted in Fig. 2 and define the various delays involved. The transit time of an SDH frame on the f transit channels is T1. When an SDH frame is carried by an aggregation channel, we extract the packets from the frame. We assume that this operation requires T2 units of time. We assume that the packets have the same size and that the number of packets is a discrete random variable between 0 and K. Once they have been extracted, the packets join the ADD buffer in a batch. This buffer also receives packets from the outside. We assume that the arrivals from the outside also follow an i.i.d. batch process. The SDH frames entering the switch are filled with the packets present in the buffer. Filling and sending a frame also requires a delay equal to T1. We clearly have a discrete-time system. T1 and T2 are two constant values depending on the switching system, and we assume that T1 < T2 due to physical constraints (the transit is faster than the extraction). A packet entering the buffer requires a random response time to leave the buffer and enter a frame. Let T3 be this random variable. We clearly have T3 ≥ 0. In the following we show how to compute T3, its distribution or its expectation.

As the traffic in aggregation is mixed with the fresh traffic, the packets coming from an aggregation channel share the same buffer with the fresh packets going to the same destination. The buffer is associated with a channel and therefore with a destination, and possibly a class of service. Let B_n^i be the batch entering buffer i at time n. The buffer fills a + d frames carried by the same number of channels. Each frame may receive up to K packets. The buffer capacity is B.

Let us now compute the end-to-end delay and show that we can optimize it using aggregation.
First, note that we do not take into account the propagation delay, which depends only on the OD pair and is not impacted by the aggregation process. Assume that a packet enters the network using the ADD link and that the channel it uses is never aggregated on the path to its destination. Assume that the path length is l. Then the delay is T3 for entering the network, (l − 1)T1 for the transit and T2 to extract the packets from the DROP link at the destination.

Now assume that we add a channel in aggregation to help the traffic from that OD pair to enter the network. We prove in the next section that, using more frames in aggregation and receiving the traffic in aggregation, we still decrease T3 (in the stochastic sense). Therefore E[T3] will decrease when we increase a. Now assume that the channel in aggregation travels along l switches. In some of them it is again an aggregation channel, while in the others it is simply in transit. If we assume the same repartition of the channels for all the switches, we may expect the expected number of times this channel is in transit to be l · f/(f + a) and the expected number of times it is in aggregation to be l · a/(f + a). Thus the expected transport time is

  l · f/(f + a) · T1 + l · a/(f + a) · (T2 + E[T3]).

Let us now study the variation of this transport delay with a. As T2 > T1 and T3 ≥ 0, the slope of the function is positive, but we must not forget that E[T3] decreases when we increase a. Thus the problem is not that simple...
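The expected transport time above is easy to evaluate numerically (a sketch; in the real model E[T3] itself depends on a, so the value passed in must come from the queueing analysis of Section 3):

```python
def expected_transport_time(l, f, a, T1, T2, ET3):
    """Expected transport time over a route of l switches, with f transit
    channels and a aggregation channels per switch (propagation excluded)."""
    # Transit hops contribute T1 each; aggregation hops contribute the
    # extraction time T2 plus the expected insertion time E[T3].
    return l * f / (f + a) * T1 + l * a / (f + a) * (T2 + ET3)
```

With a = 0 the formula reduces to l · T1 (pure transit), and with f = 0 to l · (T2 + E[T3]) (every hop aggregates), which matches the two extreme cases of the discussion.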
J.M. Fourneau, N. Izri, and D. Verchère
The transport delay is the sum of the insertion time (which varies with a), the transit time, and the output or extraction time. The end-to-end delay increases with the number of aggregation/de-aggregation operations along the route of the wavelength connection. The insertion time decreases with a (the proof is in the next section) and the output time is constant. Thus finding a good number of aggregation channels makes sense. Note that we speak about a "good solution" instead of an "optimal solution". Indeed, some configurations of aggregation and transit channels may lead to solutions which are not comparable when the OD pairs involved in the aggregation are not the same. Thus we prefer to define admissible solutions. A solution is the description of all channels for all switches in the network and the list of channels in aggregation at each switch. Such a solution is feasible if the end-to-end delay for every OD pair in the traffic matrix is smaller than a threshold. The method we have developed tries to find a feasible solution. Note that the optimization problem is related to the many end-to-end delays experienced by the traffic flows in the network. Clearly, if we only wanted to optimize the utilization of the SDH frames, it would be sufficient to operate all the channels in aggregation mode, as a channel in aggregation is much more efficiently used than a transit channel. In the following we prove that E[T3] decreases with the number of channels operating in aggregation. We model the ADD part of the ROADM as a queue with constant services and a finite or infinite buffer capacity. Of course the buffer is finite, but assuming infinite capacity allows us to obtain analytical results.
3 Batch Queues with Constant Service Time
Let us first introduce some notation. Let C be the channel capacity and B the buffer size when it is finite. The arrivals from the aggregation links occur in batches, and we assume that the arrivals of packets from the outside at the ADD link also follow an i.i.d. batch distribution. We clearly model a discrete-time system and we consider a discrete-time Batch/D/C/B queue. We further assume that the services take place before the arrivals. Let Xn be the number of packets in the queue and An the number of arrivals during slot n. We clearly have:

Property 1. The population in the ADD buffer is governed by the following equation when the buffer capacity is infinite:

Xn = (Xn−1 − C + An)^+,
(1)
Xn = min(B, (Xn−1 − C + An)^+),
(2)
or when B is finite. Furthermore, the number of packets in a SDH frame departing the queue is Outn = min(C, Xn + An). We first prove that when we change one channel from the transit mode to the aggregation mode, the population in the buffer is stochastically smaller.
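The evolution equations of Property 1 can be simulated directly; the sketch below runs one sample path of the finite-buffer recursion with a hypothetical uniform batch distribution (an assumption for illustration only):

```python
import random

# Sketch: one sample path of the discrete-time Batch/D/C/B recursion of
# Property 1.  Services take place before arrivals; with a finite buffer B the
# population follows X_n = min(B, (X_{n-1} - C + A_n)^+).
# The batch distribution (uniform on 0..K) is a hypothetical choice.

def simulate(C, B, K, n_slots, seed=0):
    rng = random.Random(seed)
    x, outs = 0, []
    for _ in range(n_slots):
        a = rng.randint(0, K)              # i.i.d. batch of arrivals
        outs.append(min(C, x + a))         # packets leaving in this slot
        x = min(B, max(0, x - C + a))      # finite-buffer evolution equation
    return x, outs

if __name__ == "__main__":
    x, outs = simulate(C=3, B=50, K=4, n_slots=10000)
    print("final population:", x, "mean output batch:", sum(outs) / len(outs))
```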
Note that we compare a queue with a servers receiving a batches of arrivals to a queue with a + 1 servers and a + 1 batches. Thus the property is not that simple. Let us first introduce some definitions and properties about the stochastic comparison of random variables and Markov chains.

3.1 Stochastic Comparison
Let us first define the strong stochastic ordering of random variables ("st-ordering" for short). This ordering is defined by means of the set of increasing functions. We consider here only discrete (finite or infinite) random variables. Indeed, we will consider the state space {0, 1, 2, . . . , n} for a Markov chain or N ∪ {∞} for the comparison of absorption times. We refer to [13] for a far more general introduction to stochastic orders and to [6] for some algorithms.

Definition 1. For two random variables X and Y, we say that X is smaller than Y in a strong stochastic sense, denoted by X ≤st Y, if E[f(X)] ≤ E[f(Y)] for all increasing real functions f.

For discrete random variables, we use the following algebraic equivalent formulation, which is far more convenient (see [13] for the equivalence):

Definition 2. If X and Y are discrete random variables having respectively p and q as probability distribution vectors, then X is said to be less than Y in the strong stochastic sense, that is X ≤st Y, if

Σ_{j≥k} p_j ≤ Σ_{j≥k} q_j, for all k.
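The tail-sum condition of Definition 2 is easy to check mechanically; a minimal sketch, using the vectors of Example 1 below:

```python
# Sketch: checking the algebraic tail-sum condition of Definition 2 for two
# finite probability vectors.

def st_leq(p, q):
    """Return True iff p <= q in the strong stochastic (st) order."""
    n = max(len(p), len(q))
    tail_p = tail_q = 0.0
    for k in range(n - 1, -1, -1):         # accumulate tail sums from the right
        tail_p += p[k] if k < len(p) else 0.0
        tail_q += q[k] if k < len(q) else 0.0
        if tail_p > tail_q + 1e-12:
            return False
    return True

if __name__ == "__main__":
    alpha = [0.1, 0.3, 0.4, 0.2]
    beta = [0.1, 0.1, 0.5, 0.3]
    print(st_leq(alpha, beta))   # True: this is Example 1
    print(st_leq(beta, alpha))   # False: the order is not symmetric
```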
Let us now illustrate Definition 2 by an example:

Example 1. Let α = (0.1, 0.3, 0.4, 0.2) and β = (0.1, 0.1, 0.5, 0.3). It follows that α ≤st β since:

0.2 ≤ 0.3
0.2 + 0.4 ≤ 0.3 + 0.5
0.2 + 0.4 + 0.3 ≤ 0.3 + 0.5 + 0.1

For two Markov chains, the st-comparison is usually defined as the comparison at each time step:

Definition 3. Let {Xk}k≥0 and {Yk}k≥0 be two DTMC on a state space {0, . . . , n}. We say that the chain {Xk} is st-smaller than {Yk}, denoted by {Xk} ≤st {Yk}, if Xk ≤st Yk for all k ≥ 0.

Note that bounds on a distribution imply bounds on performance measures that are increasing functions of the state indices (see Definition 1). We now prove, using the stochastic comparison, that the queue size and the distribution of the response time T3 are stochastically decreasing with a. Let us denote by X^(a) the
Markov chain which models the buffer size when the queue receives a batches and is associated with a servers. The arrivals are denoted by A_n^(a) and the state of the chain at time n by X_n^(a).

Property 2. X^(a+1) ≤st X^(a).

Proof: by induction on n we prove that X_n^(a+1) ≤ X_n^(a) for all n. First, by assumption, X_0^(a+1) = X_0^(a) = 0: the queues are empty at time 0. Consider now the evolution equation for X_n^(a):

X_n^(a) = (X_{n−1}^(a) − C + A_n^(a))^+.

When we add one channel and its traffic, we change C into C + K; remember that K is the capacity of one channel (the maximum number of packets per frame). We also have A_n^(a+1) = A_n^(a) + B_n^(a+1), where B_n^(a+1) is the batch of customers arriving in the queue on the added channel. By induction we have X_{n−1}^(a+1) ≤ X_{n−1}^(a), and due to the physical constraints we clearly have B_n^(a+1) ≤ K. Thus,

X_{n−1}^(a) − C + A_n^(a) ≥ X_{n−1}^(a+1) − (C + K) + A_n^(a) + B_n^(a+1).

Then (X_{n−1}^(a) − C + A_n^(a))^+ ≥ (X_{n−1}^(a+1) − (C + K) + A_n^(a+1))^+ and, after substitution, we conclude the proof: X_n^(a) ≥ X_n^(a+1).

The stochastic comparison of the processes implies that all increasing rewards of their distributions are also ordered. Thus the average queue size and the tail of the population are also comparable. As the service time is constant, this stochastic relation among the buffer sizes also holds for the response time in the queue. Indeed we have T_3^(a) = X^(a)/K. Thus T3 is stochastically decreasing when we increase a.
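The induction in the proof of Property 2 is a pathwise (coupling) argument, and it can be observed numerically: feeding both queues the same batches, plus an extra batch B_n ≤ K on the added channel, keeps the (a+1)-queue below the a-queue on every sample path. The batch distributions below are hypothetical:

```python
import random

# Sketch: a coupled simulation illustrating Property 2.  The queue with a+1
# aggregation channels sees the same batches plus an extra batch B_n <= K and
# serves C+K packets per slot; on every sample path its population stays below
# that of the a-channel queue, exactly as the induction argues.

def coupled_paths(C, K, n_slots, seed=1):
    rng = random.Random(seed)
    x_a = x_a1 = 0
    for _ in range(n_slots):
        a_n = rng.randint(0, C)            # shared batch for both queues
        b_n = rng.randint(0, K)            # extra batch on the added channel
        x_a = max(0, x_a - C + a_n)
        x_a1 = max(0, x_a1 - (C + K) + a_n + b_n)
        assert x_a1 <= x_a                 # the pathwise inequality of the proof
    return x_a, x_a1

if __name__ == "__main__":
    print(coupled_paths(C=4, K=2, n_slots=100000))
```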
3.2 Computing the Expectation of T3
Let us now prove an analytic formula for the expectation of X (a Pollaczek-Khinchine type of relation).

Property 3. Assume that there exists a stationary regime and let us denote by X the stationary limit of Xn, by A the stationary limit of An and by Out the stationary limit of Outn. We have:

E[X] = (E[A^2] − E[Out^2]) / (2(C − E[A])).
Proof: Remember again Equation 1 and consider the expectation of both sides: E[X] = E[(X − C + A)^+]. We have:

E[(X + A − C)^+] = Σ_{i≥C} (i − C) Pr(X + A = i)
                 = Σ_{i≥C} i Pr(X + A = i) − C Σ_{i≥C} Pr(X + A = i)
                 = E[X + A] − Σ_{i=0}^{C−1} i Pr(X + A = i) − C Σ_{i≥C} Pr(X + A = i).

Now, due to the linearity of the expectation, we have E[X + A] = E[X] + E[A]. Furthermore, since Outn = min(C, Xn + An),

E[Out] = Σ_{i=0}^{C−1} i Pr(X + A = i) + C Σ_{i≥C} Pr(X + A = i).

Then clearly E[A] = E[Out], which is a simple consequence of the stationarity assumption. Now, using a classical argument [9], square Equation 1: X_n^2 = ((X_{n−1} + A − C)^+)^2, and again take the expectation of both sides. Note that

E[Out^2] = Σ_{i=0}^{C−1} i^2 Pr(X + A = i) + C^2 Σ_{i≥C} Pr(X + A = i).

We obtain:

E[X^2] = E[((X + A − C)^+)^2]
       = Σ_{i≥C} (i − C)^2 Pr(X + A = i)
       = Σ_{i≥C} i^2 Pr(X + A = i) − 2C Σ_{i≥C} i Pr(X + A = i) + C^2 Σ_{i≥C} Pr(X + A = i)
       = E[(X + A)^2] − Σ_{i=0}^{C−1} i^2 Pr(X + A = i) − 2C E[X + A]
         + 2C Σ_{i=0}^{C−1} i Pr(X + A = i) + C^2 Σ_{i≥C} Pr(X + A = i).

After substituting 2C Σ_{i=0}^{C−1} i Pr(X + A = i) = 2C E[Out] − 2C Σ_{i≥C} C Pr(X + A = i) we get:

E[X^2] = E[(X + A)^2] − 2C E[X + A] + 2C E[Out]
         − Σ_{i=0}^{C−1} i^2 Pr(X + A = i) − C^2 Σ_{i≥C} Pr(X + A = i).

We substitute E[Out^2] to get:

E[((X + A − C)^+)^2] = E[(X + A)^2] − 2C E[X + A] + 2C E[Out] − E[Out^2].
Remember that E[Out] = E[A] and E[X + A] = E[X] + E[A]. Assuming that X and A are independent, we have E[(X + A)^2] = E[X^2] + E[A^2] + 2E[A]E[X]. Combining all these relations we finally obtain:

2(C − E[A]) E[X] = E[A^2] − E[Out^2].

This is a Pollaczek-Khinchine type of relation for the average queue size:

E[X] = (E[A^2] − E[Out^2]) / (2(C − E[A])).

But E[Out^2] is unknown. However, we can provide some bounds for this moment.

Property 4. With the same assumptions, we have: E[X] ≤ E[A^2] / (2(C − E[A])).

Proof: it is sufficient to remark that E[Out^2] > 0.

It is even possible to obtain a lower bound. Indeed, Out is the size of a batch between 0 and K with a known expectation (remember that E[Out] = E[A]). One can find in [10] the distribution which maximizes the second moment (see [3] for an example of application to bound the population of a queue):

p(i) = 0 for all 0 < i < K,   p(0) = 1 − E[A]/K,   p(K) = E[A]/K.

Property 5. With the same assumptions, we have: E[X] ≥ (E[A^2] − K E[A]) / (2(C − E[A])).

Proof: it is sufficient to compute the second moment of the former distribution (namely K E[A]) and to apply the results of Shaked and Shanthikumar [10]. These formulas can be used to check that a solution is feasible: if the upper bound on E[X] suffices to prove that the end-to-end delay is smaller than the threshold, the solution is feasible. But if we need a more precise result, we turn to numerical techniques.
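The bounds of Properties 4 and 5 can be checked against a simulated Batch/D/C queue; the batch distribution below is a hypothetical example with E[A] < C (here the lower bound happens to be trivial, i.e. negative):

```python
import random

# Sketch: checking the bounds of Properties 4 and 5 against a simulated
# Batch/D/C queue.  The batch distribution on {0..K} is a hypothetical example
# chosen so that E[A] < C and a stationary regime exists.

def moments(dist):
    m1 = sum(i * p for i, p in enumerate(dist))
    m2 = sum(i * i * p for i, p in enumerate(dist))
    return m1, m2

def simulate_mean_queue(dist, C, n_slots, seed=2):
    rng = random.Random(seed)
    x, total = 0, 0
    for _ in range(n_slots):
        a = rng.choices(range(len(dist)), weights=dist)[0]
        x = max(0, x - C + a)               # infinite-buffer recursion of Eq. (1)
        total += x
    return total / n_slots

if __name__ == "__main__":
    dist = [0.2, 0.2, 0.2, 0.2, 0.2]        # batches uniform on 0..4, so K = 4
    K, C = 4, 3
    ea, ea2 = moments(dist)                 # E[A] = 2, E[A^2] = 6
    upper = ea2 / (2 * (C - ea))            # Property 4
    lower = (ea2 - K * ea) / (2 * (C - ea)) # Property 5 (trivial here)
    mean_x = simulate_mean_queue(dist, C, 200000)
    print(lower, "<=", round(mean_x, 3), "<=", upper)
```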
4 Markov Chain Model, Numerical Results and Tool
We now assume that the buffer capacity is finite and we analyze the Discrete Time Markov Chain (DTMC) given by Eq. 2 when the batches of arriving customers are i.i.d. With a finite buffer of size B we obtain a DTMC with B + 1 states, which is very simple to analyze with state-of-the-art numerical algorithms (see for instance Stewart's book [12]). We use XBorne [5] to build the chain and solve the steady-state distribution. XBorne allows the generation of DTMC, or suitably uniformized CTMC [4], and the computation of stochastic bounds based on lumpability [7] or censoring [8] for the steady-state and transient distributions [2]. The bounding algorithms are not really needed if B is smaller than 10^6, but the accuracy of the numerical computations must be checked carefully. The user only has to provide the buffer size and the distribution of arrivals. The matrix is then generated and stored on disk. It is finally solved using classical solvers, and the average waiting time is computed and returned to the user. The numerical optimization process proceeds as follows:
1. First make an assignment of the channels to the OD pairs.
2. Compute the average end-to-end delay based on the numerical analysis of the input queue, to obtain E[T3], plus the number of hops l, which gives (l − 1)T1 + T2.
3. Check whether the solution is feasible (i.e., for each OD pair, E[T3] + (l − 1)T1 + T2 is smaller than the threshold).
4. If not, find a candidate channel (say O1D1) and try to aggregate several flows using this channel.
5. Compute again the end-to-end delay for the channel modified in the previous step and check whether the solution is feasible.
6. Continue until the solution is feasible or it is not possible to find a candidate channel for the aggregation.

A candidate channel has a small load and a large number of hops. Due to its load, one can aggregate new traffic onto it, and due to its length, it crosses several ROADMs where aggregation can be performed. The numerical values for the load and the length are found using the numerical solvers described before. Indeed, once aggregated, the performance of this channel decreases and we must check that the average delay is still smaller than the threshold.

Let us illustrate the approach on a toy example. We consider a unidirectional ring topology with 5 nodes denoted S1 to S5. We assume that the traffic requires only one channel per OD pair. Thus we have 20 OD pairs. To simplify the first assignment, we assume that the same channel is used between Si and Sj in the first part of the ring and between Sj and Si in the second part. Thus we only need 10 channels, and the assignment matrix describing the channels in use is:

.   1   2   3   4
1   .   5   6   7
2   5   .   8   9
3   6   8   .   10
4   7   9   10  .

At this step there is no channel in aggregation mode. Let us now suppose that the traffic matrix is (the values are the loads of the channels):

.    0.9  0.4  0.2  0.2
0.2  .    0.3  0.5  0.2
0.4  0.5  .    0.4  0.8
0.5  0.2  0.2  .    0.1
0.4  0.1  0.2  0.4  .
Assume that, due to the loads of OD pairs S1S2 and S3S5, the initial assignment is not a feasible solution. Indeed, with such large loads the average waiting time to enter the network is very large for these two flows. The initial assignment of load to channels is given in the left part of Table 1. Let us first improve S1S2. Channel 7 is used to connect S5 to S2 going through S1, and its load is very small. It is a good candidate to improve the solution. Thus
Table 1. The load with the first assignment (left), the load after the two aggregations (right)
First assignment (left):

Channel   1-2   2-3   3-4   4-5   5-1
1         0.9   0.2   0.2   0.2   0.2
2         0.4   0.4   0.4   0.4   0.4
3         0.2   0.2   0.2   0.5   0.5
4         0.2   0.2   0.2   0.2   0.4
5         0.5   0.3   0.5   0.5   0.5
6         0.2   0.5   0.5   0.2   0.2
7         0.1   0.2   0.2   0.2   0.1
8         0.2   0.2   0.4   0.2   0.2
9         0.2   0.2   0.8   0.8   0.2
10        0.4   0.4   0.4   0.1   0.4

After the two aggregations (right):

Channel   1-2   2-3   3-4   4-5   5-1
1         0.5   0.2   0.2   0.2   0.2
2         0.4   0.4   0.4   0.4   0.4
3         0.2   0.2   0.2   0.5   0.5
4         0.2   0.2   0.5   0.5   0.4
5         0.5   0.3   0.5   0.5   0.5
6         0.2   0.5   0.5   0.2   0.2
7         0.5   0.2   0.2   0.2   0.1
8         0.2   0.2   0.4   0.2   0.2
9         0.2   0.2   0.5   0.5   0.2
10        0.4   0.4   0.4   0.1   0.4
we make channel 7 an aggregation channel in S1; its destination is still S2. We now have a buffer with 2 output channels (1 and 7), and the load is now 0.5 on both channels. Now we look at origin-destination pair S3S5 on channel 9. First we look for a candidate for aggregation. Channel 4 is used to connect S1 to S5 through S2, S3 and S4. Again its load is very low. We operate channel 4 in aggregation at node S3 with destination S5. Now we have in S3 two channels to send packets to S5, and the average load decreases to 0.5. The loads on the various channels for all links after these two steps of aggregation are reported in the right part of Table 1. Note that the load of channel 4 is not the same on all links because of the aggregation process. We omit the computation of the new end-to-end delay after these two steps of aggregation and assume that the solution is now feasible. Note that the investigation process to find a candidate for aggregation is only a greedy heuristic, and we do not try to find the best solution. As the networks considered are very small, the number of OD pairs is not that large and a branch-and-bound technique would certainly make it possible to find an optimum. However, we think it is not really important to find a theoretically optimal solution, as most of the information on the arrival processes is quite uncertain. Therefore a feasible solution is sufficient.
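The numerical evaluation step used throughout this section (building the (B+1)-state DTMC of Eq. 2 and solving its steady state to obtain the mean buffer population) can be sketched as follows; plain power iteration is used here as a simple stand-in for the XBorne toolchain, and the batch distribution is hypothetical:

```python
# Sketch: the numerical step of the optimization loop.  We build the
# (B+1)-state DTMC of X_n = min(B, (X_{n-1} - C + A_n)^+) for a given batch
# distribution, solve it by plain power iteration (a stand-in for the XBorne
# toolchain used by the authors), and report the mean buffer occupancy.

def steady_state(dist, C, B, iters=2000):
    n = B + 1
    P = [[0.0] * n for _ in range(n)]      # transition matrix of the DTMC
    for x in range(n):
        for a, pa in enumerate(dist):
            y = min(B, max(0, x - C + a))
            P[x][y] += pa
    pi = [1.0 / n] * n
    for _ in range(iters):                 # power iteration towards pi = pi P
        new = [0.0] * n
        for x in range(n):
            for y in range(n):
                new[y] += pi[x] * P[x][y]
        pi = new
    return pi

if __name__ == "__main__":
    dist = [0.3, 0.3, 0.2, 0.1, 0.1]       # hypothetical batch distribution
    pi = steady_state(dist, C=2, B=20)
    mean_x = sum(i * p for i, p in enumerate(pi))
    print("steady-state mean population:", round(mean_x, 4))
```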
5 Conclusions
We have designed a tool to improve the performance of ROADMs. This tool is based on the numerical analysis of DTMC and on greedy algorithms to select the channels to be aggregated. The efficiency of the aggregation is proved by stochastic comparison arguments. This tool will be used to design core networks with aggregation channels.

Acknowledgement. This work was partially supported by project System@tic CARRIOCAS and by a grant from ANR SETIN 2006 Checkbound.
References

1. Audouin, O., Cavalli, A., Chiosi, A., Leclerc, O., Mouton, C., Oksman, J., Pasin, M., Rodrigues, D., Thual, L.: Carriocas project: An experimental high bit rate optical network tailored for computing and data intensive distributed applications. In: Asia-Pacific Optical Communications Conference (2007)
2. Busic, A., Fourneau, J.-M.: Bounds for point and steady-state availability: An algorithmic approach based on lumpability and stochastic ordering. In: Bravetti, M., Kloul, L., Zavattaro, G. (eds.) EPEW/WS-EM 2005. LNCS, vol. 3670, pp. 94–108. Springer, Heidelberg (2005)
3. Busic, A., Fourneau, J.-M., Pekergin, N.: Worst case analysis of batch arrivals with the increasing convex ordering. In: Horváth, A., Telek, M. (eds.) EPEW 2006. LNCS, vol. 4054, pp. 196–210. Springer, Heidelberg (2006)
4. Dayar, T., Fourneau, J.-M., Pekergin, N.: Transforming stochastic matrices for stochastic comparison with the st-order. RAIRO Operations Research 37, 85–97 (2003)
5. Fourneau, J.-M., Le Coz, M., Pekergin, N., Quessette, F.: An open tool to compute stochastic bounds on steady-state distributions and rewards. In: 11th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2003), Orlando, FL. IEEE Computer Society, Los Alamitos (2003)
6. Fourneau, J.-M., Pekergin, N.: An algorithmic approach to stochastic bounds. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 64–88. Springer, Heidelberg (2002)
7. Fourneau, J.-M., Le Coz, M., Quessette, F.: Algorithms for an irreducible and lumpable strong stochastic bound. Linear Algebra and its Applications 386, 167–185 (2004)
8. Fourneau, J.-M., Pekergin, N., Younès, S.: Censoring Markov chains and stochastic bounds. In: Wolter, K. (ed.) EPEW 2007. LNCS, vol. 4748, pp. 213–227. Springer, Heidelberg (2007)
9. Nelson, R.: Probability, Stochastic Processes, and Queueing Theory. Springer, Heidelberg (1995)
10. Shaked, M., Shanthikumar, J.G.: Stochastic Orders and their Applications. Academic Press, San Diego (1994)
11. Shankar, R., Florjanczyk, M., Hall, T.J., Vukovic, A., Hua, H.: Multi-degree ROADM based on wavelength selective switches: Architectures and scalability. Optics Communications 279(1), 94–100 (2007)
12. Stewart, W.: Introduction to the Numerical Solution of Markov Chains. Princeton University Press, New Jersey (1995)
13. Stoyan, D.: Comparison Methods for Queues and Other Stochastic Models. John Wiley and Sons, Berlin (1983)
14. Tang, J., Alan Shore, K.: Wavelength-routing capability of reconfigurable optical add/drop multiplexers in dynamic optical networks. Journal of Lightwave Technology 24(11), 4296–4303 (2006)
Teletraffic Capacity Performance of WDM/DS-OCDMA Passive Optical Network Mohammad Gharaei1, Catherine Lepers2, Olfa Affes3, and Philippe Gallion1 1
Telecom ParisTech, Ecole Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris, France {mohammad.gharaei,philippe.gallion}@telecom-paristech.fr 2 Telecom & Management SudParis, 9 rue Charles Fourier, 91011 Evry Cedex, France
[email protected] 3 Telecom Lille1, Cité Scientifique - Rue Guglielmo Marconi, BP 20145 - 59653 Villeneuve d'Ascq Cedex, France
[email protected]

Abstract. We analyze the teletraffic capacity performance of a WDM/DS-OCDMA passive optical network. The hybrid WDM/DS-OCDMA PON is proposed to increase the user capacity and to be a cost-effective method per wavelength of a WDM PON. For each prime code used in this system, an appropriate simultaneous capacity threshold has been evaluated so as to ensure good transmission performance (BER ≈ 10^−9). Since this capacity threshold is lower than the nominal resource capacity of the considered code, a soft blocking of the system occurs even when resources are still available. The teletraffic capacity of the WDM/DS-OCDMA system is then analyzed under a maximum soft blocking probability constraint for different prime codes; this value depends on the channel activity factor. It is demonstrated that using extended quadratic congruent (EQC) codes, which have better correlation properties, brings the teletraffic capacity closer to the nominal teletraffic capacity.
1 Introduction

The recent deployment of broadband telecommunication services and of Internet access at high data rates constitutes a step toward the next generation of access networks. The passive optical network (PON) has been largely accepted as an attractive solution to the last-mile bottleneck, providing broadband access networks to end users [1]. A PON is a point-to-multipoint optical network, typically with a tree topology, connecting an optical line terminal (OLT) on the service provider side to multiple optical network units (ONUs) on the subscriber side via a passive star coupler (SC). Time division multiplexing PON (TDM PON) is the PON commonly accepted by network operators, sharing the network bandwidth temporally between users. It benefits from low installation and maintenance costs, although it does not exploit the huge bandwidth of the optical fiber. In order to increase the optical bandwidth per user, wavelength division multiplexing PON (WDM PON) was proposed to create point-to-point links between the OLT and the users [2]. In WDM PON, each user is assigned a dedicated wavelength

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 132–142, 2009. © Springer-Verlag Berlin Heidelberg 2009
enjoying a large bandwidth for communication. Protocol transparency and channel independency are other features of WDM PON. However, the huge bandwidth provided by WDM PON is too expensive per user. Therefore, hybrid WDM/TDM PON was proposed to exploit the optical bandwidth efficiently while sharing the cost among access network users [3]. Temporal multiplexing per wavelength increases the number of end users in the access network. Yet WDM/TDM PON does not support a high data rate per user, and it requires a contention management protocol to avoid the situation in which several users access the channel at the same time. To resolve this contention situation, optical code division multiple access (OCDMA) was proposed [4]-[6]. The OCDMA technique permits multiple users to access the transmission medium by assigning different optical codewords to different users. It has been demonstrated that using asynchronous CDMA as a quasi-contentionless multiple access technique removes the need for contention management protocols [7]; in other words, no scheduling of transmissions is required. Different OCDMA techniques have been proposed, such as temporal encoding (DS-OCDMA) [8], time/wavelength encoding (FFH-OCDMA) [9], spectral amplitude coding (SAC-OCDMA) [10] and phase encoding [11]. Let us note that OCDMA is capable of complementing WDM by realizing a higher spectral efficiency and providing a higher user capacity [12]. DS-OCDMA, which uses temporal encoding, is a convenient technique to combine with WDM. Thus, for a given wavelength in a WDM/DS-OCDMA PON, each user data bit is encoded as a given sequence of pulses on the temporal axis. As a result, different users can communicate simultaneously, each using its own codeword. This approach is then extended to all wavelengths to establish a hybrid WDM/DS-OCDMA network.
Quality of service (QoS) in network systems comprises requirements on all aspects of a connection from the customer's point of view. A subset of telephony QoS is the grade of service (GoS) requirements, which comprise aspects of a connection relating to the performance and capacity of a network, for example a guaranteed maximum blocking probability [13]. Since the physical layer of telecommunication networks suffers from various noises and nonlinearities which deteriorate the bit error rate (BER) performance, a number of QoS parameters describe the reliability of data transmission, e.g., throughput and error rate. Generally, multiple access interference (MAI) is considered the dominant cause of BER performance degradation in WDM/OCDMA networks [14], [15]. Today, with broadband services demanded by a huge number of users, the determination of the system user capacity plays an important role in designing the network architecture. Commonly, the nominal resource capacity is defined as the maximum number of resources in the different systems: wavelengths in a WDM system and assigned codes in an OCDMA system. On the other hand, the transmission performance of WDM/DS-OCDMA is limited by MAI noise, which makes its nominal resource capacity unachievable. Thus we define the simultaneous user capacity of an OCDMA system as the maximum number of simultaneously active users at an acceptable BER performance. Blocking occurs for a new user arrival when no resource is available any more for transmission. In WDM/DS-OCDMA PON, there are two types of blocking probability. The first one is the hard blocking probability, which represents deterministic blocking, such as reaching the nominal resource capacity in a WDM system. The second
is the soft blocking probability, which corresponds to the flexible blocking observed in OCDMA systems when calculating the simultaneous user capacity. To relate system capacity to grade of service, teletraffic becomes a tool by which investments can be planned [13]. Teletraffic capacity provides information on how a network operator should control data admission to the network so as to ensure a robust QoS. It indicates a foreseeable resource capacity for different resource activity rates. Degradation of the transmission quality affects the teletraffic capacity of CDMA systems [16]. The teletraffic performance of an OCDMA system has already been determined and compared with a WDM system by Goldberg et al. [17]. The authors demonstrated that OCDMA is well suited to applications where conventional hard blocking is undesirable. In this paper, we show that the teletraffic capacity performance of an OCDMA system depends on the code family type. We then analyze the teletraffic capacity performance of a WDM/DS-OCDMA PON using different prime codes. Initially, we measure the number of simultaneous active users employing different prime codes in DS-OCDMA under the conventional BER = 10^−9. This measurement leads to the notion of a simultaneous capacity threshold. Since different prime codes have different correlation properties, they support different simultaneous capacity threshold values. Afterward, these values are used in teletraffic theory to compare the maximum teletraffic capacity of the WDM/DS-OCDMA system for the different prime codes. The rest of this paper is organized as follows. In section 2, the simultaneous capacity performance of the OCDMA system using different prime codes is defined. In section 3, the soft blocking probability of DS-OCDMA is measured for different codes.
In section 4, by exploiting the physical limits of the OCDMA system in teletraffic capacity, we compare the teletraffic capacity of the proposed hybrid WDM/DS-OCDMA PON using different prime codes. Finally, we conclude the paper in section 5.
2 System Capacity Performance of WDM/DS-OCDMA PON

2.1 Architecture of WDM/DS-OCDMA PON

Fig. 1 shows the proposed network architecture of WDM/DS-OCDMA PON for the optical access network, where the distance between the OLT and the ONUs does not exceed 20 km. In this architecture, different optical pulse generators are used at the OLT to create short pulses at different wavelengths from λ1 to λG. These pulses are forked to N branches in accordance with the code capacity and modulated with user data. The modulated data are then coded using temporal OCDMA encoders and subsequently coupled and multiplexed for transport on the access feeder fiber. At the receiving end, the filtered wavelengths from λ1 to λG are divided into the same coding branches, with the adapted decoder in each branch. Finally, the autocorrelation pulse of the decoder is detected by a receiver at the ONUs. Based on the OCDMA approach, the encoded data is only detectable by the user with the same codeword and cannot be decoded by other users.
[Figure omitted in this rendering: block diagram showing, at the OLT, one pulse generator per wavelength λ1 . . . λG, each followed by a 1:N splitter, data modulators, DS-OCDMA encoders 1 . . . N and an N:1 coupler, the wavelengths being multiplexed onto the feeder fiber; the ONU side mirrors this chain with a demultiplexer, 1:N splitters, decoders 1 . . . N and receivers.]

Fig. 1. Network architecture of hybrid WDM/DS-OCDMA PON
2.2 Simultaneous User Capacity Threshold of WDM/DS-OCDMA PON

The WDM/DS-OCDMA nominal resource capacity is bounded by a simultaneous user capacity threshold for each prime code. A priori, the hybrid WDM/DS-OCDMA PON increases the system capacity of a WDM PON by G × N. This value is the nominal capacity of the system, where G is the number of wavelengths and N is the user multiplexing capacity. However, all code resources on each wavelength cannot be utilized at the same time due to the MAI limitation. Thus the maximum capacity of the WDM/DS-OCDMA PON is practically assumed to be G × β, where β denotes the simultaneous user capacity threshold of the OCDMA system. In this paper, we focus on impairments resulting from the OCDMA system. The performance of an OCDMA system is strongly limited by MAI noise: as the number of simultaneous users increases, the BER performance degrades. Since the BER performance of OCDMA systems is highly dependent on the code family, we compare the properties of different DS-OCDMA prime codes. First of all, a prime code is defined by (p, ω, L), where p is the prime number, ω is the weight of the code and L denotes the length of the code sequence. As the prime number p is increased in a code type, the weight ω and subsequently the nominal user capacity N of the code are enlarged. Prime code performances are evaluated based on their correlation properties. The autocorrelation AC_m and cross-correlation CC_{m,n} of prime codes are given by:

AC_m(s) = Σ_{i=0}^{L−1} C_m(i) C_m(i − s)  { = ω for s = 0;  ≤ K_a for 1 ≤ s ≤ L − 1 }

CC_{m,n}(s) = Σ_{i=0}^{L−1} C_m(i) C_n(i − s) ≤ K_c  for 0 ≤ s ≤ L − 1        (1)
[Figure omitted in this rendering: semilog plot of BER versus the number of simultaneous active users (roughly 5 to 30) for the EQC, QC, PS and EPS code families, with the BER = 10^−9 level marked.]

Fig. 2. BER performance versus number of simultaneous active users for different prime code families with p = 31
where C_m and C_n are two different code sequences, and K_a and K_c are the maximum auto- and cross-correlation levels. The comparison of the correlation properties of prime sequence (PS), extended prime sequence (EPS), quadratic congruent (QC) and extended quadratic congruent (EQC) codes is presented in Table 1.

Table 1. Correlation properties of different prime sequence codes

Code family    L          ω    N      Ka     Kc
PS             p^2        p    p      p−1    2
EPS            p(2p−1)    p    p      p−1    1
QC             p^2        p    p−1    2      4
EQC            p(2p−1)    p    p−1    1      2
So as to understand the effect of MAI noise on the BER performance, we assume N different users for each wavelength in our model. The signal-to-noise ratio (SNR) is given by the ratio of the autocorrelation peak squared to the variance of the amplitude of the interference. There are (N−1) user signals which interfere with the desired signal. These interferences are supposed to be uncorrelated and to have an identical variance. Therefore the SNR for the code sequence is given by [18]:

SNR = p^2 / ((N − 1) σ^2)        (2)

where σ^2 is the variance of the additive noise power. This variance is identified as the average variance of the cross-correlation amplitude [19]. It is calculated over all possible cross-correlation
pairs in a given code [19], [20]. A lower bound on the performance of asynchronous OCDMA systems has been demonstrated in [4], [5]; we take this assumption into account for the PS and EPS codes. For the QC and EQC codes, we follow the congruence operator to estimate the variance of the noise power [21]. Fig. 2 shows the number of simultaneous active users which can communicate with a given BER performance in a system using different prime codes. The degradation of the BER performance as the number of simultaneous users increases is observed for all the OCDMA systems considered. Different code families yield different values since they possess different correlation properties. Hence, we define a threshold value of simultaneous active users at the conventional BER = 10^−9 for the different prime codes. The number of simultaneous active users is fixed at β, which represents the simultaneous user capacity threshold. The simultaneous thresholds for the PS, EPS, QC and EQC codes with p = 31 are respectively 17, 19, 28 and 30. Let us note that the β value increases with the prime number in this system. Beyond this simultaneous threshold β, the QoS of the transmission can no longer be guaranteed.
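The way a capacity threshold β emerges from Eq. (2) can be sketched as follows; the mapping BER ≈ Q(√SNR) is one common Gaussian approximation (an assumption here, not necessarily the exact model of the paper), and the interference variances used below are hypothetical placeholders, not the values derived for the code families above:

```python
import math

# Sketch: extracting a simultaneous-capacity threshold beta from Eq. (2).
# ASSUMPTIONS: BER ~ Q(sqrt(SNR)) with SNR = p^2 / ((N - 1) * sigma2);
# the per-code variances sigma2 passed in are hypothetical placeholders.

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def capacity_threshold(p, sigma2, target_ber=1e-9):
    """Largest number of users N with BER(N) <= target_ber (BER grows with N)."""
    n_users = 2
    while (n_users <= p * p and
           q_function(math.sqrt(p * p / ((n_users - 1) * sigma2))) <= target_ber):
        n_users += 1
    return n_users - 1

if __name__ == "__main__":
    for sigma2 in (0.5, 1.0, 2.0):          # hypothetical interference variances
        print(sigma2, capacity_threshold(31, sigma2))
```

As expected, a code family with a smaller interference variance (better correlation properties) supports a larger threshold β, which is the qualitative ordering reported above for PS, EPS, QC and EQC.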
3 Soft Blocking Probability of DS-OCDMA

In this section, the soft blocking probability of the proposed system is determined as the resource utilization of the system increases. The state probabilities are used directly to model the performance measures of the system. Whenever the number of users is greater than or equal to the number of channels, the generalized Engset loss model is used [13]. The Engset system is characterized by the following parameters: ρ = offered traffic per user [22], N = number of users, and Ñ = number of active WDM and/or DS-OCDMA channels. The number of active channels Ñ is assumed to be a stationary random process over some time interval, following the binomial distribution [23]. Thus the probability distribution in this limited-capacity system for a new channel demand is given by [13]

    P[Ñ = n] = E_{n,N}(ρ) = C(N, n) · ρ^n · (1 + ρ)^(−N),    (3)

where C(N, n) denotes the binomial coefficient.
This formula states the probability that a new channel demand is rejected, i.e., the case where the system is strictly blocked for a new user channel request. In the WDM system model with G users, Ñ_max = G. Thus, neglecting nonlinearities, a WDM system can support at most as many users as wavelengths. It exhibits hard blocking, since no wavelength is available once blocking occurs. In the OCDMA system, however, Ñ_max ≤ β ≤ N, which means that the simultaneous user capacity threshold is the maximum value for active OCDMA channels. When soft blocking takes place, there are still at most N − β available codes; this is why it is called soft blocking. Soft blocking has its own probability distribution; to derive it, the binomial distribution must first be defined.
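Formula (3) is straightforward to evaluate numerically; the sketch below uses Python's exact binomial coefficient, and the parameter values in the example are illustrative, not taken from the paper.

```python
from math import comb


def engset_pmf(n, N, rho):
    """E_{n,N}(rho) of eq. (3): probability that n of the N channels are
    active, a binomial distribution with per-channel odds rho."""
    return comb(N, n) * rho ** n * (1 + rho) ** (-N)


# The probabilities over n = 0..N sum to 1, since
# sum C(N,n) rho^n = (1 + rho)^N.
print(sum(engset_pmf(n, 31, 0.5) for n in range(32)))
```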
M. Gharaei et al.
The binomial distribution has the probability mass function P[Y_n = m] = C(n, m) · ψ^m · (1 − ψ)^(n−m), describing n trials in which ψ is the probability of a "success" outcome and (1 − ψ) that of a "failure". The number of simultaneously active channels M then follows the distribution obtained by combining the binomial distribution with the active-channel distribution of Ñ. The soft blocking probability is therefore written as

    P[M = m] = Σ_{n=m}^{N} P(Y_n = m) · E_{n,N}(ρ).    (4)
The DS-OCDMA system model has a soft blocking probability, meaning that channel resources remain accessible beyond the simultaneous threshold value, but without any further guarantee. Note that the parameter ρ is not directly measurable; only the average transmission rate of a source is observed. We therefore define p as the probability of channel activity. From (4), the activity rate of a channel is α = p·ρ / (1 + (1 − p)·ρ). The number of simultaneously active DS-OCDMA channels should be less than or equal to the threshold β. As a result, the soft blocking probability in the proposed architecture is

    P_soft blocking = P[M > β] = Σ_{m=β+1}^{N} C(N, m) · α^m · (1 + α)^(−N).    (5)
When WDM/DS-OCDMA channel demands exceed the simultaneous user capacity threshold (channel demands > β), a new demand is admitted but QoS constraints such as BER performance are no longer maintained. To obtain the average number of DS-OCDMA channels supported by the network, the offered traffic should be aggregated. When there is no blocking, the aggregated carried traffic equals the aggregated offered traffic, which is given by a_c = N · ρ/(1 + ρ).
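Formula (5) and the carried-traffic expression can be checked numerically. This is a sketch with N, β, p and ρ as free parameters; the example values are illustrative only.

```python
from math import comb


def soft_blocking(N, beta, p, rho):
    """Eq. (5): probability that more than beta of the N DS-OCDMA
    channels are simultaneously active, with the channel activity rate
    alpha = p*rho / (1 + (1 - p)*rho) as defined in the text."""
    alpha = p * rho / (1 + (1 - p) * rho)
    return sum(comb(N, m) * alpha ** m * (1 + alpha) ** (-N)
               for m in range(beta + 1, N + 1))


def offered_load(N, rho):
    """Aggregated offered traffic a_c = N * rho / (1 + rho)
    (equal to the carried traffic when nothing blocks)."""
    return N * rho / (1 + rho)


# PS-style threshold (beta = 17) blocks more than EQC-style (beta = 30).
print(soft_blocking(31, 17, 0.5, 1.0), soft_blocking(31, 30, 0.5, 1.0))
```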
Fig. 3. Soft blocking probability versus aggregated offered load for different prime code families with different channel activity (p): a) from 0.25 to 0.5 and b) from 0.75 to 1. The horizontal line shows the max blocking constraint at 10^−6.
Fig. 3 compares the soft blocking probability of the DS-OCDMA system exploiting different prime codes versus the aggregated offered load. We have considered 31 users with different channel activity rates. It is confirmed that a low channel activity rate p yields a low soft blocking probability, which brings the network capacity close to the nominal capacity. These results demonstrate that the EQC code makes it possible to reach a higher aggregated offered load with a lower soft blocking penalty. Consequently, the best aggregated offered load corresponds to the code with the better correlation properties.
4 Teletraffic Capacity of WDM/DS-OCDMA PON

In this section, we measure the teletraffic capacity of the WDM/DS-OCDMA PON. Quantifying the teletraffic capacity for a given grade of service, and thereby forecasting the traffic demand and the resulting system capacity requirement, is one of the objectives of network operators. We define the maximum value of the teletraffic capacity (Max a_c) such that the blocking probability does not exceed the blocking constraint. The blocking constraint is estimated from system properties such as channel resources and/or commutation loss. To determine the maximum teletraffic capacity in the proposed system, we take into account the constraint P^Max_soft blocking = 10^−6 (line inserted in Fig. 3). Fig. 4 shows the teletraffic capacity of the WDM/DS-OCDMA architecture using different prime codes. In this model, we consider 31 users per wavelength with 8 wavelengths in total. It is demonstrated that using codes with better correlation properties brings the teletraffic capacity closer to the nominal teletraffic capacity corresponding to the nominal resource user capacity. For instance, in this model, the WDM/EQC-OCDMA PON can stay at the maximum nominal teletraffic capacity up to a channel activity of 0.55. It is also deduced that the teletraffic capacity of the WDM system is independent of
Fig. 4. Teletraffic capacity of WDM/DS-OCDMA PON using different prime codes with different probability of channel activity
Fig. 5. Shooting points of WDM/DS-OCDMA PON teletraffic capacity using different prime codes versus different probability of channel activity
channel activity p; its hard blocking probability depends only on the number of active channels N. It is depicted as a constant line near zero in Fig. 4. It is also verified from Fig. 4 that at a low channel activity rate, the teletraffic capacity equals the nominal teletraffic resource capacity. We define the shooting point as the point at which the teletraffic capacity begins to deviate from the nominal teletraffic resource capacity. Fig. 5 shows the shooting points of the different prime codes as the maximum blocking constraint is changed. This figure illustrates how important the choice of the maximum blocking constraint is for the teletraffic capacity of the network. For example, the WDM/EQC-OCDMA PON teletraffic capacity can equal the nominal resource capacity even at a channel activity of 0.8 if the maximum blocking constraint is fixed at 10^−2. Finally, the results show that the EQC code, which has the better correlation properties, is the most efficient for use in the WDM/DS-OCDMA architecture. Therefore, the DS-OCDMA teletraffic capacity performance using different prime codes leads to a foreseeable capacity of the system. This foreseeable capacity measure is essential for network operators to match their statistical channel demands to the physical-layer network architecture.
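The capacity search behind Figs. 4 and 5 can be sketched as a grid search over ρ: for a given channel activity p, keep the largest aggregated offered load whose soft blocking stays under the constraint. The scan range and step count here are arbitrary choices, not the authors' procedure.

```python
from math import comb


def soft_blocking(N, beta, alpha):
    """Eq. (5) with the activity rate alpha already computed: P[M > beta]."""
    return sum(comb(N, m) * alpha ** m * (1 + alpha) ** (-N)
               for m in range(beta + 1, N + 1))


def max_capacity(N, beta, p, constraint=1e-6, steps=2000):
    """Largest aggregated offered load a = N*rho/(1+rho) for which the
    soft blocking probability stays within the constraint."""
    best = 0.0
    for k in range(1, steps + 1):
        rho = 10.0 * k / steps                    # scan rho over (0, 10]
        alpha = p * rho / (1 + (1 - p) * rho)     # channel activity rate
        if soft_blocking(N, beta, alpha) <= constraint:
            best = N * rho / (1 + rho)
    return best


# A higher threshold beta (better correlation properties) supports a
# higher load before the constraint bites.
print(max_capacity(31, 17, 0.5), max_capacity(31, 30, 0.5))
```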
5 Conclusions

The capacity of optical access networks plays an important role in designing current fiber-to-the-home (FTTH) architectures. In this paper, we have proposed a hybrid PON architecture using the WDM/DS-OCDMA multiplexing technique. Based on the code characteristics, the performance of the simultaneous active users for different prime codes in the WDM/DS-OCDMA system is defined and, consequently, the simultaneous capacity threshold is estimated for each code. Furthermore, the number of users is greatly increased by employing the higher-capacity codes. Then, the soft blocking
probability of DS-OCDMA is determined for different channel activities versus the aggregated offered load. Finally, the teletraffic performance of the WDM and DS-OCDMA systems under the blocking constraint was compared. This comparison justifies the increased multiplexing capacity of the hybrid WDM/DS-OCDMA system. We have demonstrated that a WDM/DS-OCDMA PON using codes with better correlation properties, such as the extended quadratic congruence (EQC) code, achieves a higher teletraffic capacity. Also, by defining the shooting points, network operators can easily set the QoS constraint based on the required network capacity in the WDM/DS-OCDMA PON.
References

1. Effenberger, F., Cleary, D., Haran, O., Kramer, G., Li, R.D., Oron, M., Pfeiffer, T.: An introduction to PON technologies. IEEE Commun. Magazine 45(3), S17–S25 (2007)
2. Park, S.J., Lee, C.H., Jeong, K.T., Park, H.J., Gyun Ahn, J., Song, K.-H.: Fiber-to-the-home services based on wavelength-division-multiplexing passive optical network. J. Lightw. Technol. 22(11), 2582–2592 (2004)
3. Talli, G., Townsend, P.D.: Hybrid DWDM-TDM long-reach PON for next-generation optical access. IEEE Journal of Lightwave Technology 24(7), 2827–2834 (2006)
4. Salehi, J.A.: Code division multiple access techniques in optical fiber networks - Part I: Fundamental principles. IEEE Trans. Commun. 37(8), 824–833 (1989)
5. Salehi, J.A., Brackett, C.A.: Code division multiple access techniques in optical fiber networks - Part II: Systems performance analysis. IEEE Trans. Commun. 37(8), 834–842 (1989)
6. Stok, A., Sargent, E.H.: The role of optical CDMA in access networks. IEEE Commun. Magazine 40(9), S83–S87 (2002)
7. Prasad, R.: CDMA for wireless personal communication. Artech House, Norwood (1996)
8. Fsaifes, I., Lepers, C., Lourdiane, M., Gallion, P., Beugin, V., Guignard, P.: Source coherence impairments in a direct detection DS-OCDMA system. J. Appl. Opt. 46(4), 456–462 (2007)
9. Fathallah, H., Rusch, L.A.: Robust optical FFH-CDMA communications: coding in place of frequency and temperature controls. J. Lightw. Technol. 17(8), 1284–1293 (1999)
10. Penon, J., El-Sahn, Z.A., Rusch, L.A., LaRochelle, S.: Spectral-amplitude-coded OCDMA optimized for a realistic FBG frequency response. J. Lightw. Technol. 25(5), 1256–1263 (2007)
11. Galli, S., Menendez, R., Toliver, P., Banwell, T., Jackel, J., Young, J., Etemad, S.: DWDM-compatible spectrally phase encoded optical CDMA. In: IEEE GLOBECOM, vol. 3, pp. 1888–1894 (2004)
12. Sotobayashi, H., Chujo, W., Kitayama, K.: Highly spectral-efficient optical code-division multiplexing transmission system. IEEE J. Sel. Topics Quantum Electron. 10(2), 250–258 (2004)
13. ITU-D Study Group 2. Handbook: Teletraffic Engineering (January 2005), http://www.tele.dtu.dk/teletraffic/handbook/telehook.pdf
14. Shen, S., Weiner, A.M., Sucha, G.D., Stock, M.L.: Bit error rate performance of ultrashort-pulse optical CDMA detection under multi-access interferences. Electron. Lett. 36(21), 1795–1797 (2000)
15. Ramamurthy, B., Datta, D., Feng, H., Heritage, J.P., Mukherjee, B.: Impact of transmission impairments on the teletraffic performance of wavelength-routed optical networks. J. Lightw. Technol. 17(10), 1713–1723 (1999)
16. Evans, J.S., Everitt, D.: On the teletraffic capacity of CDMA cellular networks. IEEE Trans. Veh. Technol. 48(1), 153–165 (1999)
17. Goldberg, S., Prucnal, P.R.: On the teletraffic capacity of optical CDMA. IEEE Trans. Commun. 55(7), 1334–1343 (2007)
18. Yang, G.-C., Kwong, W.C.: Prime codes with applications to CDMA optical and wireless networks. Artech House, Norwood (2002)
19. Maric, S.V., Hahm, M.D., Titlebaum, E.L.: Construction and performance analysis of a new family of optical orthogonal codes for CDMA fiber-optic networks. IEEE Trans. Commun. 43(2/3/4), 485–489 (1995)
20. Prucnal, P.R., Santoro, M.A., Fan, T.R.: Spread spectrum fiber-optic local area network using optical processing. J. Lightw. Technol. 4(5), 547–554 (1986)
21. Pu, T., Li, Y.Q., Yang, S.W.: Research of algebra congruent codes used in two-dimensional OCDMA system. J. Lightw. Technol. 2(11), 2557–2564 (2003)
22. Gross, D., Shortle, J.F., Thompson, J.M., Harris, C.M.: Fundamentals of queueing theory. John Wiley & Sons, Inc., Hoboken (2008)
23. Giambene, G.: Queuing theory and telecommunications: networks and applications. Springer Science+Business Media, Inc., New York (2005)
Estimation of GoS Parameters in Intelligent Network

Irina Buzyukova1, Yulia Gaidamaka2, and Gennady Yanovsky3

1,3 Bonch-Bruevich State University of Telecommunications, Moika emb. 61, 191186 St. Petersburg, Russia
[email protected], [email protected]
2 Russian Peoples' Friendship University, Ordzhonikidze st. 3, 117923 Moscow, Russia
[email protected]

Abstract. This paper presents a method for analyzing the post-selection delay in the SS7 channel and in Intelligent Network (IN) nodes that occurs in the process of establishing a connection for an IN service. The method is based on the theory of BCMP networks. The models are derived from an analysis of the call flow and of the SS7 nodes involved in call processing. As an example, the Freephone intelligent service from Capability Set 1 is taken and the post-selection delay is calculated for it.

Keywords: Intelligent Network, BCMP networks, post-selection delay.
1 Introduction

For telecoms operators providing Intelligent Network (IN) services, attracting new subscribers and increasing the loyalty of existing customers make it vitally important to consider the quality and level of service required. Analysis of quality of service (QoS) and grade of service (GoS) indicators is necessary to ensure the fulfillment of quality-of-service agreements between telecoms operators and their customers, as well as to ensure that parameters satisfy international standards. One of the parameters influencing the degree of customer satisfaction is the post-selection delay when using IN services (GoS parameters, ITU-T recommendations E.723, E.724 [1, 2]). The aim of this article is to outline a method for assessing one of the GoS parameters within an Intelligent Network, namely the delays which occur during the process of establishing a connection. The need for such calculations became evident in the course of the research conducted in [3], in which different configurations of a Russian IN with nodes located across four time zones were analyzed in terms of the required capacity of signaling equipment. Post-selection delay was previously investigated in [4, 5, 6], where a generic modeling methodology for the signaling load and the signaling network performance was introduced. The models were obtained by considering the protocol functions of Signaling System No. 7 (SS7) as well as the information flows through these functions.

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 143–153, 2009. © Springer-Verlag Berlin Heidelberg 2009
In this paper the modeling methodology is based on the theory of exponential BCMP networks. The models are derived from an analysis of the SS7 nodes involved in call processing and of the call flow itself. Section 2 examines the interaction of network nodes when establishing a connection for IN services. Section 3 sets out the mathematical model of the IN fragment, describing the call setup process in the case of a reliable connection. Section 4 contains a mathematical model of the separate nodes in the network. Section 5 presents the results of mean post-selection delay calculations for the Freephone service on the basis of source data approximating real-life data.

2 Constructing a Functional Model for the Call Setup Process in IN

In the majority of cases the process of establishing a connection for intelligent services involves interaction between the following SS7 nodes: two signaling points (SP), a signaling transfer point (STP), a service switching point (SSP) and a service control point (SCP). To begin with, we examine the signaling message exchange process for call servicing, using the Freephone service from Capability Set 1 as an example. The diagram (fig. 1) shows the SS7 messages that influence the overall post-selection delay under normal conditions in a functioning network. A detailed description of the process and of the signaling messages exchanged between IN nodes and signaling points is given in [5], [7] and [8].

[Message sequence chart between SPA, SSP, STP, SCP and SPB: IAM, TC Begin (IDP), TC Continue (RRB, CIR, FCI, AC, CON), ACM and ANM messages]
Fig. 1. The message exchange process when establishing a connection for Freephone service
It is worth mentioning that the SCP node shown in the diagram represents an SCP node with an integrated service data point (SDP). For illustrative purposes, the section of the SS7 network across which message exchange between nodes takes place has been omitted from this diagram. In our analysis we take into consideration the sequence of processing of signaling messages by the various SS7 subsystems in all of the aforementioned nodes. It should be noted that the request processing time in the SDP node database is one of the contributing factors in the request processing time of the SCP node and is accounted for by adding a constant.
3 Constructing an Analytical Model of the IN

An open heterogeneous queuing network with several classes of customers (commonly referred to as a BCMP network) can serve as a model for the call setup process for Freephone services [9, 10]. The network is made up of a finite set of service centers (fig. 2) which model the reception, processing and transfer of messages in the IN nodes, taking the queuing delay into account. Diamonds in the queuing network diagram represent the transfer of messages along a communication path. So as not to overcomplicate the diagram, diamonds have been drawn after only two service centers; in reality they are present after every service center. A Poisson flow with arrival rate λ_FPH enters the network from outside. A customer within the network corresponds to a message entering the IN nodes. Each customer is assigned a class, which can change when passing from one service center to another. We refer to a customer of class r ∈ R located in service center i ∈ M as an (i, r)-customer. The set of all types of customers serviced within the network can be represented as L = {(i, r) : i ∈ M, r ∈ R}. The classes of BCMP customers used in constructing the call setup model for the FPH service are: IAM – (1), TC Begin – (2), TC Continue – (3), ACM – (4), ANM – (5). The following numbers denote nodes within the network: SPA – 1, SSP – 2, STP – 3, SCP – 4, SPB – 5; the numbers 6-13 represent message transfer along the signaling data link. We now define the types of service centers in this queuing network. Each service center belongs to one of the two BCMP node types described below [10]. We examine an exponential queuing network. A service center of the first type is a single-server queuing system with an infinite buffer and FIFO discipline.
The service time of customers of all classes in node i has an exponential distribution with service rate μ_i, i = 1,…,5. Service centers of the first type are used for modeling the SP, STP, SSP and SCP nodes. A service center of the second type is a multi-server queuing system with an infinite number of servers. Service centers of the second type are used for modeling the transfer of messages along the signaling data link; they correspond to the diamonds in the queuing network diagram above.
Fig. 2. Model of the call setup process in the form of a heterogeneous queuing network with several classes of customer
Overall, the queuing network model is made up of 13 service centers: five service centers of the first type and eight of the second type. The mean post-selection delay for Freephone services can be determined as

    T_FPH^(1) = q / λ_FPH,    (1)
where q is the mean queue length of the system. The mean queue length q_i in service center i can be calculated using the formula

    q_i = ρ_i / (1 − ρ_i) = λ_i · T_i^(1) / (1 − λ_i · T_i^(1)),    (2)

where ρ_i is the offered signaling load in service center i, and λ_i the arrival rate at service center i, i = 1,…,13.
Note that λ_i = λ_FPH for i = 6,…,13; for service centers of the second type the arrival rate coincides with the rate λ_FPH entering the network. Taking into account the number of service centers of each type in the queuing network, the mean queue length in the system can be calculated as

    q = Σ_{i=1}^{13} q_i = Σ_{i=1}^{5} λ_i · T_i^(1) / (1 − λ_i · T_i^(1)) + 8 · λ_FPH · T_p^(1) / (1 − λ_FPH · T_p^(1)),    (3)

where T_i^(1) is the mean service time in type-1 service center i, and T_p^(1) the mean service time in a type-2 service center. Thus, in order to calculate T_FPH^(1) it is necessary to determine the mean service time and mean queue length in the service centers of both types.
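Formulas (1)-(3) chain together mechanically once the rates and service times are known. The sketch below implements them directly; the arrival rates and service times in the example are placeholders, not the paper's measured values.

```python
def mean_queue(lam, t_service):
    """Eq. (2): mean number of customers in a service center with
    arrival rate lam and mean service time t_service (needs lam*t < 1)."""
    rho = lam * t_service
    assert rho < 1, "service center must be stable"
    return rho / (1 - rho)


def post_selection_delay(lam_fph, lams, t1, t_p):
    """Eqs. (1) and (3): total mean queue length over the five type-1
    centers plus eight identical type-2 centers, divided by the
    Freephone arrival rate lam_fph."""
    q = sum(mean_queue(lam, t) for lam, t in zip(lams, t1))
    q += 8 * mean_queue(lam_fph, t_p)
    return q / lam_fph


# Placeholder rates (per second) and service times (seconds).
print(post_selection_delay(1.0, [0.5] * 5, [1.0] * 5, 0.1))
```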
4 Analysis of BCMP Service Centers

We construct a generic mathematical model of type-1 service center i in the form of an M/M/1 queuing system (fig. 3). The service time T_i of customers of class r in the M/M/1 model represents the total message processing time in service center i. Its value depends on the signaling message processing time at levels 2 and 3 of the MTP subsystem, the message processing time in the various SS7 subsystems, and the signaling message transfer time at level 2 of the MTP subsystem (T_od,i^(1))¹ [8].
Fig. 3. Model of a type-1 service center in the form of M/M/1 queuing system
The mean service time T_i^(1) in service center i is determined using the formula

    T_i^(1) = T_{1,i}^(1) + T_od,i^(1),    (4)
where T_{1,i}^(1) is the total of all the time intervals listed above, except the signaling message transfer time at the MTP2 level. The signaling message transfer time T_od,i at level 2 of the MTP subsystem of service center i, including transfer queuing delays, can be calculated with the help of an M/G/1

¹ od – outgoing delay.
[Diagram: MSU queue with service distribution B(x); FSU source (bunker) with distribution F(x)]
Fig. 4. M/G/1 queuing system with bunker
queuing system with a bunker (fig. 4), which has an unlimited supply of customers corresponding to fill-in signal units (FSU) [8]. FSU messages arrive at the server only if, at the moment a customer (of any class) finishes being served, there is no waiting customer corresponding to a message signal unit (MSU). Distribution function B(x) describes the MSU service process and distribution function F(x) describes FSU service. In models with a bunker, the messages examined in the BCMP network model correspond to MSUs; FSU messages consist of a data packet of fixed length (6 bytes) ensuring synchronization. The processing time T_od in this system consists of the transfer queuing delay and the signaling message transfer time on the communication path. Its mean is calculated according to [8] as

    T_od^(1) = T_m + (1/2) · (T_f + (ρ*/(1 − ρ*)) · k_1 · T_m),    (5)
where ρ* is the offered signaling load at the signaling link (SL) level of the MTP subsystem, T_f the mean transfer time of an FSU, T_m the mean transfer time of an MSU, and k_1 the 1st moment of the transfer time distribution of an MSU. The offered signaling load a can be determined using the formula

    a = λ·8·L / (64000·3600) = λ·L / (8000·3600),    (6)

where λ = λ_FPH MSU/hour is the arrival rate of MSUs entering the signaling link, 64000 bit/s the data transfer speed, and L the mean length of an MSU in bytes.
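Eq. (6) in code form, for quick unit-checking (the numeric values in the example are arbitrary, not the operator's data):

```python
def signaling_load(lam_msu_per_hour, mean_len_bytes):
    """Eq. (6): offered load on a 64 kbit/s signaling link, given the
    MSU arrival rate per hour and the mean MSU length in bytes.
    The factor 8 (bits per byte) cancels into the 8000 in (6)."""
    return lam_msu_per_hour * mean_len_bytes / (8000 * 3600)


# 3600 MSU/hour of 8000-byte messages would fully load the link.
print(signaling_load(3600, 8000))
```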
The mean times T_m and T_f for MSU and FSU transfer on the communication channel can be determined as the ratio of the mean message length to the transfer speed. Note that messages of various classes with different lengths are serviced in the service centers. Therefore, the mean MSU transfer times on the communication channel
T_m,i will differ for each service center i. Consequently, the T_od,i^(1) values for each service center will also differ. Therefore, using formulae (2) and (4)-(6) we can calculate the q_i values for each type-1 service center. In service center 1 (SPA), customers of classes 1 and 5 are serviced, since SPA sends the IAM and ANM messages. The arrival rate λ_1 of the flow entering service center 1 (fig. 3) is therefore
    λ_1 = λ_{1,1} + λ_{1,5},    (7)

and the mean queue length in service center 1, determined by formula (2), is

    q_1 = (λ_{1,1} + λ_{1,5}) · T_1^(1) / (1 − (λ_{1,1} + λ_{1,5}) · T_1^(1)).    (8)
We calculate the mean queue lengths in the other service centers of the queuing network in much the same way. For service center 5 (SPB), in which customers of classes 4 and 5 are serviced (ACM and ANM messages):

    q_5 = (λ_{5,4} + λ_{5,5}) · T_5^(1) / (1 − (λ_{5,4} + λ_{5,5}) · T_5^(1)).    (9)
For service center 2 (SSP):

    q_2 = (λ_{2,1} + λ_{2,2} + λ_{2,4} + λ_{2,5}) · T_2^(1) / (1 − (λ_{2,1} + λ_{2,2} + λ_{2,4} + λ_{2,5}) · T_2^(1)).    (10)
For service center 3 (STP):

    q_3 = (λ_{3,2} + λ_{3,3}) · T_3^(1) / (1 − (λ_{3,2} + λ_{3,3}) · T_3^(1)).    (11)
For service center 4 (SCP), the customer service time T_4^(1) includes the request processing time in the database. The value q_4 is calculated using the formula

    q_4 = λ_{4,3} · T_4^(1) / (1 − λ_{4,3} · T_4^(1)).    (12)
The processing time T_p in a type-2 BCMP service center corresponds to the message transfer time along the signaling data link. It depends on the distance between the corresponding IN nodes and the signal propagation speed in the physical medium.
The mean queue length q_p in a type-2 service center is calculated from the mean service time T_p^(1) in that center:

    q_p = λ_FPH · T_p^(1) / (1 − λ_FPH · T_p^(1)).    (13)
Therefore, formula (3) for the mean post-selection delay for Freephone services takes the form

    T_FPH^(1) = (q_1 + q_2 + q_3 + q_4 + q_5) / λ_FPH + 8 · q_p / λ_FPH.    (14)
5 Case Study

For our analysis of the post-selection delay for Freephone services we chose data as close as possible to real-life data. The volumes of signaling information sent while rendering the service were determined on the basis of Freephone service data from a Russian intercity telecoms operator. Mean customer service times in the nodes were determined based on ITU-T recommendations [11]. The following source data were used in the calculations:
- arrival rate λ_FPH of Freephone service calls: 2500 calls/hour,
- access time of the SCP node to the SDP database: approx. 1 ms,
- signaling message transfer time T_p^(1) = 40 ms (provided message transfer is via fiber-optic communication lines).
Fig. 5 illustrates how the post-selection delay for Freephone services depends on the arrival rate of calls for this service, ranging from 250 to 3000 calls/hour. The upper arrival rate value (3000 calls/hour) corresponds to real operational capacity during busy hours. The histogram in fig. 6, for a Freephone call arrival rate of 2500 calls/hour, gives the mean processing time in each network node. These values allow us to estimate the contribution of the processing time in each node to the overall post-selection delay for the considered IN service. Note that the transfer of signaling data between IN nodes can take place not only via SS7 through STP nodes, but also via an IP network with the help of other protocols (e.g. Sigtran). In the case of message transfer via an IP network, the corresponding signaling message transfer time T_p^(1) between nodes ranges from 0.5 ms to 100 ms. The bar charts in fig. 7 illustrate the dependence of the mean post-selection delay on the message transfer time via an IP network. Calculations are given for λ_FPH = 2500
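Formula (14) can be combined with the case-study data as a sketch. λ_FPH = 2500 calls/hour and T_p^(1) = 40 ms come from the text, and the node visit counts follow the call setup procedure described here (SPA and SPB twice, SSP four times, SCP once; STP is assumed to carry its two message classes). The per-node service times T_NODE are ILLUSTRATIVE placeholders: the paper takes them from ITU-T recommendations without listing them.

```python
LAM_FPH = 2500 / 3600      # 2500 calls/hour, converted to calls per second
T_P = 0.040                # 40 ms link transfer time (from the text)

# Mean service times per node, in seconds -- placeholder values only.
T_NODE = {"SP_A": 0.02, "SSP": 0.03, "STP": 0.02, "SCP": 0.05, "SP_B": 0.02}
# Number of message classes each node handles, which scales its arrival rate.
VISITS = {"SP_A": 2, "SSP": 4, "STP": 2, "SCP": 1, "SP_B": 2}


def q_bar(lam, t):
    """Eq. (2): mean queue length for arrival rate lam, service time t."""
    rho = lam * t
    return rho / (1 - rho)


q_total = sum(q_bar(VISITS[n] * LAM_FPH, T_NODE[n]) for n in T_NODE)
q_total += 8 * q_bar(LAM_FPH, T_P)
t_fph = q_total / LAM_FPH       # eq. (14), seconds
print(t_fph)
```

With these placeholder service times the delay comes out on the order of a fraction of a second, consistent with the sub-second range reported in fig. 5 at this call rate.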
Fig. 5. Post-selection delay for Freephone services (SS7 network)
[Bar chart over nodes SP_A, SSP, STP, SCP, SP_B and 8·T_p; vertical axis t, s]
Fig. 6. Mean processing time in IN nodes
calls/hour. Fig. 7 also shows the fraction of this transfer time relative to the message processing time in the IN nodes. According to the call setup procedure, customers pass through nodes SPA and SPB twice, through the SSP node four times, and through the SCP node once. This fact was taken into account when creating the graph. Moreover, owing to the change in network data transfer, the time taken for customers to pass through the STP node is not considered here.
It is clear from fig. 7 that the message transfer time via an IP network directly influences the overall post-selection delay and can contribute up to 20% of T_FPH^(1) (if T_p^(1) = 0.1 s). In comparison with the result obtained for message exchange via an SS7 network, a 200-600 ms reduction in T_FPH^(1) is observed.
[Bar chart of T_FPH and its components T_SP_A, T_SSP, T_SCP, T_SP_B, T_P for T_p = 5·10^-3, 0.05 and 0.1 s; vertical axis t, s]
Fig. 7. Post-selection delay for Freephone services (IP network)
6 Conclusion

This article puts forward a method for calculating the post-selection delay for Freephone services based on a mathematical model of an open exponential BCMP network. It also provides an analysis of this parameter based on source data approximating real-life data. For general service time distributions in the BCMP network nodes, estimates of the post-selection delay can be obtained with the help of simulation. The method developed here can be used for estimating delays that occur during the call setup process for other intelligent services as well. In the latter case it is necessary to take into account the specifics of the call setup procedure for each IN service.
Acknowledgements

The authors would like to thank Professor K. Samouylov for many important and useful discussions and for his suggestions concerning the analytical model.
References

1. ITU-T Recommendation E.723: Grade-of-Service Parameters for Signalling System No. 7 Networks. ITU, Geneva (1992)
2. ITU-T Recommendation E.724: GOS Parameters and Target GOS Objectives for IN Services. ITU-T (1996)
3. Buzyukova, I.L., Gaidamaka, Y.V.: Signaling load investigation in intelligent networks with nodes located in different time zones. In: Nauchno-tehnicheskie vedomosti SPB STU, No. 5, pp. 67–74, St. Petersburg (2008) (in Russian)
4. Bafutto, M., Kuhn, P., Willman, G.: Modelling and performance analysis for common channel signalling networks. AEU 47(5/6) (1993)
5. Willman, G., Kuhn, P.J.: Performance modelling of signalling systems. IEEE Communications Magazine 7 (1990)
6. Zharkov, M., Chekmareva, E., Samouylov, K.: Method of estimating SS No. 7 time responses in mobile cellular communication systems. In: Proc. CONTEL 1993, Zagreb (1993)
7. Goldstein, B.S., Ehriel, I.M., Rerle, R.D.: Intelligent Networks. Radio i Svyaz, Moscow (2000) (in Russian)
8. Samouylov, K.E.: Methods for Performance Analysis of SS7 Networks. RUDN, Moscow (2002) (in Russian)
9. Baskett, F., Chandy, K.M., Muntz, R.R., Palacios, F.G.: Open, closed, and mixed networks of queues with different classes of customers. Journal of the ACM 22(2), 248–260 (1975)
10. Vishnevsky, V.M.: Theoretical Basis for Designing Computer Networks. Technosphera, Moscow (2003) (in Russian)
11. ITU-T: White Book, Recommendation Q.706: Signalling System No. 7 – Message Transfer Part Signalling Performance. ITU, Geneva (1993)
Multi-Skill Call Center as a Grading from “Old” Telephony

Manfred Schneps-Schneppe¹ and Janis Sedols²

¹ Ventspils University College, Inzenieru st. 101, Ventspils, Latvia
² Institute of Mathematics and Computer Science, University of Latvia, Raina bulv. 29, Riga, Latvia
[email protected] Abstract. We explore parallels between the older telephony switches and the multi-skill call centers. The numerical results have shown that a call center with equally distributed skills is preferable compared to traditional grading-type design. The annex contains a short version of mathematical proof on limited availability schemes design for small call flow intensity λ and for large λ. The proof explores one excellent V. Beneš' paper (from Bell Labs). On its own merit, the annex could initiate new mathematical research in call center area, more by now the powerful software for numerical analysis is available. Main conclusion is the following: numerical analysis of simple multi-skill call centers propose a new principle for call center design, namely: from throughput point of view a multi-skill call center with equally distributed skills is preferable compared to traditional grading-type design. Keywords: Call center, Skill based routing, Teletraffic, Limited availability, Asymptotic expansion for loss probability.
1 Introduction

Due to the extremely acute research needs on call center performance analysis all around the world, there is reason to talk about a rebirth of teletraffic theory, and in this context it is worth remembering the Russian queueing theory papers [1], among them the pioneering paper by A. Kolmogorov [2] on the Erlang queue problem (1931) and the world-renowned book on queueing theory written by A. Khinchin [3]. Some 50 years ago one of the authors (the first one), then a young student, had a chance to present his results on optimal grading design (in telephony) at Kolmogorov’s applied probability seminar at Moscow State University. The results surprised Andrey Kolmogorov: in his opinion, in the case of limited availability of switch outlets the optimal scheme should have the form of a grading (Fig. 1a). But that is true for small call flows only. In the case of high call flow values the optimal scheme is the one with equally distributed outlets, as in Fig. 1b [4]. These are two limited availability schemes with 4 inlets and 6 outlets. Each inlet can connect to 3 outlets (searching from left to right): a) the grading scheme contains 4 individual and 2 common outlets; b) in the scheme with equally distributed outlets, each outlet is available to 2 inlets. (The numerical results illustrating this phenomenon are shown in Fig. 8, and the mathematical proof is given in the Annex.)

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 154–167, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Two limited availability schemes.
The aim of the paper is to point out the parallels between the older telephony switches and the multi-skill call centers. Of course, call center skill-based routing is a much more complex problem, but the discussion seems to be fruitful for call center architects and managers as well as for teletraffic studies. The paper is organized as follows. Section 2 presents the basic call center mathematical models. Section 3 introduces multi-skill call center issues. Section 4 contains numerical examples of multi-skill call center optimisation. The results show that a call center with equally distributed skills is preferable to the traditional grading-type design. Section 5 concludes the paper with a discussion of the results and directions for future research. Finally, the annex contains a short version of the mathematical proof on limited availability scheme design for small call flow intensity λ and for large λ [5]. The proof builds on an excellent paper by V. Beneš [6] from Bell Labs. On its own merit, the annex could initiate new mathematical research in the call center area, the more so now that powerful software for numerical analysis is available.
2 Call Center Mathematical Models

1. Let’s denote by n the number of agents (lines or outlets in telephony). The call arrivals are distributed according to the Poisson law with intensity λ (i.e., λ calls per time unit on average) and the service time is distributed according to the exponential law with parameter μ (i.e., μ calls are served per time unit on average). Later we will use the parameter A = λ/μ as the average incoming load. ACD here means an automatic call distribution device with some waiting places. From the teletraffic theory point of view we have two different systems: 1) a loss system and 2) a queueing system (Fig. 2).
Fig 2. Simplest one-skill call center
156
M. Schneps-Schneppe and J. Sedols
Case 1. When N = 0, we talk about a loss system (calls that cannot be handled are given the 'equipment busy' tone) and the probability of blocking (call loss) is determined by the Erlang B formula

E_n(A) = (A^n/n!) / (1 + A + A^2/2! + … + A^n/n!)

Case 2. When N is infinite (to simplify the formula), we talk about a queueing system where calls that cannot be handled immediately are queued, and the probability of waiting is determined by the Erlang C formula
P_w = [(A^n/n!) · (n/(n − A))] / [Σ_{i=0}^{n−1} A^i/i! + (A^n/n!) · (n/(n − A))]

2. In reality, the one-skill call center is much more complex (Fig. 3).
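The two Erlang formulas can be computed directly; the following is a minimal numerical sketch (the function names are ours, not from the paper):

```python
from math import factorial

def erlang_b(n, A):
    """Erlang B: blocking probability of a loss system with n agents, load A."""
    num = A**n / factorial(n)
    den = sum(A**i / factorial(i) for i in range(n + 1))
    return num / den

def erlang_c(n, A):
    """Erlang C: waiting probability of a queueing system (requires A < n)."""
    term = (A**n / factorial(n)) * (n / (n - A))
    den = sum(A**i / factorial(i) for i in range(n)) + term
    return term / den
```

For example, erlang_b(2, 1.0) gives 0.2 and erlang_c(2, 1.0) gives 1/3. For large n a numerically stable recursive form of Erlang B, E_n = A·E_{n−1}/(n + A·E_{n−1}), is usually preferred over the direct factorial sums.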
Fig. 3. Full scale one-skill call center
There is a full ‘bucket’ of teletraffic phenomena here: 1) waiting calls are impatient and after abandonment may either go away (lost calls) or make retrials; 2) the same is true when the waiting places of the ACD are busy; 3) the served calls also make retrials (return for additional service), etc. A lot of teletraffic results are useful for the throughput analysis of this picture [7].
3 Multi-skill Call Center In a typical call center [8], the arriving calls are classified in different types, according to the required technical skill to answer the call, the language, importance of the call, etc. Agents are also classified in skill groups according to the subset of call types they can handle and (perhaps) their efficiency in handling these calls. Calls arrive at random according to some stochastic process. When a call arrives, it may be assigned
immediately to an agent that can handle it (if there is one available) or it may be put in a queue (usually one queue per call type). When an agent becomes available, the agent may be assigned a call from one of the queues, or may remain idle (e.g., waiting for more important calls). All these assignments are made according to some routing policy that often incorporates priority rules for the calls and agents. Figure 4 illustrates this setting. We assume that there are K call types and I skill groups. In the figure, Si represents skill group i, λk is the mean arrival rate for call type k, and µk,i is the mean service rate for call type k by an agent of group i. The load of call type k is the total amount of agents’ time required for its service; for example, if all such calls are served by skill group i, their load is λk/µk,i. Note that the arrival process is usually not a stationary Poisson process and the service times are usually not exponential. Calls waiting in queue may abandon after a random patience time (this is represented by the horizontal arrows in the figure). Those who abandon may call again later, although those retrials are rarely modelled in practice, usually because of a lack of sufficient data. Callers who received service may also call again for a number of reasons; these are called returns. In the (degenerate) special case where each agent has a single skill, we have K single queues in parallel. If each agent has all skills, then we have a single skill set and a single queue. The system is obviously easier to analyse in these extreme cases. With all agents having all skills, the system is also more efficient (smaller waiting times, fewer abandonments) if we assume that the service time distribution for a given call type does not depend on the agent’s skill set. However, this assumption turns out to be wrong in practice: agents are usually faster when they handle a smaller set of call types (even if their training gives them more skills).
Agents with more skills are also more expensive; their salaries depend on their skill sets. Thus, for large volumes of call types, it makes sense to dedicate a number of single-skill agents (specialists) to handle most of the load. A small number of agents with two or more skills can cover the fluctuations in the proportion of calls of each type in the arriving load. The multi-skill call center described above is far too complex for rigorous mathematical analysis. As the starting point for our study we use the paper [9], where the authors consider routing and scheduling problems under the assumption that there is no queueing, i.e., that queued calls are blocked or, equivalently, that queued calls abandon right away. We apply our study to simple call center structures made
Fig. 4. A multi-skill call center
by analogy to a real call center with 57 agents (Fig. 5) from [10]. This is a grading-type 5-skill call center considered by Ger Koole and his colleagues, where the skill sets {2}, {3}, {4}, {2,3}, and {3,4} each get 5 agents, the skill groups with skills {1} and {5} each get 7 agents, and the skill groups {1,2}, {4,5}, and {1,2,3,4,5} each get 6 agents. Therefore, speaking in telephony terms, this scheme with 5 inlets (5 call flows) has 29 individual outlets, 22 pairs, and 6 common outlets.
Fig. 5. Multi-skill call center with 57 agents
4 Numerical Analysis

Example 1. Common outlets first? Let’s start with a numerical analysis of schemes similar to the multi-skill call center pictured above, but much simpler. Figure 6 shows three rather simple 4-skill schemes. The first two schemes are popular now. They have four individual agents and two 4-skill agents. The only difference concerns the access to the 4-skill agents: on the last step (Fig. 6a) or on the first one (Fig. 6b). A certain part of call center administrators propose delivering callers to the most knowledgeable agent on the first step (as in Fig. 6b). They suppose that such a strategy better supports business objectives and recognizes caller requirements earlier. Numerical analysis shows (Fig. 7) that the probability of call loss is always greater for scheme 6b than for scheme 6a. Therefore, from the throughput point of view this recommendation is questionable. Note that the numerical analysis is based on Markov process equations (2^6 = 64 linear equations in total).
Fig. 6. Three call center schemes with 4 inlets (equal Poisson call flows), 6 outlets (agents), and availability equal to 3
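The Markov-chain computation behind these examples can be sketched as follows. The availability lists below are our assumed layout for scheme 6a (each inlet first tries its individual agent, then the two common ones), and the function name is ours; this is an illustration of the technique, not the authors’ code:

```python
from itertools import product
import numpy as np

# Assumed availability lists for scheme 6a: each of the 4 inlets first tries
# its individual agent (0..3), then the two common agents (4, 5).
SCHEME_6A = [(0, 4, 5), (1, 4, 5), (2, 4, 5), (3, 4, 5)]

def loss_probability(scheme, lam, mu=1.0, n_agents=6):
    """Stationary call-loss probability of a limited-availability loss system.

    States are busy/idle vectors over the agents (2^6 = 64 states here);
    each inlet is a Poisson flow of rate lam taking the first idle agent
    in its availability list; service times are exponential with rate mu.
    """
    states = list(product((0, 1), repeat=n_agents))
    index = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for s in states:
        i = index[s]
        for inlet in scheme:                      # arrival transitions
            free = next((a for a in inlet if s[a] == 0), None)
            if free is not None:
                t = list(s); t[free] = 1
                Q[i, index[tuple(t)]] += lam
        for a in range(n_agents):                 # departure transitions
            if s[a] == 1:
                t = list(s); t[a] = 0
                Q[i, index[tuple(t)]] += mu
        Q[i, i] = -Q[i].sum()
    # Stationary distribution: solve pi Q = 0 together with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(len(states))])
    b = np.zeros(len(states) + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    # A call on a given inlet is lost when all its available agents are busy.
    blocked_frac = np.array([sum(all(s[a] == 1 for a in inlet)
                                 for inlet in scheme) / len(scheme)
                             for s in states])
    return float(pi @ blocked_frac)
```

With a single inlet and a single agent this reduces to the Erlang B case E_1(A). The same function scales directly to the doubled schemes of Example 4 (2^12 = 4096 states) by passing longer availability lists and n_agents=12.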
Fig. 7. Comparison of schemes 6a and 6b: the probability of call loss is always greater for scheme 6b than for scheme 6a
Example 2. Traditional gradings are recommended for low load only. Our main interest concerns traditional gradings (as in Fig. 6a) and schemes with equally distributed skills, namely, all 6 agents are two-skill trained (Fig. 6c). It is a surprising fact that grading is preferable for low load values only: as the load grows, the scheme of Fig. 6c becomes preferable (as shown in Fig. 8). These curves cross at a loss probability as low as 0.0025, which is reached at a total load value of 0.73, or a load per agent equal to 0.728/6 = 0.121.
Fig. 8. Comparison of schemes 6a and 6c: grading (a) is preferable for low load values only, but as load grows the scheme (c) becomes preferable.
Example 3. Impact of waiting places. Obviously, the use of waiting places (see the ACD in Fig. 2) always reduces call losses. This effect is illustrated in Fig. 9: due to the waiting places, the loss curves are shifted to the right. What is more interesting: with waiting places, the advantage of equally distributed agent skills grows (see Table 1).
Table 1. The crosspoint of loss curves for schemes 6a and 6c

Number of waiting places | Total load | Probability of call loss at the crosspoint
           0             |  0.72855   |  0.2542 × 10^-2
           1             |  0.6427    |  0.9424 × 10^-4
           2             |  0.616     |  0.4091 × 10^-5
Fig. 9. Impact of waiting places: due to the waiting places, the loss curves are shifted to the right
Example 4. Doubled agents. Let’s double the number of agents in each agent group, moving from Figs. 6a and 6c to Figs. 10a and 10c. Therefore, we now have 12 agents in total (and we need to solve a linear system of 2^12 = 4096 equations). What do we see? The overall picture is unchanged (Fig. 11): the scheme with skills equally distributed between agents is practically preferable at any load. The crosspoint of the loss curves for schemes 10a and 10c practically remains the same, namely, as shown in Fig. 12, the loss curves for schemes 10a and 10c cross at loss probability 0.00323 (instead of 0.00254 for schemes 6a and 6c, see Table 1). It happens at total load 3.65, or a load per agent equal to 3.65/12 ≈ 0.30.
Fig. 10. Doubled agent groups (compared to Figs. 6a and 6c)
Fig. 11. Impact of doubling the number of agents in each agent group
Fig. 12. Loss curves for schemes 10a and 10c cross at loss probability 0.00323
Conclusion. The numerical analysis of simple multi-skill call centers suggests a new principle for call center design: from the throughput point of view, a multi-skill call center with equally distributed skills is preferable to the traditional grading-type design.
5 Future Work

The call center industry is large and growing rapidly; there are millions of employees and hundreds of thousands of call centers worldwide. The use of call centers by businesses for customer service has grown phenomenally in recent years. By 2003, AT&T estimated the number of call centers in the U.S. at 350,000, employing 6.5 million people. (More generally, we speak of customer contact centers, since telephony is not
the only communication channel nowadays.) However, it is undesirable to hire too many employees because, besides service levels, contact centers also have to meet economic objectives, in particular minimizing the costs due to employee salaries. Minimizing the number of employees is an important subject because labor is expensive: about 80% of operating costs in call centers are due to personnel. Call center operations are also increasing in complexity as features such as skill-based routing, computer telephony integration, and networking of multiple sites are incorporated. For an example of the complexity of call center dimensioning, let us refer to Mandelbaum’s paper [11] from the Technion, Israel. The following is the set of CRM requirements of the U.S. NationsBank (now Bank of America). How to manage the relationship with three customer groups: RG1, high-value customers; RG2, marginally profitable customers (with potential); RG3, unprofitable customers? How to fulfill eight different criteria (to a great extent antagonistic) given in Fig. 13? How to account for multi-skill effects (e.g., several languages)? Problems of this kind are rather far from the traditional teletraffic issues, but they are the reality!
Fig. 13. NationsBank’s CRM Grade of Service criteria
A lot of teletraffic studies are applicable here, the more so now, taking into account the powerful software available for numerical analysis as well as simulation. Below you will find the annex, containing a short version of the mathematical proof of the limited availability scheme design principles for small call flow intensity λ and for large λ. It could initiate new mathematical research in the call center area. This is a challenge for teletraffic researchers.
6 Annex. On Optimality of Limited Availability Switches

In a switched-circuit network, the devices known as switches are used to connect the caller to the callee. Each switch has a number of inlets and outlets. Older switches (so-called step-by-step exchanges) can only connect some inlets to some outlets. This is known as limited availability. Each inlet is available to a group of subscribers,
forming a call flow. In limited availability switches, the circuits inside the switch are usually arranged into grading groups. One simple example of a 3-step grading can be seen in Figure 1a, having 4 inlets (in other words, 4 incoming call flows) and 6 outlets (lines). Each inlet can connect to 3 outlets (searching from left to right). The grading scheme contains 4 individual and 2 common outlets. According to Wikipedia [12], gradings are still popular now, and the principles of optimal limited availability discussed below are not widely known. As we have shown above by numerical analysis, the classical gradings are preferable for low call flows only. At a call loss around 1%, the loss probability curves cross, and the schemes with uniformly distributed outlets become preferable. Now we turn to the rigorous mathematical analysis of this phenomenon. We consider rectangular switches with parameters: n – inlets (call flows, subscriber groups), d – availability (number of steps), v – outlets (total number of lines). Therefore, the switch has n · d contacts (points) divided into v groups (outlets). The switch serves n Poisson call flows (each of intensity λ); the holding time is exponentially distributed (with parameter μ = 1, for simplicity of formulas). If all d lines available to some call are busy, the call is lost. Therefore, call processing by the switch is described by means of a Markov process with a state set S of 2^v states. Let’s denote: |x| – the number of busy lines in state x, A_x – the set of states reachable by adding one busy line to state x, B_x – the set of states reachable by releasing one of the |x| busy lines of state x, r_xy – the number of inlets (subscriber groups) whose calls move the switch from state x to state y, s(x) – the number of inlets having at least one idle line in state x. Obviously,
s(x) = Σ_{y∈A_x} r_{xy} ,
R – the matrix of elements r_xy; r_x – the (0, x) element of the matrix R^|x|. Obviously, r_x means the number of paths from state 0 to state x using incoming calls only. We get the state probabilities p_x as the solution of the linear algebraic system of global balance equations (with μ = 1)

(λ·s(x) + |x|)·p_x = λ·Σ_{y∈B_x} r_{yx}·p_y + Σ_{y∈A_x} p_y ,  Σ_{x∈S} p_x = 1,

and the probability of call loss is equal to

π = (1/n)·Σ_{x∈S} (n − s(x))·p_x .
In order to study the principles of optimal availability schemes we apply the asymptotic expansion of the call loss probability in powers of λ (Theorem 1) and in powers of 1/λ (Theorem 2).
Theorem 1 [6]. Asymptotic expansion of the call loss probability as λ → 0. If the sequence {c_m(x), m ≥ 0, x ∈ S} is defined by

and

then the asymptotic expansion of the call loss probability in powers of λ is available at the point λ = 0, and the non-zero members are of power not less than λ^d. According to this theorem, for |x| > 0 we have the expansion

and the call loss probability is equal to
Theorem 2 [5]. Asymptotic expansion of the call loss probability as λ → ∞. If the sequence {d_m(x), m ≥ 0, x ∈ S} is defined by

and
then the asymptotic expansion of the call loss probability in powers of 1/λ is available at the point λ = ∞. According to this theorem we have the expansion
The call loss probability is equal to
and according to the Taylor series

π = D(0)/C(0) + (1/λ)·[C(0)·D′(0) − C′(0)·D(0)]/[C(0)]² + o(1/λ)
The first four expansion terms are defined by
where l_i – the number of contact points of line i (for call centers, this means the number of skills of agent i), s_ij – the number of inlets for which at least one line is available in the state where only lines i and j are idle, ξ_ji – the number of inlets (a part of s_ij) from which calls come to line j when only lines i and j are idle.

Theorem 3 [5]. On the optimality of gradings as λ → 0. As λ → 0, for given switch parameters (n, d, v), the optimal limited availability scheme should follow these principles:
1) The contact field (n, d, v) is divided (as far as possible) into contact sets with 1 and n contact points (in the case of a call center, this means that each agent has 1 or n skills), and individual outlets are made available earlier than common ones;
2) If the above requirement cannot be fulfilled, then contact sets with other numbers of contact points are made available after the individual outlets and before the common ones.
The proof is based on Theorem 1.

Theorem 4 [5]. On the optimality of equally distributed contact points as λ → ∞. As λ → ∞, for given switch parameters (n, d, v), the optimal limited availability scheme should follow these principles:
1) The contact field (n, d, v) is divided into contact sets with r or r + 1 contact points (in the case of a call center, this means that each agent has r or r + 1 skills), where r = [nd/v] ([ ] denotes the integer part);
2) Contact sets with r contact points are made available earlier than contact sets with r + 1 contact points;
3) Each contact set has an equal (as far as possible) number of common lines (outlets) with each of the other contact sets.
The proof is based on Theorem 2. The first two terms in the expansion of the loss probability in powers of 1/λ do not give any information on the scheme structure. The third term, at power 1/λ², is equal to L/n and gets its minimum value when L = Σ_{i=1}^{v} 1/l_i takes its minimum value. According to the definition, Σ_{i=1}^{v} l_i = nd, where the l_i are integers; therefore they should be as equal as possible, i.e., equal to [nd/v] or [nd/v] + 1. From this, requirement 1 of Theorem 4 follows.
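The integer-equalization step can be checked by brute force; the following small sketch (our own, not from the paper) enumerates all ways of splitting nd contact points over v lines and confirms that the most even split minimizes Σ 1/l_i:

```python
from itertools import combinations_with_replacement

def best_split(nd, v):
    """Among all ways to write nd as a sum of v positive integers l_1..l_v,
    find the one minimizing sum(1/l_i) (the third expansion term, up to 1/n)."""
    splits = [s for s in combinations_with_replacement(range(1, nd + 1), v)
              if sum(s) == nd]
    return min(splits, key=lambda s: sum(1.0 / li for li in s))

# 4 inlets, availability 3, 6 lines: nd = 12 contact points over v = 6 lines.
print(best_split(12, 6))   # the most even split: (2, 2, 2, 2, 2, 2)
```

Since 1/x is strictly convex, the minimizing multiset is unique and consists of values [nd/v] and [nd/v] + 1 only, in agreement with requirement 1 of Theorem 4.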
and we need to minimize the first sum. If we denote the number of common inlets for lines i and j by qij then
This sum takes its minimum value when all the s_ij are equal (as far as possible). This means that each contact set has an equal (as far as possible) number of common lines (outlets) with each of the other contact sets, and requirement 3 is proved necessary for the particular case.
References
1. Schneps-Schneppe, M.A., Gnedenko, B.V., Kharkevich, A.D.: Looking through the ex-USSR teletraffic papers. In: 14th International Teletraffic Congress. Proceedings, vol. 1a, pp. 135–143 (1994)
2. Kolmogorov, A.N.: Matem. Sbornik 38(1-2), 101–106 (1931)
3. Khinchin, A.J.: Papers on Mathematical Queueing Theory. Moscow (1963)
4. Schneps-Schneppe, M.A.: New principles of limited availability scheme design. Elektrosviaz 7, 40–46 (1963)
5. Sedol, J., Schneps-Schneppe, M.: Some qualitative study of limited availability schemes. Problemy Peredachi Informatsii 1(2), 88–94 (1965)
6. Beneš, V.E.: Markov Processes Representing Traffic in Connecting Networks. Bell System Techn. J. 42, 2795–2838 (1963)
7. http://www.math.vu.nl/~koole/research/
8. L’Ecuyer, P.: Modeling and Optimization Problems in Contact Centers. In: QEST 2006, Third International Conference, pp. 145–156 (2006)
9. Skills-Based Routing and its Operational Complexities. Wharton’s Call Center Forum (2003), http://ie.technion.ac.il/serveng
10. Koole, G., Talim, J., Pot, A.: Routing heuristics for multi-skill call centers. In: Proceedings of the 2003 Winter Simulation Conference, pp. 1813–1816 (2003)
11. Garnett, O., Mandelbaum, A.: An Introduction to Skills-Based Routing and its Operational Complexities. Technion (May 2000), http://fic.wharton.upenn.edu/fic/f0503mandelbaum.pdf
12. http://en.wikipedia.org/wiki/Limited_availability
A Real-Time Algorithm for Skype Traffic Detection and Classification

D. Adami, C. Callegari, S. Giordano, M. Pagano, and T. Pepe

Dept. of Information Engineering, University of Pisa, Italy
{adami,callegari,giordano,pagano,pepe}@netserv.iet.unipi.it
Abstract. In the last years Skype has gained more and more attention from both users and the scientific community. Namely, users are interested in its ability to provide a free and reliable way to make phone calls over the Internet, while the scientific community is interested in the reverse-engineering process, because of the proprietary design of the application. In more detail, both Skype protocols and algorithms are unknown and use strong encryption mechanisms, making it very difficult even to reveal Skype’s presence inside a traffic aggregate. This issue is of primary interest for the scientific community and, above all, of great economic relevance for operators. In this paper we propose a novel algorithm for detecting Skype traffic, based on both signature-based and statistical approaches. The proposed algorithm is able to reveal in real time the presence of Skype clients in the monitored network, and to distinguish among several types of Skype “activities”: direct calls, calls with a relay node, SkypeOut calls, and file transfers. To assess the effectiveness of our method we have tested the system over several traffic data sets, collected in different networks. Moreover, we have compared the performance offered by our system with that provided by “classical” classification techniques, as well as by the state-of-the-art Skype classifier.
1 Introduction

In recent years, the popularity of VoIP telephony has progressively grown and the majority of network operators have started offering VoIP-based phone services. Skype [1] has rapidly become the most well-known example of this consolidated phenomenon: originally developed by the entrepreneurs who created the pioneering Web application Kazaa, Skype ended 2008 with 405 million user accounts (more than 42 million real users [2]), a 47% increase from 2007 [3]. According to TeleGeography Research [2], in 2008 Skype users spent 33 billion minutes talking to people in other countries, representing 8% of all international voice traffic. Moreover, Skype usage hit an all-time peak on March 30, 2009, when more than 17 million users were online at the same time [4]. Unlike other VoIP applications, which generally rely on the client-server architectural model, Skype operates on a P2P overlay network. The communication among Skype users is established according to the traditional end-to-end paradigm, except when a relay node is used for symmetric NAT and firewall traversal. Skype offers

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 168–179, 2009. © Springer-Verlag Berlin Heidelberg 2009
several free services: voice and video communication, file transfer, chats, buddy lists, and SkypeIn/SkypeOut to direct calls towards the PSTN. Due to its tremendously wide spread and success, Skype has recently attracted the attention of both network operators and the research community. More specifically, network operators require the development of efficient techniques for the identification of the traffic generated by Skype, in order to monitor its impact on network performance, design effective security policies, enforce traffic differentiation strategies, and conveniently charge the use of network services. However, as Skype protocols are proprietary, cryptography, obfuscation, and anti-reverse-engineering techniques [5] are extensively adopted and ad-hoc techniques are used to evade NATs and firewalls; thus, the identification and characterization of Skype traffic have been, and still are, hot research topics. At the time of writing, the most complete work concerning Skype traffic classification is [6], where the authors present a methodology working in real time. In more detail, they propose a framework based on two different and complementary techniques for revealing Skype traffic in a traffic aggregate. The first one, based on Pearson’s Chi-Square test, is used to detect Skype’s fingerprints in the packet framing structure, but is agnostic to VoIP-related traffic characteristics. The second approach, instead, relies on a stochastic characterization of Skype in terms of packet arrival rate and packet length, which are used as features of a decision process based on Naive Bayes classifiers. In [7], instead, the authors aim at identifying relayed, rather than direct, traffic, and Skype is chosen as a case study to validate their approach. Other works focus on understanding how Skype protocols work. In [5] the authors provide an overview of Skype’s design and functionalities, exploring many operational aspects under different network scenarios.
Users with public IP addresses as well as users behind a port-restricted NAT or UDP-restricted firewall are taken into account. The work [8] aims at studying Skype operations by means of traffic measurements over a five-month period. The authors analyze the user behaviour for relayed, rather than direct, sessions only. The results pertain to the population of on-line clients and their usage pattern, the number of super-nodes, and bandwidth usage. Finally, the paper [9] investigates Skype signalling by means of passive measurements, providing insights into Skype’s signalling mechanisms and analyzing the cost and complexity of managing the Skype P2P overlay network. This paper deals with the identification of Skype traffic and, more specifically, proposes a joint signature-based and statistical approach that outperforms state-of-the-art methodologies. Indeed, as reported in the following sections, our algorithm is able to perform a real-time classification of both the signalling and data traffic generated by Skype. A further contribution is the detailed description of the signalling traffic generated during the start-up phase, which has not been reported in any previous work. The paper is organized as follows. Section 2 describes the main characteristics of Skype traffic, focusing on the details that are used to perform the detection. Then Section 3 details the proposed algorithm, while Section 4 presents the experimental results. Finally, Section 5 concludes the paper with some final remarks.
2 Skype Traffic

Before detailing the algorithm implemented to detect Skype traffic, we present a brief analysis of Skype features, focusing on those aspects that are relevant to the detection phase. Although in this paper, for the sake of brevity, we do not provide all the details related to the Skype network architecture, it is important to highlight that this knowledge is necessary for a complete understanding of the paper and can be found in [6], where the authors present a deep description of Skype architecture and traffic. In more detail, we assume the reader knows the difference between a Skype Client (SC) and a Skype Supernode (SN), the structure of the Start of Message (SoM), the main characteristics of the voice traffic, and the different types of possible calls (e.g., with or without a relay node).

2.1 Skype UDP Ping

The Skype UDP Ping is a message exchange, carried out periodically by all the SCs, which consists of two keep-alive messages. These messages are characterized by a function field of the message equal to 0x02. It is worth noticing that some UDP flows consist only of the exchange of such messages.

2.2 Skype UDP Probe

The Skype UDP Probe is a message exchange performed when the Skype application is launched, to discover the SNs and the network characteristics (e.g., presence of NATs, firewalls, etc.). The Skype UDP Probe consists of 4 UDP messages (Long Skype UDP Probe): two request messages from the SC to the SN (U1 and U3), and two reply messages from the SN to the SC (U2 and U4). Figure 1 depicts the message exchange (s_x is the size of the payload of packet x): U1 (SC → SN, variable size), U2 (SN → SC, s_U2 = 11), U3 (SC → SN, s_U3 = s_U1 + 5), and U4 (SN → SC, s_U4 = 18, 26, 51, or 53).
Fig. 1. Long Skype UDP Probe
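The payload-size relations in Fig. 1, together with the U4-based supernode selection rule described next, can be turned into a simple flow-level check. The sketch below is our own illustration (function names and the `payload_sizes` input are assumptions, not part of the paper's implementation):

```python
def is_long_udp_probe(payload_sizes):
    """Check whether four UDP payload sizes match the Long Skype UDP Probe
    pattern: U1 variable, U2 == 11, U3 == U1 + 5, U4 in {18, 26, 51, 53}."""
    if len(payload_sizes) != 4:
        return False
    u1, u2, u3, u4 = payload_sizes
    return u2 == 11 and u3 == u1 + 5 and u4 in (18, 26, 51, 53)

def sn_selection(s_u4):
    """Interpret the U4 payload size: 'accept' means the SC opens a TCP
    connection to the node (it becomes the SN); 'reject' means the node is
    discarded (likely a negative reply); 'unknown' otherwise."""
    if s_u4 == 18:
        return "accept"
    if s_u4 in (26, 51, 53):
        return "reject"
    return "unknown"
```

Here `payload_sizes` would hold the first four UDP payload lengths of a candidate flow, in order of appearance; a real classifier would additionally verify directions (SC → SN for U1/U3) and ports.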
After the Skype UDP Probe has been executed, the SC performs the algorithm for the SN selection, which consists of the following steps:

if s_U4 = 18 bytes then
    the SC opens a TCP connection (on the same port number used for the Skype UDP Probe) and the contacted node becomes its SN
else
if sU4 ∈ {26, 51, 53} bytes then the contacted node will not be taken into account any longer # it is likely that these messages can be interpreted as SN “negative” replies end if end if A modified version of this message exchange, only involving U1 and U4 packets (Short Skype UDP Probe), is repeated until a SN is selected Moreover, it is worth noticing that the SC periodically repeats this message exchange (Short Skype UDP Probe), so as to be sure to be always connected to an available SN. 2.3 Skype TCP Handshake Once the SN has been selected, the SC has to establish a TCP connection with the SN, so as to be able to access Skype network. This phase is composed of the following steps: – a TCP connection is opened towards the first node that has positively answered to the Skype UDP Probe (using the same port number used for the reception of the Skype UDP Probe messages) – in case the TCP connection fails, another connection is established with another SN that previously acknowledged an Skype UDP Probe – once the connection has been established, a message exchange, named Skype TCP Handshake, is started It is worth noticing that, even though the whole payload is encrypted, these packets are characterized by the TCP PSH flag set and, some of them, by a fixed size packet payload. Figure 2 displays the message exchange. SC
Fig. 2. Skype TCP Handshake (the SC and SN exchange T1-T6, all with PSH=1; T1, T2, T3, and T5 have variable size, sT4 = 27, sT6 = 4)
From the figure, we note that six messages (T1, T2, T3, T4, T5, and T6) are exchanged: T1, T2, T3, and T5 are variable-size messages, while messages T4 and T6 are 27 and 4 bytes long, respectively. Since this phase is based on the opening of a TCP connection over an arbitrarily chosen port, this packet exchange can only take place if there are no restrictions on the usable ports. On the contrary, if the SC resides behind a NAT or a firewall, the connection is established over TCP port 80 (Skype HTTP Handshake) or 443 (Skype HTTPS Handshake).
D. Adami et al.
This is justified by the fact that firewalls do not usually restrict the usage of these ports, since they are used by the HTTP and HTTPS protocols, respectively. It is worth noticing that Skype uses the Transport Layer Security (TLS) protocol over port 443 and a proprietary protocol over port 80. The message exchange over both ports is depicted in Figure 3.
Fig. 3. Skype HTTP/HTTPS Handshake (the SC and SN exchange R1-R5, all with PSH=1; sR1 = 72 on port 443 and variable on port 80; sR3 = 27 and sR5 = 4 on port 80 and variable on port 443; R2 and R4 have variable size)
Let us analyze the two cases separately:

– connection over port number 443
  • the payload of the first packet is 72 bytes long and its first 56 bytes have a fixed value (named A1 in the following), corresponding to the Client Hello of the TLS 1.0 protocol
  • the size of R2 is variable, but its first 79 bytes have a fixed value (named A2 in the following), corresponding to the TLS Server Hello message. Moreover, some fields of the Skype SoM that should be variable are instead fixed in these packets (i.e., the gmt unix time field (bytes 12-15) and the random bytes field (bytes 16-43))
– connection over port number 80
  • the payload size of the R3 and R5 messages is 27 bytes and 4 bytes, respectively

2.4 Skype TCP Authentication

Once the connection has been established, the SC needs to authenticate itself to the server. In the TCP authentication phase, the SC opens a connection with a Login Server, which is the only centralized server in the Skype architecture. Four distinct packets are exchanged during the Skype Login Server Connection phase (L1, L2, L3, and L4), as depicted in Figure 4. The main characteristics of these messages are:

– L1: the payload size is 5 bytes and it contains a fixed value (named A3 in the following), that is a TLS Server Hello message
– L2: the payload size is 5 bytes and it contains a fixed value (named A4 in the following)
Fig. 4. Skype Login Server Connection (the SC and Login Server LS exchange L1-L4, all with PSH=1; sL1 = sL2 = 5, L3 and L4 have variable size; possible further messages follow)
– L3: the first 15 bytes have a fixed value (named A5 in the following), while A4 appears at a variable position
– L4: the first 4 bytes have a fixed value (named A6 in the following)

Note that the Skype application allows the user to save some authentication credentials. In that case, this message exchange does not take place when the application is launched.

2.5 Skype HTTP Update

This message exchange is performed every time Skype is launched and is aimed at retrieving application updates. It consists of one unencrypted message, which varies depending on the Skype version, but presents either a fixed value (named A7 in the following) in the first 29 bytes (Linux versions) or a fixed value (named A8 in the following) in bytes 95-124 (Windows versions).
3 Detection Algorithm

In this section, we describe the algorithm that allows us to perform a real-time classification of Skype traffic, detailing first the approach used for UDP traffic and then the one used for TCP traffic. Before discussing the algorithm, we list the parameters that are used in the following:

– Di: payload size of packet i
– F: function field of the Skype SoM
– ti: arrival time of packet i
– Bi,j: bytes from i to j of the packet payload
– FLOW: UDP or TCP flow, identified by the "classical" 5-tuple
– tentative: preliminary decision
– NOx0d: number of packets with function field equal to 0x0d
– Bsd: number of bytes sent from the source to the destination
– Bds: number of bytes sent from the destination to the source
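As an illustration only, the per-flow bookkeeping implied by this parameter list, together with the ratio-based data-traffic rule given at the end of Sect. 3.1, could be sketched as follows (a minimal sketch in Python; the class, field, and method names are ours, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class FlowState:
    """Per-flow bookkeeping for the parameters listed above (names are ours)."""
    five_tuple: tuple                           # FLOW: the classical 5-tuple
    sizes: list = field(default_factory=list)   # D_i: payload size of packet i
    times: list = field(default_factory=list)   # t_i: arrival time of packet i
    tentative: str = ""                         # preliminary decision
    n_0x0d: int = 0                             # packets with function field 0x0d
    bytes_sd: int = 0                           # bytes from source to destination
    bytes_ds: int = 0                           # bytes from destination to source

    def add_packet(self, size, t, func, from_src):
        self.sizes.append(size)
        self.times.append(t)
        if func == 0x0D:
            self.n_0x0d += 1
        if from_src:
            self.bytes_sd += size
        else:
            self.bytes_ds += size

    def classify_data(self, threshold=95):
        """Ratio rule from Sect. 3.1: once NOx0d exceeds the threshold,
        ratio = Bds/Bsd separates direct calls, file transfers and relayed
        calls (ratio == 0.5 itself is not covered by the stated rules)."""
        if self.n_0x0d <= threshold or self.bytes_sd == 0:
            return "undecided"
        ratio = self.bytes_ds / self.bytes_sd
        if ratio > 0.5:
            return "direct call"
        if 0 < ratio < 0.5:
            return "file transfer"
        if ratio == 0:
            return "call with relay"
        return "undecided"
```

The threshold default of 95 packets is the value the paper reports as a good compromise between accuracy and detection time.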
3.1 UDP Traffic

Given a flow FLOW, the procedure aimed at detecting and classifying the signalling traffic over UDP consists of the following steps (the log file is "skype_events.txt"):

if first packet and F == 0x02 then
    dim_first_pkt = D1
end if
if second packet and dim_first_pkt != 0 then
    if F != 0x02 and F != 0x07 and D2 not in {11, 18, 26, 51, 53} bytes then
        dim_first_pkt = 0
        FLOW is not a Short Skype UDP Probe, a Long Skype UDP Probe, or a Skype UDP Ping
        EXIT
    end if
    if F == 0x02 and D2 in {18, 26, 51, 53} bytes and t2 - t1 < 10 s then
        FLOW = Short Skype UDP Probe
        record the event into the log file
    else if F == 0x02 then
        FLOW = Skype UDP Ping
        record the event into the log file
    end if
end if
if third packet and dim_first_pkt != 0 then
    if F != 0x03 and D3 != dim_first_pkt + 5 bytes then
        dim_first_pkt = 0
        FLOW is not a Long Skype UDP Probe
        EXIT
    end if
    if F == 0x02 then
        if FLOW == Skype UDP Ping then
            FLOW = Skype UDP Ping
            record the event into the log file
        else
            FLOW = Skype UDP Ping
        end if
    end if
end if
if fourth packet and dim_first_pkt != 0 then
    if F == 0x02 then
        if FLOW == Skype UDP Ping then
            FLOW = Skype UDP Ping
            record the event into the log file
        else
            FLOW = Skype UDP Ping
        end if
    end if
    if F == 0x02 and D2 in {11, 18, 51, 53} bytes then
        FLOW = Long Skype UDP Probe
        record the event into the log file
    end if
end if

As far as the detection and classification of data traffic are concerned, the system, given a flow, performs the following operation: for each received packet whose inter-arrival time with respect to the previous one is less than 3 seconds, it computes the quantities NOx0d, Bsd, and Bds. When NOx0d exceeds a given threshold (experimental results have shown that a value of 95 packets allows us to obtain good performance with an acceptable detection time), the system computes ratio = Bds / Bsd. On the basis of the value of ratio, the system distinguishes three events:

– direct call if ratio > 0.5
– file transfer if 0 < ratio < 0.5
– call with relay if ratio = 0

Moreover, the system is also able to correctly detect SkypeOut calls by means of a similar procedure (which we do not detail here, for the sake of brevity).

3.2 TCP Traffic

Given a flow FLOW, the procedure aimed at detecting and classifying the signalling traffic over TCP consists of the following steps:

if first packet then
    if destination port == 443 and D1 > 55 bytes and B1,56 == A1 then
        tentative = Skype HTTPS Handshake
    end if
    if payload == A3 then
        tentative = Skype Login Server Connection
    end if
    if destination port == 80 and D1 in {167, 169, 175} bytes and (B1,29 == A7 or B95,124 == A8) then
        FLOW = Skype HTTP Update
        record the event into the log file
    end if
end if
if second packet then
    if D2 > 4 bytes and tentative == Skype Login Server Connection and B1,5 != A4 then
        tentative = 0
    end if
    if D2 > 78 bytes and tentative == Skype HTTPS Handshake and B1,80 == A2 then
        FLOW = Skype HTTPS Handshake
        record the event into the log file
        # the source is an SN
    end if
end if
if third packet then
    if D3 > 4 bytes and tentative == Skype Login Server Connection and B1,15 != A5 then
        tentative = 0
    end if
    if D3 == 27 bytes then
        tentative = Skype HTTP Handshake
    end if
end if
if fourth packet then
    if D4 > 3 bytes and tentative == Skype Login Server Connection and B1,4 == A6 then
        FLOW = Skype Login Server Connection
        record the event into the log file
    end if
    if D4 == 27 bytes then
        tentative = Skype TCP Handshake
    end if
end if
if fifth packet then
    if D5 == 4 bytes and tentative == Skype HTTP Handshake then
        FLOW = Skype HTTP Handshake
        record the event into the log file
        # the destination is an SN
    end if
end if
if sixth packet then
    if D6 == 4 bytes and tentative == Skype TCP Handshake then
        FLOW = Skype TCP Handshake
        record the event into the log file
        # the destination is an SN
    end if
end if

In the case of TCP traffic, the Skype application completely encrypts the packet payload, making it impossible to reveal a call by means of any signature. Our method is thus based on a statistical analysis of the traffic. In more detail, we are able to detect
Skype traffic (in this case we cannot distinguish between calls and file transfers) in two distinct ways: by revealing the Skype TCP Ping traffic or the connection tear-down signalling traffic. In the first case, the idea is to reveal Skype calls based on the fact that, during a call, the Skype application periodically sends messages with a payload size of 4 bytes and with an inter-arrival time that is a multiple of 1 second. Thus, the system counts the number of 4-byte received packets that respect the following conditions: ti − ti−1 ∈ [1, 10] seconds and the difference between the microseconds parts of the two timestamps ∈ [−15, 15]. If this number exceeds a given threshold and the mean packet size ∈ [10, 600] bytes, the traffic flow is classified as a Skype flow. Moreover, if the number of packets is greater than 4, the mean packet size ∈ [100, 600] bytes, and the total number of bytes belonging to the flow is greater than 40 Kbytes, then the flow is classified as a call; otherwise, it is considered signalling traffic.

The other approach for detecting a Skype call is based on the detection of the traffic generated by the tear-down phase. In this case, the system counts the number of 19-byte received packets for which ti − ti−1 < 3 seconds. If this number exceeds a given threshold and the mean packet size ∈ [10, 600] bytes, the traffic flow is classified as a Skype flow. Moreover, if the number of packets is greater than 6, the mean packet size ∈ [100, 600] bytes, and the total number of bytes belonging to the flow is greater than 40 Kbytes, then the flow is classified as a call; otherwise, it is considered signalling traffic. Note that, in this case, the reception of a packet larger than 150 bytes resets the counter. This is justified by the fact that we should not observe data packets in the call tear-down phase.
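The 4-byte keep-alive counting just described might be sketched as follows (our illustrative code, not the authors' implementation; the detection threshold is left to the caller since the text does not give its value):

```python
def count_keepalives(packets):
    """Count 4-byte packets whose inter-arrival time lies in [1, 10] s and
    whose timestamps agree to within 15 microseconds in their fractional
    (sub-second) part, as in the Skype TCP Ping heuristic described above.

    packets: time-ordered list of (timestamp_seconds, payload_size) tuples.
    """
    count = 0
    for (t_prev, _), (t, size) in zip(packets, packets[1:]):
        if size != 4:
            continue
        dt = t - t_prev
        if not (1.0 <= dt <= 10.0):
            continue
        # difference between the microseconds parts of the two timestamps
        us_prev = round((t_prev % 1.0) * 1_000_000)
        us = round((t % 1.0) * 1_000_000)
        if -15 <= us - us_prev <= 15:
            count += 1
    return count
```

A flow would then be declared Skype once this count exceeds the chosen threshold and the mean packet size falls in the [10, 600] byte interval stated in the text.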
4 Experimental Results

To verify the effectiveness of our algorithm, its performance has been compared with that of standard statistical classifiers and of the state-of-the-art Skype classifier originally proposed in [6]. To this aim, we have tested the system both off-line and on-line (in real time). For the off-line testing, we collected around 3 GB of traffic data (in libpcap format), partly over a laboratory LAN in our department and partly over a small LAN connected to the Internet through an ADSL link. Moreover, the traffic traces considered in [6] have also been used. In this way, our data are representative of the types of traffic generated in a research lab as well as by home users. These data have been processed by TStat [10], which isolates the different connections and computes a set of significant features for each connection. For the on-line testing, instead, we installed the system on the gateway of our laboratory LAN. As far as Skype traffic is concerned, we have considered calls between two end hosts running SCs (including voice calls, chats, and file transfers) as well as calls to traditional PSTN phones (SkypeOut calls). For the sake of generality, we used SCs running under Linux (versions 1.4 and 2.0) as well as under Windows (versions 3.2, 3.6, and 4.0).

The goal of our algorithm is to correctly identify Skype flows; hence, as performance metrics, we have considered the percentages of False Positives (FP) and False Negatives (FN), according to the definitions given in [6]. Since the lengths of the analyzed flows
are quite different, these quantities have been estimated on a per-flow and on a per-byte basis, in order to gain insight not only into the number of flows incorrectly classified, but also into the amount of traffic involved. As far as "general purpose" statistical classifiers are concerned, we implemented the following algorithms:

– Naive Bayes Classifier (NB)
– Linear Discriminant Analysis (LDA)
– k-Nearest Neighbor (k-NN)
– Support Vector Machine (SVM)
In order to get the most out of these classifiers, we first extracted the best features, according to the Sequential Backward Selection (SBS) algorithm, and then normalized them, so as to obtain homogeneous values. The parameters of the classifiers have been tuned during an appropriate training phase. The results are summarized in Tables 1 and 2 for UDP and TCP flows, respectively.

Table 1. UDP flows

               Our Algorithm  Skype Classifier  NB     LDA    k-NN   SVM
FN (Flows) %   4.8            97.57             11.69  9.44   1.9    2.01
FN (Bytes) %   0.06           26.32             95.98  95.98  6.86   7.88
FP (Flows) %   0              2.4               11.83  13.74  10.95  14.94
FP (Bytes) %   0              16.85             6.12   7.27   5.33   17.47

Table 2. TCP flows

               Our Algorithm  Skype Classifier  NB     LDA    k-NN   SVM
FN (Flows) %   27.46          84.95             28.41  20.04  24.59  24.44
FN (Bytes) %   0.64           56.38             33.07  14.45  27.01  39.48
FP (Flows) %   0.01           9.68              4.49   6.95   5.08   3.37
FP (Bytes) %   0.001          94.58             7.27   2.48   0.56   0.11
The previous results highlight that our algorithm outperforms both the general-purpose statistical approaches and the Skype classifier of [6], since it also exploits the knowledge of some Skype "signatures". The high value of FN at the flow level for TCP traffic is due to the fact that, to date, the features characterizing such exchanges of data between SCs have not yet been identified. Nevertheless, it is important to highlight that the unclassified flows correspond to a very low quantity of traffic (bytes), which means that only signalling traffic goes unclassified, while the calls are correctly classified.
5 Conclusions

In this paper, we have proposed a real-time algorithm to detect and classify Skype traffic. In more detail, the presented method, by means of both signature-based and statistical
procedures, is able to correctly reveal and classify the signalling traffic as well as the data traffic (calls and file transfers). The performance analysis has shown that our algorithm achieves very good results over several types of traffic traces, representative of different access networks. Moreover, we have shown that our system outperforms both the "classical" statistical traffic classifiers and a state-of-the-art ad-hoc Skype classifier.
Acknowledgment The authors would like to thank Federico Gentile and Fabio Strazzera for their contribution in support of the presented work. This work has been partially sponsored by the European Project FP7-ICT PRISM, contract number 215350.
References

1. Skype web site, http://www.skype.com (accessed on 2009/04/10)
2. Telegeography web site, http://www.telegeography.com/ (accessed on 2009/04/10)
3. Skype by the numbers web site, http://apple20.blogs.fortune.cnn.com/2009/03/31/skype-by-the-numbers/ (accessed on 2009/04/10)
4. Skype users online now web site, http://idisk.mac.com/hhbv-Public/OnlineNow.htm (accessed on 2009/04/10)
5. Baset, S.A., Schulzrinne, H.G.: An analysis of the Skype peer-to-peer Internet telephony protocol. In: INFOCOM 2006, 25th IEEE International Conference on Computer Communications, pp. 1-11 (2006)
6. Bonfiglio, D., Mellia, M., Meo, M., Rossi, D., Tofanelli, P.: Revealing Skype traffic: when randomness plays with you. SIGCOMM Comput. Commun. Rev. 37(4), 37-48 (2007)
7. Suh, K., Figueiredo, D.R., Kurose, J., Towsley, D.: Characterizing and detecting Skype-relayed traffic. In: Proceedings of IEEE INFOCOM 2006 (2006)
8. Guha, S., Daswani, N., Jain, R.: An experimental study of the Skype peer-to-peer VoIP system. In: IPTPS 2006: The 5th International Workshop on Peer-to-Peer Systems, Microsoft Research (2006)
9. Rossi, D., Mellia, M., Meo, M.: Understanding Skype signaling. Comput. Netw. 53(2), 130-140 (2009)
10. Tstat - TCP statistic and analysis tool web site, http://tstat.tlc.polito.it/index.shtml (accessed on 2009/04/10)
HTTP Traffic Measurements on Access Networks, Analysis of Results and Simulation

Vladimir Deart1, Vladimir Mankov1, and Alexander Pilugin2

1 Alcatel-Lucent Training Center, Department of Access Networks, 111024 Aviamotornaya st. 8a, Moscow, Russia
[email protected], [email protected]
2 Moscow Technical University of Communications and Informatics, Department of Automatic Telecommunication, 111024 Aviamotornaya st. 8a, Moscow, Russia
[email protected]

Abstract. Many new factors influence Internet traffic nowadays, including new features of HTTP/1.1, the introduction of new browsers, the extremely high penetration of P2P applications, and changes in WEB content. A new toolkit was used by the authors to measure traffic on an access network of 1500 users in the center of Moscow. The collected data was processed at two levels: the TCP connection level and the HTTP message level. Intervals between TCP connections and HTTP Request and Response lengths were studied in detail. Based on the collected statistics, a new HTTP traffic model was built using the NS-2 simulator and the PackMIME package.

Keywords: WEB measurements, traffic models, HTTP simulation, NS-2.
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 180-190, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

During the period 1996-1999, a lot of traffic measurements in the Internet were carried out and, as a result, some effective models of HTTP traffic [10-13] were developed. Ten years later, we see the next wave of interest in traffic measurements in the Net, as traffic now differs greatly from what it was in those days. Users of modern multiservice access networks share a lot of applications based on different protocol stacks. According to recent measurement results [1, 2], the most popular Internet application is still HTTP, which generates the major part of traffic in the Internet.

The leading position of HTTP in the current Internet is determined by several reasons. Firstly, WEB browsing is still the most popular service on the Internet, and the files transferred by users have become longer. Secondly, there are many new Internet applications based on HTTP, such as instant messengers, SW updating programs, and P2P applications.

One of the most important tasks of an Internet Service Provider is to monitor traffic and QoS parameters to keep its users satisfied. Along with general traffic measurements for HTTP throughput analysis, more detailed statistical data should be obtained. The most detailed statistical data can be collected if all packet headers are
stored in a file for future analysis, but this approach can cause problems in storing and handling a large amount of data. Several SW tools for the monitoring and measurement of HTTP traffic currently exist: monitoring by means of the SNMP protocol [3, 4], monitoring by NetFlow [5, 6], monitoring based on the "black box" principle as in GigaScope [17], and monitoring by means of TCPdump [7]. Monitoring via the SNMP protocol and by the NetFlow package, embedded in router SW, is good for real-time measurements but does not provide a sufficient level of granularity. With a "black box" monitoring tool, we could neither estimate the fidelity of the measured data nor change the analysis algorithm as would be necessary for specific investigation tasks. The best choice for obtaining detailed statistical data is to use tools based on TCPdump utilities.

The data analyzed in this paper was collected through online processing on a gateway node of a typical public Moscow Internet Service Provider network with more than 1500 subscribers. All data from incoming and outgoing flows were collected via WireShark and stored in TCPdump format. The data was collected over three days in February 2008, and finally a dataset of 34 hours was analyzed as a representative one in this paper. The new statistical characteristics of HTTP behavior differ from previous results but make it possible to build a new simulation model of HTTP traffic in NS-2. The simulation results show a good similarity to the measured HTTP characteristics.

The rest of this paper is organized as follows. Section 2 discusses the method of data collection and processing. Section 3 describes the measurement results and their analysis. Section 4 includes a description of the simulation model and an analysis of the simulation results. Finally, the conclusion is given in Section 5.
2 Data Collection and Processing

The main objective of this paper is to analyze HTTP characteristics that could be implemented in a simulation model for the estimation of throughput and QoS in access networks. That is why we use the scheme shown in Fig. 1 for data collection. All users of this small public access network are connected to the central switch (Cisco 4506) via 1G Ethernet links. Outgoing and incoming traffic passes through Port 0 to the router (Cisco 7206). Due to the mirroring of this port, copies of all sent and received IP packets come through Span Port 0 to the measurement server with installed WireShark SW [8]. The collected data is stored in TCPdump files of about 100 Mbytes each (1). The next step is the handling of the TCPdump files with the Tshark utility (2). The purpose of this process is to filter the fields of the packet headers that will be stored in the database in clear text format. Depending on the traffic volume and the throughput of the server, this can be done nearly in real time. The processed data is injected into a MySQL database (3). Modern SQL engines make the injection in real-time mode, which means the entire process of capturing packets on the server network interface, handling the TCPdump files with Tshark, and injecting the data into the DB can be a real-time procedure. The following step (4) prepares the data for processing (creating new tables in the DB, adding new fields, filtering). Further data processing is done by means of PHP scripts (5), which is especially useful when dynamic SQL queries are needed. For the handling of SQL queries, a WEB server and a PHP interpreter are needed. Further data handling is done by a statistical SW package.
Fig. 1. Data Collection and Processing
In practice, a PC platform with the following characteristics was used for measurements and data processing: CPU 2400 MHz, RAM 4096 MB, HDD RAID0 1 TB, OS Microsoft Windows Server 2003 R2, Wireshark 1.0.2, MySQL server 5.1, Apache HTTP server 2.2.9, PHP 5.2.6. Experimental results show that the throughput of the system depends on the RAM size and the disk subsystem performance.
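Steps (2)-(3) of this pipeline could be approximated as follows (a sketch only: the exact tshark field list and database schema are not given in the paper, so the ones below are assumptions, and SQLite stands in for MySQL to keep the example self-contained):

```python
import sqlite3

# Hypothetical field list; the paper does not specify which header fields were kept.
TSHARK_CMD = ["tshark", "-r", "capture.pcap", "-T", "fields",
              "-e", "frame.time_epoch", "-e", "ip.src", "-e", "ip.dst",
              "-e", "tcp.srcport", "-e", "tcp.dstport", "-e", "frame.len",
              "-E", "separator=|"]

def load_rows(lines, db_path=":memory:"):
    """Parse tshark '-T fields' output (one packet per line, '|'-separated)
    and inject it into an SQL table, mirroring steps (2)-(3) of the pipeline."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS packets "
                "(ts REAL, src TEXT, dst TEXT, sport INT, dport INT, plen INT)")
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 6:
            continue  # skip malformed lines
        ts, src, dst, sport, dport, plen = parts
        con.execute("INSERT INTO packets VALUES (?, ?, ?, ?, ?, ?)",
                    (float(ts), src, dst, int(sport), int(dport), int(plen)))
    con.commit()
    return con
```

In the authors' setup the rows go to MySQL and are then post-processed with PHP scripts; here the tshark command would be launched with subprocess and its stdout fed line by line to load_rows.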
3 Measurement Results

In the measured network, several services are available to users: WEB surfing, E-mail, VoIP, and intranet services including file transfer. As the provider supplies users with unique IP addresses, they can install their own WEB servers and use P2P applications. Only packets transferred by TCP with source or destination port 80 were selected from the data flow for further processing.

Table 1. Specification of Analyzed Data Set

Bytes up: 2 091 467 796
Bytes down: 10 634 203 832
Avg. throughput up: 4.64 Mbps
Avg. throughput down: 23.632 Mbps
Number of TCP connections: 715 427
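The average throughput figures in Table 1 follow from the byte counts; assuming a 3600 s busy-hour interval and 1 Mbps = 10^6 bit/s, a quick sanity check reproduces them to within rounding:

```python
def mbps(byte_count: int, seconds: float) -> float:
    # average throughput in megabits per second (1 Mbps = 10**6 bit/s)
    return byte_count * 8 / seconds / 1_000_000

up = mbps(2_091_467_796, 3600)     # about 4.65 Mbps upstream
down = mbps(10_634_203_832, 3600)  # about 23.63 Mbps downstream
```

The small difference from the reported upstream value of 4.64 Mbps presumably comes from rounding of the measurement interval.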
The data was collected over three days in February 2008. From this period, a data set of 34 consecutive hours was chosen for busy-hour selection. The busy hour was defined as the interval between 11:00 and 12:00, and all data collected during this period were analyzed. A summary of the analyzed data is shown in Table 1.

The session-oriented model of HTTP traffic was used to define the metrics of HTTP behavior in this study. In order to provide the essential characteristics for the simulation model, we determine the following metrics:

1. The HTTP message level includes the HTTP Request and Response lengths.
2. The TCP connection level includes the intervals between TCP connections. Due to the persistent connections of HTTP/1.1, there can be more than one request/response per connection, so we need to measure the count of Requests (GET commands) in the connection and the interval between them.

A summary of the statistical results for the defined metrics is given in Table 2. Most of the metrics do not fit any existing probability distribution; that is why we split each empirical distribution into ranges and try to find a well-fitting probability distribution for each range.

Table 2. Summary of probability distributions of defined metrics

Metric                             Range                Distribution (parameters)
Interval between TCP connections   0-0.0025 sec         Uniform (min=0, max=0.0025)
                                   0.0025-0.5 sec       Exponential (λ=134.13)
Request (GET) count in             1-9                  Empirical table (1: 63.33%; 2: 19.18%;
TCP connection                                          3: 8.44%; 4: 2.39%; 5: 1.82%; 6: 1.11%;
                                                        7: 0.92%; 8: 0.54%; 9: 0.38%; >9: 1.96%)
Interval between Requests (GET)    0-0.035 sec          Uniform (min=0, max=0.035)
in TCP connection                  0.035-0.5 sec        Exponential (λ=8.67)
                                   0.5-1250 sec         Log-Log (median=2.64; shape=1.23;
                                                        lower threshold=0.50)
HTTP Request length                0-1500 bytes         Normal (mean=624, stand. deviation=232)
HTTP Response length               45-10000 bytes       Weibull (shape=0.992; scale=2338.26)
                                   10000-100000 bytes   Gamma 3-par. (shape=0.848; scale=0.0000479;
                                                        lower thr.=10039.0)
3.1 Interval of TCP Connections

The calculation of the interval of TCP connections was based on measuring the time period between the arrivals of two subsequent SYN packets. We did not distinguish whether these TCP connections belong to one flow or another, because the process of generating HTTP traffic is very changeable: one user can generate tens of simultaneous TCP connections, while tens of other users may generate 1-2 TCP connections each. This means that the intensity of the data flow on a link will influence the mean value of the interval, but we are really interested in the form of the distribution. The distribution of the interval of TCP connections is shown in Fig. 2, and it does not fit any existing probability distribution. More than 50% of all intervals are between 0 and 2.5 milliseconds. The mean for the measured traffic is 5.47 milliseconds.

The interval of TCP connections is affected by the following factors: the WEB browser and user behavior. In our case, initial requests were coming from a number of independent users, thus constructing a flow with exponentially distributed intervals between TCP connections. Modern browsers, after handling the first Response, start to open parallel TCP connections to make the file transfer from the server faster [14], which means we see a lot of very short intervals. Taking this assumption into account, we split the distribution of Fig. 2 into two distributions: a uniform one on the timescale of 0-2.5 ms and an exponential one on the timescale of 0-0.05 s (Fig. 3). Initial requests are generated by users according to an exponential distribution of intervals, while further requests can be generated by the browser or by the user, in a proportion calculated from the 0-2.5 ms interval. For the generation of TCP connections, we will use an exponential distribution for initial connections, while for additional TCP connections generated by browsers a uniform distribution will be used. This scheme is consistent with the results for WEB flows described in [9].
Fig. 2. Distribution of intervals of TCP connections
Fig. 3. Presentation of the distribution of intervals between TCP connections as a sum of uniform (a) and exponential (b) distributions
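A generator for this two-component interval model might look as follows (the mixing proportion p_browser is an assumed parameter that would be estimated from the measured share of short intervals; the uniform bound and the exponential rate are taken from Table 2):

```python
import random

def tcp_interval(p_browser=0.5, uniform_max=0.0025, rate=134.13, rng=random):
    """Draw one inter-connection interval (seconds) from the two-part model:
    with probability p_browser a browser-generated connection with a
    Uniform(0, 0.0025 s) interval, otherwise a user-initiated connection
    with an Exponential(lambda = 134.13 1/s) interval."""
    if rng.random() < p_browser:
        return rng.uniform(0.0, uniform_max)
    return rng.expovariate(rate)
```

Feeding such a mixture into a traffic generator reproduces both the spike of very short browser-driven intervals and the exponential tail of user-initiated connections.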
3.2 Request Count in TCP Connection

Persistent connections are a default feature of HTTP/1.1 that allows the browser to support pipelining, i.e., to send more than one request (GET) per TCP connection. The number of Request/Response pairs in a TCP connection defines the total amount of received data. The distribution of the request count in a TCP connection and its CDF are shown in Fig. 4. More than 63% of TCP connections transport only one HTTP request/response pair. This result is consistent with [9], where the share of such connections was about 83%.
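Drawing the number of GET requests per connection from the empirical table of Table 2 can be sketched with a cumulative lookup (collapsing the ">9" tail to 10 is a simplification of this sketch):

```python
import random

# Empirical distribution of Requests (GET) per TCP connection (Table 2);
# the ">9" tail is collapsed to 10 here for simplicity.
GET_COUNT_PCT = {1: 63.33, 2: 19.18, 3: 8.44, 4: 2.39, 5: 1.82,
                 6: 1.11, 7: 0.92, 8: 0.54, 9: 0.38, 10: 1.96}

def draw_get_count(rng=random):
    """Inverse-CDF sampling over the empirical percentage table."""
    u = rng.random() * 100.0
    acc = 0.0
    for count, pct in GET_COUNT_PCT.items():
        acc += pct
        if u < acc:
            return count
    return 10  # guard against floating-point round-off at the top of the CDF
```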
Fig. 4. Distribution of Requests (GET) in TCP connection: a) PDF, b) CDF
3.3 Length of HTTP Response

The length of an HTTP Response depends on the type of file transferred: for HTML files, the length is smaller than for image and multimedia files. The measured mean size of a response is about 7.8 KB and the median is about 2116 bytes, which is consistent with [10, 11, 12]. The distribution of the HTTP Response length is shown in Fig. 5. The mean response size is about four times larger than the median size, which demonstrates the heavy-tailed character of the HTTP Response length distribution [10, 11]. Mah et al. [10] and Smith et al. [11] suggested using the Pareto distribution to characterize the HTTP Response size. In [9], Shuai et al. used the Generalized Pareto distribution, which adds a scale parameter. Both assumptions were unacceptable
Fig. 5. Distribution of HTTP Response length
Fig. 6. Distribution of HTTP Response length split into two ranges: a) Weibull distribution for the range 1-10 KB, b) Gamma distribution with 3 parameters for the range 10-100 KB
for the measured data; to find a well-fitting probability distribution, we had to split the data into two ranges (1-10 KB and 10-100 KB), as shown in Fig. 6. A Weibull distribution fits well in the first range, and a Gamma distribution with 3 parameters fits in the second range.

3.4 Length of HTTP Request

The distribution of the HTTP Request size fits a Normal distribution with a mean of 624 bytes, which is consistent with the result of [9].
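A response-length generator based on the two fitted ranges might be sketched as follows. Two assumptions are ours, not the paper's: the share of short responses p_short, and the reading of the Gamma "scale" value 0.0000479 in Table 2 as a rate in 1/bytes (its reciprocal, about 20 876 bytes, is the scale); with p_short near 0.8, the mixture mean then comes out close to the measured 7.8 KB:

```python
import random

def http_response_len(p_short=0.8, rng=random):
    """Sample an HTTP Response length (bytes) from the two fitted ranges:
    Weibull(shape=0.992, scale=2338.26) for short responses and a shifted
    Gamma(shape=0.848, lower threshold=10039 B) for long ones. p_short and
    the rate reading of the Gamma scale are assumptions of this sketch."""
    if rng.random() < p_short:
        # random.weibullvariate takes (scale, shape)
        return rng.weibullvariate(2338.26, 0.992)
    # random.gammavariate takes (shape, scale); shift by the lower threshold
    return 10039.0 + rng.gammavariate(0.848, 1 / 0.0000479)
```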
4 Simulation Further investigation includes building of the simulation model based on the measurement results. An NS-2 simulator [15] with PackMIME Package [16] was used for the model. Two traffic generators was used for producing short (1-10 KB) and long (10-100 KB) HTTP Responses. All other measured distributions were included in the model and PackMIME module was updated to be able to generate distribution of intervals between TCP connections shown on Fig. 3. Results of the simulation made for 3 independent 1 min periods shown in the Table 3. Table 3. Comparison of measured and simulated parameters
Parameter                   | Measured (3600 s) | Mean per 60 s | Sim. (1)    | Sim. (2)    | Sim. (3)
Packets count in upstream   | 8 556 290         | 142 604       | 187 604     | 188 093     | 180 496
Bytes count in upstream     | 2 091 467 796     | 34 857 796    | 21 607 387  | 22 346 238  | 22 410 250
Packets count in downstream | 10 234 226        | 170 570       | 160 471     | 166 842     | 167 368
Bytes count in downstream   | 10 634 203 832    | 177 236 730   | 182 179 226 | 189 048 596 | 190 110 551
TCP connection count        | 715 427           | 11 923        | 11 442      | 11 961      | 11 913
Requests (Get) count        | 1 291 810         | 21 530        | 22 843      | 23 779      | 23 760
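The two-generator scheme used in the simulation (short Weibull-range responses plus long Gamma-range responses) can be sketched in stdlib Python. The 90/10 mix and all parameters below are illustrative assumptions, not the measured values; the actual model was built in NS-2 with PackMIME.

```python
import random

random.seed(7)

def http_response_size():
    """Draw a response size (bytes) from the two-range model:
    a Weibull-like generator for 1-10 KB and a Gamma-like generator
    for 10-100 KB. Mix and parameters are illustrative only."""
    if random.random() < 0.9:
        s = 1000 + random.weibullvariate(3000, 0.7)
        return min(s, 9999)          # keep the short generator in its range
    s = 10000 + random.gammavariate(2.0, 15000)
    return min(s, 99999)             # keep the long generator in its range

sizes = [http_response_size() for _ in range(10000)]
short = sum(1 for s in sizes if s < 10000)
print("short responses:", short, "long responses:", len(sizes) - short)
```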
All simulation results for each 1-min period correspond closely to the measured values, which shows good agreement between simulated and real traffic. The simulation model makes it possible to vary the parameters of the simulated traffic over a wide range and to investigate the influence of each parameter on QoS (delays, packet loss, etc.). The most important factors influencing QoS parameters are:
1. Bursts of packet arrivals created by TCP;
2. HTTP responses with a long-tailed distribution in the range 1-100 KB, which cause rare but very long series of packets arriving at a buffer in a very short time;
3. Additional TCP connections generated by browsers at very short intervals.

To show the magnitude of the influence of these factors on QoS parameters, we present one simulation example. The simulation was done for a model with the buffer size limited to 100 packets, to estimate the delays in a router's buffer under real characteristics of incoming HTTP traffic. Results of the simulation are shown in Fig. 7.
Fig. 7. Comparison of the simulated and calculated queue delays
Delays calculated via Little's formula for the M/M/1/100 model are 5-10 times smaller than the simulated delays. We can note the following reasons for this difference: 1. Incoming traffic consists of initial TCP connections with exponentially distributed intervals plus additional TCP connections generated by browsers at very short intervals. 2. HTTP responses have a long-tailed distribution in the range 1-100 KB, which causes very long series of packets arriving at a buffer in a very short time.
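A minimal Lindley-recursion sketch (plain Python, infinite buffer, unit-mean exponential service) illustrates why bursty arrivals inflate queueing delay well beyond the M/M/1 prediction. The load, tail index, and trace length are illustrative assumptions, not the paper's NS-2 setup.

```python
import random

random.seed(42)

def mean_wait(interarrivals, mean_service=1.0):
    # Lindley recursion: w_{n+1} = max(0, w_n + service_n - interarrival_n)
    w, total = 0.0, 0.0
    for a in interarrivals:
        w = max(0.0, w + random.expovariate(1.0 / mean_service) - a)
        total += w
    return total / len(interarrivals)

n, load = 200_000, 0.8                      # illustrative load
poisson_ia = [random.expovariate(load) for _ in range(n)]

alpha = 1.5                                 # heavy-tailed inter-arrivals
xm = (1.0 / load) * (alpha - 1) / alpha     # Pareto scale matching the same mean
bursty_ia = [xm * random.paretovariate(alpha) for _ in range(n)]

w_poisson = mean_wait(poisson_ia)           # should sit near M/M/1 theory
w_bursty = mean_wait(bursty_ia)
print(f"mean wait, Poisson: {w_poisson:.2f}  bursty: {w_bursty:.2f}")
```

With the same offered load, the heavy-tailed inter-arrival stream clumps packets into bursts, so the mean wait grows far beyond the Markovian value, which is the qualitative effect observed in Fig. 7.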
5 Summary

A new toolkit for measuring HTTP traffic has been presented in this paper. Data collection on a typical Moscow Internet Service Provider network was carried out for three days to obtain representative statistics. The measured results were processed to investigate HTTP- and TCP-level metrics. For some metrics we found no large difference from previous measurements [9, 10, 11, 12]. The authors have suggested a new model for the generation of TCP connections. The generation of HTTP responses is suggested to be done with two generators: the first with Weibull-distributed file sizes in the range 1-10 KB, the second with Gamma-distributed file sizes in the range 10-100 KB.
A new traffic simulation model was built with two types of TCP connection generation and two HTTP response generators. Simulation results show that delays in the router's buffer are 5-10 times higher than the calculated delays. For further investigation, a new analytical model of HTTP traffic should be developed.
References

1. The Academic Research group at Sprint, https://research.sprintlabs.com/index.php
2. CAIDA Internet Data - Realtime Monitors, http://www.caida.org/data/realtime/index.xml
3. Tobi Oetiker's MRTG - The Multi Router Traffic Grapher, http://oss.oetiker.ch/mrtg/
4. PRTG Network Monitor, http://www.paessler.com/prtg
5. Cisco IOS NetFlow, http://www.cisco.com/en/US/products/ps6645/products_ios_protocol_option_home.html
6. JUNOSe J-Flow, http://www.juniper.net/techpubs/software/erx/junose82/swconfig-ip-services/html/ip-jflow-stats-config2.html
7. TCPdump, http://www.tcpdump.org/tcpdump_man.html
8. Wireshark network protocol analyzer, http://www.wireshark.org/
9. Shuai, L., Xie, G., Yang, J.: Characterization of HTTP Behavior on Access Networks in Web 2.0. In: Telecommunications, 2008. ICT 2008, pp. 1-6, St. Petersburg (2008)
10. Mah, B.: An Empirical Model of HTTP Network Traffic. In: INFOCOM 1997. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Driving the Information Revolution, p. 592. IEEE Computer Society, Washington (1997)
11. Smith, F.D., Hernández Campos, F., Jeffay, K., Ott, D.: What TCP/IP Protocol Headers Can Tell Us About the Web. In: ACM SIGMETRICS Performance Evaluation Review, pp. 245-256. ACM, New York (2001)
12. Crovella, M., Bestavros, A.: Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes. IEEE/ACM Transactions on Networking 5(6), 835-846 (1997)
13. Choi, H.-K., Limb, J.: A Behavioral Model of Web Traffic. In: Proceedings of the Seventh Annual International Conference on Network Protocols, p. 327. IEEE Computer Society, Washington (1999)
14. Hopkins, A.: Optimizing Page Load Time, http://www.die.net/musings/page_load_time/
15. The Network Simulator - NS-2, http://www.isi.edu/nsnam/ns/
16. Cao, J., Cleveland, W., Gao, Y., Jeffay, K., Smith, F., Weigle, M.: Stochastic Models for Generating Synthetic HTTP Source Traffic. In: INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, Hong Kong, vol. 3, pp. 1546-1547 (2004)
17. Cranor, C., Johnson, T., Spatscheck, O.: Gigascope: How to monitor network traffic 5 Gbit/sec at a time. AT&T Labs (2003)
The Video Streaming Monitoring in the Next Generation Networks A. Paramonov1, D. Tarasov2, and A. Koucheryavy2 1
St. Petersburg Science Research Telecommunication Institute (LONIIS), 196128 St. Petersburg, Warshavskay str.11, Russia
[email protected] 2 Central Science Research Telecommunication Institute (ZNIIS), 111141 Moscow, 1 Proezd Perova Polja, Russia
[email protected],
[email protected]

Abstract. The Next Generation Network (NGN) Monitoring System (NMS) is considered. One of the most important problems in building an NMS is video streaming monitoring, and this paper studies the features of IPTV traffic monitoring. Self-similar traffic characteristics, Hurst parameter estimation methods, the inverse wavelet transform with the multifractal wavelet model (MWM), and the ONOFF model are used in the paper. The key results are the determination of the IPTV traffic aggregation interval and a comparison of a service parameter (queue length) in a G/M/1 system for actual (measured) traffic, inverse-wavelet-transform (MWM) traffic, and ONOFF traffic with similar values of the Hurst parameter.
1 Introduction

The NGN (Next Generation Network) concept is the most important direction of network development today. The NGN concept is defined in ITU-T Recommendations Y.2012 and Y.2021 [1, 2] and gives operators the possibility of implementing an unlimited number of services on a single base - the public packet-switched network. However, the complexity of NGN technical means and the growing number of vendors demand more and more attention to testing [3] and monitoring [4] methods for NGN. The NGN monitoring system (NMS) structure is defined in ITU-T Recommendation Q.3902 [4] and shown in Fig. 1. The NGN monitoring system includes the following subsystems:

− SS7 monitoring subsystem,
− SIP monitoring subsystem,
− H.248 monitoring subsystem.

All of these subsystems are independent of vendor technical means and can be realized on the basis of probes produced by any licensed party. The following subsystems depend on the vendor realization:

− faults and reconfigurations monitoring subsystem,
− IP flows monitoring subsystem,
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 191–205, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. NMS functional architecture (NMS Center connected to the faults and reconfigurations, SS7, SIP, H.248, IP flows, VoIP, video streaming, SLA/SLS, and application servers subsystems, and to the vendors' monitoring system)
− Video streaming monitoring subsystem,
− SLA/SLS (Service Level Agreement/Service Level Specification) monitoring subsystem,
− Application servers monitoring subsystem.
More subsystems can be added to the NMS in the future, but Fig. 1 shows the general base for an NMS. The set of monitored parameters is not fully determined yet, but some important positions are clear today: for example, losses, delays, and delay variation in accordance with ITU-T Recommendations Y.1540 [5] and Y.1541 [6]. Good experience in monitoring losses and delays was gained during SS7 monitoring system operation [7]. NGN applications bring new monitoring problems connected with the self-similar characteristics of IP traffic, especially video streaming (IPTV) traffic. Some ITU-T recommendations applicable to IPTV traffic monitoring exist today [8, 9], but the features of IPTV traffic require further study. The correct Hurst parameter is the most important functional parameter for IPTV traffic presentation, and the observation time needed for Hurst parameter measurement is required to interpret the monitoring results. Furthermore, wavelet analysis of self-similar traffic could probably be used for video streaming monitoring and for Hurst parameter estimation. Both of these problems of NMS creation are the key issues of the present paper.
2 Background and Related Works

IPTV traffic is self-similar. Self-similarity is determined via the autocorrelation function (ACF). Consider the process x = (x1, x2, ..., xt, ...), t = 1, 2, .... Further, consider the aggregated process over blocks of size m:
X^(m) = (X_1^(m), X_2^(m), ..., X_t^(m), ...)   (1)

where

X_t^(m) = (1/m)(X_{tm−m+1} + ... + X_{tm}).
If ACF_m = ACF for m = 2, 3, ..., the process x is self-similar. The Hurst parameter is a functional defined on the self-similar process. It can be found from:
ln( D(X^(m)) / D(X) ) = (2H − 2) ln(m),   (2)
where H is the Hurst parameter and D is the variance of X and X^(m), respectively. The variance D(X^(m)) is determined as:
D(X^(m)) = m^{2(H−1)} D(X)   (3)
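Equations (2)-(3) translate directly into the variance-time estimator of H. A minimal numpy sketch, not the authors' tool; the white-noise input merely checks that the estimator returns H ≈ 0.5 for uncorrelated traffic:

```python
import numpy as np

def hurst_variance_time(x, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter from the variance of the aggregated
    process: ln(D(X^(m))/D(X)) = (2H - 2) ln(m), cf. Eqs. (2)-(3)."""
    x = np.asarray(x, dtype=float)
    logs_m, logs_v = [], []
    for m in block_sizes:
        n = len(x) // m
        agg = x[: n * m].reshape(n, m).mean(axis=1)    # X^(m)
        logs_m.append(np.log(m))
        logs_v.append(np.log(agg.var() / x.var()))
    slope = np.polyfit(logs_m, logs_v, 1)[0]
    return 1.0 + slope / 2.0                           # H from 2H - 2 = slope

rng = np.random.default_rng(0)
h = hurst_variance_time(rng.normal(size=2 ** 16))
print("H estimate for white noise:", round(h, 2))
```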
We will use the wavelet transform for IPTV traffic estimation below. The wavelet transform and the corresponding reconstruction are determined as follows:
W(a, b) = (1/√a) ∫_{−∞}^{∞} X(t) ψ*((t − b)/a) dt,

X(t) = Σ_k U_{J0,k} φ_{J0,k}(t) + Σ_{J=J0}^{∞} Σ_k W_{J,k} ψ_{J,k}(t)   (4)
where U_{J,k} and W_{J,k} are the wavelet transform coefficients. There are many possible choices of the function ψ(t); we will use the Haar function below as a natural match for packet traffic. The Haar function ψ(t) is:
ψ(t) = { 1 for 0 ≤ t < 1/2;  −1 for 1/2 ≤ t < 1;  0 for t < 0 or t ≥ 1 }   (5)
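A minimal Haar decomposition consistent with Eqs. (4)-(5) can be sketched as follows; this is an illustrative numpy implementation, not the tool used in the paper:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet decomposition of a length-2^J signal,
    returning the final scaling coefficient and the detail coefficients
    per level (U and W in the notation of Eq. (4))."""
    u = np.asarray(x, dtype=float)
    details = []
    while len(u) > 1:
        even, odd = u[0::2], u[1::2]
        details.append((even - odd) / np.sqrt(2.0))    # W_{j,k}
        u = (even + odd) / np.sqrt(2.0)                # U_{j,k}
    return u[0], details

x = np.array([4.0, 2.0, 5.0, 5.0])
u0, d = haar_dwt(x)
print(u0, [list(w) for w in d])
```

The transform is orthonormal, so the squared coefficients preserve the signal energy; that makes the per-scale energies directly usable for the wavelet-based Hurst analysis discussed below.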
Fig. 2. ONOFF method
We will also use the ONOFF method to generate self-similar traffic for comparing Hurst parameter values [10]. The ONOFF method, as shown in Fig. 2, superposes several arrival processes. Usually the distribution of ON and OFF intervals follows a Pareto law. As an example, we use a model with three traffic generators, each generating a Poisson arrival process. The diagram for one separate generator is shown in Fig. 3. The traffic of all generators was aggregated by the ONOFF method, and the combined arrival process is shown in Fig. 4. In the first case H = 0.56 and the arrival process can be considered to follow a Poisson law; in the second case H = 0.92 and we can consider this process self-similar. There are many papers analyzing IPTV traffic. Hurst parameter estimation methods are studied in [11], where the Hurst parameters of movies and music videos (Star Wars, Hackers, and so on) from [12] are compared across different estimation methods. It should be noted that the conclusion of [11] is that estimation methods for the Hurst parameter can give poor estimates, except the wavelet method and the HEAF (Hurst Exponent Autocorrelation Function) method proposed in that paper. A videoconference traffic model is considered in [13], where the Discrete Autoregressive model (DAR) is used for traffic modeling and a comparison between actual and model traffic is discussed. The multifractal wavelet model (MWM) for network application traffic was proposed in [14]. The main idea is to use network measurement results to calculate the U_J and W_J coefficients of the wavelet transform. The next idea from [14] is the calculation of new coefficients U_{J,k} and W_{J,k} for the traffic model on the basis of random numbers A_{i,k_i} from the interval
− 1 ≤ Ai ,ki ≤ 1 .
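The ONOFF construction described above can be sketched in stdlib Python; the number of sources, rates, and Pareto index are illustrative assumptions, not the values used in the paper's experiment:

```python
import random

random.seed(3)

def onoff_trace(n_sources=3, slots=10000, rate_on=5.0, alpha=1.5):
    """Aggregate packet counts per time slot from several ON/OFF sources.
    ON and OFF period lengths are Pareto distributed (heavy-tailed),
    which is what produces self-similarity in the aggregate [10]."""
    total = [0] * slots
    for _ in range(n_sources):
        t, on = 0, random.random() < 0.5
        while t < slots:
            length = int(random.paretovariate(alpha)) + 1   # period length, slots
            for s in range(t, min(t + length, slots)):
                if on:
                    # approximately Poisson packet arrivals while ON
                    total[s] += sum(1 for _ in range(20)
                                    if random.random() < rate_on / 20)
                # nothing is emitted in the OFF state
            on = not on
            t += length
    return total

trace = onoff_trace()
print("mean packets/slot:", sum(trace) / len(trace))
```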
The coefficients W_{J,k} and U_{J,k} are determined as follows:

W_{J,k} = A_{J,k} U_{J,k},   U_{J,k} = 2^{−J/2} U_{0,0} ∏_{i=0}^{J−1} [1 + (−1)^{k_i} A_{i,k_i}]   (6)
The inverse wavelet transform is made on the basis of the new coefficients (6):

X(t) = Σ_k U_{J0,k} φ_{J0,k}(t) + Σ_{J=J0}^{∞} Σ_k W_{J,k} ψ_{J,k}(t)   (7)
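The positivity-preserving multiplicative construction behind Eqs. (6)-(7) can be sketched as a binomial cascade: each interval's mass is split between its two children by a random factor in (0, 1). The uniform choice of A and its bound are illustrative assumptions, not the MWM parameters fitted in [14]:

```python
import random

random.seed(11)

def mwm_synthesize(levels=12, total=1.0):
    """Multiplicative cascade at the heart of the MWM [14]: starting from
    the coarsest scaling coefficient, each interval's mass is split by a
    random factor (1 +/- A)/2 with A in (-1, 1), guaranteeing that every
    synthesized traffic value stays non-negative."""
    data = [total]
    for _ in range(levels):
        nxt = []
        for u in data:
            a = random.uniform(-0.6, 0.6)        # A_{j,k}; bound is illustrative
            nxt += [u * (1 + a) / 2, u * (1 - a) / 2]
        data = nxt
    return data

trace = mwm_synthesize()
print(len(trace), min(trace) >= 0, abs(sum(trace) - 1.0) < 1e-9)
```

In the full MWM the A_{j,k} are fitted per scale from the measured wavelet energies, which is how the model reproduces the measured correlation structure.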
Fig. 3. Poisson arrival process

Fig. 4. Self-similar arrival process
Substituting (6), the coefficients take the form

W_{J,k} = 2^{−J/2} A_{J,k} U_{0,0} ∏_{i=0}^{J−1} [1 + (−1)^{k_i} A_{i,k_i}],
U_{J,k} = 2^{−J/2} U_{0,0} ∏_{i=0}^{J−1} [1 + (−1)^{k_i} A_{i,k_i}]
We will use (7) below to compare the model and measured traffic. A software tool implementing the MWM is available at [15].
3 The IPTV Traffic Measurements

The IPTV traffic measurements for the case study in this paper were made on the ZNIIS model network [16]. The actual traffic of a movie titled Ohotnik (a thriller) played from DVD was analyzed for different time fragments and different aggregation intervals in the processing of the measured data. The minimum observed movie fragment is 60 s; it includes 6863 frames, each of length 1356 bytes. Figs. 5, 6 and 7 show the measured traffic for 50 ms, 100 ms and 1000 ms aggregation intervals, respectively. The Hurst parameter estimation and the autocorrelation function are shown in Figs. 8 and 9. Here H = 0.48, so at first sight the measured traffic flow follows a Poisson law. The wavelet transform for a Poisson flow is shown in Fig. 10 and the wavelet transform for the measured traffic in Fig. 11. Analysis of the data in Figs. 10 and 11 shows that the Hurst parameter is not an efficient metric of traffic self-similarity for a small observation interval and, of course, for a small aggregation interval.
Fig. 5. The measured traffic for 50 ms aggregation interval
IPTV traffic is asymptotically self-similar. Let us consider the dependence of the Hurst parameter estimate on the aggregation interval length. A fragment of Ohotnik was also analyzed for 45 minutes. Fig. 12 shows the dependence of the Hurst parameter on the aggregation interval up to 100 000 ms. As we see, a reliable estimation can be made only for aggregation intervals above 50 000 ms.
Fig. 6. The measured traffic for 100 ms aggregation interval
Fig. 7. The measured traffic for 1000 ms aggregation interval
Fig. 8. The Hurst line estimation
Fig. 9. The autocorrelation function

Fig. 10. Wavelet transform for Poisson traffic
Fig. 11. Wavelet transform for measured traffic (fragment 60s)
Fig. 12. The Hurst parameter dependence from aggregation interval
4 The Hurst Parameter and Wavelet Transform Used for Monitoring IPTV Traffic

A fragment of the Ohotnik traffic of 260 000 packets and 45 minutes length is shown in Fig. 13 for the measured traffic and in Fig. 14 for the MWM model traffic, for different aggregation intervals. The mean interval between packets is t = 0.0109 s in the first case and t = 0.0112 s in the second. The mean packet flow rate is a = 91.74 packets per second and a = 89.29 packets per second for the actual and model traffic, respectively. Fig. 15 shows the autocorrelation function (ACF) and Fig. 16 the Hurst parameters for both cases. The ACF is determined as follows:
r(k) = Σ_{i=1}^{N−k} (X_i − X̄)(X_{i+k} − X̄) / ((N − k) σ²)

for the measured traffic, and

r(k) = (1/2)((k + 1)^{2H} − 2k^{2H} + (k − 1)^{2H})

for the MWM model traffic. The wavelet transforms for the measured and model traffic are shown in Fig. 17. Furthermore, let us try to create model traffic with a similar Hurst parameter value by the ONOFF method. Fig. 18 shows the wavelet transform of the model traffic created by the ONOFF method.
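Both ACF formulas above are straightforward to evaluate. A small numpy sketch, purely illustrative: a white-noise input is used only to check that both the empirical and the model ACF give r(k) ≈ 0 when H = 0.5:

```python
import numpy as np

def sample_acf(x, kmax=50):
    """Empirical autocorrelation r(k), as in the first formula above."""
    x = np.asarray(x, dtype=float)
    xm, var, n = x.mean(), x.var(), len(x)
    return np.array([((x[: n - k] - xm) * (x[k:] - xm)).sum() / ((n - k) * var)
                     for k in range(1, kmax + 1)])

def model_acf(k, h):
    """Model ACF r(k) = 0.5((k+1)^{2H} - 2k^{2H} + (k-1)^{2H})."""
    return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + abs(k - 1) ** (2 * h))

rng = np.random.default_rng(5)
r_emp = sample_acf(rng.normal(size=50000))
r_mod = np.array([model_acf(k, 0.5) for k in range(1, 51)])
print(np.abs(r_emp).max(), np.abs(r_mod).max())
```

For H > 0.5 the model ACF decays like k^{2H−2}, i.e. it is not summable, which is the long-range-dependence signature being tested against the measured trace.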
Fig. 13. The measured actual traffic for different aggregation intervals (ai – aggregation interval: 100 ms, 1000 ms, 10000 ms, 100000 ms)
Fig. 14. The MWM model traffic for different aggregation intervals (ai – aggregation interval: 100 ms, 1000 ms, 10000 ms, 100000 ms)
Fig. 15. The ACF for measured and model traffic
How can we compare the three processes X(t) (measured traffic), X̃(t) (model traffic created by the MWM) and X̂(t) (model traffic created by the ONOFF method) when they have similar values of the Hurst parameter? We propose to check a service parameter, for example the queue length, for all three traffic streams in the G/M/1 system shown in Fig. 19.
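The comparison of Fig. 19 can be sketched as a slotted, trace-driven queue. The arrival trace, service rate, and slot granularity here are illustrative assumptions, not the measured Ohotnik traces or the actual τ = 10 ms setup:

```python
import random

random.seed(9)

def queue_length_trace(arrivals_per_slot, service_rate):
    """Slotted approximation of the queue in Fig. 19: each slot brings a
    batch of packets and the server completes an approximately
    Poisson-distributed number of services (exponential service, G/M/1)."""
    q, lengths = 0, []
    for a in arrivals_per_slot:
        served = sum(1 for _ in range(200) if random.random() < service_rate / 200)
        q = max(0, q + a - served)
        lengths.append(q)
    return lengths

# Illustrative bursty arrival trace: Pareto-sized batches at random slots.
arrivals = [int(random.paretovariate(1.6)) if random.random() < 0.5 else 0
            for _ in range(20000)]
L = queue_length_trace(arrivals, service_rate=1.5)
print("average queue length:", sum(L) / len(L))
```

Feeding the measured, MWM, and ONOFF traces through the same routine and comparing the resulting average L is exactly the style of check whose outcomes are reported in Figs. 20-22.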
Fig. 16. The Hurst parameter for measured and model traffic
Fig. 17. The wavelet transform for measured and model traffic (left: measured, right: model)
Fig. 18. Wavelet transform for ONOFF traffic
Fig. 19. The G/M/1 system (inputs X(t), X̃(t), X̂(t); service time τ = 10 ms; observed queue length L(t))
Fig. 20. Average length for actual traffic (L = 209)

Fig. 21. Average length for MWM model traffic (L = 212)
Fig. 22. Average length for ONOFF traffic (L = 126)
The results for the measured traffic X(t) are shown in Fig. 20, for the MWM traffic X̃(t) in Fig. 21, and for the ONOFF traffic X̂(t) in Fig. 22. The average queue length for the ONOFF method is almost two times smaller than for the measured and MWM traffic. As we see, the Hurst parameter alone is not an efficient metric of IPTV traffic behavior; the wavelet transform is needed for better IPTV traffic estimation.
5 Conclusions

1. The data aggregation interval for monitoring video streaming should be not less than 5 s.
2. The Hurst parameter is not an efficient metric for IPTV traffic estimation. The wavelet transform, for example the MWM model, can be used for IPTV traffic estimation.
References

1. Recommendation Y.2012: Functional requirements and architecture of the NGN. Geneva, ITU-T (September 2006)
2. Recommendation Y.2021: IMS for Next Generation Networks. Geneva, ITU-T (September 2006)
3. Recommendation Q.3900: Methods of testing and model network architecture for NGN technical means testing as applied to public telecommunication networks. Geneva, ITU-T (September 2006)
4. Recommendation Q.3902: Operational parameters to be monitored when implementing NGN technical means in public telecommunication networks. Geneva, ITU-T (January 2008)
5. Recommendation Y.1540: Internet protocol data communication service – IP packet transfer and availability performance parameters. Geneva, ITU-T (November 2007)
6. Recommendation Y.1541: Network performance objectives for IP-based services. Geneva, ITU-T (February 2006)
7. Recommendation Q.752: Monitoring and measurements for Signalling System No. 7 networks. Geneva, ITU-T (June 1997)
8. Recommendation Y.1901: Requirements for the support of IPTV services. Geneva, ITU-T (January 2009)
9. Recommendation Y.1910: IPTV functional architecture. Geneva, ITU-T (August 2008)
10. Willinger, W., Taqqu, M., Sherman, R., Wilson, D.: Self-Similarity Through High Variability. IEEE/ACM Transactions on Networking 5(1) (1997)
11. Rezaul, K.M., Pakstas, A., Gilchrist, R., Chen, T.M.: HEAF: A Novel Estimator for Long-Range Dependent Self-similar Network Traffic. In: Koucheryavy, Y., Harju, J., Iversen, V.B. (eds.) NEW2AN 2006. LNCS, vol. 4003, pp. 34–45. Springer, Heidelberg (2006)
12. Fitzek, F.H.P.: Video and Audio Measurements for Pre-encoded Content, http://trece.Kom.aau.dk
13. Lazaris, A., Koutsakis, P., Paterakis, M.: On Modelling Video Traffic from Multiplexed MPEG-4 Videoconference Streams. In: Koucheryavy, Y., Harju, J., Iversen, V.B. (eds.) NEW2AN 2006. LNCS, vol. 4003, pp. 46–57. Springer, Heidelberg (2006)
14. Riedi, R.H., Crouse, M.S., Ribeiro, V.J., Baraniuk, R.G.: A Multifractal Wavelet Model with Application to Network Traffic. IEEE Transactions on Information Theory 45(3) (April 1999)
15. http://www-dsp.rice.edu/software/multifractal-wavelet-model
16. Vasiliev, A., Koucheryavy, A., Lee, K.O.: Methods of Testing the NGN Technical Facilities. In: International Conference on Advanced Communication Technologies (ICACT 2005), Proceedings, Phoenix Park, Korea, February 21–23 (2005)
The Poisson Cluster Process Runs as a Model for the Internet Traffic Jerzy Martyna Institute of Computer Science, Jagiellonian University, ul. Lojasiewicza 6, 30-348 Cracow, Poland
[email protected]

Abstract. In this paper we investigate a Poisson cluster process as a model of runs of packet arrivals on the Internet. The model assumes that the Poisson cluster process is characterized by runs of packets which correspond to defined clusters in the Poisson process. Using the form of the run lengths, we study the probability of the general number of cluster runs in the data stream. We illustrate how the obtained results can be used for the analysis of real-life Internet traffic.
1 Introduction

Mathematical modeling of network traffic constitutes only an excerpt of computer or telecommunication network analysis. Nevertheless, it is treated as an essential part of communication theory, because it determines the accuracy of the whole model and influences the solution. A traditional approach to the inter-arrival times of packets with a distribution of packet lengths is based on queueing theory. However, as has been shown in many papers on traffic measurements [2], [3], the structure of broadband traffic has some properties which cannot be explained by this theory. These are its two characteristic properties, namely heavy tails (see Fig. 1) and Long Range Dependence (LRD). They are caused by so-called far-reaching random variables which possess a large value of the coefficient of variation c = σ / t̄, defined as the quotient of the standard deviation and the mean of the observation time. Traditional queueing network models do not explain such properties [16]. Neither can the so-called ON/OFF model [9], [12], which is a Markov Modulated Poisson Process, explain those properties. Recently, in some papers a new traffic model for computer and telecommunication networks has been suggested which can explain the existence of heavy tails and LRD in traffic measurements, and which allows for modeling of IP packet traffic on the Internet. It is the model of the Poisson cluster process introduced by Bartlett [1], and further studied in [4], [18], or the branching Poisson process [15], [5]. The Poisson cluster process was applied, among others, to the modeling of broadband traffic in backbone networks [10], [11], [17], [14], [7].

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 206–216, 2009. © Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Tail of the exponential distribution (Exp(λ = 1.2)) and "heavy tails" of the Pareto distribution (α = 0.8 and α = 1.6)
One of the most interesting methods for network traffic measurement, needed for effective traffic management, network performance evaluation, detection of anomalous network events, etc., is based on runs. A flow in the network has a two-run when two consecutive samples belong to the same flow. The theory of runs was taken up in [8]. A runs-based traffic estimator for all flows in backbone networks was presented in [13]. An approach based on sampling two-runs to estimate per-flow traffic was given by Fang-Hao et al. [6]; this method leads to a significantly smaller memory requirement compared to random sampling schemes. The main goal of this paper is to introduce runs in the Poisson cluster process and to obtain the probability function of the general number of cluster runs in real-life Internet traffic. A worked example illustrates our approach. The paper is organized as follows: in Section 2 we give some basic properties of the Poisson cluster process. Section 3 is the main part of the paper, where runs of Poisson cluster processes are defined and their properties given. In Section 4 the fit of our approach to the data is examined. We conclude in Section 5.
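Counting runs and two-runs in a sampled flow sequence is straightforward; a small illustrative sketch (the flow labels are hypothetical sample data):

```python
def count_runs(seq):
    """Return (number of maximal runs, number of two-runs), where a
    two-run is a pair of consecutive samples from the same flow."""
    if not seq:
        return 0, 0
    runs, two_runs = 1, 0
    for prev, cur in zip(seq, seq[1:]):
        if cur != prev:
            runs += 1              # a new maximal run starts here
        else:
            two_runs += 1          # consecutive samples from the same flow
    return runs, two_runs

flows = ["a", "a", "b", "a", "a", "a", "b", "b"]
print(count_runs(flows))
```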
2 The Poisson Cluster Process as a Model of IP Packet Arrivals in the Internet Traffic

The model of IP packet arrivals presented below is fully intuitive. It contains a comprehensible description of the events concerning the irregular data flow in the data stream of broadband networks. The results below follow from [1], [4], [7]. Assume that the first packet in each cluster arrives at the moment Γ_j, where the cluster heads form a Poisson arrival process on R with rate λ. Each cluster consists of several packets which arrive at the moments Y_{jk} = Γ_j + S_{jk}. Then for each j we have
S_{jk} = Σ_{i=1}^{k} X_{ji},   0 ≤ k ≤ K_j   (1)
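A Poisson cluster process per Eq. (1) is easy to simulate. The geometric cluster size and exponential intra-cluster gaps below are illustrative choices (the theorems that follow only require i.i.d. X_{ji} and K_j); the empirical arrival rate can be checked against EN(t)/t = λ(EK + 1) from Theorem 1 below:

```python
import random

random.seed(2)

def poisson_cluster_arrivals(lam, t_end, mean_k, mean_gap):
    """Arrival times of a Poisson cluster process: cluster heads arrive
    as a Poisson process with rate lam; each head Gamma_j is followed by
    K_j extra packets at Gamma_j + S_jk with i.i.d. gaps X_ji (Eq. (1)).
    Here K_j is geometric and X_ji exponential -- illustrative choices."""
    arrivals, t = [], 0.0
    while True:
        t += random.expovariate(lam)
        if t >= t_end:
            break
        arrivals.append(t)                                  # cluster head
        k = 0
        while random.random() < mean_k / (1.0 + mean_k):    # geometric K_j
            k += 1
        s = t
        for _ in range(k):
            s += random.expovariate(1.0 / mean_gap)         # gap X_ji
            arrivals.append(s)
    return arrivals

lam, mean_k, t_end = 2.0, 3.0, 5000.0
n = len(poisson_cluster_arrivals(lam, t_end, mean_k, 0.05))
print("empirical rate:", n / t_end, " theoretical:", lam * (mean_k + 1))
```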
where the X_{ji} are independent identically distributed random variables and the K_j are integer-valued independent identically distributed random variables. We assume that Γ_j, X_{ji}, K_j are mutually independent. For the Poisson cluster process so defined we can give, among others, its expectation, variance and covariance. We first consider the following theorem.

Theorem 1. Assuming that EK < ∞, we obtain

EN(t) = λ t (EK + 1),   t ≥ 0   (2)

Proof. For each k = 0, 1, ... let the stationary point process N^(k) consist of the k-th points of the clusters; thus N^(k) has the points Y_{jk}, j ∈ Z, such that K_j ≥ k. By the properties of the Poisson process, each N^(k) is a homogeneous Poisson process on R with rate λP(K ≥ k). Since N = Σ_{k=0}^{∞} N^(k), we obtain

EN(t) = Σ_{k=0}^{∞} EN^(k)(t) = Σ_{k=0}^{∞} λP(K ≥ k) t = λ t (EK + 1)   (3)
This completes the proof. We see that the mean of N(t) does not depend on the distribution of X. To calculate the covariance of the Poisson cluster process we consider the number of packet arrivals in the interval (a, b] for a < b, with N(t) = N(0, t], t > 0. We can state the following theorem.

Theorem 2. Let us assume that EK < ∞. For any interval −∞ < a < b < ∞ we have

EN²(a, b] < ∞   (4)

The proof of this theorem is given in [7]. A study of the covariance structure of a Poisson cluster process allows us to answer how heavy the tails of the distribution function are in a finite time interval. We assume that the Poisson cluster process N(h, h+1] is stationary for h = 0, 1, ..., where (h, h+1] are neighbouring intervals of unit length. Let γ_N(h) = cov(N(0, 1], N(h, h+1]) be the covariance function of this process. In order to study the covariance function, we must analyse the convergence of the integral
∫_0^∞ γ_N(h) dh   (5)

which converges if and only if

EK² < ∞   (6)

We can state the following theorem.

Theorem 3. ∫_0^∞ γ_N(h) dh < ∞ if and only if EK² < ∞.
The proof of this theorem can be found in [7]. If EK² = ∞ then the covariance function depends on the tail of the distribution of the cluster size K and of the inter-arrival times of the cluster arrivals. First we assume that EK² < ∞; then the central limit theorem holds.

Theorem 4. Let us assume that EK² < ∞. Then N satisfies the central limit theorem

( (N(rt) − λrt(EK + 1)) / √(λr E[(K + 1)²]),  0 ≤ t ≤ 1 ) ⇒ (B(t), 0 ≤ t ≤ 1),  as r → ∞   (7)

in the sense of convergence of the finite-dimensional distributions, where B(t), 0 ≤ t ≤ 1, is a standard Brownian motion. The above theorem was proved by Daley in [4] for the general Poisson cluster process. On the other hand, if EK² = ∞ then the tail of the distribution function of the cluster size is constrained: regular variation of the tail is associated with a stable limit. We can state the following theorem.

Theorem 5. Assume that P(K > k) is regularly varying with index α ∈ (1, 2), and let EX < ∞. Then N satisfies the limit theorem

( (N(rt) − λrt(EK + 1)) / θ(r),  0 ≤ t ≤ 1 ) ⇒ (L_α(t), 0 ≤ t ≤ 1),  as r → ∞   (8)

with convergence of the finite-dimensional distributions, where θ : (0, ∞) → (0, ∞) is a function such that

lim_{r→∞} r P(K > θ(r)) = 1   (9)
and L_α(t), 0 ≤ t ≤ 1, is a spectrally positive α-stable Lévy motion with L_α(1) ~ S_α(σ_α, 1, 0), σ_α > 0. The proof of this theorem is outlined in [7].
3 Finding the Probability Function of a General Number of Runs of Poisson Cluster Processes

Let a single cluster consist of several packets which arrive at the moments Y_{jk} = Γ_j + S_{jk}, where for each j

S_{jk} = Σ_{i=1}^{k} X_{ji},   0 ≤ k ≤ K_j   (10)
Let us assume that the X_{ji}, j = 1, 2, ..., n, are independent random variables with the distribution

P(X_{ji} = a_1) = p,   P(X_{ji} = a_2) = 1 − p,   for i = 1, ..., k   (11)
We can take samples (X_{1i}, X_{2i}, ..., X_{ni}). Each of them consists of clusters a_1 and a_2 arriving in a specified succession. We can give the following definition of a cluster run:

Definition 1. A sequence K_{j,i}, K_{j+1,i}, ..., K_{j+l,i}, j = 1, 2, ..., n, l = 0, 1, ..., n − j, i = 1, 2, ..., k, is a cluster run if

X_{j−1,i} ≠ X_{j,i} = X_{j+1,i} = ... = X_{j+l,i} ≠ X_{j+l+1,i}   (12)
(14)
Since the independent random variables Yk have the same (identical) distribution, then for data m1 , m2 each cluster sequence of a1 and of a2 is probably similar. We can see that yi (i = 1, 2) of a cluster runs ai , from which yi1 has length 1, yi2 has length 2, . . ., kmi of a cluster runs has length mi , can be obtained in a number of ways: yi ! yi1 !yi2 ! . . . yimi !
(15)
Taking into consideration the function G(y_1, y_2), the number of ways to obtain y_1 cluster runs of a_1 (of which y_{11} have length 1, ..., y_{1m_1} have length m_1) and y_2 cluster runs of a_2 (of which y_{21} have length 1, ..., y_{2m_2} have length m_2) is

(y_1! / (y_{11}! y_{12}! ... y_{1m_1}!)) · (y_2! / (y_{21}! y_{22}! ... y_{2m_2}!)) · G(y_1, y_2)   (16)

With the use of the above dependence, we can present Eq. (14) as follows:
y1 ! y2 ! · G(y1 , y2 )pm1 (1 − p)m2 y11 !y12 ! . . . y1m1 ! y21 !y22 ! . . . y2m2 ! (17)
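To make the combinatorics above concrete, the following sketch (our own illustrative helper names, not code from the paper) decomposes a two-cluster sequence into runs, evaluates G(y_1, y_2) from Eq. (13), and computes the run-configuration probability of Eq. (17):

```python
from collections import Counter
from itertools import groupby
from math import factorial, prod

def G(y1, y2):
    """Eq. (13): number of arrangements of y1 runs of a1 and y2 runs of a2."""
    if abs(y1 - y2) > 1:
        return 0
    return 1 if abs(y1 - y2) == 1 else 2

def run_lengths(seq):
    """Split a sequence over {1, 2} into maximal runs; return, per cluster
    type, a Counter mapping run length -> number of runs of that length."""
    lengths = {1: Counter(), 2: Counter()}
    for sym, grp in groupby(seq):
        lengths[sym][len(list(grp))] += 1
    return lengths

def p_run_config(seq, p):
    """Eq. (17): probability of the run configuration observed in seq,
    where p is the probability of cluster a1."""
    lengths = run_lengths(seq)
    m1 = sum(j * c for j, c in lengths[1].items())
    m2 = sum(j * c for j, c in lengths[2].items())
    y1, y2 = sum(lengths[1].values()), sum(lengths[2].values())
    ways1 = factorial(y1) // prod(factorial(c) for c in lengths[1].values())
    ways2 = factorial(y2) // prod(factorial(c) for c in lengths[2].values())
    return ways1 * ways2 * G(y1, y2) * p**m1 * (1 - p)**m2
```

For the sequence a_1, a_2, a_1, a_1, a_2, a_1, a_1, a_1 this yields y_1 = 3 runs of a_1 (lengths 1, 2, 3) and y_2 = 2 runs of a_2 (both of length 1).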
The Poisson Cluster Process Runs as a Model
211
Let Y_{ij} be the number of cluster runs of a_i with length j, where i = 1 or 2, and let M_i (i = 1, 2) denote the number of packets in clusters a_i. Thus, we have

$$M_1 + M_2 = m \qquad \text{and} \qquad \sum_j j\, Y_{ij} = M_i, \quad i = 1, 2 \qquad (18)$$
Let Y_i, i = 1, 2, denote the number of cluster runs of a_i, and let Y be the total number of cluster runs. Thus, we have

$$Y_i = \sum_j Y_{ij}, \quad i = 1, 2 \qquad (19)$$

$$Y = Y_1 + Y_2 \qquad (20)$$
We have two cases here:

a) y is an even number. Then y_1 = y_2 = y/2, and the number of ways in which we can arrange y_1 runs of clusters a_1 and y_2 runs of clusters a_2 is G(y/2, y/2) = 2. Hence

$$P(Y = y) = P\left(\tfrac{y}{2}, \tfrac{y}{2}, m_1, m_2\right) = 2\binom{m_1-1}{\frac{y}{2}-1}\binom{m_2-1}{\frac{y}{2}-1} p^{m_1}(1-p)^{m_2} \qquad (21)$$

b) y is an odd number. Then y_1 = (y−1)/2 and y_2 = (y+1)/2, or y_1 = (y+1)/2 and y_2 = (y−1)/2, and in each case the number of arrangements is G(y_1, y_2) = 1. Hence

$$P(Y = y) = P\left(\tfrac{y-1}{2}, \tfrac{y+1}{2}, m_1, m_2\right) + P\left(\tfrac{y+1}{2}, \tfrac{y-1}{2}, m_1, m_2\right)$$
$$= \left[\binom{m_1-1}{\frac{y-3}{2}}\binom{m_2-1}{\frac{y-1}{2}} + \binom{m_1-1}{\frac{y-1}{2}}\binom{m_2-1}{\frac{y-3}{2}}\right] p^{m_1}(1-p)^{m_2} \qquad (22)$$

Transforming Eqs. (21) and (22), we can obtain the conditional probabilities of the random variables Y_i under the condition that M_1 = m_1 and M_2 = m_2:

$$P(Y_1 = y_1 \mid M_1 = m_1, M_2 = m_2) = \binom{m_1-1}{y_1-1}\binom{m_2+1}{y_1} \Big/ \binom{m}{m_1} \qquad (23)$$

$$P(Y_2 = y_2 \mid M_1 = m_1, M_2 = m_2) = \binom{m_1+1}{y_2}\binom{m_2-1}{y_2-1} \Big/ \binom{m}{m_2} \qquad (24)$$

The means of the random variables of cluster runs can be computed with the use of Eqs. (23) and (24), namely

$$E(Y_1 \mid M_1 = m_1, M_2 = m_2) = \sum_{y_1} y_1 \binom{m_1-1}{y_1-1}\binom{m_2+1}{y_1} \Big/ \binom{m}{m_1} = (m_2+1)\sum_{y_1}\binom{m_1-1}{y_1-1}\binom{m_2}{y_1-1} \Big/ \binom{m}{m_1} \qquad (25)$$
212
J. Martyna
Using the Newton (Vandermonde) formula, we can write the above dependence in a simple form:

$$E(Y_1 \mid M_1 = m_1, M_2 = m_2) = (m_2+1)\binom{m-1}{m_1-1} \Big/ \binom{m}{m_1} = \frac{(m_2+1)\, m_1}{m} \qquad (26)$$

Analogously, we obtain

$$E(Y_2 \mid M_1 = m_1, M_2 = m_2) = \frac{(m_1+1)\, m_2}{m} \qquad (27)$$
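As a quick numerical check of Eqs. (23) and (26), the conditional distribution of Y_1 can be summed directly (an illustrative sketch; the function names are ours):

```python
from math import comb

def p_y1(y1, m1, m2):
    """Eq. (23): P(Y1 = y1 | M1 = m1, M2 = m2)."""
    return comb(m1 - 1, y1 - 1) * comb(m2 + 1, y1) / comb(m1 + m2, m1)

def mean_y1(m1, m2):
    """E(Y1 | M1 = m1, M2 = m2) computed term by term from Eq. (23)."""
    return sum(y1 * p_y1(y1, m1, m2) for y1 in range(1, m1 + 1))
```

For m_1 = 6, m_2 = 2 the distribution sums to 1 and mean_y1(6, 2) equals (m_2 + 1)m_1/m = 3 · 6/8 = 2.25, in agreement with Eq. (26).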
To find the conditional variance var(Y_1 | M_1 = m_1, M_2 = m_2) of cluster runs, we compute E[Y_1(Y_1 − 1) | M_1 = m_1, M_2 = m_2]. With the use of dependence (23) we have

$$E[Y_1(Y_1-1) \mid M_1 = m_1, M_2 = m_2] = m_2(m_2+1)\sum_{y_1=1}^{m_1}\binom{m_1-1}{y_1-1}\binom{m_2-1}{y_1-2} \Big/ \binom{m}{m_1} \qquad (28)$$

Using an expansion of the Newton identity

$$(1+x)^{m_2-1}\left(1+\frac{1}{x}\right)^{m_1-1} \equiv \frac{(1+x)^{m_1+m_2-2}}{x^{m_1-1}}$$

and comparing the coefficients of x^{−1} on both sides, we obtain

$$\sum_{y_1=1}^{m_1}\binom{m_1-1}{y_1-1}\binom{m_2-1}{y_1-2} = \binom{m_1+m_2-2}{m_1-2} \qquad (29)$$

From the above equation and Eqs. (26) and (28) we get

$$\mathrm{var}(Y_1 \mid M_1 = m_1, M_2 = m_2) = \frac{m_1(m_1-1)\, m_2(m_2+1)}{(m-1)\, m^2} \qquad (30)$$
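The variance formula (30) can be verified by brute-force enumeration over all equally likely arrangements of m_1 clusters a_1 and m_2 clusters a_2 (a small illustrative check of our own, not part of the paper):

```python
from itertools import groupby, permutations

def var_y1_enum(m1, m2):
    """Variance of Y1 (number of a1-runs) over all distinct arrangements."""
    seqs = set(permutations([1] * m1 + [2] * m2))
    ys = [sum(1 for sym, _ in groupby(s) if sym == 1) for s in seqs]
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def var_y1_formula(m1, m2):
    """Eq. (30): var(Y1 | M1 = m1, M2 = m2)."""
    m = m1 + m2
    return m1 * (m1 - 1) * m2 * (m2 + 1) / ((m - 1) * m ** 2)
```

For example, var_y1_formula(3, 2) = 3·2·2·3/(4·25) = 0.36, matching the enumeration over all C(5,3) = 10 arrangements.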
and in analogy

$$\mathrm{var}(Y_2 \mid M_1 = m_1, M_2 = m_2) = \frac{m_1(m_1+1)\, m_2(m_2-1)}{(m-1)\, m^2} \qquad (31)$$

3.1 The Limiting Distribution of Runs of Poisson Clusters
From Eqs. (21)–(24) we can obtain the conditional probability functions of the random variables Y_i under the condition that M_1 = m_1 and M_2 = m_2. We have

$$P(Y_1 = y_1 \mid M_1 = m_1, M_2 = m_2) = \binom{m_1-1}{y_1-1}\binom{m_2+1}{y_1} \Big/ \binom{m}{m_1} \qquad (32)$$

$$P(Y_2 = y_2 \mid M_1 = m_1, M_2 = m_2) = \binom{m_1+1}{y_2}\binom{m_2-1}{y_2-1} \Big/ \binom{m}{m_2} \qquad (33)$$
For an even and an odd number of runs y we obtain, respectively,

$$P(Y = y \mid M_1 = m_1, M_2 = m_2) = 2\binom{m_1-1}{\frac{y}{2}-1}\binom{m_2-1}{\frac{y}{2}-1} \Big/ \binom{m}{m_1} \qquad (34)$$

$$P(Y = y \mid M_1 = m_1, M_2 = m_2) = \left[\binom{m_1-1}{\frac{y-3}{2}}\binom{m_2-1}{\frac{y-1}{2}} + \binom{m_1-1}{\frac{y-1}{2}}\binom{m_2-1}{\frac{y-3}{2}}\right] \Big/ \binom{m}{m_1} \qquad (35)$$
In 1940, Wald and Wolfowitz [19] proved that the conditional distribution given by Eqs. (34) and (35) is asymptotically normal $N\!\left(\frac{2m_1}{1+\alpha},\, \frac{4\alpha m_1}{(1+\alpha)^3}\right)$ for m_1 = αm_2 (α > 0) and m_1 → ∞.

We can generalize the above results. Let R_{ij} be the number of runs of Poisson clusters a_i with length not less than j (i = 1, 2; j = 1, 2, ..., m_i). Let R_j = R_{1j} + R_{2j} be the total number of runs with length not less than j. Thus, we obtain the conditional probability

$$P(R_{1j} \mid M_1 = m_1, M_2 = m_2) = \binom{m_2+1}{m_1} p^{m_1}(1-p)^{m_2} \qquad (36)$$

Analogously, we get the conditional probability

$$P(R_{2j} \mid M_1 = m_1, M_2 = m_2) = \binom{m_1+1}{m_2} p^{m_2}(1-p)^{m_1} \qquad (37)$$
Thus, the mean values of the random variables of cluster runs are equal to

$$E(R_{1j} \mid M_1 = m_1, M_2 = m_2) = \frac{m_1^{(j)}(m_2+1)}{m^{(j)}}, \quad j = 1, 2, \dots, m_1 \qquad (38)$$

$$E(R_{2j} \mid M_1 = m_1, M_2 = m_2) = \frac{m_2^{(j)}(m_1+1)}{m^{(j)}}, \quad j = 1, 2, \dots, m_2 \qquad (39)$$

where $x^{(j)} = x(x-1)\cdots(x-j+1)$ denotes the falling factorial.
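Equation (38) is easy to evaluate with falling factorials; as a sanity check (our own illustrative code, with hypothetical helper names), for m_1 = m_2 = m/2 it reproduces the approximation that roughly m/2^{j+1} runs of a_i have length at least j:

```python
def falling(x, j):
    """Falling factorial x^(j) = x (x - 1) ... (x - j + 1)."""
    out = 1
    for i in range(j):
        out *= x - i
    return out

def mean_r1j(m1, m2, j):
    """Eq. (38): E(R1j | M1 = m1, M2 = m2) = m1^(j) (m2 + 1) / m^(j)."""
    return falling(m1, j) * (m2 + 1) / falling(m1 + m2, j)
```

For m = 1000 and m_1 = m_2 = 500, mean_r1j(500, 500, 3) is about 62.4, close to m/2^4 = 62.5.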
For m_1 = m_2 = m/2 we obtain

$$E(R_{ij} \mid M_1 = M_2 = \tfrac{m}{2}) \approx \frac{m}{2^{j+1}} \qquad (40)$$

and

$$E(R_j \mid M_1 = M_2 = \tfrac{m}{2}) \approx \frac{m}{2^j} \qquad (41)$$

4 Experimental Results
In this section we examine the runs in the Poisson cluster process in more detail. The data set, which consists of the times of packet arrivals to a server on the Internet at the University of North Carolina, was obtained on Sunday, April 20, 2003. The data are accessible at http://www-dirt.cs.unc.edu/ts_len/2003_Apr_20_Sun_1900.ts_len.gz.
Fig. 2. The natural logarithm of the empirical distribution function $\log(\bar{F}_n)$ (the continuous line) and the fitted regression line for the points where t ∈ [0.4×10^{−3}, 0.7×10^{−3}] (the dotted line) [7]
The format of the data set has two columns: the packet arrival time stamp and the packet length. The data set covers only 245 seconds.

Calculation of λ. For the estimation of λ we use a method presented in [7]. To obtain the tail of the distribution function of the inter-arrival times of process N under the Palm distribution we can use the expression

$$\log(\bar{F}_0(t)) \sim -\lambda t, \quad t \to \infty \qquad (42)$$

where $\bar{F}_0 = 1 - F_0$ denotes the right tail of the distribution F_0 of the inter-arrival times under the Palm distribution. The empirical tail $\bar{F}_n$ can be calculated from the sample of the inter-arrival times T_i − T_{i−1}. Under the Palm distribution this sequence is ergodic, and thus sup_x |F_n(x) − F_0(x)| → 0 holds. For large values of t, linear regression can therefore be applied to $\log(\bar{F}_n)$. Fig. 2 presents the empirical distribution function $\bar{F}_n(t)$ together with the fitted regression line. It can be seen that for the interval t ∈ [0.4×10^{−3}, 0.7×10^{−3}] the right tail of the distribution function is well fitted by the regression line. For t > 0.7×10^{−3} the regression line so determined is no longer a good approximation of the empirical distribution function. The negative of the slope of this regression line allows us to estimate λ, which equals 1.837 here.

Estimation of the mean number of packets in a cluster. To estimate the mean number of packets in a cluster we can use Eq. (2) or Theorem 1.
With the use of the dependence λ(EK + 1) ≈ N(t)/T we can compute N(t)/T = 10×10^6/245 s ≈ 40816, and further, for λ = 1.837, we obtain EK = 22217. Thus, the mean value of packets in a cluster is given by 450.

The study of runs. In a sample consisting of eight clusters, the data bytes arrive in succession as follows:

121185, 115293, 117354, 116937, 122053, 117655, 117234, 116843

Rounded off to 2 thousand data bytes, this sample can be given in the form:

m = 8, m_1 = 6, m_2 = 2, a_1, a_2, a_1, a_1, a_2, a_1, a_1, a_1

We have here the following variables: y_{11} = 1, y_{12} = 1, y_{13} = 1, y_{21} = 2, y_1 = 3, y_2 = 2, y = 5. In other words, we have one cluster run of a_1 with length 1, two cluster runs of a_2 with length 1, one cluster run of a_1 with length 2, and one cluster run of a_1 with length 3. With the use of Eq. (17), we can compute the probability of arrivals of cluster runs of a_1 and a_2 for which the random variables y_1, y_2, m_1, m_2 take the values observed in the sample, namely

$$P(y_{1j}, y_{2j}, m_1, m_2) = \frac{y_1!}{y_{11}!\, y_{12}! \cdots y_{1m_1}!} \cdot \frac{y_2!}{y_{21}!\, y_{22}! \cdots y_{2m_2}!} \cdot G(y_1, y_2)\, p^{m_1}(1-p)^{m_2} = 3.8 \times 10^{-3}$$

From Borel's law of large numbers it follows that, in long runs of independent samples of 8 elements, approximately 22 samples in a hundred consist of 6 clusters a_1 and 2 clusters a_2 in accordance with the values of the observed data sets.

Testing the null hypothesis H_0 by means of runs. Let hypothesis H_0 denote that the sample is random. The task is to decide whether the sample evidence better supports H_0 (decision to "retain H_0") or the alternative hypothesis H_1 (decision to "reject H_0"). We verify it at the level α = 0.005. From Eqs. (21) and (22) we obtain

P(y = 1) = 0, P(y = 2) = 0, P(y = 3) = 1.8 · 10^{−9}, P(y = 4) = 4.56 · 10^{−7}

Thus, P(y ≤ 4) = 0.0937. This means that the null hypothesis H_0 cannot be rejected.
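A runs test along the lines above can be sketched from Eqs. (34) and (35); the code below is our own illustration (the function names and the left-tail convention are assumptions, not the paper's implementation):

```python
from math import comb

def p_total_runs(y, m1, m2):
    """Eqs. (34)/(35): P(Y = y | M1 = m1, M2 = m2) for the total number of
    runs in a random arrangement of m1 a1-clusters and m2 a2-clusters."""
    if y % 2 == 0:
        k = y // 2
        num = 2 * comb(m1 - 1, k - 1) * comb(m2 - 1, k - 1)
    else:
        num = (comb(m1 - 1, (y - 3) // 2) * comb(m2 - 1, (y - 1) // 2)
               + comb(m1 - 1, (y - 1) // 2) * comb(m2 - 1, (y - 3) // 2))
    return num / comb(m1 + m2, m1)

def runs_test_left_pvalue(y_obs, m1, m2):
    """P(Y <= y_obs): few runs indicate clustering, i.e. non-randomness."""
    return sum(p_total_runs(y, m1, m2) for y in range(2, y_obs + 1))
```

For m_1 = 6, m_2 = 2 the distribution is supported on y = 2, ..., 5 and sums to 1, with P(Y = 2) = 2/28.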
5 Conclusion
The Poisson cluster process is a new model of data flow in computer and telecommunication networks. Its aim is to describe both the flow arrival process and the dependencies between several flows. It is fully intuitive, especially in the modeling of computer networks. In the Poisson cluster process the packet arrival process forms clusters. Moreover, as was shown in the previous sections, these clusters can create runs. With the use of the given dependencies we can capture the succession of particular clusters. Thereby, we can both better understand the data flows and predict their unexpected changes in the communication links.
References

[1] Bartlett, M.S.: The Spectral Analysis of Point Processes. Journal of the Royal Statistical Society, Series B 25, 264–296 (1963)
[2] Crovella, M., Bestavros, A.: Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes. In: Proc. of the 1996 ACM SIGMETRICS Int. Conf. on Measurement and Modeling of Computer Systems, vol. 24, pp. 160–169 (1996)
[3] Crovella, M., Bestavros, A., Taqqu, M.S.: Heavy-Tailed Probability Distributions in the World Wide Web (1996)
[4] Daley, D.: Asymptotic Properties of Stationary Point Processes with Generalized Clusters. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 21, 65–76 (1972)
[5] Daley, D., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Springer, Heidelberg (1988)
[6] Hao, F., Kodialam, M., Lakshman, T.V., Mohanty, S.: Fast, Memory Efficient Flow Rate Estimation Using Runs. IEEE/ACM Transactions on Networking 15(6), 1467–1477 (2007)
[7] Faÿ, G., González-Arévalo, B., Mikosch, T., Samorodnitsky, G.: Modeling Teletraffic Arrivals by a Poisson Cluster Process. Queueing Systems 54, 121–140 (2006)
[8] Fisz, M.: Probability Theory and Mathematical Statistics. John Wiley & Sons, Chichester (1963)
[9] Heath, D., Resnick, S., Samorodnitsky, G.: Heavy Tails and Long Range Dependence in ON/OFF Processes and Associated Fluid Models. Mathematics of Operations Research 23, 145–165 (1998)
[10] Hohn, N., Veitch, D.: Inverting Sampled Traffic. In: ACM SIGCOMM Internet Measurement Conference, pp. 222–233 (2003)
[11] Hohn, N., Veitch, D., Abry, P.: Cluster Processes: A Natural Language for Network Traffic. IEEE Trans. on Signal Processing 51, 2229–2244 (2003)
[12] Kulkarni, L.A.: Transient Behaviour of Queueing Systems with Correlated Traffic. Performance Evaluation 27-28, 117–146 (1996)
[13] Kodialam, M., Lakshman, T.V., Mohanty, S.: Runs Based Traffic Estimator (RATE): A Simple, Memory Efficient Scheme for Per-Flow Rate Estimation. In: IEEE INFOCOM, vol. 3, pp. 1808–1818 (2004)
[14] Latouche, G., Remiche, M.-A.: An MAP-Based Poisson Cluster Model for Web Traffic. Performance Evaluation 49(1), 359–370 (2002)
[15] Lewis, P.A.W.: A Branching Poisson Process Model for the Analysis of Computer Failure Patterns. Journal of the Royal Statistical Society, Series B 26, 398–456 (1964)
[16] Paxson, V., Floyd, S.: Wide-Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Trans. on Networking 3(3), 226–244 (1995)
[17] Sohraby, K.: Delay Analysis of a Single Server Queue with Poisson Cluster Arrival Process Arising in ATM Networks. In: IEEE Global Telecommunication Conference, GLOBECOM 1989, vol. 1, pp. 611–616 (1989)
[18] Westcott, M.: On Existence and Mixing Results for Cluster Point Processes. Journal of the Royal Statistical Society, Series B 33, 290–300 (1971)
[19] Wald, A., Wolfowitz, J.: On a Test Whether Two Samples are from the Same Population. Annals of Mathematical Statistics 11, 147–162 (1940)
Proactive Peer-to-Peer Traffic Control When Delivering Large Amounts of Content within a Large-Scale Organization

Chih-Chin Liang1, Chia-Hung Wang2, Hsing Luh2, Ping-Yu Hsu3, and Wuyi Yue4

1 Department of Business Administration, National Formosa University, No.64, Wunhua Rd., Huwei Township, Yunlin County 632, Taiwan (R.O.C.)
[email protected]
http://researcher.nsc.gov.tw/lgcwow

2 Department of Mathematical Sciences, National Chengchi University, No.64, Sec. 2, Jhihnan Rd., Wen-Shan District, Taipei City, Taiwan 116, R.O.C.
{93751502,slu}@nccu.edu.tw

3 Department of Management, National Central University, No. 300, Jung-da Rd., Jung-li City, Taoyuan, Taiwan 320, R.O.C.
[email protected]

4 Department of Intelligence and Informatics, Konan University, Kobe 658-8501, Japan
[email protected]

Abstract. Peer-to-Peer (P2P) approaches utilize passive means of delivering large amounts of content due to the ease of content delivery and the small amount of effort required of the file provider for file transmission and failure control. However, P2P methods incur network congestion due to the large number of uncontrolled connections established by those retrieving content. Additionally, P2P methods potentially stop file providers from exchanging data with other devices because of congested outbound traffic. Therefore, existing P2P approaches are infeasible for large-scale organizations, as a congested, uncontrolled network is not acceptable in a business environment. Related studies have demonstrated that the active method of content delivery can minimize outbound congestion for a provider, but it requires fault tolerance. This study presents a novel application called Enterprise Peer-to-Peer (EP2P), which integrates passive and active approaches. Using EP2P, the outbound traffic from a sender on an Intranet can be controlled efficiently, because the transmission between a provider and many receivers can be changed from passive to active when necessary. Therefore, we suggest organizations use EP2P for the delivery of content.

Keywords: peer-to-peer, content delivery, enterprise communication.
1 Introduction

Although P2P applications, such as BitTorrent (BT), are extensively adopted to transmit content between peers, organizations do not apply P2P applications for content delivery [1], [2]. Notably, P2P applications are unsuitable for organizations because they consume considerable network bandwidth [3].

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 217–228, 2009. © Springer-Verlag Berlin Heidelberg 2009
P2P applications are efficient for the distribution of large files, as they divide files into chunks or pieces and transmit each piece individually among those retrieving the files. That is, a retriever can obtain parts of a file without waiting for a busy file provider to deliver a complete file at one time. Modern P2P applications transfer each part of a file individually among peers through a passive sharing method. Passive methods allow receivers to retrieve a file; however, such pull-based methods occupy sender outbound traffic (on average, one P2P client can occupy over 55% of a sender's traffic [4]). Thus, a sender becomes unable to share content among peers when its outbound traffic is congested. Existing P2P methods use the same solution for this problem, namely to limit the number of available connections of a provider. However, as a complete solution, this approach is inadequate. Available network connections are still easily exhausted when the number of requests by peers attempting to retrieve files is excessive. Therefore, those retrieving files from a provider have difficulty due to congested traffic. In short, a congested network stops nodes from retrieving content [5]–[12]. Therefore, a mechanism that controls network usage is needed. Distributing large files among staff in a large-scale company while using the network carefully is important, but difficult to implement due to the complex Intranet environment [12]–[16], which consists of different network segments connected via various bandwidths [16]. Such complex environments can result in delayed transmission of content, and even loss during transmission, when inappropriate content delivery approaches are used. For example, a retriever may have difficulty retrieving content from a sender with a small bandwidth due to excessive outbound traffic from the sender [18]–[24].
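The chunking idea described above can be illustrated with a minimal sketch (our own construction, not code from any real P2P client): a file is cut into fixed-size pieces that can be fetched in any order and reassembled by index.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def merge_chunks(indexed_chunks: dict[int, bytes]) -> bytes:
    """Reassemble pieces received in arbitrary order, keyed by index."""
    return b"".join(indexed_chunks[i] for i in sorted(indexed_chunks))
```

Because each indexed piece is independent, a retriever can pull piece 3 from one peer and piece 1 from another, then merge them once all pieces have arrived.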
Although P2P applications can share a sender's traffic, no companies currently use P2P applications as distribution approaches because of uncontrolled network usage. Once P2P methods exhaust network resources, the numerous applications on the same Intranet are no longer able to communicate with each other. The opposite of the passive method is the active method. Using the active method, packets are delivered by the sender actively. Once an active method is used to deliver packets within an Intranet, the network congestion caused by passive methods can be eliminated. However, because many Intranet applications must operate without delays, using only an active method to deliver packets is also infeasible. To use a P2P application within a company, this work presents a novel hybrid application called EP2P that combines both passive and active approaches [25], [26]. A company can share content among peers with the smallest load on the file owner through the passive P2P method. Through EP2P, the passive file delivery process controlled by a retriever can change to the active method controlled by the sender when necessary, making outbound connections controllable. Finally, based on performance and reliability results, we suggest organizations consider utilizing EP2P for content delivery.

The remainder of this paper is organized as follows. Section 2 investigates work related to content delivery. In Section 3, the EP2P mechanism is described. In Section 4, the proposed application is validated. Conclusions are finally drawn in Section 5, along with recommendations for future research.
2 Literature Review

In a traditional content delivery method, a server provides files that can be retrieved from the server by all clients. Basically, all clients (peers) must first send a request to the file server asking to retrieve file A through a passive method (the time to get a request to the sender through a passive delivery method is denoted as T^{passive}_{request}). Once the provider is available to send the file and receives a request from a client, that client owns the right to get file A. The client can then retrieve the file (the time to get the file through a passive delivery method is denoted as T^{passive}_{transmission}). However, this traditional content delivery method is inefficient, because too many retrievers are competing for the right to get their file from the server. Therefore, the time spent retrieving a large file using the traditional approach is prolonged until the file has been retrieved by all clients. To resolve the above problem, modern P2P applications were invented [27]. A file is retrieved successfully after all parts of the file are collected and merged [27]–[30]. That is, each peer can be both a file provider and a file retriever, depending on the transmission behavior. However, network traffic is busy because of the passive content delivery method [27]. Such a passive content delivery approach (including modern P2P applications) easily exhausts the outbound network bandwidth of a sender when too many retrievers get a large file from the sender concurrently [2], [3], [10]. Additionally, content delivery within an organization concerns not only receiving complete content but also consuming network resources efficiently [13]. There are two families of solutions for reducing the sender's traffic load: cost-benefit approaches and network-control approaches [31]–[33].
Because P2P networks are highly dependent on file owners' cooperation, the so-called cost-benefit approach is related to the cost of obtaining the privilege to retrieve files from other owners. The network-control solution is used to manage the network usage of a node. That is, all existing solutions try to control the usage of the provider's network connections [32], [33]. However, these solutions are insufficient for the sender's outbound congestion, because too many requests still cause the outbound congestion of a sender. That is, the above problem is evidently caused by the retrieving (passive) behavior. Therefore, we may use an active content delivery method to transfer items actively. This is the so-called push-based method [2], [3]. Once a file needs to be delivered from the sender to a receiver (peer), the sender must send a request to the retriever and get a response back indicating that the receiver is available to receive the file from the sender (the time to send a request to a retriever and get the response back is denoted as T^{active}_{request}). After a receiver sends back the information that it is available to receive packets, the sender delivers the file to the receiver actively (the time to send the file is denoted as T^{active}_{transmission}). However, because packets delivered through the active method are unmanaged, we must consider using a fault-tolerant mechanism to manage packets in transmission [14], [23], [24].
3 The EP2P Mechanism

This work proposes an enhanced P2P mechanism, named EP2P, which employs both passive and active methods to deliver large amounts of content within an organization.
3.1 Mechanism Design

To illustrate the mechanism, we define the exchange information. The exchange information, sent between peers, is composed of file names, sender names, file status, retrieving parts, completed parts, receiver names, transmission method, file sizes, and transmission status. The "file name" is utilized for identifying the shared content. The "sender name" and the "receiver name" represent the roles of peers. The "file status" is used by each peer to understand which parts of the shared file the sender owns. Because a file shared using the EP2P mechanism must be divided into chunks to speed up file transmission, separate management of each part is needed [11], [25]. The "retrieving part" indicates the parts of the shared file that are currently being delivered. The "completed parts" show the parts of the shared file that are owned by the sender. Because a sender does not always have the complete file, the "file status" indicates the parts owned by the sender. The "transmission method" is utilized for recognizing the content delivery method between a sender and a receiver. The "file size" is used for verifying the accuracy of each received part. The "transmission status" indicates the situation of the part currently being transferred. The above information owned by a peer must be shared with, and updated at, its neighboring peers. Under the EP2P mechanism, the passive method is the default method for transferring files.

3.2 Passive Approach of EP2P

The EP2P mechanism is launched whenever a file needs to be transmitted. For example, suppose peer A is a sender, peer B, one of the neighboring peers of peer A, is a receiver, and File I is the file that needs to be transmitted. Peer A shares the file status information described in Table 1 with its neighboring peers after dividing File I into parts. Peer A records itself as the "sender name," the owned parts of File I as the "file status," the name of File I as the "file name," "passive mechanism" as the "transmission method," and "ready for retrieving file" as the "transmission status." Because peer A is the starting node, the parts of File I owned by peer A are complete. After peer B receives the above information from peer A, peer B records itself as the "receiver name," records a divided part of File I as the "retrieving part," and then passes these modifications back to peer A. A procedure using the passive approach is thereby launched. Peer B retrieves "part 1 of File I" from peer A. Whenever "part 1 of File I" is transferred successfully to peer B, the "retrieving part" is changed to "part 2 of File I" and part 1 is recorded in the "completed parts" by peer B. These modifications are then passed on to the neighboring peers, including peer A. The procedure of retrieving "part 2 of File I" from peer A is started by peer B after the above modifications have been transferred to peer A completely. Once peer A is unavailable to share File I, peer B retrieves "part 2 of File I" from other peers that own the part. The "transmission status" is changed to "successful transmission" whenever File I has been transferred completely.

3.3 Active Approach of EP2P

Once the network is congested because of passive delivery, EP2P can change the transmission method to the active approach. The procedure of the active approach of
EP2P is described as follows. The transmission method is changed from the passive method to the active one whenever peer C needs to connect to peer A to retrieve "part 3 of File I," but too many connections established by receivers are using the outbound bandwidth of peer A for "part 3 of File I" to be transferred through the passive method. The "transmission method" is changed to "active method" by peer A. Peer A then sends this modification to peer C. The passive transmission of "part 3 of File I" between peer A and peer C stops. Peer C then waits for "part 3 of File I" to be sent by peer A. Peer A sends the needed parts to peer C when the network is available to transfer content. Peer A changes the "transmission status" to "ready for sending file" for peer C. Peer C sends "ready for receiving file" to peer A when peer C is available to receive the content. Peer A then sends "part 3 of File I" to peer C. Whenever the size of the received part is the same as the "file size," peer C sends "successful transmission" to peer A. In addition, in the EP2P mechanism, if peer A malfunctions, one of the neighboring peers of peer A other than peer C sends the requested part to peer C actively. The push-based method then finishes.

3.4 Modeling

To understand the performance of this proposed approach, this work models EP2P as follows. The average packet size per file is denoted as δ, and the average transmission rate (delivered packets per unit time) of the sender is denoted as B_s. Notation B_l is the average transmission rate of each link between the sender and the receivers. The scenario where an active approach is used with fault-tolerance mechanisms is illustrated as follows. A receiver can send a message (request) to the sender and ask it to transmit the file again whenever the file size is incorrect. Whether a file has been delivered successfully is decided by verifying the acknowledgement sent back from the receiver.
In case a sender receives no signal from a receiver after sending a file, the sender must resend the file to the receiver. The average packet size per request is denoted as σ. Generally, δ is far larger than σ. Let λ be the average arrival rate of new files. The average time for delivering files via an active mechanism is denoted as T^{active}. The average time T^{active} is the sum of the average times for transmitting requests and for sending files. The average time for transmitting requests by an active mechanism can be estimated as follows:

$$T^{\mathrm{active}}_{\mathrm{request}}(N, B_l, B_s) = \begin{cases} \dfrac{\lambda N \sigma}{B_l}, & \text{if } 0 < B_l < B_s \\ \dfrac{\lambda N \sigma}{B_s}, & \text{if } B_s \le B_l \end{cases}$$

and the average transmission time of sending files to all peers is estimated as follows:

$$T^{\mathrm{active}}_{\mathrm{transmission}}(N, B_l, B_s) = \begin{cases} \dfrac{\lambda N \delta}{B_l}, & \text{if } 0 < B_l < B_s \\ \dfrac{\lambda N \delta}{B_s}, & \text{if } B_s \le B_l \end{cases}$$
Although using passive methods requires only a small degree of sender effort for failure control and file delivery, a file transferred through the passive method also needs a mechanism for checking correctness. The correctness of the retrieved content is verified by checking the file size after the process of file transmission is completed. The receiver retrieves the content again whenever the file size is incorrect [23]. This work models the passive transmission method as follows. The average transmission rate (delivered packets per unit time) of receivers is denoted as B_r and the total number of receivers (peers) is denoted as N. Notation T^{passive} is the average time for delivering files using a passive mechanism, which is the sum of the average times for transmitting requests and for sending files. The average time for transmitting requests via a passive method can be estimated as follows:

$$T^{\mathrm{passive}}_{\mathrm{request}}(N, B_l, B_r) = \begin{cases} \dfrac{\lambda \sigma}{B_l} N, & \text{if } 0 < B_l < B_r \\ \dfrac{\lambda \sigma}{B_r} N, & \text{if } B_r \le B_l < 2B_r \\ \dfrac{\lambda \sigma}{B_r} \left\lceil \dfrac{N}{2} \right\rceil, & \text{if } 2B_r \le B_l < 3B_r \\ \quad\vdots \\ \dfrac{\lambda \sigma}{B_r}, & \text{if } N B_r \le B_l \end{cases}$$

where the ceiling function ⌈N/2⌉ denotes the smallest integer not less than N/2. The average transmission time of receiving all files for all peers is estimated as follows:

$$T^{\mathrm{passive}}_{\mathrm{transmission}}(N, B_l, B_r) = \begin{cases} \dfrac{\lambda \delta}{B_l} N, & \text{if } 0 < B_l < B_r \\ \dfrac{\lambda \delta}{B_r} N, & \text{if } B_r \le B_l < 2B_r \\ \dfrac{\lambda \delta}{B_r} \left\lceil \dfrac{N}{2} \right\rceil, & \text{if } 2B_r \le B_l < 3B_r \\ \quad\vdots \\ \dfrac{\lambda \delta}{B_r}, & \text{if } N B_r \le B_l \end{cases}$$
4 Experimental Results

To illustrate the validation and performance, this work presents the experimental results as follows.

4.1 Validation

To understand the performance, this work models the transmission time of file delivery through the EP2P active and passive approaches [32]. This work illustrates the level of network congestion through the relation between B_s, B_r, and B_l. B_l < min{B_s, B_r} indicates that the network is heavily congested, because the transmission rate of a link is lower than that of a peer within the proposed environment.
To establish the validity, this work assumes that the arrival of files follows a Poisson process. In the proposed system, a receiver or retriever needs to send a message (request) to a file provider in order for the provider to decide whether to use an active or a passive mechanism to deliver content. This work assumes the ability to transmit packets is the same for all nodes. Additionally, the average transmission rates of senders and receivers/retrievers are the same and equal to B_s. When the network is heavily congested, that is, B_l < B_s, the derived average time for delivering files through an active mechanism is equal to that through a passive mechanism. From the above description, we have Theorem 1.

Theorem 1: If 0 < B_l < B_s, the average time for delivering files through an active mechanism is equal to that through a passive mechanism.

Proof: If 0 < B_l < B_s, then

$$T^{\mathrm{active}} = T^{\mathrm{active}}_{\mathrm{request}}(N, B_l, B_s) + T^{\mathrm{active}}_{\mathrm{transmission}}(N, B_l, B_s) = (\sigma+\delta)\frac{\lambda N}{B_l} = T^{\mathrm{passive}}_{\mathrm{request}}(N, B_l, B_s) + T^{\mathrm{passive}}_{\mathrm{transmission}}(N, B_l, B_s) = T^{\mathrm{passive}}.$$

Therefore, when the network is heavily congested, the average time for delivering files through an active mechanism is equal to the average time for delivering files through a passive mechanism. □

When the network is not congested, that is, B_l ≥ B_s, the derived average time for delivering files through a passive mechanism is less than or equal to that through an active mechanism (Theorem 2).

Theorem 2: If B_l ≥ B_s, the average time for delivering files through a passive mechanism is less than or equal to that through an active mechanism.

Proof: If B_l ≥ B_s, then

$$T^{\mathrm{active}} = T^{\mathrm{active}}_{\mathrm{request}}(N, B_l, B_s) + T^{\mathrm{active}}_{\mathrm{transmission}}(N, B_l, B_s) = \frac{\lambda(\sigma+\delta)}{B_s}N$$

$$\ge \begin{cases} \dfrac{\lambda(\sigma+\delta)}{B_s}N, & \text{if } B_s \le B_l < 2B_s \\ \dfrac{\lambda(\sigma+\delta)}{B_s}\left\lceil\dfrac{N}{2}\right\rceil, & \text{if } 2B_s \le B_l < 3B_s \\ \quad\vdots \\ \dfrac{\lambda(\sigma+\delta)}{B_s}, & \text{if } N B_s \le B_l \end{cases}$$

$$= T^{\mathrm{passive}}_{\mathrm{request}}(N, B_l, B_s) + T^{\mathrm{passive}}_{\mathrm{transmission}}(N, B_l, B_s) = T^{\mathrm{passive}}.$$

Hence, T^{active} ≥ T^{passive} when B_l ≥ B_s.
Therefore, when the network is not heavily congested, the average time for delivering files through a passive mechanism is less than or equal to that through an active mechanism. □

4.2 Simulation Results

Managers who want to use EP2P to deliver content within the enterprise need to know the possible congestion and the delivery load for reference. In this work, we use the block rate to represent the congestion of the outbound traffic of a sender; the block rate represents the packets that cannot be obtained by the receiver, divided by the total packets delivered by the sender in a unit of time. Additionally, the increase in sender load for delivering content actively after adopting EP2P is worth discussing, because this load affects the business operation of applications hosted on the sender. To compare the block rate of the three methods in a congested network, this investigation uses one provider and 1 to 30 devices that send (passive method) or receive (active method) packets from the provider through only one outbound connection of the file provider, using Promodel™. This work assumes the sender (B_s) and the retrievers/receivers (B_r) have the same transmission rate for delivering content. The delivered packets of a request for both methods have the same size (1 Kbyte, σ), and one packet set is delivered at a time. The average number of delivered packets of a file (δ) is set to unlimited for observing the congestion. The exponential distribution (E(1/λ)) is also assumed. The scenario of this simulation for the active method is designed so that one provider delivers packets to 1 to 30 devices (N). Each packet is delivered actively to one device at a time (the inter-delivery time follows an exponential distribution: E(0.01 minutes)). The simulation for the passive method is designed so that 1 to 30 devices retrieve packets from one provider. The schedule of the pull action of one retriever follows an exponential distribution (1/λ is random, ranging from 0.01 minutes to 5 minutes); the mean time between retrieving actions is set at random. That is, the number of collected packets varies across devices. This work simulates each method 20 times. Each actual simulation time is 5 hours. The schedule of retrieving packets of a retriever follows an exponential distribution. Because EP2P is composed of active and passive approaches, the EP2P method must have a changing policy to decide whether to change the delivery method from the passive approach to the active approach. Additionally, once a retriever has occupied the traffic for a longer time than the other retrievers, that retriever will have collected more packets than the others, and congestion accompanies this occupation. To relieve the congestion, this work assumes that the passive mechanism shifts to the active method when the largest number of collected packets of any retriever is x times (denoted EP2P(x); x ranges from two to ten in this simulation) the smallest number of collected packets of any other retriever; the delivery method then changes from the passive method to the active method. The sender delivers packets to all receivers except the device with the largest number of collected packets.
The delivery method will be changed from active method to passive method when the largest amount of collected packets of a device is smaller than x times the smallest amount of collected units of a device.
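The x-times switching rule can be sketched as follows. This is an illustrative reading of the policy, not the authors' implementation; the `max(smallest, 1)` guard against a zero count is our own assumption:

```python
def next_mode(collected, x):
    """EP2P(x) switching rule: return 'active' when the most advanced
    retriever has collected at least x times the packets of the least
    advanced one, otherwise stay 'passive'.

    collected: dict mapping retriever id -> number of collected packets.
    x: threshold factor (two to ten in the paper's simulation).
    """
    largest = max(collected.values())
    smallest = min(collected.values())
    return "active" if largest >= x * max(smallest, 1) else "passive"

counts = {"dev1": 120, "dev2": 15, "dev3": 40}
mode = next_mode(counts, x=4)  # 120 >= 4 * 15, so switch to active
# In active mode the sender pushes to every device except the leader:
receivers = [d for d, c in counts.items() if c < max(counts.values())]
```

With `x=4` and the counts above, the mode becomes "active" and the sender delivers to dev2 and dev3 only.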
Proactive P2P Traffic Control When Delivering Large Amounts of Content
225
Fig. 1. Block Rate
The simulation results are shown in Figure 1. With the passive and active methods, the block rate rises with the number of retrievers/receivers. Additionally, a sender using EP2P can deliver more packets than one using the passive method; that is, EP2P is more efficient than a passive P2P approach. The results show that the active method yields better results than the other two approaches. However, when using the active method alone, the increased sender load for content delivery affects the other applications hosted on the sender; therefore, using only the active method is infeasible within an enterprise that needs to use its computers efficiently. The results also show that with EP2P, the sender load for active content delivery decreases as the number of receivers increases; that is, with EP2P, more receivers mean less delivery load on the sender. Additionally, we found that the more receivers there are, the fewer chances there are to change the delivery method from the passive to the active approach. On average, 28.26% of packets are delivered through the active EP2P mode with one sender and five receivers, and 9.74% with one sender and 30 receivers.
5 Conclusions and Remarks

The applications hosted on all devices in the enterprise network must remain workable, and network resources must be consumed carefully when delivering content within an enterprise. Therefore, selecting a feasible content-delivery approach is important when distributing files within an organization. A passive P2P application can deliver large files efficiently but consumes vast amounts of network resources. Previous studies demonstrated that adopting an active application can mitigate network congestion because the sender controls the connections; however, the increased provider load for content delivery limits the performance of other back-office applications [1], [2]. Furthermore, an active method needs a particular fault-tolerance mechanism to ensure that files reach their destinations. Thus, managing such methods carefully is important when delivering substantial amounts of content within an enterprise over a complex network.

This work presents a novel hybrid P2P approach, the EP2P scheme, composed of passive and active methods, for delivering large amounts of content over an Intranet. To evaluate it, this work models EP2P and simulates its performance. Experimental results show that a sender can deliver a greater number of packets more efficiently through EP2P to retrievers/receivers than through the current P2P method within an Intranet. Additionally, managers need to ensure that Intranet traffic is controllable and that existing back-office applications are not affected. By using EP2P, a company can distribute files with minimal sender effort, and network traffic can be controlled. Further studies should address how to adopt EP2P within an organization and users' satisfaction with EP2P; that is, the steps to apply EP2P to a real company successfully, and the feedback from users, are important for a manager deciding how to send content within an organization.
Acknowledgement This work was supported in part by GRANT-IN-AID FOR SCIENCE RESEARCH (No. 21500086) and the Hirao Taro Foundation of KUAAR, Japan.
References

1. Agrawal, M., Rao, H.R., Sanders, G.L.: Impact of Mobile Computing Terminals in Police Work. J. Organ. Comput. Electron. Commer. 13, 73–89 (2003)
2. Liang, C.C., Hsu, P.Y., Leu, J.D., Luh, H.: An effective approach for content delivery in an evolving Intranet environment - a case study of the largest telecom company in Taiwan. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, J.-Y., Sheng, Q.Z. (eds.) WISE 2005. LNCS, vol. 3806, pp. 740–749. Springer, Heidelberg (2005)
3. Liang, C.C., Wang, C.H., Luh, H., Hsu, P.Y.: A robust web-based approach for broadcasting downward messages in a large-scale company. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y., Li, X. (eds.) WISE 2006. LNCS, vol. 4255, pp. 222–233. Springer, Heidelberg (2006)
4. Liu, X., Lan, J., Shenoy, P., Ramamritham, K.: Consistency maintenance in dynamic peer-to-peer overlay networks. J. Comput. Netw. 50(6) (2006)
5. Azzouna, N.B., Guillemin, F.: Experimental analysis of the impact of peer-to-peer applications on traffic in commercial IP networks. Eur. Trans. Tele. 15, 511–522 (2004)
6. Bharambe, A.R.: Some observations on BitTorrent performance. Perform. Eval. Review 33(1) (2005)
7. Ragab, K., Kaji, N., Horikoshi, Y., Kuriyama, H., Mori, K.: Autonomous decentralized community communication for information dissemination. IEEE Internet Comput., 29–36 (2004)
8. Turcan, E., Shahmehri, N., Graham, R.L.: Intelligent software delivery using P2P. In: Proc. IEEE Conf. P2P, pp. 1–8 (2002)
9. Lan, J., Liu, X., Shenoy, P., Ramamritham, K.: Consistency maintenance in peer-to-peer sharing network. In: Proc. IEEE Conf. WIAPP, pp. 1–5 (2003)
10. Datta, A., Hauswirth, M., Aberer, K.: Beyond "web of trust": Enabling P2P E-commerce. In: Proc. IEEE Conf. CEC (2003)
11. Liu, X., Lan, J., Shenoy, P., Ramamritham, K.: Consistency maintenance in dynamic peer-to-peer overlay network. J. Comput. Netw. 50, 859–876 (2006)
12. Liu, X., Liu, Y., Xiao, L.: Improving query response delivery quality in peer-to-peer systems. IEEE Trans. Parallel Distrib. Syst. 17(11), 1335–1457 (2006)
13. Bar-Noy, A., Freund, A., Naor, J.: On-line load balancing in a hierarchical server topology. SIAM J. Comput. 31(2), 527–549 (2001)
14. Liang, C.C., Wang, C.H., Hsu, P.Y., Luh, H.: Disaster Avoidance Mechanism for Content-Delivering Service. Comput. Oper. Res. 36(1), 27–39 (2008)
15. Agrawal, M., Rao, H.R., Sanders, G.L.: Impact of mobile computing terminals in police work. J. Organ. Comput. Electron. Commer. 13, 73–89 (2003)
16. Wang, W.M., Liang, C.C., Lu, H.Z., Chow, W.S., Chang, K.Y.: Research of testing process - the case of TOPS-system delivery process. TL Tech. J. 34(1), 7–34 (2004)
17. Lu, H.Z., Liang, C.C., Chuan, C.C., Wang, W.M.: Discussion of TOPS/order software deployment, news publish and operation mechanism. TL Tech. J. 35(5), 19–733 (2005)
18. Liang, C.C.: Disseminating Content through a Peer-to-Peer Infrastructure in Disaster Zones. In: Proc. IEEE Conf. ISM 2009, CD (2009)
19. Liang, C.C., Chuan, C.R., Lu, H.Z., Wang, W.M.: A software deploy model on TOPS/order system and its practice. TL Tech. J. 35(5-1), 19–27 (2005)
20. Acampora, A., Krull, M.: A new approach to peer-to-peer wireless LANs based on ultra wide band technology. Wireless Netw. 14, 335–346 (2008)
21. Guo, H., Shen, G., Wang, Z., Li, S.: Optimized streaming media proxy and its applications. J. Netw. Comput. Appl. 30, 265–281 (2007)
22. Cao, J., Feng, X., Lu, J., Chan, H.C.B., Das, S.K.: Reliable message delivery for mobile agents: push or pull? IEEE Trans. Syst., Man, Cybern. A, Syst., Humans 34(5), 577–587 (2004)
23. Herrería-Alonso, S., Suárez-González, A., Fernández-Veiga, M., Rubio, R.F.R., López-García, C.: Improving aggregate flow control in differentiated services networks. Comput. Netw. 44(4), 499–512 (2004)
24. Saxena, N., Pinotti, C.M., Das, S.K.: A probabilistic push-pull hybrid scheduling algorithm for asymmetric wireless environment. In: Proc. IEEE Conf. GLOBECOM, pp. 5–9 (2004)
25. Bhide, M., Deolasee, P., Katkar, A., Panchbudhe, A., Ramamritham, K., Shenoy, P.: Adaptive push-pull: disseminating dynamic web data. IEEE Trans. Comput. 51(6), 652–668 (2002)
26. Guo, L., Chen, S., Xiao, Z., Tan, E., Ding, X., Zhang, X.: A performance study of BitTorrent-like peer-to-peer systems. IEEE J. Select. Areas Commun. 25(1), 155–169 (2007)
27. Défago, X., Schiper, A., Urbán, P.: Total order broadcast and multicast algorithms: taxonomy and survey. ACM Comput. Surv. 36, 372–421 (2004)
28. Li, M., Yu, J., Wu, J.: Free-riding on BitTorrent peer-to-peer file sharing systems: modeling analysis and improvement. IEEE Trans. Parallel Distrib. Syst. 19(7), 954–966 (2008)
29. Hei, X., Liang, C.: A measurement study of a large-scale P2P IPTV system. IEEE Trans. Multimedia 9(8), 1672–1687 (2007)
30. Cho, K., Fukuda, K., Esaki, H., Kato, A.: The impact and implications of the growth in residential user-to-user traffic. In: Proc. ACM SIGCOMM, pp. 207–208 (2006)
31. Zhou, J., Hall, W., De Roure, D.C., Dialani, V.K.: Supporting ad-hoc resource sharing on the web: A peer-to-peer approach to hypermedia link services. ACM Trans. Internet Tech. 7(2), 1–92 (2007)
32. Zghaibeh, M., Harmantzis, F.C.: A lottery-based pricing scheme for peer-to-peer networks. Telecommun. Syst. 37, 217–230 (2008)
33. Yamada, H., Okamura, A., Nakamichi, K., Sakai, K., Chugo, A.: QoS control by Traffic Engineering in content delivery networks. FUJITSU Sci. Tech. J. 39(2), 244–254 (2003)
34. Weng, C.E., Lain, J.K., Zhang, J.M.: UEEDA: Uniform and energy-efficient deployment algorithm for wireless sensor networks. Inter. J. Commun. Syst. 21, 453–467 (2008)
ISP-Driven Managed P2P Framework for Effective Real-Time IPTV Service

Seil Jeon1, Younghan Kim1,*, Jonghwa Yi2, and Shingak Kang2

1 School of Electronic Engineering, Soongsil University, Dongjak-gu, Seoul, 156-743, Korea {sijeon,yhkim}@dcn.ssu.ac.kr
2 Electronics and Telecommunications Research Institute, 161 Kajeong-Dong, Yuseong-Gu, Daejeon, Korea {jhyiee,sgkang}@etri.re.kr
Abstract. Over the years, a variety of Peer-to-Peer (P2P) technologies have been developed in attempts to overcome client-server limitations. P2P can contribute to robustness and scalability; however, it introduces network-oblivious traffic because of network-unaware peer selection, and as the number of P2P users has increased, so has the damage this problem causes. To solve this problem, various P2P models based on collaboration between ISPs and P2P users have recently been suggested, reducing network-oblivious traffic by removing biased peer selection. However, previous schemes have mainly focused on reducing traffic, which does improve network performance. To support effective real-time IPTV services, it is necessary not only to reduce network-oblivious traffic but also to improve user performance in terms of delay and jitter, because IPTV is an application sensitive to both. In fact, the reduction of network-oblivious traffic improves user performance, but it cannot always guarantee the optimum performance experienced by real-time streaming service users. In this paper, we propose an ISP-driven managed P2P framework and peer selection mechanism for real-time IPTV users. For the evaluation, we use the QualNet 4.0 simulator and show that the proposed mechanism provides performance effectively. Ultimately, we confirm that ISPs should reflect the delay information of the organized P2P network, as well as the link traffic between peers, for effective real-time IPTV services. Keywords: P2P streaming mechanism, Managed P2P, biased peer selection.
1 Introduction To support the efficient and seamless delivery of an ISP-driven real-time IPTV service encompassing hundreds of thousands of channels, it is essential to cooperate with Peer-to-Peer (P2P) technologies. In fact, several P2P streaming mechanisms are already in use to provide web-based live streaming services with file sharing [1][2]. However, peer selection that does not reflect network topology, without considering
* Corresponding Author.
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 229 – 240, 2009. © Springer-Verlag Berlin Heidelberg 2009
geographical distance among candidate peers, introduces network-oblivious traffic, which causes severe problems for Internet Service Providers (ISPs) and also degrades the performance experienced by P2P users [3]. While many ISPs have developed P2P traffic-shaping techniques (e.g., [4][5]) to address this problem, P2P users have long been devising new ways to circumvent them. Recently, enhanced CPU performance and memory capacity have enabled high-quality streaming services such as IPTV, and ISPs may have difficulty providing such services over their limited infrastructure networks. For this reason, a large body of research emphasizing cooperation between ISPs and P2P users has been suggested to overcome the previous exhaustive methods. R. Bindal et al. [6] proposed a modified tracker and client mechanism to solve the biased-neighbor peer selection problem introduced by the tit-for-tat mechanism in BitTorrent. V. Aggarwal et al. [7] noted that selecting an ISP-friendly target peer among the several peer selection mechanisms providing traffic locality requires additional information, and they demonstrated the resulting performance using ISP-aided information. However, D. R. Choffnes et al. [8] assumed that cooperation and collaboration between ISPs and P2P users is unlikely to occur. Thus, they suggested peer selection based on content distribution networks (CDNs), which does not require additional infrastructure, network topology information, or cooperation between ISPs and their subscribers. Their experimental results show improved performance in terms of hop counts and bandwidth. However, they neither describe why ISPs and P2P users cannot cooperate and collaborate, nor compare the achieved performance against previous schemes based on oracle information managed by ISPs. ISP-driven IPTV services are expected to maximize revenue through cooperation between ISPs and content providers (CPs).
Recently, the IETF ALTO (Application-Layer Traffic Optimization) WG [9] and the DCIA (Distributed Computing Industry Association) P4P WG [10], both targeting cooperative techniques, have been standardizing P2P traffic optimization. P4P (Provider Portal for Applications) suggested the notion of a P4P-distance interface, a virtual distance reflecting the network status between peers, and confirmed that such interfaces allow network providers and applications to jointly optimize their respective performances using primal-dual decomposition. Summarizing the related work, previous research has mainly focused on reducing link utilization, traffic volume, and hop counts, and shows a reasonable improvement in users' quality of service corresponding to the reduction in traffic volume. In practice, however, when we assume a real-time IPTV service driven by an ISP, a new scheme needs to improve user quality in terms of the delay and jitter experienced from the source node to each peer, as well as reducing oblivious traffic, because we cannot guarantee that the set of paths optimally selected with respect to traffic between peers also yields reasonable delay and jitter. To obtain an optimized path satisfying this requirement, we need not only traffic information from the oracle framework between the requesting peer and candidate peers, but also network information from the already constructed overlay network. In this paper, we propose an ISP-driven managed P2P framework and peer selection mechanism for effective real-time IPTV service, and show improved results in terms of both decreased network-oblivious P2P traffic and increased user performance for IPTV.
2 Related Work Several P2P overlay constructions based on physical topology information have been studied. Recently, the ALTO WG in the IETF and the P4P WG in the DCIA, both targeting cooperative techniques, have been standardizing P2P traffic optimization, and many ISPs anticipate and watch the standardization process. Both protocols suggest several interfaces to feed topology-related information to the oracle server, to coordinate and exchange data, and to query the service. More detailed information on ALTO and P4P follows.
Fig. 1. ALTO framework
Fig. 1 presents the ALTO framework, which enables the interactions among its components. The ALTO server, generally called an "oracle," knows the topology-related information. Trackers owned by a P2P algorithm or content provider maintain peer-organized overlay networks; however, because a tracker has no knowledge of the physical network, it must query about the locations of the determined candidate peers, and the ALTO server provides guidance corresponding to the query. To support this framework, ALTO servers periodically update network information such as routing state. Fig. 2 shows the P4P network architecture, which integrates various content providers and ISPs. P4P distinguishes two kinds of trackers: the appTracker, owned by the content provider, and the iTracker, owned by the ISP. Before the target peer is determined, several candidate peers are selected by the appTracker. After the iTracker receives the candidate-peer information from the appTracker, it calculates weights according to the distance between peers to maximize the network's performance. P4P provides three core interfaces: the policy interface, which applies ISP policy and identifies congestion information based on the network location of users; the P4P-distance interface, which queries the virtual distance between two peers; and the capability interface, which provides differentiated service, such as the use of an on-demand server or cache server.
Fig. 2. P4P framework
Consequently, the previously proposed oracle-based mechanisms, including ALTO and P4P, enable cooperation with P2P content providers using various P2P protocols. However, they are limited to P2P solutions for web-based file sharing, used to minimize network-oblivious P2P traffic in an ISP network. Thus, to provide a real-time IPTV streaming service, we need a framework that considers the improvement of user quality as well as the reduction of network-oblivious P2P traffic. In the next section, we present the ISP-driven managed P2P framework and peer selection mechanism for real-time IPTV streaming service.
3 Managed P2P IPTV Framework for IPTV Service 3.1 Managed P2P IPTV Framework This section proposes an ISP-driven managed P2P framework to reduce topology-oblivious P2P traffic and, especially, to improve the quality of real-time streaming. ISP-driven IPTV services need to cooperate with other CPs in order to provide various contents. To manage the content owned by CPs, we assume many CPs will use tracker-based solutions, such as BitTorrent [11]. Among the selected candidate peers, the optimal peer is determined by the iTracker, which knows the network topology information. Fig. 3 shows the procedures of the proposed managed P2P framework for IPTV service.
Fig. 3. Channel request/response procedures for managed P2P IPTV service
After the set-top box (STB) completes its boot-up procedures, such as the assignment of an IP address, the following steps describe the scheme in detail.

• Step 1: The STB sends a "Get Program Request" message containing channel information to the EPG (Electronic Program Guide) server, which performs the same function as a torrent server in BitTorrent. The EPG sends back a "Get Program Response" message that includes the IP address of the cTracker managing the requested channel's signal delivery.

• Step 2: After receiving the "Get Program Response" message, the STB directly sends a "Get Peer Request" to the cTracker. At this point, for topology hiding, we use an OID (Opaque ID) assigned by the iTracker during the initial boot-up procedure. The cTracker knows the OID of each peer and maintains the binding between OID and IP address through the "Get Program Request" message, but this technique is out of scope in this paper.

• Step 3: On receiving the "Get Peer Request" message, the cTracker sends a "Peer Selection Request" containing the OIDs of the candidate peers to the iTracker. The iTracker sends back a "Peer Selection Response" message containing the IP address of the target peer, based on its internal topology information.

• Step 4: The cTracker sends the final ordered peer list, based on the message received from the iTracker, to the STB. The STB then tries to establish peering connections with the peers in the received list.

3.2 The Structure of iTracker The iTracker decides on a target peer based on the link traffic and link delay between the requesting peer and the candidate peers for effective peer selection. The functional block diagram is shown in Fig. 4.
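The Step 1-4 message exchange of Section 3.1 can be sketched as a toy, in-memory exchange. All names here (the dictionaries standing in for the EPG, cTracker, and iTracker, and the distance table) are illustrative stand-ins, not part of the paper's implementation:

```python
# Hypothetical in-memory stand-ins for the EPG, cTracker, and iTracker.
def get_program(epg, channel):
    # Step 1: the EPG maps a channel to the cTracker managing it.
    return epg[channel]

def get_peers(ctracker, itracker):
    # Steps 2-3: the cTracker forwards candidate-peer OIDs to the
    # iTracker, which ranks them by its topology information and
    # resolves each OID back to an IP address.
    candidates = ctracker["candidates"]
    ranked = sorted(candidates, key=lambda o: itracker["distance"][o])
    return [itracker["oid_to_ip"][o] for o in ranked]

epg = {"ch7": "ctracker-a"}
itracker = {"distance": {"oid1": 5, "oid2": 2},
            "oid_to_ip": {"oid1": "10.0.0.1", "oid2": "10.0.0.2"}}
ctracker = {"candidates": ["oid1", "oid2"]}

tracker_addr = get_program(epg, "ch7")
peer_list = get_peers(ctracker, itracker)
# Step 4: the STB now attempts peering with 10.0.0.2 first.
```

The point of the sketch is the division of knowledge: the cTracker sees only OIDs (topology hiding), while the iTracker alone holds the OID-to-IP binding and the distance information.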
Fig. 4. Functional block diagram within the iTracker
The iTracker has several interfaces with the STB, cTracker, Route Monitor, and SNMP server. It consists of several functions: a topology-hiding OID manager that creates an OID per STB, a total distance calculator that computes the distance value, and a network DB storing link traffic and delay. The SNMP server periodically obtains network information from SNMP agents that estimate and store the traffic and delay statistics on the input port of every router. To get the delivery-network information, we first need the peering information among peers, which is obtained from the cTracker. The link traffic (te) traversing a link is expressed as the frame transmission rate, and the link delay (de) as the elapsed time of a transmitted frame between two nodes. 3.3 Target Peer Selection Our scheme satisfies the ISP objective, which requires reduced network-oblivious P2P traffic, and the user objective, which requires improved user quality. For the ISP objective, we define pij as the weight of the optimal route between the requesting peer (i) and the candidate peer (j), and pe as the weight of each link e in the route. Then pij is the sum of the pe over the links of the path between i and j, as shown in equation (1). For the calculation, we use an optimized decomposition method, which is also used in P4P [12].
pij = Σe pe Ie(i, j) .    (1)
Equation (1) allows the mechanism to minimize P2P traffic. However, in order to obtain additional improvements in user quality, we need information about the currently organized peers from the streaming server (S).

Fig. 5. The information used to select the target peer in the iTracker

For an easy description of the proposed peer selection mechanism, we assume that there are two branches from S and that each peer is connected to one parent and one child, as shown in Fig. 5. In this environment, j4a and j4b are the closest peers to node i in each branch. The cTracker selects j4a and j4b as the candidate peers and then sends the IP addresses of {j1a, j2a, j3a} and {j1b, j2b, j3b}, the chains organized from S to j4a and j4b, along with the OIDs of j4a and j4b. In actuality, the hop counts between i and j4a and between i and j4b are 3 and 4, respectively. If we assume that the link traffic traversing each hop is the same, the optimal target peer is j4a. However, we cannot assure that the quality of streaming received from j4a is better than that from j4b, because we have not considered the delivery network through which the streaming data travels from S through the organized peers. Thus, we calculate the sum of the link delays in the delivery network containing the candidate peers, as shown in equation (2): the sum of the link delays between j1m and j2m, between j2m and j3m, and so on, along the chain organized from candidate peer jxm back to S.
djxm,S = Σe∈path{j1m, j2m} de + Σe∈path{j2m, j3m} de + … + Σe∈path{jxm, S} de .    (2)
We then acquire a link weight (pw) that satisfies both the ISP objective and the user objective. As shown in (3), we select the minimum pw among the values obtained by summing pi,j with the link delay multiplied by γ, a weight factor that decides the degree to which the delay of the organized network is reflected; it can be used to implement ISP policy.
pw = min{ (pi,ja + γ · dja,S), (pi,jb + γ · djb,S), …, (pi,jm + γ · djm,S) } .    (3)
4 Simulation and Results

In this section, we evaluate the proposed scheme for an IPTV streaming service. For the delivery model, we use the widely deployed breadth-first-search (BFS) tree model used in CoolStreaming [13]. We compare three mechanisms: the original "BFS"; "P2P-Link-Traffic," which considers only the link traffic between the requesting peer and the candidate peers; and the proposed scheme. The BFS model composes the delivery tree by focusing on width according to the sequential arrival of each node, which reduces the total forwarding latency of data delivered from the source to the leaf nodes. For the evaluation of the three mechanisms, we use the QualNet simulator v4.0 [14]. We deploy 100 nodes and use the OSPF routing protocol to retrieve the routes between peers in the ISP network. To model the external environment, we use 6 FTP sessions. The amount of P2P traffic is calculated as the number of forwarded datagrams, excluding FTP traffic, with FTP transmission rates of 400 KB/s, 1 MB/s, and 5 MB/s, respectively.

4.1 User's Streaming Quality

The change in user-quality performance is indicated by the depth level of the peer in the delivery network. We use 50 peers among the 100 nodes; each peer joins the streaming service sequentially. Fig. 6 shows the delay variation of a leaf peer located at depth level 4 as a function of the received packet sequence from the source. The BFS mechanism is not optimized with respect to network topology; hence, it is influenced more by unnecessary P2P traversals and by the background FTP traffic, which results in larger delays and more fluctuation than the other mechanisms. In Fig. 6 (b), the P2P-Link-Traffic scheme averages 22 ms while the proposed scheme averages 12 ms. As more peers participate, we can expect the delays to double or grow even further, considering the complexity of the delivery network and the increase in routing hop counts.

Fig. 6. Comparison of delay variation

Fig. 7 shows the jitter variation of the peer receiving streaming data at depth level 4. We observe that the BFS scheme exhibits irregular jitter corresponding to the delay variation shown in Fig. 6. The P2P-Link-Traffic scheme and the proposed scheme show variations about twice as small; however, the difference in variation between the proposed scheme and P2P-Link-Traffic is small. From this result, we observe that the proposed scheme performs better in terms of delay and jitter while providing the streaming service.
Fig. 7. Comparison of jitter variation
4.2 Comparison of P2P Traffic and Intensity Fig. 8 (a) presents the total amount of P2P traffic, and Fig. 8 (b) shows the intensity of P2P traffic, calculated as a standard deviation. Ultimately, Fig. 8 (a) indicates how efficiently network resources are saved, while Fig. 8 (b) indicates how efficiently loads are distributed. Each peer has 4 child nodes as its out-degree, and we set the streaming rate to 400 KB/s. We observe that the P2P-Link-Traffic scheme and the proposed scheme both reduce the amount and intensity of P2P traffic. The P2P traffic of the proposed scheme is slightly higher than that of the P2P-Link-Traffic scheme, but overall the proposed scheme provides similar performance.
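The two metrics of Fig. 8 can be computed as follows. This is a minimal sketch of the paper's measures (total forwarded P2P traffic and its standard deviation as intensity); the per-link load numbers are invented for illustration:

```python
import statistics

def traffic_summary(link_loads):
    """Total P2P traffic (network resources consumed) and its intensity
    (how unevenly the load is spread), using the population standard
    deviation as the paper's intensity measure."""
    return sum(link_loads), statistics.pstdev(link_loads)

total_a, intensity_a = traffic_summary([10, 12, 11, 9])  # well distributed
total_b, intensity_b = traffic_summary([40, 1, 1, 0])    # concentrated
# Equal totals can hide very different load distributions:
# intensity_b >> intensity_a, i.e. the second network is far more skewed.
```

This is why the paper reports both plots: Fig. 8 (a) alone cannot distinguish a scheme that spreads load evenly from one that concentrates it on a few links.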
Fig. 8. Total amounts of P2P Traffic and P2P traffic intensity in ISP’s network
5 Conclusion In this paper, we proposed a managed P2P framework and peer selection mechanism for real-time IPTV streaming service. Previous schemes based on topology information have mainly focused on reducing network-oblivious P2P traffic. However, to provide an effective real-time IPTV service, ISPs need to improve user quality as well as suppress network-oblivious P2P traffic. The recently introduced P4P is an effective peer selection mechanism for decreasing network-oblivious P2P traffic; however, IPTV is a delay-sensitive application and therefore needs a mechanism that considers service quality in terms of delay and jitter as well as the reduction of unnecessary P2P traffic. Thus, we proposed a managed P2P streaming mechanism that satisfies these requirements by reflecting the link delay of the organized P2P network. Our simulation shows that the proposed scheme achieves better streaming quality together with a reduction of network-oblivious P2P traffic. Consequently, it confirms that a managed P2P mechanism should consider the delay information of the already organized delivery network.
Acknowledgments This research was partially supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute for Information Technology Advancement) (IITA-2009-(C1090-0902-0036)) and Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier R&D Program of MKE in Korea as a result of UCN's subproject 09C1-C1-20S.
References 1. CoolStreaming, http://www.coolstreaming.us/hp.php?lang=en 2. TVAnts, http://tvants.en.softonic.com/ 3. Cha, M., Rodriguez, P., Moon, S., Crowcroft, J.: On Next-Generation Telco-Managed P2P TV Architectures. In: Proc. of 2008 IPTPS Conference (2008)
4. Packeteer PacketShaper, http://www.packeteer.com/products/packetshaper 5. Sandvine. Intelligent broadband network management, http://www.sandvine.com 6. Bindal, R., Cao, P., Chan, W., Medval, J., Suwala, G., Bates, T., Zhang, A.: Improving Traffic Locality in BitTorrent via Biased Neighbor Selection. In: 26th IEEE International Conference on Distributed Computing Systems, p. 66 (2006) 7. Aggarwal, V., Feldmann, A., Scheideler, C.: Can ISPs and P2P Systems Cooperate for Improved Performance? ACM SIGCOMM Computer Communication Review 37(3) (2007) 8. Choffnes, D.R., Bustamante, F.E.: Taming the Torrent. In: Proc. of ACM SIGCOMM (August 2008) 9. Kiesel, S., Popkin, L., Previdi, S., Woundy, R., Yang, Y.R.: Application-Layer Traffic Optimization (ALTO) Requirements, draft-kiesel-alto-reqs-00.txt (July 4, 2008) 10. Xie, H., Krishnamurthy, A., Silberschatz, A., Yang, Y.: P4P: Explicit Communications for Cooperative Control Between P2P and Network Providers, http://www.dcia.info/documents/P4POverview.pdf 11. BitTorrent, http://www.bittorrent.com 12. Xie, H., Yang, Y.R.: P4P – Provider Portal for (P2P) Applications. In: Proc. of SIGCOMM 2008, August 17-22 (2008) 13. Zhang, X., Liu, J., Li, B., Yum, T.-S.: CoolStreaming/DONet: A data-driven Overlay Network for Live Media Streaming. In: IEEE INFOCOM 2005, Miami, FL, USA, March 2005, pp. 2102–2111 (2005) 14. QualNet Network Simulator, http://www.scalable-networks.com
Fault-Tolerant Architecture for Peer to Peer Network Management Systems

Maryam Barshan¹, Mahmood Fathy¹, and Saleh Yousefi²

¹ Computer Engineering Faculty, Iran University of Science and Technology (IUST)
[email protected], [email protected]
² Computer Engineering Department, Urmia University
[email protected]

Abstract. In this paper we propose a 3-tier hierarchical architecture based on the peer to peer model for network management purposes. The main focus of the proposed architecture is provisioning the fault tolerance property, which in turn increases the availability of the Network Management System (NMS). In each tier of the architecture we use redundancy to achieve this goal. However, we do not use redundant peers, so no peer redundancy is imposed on the system. Instead, we use some selected peers in several roles and therefore only add some software redundancy, which is easily tolerable by the advanced processors of NMS peers. Due to the hierarchical structure, failures of nodes in each tier may affect the NMS's availability differently. Therefore we examine, by means of an extensive simulation study, the effect that failures of peers playing different roles in the architecture have on the availability of the system. The results show that the proposed architecture offers higher availability than a previously proposed peer to peer NMS, as well as lower sensitivity to node failures.

Keywords: availability, fault tolerance, network management, hierarchical P2P networks.
1 Introduction

Nowadays the interest of almost all companies in taking advantage of new technologies and reaping their benefits leads to an increase in network complexity. One of the challenges is offering various services at high quality. The emergence of new equipment and services and the increasing number of users with diverse service demands make the management of new generation networks difficult. To succeed in managing such complex networks, a robust and reliable NMS is required. IETF's SNMP (Simple Network Management Protocol) and ITU-T's TMN (Telecommunication Management Network) have been the two main network management technologies in the computer networking industry. Nowadays, due to new challenges including dynamic topology configuration, interoperability among heterogeneous networks, QoS guaranteed services, etc., they show weaknesses in the management of such complex networks [1, 2].

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 241–252, 2009. © Springer-Verlag Berlin Heidelberg 2009

To address these shortcomings two
alternative technologies have recently been introduced in the network management community: web services and peer to peer (P2P) [3]. The well-known characteristics of P2P approaches lead to scalability, flexibility, reliability and improved quality of current network management solutions. Therefore one of the solutions for addressing network management challenges is to use P2P-based NMSs, which are built on overlay networks. A management overlay can potentially encompass different administrative domains. Thus it can bring together different human administrators and diverse systems and networks in order to accomplish a management task in a cooperative and integrated manner.

Basically, a network management system works effectively and efficiently only when it can obtain the information it needs from the network and is able to change the configuration or apply attributes when necessary. Therefore, one of the requirements for a management system to deliver the needed service is to provide reasonable availability. The availability of a system can be investigated from several viewpoints, including QoS [4], security [5] and fault tolerance [6]. Fault tolerance is the ability of a system to offer services even in the presence of a fault. It is one of the requirements of P2P networks in order to avoid data losses and to support proper message transfer. Redundancy is a technique for applying fault tolerance and increasing the reliability and availability of P2P systems; it can be used for different resources.

In this paper we propose an architecture for increasing the availability of the network management system. We intend to achieve this goal by increasing the fault tolerance of the NMS through redundancy. However, the architecture we propose does not impose any peer (hardware) redundancy, only software redundancy. This is mainly because we use some peers in several roles and thus add some software redundancy, which is easily tolerable by the advanced processors of NMS peers.
We conduct an extensive simulation study to examine the performance of the proposed architecture in the presence of node failures. We also investigate the effect of failures at different nodes and the sensitivity of the architecture to those failures. To the best of our knowledge this is the first work that addresses fault tolerance in peer to peer network management, and the proposed architecture is the first of its type.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed architecture. In Section 4 we evaluate the performance of the proposed architecture through an extensive simulation study and present the results. Finally, Section 5 concludes the paper along with some guidelines for future work.
2 Related Work

The work related to our study can be categorized into three main topics: P2P-based network management systems, fault tolerance in P2P systems, and fault tolerance in hierarchical P2P systems.

Reference [7] carried out one of the first investigations of using the P2P paradigm in network management. The authors of [8] designed a P2P overlay to address fault and performance management. Another use of P2P management, for Ambient Networks (ANs), has been reported by [9]. More importantly, the goal of the European Celtic Madeira project [10] is to implement P2P-based management
systems for mesh and ad-hoc networks. The project uses inherent P2P characteristics for automatic and dynamic network management in these kinds of networks. Madeira peers are organized in clusters with selected nodes as cluster heads. The cluster heads are re-clustered in another layer, and this process continues until one peer remains at the topmost level. This leads to a multi-tier hierarchical P2P architecture. The authors of [11] point out that using such an architecture, and more specifically a 2-tier hierarchical P2P technology, is a flexible, scalable and easy to use solution for network management applications. Note that [10] and [11], like all P2P architectures, make use of application layer routing, which improves connectivity between management entities. Furthermore, the advantage of using grouping in each layer for balancing management tasks is discussed.

Deploying redundancy is a technique for increasing the reliability and availability of P2P systems. In DHT (Distributed Hash Table) P2P systems, three kinds of redundancy are discussed: replication, erasure coding, or a combination of the two methods [12, 13]. Due to the high churn rate in P2P networks, the authors in [14] proposed a technique to ease fault discovery and network recovery, aiming at provisioning dependability and performance. This method, called MSPastry, is a new implementation of Pastry for real environments with consistent message routing. Assuming a high dynamic rate in structured P2P networks, reference [15] designed a failure recovery protocol and evaluated its performance. Moreover, the authors in [16] address two issues: first, how a data structure can be built for routing in the presence of faulty nodes, and second, how secure message routing can be achieved. In [17] an efficient method for fault tolerance in hierarchical P2P systems is introduced.
It proposes a so-called multiple publication technique in which normal peers connect to one or more SPs (Super Peers) in other groups. When a normal peer finds out that its corresponding SP no longer works, it selects one of the other groups' SPs as its new SP. The authors in [18] take advantage of a BSP (Backup Super Peer) for applying fault tolerance in each group, i.e., whenever a SP fails, it is replaced by a BSP. The paper further proposes a scalable algorithm for assigning peers to groups, selecting SPs and maintaining the overlay network. In [19] the authors also use redundancy, but their methodology differs from the one in [18] in that in each group some SPs form a virtual SP; the nodes belonging to the virtual SP then take turns serving as the SP in Round Robin (RR) order. Reference [20] reports using two layers for SP fault tolerance: first, all peers are organized in a flat layer, and then peers with more resources form another overlay for managing the less resourceful peers. In case of a SP failure, peers use the flat layer instead of the second overlay, because they are already organized in that layer and have the necessary connections with other peers there. However, delivery is not guaranteed.
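The replication and erasure-coding redundancy schemes discussed in [12, 13] trade storage overhead against availability differently; the contrast can be illustrated with a small availability calculation. The peer availability and fragment parameters below are illustrative assumptions, not values taken from the cited papers:

```python
from math import comb

def replication_availability(a: float, n: int) -> float:
    """An object stored on n full replicas survives unless all n are down."""
    return 1.0 - (1.0 - a) ** n

def erasure_availability(a: float, n: int, m: int) -> float:
    """An object split into n fragments, any m of which suffice to rebuild it,
    survives if at least m fragments reside on live peers (binomial sum)."""
    return sum(comb(n, k) * a**k * (1 - a) ** (n - k) for k in range(m, n + 1))

a = 0.9  # assumed availability of a single peer
print(replication_availability(a, n=3))   # 3 full replicas (3x storage)
print(erasure_availability(a, n=6, m=3))  # 6 fragments, any 3 recover (2x storage)
```

With these assumed numbers both schemes reach similar availability, while the erasure code stores only twice the object size instead of three times; this is the kind of trade-off the cited comparisons quantify.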
3 Proposed Architecture for P2P-Based NMS

The proposed network management system is a 3-tier hierarchical architecture. The layers, from bottom to top, are as follows: LLM (Low Level Manager), MLM (Mid Level Manager) and TLM (Top Level Manager). The peers of each layer are arranged in groups. Note that the aforementioned nodes belong to the NMS and are called manager nodes hereafter. The manager nodes are used to manage other nodes called managed elements (end nodes), which are actually nodes in the network
under management. LLM groups are in charge of collecting management data from the managed elements. As shown in Fig. 1, end nodes connect to their SPs through a star topology. Indeed, since they are managed devices, they do not need to connect to each other in the NMS. This does not contradict their ability to communicate with each other in the network under management. Furthermore, each end node has the address of a BSP for use if necessary.
Fig. 1. Connectivity of end nodes and corresponding SP
In the proposed NMS architecture, each peer in the LLM can be used as a BSP for other peers which act as SPs for end nodes. For example, in Fig. 2, A′ acts as a BSP for A, and A′′ as a BSP for A′. The reason the SP of each group is used as the BSP of another is to apply redundancy with minimum overhead; thus there is no hardware overhead in this case. In each LLM group one SP and one BSP (the most powerful peers) are selected among all peers for connecting to the upper tier (i.e., the MLM layer). The process of selecting SP and BSP nodes is out of the scope of this paper. In each LLM group, peers are connected to the selected SP and BSP through a star topology. It should be stressed that there is a direct link between the SP and BSP in these groups. This link is used to send alive messages in order to get informed of any failure. Fig. 2 gives a representation of an LLM group structure when the redundancy factor is 2.
Fig. 2. LLM layer intra-groups connectivity
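The alive-message mechanism over the direct SP–BSP link can be sketched as a simple heartbeat timeout check. The class and the timeout value below are illustrative assumptions; the paper does not specify the message format or timing:

```python
import time
from typing import Optional

class HeartbeatMonitor:
    """Declares the partner (SP or BSP) failed when no alive message
    has arrived over the direct link within `timeout` seconds."""

    def __init__(self, timeout: float = 3.0):
        self.timeout = timeout
        self.last_alive = time.monotonic()

    def on_alive_message(self) -> None:
        # Called whenever an alive message arrives over the direct link.
        self.last_alive = time.monotonic()

    def partner_failed(self, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to the current clock.
        if now is None:
            now = time.monotonic()
        return now - self.last_alive > self.timeout

m = HeartbeatMonitor(timeout=3.0)
m.on_alive_message()
print(m.partner_failed())                       # partner still considered alive
print(m.partner_failed(now=m.last_alive + 10))  # silent for 10 s -> failed
```

In the architecture described above, such a check running on the BSP is what triggers the failure alarm toward the upper tier.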
The MLM layer is formed by grouping the SPs and BSPs of the LLM layer groups. The proposed dynamic topology for the MLM group is depicted in Fig. 3. As shown in the figure, SPs are connected through a degree-three Chordal ring and BSPs are connected to their SPs' neighbors. It should be noted that in Fig. 3 each BSP is just in the role of a backup and has no responsibility in the MLM functionality until its corresponding SP fails. When the SP fails, the BSP connects to the MLM through the neighbors of the failed SP. Thus, while the SP is active, it is in charge of managing both the LLM and
Fig. 3. MLM layer intra-groups connectivity
MLM layers, and during this whole period the BSP is just in charge of collecting management data from its assigned end nodes. Fig. 4 shows the set of connections between SPs and BSPs of the LLM layer within the MLM layer. As shown in the figure, in addition to the links between the LLM and MLM layers, some horizontal links are created among LLM groups (i.e., the groups at the same hierarchical level). These horizontal links, shown by dashed lines in the figure, may be exploited to enable more cooperation in the proposed NMS. Moreover, in this layer (i.e., the MLM) no additional peers are used, so no hardware cost is imposed on the proposed NMS. It is worth mentioning that Management by Delegation (MbD) can be performed in the MLM layer.
Fig. 4. Cooperation between LLM layer SPs and BSPs
In the MLM layer we use a set of SP nodes called a VSP (Virtual SP). Using a Round Robin method, at any moment in time (e.g., at any failure time) one member of the VSP set takes the responsibility of the SP in each MLM group. The current SP node makes management decisions and links the LLM layers to the TLM layer. Based on this organization, some SPs take turns serving other peers. Note that, as illustrated in Fig. 5, from the TLM point of view this set can be viewed as a single SP. At any moment in time, the peer in charge of TLM and MLM connectivity is called the RSP (Real SP). MLM groups are formed from the SPs and BSPs of the LLM layer and are in charge of LLM and TLM connectivity. One problem with a static architecture here would be the increased load on edge peers (peers between the MLM and TLM layers). To alleviate this problem, VSPs are used, making the connection with the TLM dynamic. In fact, using VSPs distributes the management data transfer load among some of the more powerful SPs.
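The Round Robin rotation of the RSP role within a VSP set can be sketched as follows; the class, its method names, and the failure-handling details are our own illustrative assumptions, not a protocol specified in the paper:

```python
class VirtualSP:
    """A set of super peers that take turns acting as the Real SP (RSP)."""

    def __init__(self, members):
        self.members = list(members)
        self.index = 0          # position of the current RSP
        self.failed = set()

    def current_rsp(self):
        return self.members[self.index]

    def rotate(self):
        """Advance to the next live member in Round Robin order."""
        for _ in range(len(self.members)):
            self.index = (self.index + 1) % len(self.members)
            if self.members[self.index] not in self.failed:
                return self.current_rsp()
        raise RuntimeError("all VSP members have failed")

    def mark_failed(self, peer):
        """Record a failure; if the current RSP failed, hand over the role."""
        self.failed.add(peer)
        if peer == self.current_rsp():
            self.rotate()

vsp = VirtualSP(["SP1", "SP2", "SP3"])
vsp.mark_failed("SP1")      # the RSP fails -> the next live member takes over
print(vsp.current_rsp())    # SP2
```

The same rotation can also be driven on a timer rather than only on failures, which is how load gets spread across the VSP members as described above.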
Fig. 5. Method of connecting MLM to TLM
Due to its importance and its communication with human administrators, the topology of the TLM layer, shown in Fig. 5, is a full mesh. The whole TLM layer can be viewed as one virtual peer as well. The peers of this layer are different from the peers of the MLM, which were formed from the SPs and BSPs of LLM groups. In fact, at first all peers are divided into two parts: one part is used to form the LLM layer, and then the MLM from the SPs and BSPs of the LLM groups, while the other part is used to organize the TLM layer. Finally, the schematic of the proposed architecture used in the simulation is depicted in Fig. 6.
Fig. 6. A sample of the proposed hierarchical architecture
In the MLM layer of the described architecture, SP/BSP failures are periodically checked for. When a BSP/SP realizes that its corresponding SP/BSP has failed, it immediately reports this to the TLM through a critical message (failure alarm). A source routing algorithm is then executed in the TLM in order to update (reconfigure) the architecture by putting the BSP in the place of the failed SP. In other words, in the absence of the SP, the replacing BSP takes the responsibility of linking the LLM
layer to the TLM layer. Note that this failure is also reported to the human administrators in the TLM layer, and a repair process is launched afterwards. Because failures are reported toward the upper levels (i.e., from the LLM and MLM toward the TLM, and in the case of MbD, from the LLM toward the MLM), there is no need for the redundancy factor k to exceed 2. In fact, the probability of coincident failure of a SP and its BSP can be ignored.
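This last claim can be supported with a rough steady-state calculation: if SP and BSP failures are independent, the probability that both are down at the same time is the product of their individual unavailabilities. The sketch below reuses the failure rate (1/20000 per time slot) and the 500-slot repair time that appear in the simulation setup of Section 4; the independence assumption and the steady-state formula are ours, not the paper's:

```python
# Steady-state unavailability of a single repairable peer:
# U = MTTR / (MTTF + MTTR), where MTTF = 1 / lambda.
lam = 1 / 20000      # failure rate per time slot (value from Section 4)
mttr = 500           # repair time in time slots (value from Section 4)
mttf = 1 / lam

u_single = mttr / (mttf + mttr)
u_pair = u_single ** 2   # SP and BSP down simultaneously (independence assumed)

print(f"single-peer unavailability:      {u_single:.4f}")
print(f"coincident SP+BSP unavailability: {u_pair:.2e}")
```

Under these assumptions the coincident SP+BSP outage probability drops below 0.1%, which is why a redundancy factor of k = 2 is deemed sufficient.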
4 Experimental Results

The evaluation of the proposed architecture has been performed with the PeerSim simulator [21]. The simulated NMS is composed of 100 management peers (Fig. 6). The number of TLM peers in this scenario is 4 out of 100, while the total number of LLM members is 96. These 96 peers are arranged into twelve 8-member groups. For each of these 12 groups one SP and one BSP are selected and moved to the MLM layer, where 8-member groups are formed again. As a result the number of LLM members is 72 and the number of MLM members is 24. In each MLM group three SPs are selected as VSPs. The described topology is then applied to the whole structure, and a kind of source routing with minimum latency is implemented to route messages among the different elements of the architecture.

Failure rates are taken to be different for each peer type. VSPs in the MLM layer have a lower failure rate than the other nodes in the MLM layer. Furthermore, SPs and BSPs have a lower failure rate than LLM peers. In the simulation the failure rates of LLM and TLM peers are assumed to be zero (λ_LLM = λ_TLM = 0: the TLM layer because of its full-mesh topology, and the LLM layer because its peers are connected to the end nodes, which are not included in the NMS). The times to failure for MLM peers are drawn from an exponential distribution with the failure rates (λ_MLM and λ_VSP) given in each figure. Note that λ_MLM denotes the failure rate of all MLM peers except the VSPs.

In order to evaluate the availability of the proposed NMS architecture we measure the percentage of available peers. Availability is measured from the TLM's viewpoint; in other words, peers that are up but not reachable by the TLM are counted as unavailable. The following figures show averages over 1000 experiments. In the repairable case, after the repair time finishes, the SP takes back its responsibility.
In this case, until the next failure (10 failures are assumed for each peer) the BSP has a role only in the LLM layer and no longer has any responsibility in the MLM layer. Moreover, in this case the times to repair are fixed at a specific period of time (500 time slots = 1/40 of the simulation time). Furthermore, 500 periodic alive checks are performed, and 100 messages are sent for checking available peers, over the whole simulation time.

The non-fault tolerant architecture used for comparison has the same assumptions as the proposed architecture in terms of the number of peers, the number of groups and the number of peers in each group. However, it differs in the following respects: no BSPs are used in the LLM layer, no VSPs are used in the MLM layer, and no RR method is used for selecting the RSP in each MLM group (indeed, in this architecture a VSP failure means an RSP failure). Thus the topology is a star in the LLM layer (with SPs only), a degree-three Chordal ring in the MLM layer (without BSP connections) and a full mesh in the TLM layer.
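The simulation methodology just described (exponentially distributed times to failure, fixed repair time, availability as the fraction of time a peer is reachable) can be sketched in miniature. The following Monte Carlo estimate for a single repairable peer is our own illustration, not a reproduction of the PeerSim experiments:

```python
import random

def simulate_peer(lam: float, repair: float, horizon: float,
                  rng: random.Random) -> float:
    """Fraction of [0, horizon] during which a single repairable peer is up,
    with exponential times to failure and a fixed repair time."""
    t, up_time = 0.0, 0.0
    while t < horizon:
        ttf = rng.expovariate(lam)        # time until the next failure
        up_time += min(ttf, horizon - t)  # only count uptime within the horizon
        t += ttf + repair                 # then down for `repair` slots
    return up_time / horizon

rng = random.Random(42)
runs = [simulate_peer(1 / 20000, 500, 20000 * 40, rng) for _ in range(50)]
est = sum(runs) / len(runs)
print(f"estimated availability: {est:.3f}")  # close to MTTF/(MTTF+MTTR) ~ 0.976
```

The estimate converges toward the steady-state value MTTF/(MTTF + MTTR); the paper's experiments additionally account for reachability through the hierarchy, which this single-peer sketch does not model.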
The following figures are drawn under the repairable assumption to show the availability improvement of the proposed architecture over the non-fault tolerant architecture (Fig. 7). We also show the effects of different failure rates (Fig. 8) as well as the sensitivity of the proposed architecture in the presence of VSP and MLM failures (Fig. 9). The failure rate values are given with each figure.

Fig. 7. The proposed architecture availability versus the non-fault tolerant availability (repairable case, λ_MLM = 1/20000, λ_VSP = 1/20000; percentage of available peers over Trial No.)

Table 1. Average, Min and Max values of the curves of Fig. 7

  Repairable case                                               Avg    Max    Min
  Proposed arch. (λ_MLM = 1/20000, λ_VSP = 1/20000)             97.91  99.70  97.46
  Non-fault tolerant arch. (λ_MLM = 1/20000, λ_VSP = 1/20000)   95.34  99.66  94.56
First, Fig. 7 shows the effectiveness of the proposed architecture in improving the NMS's availability. This figure assumes the same failure rate for MLM and VSP peers (λ_VSP = λ_MLM = 1/20000) and no failures in TLM and LLM peers. It follows from the figure that the NMS's availability in the proposed architecture is noticeably improved over the non-fault tolerant one (on average 97.91% versus 95.34%, a 2.57% increase on average, which is significant given that every percent of availability matters).

Next, in Fig. 8 we study the effect of the failure rate, along with the peer type, on the NMS's availability. In Fig. 8(a) we assume no failures in other peers and vary the failure rate of VSP nodes. As the figure shows, increasing the failure rate deteriorates the availability of the NMS considerably (on average 98.5%, 97.1%, and 94.2% for the failure rates in the figure, i.e., a 2.147% decrease on average per rate increase). In Fig. 8(b) the availability of the NMS is shown for different failure rates of MLM nodes. It follows the same trend as Fig. 8(a), except that it shows that the effect of failures in VSP nodes is more considerable than that of the other MLM nodes (on average 99.4%, 98.78%, and 97.65% for the failure rates in the figure, i.e., a 0.872% decrease on average per rate increase). This phenomenon is also validated in Fig. 8(c), in which the same failure rates for MLM and VSP nodes present a quite noticeable difference in the NMS's availability (on average 99.39% versus 98.47%, i.e., a 0.92% difference on average). In this figure all failure rates other than the ones mentioned are taken to be zero.
Fig. 8. The effect of MLM and VSP failure rates on the NMS's availability (repairable case; percentage of available peers over Trial No.): 8(a) compares different VSP failure rates, 8(b) compares different MLM failure rates, and 8(c) shows the effects of similar MLM and VSP failure rates.

Table 2. Average, Min and Max values of the curves of Fig. 8(a)

  Repairable case                   Avg    Max    Min
  Proposed arch. λ_VSP = 1/20000    98.50  99.66  97.90
  Proposed arch. λ_VSP = 2/20000    97.11  99.31  96.59
  Proposed arch. λ_VSP = 4/20000    94.21  98.82  93.51

Table 3. Average, Min and Max values of the curves of Fig. 8(b)

  Repairable case                   Avg    Max    Min
  Proposed arch. λ_MLM = 1/20000    99.40  99.95  99.27
  Proposed arch. λ_MLM = 2/20000    98.78  99.88  98.59
  Proposed arch. λ_MLM = 4/20000    97.65  99.68  97.16

Table 4. Average, Min and Max values of the curves of Fig. 8(c)

  Repairable case                   Avg    Max    Min
  Proposed arch. λ_MLM = 1/20000    99.39  99.89  99.27
  Proposed arch. λ_VSP = 1/20000    98.47  99.80  98.10
Figure 9 evaluates the sensitivity of the proposed architecture to peer failures in comparison to the non-fault tolerant architecture. For this purpose we measure the decrease in availability in response to an increase in failure rates. In Fig. 9(a) we increase the failure rate of VSP nodes starting from 1/20000 and measure the NMS's availability for our proposed architecture and the non-fault tolerant one. Fig. 9(b) shows the same phenomenon in the presence of failures in MLM nodes. As the figures show, both VSP and MLM failures have less influence on the availability of the proposed architecture than on that of the non-fault tolerant architecture. For the failure rates in the figures, the sensitivity of the proposed architecture is 3.38 percent lower (for VSP failures) and 0.31 percent lower (for MLM failures) than that of the non-fault tolerant one.
Fig. 9. Comparing the sensitivity of the proposed architecture and the non-fault tolerant architecture to VSP and MLM failures (repairable case; percentage of available peers over Trial No.): 9(a) sensitivity to VSP failure, 9(b) sensitivity to MLM failure.

Table 5. Average, Min and Max values of the curves of Fig. 9(a)

  Repairable case                             Avg    Max    Min
  Proposed arch. λ_VSP = 1/20000              97.86  99.67  97.36
  Proposed arch. λ_VSP = 2/20000              96.32  99.46  95.64
  Proposed arch. λ_VSP = 4/20000              93.47  98.80  92.67
  Proposed arch. λ_VSP = 8/20000              88.59  97.49  86.94
  Non-fault tolerant arch. λ_VSP = 1/20000    95.33  99.46  94.51
  Non-fault tolerant arch. λ_VSP = 2/20000    91.88  99.38  90.75
  Non-fault tolerant arch. λ_VSP = 4/20000    85.76  98.59  84.35
  Non-fault tolerant arch. λ_VSP = 8/20000    75.92  97.56  73.16

Table 6. Average, Min and Max values of the curves of Fig. 9(b)

  Repairable case                             Avg    Max    Min
  Proposed arch. λ_MLM = 1/20000              97.88  99.61  97.30
  Proposed arch. λ_MLM = 2/20000              97.30  99.54  96.73
  Proposed arch. λ_MLM = 4/20000              96.00  99.39  95.42
  Proposed arch. λ_MLM = 8/20000              93.96  99.03  93.16
  Non-fault tolerant arch. λ_MLM = 1/20000    95.33  99.56  94.52
  Non-fault tolerant arch. λ_MLM = 2/20000    94.61  99.54  93.74
  Non-fault tolerant arch. λ_MLM = 4/20000    93.01  99.54  92.14
  Non-fault tolerant arch. λ_MLM = 8/20000    90.49  99.06  89.31
5 Conclusion

Due to the weaknesses of traditional client/server based network management systems, we aimed at designing a highly available NMS based on the Peer to Peer paradigm. In
this paper we proposed a self-reconfigurable and self-fault-managed architecture obtained by applying the fault tolerance property. Our main focus is to offer high availability with minimum overhead, in contrast to costly redundancy mechanisms. For this purpose we use multi-role peers, where each node, in addition to its normal role, plays the role of backup for other selected nodes. Thus, no peer redundancy is added except for some software redundancy, which is easily tolerable by today's advanced NMS peers. Experimental results show the effectiveness of the proposed architecture in improving the NMS's availability in comparison to a non-fault tolerant NMS. It also offers less sensitivity to node failures. We further examined the effect of failures of nodes at different levels of the hierarchy on the availability of the proposed NMS. For future work we intend to study the availability of the proposed fault-tolerant architecture from an analytical point of view.
Acknowledgment This work is supported by the Iran Telecommunication Research Center (ITRC).
References

1. Choi, M.-J., Won-Ki Hong, J.: Towards Management of Next Generation Networks. IEICE Trans. Commun. E90-B(11), 3004–3014 (2007)
2. Li, M., Sandrasegaran, K.: Network Management Challenges for Next Generation Networks. In: Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary (LCN 2005), Sydney, Australia (2005)
3. Cassales Marquezan, C., Raniery Paula dos Santos, C., Monteiro Salvador, E., Janilce Bosquiroli Almeida, M., Luis Cechin, S., Zambenedetti Granville, L.: Performance Evaluation of Notifications in a Web Services and P2P-Based Network Management Overlay. In: 31st Annual IEEE International Computer Software and Applications Conference (COMPSAC 2007), Beijing, China, vol. 1, pp. 241–250 (2007)
4. Menascé, D.A., Almeida, V.A.F., Dowdy, L.W.: Performance by Design: Computer Capacity Planning by Example. Prentice Hall PTR, Englewood Cliffs (2004)
5. Stallings, W.: Cryptography and Network Security: Principles and Practices, 4th edn. Prentice Hall, Englewood Cliffs (2005)
6. Avižienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Transactions on Dependable and Secure Computing 1(1), 11–33 (2004)
7. State, R., Festor, O.: A Management Platform Over Peer-to-Peer Service Infrastructure. In: Proceedings of the 10th International Conference on Telecommunications, ICT 2003, Vancouver, BC, Canada, pp. 124–131 (2003)
8. Binzenhöfer, A., Tutschku, K., auf dem Graben, B., Fiedler, M., Arlos, P.: A P2P-Based Framework for Distributed Network Management. In: Cesana, M., Fratta, L. (eds.) Euro-NGI 2005. LNCS, vol. 3883, pp. 198–210. Springer, Heidelberg (2006)
9. Brunner, M., Galis, A., Cheng, L., Colás, J.A., Ahlgren, B., Gunnar, A., Abrahamsson, H., Szabó, R., Csaba, S., Nielsen, J., Schuetz, S., Prieto, A.G., Stadler, R., Molnar, G.: Towards Ambient Networks Management. In: Magedanz, T., Karmouch, A., Pierre, S., Venieris, I.S. (eds.) MATA 2005. LNCS, vol. 3744, pp. 215–229. Springer, Heidelberg (2005)
10. Llopis, P.A., Frints, M., Abad, D.O., Ordás, J.G.: Madeira: A Peer-to-Peer Approach to Network Management. In: The Wireless World Research Forum (WWRF), Shanghai, China (April 2006)
11. Zambenedetti Granville, L., Moreira da Rosa, D., Panisson, A., Melchiors, C., Janilce Bosquiroli Almeida, M., Margarida Rockenbach Tarouco, L.: Managing Computer Networks Using Peer-to-Peer Technologies. IEEE Communications Magazine (October 2005)
12. Rodrigues, R., Liskov, B.: High Availability in DHTs: Erasure Coding vs. Replication. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 226–239. Springer, Heidelberg (2005)
13. Chen, G., Qiu, T., Wu, F.: Insight into Redundancy Schemes in DHTs. The Journal of Supercomputing 43(2), 183–198 (2008)
14. Castro, M., Costa, M., Rowstron, A.: Performance and Dependability of Structured Peer-to-Peer Overlays. In: Proc. Dependable Systems and Networks (DSN 2004), June 2004, pp. 9–18. IEEE (2004)
15. Lam, S.S., Liu, H.: Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation. In: SIGMETRICS/Performance 2004. ACM, New York (2004)
16. Hildrum, K., Kubiatowicz, J.: Asymptotically Efficient Approaches to Fault-Tolerance in Peer-to-Peer Networks. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 321–336. Springer, Heidelberg (2003)
17. Lin, J.-W., Yang, M.-F., Tsai, J.: Fault Tolerance for Super-Peers of P2P Systems. In: Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 107–114 (2007)
18. Garcés-Erice, L., Biersack, E.W., Felber, P., Ross, K.W., Urvoy-Keller, G.: Hierarchical Peer-to-Peer Systems. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 1230–1239. Springer, Heidelberg (2003)
19. Yang, B., Garcia-Molina, H.: Designing a Super-Peer Network. In: Proceedings of the 19th International Conference on Data Engineering, Bangalore, India, March 2003, pp. 49–62 (2003)
20. Panisson, A., Moreira da Rosa, D., Melchiors, C., Zambenedetti Granville, L., Janilce Bosquiroli Almeida, M., Margarida Rockenbach Tarouco, L.: Designing the Architecture of P2P-Based Network Management Systems. In: IEEE Symposium on Computers and Communications (ISCC 2006), Pula-Cagliari, Sardinia, Italy (2006)
21. PeerSim: A Peer-to-Peer Simulator, http://peersim.sourceforge.net/
Public Key Signatures and Lightweight Security Solutions in a Wireless Environment

Dmitrij Lagutin and Sasu Tarkoma

Helsinki Institute for Information Technology HIIT
Helsinki University of Technology TKK, Espoo, Finland
[email protected],
[email protected]

Abstract. Several security solutions have been proposed for wireless networks; most of these are based on lightweight security measures and contain significant drawbacks, such as transit path dependency and high bandwidth overhead. In this work we compare Packet Level Authentication (PLA), which is a public key based solution, against hash tree and hash chain based solutions in terms of security and energy overhead in a wireless environment. We argue that, due to advances in semiconductor technology, public key based security solutions are viable for wireless networks. In many cases they actually achieve better energy efficiency due to their lower bandwidth overhead.

Keywords: Network security, public key cryptography, hash chains, wireless networks.
1 Introduction

Currently the Internet is plagued by various security problems, such as denial-of-service attacks and unsolicited e-mail. Wireless networks, which are becoming increasingly popular, further aggravate this problem, since wireless devices usually have limited battery power and bandwidth, allowing a successful attack to drain the victim's bandwidth or energy. In order to improve the Internet's security, a strong network layer solution that can detect and stop malicious traffic as soon as possible is preferred.

Most of the network layer security proposals are lightweight solutions based on hash tree or hash chain methods. While such proposals decrease the computational requirements, they increase the complexity of the system and include severe limitations. In this paper, we analytically investigate and compare Packet Level Authentication (PLA), which is based on signing every packet in the network with a public key signature technique, against lightweight hash chain and tree based solutions. We compare the security properties of the different solutions and analyze the overall energy overhead in the wireless environment, taking both the computational requirements and the bandwidth overhead into account.

This paper is structured as follows. Chapter 2 describes various security solutions, and their properties are evaluated in Chapter 3. Chapter 4 contains an analysis of the energy consumption of the various solutions in a wireless environment. These results are evaluated in Chapter 5, and Chapter 6 contains conclusions and discusses future work.

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 253–265, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 Background

This section covers existing security solutions that aim to provide integrity protection and data origin authentication on the network layer.

2.1 Packet Level Authentication (PLA)

Packet Level Authentication (PLA) [1] is a novel way to provide availability and protect the network infrastructure on the network layer from several kinds of attacks, such as denial-of-service (DoS) attacks. PLA is based on the assumption that per packet public key cryptographic operations are possible at wire speed even in high speed networks, due to new cryptographic algorithms and advances in semiconductor technology. PLA aims to detect and stop malicious traffic as quickly as possible, freeing resources for benevolent traffic.

A good analogy to the principle of PLA is paper currency. Anyone can independently verify whether a bill is authentic simply by checking built-in security measures, such as a watermark and a hologram; there is no need to contact the bank that issued the bill. Similarly, PLA gives every node the ability to check whether a packet has been modified, duplicated, or delayed, without a previously established trust relation with the sender of the packet. Such packets can be discarded immediately by any node in the network.

PLA relies on IP header extension techniques to add its own header to the packet. The PLA header contains the following fields. The trusted third party (TTP) certificate corroborates the binding between the sender's identity and its public key; it also guarantees that the sender is a valid entity within the network. To reduce the computational and bandwidth overhead, PLA utilizes identity-based implicitly-certified keys [2], where the sender's public key is calculated from the TTP certificate. The PLA header also includes timestamp and sequence number fields that make it possible to detect delayed and duplicated packets, which can be a sign of a replay attack.
Finally, there is a cryptographic signature over the whole packet, ignoring some IP header fields, like the hop limit, that change during the lifetime of the packet. The signature protects the integrity of the packet, and together with the sender's public key it offers non-repudiation: the sender cannot deny sending the packet. A lightweight version of PLA does not contain the TTP certificate in every packet, in order to reduce the bandwidth and computational overhead. In this case the TTP certificate is transmitted periodically, and routers should cache it to fully authenticate the sender.

Cryptographic Solutions and Performance

Since PLA includes signatures and public keys in every packet, it is not feasible to use traditional cryptographic solutions like RSA. To reduce the bandwidth overhead, PLA uses elliptic curve cryptography (ECC) [3]. The 163-bit ECC key used with PLA offers the same cryptographic strength as a 1024-bit RSA key [4]. An FPGA-based hardware accelerator has been developed for PLA [5]. According to simulations, an ASIC based on Altera's HardCopy [6] technology built with a 90 nm manufacturing process would achieve 850,000 signature verifications per second with a power consumption of just 22.4 W.
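Using the field sizes that appear later in Table 1 (a 164-bit compressed public key, a 326-bit signature, a 586-bit lightweight header, and a 1022-bit full header), the sizes of the remaining PLA header fields can be derived arithmetically. This is a sketch; the derived residual values are ours and are not stated explicitly in the text.

```python
# Back-of-envelope arithmetic on the PLA header, using only figures that
# appear in the paper.  The timestamp/sequence-number and TTP-certificate
# sizes are derived residuals, not explicitly stated values.
PUBLIC_KEY_BITS = 164          # 163-bit ECC key plus a compression bit
SIGNATURE_BITS = 326           # ECC signature over the packet
LIGHTWEIGHT_HEADER_BITS = 586  # key + signature + timestamp + seq. number
FULL_HEADER_BITS = 1022        # lightweight header + TTP certificate

# Lightweight header minus key and signature leaves room for the
# timestamp and sequence number fields.
timestamp_and_seq_bits = LIGHTWEIGHT_HEADER_BITS - PUBLIC_KEY_BITS - SIGNATURE_BITS

# The full header additionally carries the TTP certificate.
ttp_certificate_bits = FULL_HEADER_BITS - LIGHTWEIGHT_HEADER_BITS

print(timestamp_and_seq_bits)  # 96 bits for timestamp + sequence number
print(ttp_certificate_bits)    # 436 bits for the TTP certificate
```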
2.2 Hash Trees

A hash tree is a tree whose leaves are hashes of data blocks, such as packets; nodes higher in the tree are hashes of their respective children. A Merkle tree [7] is a complete binary hash tree. Hash trees can be used to protect the integrity of network traffic [8], [9]. The idea is to create a Merkle tree consisting of the packets' hashes, calculate the root hash of the tree, and sign it with a cryptographic signature. Packets are then sent along with the tree's root signature and the number of tree hashes necessary to reconstruct the root hash, so that the verifier of a packet can reconstruct the root hash and verify the root signature.

The advantage of hash trees is that they reduce the number of cryptographic verifications required to verify the integrity of packets. This saves computational resources, since hash calculations are significantly faster to perform than cryptographic verifications. In order to make independent verification of a single received packet possible, additional hashes along with the root signature must be included in every sent packet: a hash tree of width w requires log2(w) additional hashes to be included in sent packets.

2.3 Hash Chain Based Solutions

A hash chain is formed by hashing a random seed value s multiple times with a one-way hash function H. The result of each hash operation is used as the input for the next element of the chain, i.e., h1 = H(s), h2 = H(h1) = H(H(s)), and so on. The final element of the hash chain, hn, is called the anchor. The hash chain is used in reverse order to create signatures: if the length of the chain is n, then the anchor hn is used first, followed by hn-1 and hn-2. By tying identities to hash chains, hash chains can be used to authenticate nodes and traffic in the network [10]. The security properties of hash chains come from the fact that the hash function H is a one-way function: it is not possible to determine x if only H(x) is known.
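The chain construction above can be sketched in a few lines (a minimal illustration, assuming a 160-bit SHA-1 hash, which matches the hash length used later in the comparison):

```python
import hashlib

def make_hash_chain(seed, n):
    """Build h1..hn with h1 = H(seed) and h_i = H(h_{i-1}); hn is the anchor."""
    chain = []
    value = seed
    for _ in range(n):
        value = hashlib.sha1(value).digest()  # 160-bit one-way hash
        chain.append(value)
    return chain

# The anchor hn is published first; earlier values are disclosed in reverse.
chain = make_hash_chain(b"random seed", n=100)
anchor = chain[-1]

# A verifier holding the anchor checks a newly disclosed h_{n-1} with a
# single hash calculation: hn must equal H(h_{n-1}).
disclosed = chain[-2]
assert hashlib.sha1(disclosed).digest() == anchor
```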
Therefore, only the owner of the hash chain can know previous values of the chain, while any other party can easily verify that hn = H(hn-1) after the value hn-1 has been disclosed. There exist three different secure ways to distribute values of the hash chain to other parties: one-time signatures, time-based approaches, and interaction-based approaches.

An example of a time-based hash chain approach is Timed Efficient Stream Loss-Tolerant Authentication (TESLA) [11], a protocol designed to authenticate broadcast and multicast traffic. TESLA assumes that the sender and receiver have loosely synchronized clocks.

The Adaptive and Lightweight Protocol for Hop-by-hop Authentication (ALPHA) [12] is an interaction-based hash chain scheme. ALPHA uses a three-way signature process where both the sender and the receiver use two hash chains; it also allows any node on the path to verify and authenticate the protected traffic. First, the sender sends an S1 packet containing its hash chain anchor and a message authentication code (MAC) calculated over the payload with a previous hash chain value. The receiver replies with an A1 packet indicating that it is willing to receive the packet. Finally, the sender sends the
actual data together with a previous hash chain value. Intermediate nodes can verify the payload's integrity using the MAC from the cached S1 packet. ALPHA therefore requires two control packets to be transmitted for each data packet; to reduce this bandwidth overhead, two variants of ALPHA exist. ALPHA-C transmits multiple MACs in a single S1 packet, while ALPHA-M constructs a hash tree over w packets and authenticates the whole tree with a single S1 packet.

2.4 Energy Requirements of Wireless Transmission

Determining the actual energy costs of wireless communication is not simple, since measuring power consumption accurately is difficult and since the energy consumption depends on several factors, such as the location, the number of obstacles, the network topology used, the transmission distance, and weather conditions.

According to Zhong's presentation [13], the energy consumption of a GSM cell phone is roughly 13.8 μJ/bit for upload and 5.6 μJ/bit for download. For a wireless IEEE 802.11 LAN, the energy consumption is 1.1 μJ/bit for upload and 0.75 μJ/bit for download. In a simulated study of IEEE 802.11 power saving mechanisms [14], 50 nodes were placed in a 1000 m x 1000 m area. In the best case, the network was able to deliver 150 kbits of traffic per joule of energy, with an average path length of 3 hops; the energy consumption per hop was therefore about 2.22 μJ/bit.

Energy-efficient transmit power control for IEEE 802.11a/h wireless LANs was also studied in [15]. According to simulations, the proposed scheme achieved a transmission power of 14.3 nJ/bit over a 10 meter and 111 nJ/bit over a 25 meter transmission distance, using a star network topology. These results assume that the networking hardware utilizes a special power conservation scheme, and they do not take into account interference from buildings or weather conditions.
A power efficiency of 14.3 nJ/bit is significantly better than any real-life measurement found in the literature, and thus it is not a very realistic figure. Nevertheless, this result can be considered the lower bound of a wireless LAN's power consumption under optimal conditions.

The trade-off between the energy costs of computation and wireless transmission has been studied in [16]. The study argues that extra computing power should be used to decrease bandwidth overhead wherever possible, in order to reduce the overall energy consumption.
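As a quick check of the figures quoted above, the per-hop cost in the simulated 802.11 study follows directly from the reported throughput per joule and the average path length:

```python
# Re-deriving the per-hop figure quoted from the 802.11 power-saving study:
# 150 kbits delivered per joule of energy over an average path of 3 hops.
bits_per_joule = 150_000
avg_path_hops = 3

energy_per_bit_end_to_end = 1 / bits_per_joule               # J/bit, all hops
energy_per_bit_per_hop = energy_per_bit_end_to_end / avg_path_hops

print(round(energy_per_bit_per_hop * 1e6, 2))  # 2.22 (uJ/bit per hop)
```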
3 Comparison of Security Properties between PLA, Hash Tree and Hash Chain Based Approaches

In this chapter we evaluate the security properties of PLA; the lightweight PLA that does not contain a TTP certificate in each packet; the basic hash tree approach described in Section 2.2; ALPHA-C; and ALPHA-M. TESLA was omitted from the comparison since it depends on synchronized clocks and is a less flexible solution, while the basic ALPHA scheme would produce too high a bandwidth overhead. The results of the evaluation, based on various criteria, are summarized in Table 1. The width of the hash tree, and likewise the number of MACs sent simultaneously in ALPHA-C, is denoted by w.

By authentication we do not mean simply tying the packet to a specific cryptographic identity, but determining whether the cryptographic identity is a trusted one and has
permissions to use the network. For measuring the bandwidth overhead we assume that a 163-bit ECC public key and a 160-bit hash function are used for adequate security [4]. With a compression bit, the public key uses 164 bits of the header space.

The first criterion (C1) is the independent transmission of packets. This is especially important for real-time communication, such as video or voice conferencing, where low latency is preferred. PLA allows each packet to be sent immediately, while the other approaches utilize hash trees or cumulative transmissions of MACs, which require the sender to cache w packets before sending the first one.

The C2 criterion evaluates the support for fully independent integrity verification in intermediate nodes. PLA and the basic hash tree approach allow every node to independently verify each packet, since all necessary information is already present in every sent packet. However, the basic hash tree approach must cache intermediate hashes and already verified root signatures for optimal performance. ALPHA-M and -C do not satisfy this criterion and require state for integrity verification.

With the C3 criterion we evaluate whether the solution supports fully independent authentication. The results are the same as above, except that lightweight PLA does not satisfy this criterion, since it does not include a TTP certificate in every packet.

Table 1. Evaluation of security mechanisms

C1. Independent transmission of packets:
  PLA: Yes; Lightweight PLA: Yes; Basic hash trees: No; ALPHA-M: No; ALPHA-C: No
C2. Independent integrity verification:
  PLA: Yes; Lightweight PLA: Yes; Basic hash trees: Yes, but needs caching for best performance; ALPHA-M: No; ALPHA-C: No
C3. Independent authentication:
  PLA: Yes; Lightweight PLA: No; Basic hash trees: Yes, but needs caching for best performance; ALPHA-M: No; ALPHA-C: No
C4. Transit path independence:
  PLA: Yes; Lightweight PLA: Yes; Basic hash trees: Yes, at a lower performance; ALPHA-M: No; ALPHA-C: No
C5. Bandwidth overhead per data packet:
  PLA: Full PLA header (1022 bits); Lightweight PLA: Lightweight PLA header (586 bits); Basic hash trees: Public key (164 bits) + root signature (326 bits) + log2(w) hashes; ALPHA-M: 1 + log2(w) hashes; ALPHA-C: 1 hash (160 bits)
C6. Other bandwidth overhead:
  PLA: None; Lightweight PLA: None; Basic hash trees: None; ALPHA-M: 4/w hashes; ALPHA-C: (w + 3)/w hashes
C7. Per packet computational requirements:
  PLA: Key extraction and signature verification; Lightweight PLA: Signature verification; Basic hash trees: At most 1 + log2(w) hash calculations, 1/w signature verifications; ALPHA-M: At most 1 + log2(w) hash calculations; ALPHA-C: 1 hash calculation
C8. Load on verifying node:
  PLA: Constant; Lightweight PLA: Constant; Basic hash trees: Variable; ALPHA-M: Constant; ALPHA-C: Constant
The C4 criterion evaluates whether packets can travel along multiple transit paths between the sender and the receiver. PLA allows every node to independently verify each packet, regardless of which path the packet takes. The basic hash tree approach also supports independent verification, but its performance degrades if packets belonging to the same hash tree take multiple paths. The ALPHA schemes require intermediate nodes to cache S1 packets, and therefore all further data packets must travel along the same path in the network.
The C5 criterion concerns the bandwidth overhead per data packet. A simple PLA header contains the sender's public key, signature, timestamp, and a sequence number. In addition to the above, the full PLA header contains a trusted third party (TTP) certificate. In order to support independent verification with the basic hash tree approach, log2(w) hashes must be included in packets in addition to the public key and the root signature. ALPHA-M requires the undisclosed hash chain element to be present in data packets in addition to log2(w) hashes. For ALPHA-C, including the undisclosed hash element is enough. In all cases some extra fields are necessary for the IP extension headers; these are not included in the comparison.

The next criterion (C6) covers other bandwidth requirements. Both ALPHA schemes require two control packets (S1 and A1) to be transmitted per w packets. In ALPHA-M, w denotes the width of the hash tree, while in ALPHA-C it is the number of MACs transmitted in a single S1 packet. ALPHA-C transmits a MAC for every packet, while ALPHA-M uses a single MAC for the whole tree.

Criterion C7 is the per packet computational requirements, excluding minor checks like verification of a timestamp or sequence number. Lightweight PLA requires a signature verification to be performed. Full PLA utilizes implicit certificates, where the sender's public key is extracted and the packet's signature is verified in a single operation, which also verifies the TTP certificate. The basic hash tree and ALPHA-M approaches require 1 + log2(w) hash calculations if the received packet is the first one from the tree. Intermediate hashes can be cached, so verification of subsequent packets requires fewer hash calculations. Additionally, the basic hash tree approach requires a single signature verification per hash tree. ALPHA-C does not utilize hash trees, therefore only a single hash calculation needs to be performed.
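The hash tree costs in criteria C5 and C7 can be illustrated with a short sketch: each packet travels with its log2(w) sibling hashes, and a verifier recomputes the signed root with 1 + log2(w) hash calculations. This is a minimal illustration with helper names of our own, assuming a 160-bit SHA-1 hash and a tree of width w = 8.

```python
import hashlib

def H(data):
    return hashlib.sha1(data).digest()  # 160-bit hash, as assumed in the text

def merkle_levels(packets):
    """All tree levels, leaves first; len(packets) must be a power of two."""
    level = [H(p) for p in packets]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """The log2(w) sibling hashes sent alongside packet `index` (criterion C5)."""
    path = []
    for level in levels[:-1]:
        path.append((index & 1, level[index ^ 1]))  # (am I the right child?, sibling)
        index //= 2
    return path

def recompute_root(packet, path):
    """1 + log2(w) hash calculations, matching criterion C7."""
    node = H(packet)
    for is_right, sibling in path:
        node = H(sibling + node) if is_right else H(node + sibling)
    return node

packets = [bytes([i]) * 8 for i in range(8)]   # w = 8 packets
levels = merkle_levels(packets)
root = levels[-1][0]                           # signed once for the whole tree
path = auth_path(levels, 5)
assert len(path) == 3                          # log2(8) extra hashes per packet
assert recompute_root(packets[5], path) == root  # independent verification
```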
The last criterion (C8) evaluates whether the load on verifying nodes is constant or variable. For the PLA and ALPHA approaches the load is constant: intermediate nodes must perform a signature verification or hash calculations for each packet, respectively. In the basic hash tree approach the signature must be verified for each packet from a previously unknown hash tree, so the verifier's load may differ significantly depending on the order in which packets arrive. This drawback introduces a significant denial-of-service vulnerability: an attacker can simply flood packets belonging to different hash trees. Since the verifying node will not have the packets' hashes in its cache, it must calculate the whole hash tree and perform the root signature verification for each malicious packet. As a result, the verifying node will be overloaded, since it does not have the capacity to verify the signature of each packet (this inability is the fundamental assumption of hash tree based approaches).

3.1 Summary of Security Properties

The main difference between PLA and hash chain based approaches is that PLA supports fully independent verification of packets by any node in the network: a node which has received a packet can verify its authenticity independently, without any kind of contact with the sender of the packet. In addition, PLA does not require nodes to store per-packet or per-sender state for basic authentication. While ALPHA supports hop-by-hop authentication, it requires that some data is cached by
intermediate nodes and that all packets take the same path in the network. The basic hash tree approach supports independent verification of packets, but its performance degrades unless all packets take the same path in the network and verifying nodes cache intermediate hashes. Basically, all solutions are trade-offs between usage of bandwidth, computational resources, features, and implementation complexity.

ALPHA is a less robust solution in the case of route changes or failures. If the route changes after the first S1 packet, the rest of the packets from the same hash tree or cumulative transmission cannot be verified unless the S1 packet is retransmitted. Since intermediate route changes are usually not visible to the recipient or the sender of the traffic, further data packets in the session will probably be dropped by intermediate nodes located on the new path.

Using large values of w with ALPHA or a hash tree approach increases the latency of communication and is therefore not suitable for real-time traffic. For example, voice over IP (VoIP) applications usually send from 34 to 50 packets per second [17]. Therefore, using a 16-packet-wide hash tree, or sending 16 MACs simultaneously with ALPHA-C, would introduce up to 470 ms of extra latency. In order to support strong source authentication with ALPHA, the hash chain anchors used must be authenticated by some other means, e.g., public key signatures. ALPHA requires different hash chains to be used for every sender/receiver pair, therefore such authentication can introduce a significant overhead if the receiver is communicating with a large number of senders.
4 Energy Consumption in a Wireless Environment

In this chapter we investigate the energy requirements of various security solutions for receiving and verifying data wirelessly. In an ad-hoc network where nodes also forward data wirelessly, the energy cost of wireless communication would roughly double. For wireless LAN power consumption, we use the absolute best case values from Section 2.4: 14.3 nJ/bit for a 10 meter transmission distance and 111 nJ/bit for a 25 meter distance [15]. We also include higher wireless LAN power consumption values which are much more realistic [13], marked as "Real-life wireless LAN" in the figures. PLA is assumed to use a cryptographic accelerator based on 90 nm HardCopy ASIC technology, which consumes 26 μJ of energy per signature verification [5].

Since we do not have a real ASIC available for cryptographic calculations, the following results are calculated analytically. We calculate the bandwidth overhead of the various security approaches and compare the energy cost of transmitting this extra traffic against the cost of performing cryptographic operations. The energy cost of hash calculations is ignored, since it is not significant compared to the cost of wireless transmission or signature verifications. The per packet energy advantage of the various approaches in comparison to PLA can be summarized using the following formula:

Etotal = B * Ewireless + Ecrypto ,    (1)

where B is the average per packet bandwidth advantage in bits against PLA (negative values denote a bandwidth disadvantage), Ewireless is the energy cost of wireless reception per bit, and Ecrypto is the energy cost of the signature verification.
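Formula (1) is easy to evaluate directly. The sketch below uses the figures quoted elsewhere in the text (26 μJ per verification, 0.75 μJ/bit real-life reception); the 500-bit bandwidth disadvantage is an illustrative assumption of ours, not a value from the paper.

```python
# Per-packet energy advantage of a lightweight scheme over PLA, Eq. (1):
#   E_total = B * E_wireless + E_crypto
# B          : average per-packet bandwidth advantage against PLA, in bits
#              (negative B means a bandwidth *dis*advantage)
# E_wireless : energy cost of wireless reception per bit, in joules
# E_crypto   : energy cost of the signature verification saved, in joules

def energy_advantage(B_bits, e_wireless_j_per_bit, e_crypto_j):
    return B_bits * e_wireless_j_per_bit + e_crypto_j

# Illustrative assumption: a scheme that saves the 26 uJ verification but
# carries 500 extra bits per packet, at a realistic 0.75 uJ/bit reception
# cost, ends up with a net energy loss per packet.
adv = energy_advantage(B_bits=-500, e_wireless_j_per_bit=0.75e-6, e_crypto_j=26e-6)
print(round(adv * 1e6, 1))  # -349.0 uJ per packet: PLA wins in this case
```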
4.1 PLA Versus Hash Tree Approach

First we compare PLA with the basic hash tree approach for integrity protection, since both techniques support fully independent packet verification by any node; in this respect they offer identical security properties. In the hash tree approach a single signature is generated and verified for the whole hash tree, while with PLA every packet's signature is verified separately. In order for both schemes to offer the same level of protection, and thus to make the comparison fair, the PLA header information, such as the sequence number, timestamp, and TTP certificate, must also be included in every packet in the hash tree approach. In this respect the bandwidth overhead is the same for both approaches; the only difference is the extra hashes that must be included in the hash tree approach to make independent verification of packets possible. The results are shown in Fig. 1; with hash tree width w = 1, hash trees are not used and a signature verification is performed for each packet. Y-axis values denote the per packet energy advantage: positive values mean that the hash tree approach has an energy advantage over PLA, while negative values describe the reverse situation.
Fig. 1. Per packet energy advantage of hash trees versus PLA in a wireless LAN
The following observations can be made from Fig. 1. If the cost of wireless transmission is very low, then hash trees save energy in some cases. However, as the width of the hash tree grows to thousands of packets, the total energy consumption actually increases in comparison to PLA, since large hash trees have a significant bandwidth disadvantage due to the inclusion of extra hashes. As the cost of wireless transmission increases, even using narrow hash trees increases the total energy consumption. If real-life estimates for the power consumption of wireless LAN are used, the total
energy consumption becomes completely dominated by the cost of wireless communication. As a result, the hash tree approach becomes very energy inefficient due to its large bandwidth overhead.

These calculations do not include the energy required for performing hash calculations, and they also do not take into account packet fragmentation, which is introduced by the inclusion of extra hashes in the packet. Fragmentation would increase the number of sent packets, and therefore the number of signature verifications, since the additional packets must be protected by hash tree signatures. Both of these effects would further increase the energy consumption of the hash tree approach.

4.2 PLA versus ALPHA-C and -M

The ALPHA variants use additional control packets that have their own headers. For the bandwidth overhead calculations, we assume that packets contain standard IPv6 and UDP headers; therefore the length of these control packets, excluding IP header extension fields, is 704 bits. In addition, ALPHA-M includes hashes in data packets. As a result, if the width of the hash tree w is very small, the bandwidth overhead per packet becomes large due to the additional control packets per tree, while with big values of w the bandwidth overhead is also large due to the extra hashes included in data packets. In the case of ALPHA-C, w denotes the number of MACs sent simultaneously in the S1 packet. Due to IP packet size limitations, roughly 70 MACs can be included in a single S1 packet; therefore w is limited to 64 in the figure for ALPHA-C.

In order to make an apples-to-apples comparison, we compare the bandwidth overhead of ALPHA-M to the lightweight PLA, ignoring PLA's timestamp and sequence number fields. That is, for PLA we take into account the bandwidth overhead of the public key and signature. This is fair, since ALPHA does not contain these security measures, but they can be added to ALPHA at the expense of additional bandwidth overhead.
Similarly, ALPHA lacks TTP certificates and a trust mechanism, therefore a comparison to full PLA would not give an accurate picture. Using lightweight PLA decreases the computational requirements of signature verification by about 17% [18] compared to full PLA, where the validity of the TTP certificate is also verified.

Results are shown in Fig. 2 in the same format as before: positive values denote the situation where ALPHA has an energy advantage over lightweight PLA. ALPHA-C results are marked with a dashed line, while a solid line denotes ALPHA-M. The case w = 1 denotes basic ALPHA, where hash trees or cumulative transmission are not utilized at all.

It can be seen from the figure that ALPHA-M achieves its smallest bandwidth overhead per packet when w = 8. When the width of the hash tree is smaller, the bandwidth overhead is higher due to control packets, while larger trees introduce higher overhead because of the extra hashes in data packets. If the cost of wireless communication is very low, then ALPHA-M has a small energy advantage when the hash tree is not very wide. When the cost of wireless communication is closer to real-life figures, it once again dominates the overall energy costs and PLA becomes the much more energy efficient solution. These results do not take into account the energy costs of hash calculations and packet fragmentation, which would slightly increase the energy consumption of ALPHA-M.
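The w = 8 minimum can be reproduced with the figures given above: two 704-bit control packets per tree of w packets, plus one 160-bit hash element and log2(w) extra hashes in every data packet. This is a simplified bandwidth-only sketch of ours; the paper's Fig. 2 additionally weighs in the energy of wireless reception and signature operations.

```python
import math

# Per-packet bandwidth overhead of ALPHA-M as a function of tree width w,
# built from figures stated in the text (bandwidth only, not energy).
HASH_BITS = 160            # 160-bit hash, as assumed throughout
CONTROL_PACKET_BITS = 704  # S1 and A1 control packet length

def alpha_m_overhead_per_packet(w):
    control = 2 * CONTROL_PACKET_BITS / w              # S1 + A1, amortized
    per_packet_hashes = HASH_BITS * (1 + math.log2(w)) # hash element + tree path
    return control + per_packet_hashes

for w in (2, 4, 8, 16, 32):
    print(w, alpha_m_overhead_per_packet(w))
# Narrow trees pay for control packets, wide trees for extra hashes;
# the overhead bottoms out at w = 8, matching the observation above.
```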
Fig. 2. Energy advantage of ALPHA-C and -M versus lightweight PLA in a wireless LAN
Because the bandwidth overhead of ALPHA-C in data packets is constant, increasing w decreases the overall overhead. It can be seen from the figure that lightweight PLA's bandwidth overhead becomes bigger than ALPHA-C's when w is 8 or higher. ALPHA-C achieves its best energy efficiency when the maximum number of MACs is included in a single S1 packet.

It is important to note the disadvantages of ALPHA-C: it is a path dependent solution that does not support independent verification, and it also requires intermediate nodes to cache one MAC per data packet. Moreover, ALPHA-C offers a significant bandwidth and energy advantage only when w is 12 or higher, and such large values can cause problems for latency sensitive communication. For example, to keep the latency of VoIP traffic below 100 ms, no more than four or five packets can be buffered, and in that case ALPHA-C has an energy disadvantage against PLA.
5 Evaluation

In the past it has been assumed that public key based security solutions are inherently inefficient in terms of energy consumption and are therefore unsuitable for mobile and wireless devices. Our analysis shows that with an efficient hardware accelerator for cryptographic operations, the cost of wireless transmission dominates the verification related power consumption. Therefore PLA, which is based on public key signatures, is actually more energy efficient, since it uses less bandwidth than hash tree or hash chain based solutions like ALPHA.

These results may seem surprising, but they are valid. In real-life cases wireless reception uses almost 1 μJ/bit of energy, while the cost of performing a signature verification is only 26 μJ. Therefore, even if PLA reduces the bandwidth overhead
by only 100 bits per packet, it already becomes a much more energy efficient solution. While some studies [14] have achieved orders of magnitude better energy efficiency, they have done so in a special simulated environment without taking into account natural obstacles and buildings, and therefore they cannot be considered a realistic estimate of the energy consumption of wireless communication.

It is important to note that we have compared the cost of wireless reception and signature verification. Due to the nature of elliptic curve cryptography, signature generation requires significantly fewer resources than signature verification, while in wireless networks upload energy consumption is usually higher than download consumption. Therefore, for a sending node the energy consumption of cryptographic operations would be even smaller in relation to the wireless costs. The same applies to wireless ad-hoc networks, where nodes often forward traffic and perform two wireless operations, reception and transmission, per single signature verification. This comparison is therefore actually a worst case scenario for a public key based solution.

We have not taken into account the idle power consumption of the chip performing signature verifications. If the node receives only a handful of packets per second, then the power consumption per signature verification will be higher. On the other hand, in this case the power consumption would also increase for wireless reception, since the wireless interface likewise has a fixed idle power consumption. In any case, the cost of wireless communication would still dominate the overall power consumption in most cases, even if we double or triple the energy consumption of a signature verification.

In the future, the power consumption of semiconductors will likely continue to decrease at a rapid pace [19]. The same does not apply to the energy cost of wireless transmission, since it also depends on physical properties of the transmission medium that cannot be changed.
Therefore, the energy consumption of cryptographic computations will likely decrease faster in the future than the energy consumption of wireless communication. This would make PLA even more attractive, since it decreases the bandwidth overhead at the expense of computing power. As a disadvantage, an efficient PLA implementation requires dedicated hardware for cryptographic operations, increasing deployment costs.

If buffering a large number of packets is feasible, then ALPHA-C becomes a very energy efficient solution due to its lower bandwidth overhead. However, PLA is overall a much more flexible solution, since it does not depend on the transit path and does not require caching of packets at the sender's end for optimal performance. Therefore, PLA can also be used with latency sensitive communication and in dynamic wireless ad-hoc networks.
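The break-even arithmetic behind this evaluation can be made explicit. Using the 26 μJ verification cost and the real-life wireless LAN figures from Section 2.4, only a few dozen saved bits already pay for a signature verification:

```python
# Break-even bandwidth saving: how many bits of wireless reception cost
# as much as one 26 uJ signature verification on the Hardcopy ASIC.
E_CRYPTO_UJ = 26.0  # uJ per signature verification

for e_wireless_uj_per_bit in (0.75, 1.1):  # WLAN download / upload, uJ/bit
    breakeven_bits = E_CRYPTO_UJ / e_wireless_uj_per_bit
    print(round(breakeven_bits, 1))
# 34.7 and 23.6 bits: well below the ~100 bit savings discussed above, so
# the verification cost is dwarfed by the wireless transmission cost.
```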
6 Conclusions and Future Work

We have compared the security features and energy requirements of Packet Level Authentication, which is based on per packet cryptographic signatures, with hash tree and hash chain approaches that utilize hash calculations for integrity protection. Strong network layer solutions can revolutionize network security, since they allow malicious traffic to be detected and stopped near its origin.

Our results show that public key based security solutions with an efficient hardware implementation in many cases achieve a better energy efficiency in wireless environments, due to their lower bandwidth overhead. Public key based solutions such
as PLA are also more flexible, since they are not path dependent and do not require caching at the sending end or at intermediate nodes. PLA is also a more future proof solution, since the power efficiency of semiconductors will likely improve faster than the power efficiency of wireless communication. As a downside, PLA requires dedicated hardware for performing cryptographic operations in an efficient way.

Our future work will concentrate on researching real-life deployment issues of PLA. We also plan to investigate what kind of trust management solution would be efficient and secure for wireless networks.
Public Key Signatures and Lightweight Security Solutions
265
On the Operational Security Assurance Evaluation of Networked IT Systems

Artur Hecker and Michel Riguidel

Institut Télécom, Télécom ParisTech, LTCI CNRS
{hecker,riguidel}@telecom-paristech.fr
Abstract. In this paper, we introduce and discuss the problem of system security assurance assessment. We first define and position security assurance in the context of modern networked IT systems. We then motivate and discuss its use. Next, we define the problem of operational security assurance evaluation. We present and compare two orthogonal approaches to such an evaluation: a spec-based approach, which is an extension of the Common Criteria to systems in operation, and a direct approach, which relies on network management. Finally, we show examples and the pros and cons of both approaches.
1 Introduction
In recent years, the security of networked IT systems has become a major concern due to the high number of successful attacks, the ongoing convergence between infrastructures and the elevated vigilance level in society. Protocol security alone no longer suffices. Intrusions happen at the user level, often with explicit user participation. There is no determined point of entrance: data get in over any access method (wired, wireless or virtual) and from any mobile device. Consequently, multi-tier control, application-level gateways, antivirus software, host firewalls and intrusion detection systems (IDS) have become indispensable.

Application-level security is more diversified; it requires reactive management to fulfill its goals (e.g. recent signature databases, alarm treatment). The security policy dictates the reactivity of this management, since the latter is in direct relation to the achieved security level. This in turn creates new critical dependencies and translates into vulnerabilities. The impact of security concerns on network management (NM) is dramatic: security management installs an inscrutable maze of interdependent policies spanning both functional and non-functional parts and requiring both solid transversal experience (from operating systems' quirks to router configuration idiosyncrasies, from common application threats to specific wireless LAN characteristics) and perfect knowledge of the newest security tools. This management lacks any sustainable systematic approach and is thus prone to errors.

High resilience can only be achieved at the intersection of security and reliability. An attack against a system working at capacity limits is easier than an attack against an idle system (e.g. DoS); QoS is difficult to sustain without

S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 266–278, 2009.
© Springer-Verlag Berlin Heidelberg 2009
On the Operational Security Assurance Evaluation
267
measures against abusive usage, identity usurpation, etc. With currently available tools, NM staff tinker with configurations at the appliance level to resolve problems that arise at the system level. The results of such interaction with the system are highly doubtful and often cause problems in the first place. Problems with a newly applied configuration are either recognized a posteriori (e.g. after complaints, a breakdown, a successful attack), or the network resilience becomes too dependent on the intuition of highly experienced and irreplaceable staff. The latter have problems justifying investments in the non-functional system aspects. By the end of the accounting period at the latest, it becomes crucial to present proof that a recently installed security feature actually prevented something; or, ideally, that it blocked attacks whose cost would have been higher than the cost of the security component per se.

Evaluations of security mechanisms are therefore crucial for internal auditing, performance evaluation and optimizations, but, more importantly, for security policy enforcement: we need to choose security features with asserted capacities (prevention), and we need quantifiable and verifiable results of their actual operation (detection, reaction). Measures are typically only available at the appliance level (logs, etc.). From a system perspective, this is insufficient: the appliances interact to assure the actual function, and we need system-level data to judge the security functions' operations.

In summary, we need to provide evidence that a new system is capable of assuring certain security properties (qualitative, system design phase), and evidence whether and to which degree it actually does so (quantitative, operational phase). This is the task of security assurance assessment. In this paper, we present the problem of operational security assurance evaluation in IT systems.
Our intention is to sensitize readers to the importance of such an evaluation and to discuss its pros and cons. We first position system security assurance with respect to other critical aspects such as dependability, security and trust. We discuss the existing security methodology and explain why evaluating the security assurance of IT systems is not easy. We share our experiences from research projects dealing with these issues and elaborate on two almost orthogonal assurance evaluation approaches, presenting for both the methodology, the possibilities and the limitations. We conclude with an outlook on future work.
2 System Security Assurance Assessment

2.1 Security Assurance Definition
268  A. Hecker and M. Riguidel

Security assurance (SA) of an entity is the objective confidence that this entity meets its security objectives [1]. The accent here is on objective confidence, which translates to verifiable, measurable evidence that we assume one can capture. Hence, we need to gather data, interpret it and present proof that the security objectives are fulfilled. However, security functions are implemented as parts of the entity. This is especially important in the system context: a system's security functions rely themselves on the correct realization of the system per se and on its current state. They also consume the system's resources and contribute to state changes. For this reason, without further assumptions, a system's security assurance is not a mere question of compliance (i.e. the distance between its as-should and its as-is states). Generally, it is necessary but insufficient to assess compliance: for instance, the SA of a distributed security function depends on the reliability of the communication facilities used. The SA depends on the actually applied security functions and mechanisms (their theoretical strength, effectiveness, capacities, etc.), their management (configuration, activation, updates, processes; not just the technical but also the administrative part), their operational availability and thus, in particular, on the reliability of their realization as a system part. Consequently, the SA of a system is situated at the intersection of its dependability and security.

Security assurance is different from security. This is important: since security is highly subjective and depends on a preliminary risk analysis, it is in general impossible to compare security policies, even for similar realizations. This is one of the reasons why security metrics research has difficulties producing reasonable results: it is unclear what to measure and where. Operational security assurance also uses metrics but is profoundly different: using the security policy as an existing and indisputable requirement set, it assesses the confidence in the correct operation of the security functions. Security assurance is also different from trust: a high security assurance level only implies correctly working security. It does not allow statements on general system operation.
Besides, trust is more subtle and global: not only can a principal mistrust a secure system, but trust also influences risk assessment, environment perception, the established protection profile, the security policy, its realization and fault expectations. Conversely, trust in an entity is reflected in the evaluation phase, i.e. in the interpretation of the gathered evidence: higher trust in an entity will yield more positive conclusions from a given security assurance value. This is comparable to an investigation: a mistrusting detective is less likely to conclude innocence even from a watertight alibi.

2.2 Security Assurance Assessment Problem
For IT products, security assurance can be evaluated by means of the well-known Common Criteria (CC) [1]. The CC evaluation is based upon the principle of conformity of the deployed security measures to a previously established protection profile (PP) [1]. This implies that the evaluated asset is a relatively stable, closed region, separated from the surrounding environment. It further presumes that one can define a set of necessary protection measures, and thus implies an a priori risk and threat assessment of the asset within the environment. Once the PP is established, one works through it, checking whether the respective measure is thoroughly conceived, implemented and tested. In practice, it is difficult to decompose an IT system into a list of discrete independent entities to check. Many entities are virtual and represent an overlap of some real entities.
The CC have evolved and now support, in a limited manner, the assurance evaluation of e.g. operating systems (CCv3). However, this evolution has been remarkably slow and accompanied by countless limitations. The main reason for this lies in the very concept behind the CC [3]. A major problem arises when evaluating systems with blurred boundaries, such as modern software and telecommunication systems, which exhibit high degrees of modularity, openness and pervasiveness (see [2] and [4] for field experience). It is increasingly difficult to tell the asset from the environment: the boundaries are fuzzy and often change dynamically in operation. Conversely, knowledge of the security assurance levels of isolated products does not directly translate to the security assurance of a system composed of them. This is a problem known as composition. It is addressed in CCv3, but in a very limited scope [5]. Finally, the CC define a methodology for the evaluation of the product design and development phases, but not for the usage phase. Yet, in a modern distributed system, part of the complexity is due to the number of different products in different operational states (configuration, failures, load, etc.). As of today, there is no known general method to estimate the security assurance of an operational system [4]. Its complexity can be discussed along the following lines:

– Openness: modern IT systems are highly open. Telecommunications systems in particular are characterized by their numerous interfaces, allowing various and rich exchanges.
– Aggregation: IT systems are complex systems. It is difficult to derive the system state from elementary measurements. Certain decisive system parts are virtual (e.g. soft elements like flows, applications, services) and can be changed at runtime (updates, component reconfiguration), yielding changed system behavior.
– Dynamics: modern IT systems exhibit high dynamics at the system level. Structure and topology are not preset, but rather established at runtime, based on various optimization criteria or service needs. It is hard to tell in advance which component or data will be involved in the treatment, or which execution path will be taken.

Because of this complexity, there is until now no accepted operational security assurance assessment methodology for networked IT systems. Yet, with the increasing complexity of security realizations, network management needs new (semi-)automated assessment and management tools.

2.3 The Assessment Process
In any methodology, the security assurance assessment process relies on at least the following three blocks:

– Modeling technique: used to produce a model of the targeted system specifying which data, on which level, from which component, and at which moment is to be captured. Models specify the relations between the individual objects (association, aggregation, inheritance) and the data interpretation (e.g. normalization, stabilization, evaluation [6]), expressing how an object influences the final SA estimate. In network management (NM) terms, this creates the information model.
– Monitoring infrastructure: a system capable of capturing the modeled data in a fashion satisfying the semantic constraints (e.g. sampling theorem, freshness, etc.). In NM terms, this covers the management and communication models (architecture, protocols). It is important to minimize self-interference: such monitoring should be designed in a least intrusive manner, i.e. with the smallest impact on functional and, ideally, non-functional properties of the targeted system. In particular, monitoring should be non-critical: turning it on or off should not impact the targeted system.
– Administration console: this additional element should provide possibilities for observing the SA estimate's evolution over time and for setting thresholds and alarms on the obtained values. Besides, it should be capable of controlling the information model, e.g. adding/removing objects, changing data treatment, etc. Moreover, it should be capable of controlling the monitoring infrastructure, e.g. turning it on and off, adding newly deployed monitors, probes, sensors, etc.
3 Possible Approaches to SA Assessment
We present and analyze two different approaches to the operational SA assessment in modern IT systems. The first approach is a spec-based, vertical methodology. The second approach proposes a direct, horizontal system evaluation.

3.1 Spec-Based System SA Evaluation
Description. Every spec-based approach requires a preliminary analysis producing the specification. The methodology tests the conformity of a deployed system to a given specification. For an operational IT system, the security specification can be produced by applying state-of-the-art analysis. Based on an a priori system evaluation (risk assessment), an expert establishes a suitable system security policy (SSP). The SSP defines system-level protection profiles (e.g. SLPP [4]) and a security target (ST), and specifies objects, subjects, and their mutual authorizations. The ST describes the security architecture, specifying the security functions, mechanisms and configurations of all entities [1]. Besides, an SSP typically specifies residual risks, i.e. risks against which no countermeasure is installed. For SA evaluation, similar to the CC, such a methodology uses assurance level (AL) based verification. Each AL specifies assurance requirements on the PP/ST's measures (representing the functional security spec), therefore specifying what and how to evaluate. The SSP of the target system can be checked against the pre-established PP according to the AL specifications. This leads to the attribution
of the highest assurance level whose requirements are fulfilled. The assurance requirements are hierarchical: the AL(n-1) requirements are contained in ALn, so it suffices to check whether all ALn requirements are fulfilled by the SSP. Just like the CC, this is so far an on-paper evaluation. To evaluate a system in operation, we can now extend this methodology to cater for the inherent system dynamics: we need to find how well the SSP complies with ALn in a given time period. Presuming that all AL(n-1) requirements are fulfilled, we check which characteristic ALn requirements (i.e. ones not present in the ALk, k < n)

0. When a SP receives the forwarded service request message, if the request message is a duplicate, the SP discards it. Otherwise, the SP needs to decide whether to accept the request according to its current conditions such as workload. The SP may either reject the request (by ignoring it) or accept it by replying to the original SR with detailed service information, including Quality of Service (QoS) information and current load. The SR selects the most preferable service by comparing several replies with their detailed service information.
4 Trust Management

4.1 Trust Notation and Operators

We denote a trust degree by T(N1→N2) = < t, c >, where t, c ∈ ℜ and 0 ≤ t, c ≤ 1. T consists of two components, the trust value (t) and the confidence level (c) that N1 (source node) has about N2 (target node). When N1 is the node under consideration, its trust degree with N2 is denoted by T(N2). The trust value indicates how much N1 trusts N2, and the confidence level (knowledge level) indicates how well N1 knows N2, i.e. the accuracy of the corresponding trust value t. The confidence level increases as more experience and reputation are accumulated to form the trust value. T0 is an initial trust degree set by the disposition of the source node when it has no advance knowledge of the target node. We define several operators that apply to trust degrees T as follows.

• The aggregation operator, ⊕, combines or aggregates two opinions of the source node (N1) about the target node (N2). This is expressed as

  T(N1→N2) = T1(N1→N2) ⊕ T2(N1→N2) = < (t1·c1 + t2·c2) / (c1 + c2), min(1, c1 + c2) >

  where T1(N1→N2) = < t1, c1 > and T2(N1→N2) = < t2, c2 >, when c1 + c2 ≠ 0.

• The recommendation operator, ⊗, is used to determine the trust degree of a target node (N3) through a recommender (N2)'s opinion, given the belief of the source node (N1) in the recommender. Our definition of the operator ⊗ is

  T(N1→N3) = T1(N1→N2) ⊗ T2(N2→N3) = < t1·t2, c1·c2 >

  where T1(N1→N2) = < t1, c1 > and T2(N2→N3) = < t2, c2 >.

284  M. Kim, M. Kumar, and S. Jung

• The range operator, | |, is used to extract the range of the absolute trust value, given the confidence level. When |T| = |< t, c >| = [a ~ b], min|T| = a and max|T| = b. We define the operator | | as follows:

  |T(N1→N2)| = |< t, c >| = [max(0, t − t(1 − c)/2) ~ min(1, t + t(1 − c)/2)].

If min|T(N1→N2)| > τ (a given threshold), N1 considers N2 a trustworthy peer. The binary operators ⊕ and ⊗ are commutative and associative. Therefore, the two operators can be expanded to apply to multiple trust degrees as follows:

  ⊕_{i=1..n} Ti(N1→N2) = < Σ_{i=1..n} ti·ci / Σ_{j=1..n} cj, min(1, Σ_{k=1..n} ck) >, where Ti(N1→N2) = < ti, ci >.

  ⊗_{i=1..n} Ti(N(i−1)→Ni) = < Π_{i=1..n} ti, Π_{j=1..n} cj >, where Ti(N(i−1)→Ni) = < ti, ci >.
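The three operators are simple enough to capture directly in code. The following is an illustrative Python sketch (the names are ours; the paper does not provide an implementation). The usage at the end shows why, with T0 = < 0.5, 0 > and τ = 0.2, a node initially trusts strangers, as noted later in the simulation setup.

```python
# Illustrative sketch of the Section 4.1 trust operators; not from the
# paper's implementation, all identifiers are our own.
from dataclasses import dataclass

@dataclass
class Trust:
    t: float  # trust value, 0 <= t <= 1
    c: float  # confidence level, 0 <= c <= 1

def aggregate(a: Trust, b: Trust) -> Trust:
    """Aggregation operator: confidence-weighted mean of trust values."""
    assert a.c + b.c != 0, "aggregation is undefined when c1 + c2 = 0"
    t = (a.t * a.c + b.t * b.c) / (a.c + b.c)
    return Trust(t, min(1.0, a.c + b.c))

def recommend(a: Trust, b: Trust) -> Trust:
    """Recommendation operator: trust in a target via a recommender."""
    return Trust(a.t * b.t, a.c * b.c)

def trust_range(a: Trust) -> tuple:
    """Range operator | |: interval of the absolute trust value."""
    half = a.t * (1 - a.c) / 2
    return (max(0.0, a.t - half), min(1.0, a.t + half))

def is_trustworthy(a: Trust, tau: float) -> bool:
    """N1 considers N2 trustworthy if min|T| exceeds the threshold tau."""
    return trust_range(a)[0] > tau

# With T0 = <0.5, 0>: min|T0| = 0.5 - 0.5*(1 - 0)/2 = 0.25 > tau = 0.2,
# so a node initially trusts strangers.
T0 = Trust(0.5, 0.0)
print(trust_range(T0))          # (0.25, 0.75)
print(is_trustworthy(T0, 0.2))  # True
```

The directory update rule of Section 4.2, Tnew = Told ⊕ (T0 ⊗ Trec), then reads `aggregate(T_old, recommend(T0, T_rec))`.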
4.2 Trust Management Extension to the Basic NeVo Operations

To facilitate trust management using NeVo, all entries in volunteer directories and client directories need an additional field for T. The basic operations of NeVo need to be extended for trust management as follows. When a neighbor receives a solicit message from Ni, the neighbor replies with its volunteer directory, which includes the trust degrees (Trec) of known volunteers. Upon reception of a reply from one of its neighbors, Ni computes a new T for each volunteer in the reply. If Ni has a record for the volunteer with Told in its volunteer directory, Ni updates it with the new trust degree, Tnew = Told ⊕ (T0 ⊗ Trec). Otherwise, Ni inserts a new record with Tnew = T0 ⊗ Trec. After updating and inserting the known volunteers' information with new trust degrees, Ni takes T into account, in addition to hop distances, to choose its k local volunteers. Similarly, announcement messages include the trust degree (Trec) for the source volunteer. When a node Ni receives a non-duplicate announcement message, Ni computes a new trust degree Tnew for the volunteer, in a manner similar to that in the solicit process. After updating or inserting the volunteer's record with Tnew, Ni may retransmit the updated announcement message containing Tnew to its neighbors. Initially, when a volunteer sends an announcement message, it sets Trec = < 1, 1 >, since it has full confidence and trust in itself. When a volunteer Vi registers Ni, the former initializes T(Ni) as T0. To insert records for Ni's other local volunteers in the volunteer directory, Vi initializes their T as T0. If Vi already has records for some of Ni's other local volunteers, Vi increases their T by one basic unit defined in the system. That is, the T of a volunteer increases as it has more clients. Our definition of increasing T by a basic unit is to increase both t and c by 0.1. Decreasing T by a basic unit means decreasing t by 0.1 and increasing c by 0.1.
A client replaces one of its local volunteers when it deems that volunteer no longer trustworthy.
Trust Management Using NeVo in Ubiquitous Computing Environments
285
4.3 Trust Evolution through Service Discovery

When a volunteer Vi receives a service request message from a SR (Ni), the former checks T(Ni) in its client directory. Only when Vi deems Ni trustworthy does Vi accept the request and process it. When Vi finds the matching service provided by a trustworthy SP (Nj), Vi forwards the request along with T(Vi→Ni) to Nj. Additionally, Vi reduces DTLr by 1 and, if DTLr > 0, forwards the request along with T(Vi→Ni) to trustworthy known volunteers. Upon receiving the forwarded request, a volunteer Vj processes the request only if Vi is trustworthy in its view. In this way, service requests are forwarded only to trustworthy entities. When Nj receives a forwarded service request from Vi, Nj decides whether to accept the request according to its opinion of the trustworthiness of Ni (e.g. T(Nj→Ni) = T(Nj→Vi) ⊗ T(Vi→Ni)), in addition to its current conditions such as workload. Before selecting one among multiple SPs, Ni has the option to obtain the T values of the SPs from their local volunteers. Fig. 2 illustrates an example of the evolution process when DTLr = 2 and k = 2. Vi1 and Vi2 are the two local volunteers of the SR (Ni) and, similarly, Vj1 and Vj2 are the two local volunteers of the selected SP (Nj). Service discovery has been processed along the path Ni → Vi1 → Vj1 → Nj → Ni. Once Ni has utilized the service from Nj, all related nodes update their T values for the other nodes in the following steps. For this, Ni and Nj first exchange their local volunteers' identities.
Fig. 2. The process to update trust (service discovery path and evaluation path in which Ni evaluates Nj, involving Ni, Nj and their local volunteers Vi1, Vi2, Vj1 and Vj2)
1. Ni sets its satisfaction degree for the interaction with Nj with regard to the fulfillment by Nj.
2. According to the perceived satisfaction degree, i) Ni sets the evaluated trust, Tevl(Ni→Nj), and sends it to Vi1; and ii) Ni increases or decreases T(Vi1) and T(Vj1) by a basic unit. If Ni has no entry for Vj1 in its volunteer directory, Ni initializes T(Vj1) before updating.
3. After receiving Tevl(Ni→Nj), Vi1 computes Tevl(Vi1→Nj) = T(Vi1→Ni) ⊗ Tevl(Ni→Nj) and sends it to Vj1 and Vj2. Vi1 also increases or decreases T(Vi1→Vj1) given Tevl(Vi1→Nj).
4. After Vj1 receives Tevl(Vi1→Nj), it finally updates T(Vj1→Nj) in its client directory, taking into account T(Vj1→Vi1) and Tevl(Vi1→Nj). Vj2 performs in a similar way.
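The steps above can be sketched as follows, reusing the Section 4.1 operators on plain (t, c) tuples. The text does not give an explicit formula for how Vj1 folds the weighted evaluation into its stored T(Vj1→Nj), so the final ⊕-merge below, like the concrete numbers, is our assumption.

```python
# Hedged sketch of the Fig. 2 evaluation steps; the final aggregation rule
# and all numeric values are our assumptions, not taken from the paper.
def recommend(a, b):
    # Recommendation operator on (t, c) tuples.
    return (a[0] * b[0], a[1] * b[1])

def aggregate(a, b):
    # Aggregation operator on (t, c) tuples (requires c1 + c2 != 0).
    t = (a[0] * a[1] + b[0] * b[1]) / (a[1] + b[1])
    return (t, min(1.0, a[1] + b[1]))

# Steps 1-2: the SR Ni turns its satisfaction with the SP Nj into an
# evaluated trust degree (here: satisfaction as trust value, full confidence).
satisfaction = 0.9
T_evl_Ni_Nj = (satisfaction, 1.0)

# Step 3: local volunteer Vi1 weights the evaluation by its trust in Ni.
T_Vi1_Ni = (0.8, 0.5)
T_evl_Vi1_Nj = recommend(T_Vi1_Ni, T_evl_Ni_Nj)   # ~ (0.72, 0.5)

# Step 4: Nj's local volunteer Vj1 weights it again by its trust in Vi1
# and merges it with its stored opinion of its client Nj (assumed rule).
T_Vj1_Vi1 = (0.7, 0.5)
T_Vj1_Nj_old = (0.5, 0.0)
incoming = recommend(T_Vj1_Vi1, T_evl_Vi1_Nj)
T_Vj1_Nj_new = aggregate(T_Vj1_Nj_old, incoming)
print(T_evl_Vi1_Nj)
print(T_Vj1_Nj_new)
```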
The above evolution process is described from Ni's perspective. A corresponding evolution process, initiated by Nj, is needed for Nj's experience. When the selected SP is a volunteer, the evaluation of the SP is propagated through the local volunteers of the SR.
5 Evaluation

For comparison with distributed trust management approaches, we develop a trust management protocol applicable to the pull scheme. Since the pull scheme is totally distributed and based on broadcast, it is appropriate for developing a distributed trust management protocol. In this section, we first present the simulation setup and then explain the distributed trust management protocol developed for the pull scheme. Finally, we compare the performance and efficiency of the two trust management protocols using the simulation results.

5.1 Simulation Setup
We perform simulation studies using ns2 version 2.27 [12] and choose MANETs for the simulation scenarios, with a total of 50 nodes in a 700 × 700 m² network area. We run 100 rounds and, in each round, every node requests a randomly selected type of service. The related trust evolution process, in terms of the SR's experience, is also performed. As more rounds are processed, trust values become more accurate. We distinguish two types of nodes: good and bad. When a bad node sends a recommendation or evaluation with the actual trust degree Ta = < ta, ca >, it instead sends the fake trust degree Tf = < 1 − ta, 1 >. In addition, when a bad node provides services, it does not satisfy service requestors. We summarize the other main simulation parameters in Table 1. We set k = 2.

Table 1. Simulation parameters
Category                               Parameter                                 Value
Trust management related parameters    Incentive for direct interactions         0.2
                                       Incentive for recommendations             0.1
                                       Initial trust degree (T0)                 < 0.5, 0 >
                                       Threshold to be a trustworthy peer (τ)    0.2
Service discovery related parameters   Number of service types                   10
                                       DTLr                                      2
Incentives in the trust management category operate such that a source node increases or decreases the trust value by the defined amount, depending on its positive or negative opinion, and increases the confidence level by the same amount for a target node. Here, since we set T0 = < 0.5, 0 >, the initial minimum absolute trust value is 0.25 (0.5 − 0.5(1 − 0)/2). As we set the threshold for the minimum absolute trust value to 0.2, a node initially trusts strangers, allowing potential interactions with them. In our simulation, every node is a SP that provides a service of a certain type. A service type is assigned to each node in a round-robin fashion.
The main outputs we extract from the simulation studies are:

· tm: minimum absolute trust value of the selected SP (min|T|)
· nt: total number of messages sent during the simulation
· rf: failure rate of service discovery over total requests during the simulation (%)
· dr: average delay of successful service discovery (seconds)
· ℮: efficiency value indicating performance relative to cost, defined as ℮ = tm · rs / ur, where rs (success rate) = 100 − rf and ur (resource usage) = nt / number of total requests
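As a concrete reading of the efficiency metric, a small worked example (the input numbers are invented for illustration, not taken from the simulation results):

```python
# Minimal sketch of the efficiency metric defined above; the sample
# figures are hypothetical, not the paper's measured results.
def efficiency(t_m, r_f, n_t, n_requests):
    """e = t_m * r_s / u_r, with r_s = 100 - r_f (success rate, %)
    and u_r = n_t / n_requests (messages per request)."""
    r_s = 100.0 - r_f
    u_r = n_t / n_requests
    return t_m * r_s / u_r

# e.g. tm = 0.6, 2% failures, 40000 messages over 5000 requests:
print(efficiency(0.6, 2.0, 40000, 5000))  # ~ 7.35
```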
5.2 Trust Management Protocol for the Pull Scheme

The pull scheme operates in a totally distributed manner: whenever a SR needs a service, it floods a service request message with a DTLr, and any SP that has the matching service replies to the SR. Note that DTLr in the pull scheme operates on the hop count between nodes. We develop a distributed trust management protocol for the pull scheme as follows.

1. When a SR (Ni) needs a service, it floods a service request message with DTLr (set to 2).
2. Upon receiving the service request, a SP (Nj) replies only when it deems Ni trustworthy.
3. After collecting all service replies within a given duration (set to 1 second), Ni looks up its cache for the trust degrees of all SPs that replied.
   3.1 If Ni has replies from trustworthy SPs, Ni selects the service of the most trustworthy SP (Nk).
   3.2 If Ni does not have a reply from a trustworthy SP but has replies from unknown SPs, Ni floods a recommendation request message for the unknown SPs with DTLr (set to 1).
   3.3 Upon receiving the recommendation request message, a recommender (Nr) replies only when it considers Ni trustworthy.
   3.4 After collecting all recommendation replies within a given duration (set to 1 second), Ni selects the service of the most trustworthy SP (Nk) by evaluating all recommendations.
4. After utilizing the service of Nk, Ni updates T(Nk) according to the fulfillment by Nk, using the defined incentive (0.2). Similarly, Nk also evaluates T(Ni).
5. If there are recommenders of Nk, Ni evaluates each recommender.

Service discovery fails when the SR does not get any reply from SPs, which happens when all SPs within coverage consider the SR untrustworthy. In the trust management for the pull scheme, every node maintains the trust degrees of others in a cache. In the simulation, all nodes allocate the same amount of storage for the cache. The allocated cache space ranges from 0% to 100% of the total storage required to store the trust degrees of all nodes in the network.
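The SR-side decision logic of steps 3.1 to 3.4 can be sketched as follows (a hedged illustration: the trust cache and the recommendation flood are abstracted into plain Python callables, and all names are ours, not from the paper's ns2 code):

```python
# Hedged sketch of the T_Pull SR-side selection logic (steps 3.1-3.4).
# Identifiers and the helper shapes are our assumptions.

def min_abs_trust(td):
    """min|T| for a trust degree td = (t, c), per Section 4.1."""
    t, c = td
    return max(0.0, t - t * (1 - c) / 2)

def select_sp(replies, cache, request_recommendations, tau=0.2):
    # Step 3.1: prefer a known, trustworthy SP with the highest min|T|.
    trustworthy = [sp for sp in replies
                   if sp in cache and min_abs_trust(cache[sp]) > tau]
    if trustworthy:
        return max(trustworthy, key=lambda sp: min_abs_trust(cache[sp]))
    # Steps 3.2-3.4: otherwise flood a recommendation request (DTLr = 1)
    # for the unknown SPs and evaluate the collected recommendations.
    unknown = [sp for sp in replies if sp not in cache]
    recommended = request_recommendations(unknown)  # {sp: (t, c)}
    good = {sp: td for sp, td in recommended.items()
            if min_abs_trust(td) > tau}
    if good:
        return max(good, key=lambda sp: min_abs_trust(good[sp]))
    return None  # no trustworthy SP replied: discovery fails

cache = {"SP1": (0.3, 1.0), "SP2": (0.9, 0.8)}  # (t, c) per known peer
pick = select_sp(["SP1", "SP2", "SP3"], cache, lambda sps: {})
print(pick)  # SP2: min|T| = 0.9 - 0.9*(1 - 0.8)/2 = 0.81 > 0.2
```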
5.3 Simulation Results

Fig. 3 shows the average tm over all rounds against the bad node ratio (refer to the legend in Fig. 4). Hereafter, we refer to the trust management scheme utilizing NeVo as T_NeVo and to the trust management protocol for the pull scheme as T_Pull.
Fig. 3. Average of all rounds' average tm (y-axis: trust value; x-axis: bad node ratio, %)

Fig. 4. Average delay of successful discoveries (y-axis: delay in seconds; x-axis: bad node ratio, %; legend: T_NeVo and T_Pull with 20%, 40%, 60% and 100% cache allocation)
With T_NeVo, each node can usually select more trustworthy SPs with the same bad node ratio. However, with 100% bad node ratio, T_Pull using 20% cache allocation shows the highest trust values of selected SPs. With high bad node ratios, nodes cannot tell trustworthiness of others well due to forged recommendations from bad nodes. Therefore, we assert that the trust values for the selected SPs with 20% cache allocation are not really high since SRs cannot have accurate trust values for SPs. Fig. 4 depicts the average delay of all successful service discoveries. With T_NeVo, the SR does not need to wait for all service replies for further evaluation since the SR can trust and select any SP recommended by the local volunteers. In contrast, using T_Pull, there are two time durations, each lasting one second: i) the wait period to receive all service replies; and ii) the wait period to receive all recommendation replies. Fig. 5 demonstrates service discovery failure rates with varying bad node ratios. 12 10
(Plots: failure rate (%) vs. bad node ratio (%); legend: T_NeVo and T_Pull with cache allocations of 20%, 40%, 60%, and 100%; panel (a) covers bad node ratios 0-100%, panel (b) magnifies ratios below 50%.)
Fig. 5. Service discovery failure rates (%)
In Fig. 5(a), we see that the failure rates are much higher with T_NeVo when the bad node ratio exceeds 50%. The plots for bad node ratios in the range 0 to 50% are magnified in Fig. 5(b), showing that at lower bad node ratios T_NeVo has lower failure rates. In T_Pull, as the bad node ratio increases, many SRs and SPs cannot judge each other's trustworthiness without direct interaction, since all bad recommenders lie about others. In contrast, with T_NeVo, successful service discovery requires that all related nodes, including the SR, the SP, and their local volunteers, trust each other. If the majority of nodes are bad, SRs first have difficulty finding trustworthy local volunteers.
Fig. 6. Number of total messages
Fig. 7. Efficiency values (℮)
(Plots: number of messages and efficiency value vs. bad node ratio (%); legend as in the earlier figures.)
Fig. 6 shows the protocol overhead of both schemes in terms of the number of message exchanges during the simulation (refer to the legend in Fig. 7). T_NeVo incurs considerably fewer message exchanges despite the messages for building and maintaining NeVo. In T_Pull, higher cache allocation reduces communication overhead because each node maintains trust degrees for more peers, which decreases the number of recommendation requests. Fig. 7 presents efficiency values indicating the performance of trust management relative to protocol overhead (cost). As expected, T_NeVo shows much better efficiency than T_Pull.
6 Conclusion and Future Work
For secure interaction among participants in open networks, we propose a hierarchical distributed trust management scheme utilizing a flexible architecture called NeVo (Network of Volunteers). In the proposed scheme, trust values of clients are maintained globally and consistently at their local volunteers, reducing the total overhead compared to fully distributed approaches. We choose service discovery to illustrate the trust management protocols; however, the proposed security mechanisms can be applied to any service or application that requires interactions among nodes. For future work, we plan to design authentication protocols to integrate into the proposed trust management scheme. Identities of nodes are likely to be unknown in pervasive environments, which is a fundamental obstacle for node authentication [17]. Therefore, we will also consider the issues of identification and privacy.
References
1. Weiser, M.: The Computer for the 21st Century. Scientific American, 94–104 (September 1991); reprinted in IEEE Pervasive Computing, 19–25 (January-March 2002)
2. Kim, M., Kumar, M., Shirazi, B.A.: Service Discovery using Volunteer Nodes for Pervasive Environments. In: Proc. of IEEE International Conference on Pervasive Services 2005 (July 2005)
3. Carbone, M., Nielsen, M., Sassone, V.: A Formal Model for Trust in Dynamic Networks. In: Proc. of the 1st Int. Conference on Software Engineering and Formal Methods, September 2003, pp. 54–63 (2003)
4. Capra, L.: Engineering Human Trust in Mobile System Collaborations. In: Proc. of ACM SIGSOFT 2004, October 31-November 6, pp. 107–116 (2004)
5. Grandison, T., Sloman, M.: Trust Management Tools for Internet Applications. In: Nixon, P., Terzis, S. (eds.) iTrust 2003. LNCS, vol. 2692, pp. 91–107. Springer, Heidelberg (2003)
6. Kinateder, M., Rothermel, K.: Architecture and Algorithms for a Distributed Reputation System. In: Nixon, P., Terzis, S. (eds.) iTrust 2003. LNCS, vol. 2692, pp. 1–16. Springer, Heidelberg (2003)
7. Terzis, S., Wagealla, W., English, C., Nixon, P.: Trust Lifecycle Management in a Global Computing Environment. In: Priami, C., Quaglia, P. (eds.) GC 2004. LNCS, vol. 3267, pp. 291–313. Springer, Heidelberg (2005)
8. Josang, A.: A Logic for Uncertain Probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9(3), 279–311 (2001)
9. Shand, B., Dimmock, N., Bacon, J.: Trust for Ubiquitous, Transparent Collaboration. In: Proc. of IEEE PerCom 2003, pp. 153–160 (2003)
10. Blaze, M., Feigenbaum, J., Strauss, M.: Compliance Checking in the PolicyMaker Trust Management System. In: Hirschfeld, R. (ed.) FC 1998. LNCS, vol. 1465, pp. 254–274. Springer, Heidelberg (1998)
11. Blaze, M., Feigenbaum, J., Ioannidis, J., Keromytis, A.D.: RFC 2704 - The KeyNote Trust Management System (version 2) (1999)
12. Fall, K., Varadhan, K.: The ns Manual (formerly ns Notes and Documentation), http://www.isi.edu/nanam/ns/doc/index.html (accessed December 13, 2003)
13. Cahill, V., et al.: Using Trust for Secure Collaboration in Uncertain Environments. IEEE Pervasive Computing 2(3) (2003)
14. Resnick, P., Zeckhauser, R., Friedman, E., Kuwabara, K.: Reputation Systems. Communications of the ACM 43(45) (August 2000)
15. English, C., Wagealla, W., Nixon, P., Terzis, S., Lowe, H., McGettrick, A.: Trusting Collaboration in Global Computing Systems. In: Nixon, P., Terzis, S. (eds.) iTrust 2003. LNCS, vol. 2692, pp. 136–149. Springer, Heidelberg (2003)
16. Perich, F., Undercoffer, J., Kagal, L., Joshi, A., Finin, T., Yesha, Y.: In Reputation We Believe: Query Processing in Mobile Ad-Hoc Networks. In: Proc. of IEEE Mobiquitous 2004 (2004)
17. Zakiuddin, I., Creese, S., Roscoe, B., Goldsmith, M.: Authentication in Pervasive Computing: Position Paper, http://www.pampas.eu.org/Position_Papers/QinetiQ.pdf
18. Lewis, N., Foukia, N.: Using Trust for Key Distribution and Route Selection in Wireless Sensor Networks. In: IEEE Globecom 2007, November 26-30 (2007)
19. Pirzada, A.A., McDonald, C.: Trust Establishment in Pure Ad-hoc Networks. Wireless Personal Communications 37, 139–163 (2006)
A Fast and Efficient Handover Authentication Achieving Conditional Privacy in V2I Networks* Jaeduck Choi, Souhwan Jung, Younghan Kim, and Myungsik Yoo School of Electronic Engineering, Soongsil University, 1-1, Sangdo-dong, Dongjak-ku, Seoul 156-743, Korea
[email protected], {souhwanj,younghak,myoo}@ssu.ac.kr
Abstract. This paper proposes a fast and efficient handover authentication scheme with conditional privacy in V2I networks. One of the main challenges for achieving secure V2I communications is to accomplish a fast and efficient handover authentication for seamless IP-based services. Anonymous authentication with authority traceability is another important security issue. The basic idea is that a handover authentication occurs only between a vehicle and a roadside unit to reduce authentication time and communication overhead. After performing the handover authentication, the roadside unit notifies an AAA server of the authentication result and the vehicle's pseudonym ID, which does not affect the fast handover authentication. The proposed scheme is more efficient than the existing schemes in terms of authentication time and communication overhead. In addition, our work is the first study on conditional privacy preservation during a handover process in V2I networks. Keywords: VANET, V2I, Handover Authentication, Conditional Privacy.
1 Introduction
Vehicular Ad-hoc Network (VANET) technologies have recently received considerable attention in research, standardization, and industry owing to their applications in roadway scenarios. VANETs can be classified into Vehicle-to-Vehicle (V2V) networks, which provide driving safety, and Vehicle-to-Infrastructure (V2I) networks, which support seamless IP-based services during high-speed mobility. V2I applications include various infotainment services such as VoIP, mobile IPTV, and Internet access. In V2I networks, seamless mobility for vehicles across heterogeneous access networks is essential. A variety of access network technologies such as IEEE 802.11 families including 802.11p Wireless Access in Vehicular Environments (WAVE) [1], IEEE 802.16 Wireless Broadband (WiMAX), and 3GPP
* This research is supported by the Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier R&D Program of the Ministry of Knowledge Economy (MKE) in Korea as a result of UCN's subproject 09C1-C1-20S, and partly supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (No. 2009-0053879).
S. Balandin et al. (Eds.): NEW2AN/ruSMART 2009, LNCS 5764, pp. 291–300, 2009. © Springer-Verlag Berlin Heidelberg 2009
UMTS on the link layer are converging on a common IP backbone through mobile IP technologies [2-6]. IEEE 802.11 technology offers high bandwidth in hot-spot coverage, but it does not support mobility and roaming functions. IEEE 802.16 is a broadband wireless access technology providing high-speed data rates and mobility. 3GPP provides wide coverage and nearly universal roaming, but it suffers from low data rates. Therefore, the convergence of IEEE 802.11, WiMAX, and 3GPP can provide convenient network access with high data rates anytime and anywhere in moving vehicles. One of the main challenges for secure V2I communications is to achieve a fast and efficient handover authentication. V2I networks are composed of various wireless networks such as IEEE 802.11, WiMAX, and 3GPP. Furthermore, there are extremely large numbers of network entities and data packets in situations of high traffic density such as traffic jams. For example, drivers and passengers generally tend to use multimedia services such as movies and TV programs during traffic jams. According to Dedicated Short Range Communications (DSRC) [7], a vehicle should also send each message at an interval of 100-300 ms. Therefore, a handover authentication in V2I networks should be fast and efficient in terms of authentication delay and communication overhead. It should also satisfy the fundamental security requirements and conditional privacy. In other words, the handover authentication should provide authentication for network access and ensure the data confidentiality and integrity of the value-added services between the vehicle and the Roadside Unit (RSU). Conditional privacy is another important security issue in V2I networks. Drivers do not want private information such as their name, position, and travel route to be revealed during a handover authentication process.
However, when car accidents or certain crimes occur, the identity information of the driver must be revealed by law enforcement to establish liability. In this paper, we propose a fast and efficient handover authentication achieving conditional privacy in V2I communications. To the best of our knowledge, this is the first study that deals with conditional privacy issues during a handover process in V2I networks. The basic idea is that a handover authentication occurs only between a vehicle and an RSU to reduce authentication time and communication overhead. After performing the handover authentication, the RSU notifies an AAA server of the authentication result and the vehicle's pseudonym ID to provide conditional privacy, which does not affect the fast handover authentication. The rest of the paper is organized as follows. Section 2 reviews the existing handover authentication schemes in wireless networks and describes security requirements for secure V2I communications. Section 3 presents the proposed handover authentication in V2I networks. The security and performance aspects of our scheme are discussed in Section 4. Finally, the concluding remarks are presented in Section 5.
2 Related Works and Security Requirements
2.1 Related Works
Many studies have examined the problem of handover authentication for network access on the link layer, such as WLAN, WiMAX, and 3GPP. The handover authentication schemes can be classified into two categories: the AAA-based
scheme and the non-AAA-based scheme. The existing AAA-based schemes, such as the IEEE 802.11i EAP authentication method [8], the proactive key distribution method [9], and the mobility prediction method [10], can be easily applied to V2I networks owing to the directionality of roadway routes. The movement of vehicles in V2I is relatively predictable since it is restricted to the roads, which is an advantage for handover authentication protocols: the predictability of the movement directions reduces the authentication time for handover, since authentication keys can be distributed in advance. In other words, the authentication delay is nearly zero. However, in urban areas, driving patterns can go in any direction, so they resemble the movement patterns of mobile nodes in ordinary wireless networks. These schemes also still suffer from the Round-Trip Time (RTT) latency and communication overhead between a vehicle and an AAA server. Therefore, they are not best suited for real-time services during the high-speed mobility of vehicles. To cope with this problem, handover authentication schemes that do not communicate with the AAA server have recently been proposed [11-12]. These are referred to as non-AAA-based schemes, and they reduce authentication latency and communication cost for handover. Kim et al. proposed a light-weight authentication scheme called Secure Fast Roaming using ID-based Cryptography (SFRIC) [11]. SFRIC employs ID-based cryptography to simplify the authentication process, which does not require contacting an authentication server or exchanging certificates. In this scheme, a client's identifier, such as its MAC address, can be used as its public key for verification and encryption. It provides both mutual authentication and key agreement. Furthermore, Kim's scheme can complete the handover authentication within 20 ms. Zhang et al.
proposed a location-privacy-preserving handover authentication scheme based on an Elliptic Curve Cryptosystem (ECC) blind signature in vehicular networks [12]. Zhang's scheme using a blind signature not only protects the identity of a vehicle but also reduces the probability of movement tracking during handovers of vehicles through a number of access points. Furthermore, Zhang's scheme reduces the authentication delay since it uses a fast exponentiation computation instead of a time-consuming pairing computation. However, none of the existing handover authentication schemes provides conditional privacy. Although Zhang's scheme preserves user privacy, it is impossible for law enforcement to trace a user's identity when car accidents or crimes occur. Hence, a fast and efficient handover authentication scheme with conditional privacy is required.
2.2 Security Requirements for Secure V2I Networks
We define the security requirements for secure V2I communications as follows.
• Requirements for a fast and efficient handover authentication
− It should be efficient in terms of the time of the cryptographic operations used during a handover authentication.
− It should reduce the number of authentication messages and authentication procedures.
• Requirements for a secure handover authentication
− It should provide mutual authentication for network access between a vehicle and an RSU.
− It should generate a dynamic session key for secure communications between a vehicle and an RSU to provide data confidentiality and integrity.
− It should support conditional privacy; that is, other users and RSUs cannot guess the user's original ID, position, or travel routes, but law enforcement should be able to reveal the real identity of the user when car accidents or crimes occur.
3 The Proposed Handover Authentication Protocol
First, an AAA server of the Mobile Service Provider (MSP) generates one RSA key pair (rsaPK−_MSP and rsaPK+_MSP) and two secret keys θ and ω, where θ is a trapdoor key for the RSUs and ω preserves user identities. After generating these keys, the AAA server distributes its RSA public key and the secret key θ to all RSUs. In this paper, we assume that RSUs can securely store all sensitive values.
• Initial full authentication phase
Figure 1 shows the initial full authentication phase. A vehicle performs an initial full authentication such as EAP-TLS, EAP-PKMv2, or EAP-AKA with the AAA server through RSU1. Once the vehicle completes the initial authentication, the AAA server generates the privacy identity of the vehicle and its signature as in (1) and (2). In addition, the AAA server computes the temporary handover authentication key g^ai mod p for the vehicle as in (3). Then, it sends these values to the vehicle through a secure channel.
PID_vehicle(i) = h(ω, ID_vehicle(i) || ID_MSP || T_expire)    (1)

SID_vehicle(i) = rsaSign_rsaPK−_MSP(PID_vehicle(i), T_expire)    (2)

g^ai mod p, where a_i = θ ⊕ b_i and b_i = h(θ, PID_vehicle(i) || ID_MSP || T_expire)    (3)

(Message flow: the AAA server, holding θ, ω, and rsaPK−_MSP, computes PID_vehicle(i), SID_vehicle(i), b_i, and a_i, and delivers PID_vehicle(i), SID_vehicle(i), g^ai mod p, and T_expire to the vehicle; the RSUs hold θ and rsaPK+_MSP.)
Fig. 1. Initial full authentication phase
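The AAA server's computation in Eqs. (1)-(3) can be sketched as follows. This is a minimal Python model under stated assumptions: the keyed hash h is modeled with SHA-256, the RSA signature SID of Eq. (2) is omitted, and θ, ω, the identities, and the group parameters (g, p) are toy values, not the paper's.

```python
# Sketch of the key material from Eqs. (1) and (3).
# Assumptions: h(k, m) is modeled as SHA-256 over k and m; the RSA
# signature of Eq. (2) is omitted; theta/omega and (g, p) are toy values.
import hashlib

def h(key: bytes, data: bytes) -> bytes:
    """Keyed hash h(key, data), modeled here as SHA-256."""
    return hashlib.sha256(key + b"|" + data).digest()

def make_credentials(omega, theta, id_vehicle, id_msp, t_expire, g, p):
    pid = h(omega, id_vehicle + b"||" + id_msp + b"||" + t_expire)   # Eq. (1)
    b_i = h(theta, pid + b"||" + id_msp + b"||" + t_expire)          # Eq. (3): b_i
    # a_i = theta XOR b_i (b_i truncated to theta's length for the toy model)
    a_i = int.from_bytes(theta, "big") ^ int.from_bytes(b_i[: len(theta)], "big")
    return pid, pow(g, a_i, p)                                       # g^ai mod p
```

Note how θ acts as a trapdoor: any RSU holding θ can recompute b_i and a_i from the public PID_vehicle(i) and T_expire alone, so no per-vehicle state has to be pushed to the RSUs in advance.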
• Handover authentication phase
When the vehicle moves into the coverage of a new RSU2 and needs to maintain connectivity, a fast authentication procedure is performed as shown in Figure 2.
Step 1: The vehicle chooses a random number x as its Diffie-Hellman (DH) secret key and computes its DH half-key g^x mod p. The vehicle generates the credential μ using the temporary authentication key g^a1 mod p to be authenticated by the RSU2, as in (4). It then sends the message including the necessary data to the RSU2.
μ = h(g^a1 mod p, PID_vehicle(1) || ID_RSU2 || T_current || T_expire || g^x mod p)    (4)
Step 2: Upon receiving the message, the RSU2 checks the current time T_current and the validation period T_expire. The RSU2 generates the handover authentication keys a1 and b1 of the vehicle using its trapdoor key θ, as in (5). Then, it computes the credential μ' and compares the received credential μ with μ'. If the validation is successful, the RSU2 verifies SID_vehicle(1) using the RSA public key of the MSP. If the RSU2 fails to verify the signature SID_vehicle(1) of the privacy identity, the handover authentication is rejected. Finally, the RSU2 computes the credential ν with its DH half-key g^y mod p and a session key g^xy mod p for mutual authentication and key agreement, as in (6). The RSU2 then sends all parameters to the vehicle.
b1 = h(θ, PID_vehicle(1) || ID_MSP || T_expire),  a1 = θ ⊕ b1    (5)

ν = h(g^a1 mod p, g^xy mod p || g^x mod p || g^y mod p || T_current)    (6)
Step 3: After receiving the message, the vehicle computes the session key g^xy mod p and authenticates the RSU2 by computing ν'. If successful, the vehicle sends the final message, including a hash of the temporary handover authentication key and the session key, h(g^a1 mod p, g^xy mod p), to the RSU2 to confirm the mutual authentication and key agreement.
Fig. 2. Handover authentication phase
Step 4: The RSU2 verifies the received message. If the verification is successful, the RSU2 allows the vehicle to access the wireless network. Finally, the RSU2 notifies the AAA server of the authentication result together with the identities of the vehicle and the RSU2, which enables the AAA server to provide conditional privacy. Note that this last procedure does not affect the fast handover authentication.
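Steps 1-4 can be sketched end to end as follows. This is an illustrative Python model, not an implementation of the paper's protocol: the credentials μ and ν are modeled with SHA-256, the group parameters are toy values, and the timestamp checks and the SID verification of Step 2 are omitted.

```python
# End-to-end sketch of the handover phase (Steps 1-4), with toy parameters.
# mu and nu follow Eqs. (4) and (6); SID verification and time checks omitted.
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (a deployment would use a safe prime)
G = 5            # toy generator

def h(*parts: bytes) -> str:
    return hashlib.sha256(b"|".join(parts)).hexdigest()

def int_bytes(n: int) -> bytes:
    return n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")

def handover(temp_key_vehicle, temp_key_rsu, pid, id_rsu2, t_cur, t_exp):
    # Step 1: vehicle's DH half-key g^x mod p and credential mu (Eq. 4)
    x = secrets.randbelow(P - 2) + 1
    gx = pow(G, x, P)
    mu = h(int_bytes(temp_key_vehicle), pid, id_rsu2, t_cur, t_exp, int_bytes(gx))

    # Step 2: RSU rederives the temporary key from theta (modeled as
    # temp_key_rsu here), checks mu, then answers with g^y and nu (Eq. 6)
    mu_check = h(int_bytes(temp_key_rsu), pid, id_rsu2, t_cur, t_exp, int_bytes(gx))
    assert mu == mu_check, "vehicle not authenticated"
    y = secrets.randbelow(P - 2) + 1
    gy = pow(G, y, P)
    k_rsu = pow(gx, y, P)
    nu = h(int_bytes(temp_key_rsu), int_bytes(k_rsu),
           int_bytes(gx), int_bytes(gy), t_cur)

    # Step 3: vehicle computes the session key and verifies nu
    k_vehicle = pow(gy, x, P)
    nu_check = h(int_bytes(temp_key_vehicle), int_bytes(k_vehicle),
                 int_bytes(gx), int_bytes(gy), t_cur)
    assert nu == nu_check, "RSU not authenticated"

    # Step 4: vehicle's confirmation hash h(g^a1 mod p, g^xy mod p)
    confirm = h(int_bytes(temp_key_vehicle), int_bytes(k_vehicle))
    return k_vehicle, k_rsu, confirm
```

The point the sketch makes concrete is that the only cross-party secret is the temporary key g^a1 mod p: when the vehicle's copy and the RSU's rederived copy agree, both credentials verify and the two sides end up with the same DH session key.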
4 Discussion on Security and Performance
4.1 Security Considerations
The proposed scheme satisfies the security requirements for a handover authentication in V2I networks. This section discusses the security aspects of the proposed scheme.
• Authentication for network access
An RSU authenticates a vehicle by checking the credential μ and the SID. An attacker who has not registered with an MSP or performed an initial full authentication with an AAA server of the MSP cannot access the wireless networks, since the attacker does not know the temporary authentication key g^ai mod p for handover authentication. Furthermore, the adversary cannot generate the signature SID of the privacy identity, since he/she does not know the RSA private key of the AAA server. Therefore, the proposed scheme allows only legitimate vehicles to access the wireless networks. If one RSU is compromised, an attacker can access the Internet using only valid PIDs and SIDs; note that the attacker cannot access the wireless networks using arbitrary PIDs, since he/she cannot produce the corresponding SID signed with the private key of the AAA server. However, the AAA server can detect the illegal situation in which a PID and SID are being doubly used for network access, and can take action against this breach of protocol by limiting usage.
• Conditional privacy
In our scheme, a vehicle uses a pseudo identity (PID) generated with the one-way hash function h() and the secret key ω of the AAA server during the initial full authentication phase. Therefore, it is difficult for other users and RSUs to derive the user's real ID from a PID overheard on a public network. For conditional privacy preservation, the RSU forwards the PID_vehicle and the RSU's position to the AAA server after performing the handover authentication. In other words, the AAA server has the ability to trace the position of the PID_vehicle and reveal the vehicle's real identity ID_vehicle using its secret key ω when a car accident or crime occurs.
None of the existing handover authentication schemes considers conditional privacy.
• Session key agreement for data confidentiality and integrity
The proposed scheme provides mutual authentication by verifying the two credentials μ and ν. After performing the mutual authentication, a vehicle and an RSU generate a session key g^xy mod p using the Diffie-Hellman algorithm, whose security rests on the Computational Diffie-Hellman (CDH) problem: the problem of computing g^xy mod p from given g, p, g^x mod p, and g^y mod p. Then, the two nodes derive symmetric keys from the DH session key to ensure the data confidentiality and integrity of the value-added services between the vehicle and the RSU.
• Perfect Forward/Backward Secrecy (PFS/PBS)
The DH key exchange is a cryptographic protocol that provides PFS/PBS: even if a long-term secret key is ever compromised at some point in time, the compromise does not reveal the preceding and following session keys. In our scheme, the DH exponents x and y of a vehicle and an RSU are ephemeral random values, which guarantees the freshness of the DH session key provided the two nodes choose their random exponents (x_i and y_i) properly.
• Man-in-the-Middle (MITM) attack
Communication between a vehicle and an RSU is secure against the MITM attack. An attacker who does not know the user's temporary authentication key g^ai mod p cannot establish independent connections with the victims and relay messages between them.
4.2 Performance Considerations
The proposed scheme provides a reasonable authentication time for handover. We consider two performance metrics to evaluate the handover authentication schemes: the authentication delay and the communication overhead. We obtained the time cost of each cryptographic operation experimentally using the MIRACL library [13] on a Pentium IV 3 GHz, as depicted in Table 1. Although there are already-developed techniques for improving the pairing computation, its computational overhead is still expensive compared to the other operations.

Table 1. Execution time of cryptographic operations (security size: 1024 bit)

Operation                                C (CPU cycles)   T = C / F
TE  (modular exponentiation)             1,385,265        0.463 ms
TRV (RSA verification)                   615,645          0.206 ms
TM  (elliptic curve point multiplication) 1,092,284       0.365 ms
TP  (pairing operation)                  37,790,790       12.628 ms

C: total number of CPU clock cycles; F: frequency of the internal CPU timer (2,992.56 MHz).
Table 2. Time cost of cryptography for handover authentication optimized by pre-computation

           Kim et al.   Zhang et al.   Proposed scheme
Tvehicle   1 TP         1 TM           1 TE
TRSU       2 TP         1 TM           2 TE + 1 TRV
Ttotal     37.884 ms    0.73 ms        1.595 ms

Tvehicle: optimized total time for handover authentication at the vehicle; TRSU: optimized total time for handover authentication at the RSU; Ttotal: total cryptography time for handover authentication (Tvehicle + TRSU).
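As a sanity check, the Ttotal row of Table 2 follows directly from the per-operation times of Table 1; a short script (illustrative, not from the paper) reproduces the totals:

```python
# Recomputing Table 2's Ttotal values from the per-operation times in Table 1.
T_E, T_RV, T_M, T_P = 0.463, 0.206, 0.365, 12.628  # ms, from Table 1

totals = {
    "Kim et al.":      1 * T_P + 2 * T_P,           # Tvehicle = 1 TP, TRSU = 2 TP
    "Zhang et al.":    1 * T_M + 1 * T_M,           # 1 TM on each side
    "Proposed scheme": 1 * T_E + (2 * T_E + T_RV),  # 1 TE vs. 2 TE + 1 TRV
}
for scheme, total in totals.items():
    print(f"{scheme}: {total:.3f} ms")
# -> Kim et al.: 37.884 ms, Zhang et al.: 0.730 ms, Proposed scheme: 1.595 ms
```

The proposed scheme is roughly 24x faster than the pairing-based scheme of Kim et al., while, unlike Zhang's ECC-based scheme, additionally providing authority traceability.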
Table 3. Comparison on the cost of authentication messages in V2I networks

              IEEE 802.11i [8]  Mishra et al. [9]  Pack et al. [10]  Kim et al. [11]  Zhang et al. [12]  Proposed scheme
C_RSU_AAA     8                 3                  2                 -                -                  1
C_vehicle_RSU 14α               2α                 4α                3α               5α                 3α
C_RSU_RSU     -                 -                  -                 -                2β                 -
C_auth_msg    8+14α             3+2α               2+4α              3α               5α+2β              1+3α
(b) 0 < α