Communications in Computer and Information Science
154
Salah S. Al-Majeed, Chih-Lin Hu, Dhinaharan Nagamalai (Eds.)
Advances in Wireless, Mobile Networks and Applications
International Conferences WiMoA 2011 and ICCSEA 2011
Dubai, United Arab Emirates, May 25-27, 2011
Proceedings
Volume Editors
Salah S. Al-Majeed, University of Essex, Colchester, UK, E-mail: [email protected]
Chih-Lin Hu, National Central University, Jung-li City, Taoyuan, Taiwan, E-mail: [email protected]
Dhinaharan Nagamalai, Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia, E-mail: [email protected]

ISSN 1865-0929, e-ISSN 1865-0937
ISBN 978-3-642-21152-2, e-ISBN 978-3-642-21153-9
DOI 10.1007/978-3-642-21153-9
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: Applied for
CR Subject Classification (1998): C.2, H.4, D.2, H.3, F.2, K.6.5
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The Third International Conference on Wireless, Mobile Networks and Applications (WiMoA 2011) and the First International Conference on Computer Science, Engineering and Applications (ICCSEA 2011) were held in Dubai, United Arab Emirates, during May 25–26, 2011. They attracted many local and international delegates, presenting a balanced mixture of intellects from all over the world. The goal of this conference series is to bring together researchers and practitioners from academia and industry to focus on understanding wireless, mobile networks and computer science and engineering applications and to establish new collaborations in these areas. Authors were invited to contribute to the conference by submitting articles that illustrate research results, projects, survey work and industrial experiences describing significant advances in all areas of wireless, mobile networks and computer science and engineering applications. The WiMoA 2011 and ICCSEA 2011 committees solicited submissions for many months from researchers, scientists, engineers, students and practitioners working on the themes and tracks of the conference. This effort drew submissions from a large number of internationally recognized researchers. All the submissions underwent a rigorous peer-review process conducted by expert reviewers. These reviewers were selected from a talented pool of Technical Program Committee members and external reviewers on the basis of their expertise. The papers were then reviewed based on their contributions, technical content, originality and clarity. The entire process, which includes the submission, review and acceptance processes, was done electronically.
All these efforts undertaken by the Organizing and Technical Program Committees led to an exciting, rich and high-quality technical conference program, which featured high-impact presentations for all attendees to enjoy and to expand their expertise in the latest developments in this field. The book is organized as a collection of papers from WiMoA 2011 and ICCSEA 2011. In all, 63 papers were submitted to WiMoA and 110 papers were submitted to ICCSEA. The Program Committee selected 8 papers from the WiMoA 2011 submissions and 20 papers from ICCSEA 2011 for conference publication. Finally, we would like to thank the General and Program Chairs, organization staff, the members of the Technical Program Committees and reviewers for their excellent and tireless work. We also want to thank Springer for the strong support and the authors who contributed to the success of the conference. We sincerely hope that all attendees benefited scientifically from the conference and wish them every success in their research.

Salah S. Al-Majeed
Chih-Lin Hu
Dhinaharan Nagamalai
Organization
The Third International Conference on Wireless, Mobile Networks and Applications (WiMoA 2011)

General Chairs
David C. Wyld, Southeastern Louisiana University, USA
Michal Wozniak, Wroclaw University of Technology, Poland

General Co-chairs
Chih-Lin Hu, National Central University, Taiwan

Steering Committee
Salah M. Saleh Al-Majeed, University of Essex, UK
Dhinaharan Nagamalai, Wireilla Net Solutions Pty Ltd, Australia
Nabendu Chaki, University of Calcutta, India
Ahmad Saad Al-Mogren, King Saud University, Saudi Arabia
Kamel Rouibah, Kuwait University, Kuwait
Mohamed Hassan, American University of Sharjah, United Arab Emirates
Fahim Akhter, Zayed University, United Arab Emirates

Program Committee Members (Pending)
Sajid Hussain, Acadia University, Canada
Emmanuel Bouix, iKlax Media, France
Charalampos Z. Patrikakis, National Technical University of Athens, Greece
Chin-Chih Chang, Chung Hua University, Taiwan
Yung-Fa Huang, Chaoyang University of Technology, Taiwan
Abdul Kadir Ozcan, The American University, Cyprus
Atilla Elci, Eastern Mediterranean University, Cyprus
B. Srinivasan, Monash University, Australia
Boo-Hyung Lee, KongJu National University, South Korea
Chih-Lin Hu, National Central University, Taiwan
Cho Han Jin, Far East University, South Korea
Cynthia Dhinakaran, Hannam University, South Korea
Dhinaharan Nagamalai, Wireilla Net Solutions Pty Ltd, Australia
Dimitris Kotzinos, Technical Educational Institution of Serres, Greece
Dong Seong Kim, Duke University, USA
Farhat Anwar, International Islamic University, Malaysia
Firkhan Ali Bin Hamid Ali, Universiti Tun Hussein Onn Malaysia, Malaysia
Ford Lumban Gaol, University of Indonesia
H.V. Ramakrishnan, Dr. MGR University, India
Henrique Joao Lopes Domingos, University of Lisbon, Portugal
Ho Dac Tu, Waseda University, Japan
Hoang, Huu Hanh, Hue University, Vietnam
Jacques Demerjian, Communication and Systems, Homeland Security, France
Jae Kwang Lee, Hannam University, South Korea
Jan Zizka, SoNet/DI, FBE, Mendel University in Brno, Czech Republic
Johann Groschdl, University of Bristol, UK
John Karamitsos, University of the Aegean, Samos, Greece
Jose Enrique Armendariz-Inigo, Universidad Publica de Navarra, Spain
Jungwook Song, Konkuk University, South Korea
K.P. Thooyamani, Bharath University, India
Krzysztof Walkowiak, Wroclaw University of Technology, Poland
Lu Yan, University of Hertfordshire, UK
Luis Veiga, Technical University of Lisbon, Portugal
Marco Roccetti, University of Bologna, Italy
Michal Wozniak, Wroclaw University of Technology, Poland
Mohsen Sharifi, Iran University of Science and Technology, Iran
Murugan D., Manonmaniam Sundaranar University, India
N. Krishnan, Manonmaniam Sundaranar University, India
Nabendu Chaki, University of Calcutta, India
Natarajan Meghanathan, Jackson State University, USA
Phan Cong Vinh, London South Bank University, UK
Rajendra Akerkar, Technomathematics Research Foundation, India
Rakhesh Singh Kshetrimayum, Indian Institute of Technology-Guwahati, India
Ramayah Thurasamy, Universiti Sains Malaysia, Malaysia
Sarmistha Neogy, Jadavpur University, India
Serguei A. Mokhov, Concordia University, Canada
SunYoung Han, Konkuk University, South Korea
Susana Sargento, University of Aveiro, Portugal
Salah S. Al-Majeed, University of Essex, UK
Vishal Sharma, Metanoia Inc., USA
Wei Jie, University of Manchester, UK
Yeong Deok Kim, Woosong University, South Korea
Yuh-Shyan Chen, National Taipei University, Taiwan
Nicolas Sklavos, Technological Educational Institute of Patras, Greece
Shivan Haran, Arizona State University, USA
Danda B. Rawat, Old Dominion University, USA
Khamish Malhotra, University of Glamorgan, UK
Eric Renault, Institut Telecom – Telecom SudParis, France
Andreas Riener, Johannes Kepler University Linz, Austria
Velmurugan Ayyadurai, Center for Communication Systems, UK
Syed Rahman, University of Hawaii-Hilo, USA
Sajid Hussain, Fisk University, USA
Michael Peterson, University of Hawaii-Hilo, USA
Brajesh Kumar Kaushik, Indian Institute of Technology, India
Yan Luo, University of Massachusetts Lowell, USA
Yao-Nan Lien, National Chengchi University, Taiwan
Rituparna Chaki, West Bengal University of Technology, India
Somitra Sanadhya, IIT-Delhi, India
Debasis Giri, Haldia Institute of Technology, India
S. Hariharan, B.S. Abdur Rahman University, India
Dong Seong Kim, Duke University, USA
First International Conference on Computer Science, Engineering and Applications (ICCSEA 2011)

General Chairs
Nabendu Chaki, University of Calcutta, India
Henrique Joao Lopes Domingos, University of Lisbon, Portugal

General Co-chairs
Jose Enrique Armendariz-Inigo, Universidad Publica de Navarra, Spain

Steering Committee
Abdul Kadhir Ozcan, The American University, Cyprus
Dhinaharan Nagamalai, Wireilla Net Solutions Pty Ltd, Australia
John Karamitsos, University of the Aegean, Samos, Greece
Natarajan Meghanathan, Jackson State University, USA
Program Committee
A. Arokiasamy, Eastern Mediterranean University, Cyprus
Abdul Kadir Ozcan, The American University, Cyprus
Oureddine Boudriga, University of Carthage, Tunisia
Anh Ngoc Le, Kyungpook National University, South Korea
Andy Seddon, Asia Pacific Institute of Information Technology, Malaysia
Atilla Elci, Eastern Mediterranean University, Cyprus
Bong-Han, Kim, Chongju University, South Korea
Boo-Hyung Lee, KongJu National University, South Korea
Charalampos Z. Patrikakis, National Technical University of Athens, Greece
Chih-Lin Hu, National Central University, Taiwan
Chin-Chih Chang, Chung Hua University, Taiwan
Cho Han Jin, Far East University, South Korea
Cynthia Dhinakaran, Hannam University, South Korea
David W Deeds, Shingu College, South Korea
Dimitris Kotzinos, Technical Educational Institution of Serres, Greece
Dong Seong Kim, Duke University, USA
Emmanuel Bouix, iKlax Media, France
Farhat Anwar, International Islamic University, Malaysia
Firkhan Ali Bin Hamid Ali, Universiti Tun Hussein Onn Malaysia, Malaysia
Ford Lumban Gaol, University of Indonesia
Girija Chetty, University of Canberra, Australia
H.V. Ramakrishnan, Dr. MGR University, India
Ho Dac Tu, Waseda University, Japan
Hoang, Huu Hanh, Hue University, Vietnam
Jacques Demerjian, Communication and Systems, Homeland Security, France
Jae Kwang Lee, Hannam University, South Korea
Jan Zizka, SoNet/DI, FBE, Mendel University in Brno, Czech Republic
Jeong-Hyun Park, Electronics Telecommunication Research Institute, South Korea
Jivesh Govil, Cisco Systems Inc., USA
Johann Groschdl, University of Bristol, UK
Johnson Kuruvila, Dalhousie University, Halifax, Canada
Jungwook Song, Konkuk University, South Korea
K.P. Thooyamani, Bharath University, India
Kamalrulnizam Abu Bakar, Universiti Teknologi Malaysia, Malaysia
Krzysztof Walkowiak, Wroclaw University of Technology, Poland
Lu Yan, University of Hertfordshire, UK
Luis Veiga, Technical University of Lisbon, Portugal
Marco Roccetti, University of Bologna, Italy
Mohsen Sharifi, Iran University of Science and Technology, Iran
Murugan D., Manonmaniam Sundaranar University, India
N. Krishnan, Manonmaniam Sundaranar University, India
Nidaa Abdual Muhsin Abbas, University of Babylon, Iraq
Paul D. Manuel, Kuwait University, Kuwait
Phan Cong Vinh, London South Bank University, UK
Ponpit Wongthongtham, Curtin University of Technology, Australia
Qin Xin, Simula Research Laboratory, Norway
Rajendra Akerkar, Technomathematics Research Foundation, India
Ramayah Thurasamy, Universiti Sains Malaysia, Malaysia
Sagarmay Deb, Central Queensland University, Australia
Sajid Hussain, Acadia University, Canada
Sarmistha Neogy, Jadavpur University, India
Sattar B. Sadkhan, University of Babylon, Iraq
Sergio Ilarri, University of Zaragoza, Spain
Serguei A. Mokhov, Concordia University, Canada
Seungmin Rho, Carnegie Mellon University, USA
SunYoung Han, Konkuk University, South Korea
Susana Sargento, University of Aveiro, Portugal
Selma Boumerdassi, Conservatoire National des Arts et Metiers (CNAM), France
Wichian Sittiprapaporn, Mahasarakham University, Thailand
Yung-Fa Huang, Chaoyang University of Technology, Taiwan
Organized by
ACADEMY & INDUSTRY RESEARCH COLLABORATION CENTER (AIRCC)
Table of Contents
The Third International Conference on Wireless, Mobile Networks and Applications (WiMoA-2011)

Minimum Hop vs. Minimum Edge Based Multicast Routing for Mobile Ad Hoc Networks
    Natarajan Meghanathan
Implementing Recovery in Low-Resource Stationary Wireless Sensor Networks
    Bai Li, Lynn Margaret Batten, and Robin Doss
Trusted Third Party Synchronized Billing Framework Using Multiple Authorizations by Multiple Owners (MAMO)
    Supriya Chakraborty and Nabendu Chaki
Random Variation in Coverage of Sensor Networks
    Thaier Hayajneh and Samer Khasawneh
A Study of Applying SIP Mobility in Mobile WiMax Network Architecture
    Yen-Wen Chen, Meng-Hsien Lin, and Hong-Jang Gu
IEEE 802.15.4/Zigbee Performance Analysis for Real Time Applications Using OMNET++ Simulator
    Lamia Chaari and Lotfi Kamoun
Honeynet Based Botnet Detection Using Command Signatures
    J.S. Bhatia, R.K. Sehgal, and Sanjeev Kumar
Bandwidth Probe: An End-to-End Bandwidth Measurement Technique in Wireless Mesh Networks
    Ajeet Kumar Singh and Jatindra Kr Deka

First International Conference on Computer Science, Engineering and Applications (ICCSEA-2011)

An Efficient Mining of Dominant Entity Based Association Rules in Multi-databases
    V.S. Ananthanarayana
Mitigation of Transient Loops in IGP Networks
    Mohammed Yousef and David K. Hunter
An Efficient Algorithm to Create a Loop Free Backup Routing Table
    Radwan S. Abujassar and Mohammed Ghanbari
A Low Cost Moving Object Detection Method Using Boundary Tracking
    Soharab Hossain Shaikh and Nabendu Chaki
Mitigating Soft Error Risks through Protecting Critical Variables and Blocks
    Muhammad Shaikh Sadi, Md. Nazim Uddin, Md. Mizanur Rahman Khan, and Jan Jürjens
Animations and Simulations: Learning and Practice of Algorithmic
    Brika Ammar
Dynamic Task Allocation in Networks-on-Chip
    Agarwalla Bindu and Deka Jatindra Kumar
Congruence Results of Weak Equivalence for a Graph Rewriting Model of Concurrent Programs with Higher-Order Communication
    Masaki Murakami
Performance Study of Fluid Content Distribution Model for Peer-to-Peer Overlay Networks
    Salah Noori Saleh, Maryam Feily, Sureswaran Ramadass, and Ayman Hannan
The Hybrid Cubes Encryption Algorithm (HiSea)
    Sapiee Jamel, Mustafa Mat Deris, Iwan Tri Riyadi Yanto, and Tutut Herawan
Reducing Handover Latency in Mobile IPv6-Based WLAN by Parallel Signal Execution at Layer 2 and Layer 3
    Muhammad Arif Amin, Kamalrulnizam Bin Abu Bakar, Abdul Hanan Abdullah, and Rashid Hafeez Khokhar
Minimization of Boolean Functions Which Include Don’t-Care Statements, Using Graph Data Structure
    Masoud Nosrati, Ronak Karimi, and Reza Aziztabar
Clustering Moving Object Trajectories Using Coresets
    Omnia Ossama, Hoda M.O. Mokhtar, and Mohamed E. El-Sharkawi
ASIC Architecture to Determine Object Centroids from Gray-Scale Images Using Marching Pixels
    Andreas Loos, Marc Reichenbach, and Dietmar Fey
Learning Digital Cashless Applications with the Consolidation of Authenticity, Confidentiality and Integrity Using Sequence Diagrams
    Ajantha Herath, Suvineetha Herath, Hasan Yousif Kamal, Khalid Ahmed Al-Mutawah, Mohammed A.R. Siddiqui, and Rohitha Goonatilake
Encoding in VLSI Interconnects
    Brajesh Kumar Kaushik, S.K. Verma, and Balraj Singh
Vertical Handover Efficient Transport for Mobile IPTV
    Salah S. Al-Majeed and Martin Fleury
Job Scheduling for Dynamic Data Replication Strategy Based on Federation Data Grid Systems
    M. Zarina, M. Mat Deris, ANM.M. Rose, and A.M. Isa
Matab Implementation and Results of Region Growing Segmentation Using Haralic Texture Features on Mammogram Mass Segmentation
    Valliappan Raman, Putra Sumari, and Patrick Then
Available Bandwidth Estimation with Mobility Management in Ad Hoc Networks
    Redouane Belbachir, Zoulikha Mekkakia Maaza, and Ali Kies

Author Index
Minimum Hop vs. Minimum Edge Based Multicast Routing for Mobile Ad Hoc Networks

Natarajan Meghanathan
Jackson State University, Jackson, MS 39217, USA
[email protected]

Abstract. The high-level contribution of this paper is to establish benchmarks for the minimum hop count per source-receiver path and the minimum number of edges per tree for multicast routing in mobile ad hoc networks (MANETs) and to explore the tradeoffs between these two routing strategies with respect to hop count, number of edges and lifetime per multicast tree. Accordingly, we consider two categories of algorithms: Breadth First Search (for minimum hop trees) and a minimum Steiner tree heuristic (for minimum edge trees). Extensive simulations of these two algorithms on centralized snapshots of the MANET topology, sampled every 0.25 seconds, have been conducted for 1000 seconds under two different conditions of network density and three different multicast group sizes. Simulation results illustrate that minimum edge trees have a 20-160% longer lifetime than the minimum hop trees. The tradeoff is that the minimum edge trees have a 20-100% larger hop count per source-receiver path compared to the minimum hop trees. Similarly, the minimum hop trees have 13-35% more edges than the minimum edge trees.

Keywords: Minimum Hop, Minimum Edge, Multicast Routing, Mobile Ad hoc Networks, Simulations, Steiner Trees.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 1–10, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

A mobile ad hoc network (MANET) is a dynamic distributed system of wireless nodes that move independently of each other in an autonomous fashion. The network bandwidth is limited and the medium is shared. As a result, transmissions are prone to interference and collisions. The battery power of the nodes is constrained and hence nodes operate with a limited transmission range, often leading to multi-hop routes between any pair of nodes in the network. Multicasting in wireless ad hoc networks has numerous applications in collaborative and distributed computing. In this paper, we adopt the Least Overhead Routing Approach (LORA) of using a chosen multicast tree as long as it exists and determining a new tree only when the currently used one breaks.

Not much work has been done towards the evaluation of MANET multicast routing from a theoretical point of view with respect to metrics such as the hop count per source-receiver path and the number of edges per multicast tree and their impact on the lifetime per multicast tree. These two theoretical metrics significantly influence the more practically measured performance metrics, such as energy consumption, end-to-end delay per packet and routing overhead, that have often been used to evaluate and compare the different MANET multicast routing protocols in the literature. Hence, we take a different approach in this paper. We study MANET multicast routing using the theoretical algorithms that would yield the benchmarks (i.e., optimum values) for the above two metrics: the Breadth First Search (BFS) algorithm [3] for the minimum hop count per source-receiver path and the minimum Steiner tree heuristic [4] for the minimum number of edges.

Our simulation methodology is outlined as follows: Using the mobility profiles of the nodes gathered offline from a discrete-event simulator (ns-2 [6]), we generate snapshots of the MANET topology, referred to as Static Graphs [7], periodically at fixed time intervals. For simulations with a particular algorithm, if a multicast tree is not known for a particular time instant, we run the algorithm on the static graph in a centralized fashion and adopt the LORA strategy of using this multicast tree as long as it exists in the subsequent static graphs. If the tree no longer exists after a certain time instant, the multicast algorithm is run again to determine a new tree. This procedure is repeated for the entire simulation time. Depending on the algorithm used, the sequence of multicast trees generated has either the minimum hop count per source-receiver path or the minimum number of edges. Our hypothesis is that the multicast trees determined to optimize one of the two theoretical metrics would be sub-optimal with respect to the other metric. Through extensive simulation analysis, we confirm our hypothesis to be true and we explain in detail the performance tradeoffs associated with the two metrics.

The rest of the paper is organized as follows: Section 2 discusses the existing related work on MANET multicast routing. Section 3 introduces the notion of a Static Graph and reviews the BFS algorithm for minimum hop path trees and Kou et al.’s heuristic for minimum edge Steiner trees. Section 4 presents the simulation results for the benchmark values of the two theoretical metrics and explores the tradeoffs between these metrics and their impact on the lifetime per multicast tree. Section 5 concludes the paper. The terms ‘vertex’ and ‘node’, ‘algorithm’ and ‘heuristic’, and ‘destination’ and ‘receiver’ are used interchangeably; each pair means the same.
2 Existing Related Work

Several MANET multicast routing protocols have been proposed in the literature [1][2]. They are mainly classified as tree-based and mesh-based protocols. In tree-based protocols, only one route exists between a source and a destination and hence these protocols are efficient in terms of the number of link transmissions. The tree-based protocols can be further divided into two types: source tree-based and shared tree-based. In source tree-based multicast protocols, the tree is rooted at the source. In shared tree-based multicast protocols, the tree is rooted at a core node and all communication between the multicast source and the receiver nodes is through the core node. Even though shared tree-based multicast protocols are more scalable with respect to the number of sources, these protocols suffer from a single point of failure, the core node. On the other hand, source tree-based protocols are more efficient in terms of traffic distribution. In mesh-based multicast protocols, multiple routes exist between a source and each of the receivers of the multicast group. A receiver node receives several copies of the data packets, one copy through each of the multiple paths. Mesh-based protocols provide robustness at the expense of a larger number of link transmissions, leading to inefficient bandwidth usage. Considering the pros and cons of these different classes of multicast routing in MANETs, we feel the source tree-based multicast routing protocols are more efficient in terms of traffic distribution and link usage. Hence, all of our work in this research is in the category of on-demand source tree-based multicast routing.

Some of the recent performance comparison studies on MANET multicast routing protocols reported in the literature are as follows: In [9], the authors compare the performance of the tree-based MAODV and mesh-based ODMRP protocols with respect to the packet delivery ratio and latency. In [10], the authors propose a stability-based multicast mesh protocol and compare its performance with ODMRP. In [11], the authors propose a dominating set-induced mesh-based multicast routing protocol for efficient flooding and control overhead and compare the protocol’s performance with that of MAODV and ODMRP. In [12], the authors explore the use of genetic algorithms to optimize the performance of tree-based and mesh-based MANET multicast protocols with respect to packet delivery and control overhead. The impact of route selection metrics such as hop count and link lifetime on the performance of on-demand mesh-based multicast ad hoc routing protocols has been examined in [13]. In [14], the author has proposed non-receiver-aware and receiver-aware (depending on whether the nodes in the network are aware of the multicast group or not) extensions to the Location Prediction Based Routing (LPBR) protocol to simultaneously minimize the edge count, hop count and number of multicast tree discoveries. An agent-based multicast routing scheme (ABMRS) that uses a set of static and mobile agents for network and multicast initiation and management has been proposed in [15] and compared with MAODV. A zone-based scalable and robust location aware multicast algorithm (SRLAMA) has also been recently proposed for MANETs [16].
3 Review of the Algorithms Used for Multicast Simulations

In this section, we describe the two theoretical algorithms (BFS and the minimum Steiner tree heuristic) used in this paper. We implement these theoretical algorithms to simulate multicasting on a sequence of static graphs over the duration of the multicast session. A static graph is a snapshot of the MANET topology at a particular time instant. Using the mobility profiles of the nodes generated offline from ns-2, we can determine the location of a node at any particular time instant. A static graph G(t) = (V, E), generated for a particular time instant t, comprises all the nodes in the network as the vertex set V; there exists an edge (u, v) ∈ E if and only if the Euclidean distance between the two end vertices u ∈ V and v ∈ V is less than or equal to the transmission range of the nodes in the network. All the edges in E are of unit weight. We assume a homogeneous network in which all nodes operate at an identical and fixed transmission range.
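As an illustration of the static graph definition above, the following Python sketch builds G(t) from the node locations at one time instant. The function name `build_static_graph`, the dictionary-based input format and the sample coordinates are assumptions made for this example; they are not taken from the simulator used in this paper.

```python
import math
from itertools import combinations

def build_static_graph(positions, tx_range):
    """Return an adjacency dict for the snapshot G(t) = (V, E).

    positions: dict mapping node id -> (x, y) location at time t (assumed format)
    tx_range:  transmission range shared by all nodes (homogeneous network)
    """
    # Every node is a vertex, even if it has no neighbor in this snapshot.
    graph = {node: set() for node in positions}
    for u, v in combinations(positions, 2):
        (x1, y1), (x2, y2) = positions[u], positions[v]
        # Edge (u, v) exists iff the Euclidean distance is within range.
        if math.hypot(x1 - x2, y1 - y2) <= tx_range:
            graph[u].add(v)
            graph[v].add(u)
    return graph

# Hypothetical node locations in metres; the range matches the 250 m used in Section 4.
positions = {"a": (0, 0), "b": (200, 0), "c": (400, 0), "d": (900, 900)}
graph = build_static_graph(positions, tx_range=250)
```

In this sample, a and c are 400 m apart and therefore not adjacent, while node d is isolated, so the snapshot is not connected.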
3.1 Breadth First Search (BFS) Algorithm

The BFS algorithm (pseudo code in Figure 1) has traditionally been used to check the connectivity of a network graph. When we start the BFS algorithm at a randomly chosen node, we should be able to visit all the vertices in the graph if the graph is connected. BFS returns a tree rooted at the chosen start node; when we visit a vertex v for the first time in our BFS algorithm, the vertex u through which we visit v is considered the predecessor node of v in the tree. Every vertex in the BFS tree, other than the root node, has exactly one predecessor node. When we run BFS on a static graph with unit edge weights, we obtain a minimum hop multicast tree such that every node in the graph is connected to the root node (the multicast source node) of the tree on a path with the theoretically minimum hop count. If MG ⊆ V represents the multicast group consisting of the receiver nodes and a source node s, we start BFS at s and visit all the vertices in the network graph. Once we obtain a BFS tree rooted at s, we trace back from every receiver d ∈ MG and determine the minimum hop s-d path. The minimum hop multicast tree is the aggregate of all these minimum hop paths connecting the source s to each receiver d in the multicast group.

Input: Static Graph G = (V, E), source s
Auxiliary Variables/Initialization: Nodes-Explored = Φ, FIFO-Queue = Φ, root-node, ∀ v ∈ V, Predecessor(v) = NULL
Begin Algorithm BFS (G, s)
    root-node = s
    Nodes-Explored = Nodes-Explored ∪ {root-node}
    FIFO-Queue = FIFO-Queue ∪ {root-node}
    while ( |FIFO-Queue| > 0 ) do
        first-node u = Dequeue(FIFO-Queue) // extract the first node
        for (every edge (u, v) ∈ E) do // i.e. every neighbor v of node u
            if (v ∉ Nodes-Explored) then
                Nodes-Explored = Nodes-Explored ∪ {v}
                FIFO-Queue = FIFO-Queue ∪ {v}
                Predecessor (v) = u
            end if
        end for
    end while
End Algorithm BFS
Fig. 1. Pseudo Code for Breadth First Search (BFS)
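The BFS procedure and the trace-back step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's Java implementation: the function names, the adjacency-list input and the five-node sample graph are hypothetical.

```python
from collections import deque

def bfs_predecessors(graph, source):
    """Run BFS from source and return the predecessor of every reached vertex."""
    predecessor = {source: None}      # doubles as the Nodes-Explored set
    queue = deque([source])           # FIFO queue
    while queue:
        u = queue.popleft()
        for v in graph[u]:            # every neighbor v of node u
            if v not in predecessor:
                predecessor[v] = u
                queue.append(v)
    return predecessor

def min_hop_multicast_tree(graph, source, receivers):
    """Aggregate the minimum hop source-receiver paths into a set of tree edges."""
    predecessor = bfs_predecessors(graph, source)
    tree_edges = set()
    for receiver in receivers:
        if receiver not in predecessor:
            return None               # receiver unreachable in this snapshot
        node = receiver
        while predecessor[node] is not None:   # trace back to the source
            tree_edges.add(frozenset((predecessor[node], node)))
            node = predecessor[node]
    return tree_edges

# Hypothetical snapshot: s reaches r1 via a and r2 via b; r1 and r2 are neighbors.
graph = {
    "s": ["a", "b"], "a": ["s", "r1"], "b": ["s", "r2"],
    "r1": ["a", "r2"], "r2": ["b", "r1"],
}
tree = min_hop_multicast_tree(graph, "s", ["r1", "r2"])
```

On this sample graph both receivers are two hops from s, and the resulting tree uses four edges (s-a, a-r1, s-b, b-r2), although three edges (s-a, a-r1, r1-r2) would be enough to connect the group.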
3.2 Minimum Edge Multicast Steiner Tree

Given a static graph G = (V, E), where V is the set of vertices and E is the set of edges, and a subset of vertices (called the multicast group or Steiner points) MG ⊆ V, the multicast Steiner tree is the tree with the least number of edges required to connect all the vertices in MG. Unfortunately, the problem of determining a minimum edge Steiner tree in an undirected graph like the static graph is NP-complete. Efficient heuristics (e.g., [4]) have been proposed in the literature to approximate a minimum Steiner tree. In this paper, we use Kou et al.’s [4] well-known O(|V||MG|²) heuristic (|V| is the number of nodes in the network graph and |MG| is the size of the multicast group comprising the source node and the receiver nodes) to approximate the minimum edge Steiner tree in graphs representing snapshots of the network topology. We refer to the minimum edge Steiner tree connecting the set of nodes in the multicast group MG ⊆ V as an MG-Steiner-tree. In unit disk graphs such as the static graphs used in our research and in [5], Step 5 of the heuristic is not needed and the minimal spanning tree T_MG obtained at the end of Step 4 can be considered as the minimum edge Steiner tree. Figure 2 outlines the heuristic.

Input: A Static Graph G = (V, E), Multicast Group MG ⊆ V
Output: An MG-Steiner-tree for the set MG ⊆ V
Begin Kou et al. Heuristic (G, MG)
    Step 1: Construct a complete undirected weighted graph G_C = (MG, E_C) from G and MG where ∀ (v_i, v_j) ∈ E_C, v_i and v_j are in MG, and the weight of edge (v_i, v_j) is the length of the shortest path from v_i to v_j in G.
    Step 2: Find the minimum weight spanning tree T_C in G_C (If more than one minimal spanning tree exists, pick an arbitrary one).
    Step 3: Construct the sub graph G_MG of G by replacing each edge in T_C with the corresponding shortest path from G (If there is more than one shortest path between two given vertices, pick an arbitrary one).
    Step 4: Find the minimal spanning tree T_MG in G_MG (If more than one minimal spanning tree exists, pick an arbitrary one). Note that each edge in G_MG has weight 1.
    return T_MG as the MG-Steiner-tree
End Kou et al. Heuristic
Fig. 2. Kou et al.’s Heuristic [4] to find an Approx. Minimum Edge Steiner Tree
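Steps 1 through 4 of the heuristic can be sketched in Python as shown below. The sketch assumes unit edge weights, uses BFS for the shortest paths and Prim's algorithm for the spanning tree of Step 2 (the text does not prescribe a particular spanning tree algorithm), and follows the text in omitting Step 5. All function names and the sample graph are hypothetical, and the code favors brevity over the stated running time.

```python
from collections import deque

def bfs_paths(graph, source):
    """Predecessor map of a BFS from source (shortest paths for unit weights)."""
    pred = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in pred:
                pred[v] = u
                queue.append(v)
    return pred

def path_to(pred, target):
    """Vertices on the shortest path, listed from target back to the BFS source."""
    path = [target]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path

def kou_steiner_tree(graph, group):
    group = list(group)
    # Step 1: shortest paths between group members define the weights of G_C.
    preds = {m: bfs_paths(graph, m) for m in group}
    if any(m not in preds[group[0]] for m in group):
        return None                   # the group is not connected in this snapshot

    def dist(a, b):
        return len(path_to(preds[a], b)) - 1

    # Step 2: minimum spanning tree T_C of G_C, grown with Prim's algorithm.
    in_tree, mst_edges = [group[0]], []
    while len(in_tree) < len(group):
        a, b = min(((a, b) for a in in_tree for b in group if b not in in_tree),
                   key=lambda edge: dist(*edge))
        in_tree.append(b)
        mst_edges.append((a, b))

    # Step 3: replace each edge of T_C by its shortest path to obtain G_MG.
    sub = {group[0]: set()}
    for a, b in mst_edges:
        path = path_to(preds[a], b)
        for u, v in zip(path, path[1:]):
            sub.setdefault(u, set()).add(v)
            sub.setdefault(v, set()).add(u)

    # Step 4: with unit weights, any spanning tree of G_MG is a minimal one.
    tree_pred = bfs_paths(sub, group[0])
    return {frozenset((v, p)) for v, p in tree_pred.items() if p is not None}

# Hypothetical snapshot: s reaches r1 via a and r2 via b; r1 and r2 are neighbors.
graph = {
    "s": ["a", "b"], "a": ["s", "r1"], "b": ["s", "r2"],
    "r1": ["a", "r2"], "r2": ["b", "r1"],
}
steiner = kou_steiner_tree(graph, ["s", "r1", "r2"])
```

For this sample graph the sketch returns a three-edge tree (s-a, a-r1, r1-r2). The receiver r2 is then three hops from the source rather than two, which is the kind of hop count versus edge count tradeoff studied in this paper.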
4 Simulations
The simulations have been conducted in a discrete-event simulator implemented by the author in Java. The two multicast algorithms have been implemented in a centralized fashion. We generate the static graphs by taking snapshots of the network topology every 0.25 seconds and run the two multicast algorithms on each snapshot. The simulation time is 1000 seconds. We consider a square network of dimensions 1000 m x 1000 m. The transmission range of the nodes is 250 m. The network density is varied by performing the simulations with 50 nodes (low density) and 100 nodes (high density). We assume there is only one source for the multicast group, and three different values for the number of receivers per multicast group are considered: 3 (small), 10 (moderate) and 18 (large). A multicast group comprises a source node and a list of receiver nodes, the size of which is given above.
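As a rough illustration of how one such snapshot can be turned into a static graph, the sketch below links every pair of nodes whose Euclidean distance does not exceed the transmission range. The function name, the dictionary-based data layout and the choice of an inclusive range test are assumptions made for this example; the author's simulator is written in Java and its internals are not described here.

```python
import math
import random


def unit_disk_graph(positions, tx_range=250.0):
    """Build an adjacency map from node positions (x, y) in metres.

    Two nodes are neighbours when their distance is at most tx_range
    (the inclusive comparison is an assumption of this sketch).
    """
    adj = {node: set() for node in positions}
    nodes = list(positions)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if math.dist(positions[u], positions[v]) <= tx_range:
                adj[u].add(v)
                adj[v].add(u)
    return adj


# Hypothetical snapshot: 50 nodes placed uniformly in a 1000 m x 1000 m area.
rng = random.Random(1)
snapshot = {n: (rng.uniform(0, 1000), rng.uniform(0, 1000)) for n in range(50)}
graph = unit_disk_graph(snapshot)
print(sum(len(nbrs) for nbrs in graph.values()) // 2, "links in this snapshot")
```

In a simulation run, a function of this kind would be applied to the node positions at each 0.25-second sampling instant before running the two multicast algorithms.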
N. Meghanathan
The node mobility model used is the Random Waypoint model [8]. Each node starts moving from an arbitrary location (i.e., waypoint) at a speed uniformly distributed in the range [vmin, …, vmax]. Once the destination is reached, the node may stop there for a certain time, called the pause time, and then continue to move to a new waypoint by choosing a different target location and a different velocity. A mobility trace file generated for a particular vmax value over the duration of the simulation is the collection of the location, velocity and time information of all the waypoints for every node in the network. In this paper, we set vmin = 0. The vmax values used are 5 m/s (low mobility), 25 m/s (moderate mobility) and 50 m/s (high mobility). The pause time is 0 seconds.
The performance metrics measured are as follows. Each performance metric illustrated in Figures 3 through 6 is measured using five different lists of receiver nodes of the same size, and the multicast algorithm is run on five different mobility trace files generated for a particular value of vmax.
(i) Number of links per tree: This metric refers to the total number of links in the entire multicast tree, time-averaged over the duration of the multicast session. For example, if a multicast session uses two trees, one tree with 10 links for 3 seconds and another tree with 15 links for 6 seconds, then the time-averaged value of the number of links per tree for the 9-second duration of the multicast session is (10*3 + 15*6)/(3 + 6) = 13.3, and not the simple average of 12.5.
(ii) Number of hops per receiver: We measure the number of hops in the paths from the source to each receiver of the multicast group and average it over the duration of the multicast session. This metric is thus a time-averaged value of the number of hops from the multicast source to a receiver, averaged over all the receivers of the multicast session.
(iii) Lifetime per Multicast Tree: Whenever a link break occurs in a multicast tree, we establish a new multicast tree. The lifetime per multicast tree is the average of the time between successive multicast tree discoveries for a particular routing protocol or algorithm, over the duration of the multicast session. The larger the value of the lifetime per multicast tree, the lower the number of multicast tree transitions or discoveries needed.
4.1 Number of Edges per Tree and Hop Count per Source-Receiver Path
As expected, the minimum edge based Steiner trees incurred the smallest number of edges per multicast tree. On average, the number of edges per minimum hop tree is 13-35% more than that incurred with the minimum edge tree. With the objective of optimizing the hop count, minimum hop based multicast trees select edges that could constitute a minimum hop path, but with a higher probability of failure in the immediate future. The physical Euclidean distance between the constituent nodes of an edge on a minimum hop path is close to the transmission range of the nodes at the time of tree formation itself. For a given network density, as we increase the number of receivers per multicast group from 3 to 18, the average number of edges per multicast tree increases by a factor of 3 to 4. For both the minimum hop and minimum edge trees, for a given level of node mobility and number of receivers per multicast group, as we increase the network density, the number of edges per multicast tree either remains the same or even slightly decreases.
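The edge counts reported in this section are time-averaged values in the sense of metric (i) of Section 4. A minimal sketch of that averaging is given below; the function name and the list-of-pairs input format are assumptions made for illustration only.

```python
def time_averaged_links(trees):
    """Time-weighted average of the number of links per tree.

    trees: list of (number_of_links, seconds_in_use) pairs, one per tree
    used during the multicast session.
    """
    total_time = sum(duration for _, duration in trees)
    weighted = sum(links * duration for links, duration in trees)
    return weighted / total_time


# The example from metric (i): 10 links for 3 s, then 15 links for 6 s.
session = [(10, 3), (15, 6)]
print(round(time_averaged_links(session), 1))   # time-averaged value
print(sum(n for n, _ in session) / len(session))  # simple average, for contrast
```

Weighting by duration keeps a short-lived tree from influencing the metric as much as a tree that was in use for most of the session.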
As expected, the minimum hop multicast trees incurred the lowest hop count per source-receiver path. The larger hop count per source-receiver path for minimum edge trees could be attributed to their relatively lower number of edges compared to the minimum hop trees. As we connect the source node to the multicast receivers with the lowest possible number of edges, the number of hops between the source node and each of the receiver nodes increases. This is the tradeoff between the objectives of minimizing the number of edges per multicast tree and minimizing the hop count per individual source-receiver path in the multicast tree.
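The tradeoff can be seen on a small hand-made example. The sketch below builds a minimum hop tree as the union of breadth-first-search shortest paths from the source to the receivers (an assumption about how such a tree may be formed; the function names and the five-node graph are hypothetical) and compares it with a hand-picked tree that uses fewer edges.

```python
from collections import deque


def min_hop_tree(adj, source, receivers):
    """Union of BFS shortest paths from the source to every receiver."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    edges = set()
    for node in receivers:
        while parent[node] is not None:
            edges.add((parent[node], node))
            node = parent[node]
    return edges


def tree_stats(edges, source, receivers):
    """Return (number of edges, mean hop count from source to receivers)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs.get(u, []):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return len(edges), sum(hops[r] for r in receivers) / len(receivers)


# Hypothetical topology: source 0, receivers 3 and 4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
receivers = [3, 4]
hop_tree = min_hop_tree(adj, 0, receivers)
edge_tree = {(0, 1), (1, 3), (3, 4)}  # a tree with fewer edges, chosen by hand
print(tree_stats(hop_tree, 0, receivers))
print(tree_stats(edge_tree, 0, receivers))
```

In this toy topology the minimum hop tree uses four edges with two hops to each receiver, while the three-edge tree reaches receiver 4 only after three hops.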
Fig. 3. # of Edges per Tree and Hop Count per Source-Receiver Path (vmax = 5 m/s)
Fig. 4. # of Edges per Tree and Hop Count per Source-Receiver Path (vmax = 25 m/s)
Fig. 5. # of Edges per Tree and Hop Count per Source-Receiver Path (vmax = 50 m/s)
For both minimum hop and minimum edge multicast trees, for a given network density and number of receivers per multicast group, the maximum node velocity has no appreciable impact on the average number of edges per tree or on the hop count per source-receiver path. For a given level of node mobility (i.e., maximum node velocity) and network density, as we increase the number of receivers per
multicast group, the average hop count per source-receiver path for minimum hop trees decreases. On the other hand, the average hop count per source-receiver path for minimum edge trees increases. This could be attributed to the relatively smaller number of edges in the minimum edge trees compared to that incurred by the minimum hop trees. The relatively larger number of edges in minimum hop trees at larger numbers of receivers per multicast group results in a lower hop count per source-receiver path. The average number of edges per minimum hop tree for a network of 50 nodes and 3 receivers per multicast group is about 1 edge more than that incurred by the minimum edge trees; on the other hand, the average number of edges per minimum hop tree for a network of 50 nodes and 18 receivers per multicast group is about 7 edges more than the minimum. Similar observations could be made for networks of 100 nodes. When compared to the average hop count per source-receiver path incurred by minimum hop trees, the average hop count per source-receiver path for minimum edge trees is 20% (for a smaller number of receivers per multicast group) to 100% (for a larger number of receivers per multicast group) more. Note that with an increase in the network density and/or the number of receivers per multicast group, the hop count per source-receiver path for minimum hop trees tends to decrease, whereas the hop count per source-receiver path for minimum edge trees tends to increase. The hop count per source-receiver path for minimum hop trees decreases by at most 14% and 30% respectively, whereas the hop count per source-receiver path for minimum edge trees increases by at most 47%.
4.2 Lifetime per Multicast Tree
The minimum edge multicast trees had a relatively longer lifetime compared to the minimum hop multicast trees.
This could be attributed to (i) the increased number of edges (refer to Section 4.1 for more on this observation) in a minimum hop multicast tree; (ii) the fact that the physical Euclidean distance between the constituent nodes of an edge on a minimum hop path is close to the transmission range of the nodes at the time of tree formation itself, so that the probability of an edge failure is quite high at the time of formation of the tree; and (iii) the fact that the edges of a tree are independent of each other. All three factors play a significant role in the relatively lower lifetime per minimum hop multicast tree. For both multicast algorithms, for a fixed network density, as the number of receivers per multicast group is increased, the lifetime per multicast tree decreases moderately at low node mobility and decreases drastically (to as little as one-half to one-third of the value at a smaller number of receivers per group) in moderate and high node mobility scenarios. This could be attributed to the difficulty in finding a tree that would keep the source node connected to the receivers of the multicast group for a longer time as the node mobility and/or the number of receivers per multicast group increases. For a given number of receivers per multicast group and node mobility, the lifetime per minimum hop tree and per minimum edge tree slightly decreases as we double the network density. The decrease is more pronounced for minimum hop trees, and this could be attributed to the relatively unstable minimum hop paths in high density networks.
Fig. 6.1. vmax = 5 m/s
Fig. 6.2. vmax = 25 m/s
Fig. 6.3. vmax = 50 m/s
Fig. 6. Lifetime of Minimum Hop Tree vs. Minimum Edge Tree
For a given level of node mobility, the lifetime per minimum edge tree is larger than the lifetime per minimum hop tree by 23% (low density) to 38% (high density) for a small number of receivers per multicast group, by 61% (low density) to 107% (high density) for a moderate number of receivers, and by 76% (low density) to 160% (high density) for a large number of receivers. For both minimum hop and minimum edge trees, for a given network density and number of receivers per group, as we increase the maximum node velocity to 25 m/s and 50 m/s, the lifetime per tree reduces by 1/3rd to 1/6th of its value at a maximum node velocity of 5 m/s.
5 Conclusions
We have described the algorithms that can be used to obtain benchmarks for the minimum hop count per source-receiver path and the minimum number of edges per tree for multicast routing in mobile ad hoc networks. Simulations have been conducted to obtain such benchmarks for different conditions of network density, node mobility and number of receivers per multicast group. The minimum hop based multicast trees have a larger number of edges than the theoretical minimum; the minimum hop trees are unstable and their lifetime decreases with an increase in the number of edges. This could be attributed to the instantaneous decision taken by the minimum hop path algorithm to select a tree without any consideration for the number of edges and their lifetime. The minimum edge trees have a relatively larger hop count per source-receiver path, and the hop count per path increases with the number of receivers per multicast group. The relatively fewer edges in the minimum edge tree result in a relatively larger lifetime compared to the minimum hop trees, as the edges in each of these trees are independent of each other. The simulation results thus indicate a complex tradeoff between the hop count per source-receiver path and the number of edges per tree vis-à-vis their impact on the lifetime per tree for multicast routing.
References
1. Siva Ram Murthy, C., Manoj, B.S.: Ad hoc Wireless Networks – Architectures and Protocols. Prentice Hall, USA (2004)
2. Toh, C.K., Guichal, G., Bunchua, S.: ABAM: On-demand Associativity-based Multicast Routing for Ad hoc Mobile Networks. In: The 52nd VTS Fall Vehicular Technology Conference, vol. 3, pp. 987–993. IEEE, Sydney (2000)
3. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press/McGraw Hill, New York, USA (2001)
4. Kou, L., Markowsky, G., Berman, L.: A Fast Algorithm for Steiner Trees. Acta Informatica 15, 141–145 (1981)
5. Meghanathan, N.: On the Stability of Paths, Steiner Trees and Connected Dominating Sets in Mobile Ad hoc Networks. Ad hoc Networks 6(5), 744–769 (2008)
6. Fall, K., Varadhan, K.: The ns Manual, The VINT Project, A Collaboration between Researchers at UC Berkeley, LBL, USC/ISI and Xerox PARC (2001)
7. Farago, A., Syrotiuk, V.R.: MERIT – A Scalable Approach for Protocol Assessment. Mobile Networks and Applications 8(5), 567–577 (2003)
8. Bettstetter, C., Hartenstein, H., Perez-Costa, X.: Stochastic Properties of the Random-Way Point Mobility Model. Wireless Networks 10(5), 555–567 (2004)
9. Vasiliou, A., Economides, A.A.: Evaluation of Multicast Algorithms in MANETs. In: The 3rd International Conference on Telecommunications and Electronic Commerce, vol. 5, pp. 94–97. WSEAS, Stevens Point (2005)
10. Biradar, R., Manvi, S., Reddy, M.: Mesh Based Multicast Routing in MANETs: Stable Link Based Approach. International Journal of Computer and Electrical Engineering 2(2), 371–380 (2010)
11. Menchaca-Mendez, R., Vaishampayan, R., Garcia-Luna-Aceves, J.J., Obraczka, K.: DPUMA: A Highly Efficient Multicast Routing for Mobile Ad Hoc Networks. In: Syrotiuk, V.R., Chávez, E. (eds.) ADHOC-NOW 2005. LNCS, vol. 3738, pp. 178–191. Springer, Heidelberg (2005)
12. Baburaj, E., Vasudevan, V.: Exploring Optimized Route Selection Strategy in Tree- and Mesh-Based Multicast Routing in MANETs. International Journal of Computer Applications in Technology 35(2-4), 174–182 (2009)
13. Meghanathan, N., Vavilala, S.: Impact of Route Selection Metrics on the Performance of On-Demand Mesh-based Multicast Ad hoc Routing. Computer and Information Science 3(2), 3–18 (2010)
14. Meghanathan, N.: Multicast Extensions to the Location Prediction Based Routing Protocol for Mobile Ad Hoc Networks. In: Liu, B., Bestavros, A., Du, D.-Z., Wang, J. (eds.) WASA 2009. LNCS, vol. 5682, pp. 190–199. Springer, Heidelberg (2009)
15. Manvi, S.S., Kakkasageri, M.S.: Multicast Routing in Mobile Ad Hoc Networks by using a Multiagent System. Information Sciences 178(6), 1611–1628 (2008)
16. Kamboj, P., Sharma, A.K.: Scalable and Robust Location Aware Multicast Algorithm (SRLAMA) for MANET. International Journal of Distributed and Parallel Systems 1(2), 10–24 (2010)
Implementing Recovery in Low-Resource Stationary Wireless Sensor Networks
Bai Li, Lynn Margaret Batten, and Robin Doss
Deakin University, Melbourne, Australia
{baili,lmbatten,rchell}@deakin.edu.au
Abstract. Implementation of certain types of protocols on wireless sensor networks (WSNs) can be difficult due to the architecture of the motes. Identification of these issues and recommendations for workarounds can be very useful for the WSN research community testing hardware. In recent work, the authors developed and implemented clustering, reprogramming and authentication protocols involved in recovering WSNs with low resources. In attempting to integrate these protocols, several issues arose in the implementation phase connected to pre-set configurations of the motes used. In this paper, we describe the issues arising in integrating our protocols using ZigBee with IEEE 802.15.4 and the reprogramming module Deluge, and compare our challenges and solutions with those faced by other researchers in this area. Finally, we provide recommendations for researchers who may in the future consider integrating several protocols on these platforms.
Keywords: Wireless sensor network, recovery protocol, implementation.
1 Introduction
Wireless sensor networks (WSNs) are deployed in many mission-critical applications and for monitoring of critical areas of national defence. With their development, various attacks have appeared, usually with the aim of taking over nodes in the network, destroying nodes or disrupting data flow. Since the base station can easily detect if only a few nodes are sending anomalous data, attacks are usually based on compromise of a large part of the network. Detection of, response to and recovery from such attacks have become major challenges in protecting sensor networks. Numerous papers have been written on aspects of clustering, reprogramming and authentication (e.g., [4], [9]) in WSNs, most of which merely simulate a WSN environment. Our focus in this paper is implementation of recovery in an actual WSN system, bringing together an integrated protocol involving reprogramming, re-clustering and authentication. The authors have previous papers in this area ([5], [6], [7]). We assume that the network is low resourced but has the capability to detect an attack on the nodes and determine which nodes have been compromised (with high probability). We then introduce protocols which assist the Base Station (BS) in securely reprogramming compromised nodes and assist the WSN to securely self-organize (cluster and re-cluster).
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 11–20, 2011. © Springer-Verlag Berlin Heidelberg 2011
In this article, we discuss the problems encountered in implementing node recovery across two TinyOS platforms, TelosB and MicaZ, using the standards-based protocols ZigBee and IEEE 802.15.4 along with the off-the-shelf software Deluge [13] for reprogramming. In developing TinyOS, the driver set was not designed to be compatible with the IEEE 802.15.4 standard and, since Deluge reprogramming was also designed based on this driver set, Deluge is not compatible with IEEE 802.15.4. We also encountered problems with clock skewing and collision resistance. We compare the challenges we faced, and our solutions to them, with those faced by several other authors encountering implementation difficulties. Finally, we offer recommendations for those implementing recovery on low-resource WSN platforms.
The rest of the paper is outlined as follows. In Section 2, we discuss related work. In Section 3, we summarize our reprogramming, re-clustering and authentication protocols. Section 4 discusses the implementation issues in integrating these protocols. In Section 5, we compare our implementation issues with those of other researchers, and develop recommendations based on this comparison in Section 6.
2 Background and Related Work
Clustering of WSNs was developed with the aim of introducing a hierarchical structure that improves network management. Clusters are subsets of nodes in the WSN managed by a member of the subset referred to as an 'aggregator'. Wu [15] proposed the enhanced multi-point relay (EMPR) algorithm to efficiently partition a WSN with a flat topology into a hierarchical network by identifying a set of multipoint relays that can be shown to be fully connected. This method is amenable to both stationary and mobile WSN environments and does not require that nodes be able to communicate directly with the base station. We adopted EMPR as the basis of our clustering algorithm and adjusted it to permit control of cluster size and choice of aggregators. A detailed description of our re-clustering approach can be found in [5].
In developing our approach to reprogramming for recovery, we considered the best current work. Current methods for over-the-air reprogramming include XNP from Crossbow [13], Deluge [13], MOAP [12] and incremental programming [1]. Each of these methods is designed for either multi-hop or single-hop reprogramming of all nodes in a deployment. XNP transmits the program code over a single hop. Deluge, however, makes use of epidemic dissemination over multiple hops to reprogram all nodes in the network, and MOAP also supports multi-hop programming. Incremental programming targets components of the code in the node that need to be changed, and then reprograms using only former or updated versions of these components. None of these methods enables the reprogramming of a specific node and hence none is directly suited for node recovery purposes. As the basis for our design of a reprogramming protocol for individual nodes, we chose Deluge because it aims to increase the transmission throughput by using optimization techniques such as adjusting the packet transmission rate and spatial multiplexing.
We modified the Deluge protocol to selectively reprogram nodes that have been identified as compromised. Detailed descriptions of the redesigned Deluge protocol as evaluated through field experiments using our test bed can be found in [5].
In a recovery operation, it is critical that messages from the base station requesting re-clustering or reprogramming be authenticated as, otherwise, a denial of service attack can be launched. Our focus is on ensuring that authentication is effective and efficient, with the goal of extending the life of the WSN for as long as possible. A number of authentication methods have been proposed for WSNs, such as one-way hash chains (OHC) to authenticate broadcast messages. To tolerate packet loss, multilevel OHCs were introduced in which a higher-level OHC is used to bootstrap low-level OHCs. These methods require time synchronization, which is not practical in a low-resource WSN. In using OHCs and other general cryptographic authentication methods in low-resource WSNs, problems unique to unicast messages are encountered, for example, generating and storing keys in a highly resource-constrained node. TinySec, developed by Karlof, Sastry and Wagner in 2004 [2], tackles these problems by providing access control, message authentication, data confidentiality and avoidance of replay attacks. However, TinySec does not specify any key pre-distribution method but assigns a global key to the system, always stored in the same location. Thus, in a WSN deployment, an attacker need only target a well-known address or area of memory, and for this reason we did not employ TinySec. In all schemes using authentication, some kind of key is needed and the question of key storage is critical to maintaining a WSN. In virtually all known schemes for WSNs, keys are stored on nodes and thus require a tamper-resistant location or an assumption that an attacker is unable to retrieve the keys.
We now turn to three recent papers which have implemented protocols on low-resource motes. The paper [8] deploys 27 Crossbow Mica2 motes in a coal mine to detect collapsed areas and to maintain system integrity, especially in case motes are moved or destroyed in a collapse.
In [14], the authors implement data delivery mechanisms on MicaZ motes to determine performance limitations in obtaining 'accurate modeling of wireless signal propagation behavior … and erroneous device behavior'. Finally, the paper [3] discusses the use of ZigBee together with the IEEE 802.15.4 standard, and points out that they do not provide the time synchronization needed to support clusters. The authors propose a work-around for this and test it on MicaZ motes. We give more details of all three papers in Section 5.
3 Summary of Component Protocols
In [5], [6] and [7], we presented protocols for reprogramming, re-clustering and authentication as part of the recovery process in a WSN. We assume that the WSN is structured hierarchically with component roles as follows:
Member node – senses and transmits data
Aggregator node (AG) – controls member nodes and senses, collects, aggregates, analyses and transmits data
Base station – controls the system and collects, analyses, transmits and stores data
As explained in detail in the papers mentioned, to optimize the recovery process, we propose to retain connectivity and maximize flexibility in the network by allowing each node to play the role of either a member or aggregator node as appropriate under
the conditions arising, and at the same time, for the entire network to efficiently reorganize itself in order to remain connected. Authentication is implemented both in re-clustering and reprogramming in order to avoid several typical attacks. Aggregator selection is completed in an energy-efficient manner based on a node metric. For the estimation of the node selection metric, each node v broadcasts a HELLO message at a random time instant within a fixed time interval. Each message carries the ID of the transmitting node v, the IDs of all currently known 1-hop neighbors of the transmitting node, and the metric values M that characterize the capabilities of the node v and its neighbors to act as an AG. The metric M(v) of node v is defined as
$$ M(v) = a\,\frac{e(v)}{e_{\max}} + (1 - a)\,\frac{d(v)}{d_{\max}}, \qquad a \in [0, 1], $$
where e is the available battery energy of the node and emax its maximum value, d is the node degree that should not exceed a pre-defined value dmax, and a is a pre-defined weighting parameter.
Re-clustering
A node v decides to act as an AG if it has never been compromised, and
1. it has a larger metric M(v) than all its 1-hop neighbors and has at least two unconnected neighbors, or if
2. it is in the coverage set formed by its neighbor with the largest metric M.
Reprogramming
Deluge is a reliable data dissemination protocol for large objects, such as program binaries. Together with a boot-loader, Deluge provides a way to reprogram sensor motes in a network. Since Deluge only supports network-wide reprogramming, we modified the dissemination engine of the protocol to address sensor nodes individually. This was done by replacing the AM_BROADCAST_ADDR parameter in the engine with the node-id of the node to be recovered. For reprogramming, we assume that each node in the WSN is within range of the BS, while the converse is not necessarily the case. This is a much stronger assumption than that needed for re-clustering.
Authentication
In each case, a node with IDn contains a secret Sn, known only to itself and the BS. R represents a random value (but is in fact the local time obtained from the LocalTime.get() command in TinyOS), and M represents a message to reprogram or re-cluster, or a request that another node be reprogrammed. H represents a hash function, and we assume that n, M and R are of the appropriate size for input to H. All message requests to reprogram or to re-cluster, whether between nodes or between the BS and nodes, include the node IDs of both sender and receiver and a hash of data that includes the secret known only to the sender and the BS and the current value of R. This ensures the authenticity of the content and its sender and prevents an attacker from resending legitimate messages at a later time.
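The node metric M(v) and the first aggregator rule from the Re-clustering paragraph can be illustrated with the short Python sketch below. The function and variable names are hypothetical, the protocol itself runs on TinyOS motes rather than in Python, and the second rule is omitted because the coverage set is not defined in this summary.

```python
from itertools import combinations


def node_metric(energy, e_max, degree, d_max, a):
    """M(v) = a * e(v)/e_max + (1 - a) * d(v)/d_max, with a in [0, 1]."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("weighting parameter a must lie in [0, 1]")
    return a * energy / e_max + (1 - a) * degree / d_max


def satisfies_rule_1(v, metric, neighbors, compromised):
    """Rule 1: never compromised, strictly largest metric among 1-hop
    neighbours, and at least two neighbours not connected to each other."""
    if v in compromised:
        return False
    nbrs = neighbors[v]
    if any(metric[u] >= metric[v] for u in nbrs):
        return False
    return any(y not in neighbors[x] for x, y in combinations(sorted(nbrs), 2))


# Hypothetical three-node example: node 1 sits between nodes 2 and 3.
neighbors = {1: {2, 3}, 2: {1}, 3: {1}}
metric = {1: 0.9, 2: 0.5, 3: 0.4}
print(node_metric(energy=50, e_max=100, degree=4, d_max=8, a=0.5))
print(satisfies_rule_1(1, metric, neighbors, compromised=set()))
```

With a = 0.5 the metric weights remaining energy and node degree equally; values of a closer to 1 favour nodes with more remaining energy.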
For comparison purposes, we chose two hash functions in implementing the authentication protocol. SHA-1 was chosen as it is popular and widely used in many
government standards. Our second choice was the Rabin encryption system [10] adapted for use as a hash function as proposed by Shamir in [11]. While Shamir suggests an adaptation of Rabin’s scheme to what he calls SQUASH, our implementation is constrained by TinyOS requirements and so we use smaller values than those proposed to ensure the security of SQUASH. Such smaller values do not warrant use of the improved SQUASH computations, and so we use Rabin’s scheme as proposed in [10]. The details can be found in [7]. We make the assumption that keys are stored in a tamper resistant location and that an attacker is unable to retrieve them.
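The following Python sketch illustrates the general shape of such an authentication tag. It is an illustration under several assumptions and not the protocol of [7]: the field order, the "|" separator, the byte encodings, the function names and the truncation of the Rabin-style value to a fixed number of bits are all hypothetical choices made here. The Rabin-style function simply squares its input modulo n, following the form of Rabin's encryption, and the tiny modulus used in the example is for readability only.

```python
import hashlib
import hmac


def sha1_tag(sender_id, receiver_id, secret, r, message):
    """SHA-1 over the IDs, the shared secret S_n, the value R and the message M."""
    data = b"|".join([
        str(sender_id).encode(),
        str(receiver_id).encode(),
        secret,
        str(r).encode(),
        message,
    ])
    return hashlib.sha1(data).digest()


def verify_sha1_tag(tag, sender_id, receiver_id, secret, r, message):
    """Recompute the tag with the shared secret and compare in constant time."""
    expected = sha1_tag(sender_id, receiver_id, secret, r, message)
    return hmac.compare_digest(tag, expected)


def rabin_style_hash(data, n, out_bits=32):
    """Square the input modulo n and keep the low out_bits bits (assumed truncation)."""
    x = int.from_bytes(data, "big")
    return pow(x, 2, n) & ((1 << out_bits) - 1)


# Hypothetical values: node 7 asks the base station (ID 0) to re-cluster.
secret = b"node-7-secret"
tag = sha1_tag(7, 0, secret, r=123456, message=b"RECLUSTER")
print(verify_sha1_tag(tag, 7, 0, secret, r=123456, message=b"RECLUSTER"))
print(rabin_style_hash(b"\x0a", n=77))
```

Because R changes with the local time, a tag captured by an attacker does not verify against a later value of R, which is the replay protection described above.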
4 Implementation Issues with the Integrated Recovery Protocol MPRH (Multi-Point Relaying with Hash)
In this section, we describe implementation issues with our integrated recovery protocol and the technical issues involved with the integration. This discussion should assist other implementers of WSNs in some of their decisions. We chose MicaZ and TelosB sensor nodes because both are supported by the open-source TinyOS [13] and are compliant with IEEE 802.15.4, a standard used in implementing security protocols on hardware motes; MicaZ is also ZigBee compliant. IEEE 802.15.4 defines the physical and MAC layers, whereas ZigBee defines the network and application layers. Long battery life, low cost, small footprint and mesh networking in support of communication between large numbers of devices have driven the specifications for 802.15.4 and ZigBee.
TelosB has an 8 MHz TI MSP430 microcontroller with 16 KB of instruction memory. The CPU consumes 1.8 mA (at 3 volts) when active, and 5.1 μA when sleeping. The radio operates in the globally compatible 2400 MHz to 2483 MHz ISM band, delivering up to 250 Kbps of bandwidth on a single shared channel with a range of up to a few dozen meters. The RFM radio consumes 4.8 mA (at 3 volts) in receive mode, up to 12 mA in transmit mode, and 5 μA in sleep mode.
The MicaZ has a 4 MHz 8-bit Atmel ATMEGA103 CPU which consumes 5.5 mA (at 3 volts) when active, and about 0.06 mA when sleeping. The 916 MHz radio is a low-power radio from RFM, delivering up to 40 Kbps of bandwidth on a single shared channel with a range of up to a few dozen meters. The RFM radio consumes 4.8 mA (at 3 volts) in receive mode, up to 12 mA in transmit mode, and 5 μA in sleep mode. The main parameters of both motes are given in Table 1. MicaZ only supports 32-bit computation, while the 16-bit processor of TelosB supports 64-bit computation.
Table 1. Hardware performance comparison of TelosB and MicaZ motes
Platform   Processor bits   Integer computation bit support   RAM    ROM
TelosB     16 bits          64 bits                           10KB   48KB
MicaZ      8 bits           32 bits                           4KB    128KB
Re-clustering
We found that the hardware of our two platforms had an impact on program size after compilation. For the MicaZ platform, the re-clustering algorithm requires about 126KB of RAM, making it impossible to run due to the 4KB RAM limitation. However, TelosB consumes only 9.87KB after successful compilation of re-clustering and so can accommodate the re-clustering algorithm within the 10KB of RAM available.
Reprogramming
As mentioned in Section 2, we needed to modify Deluge to reprogram individual sensors. This was done easily by replacing the AM_BROADCAST_ADDR parameter in the engine with the node-id of the node to be recovered. This allowed us to implement the reprogramming protocol effectively.
Authentication
In order to transmit an authentication message using the IEEE 802.15.4 MAC protocol, 64-bit data must be fragmented on the sender side (the base station) and re-assembled on the receiver side (at the node) because a payload unit is only 8 bits. Use of either 64-bit SHA-1 code or 64-bit Rabin code consistently resulted in error responses. We have discussed this problem with developers from the TinyOS community and believe that it could be caused by a bug in the GCC compiler or by some unidentified hardware constraints. The problem disappeared on both platforms when we reduced our transmitted data size to 32-bit Rabin authentication.
Integration
We encountered several problems during the integration phase. Firstly, Deluge reprogramming methods use the original CC2420 driver from the official TinyOS distribution, while our re-clustering algorithm was designed based on a MAC protocol which uses a modified CC2420 chip driver in order to conform to the IEEE 802.15.4 standard. In developing TinyOS, the driver set was not designed to be compatible with the IEEE 802.15.4 standard and, since Deluge reprogramming was also designed based on this driver set, Deluge is not compatible with IEEE 802.15.4.
In addition, Deluge requires large volumes of RAM and ROM on both our platforms: at least 1.13KB of RAM and 31.95KB of ROM on MicaZ, and 1.31KB of RAM and 37.40KB of ROM on TelosB. There are several possible solutions to these problems, including development of a new CC2420 chip driver set based on the IEEE 802.15.4 protocol in the first instance, and revising Deluge to significantly reduce the memory requirements in the second instance. Both of these solutions are outside the scope of this paper and will form the basis of our future work in this area. We also mention here two issues arising from the hardware that had an impact on our protocols.
Clock skewing
In implementing the re-clustering algorithm, we encountered time latency due to the drift in the internal mote clocks because of limited processing capacity on both
platforms, and also due to network traffic collisions. We were unable to deal directly with the latencies caused by limited processing capacity, but mitigated the problem by allowing larger time intervals than might otherwise be used when dealing with traffic collisions. This process is described under the heading 'Collision resistance' below.

Collision resistance. Traffic collisions occur when a mote attempts to receive more than one message simultaneously. The Medium Access Control (MAC) protocol supported by TinyOS on both platforms includes the standard CSMA/CA (Carrier Sense Multiple Access/Collision Avoidance) to manage the network traffic and so reduce the packet loss rate as much as possible. CSMA/CA is the channel access mechanism used by most wireless LANs; it specifies how a node uses the medium: when to listen, when to transmit, and so on. In addition to employing the MAC with CSMA/CA, we added a method similar to the node beaconing used in [3]. To improve collision avoidance and allow for clock skew caused by the hardware, we added a small random variation to the time interval to prevent multiple nodes from broadcasting messages simultaneously.

Table 2 summarizes the capacity of the two platforms to deal with the protocol components.

Table 2. Capabilities of the two platforms (“Yes” means the algorithm is fully supported by a platform; “No” means the algorithm is not supported by the platform)

Algorithms | TelosB | MicaZ | Reasons for “No”
Deluge Reprogramming | Yes | Yes |
Authentication protocols | Yes | Yes |
Standalone 8bit SHA-1 | Yes | Yes |
Standalone 8bit Rabin | Yes | Yes |
Standalone 16bit SHA-1 | Yes | Yes |
Standalone 16bit Rabin | Yes | Yes |
Standalone 32bit SHA-1 | Yes | Yes |
Standalone 32bit Rabin | Yes | Yes |
Standalone 64bit SHA-1 | Yes | No | Limited computing capacity
Standalone 64bit Rabin | Yes | No | Limited computing capacity
Re-clustering | Yes | No | Limited RAM size
Deluge Reprogramming + SHA-1 authentication | Yes | Yes |
Deluge Reprogramming + Rabin authentication | Yes | Yes |
Re-clustering + Rabin authentication | Yes | No | Limited RAM size
Deluge Reprogramming + re-clustering | No | No | Different sets of CC2420 chip driver
Deluge Reprogramming + authentication + re-clustering | No | No | Limited RAM size and different sets of CC2420 chip driver
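The random variation of the broadcast interval described under collision resistance can be sketched as follows. This is an illustrative Python model rather than the mote code; the function name `next_broadcast_delay`, the uniform distribution and the use of seconds as the unit are assumptions.

```python
import random


def next_broadcast_delay(base_interval: float, max_jitter: float,
                         rng: random.Random) -> float:
    """Return the base interval plus a random offset drawn from [0, max_jitter].

    A uniform offset is an assumption of this sketch; the paper only states
    that a small random variation was applied to the time interval.
    """
    if base_interval <= 0 or max_jitter < 0:
        raise ValueError("base_interval must be positive and max_jitter non-negative")
    return base_interval + rng.uniform(0.0, max_jitter)


# Each node uses its own generator (seeded here by a hypothetical node id),
# so nodes that start together are unlikely to keep broadcasting together.
node_rngs = {node_id: random.Random(node_id) for node_id in (1, 2, 3)}
delays = {node_id: next_broadcast_delay(2.0, 0.5, rng)
          for node_id, rng in node_rngs.items()}
```

A larger `max_jitter` spreads broadcasts further apart and leaves more slack for clock skew, at the cost of a longer average interval.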
5 Comparison with Implementation Issues of Other Researchers

The paper [8] deploys 27 Crossbow Mica2 motes in a coal mine to detect collapsed areas and to maintain system integrity, especially in case motes are moved
or destroyed in a collapse. Thus they need to allow for frequent re-verification of neighbor nodes. As the neighbor table method we use is extremely expensive in such a situation, the authors replace it with a beacon method whereby nodes regularly report their existence to the network. In addition, since a collapse initiates a flurry of correspondence within the network, the authors must deal with a serious message collision problem; in fact, flooding through the network can result. In order to reduce collisions, they employ two methodologies. First, randomized forward latency is used to hold back messages from the less important areas of the WSN after a collapse. Second, data aggregation is applied: motes at the edge of a collapse collect messages from the area and aggregate their data into a single report message to the base station. Data aggregation is a major component of our own methodology as the AGs of clusters perform this regularly.

In [14], the authors implement data delivery mechanisms on MicaZ motes to determine performance limitations in obtaining ‘accurate modeling of wireless signal propagation behavior … and erroneous device behavior’. The authors point out that changes in the environment can affect WSN performance and so focus on fault-tolerant routing. They use acknowledgment packets when finding traffic routes, despite the extra load on the network, arguing that the gain in reliability is worth it. However, their protocols limit acknowledgments to motes close to the destination of the transmitted packet. They also implement wait-time periods during which a mote listens for a response, but after which responses are ignored. Thus, the number of acknowledgments is kept relatively small. In evaluating their protocols, the authors measure the time required for a packet to traverse a WSN. However, this measurement is affected by clock skew (drift in the internal mote clocks). They compensate for this with linear regression methods.
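As an illustration of how linear regression can compensate for clock skew, the sketch below fits a straight line to pairs of timestamps taken on two clocks. It is a generic least-squares example and not necessarily the procedure used in [14]; the function names `fit_clock` and `to_local`, and the sample timestamps, are assumptions.

```python
def fit_clock(local_times: list[float], remote_times: list[float]) -> tuple[float, float]:
    """Least-squares fit of remote = slope * local + intercept.

    The slope estimates the relative clock rate (skew) and the intercept
    estimates the offset between the two clocks.
    """
    n = len(local_times)
    if n != len(remote_times) or n < 2:
        raise ValueError("need at least two paired timestamps")
    mean_x = sum(local_times) / n
    mean_y = sum(remote_times) / n
    sxx = sum((x - mean_x) ** 2 for x in local_times)
    if sxx == 0:
        raise ValueError("local timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(local_times, remote_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_local(remote_time: float, slope: float, intercept: float) -> float:
    """Map a timestamp from the remote clock onto the local time base."""
    return (remote_time - intercept) / slope


# Hypothetical samples: the remote clock runs 0.1% fast and is offset by 5 units.
local = [0.0, 10.0, 20.0, 30.0]
remote = [5.0, 15.01, 25.02, 35.03]
slope, intercept = fit_clock(local, remote)
```

Once the line is fitted, a packet's send time stamped on the remote clock can be converted with `to_local` before it is subtracted from the local receive time.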
In our work, acknowledgment of packets for the purposes of authentication is necessary. Thus, where authentication is critical, we need all acknowledgments. We dealt with clock skew in our protocols by using time lags in order to allow for receipt of packets.

Finally, the paper [3] discusses the use of Zigbee together with the IEEE 802.15.4 standard, and points out that they do not support time synchronization for cluster topologies. The authors propose a work-around for this, based on time division beaconing, and test it on MicaZ motes supported by TinyOS. The protocol allocates a packet sending timing schedule to each AG and its cluster, in order to avoid collisions between packets sent within the cluster. A ‘super-frame’ schedule is also allocated across the WSN. Their protocol aims to align itself with the IEEE 802.15.4 protocol specification which, they point out, does not support a beacon-only approach. Thus, their work heads towards filling a gap in that standard. The authors of [3] identify clock skew as an issue, as did the authors of [14]. The time division beacon frame scheduling mechanism in a Zigbee cluster-tree network suggested in [3] is used for a fixed cluster topology and is not compatible with our need to re-cluster a network periodically, since the scheduling of traffic relies on a fixed set of parameters in a duty cycle. In order to use this method, the duty cycle parameters would have to be changed to match the new network structure at each re-clustering. This is infeasible in our low-resourced setting.

In summary, the main issues raised in the above papers are clock skewing and collision resistance. In addition, in our work, we cite the lack of support in the motes
themselves for proper integration with standards defined by MACs, Zigbee and IEEE 802.15.4, thus making integration across some platforms difficult, if not impossible.
6 Conclusions and Recommendations

In this paper, we presented an integrated recovery technique which was implemented on both TelosB and MicaZ motes in a low-resource WSN. We described the issues arising in integrating several separate protocols (re-clustering, reprogramming and authentication) using Zigbee with IEEE 802.15.4 and the reprogramming module Deluge, and compared our challenges and solutions with those faced by other researchers in this area. Based on this analysis and comparison, we now provide recommendations for consideration by researchers who may in future consider integrating several protocols on these platforms.
• The use of data aggregation in order to reduce collisions.
• A reduction in the number of acknowledgment packets where possible.
• Integration of the time division beaconing methods of [3] with the IEEE specifications for routing.
• Recognition of the difficulties in integrating Zigbee and IEEE 802.15.4 across some platforms.
In particular, relevant to the last point, the TinyOS driver set needs to be redesigned so as to incorporate the IEEE 802.15.4 standard.
References
[1] Jeong, J., Culler, D.: Incremental network programming for wireless sensors. In: Proceedings of the First IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks IEEE SECON, pp. 25–33 (2004)
[2] Karlof, C., Sastry, N., Wagner, D.: TinySec: A link layer security architecture for wireless sensor networks. In: SenSys 2004, pp. 162–175. ACM, New York (2004)
[3] Koubaa, A., Cunha, A., Alves, M.: A time division beacon scheduling mechanism for IEEE 802.15.4/Zigbee cluster-tree wireless sensor networks. In: ECRTS 2007 (2007)
[4] Krishnan, R., Starobinski, D.: Efficient Clustering Algorithms for Self-Organizing Wireless Sensor Networks. Journal of AD-Hoc Networks, 36–59 (2006)
[5] Li, B., Doss, R., Batten, L., Schott, W.: Fast Recovery from Node Compromise in Wireless Sensor Networks. In: NTMS 2009, pp. 186–191 (2009)
[6] Li, B., Batten, L., Doss, R.: Lightweight Authentication for Recovery in Wireless Sensor Networks. In: MSN 2009, pp. 465–471 (2009)
[7] Li, B., Batten, L., Doss, R.: Network resilience in low-resource mobile WSNs. In: Proceedings of MobiSec 2010, paper 9047. ICST online publishing (2010)
[8] Li, M., Liu, Y.: Underground structure monitoring with wireless sensor networks. In: IPSN 2007, Cambridge, Mass, pp. 69–78 (2007)
[9] Maia, G., Guidoni, D.L., Aquino, A.L.L., Loureiro, A.A.F.: Improving an Over-the-Air Programming Protocol for Wireless Sensor Networks Based on Small World Concepts. In: Proceedings of the 12th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pp. 261–267. ACM Press, New York (2009)
[10] Rabin, M.: Digitalized Signatures and Public-Key Functions as Intractable as Factorization. MIT Laboratory for Computer Science, 16 pages (January 1979)
[11] Shamir, A.: SQUASH – A new MAC with provable security properties for highly constrained devices such as RFID tags. In: Nyberg, K. (ed.) FSE 2008. LNCS, vol. 5086, pp. 144–157. Springer, Heidelberg (2008)
[12] Stathopoulos, T., Heidemann, J., Estrin, D.: A remote code update mechanism for wireless sensor networks, Technical report, UCLA (2003)
[13] Wang, Q., Zhu, Y., Cheng, L.: Reprogramming wireless sensor networks: Challenges and approaches. IEEE Network Magazine 20(3), 48–55 (2006)
[14] Wasilewski, K., Branch, J., Lisee, M., Szymanski, B.: Self-healing routing: A study in efficiency and resiliency of data delivery in wireless sensor networks. In: Proceedings of Conference on unattended Ground, Sea, and Air Sensor Technologies and Applications, Symposium on Defence & Security, Orlando Florida (2007)
[15] Wu, J.: An enhanced approach to determine a small forward node set based on multipoint relay. In: Proc. IEEE Semi-Annual Vehicular Technology Conference, pp. 2774–2777 (2003)
Trusted Third Party Synchronized Billing Framework Using Multiple Authorizations by Multiple Owners (MAMO)

Supriya Chakraborty1 and Nabendu Chaki2

1 JIS College of Engineering, Kalyani, W.B., India
[email protected]
2 Department of Computer Science & Engineering, University of Calcutta, W.B., India
[email protected]

Abstract. Numerous applications are envisioned with the proliferation of mobile usage in this decade. A third party billing system for subscriptions is one of them. A third party billing system needs to acquire data from both the users’ handsets and the base stations of the service providers. Trustworthiness and synchronization are the characteristics that must be ensured in a third party billing system. A scheme, Multiple Authorizations by Multiple Owners (MAMO), is proposed in this work to address the trustworthiness of the third party billing system. In addition, a billing framework is designed to overcome the synchronization problem during acquisition of data from multiple sources. MAMO is illustrated here for the trustworthiness of the third party billing system, but it could be applied in many other applications, e.g. a collaborative document editing environment, to protect data within a file from alteration by non-authors without the permission of the author. Although the billing framework is illustrated for synchronization of data acquired from two sources, it could be extended to synchronization among multiple sources too.

Keywords: Authorization, Ownership, collaborative, Read only, Add Beginning, Add End.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 21–30, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Third party billing is increasingly popular across the globe in many services, including mobile billing systems. Information such as call duration, as recorded in the base station, is of course the primary input for computing the bill. Another aspect of billing is the availability of the network, especially for post-paid connections. Monthly rental needs to be charged based on the availability of the network. Thus, if the network becomes unavailable, the rental needs to be adjusted. In third party billing, such information is sent as text data to the third party billing section by the base station of the service provider. However, factors like the signal strength as perceived in the mobile devices could be significant, especially when many calls were made during a short period of time to a
particular phone number. Such data is to be collected from the log of the mobile device. The final billing is done after reconciliation of the data from the subscriber’s handset with that from the base station of the service provider. However, there is always a chance that the primary information is distorted, intentionally or unintentionally. This may affect the consistency and trustworthiness of the business process. In order to maintain the credibility of the third party billing system, one needs to ensure that whatever information is sent by the base station or the subscriber is not altered. However, new information needs to be added for bill generation, payment and other housekeeping jobs. This work proposes a mechanism, MAMO, that allows multiple authorizations for different sections of a text document. Only the author of a section has the right to alter it. This ensures that the integrity of the data is maintained as set by the owners at different source points of the mobile network and sent to the third party billing section for reconciliation before billing or rebate. MAMO could also be applied in a collaborative text editing environment.

The rest of this paper is organized as follows. Related work is summarized immediately below. The next section presents our proposed scheme, MAMO, with its different types of authorization, followed by the formulation of the solution. The implementation of MAMO, comprising software requirements and diagrams, is then described. Finally, conclusions and the future scope of the work are presented. The paper ends with the reference section.

A secure, private communication channel is always in demand to share data with others.
A large majority of applications use techniques such as cryptography for secret writing, or watermarking and steganography for covered writing. The performance of loose and high coupling for building a hybrid 3G network was evaluated in [2]. Interestingly, low response time, jitter and end-to-end delay were experienced with loose coupling, which certainly affects billing. Video call billing using an embedded watermarking technique was proposed in [3]: each video sequence is transmitted with an embedded watermark to trace the network noise, which is evaluated at the recipient end. The use of AJAX to speed up the learning of new services in 3G environments has been proposed in [4]. A third party billing system has been proposed in [5], in which context information is incorporated to compute real-time billing. In [6], content-based billing using a least common multiple capacity algorithm in a 3G environment was proposed; the reported performance exceeds that of contemporary approaches. A different aspect, multi-grade services, was analyzed in the IP multimedia subsystem to provide a flexible charging scheme with QoS provisioning in [7]. Watermarking, a means of hiding copyright data within images, is becoming a necessary component of commercial multimedia applications that are subject to illegal use [9]. The Internet is overwhelmed by digital assets such as images and audio. In [9], a fragile watermarking scheme has been proposed that detects and locates even minor tampering applied to an image, with full recovery of the original work. Steganography [10] hides data within an audio or video file. A radically different approach, centered on natively representing text in fully-fledged databases and incorporating all necessary collaboration support, is presented in [1]. The concept and architecture of a pervasive document editing and management
system has been discussed in [11]. Database techniques and real-time updating are used to support collaborative work on multiple devices. COSARR combines a shared word-processor, chat-boxes and private access to internal and external information resources to facilitate collaborative distance writing [12]. However, multiple authorizations in synchronous writing involving multiple users have not been considered in any of [1, 11, 12]. Other important related work includes the classification of text blocks of bibliographic items that compose an understanding thesaurus [13]. This method retains incompletely recognized text and uses it for reference, which improves the efficiency of a digital library. An Internet-based collaborative writing tool, with comparison results, is presented in [14]. Xaxis [15] proposes a framework for web-based collaborative applications in an Air Force context dealing with dynamic mission re-planning. A development platform for collaborative e-commerce is proposed in [16], in which awareness is the key for users to learn all sorts of information about the people and the system in this environment; the paper presents an analysis of the traits of awareness information in a collaborative e-commerce environment. An analysis method based on rules for software trustworthiness is proposed in [17]. It mainly focuses on the trustworthiness of software components and the lifecycle process; trustworthy resource extraction rules, analysis rules and synthesis rules are discussed. Our investigation reveals that there is no proposal in the literature to date to impose multiple authorizations on different segments by multiple owners within a document.
2 MAMO - Multiple Authorizations by Multiple Owners

A trusted third party billing framework is proposed, based on securing the key parameters for billing using a novel, manifold authorization technique called Multiple Authorizations by Multiple Owners (MAMO). Each of the multiple messages is authorized by its author. Data is acquired from the base station as well as from the mobile handset by the third party billing section. Appropriate authorization of the data is set beforehand by both the service provider (base station) and the handset. Such authorization remains enforced unless the author (owner) itself changes it. The message from the base station is considered the primary source for billing. Authorized data reaches the billing section via messages from both the base station and the handset. Associated messages are reconciled at the third party billing section. The different types of authorization in MAMO are defined below:
• Read Only - The segment cannot be modified by any user.
• Add Beginning - Text may be added only at the beginning of the segment and nowhere else by any user. Existing text of the segment cannot be modified.
• Add End - Text may be added only at the end of the segment and nowhere else by any user. Again, existing text of the segment cannot be modified.
• Add without Alter - Text may be added anywhere in the segment, but existing text cannot be modified by any user.
• Add with Alter - Existing text may be modified and new text may be added by the users.
Now the challenge is how to implement these authorizations along with their ownership. These issues are discussed in detail below. The proposed Multiple Authorizations by Multiple Owners (MAMO) methodology is described formally with the help of automata theory. Consider the grammar $G = (V_n, \Sigma, P, S)$, where $V_n$ is a finite, nonempty set whose elements are called variables; $\Sigma$ is the set of alphabet | set of punctuation symbols | set of integers | $\varphi$ (where $\varphi$ denotes Null); $P$ is the set of production rules that impose the authorization set by the owner of the segment; and $S$, an element of $V_n$, is called the start symbol. In this context, the start symbol denotes a section of any granularity (a word, a line, a paragraph, or an entire document) on which an authorization mode is imposed by the owner. The authorization mode is checked before the section is accessed; the user can access the document only if the imposed authorization mode is satisfied. Access privileges follow the authorization mode: for example, if the owner specifies Add Beginning on a fragment, a user can only add text before the fragment and nowhere else. The authorization mode is formulated by a set of production rules set by the owner. The rules for the different authorization modes are specified below.

Read only mode is formulated by the following rule:

$S \rightarrow S$   (i)

Add Beginning mode is formulated by the following rules:

$S \rightarrow S_0 S$   (ii)

$S_0 \rightarrow w S_0 \mid \varphi$   (iii)

where $w$ is any string in $\Sigma^*$ and $\varphi$ denotes Null.

Add End mode is formulated by the following rules:

$S \rightarrow S S_0$   (iv)

$S_0 \rightarrow w S_0 \mid \varphi$   (v)

Add without alter mode is formulated by the following rules:

$S \rightarrow S_1 S_2 S_3 \ldots S_n$   (vi)

$S \rightarrow S_0 S \mid S S_0 \mid S_1 S_0 S_2 \ldots S_n \mid S_1 S_2 S_0 S_3 \ldots S_n \mid \ldots \mid S_1 S_2 S_3 \ldots S_0 S_n$   (vii)

$S_0 \rightarrow w S_0 \mid S_0 w \mid \varphi$   (viii)

Add with alter mode is formulated by the following rules:

$S \rightarrow S_1 S_2 S_3 \ldots S_n$   (ix)

$S_i \rightarrow \varphi$   (x)

where $S_i$ denotes any of $S_1, S_2, \ldots, S_n$,

$S \rightarrow S_0 S \mid S S_0 \mid S_1 S_0 S_2 \ldots S_n \mid S_1 S_2 S_0 S_3 \ldots S_n \mid \ldots \mid S_1 S_2 S_3 \ldots S_0 S_n$   (xi)

$S_0 \rightarrow w S_0 \mid S_0 w \mid \varphi$   (xii)
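To make the effect of the five authorization modes concrete, the following Python sketch checks whether an edited segment could have been derived from the original segment under a given mode. It is a simplified illustration and not the authors' Java implementation: the names `Mode`, `is_subsequence` and `edit_allowed` are hypothetical, and the sub-segments $S_1 \ldots S_n$ are assumed to be single characters.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read only"
    ADD_BEGINNING = "add beginning"
    ADD_END = "add end"
    ADD_WITHOUT_ALTER = "add without alter"
    ADD_WITH_ALTER = "add with alter"


def is_subsequence(original: str, edited: str) -> bool:
    """True if every character of original appears in edited, in the same order."""
    remaining = iter(edited)
    return all(ch in remaining for ch in original)


def edit_allowed(mode: Mode, original: str, edited: str) -> bool:
    """Check an edit against the authorization mode set by the segment owner."""
    if mode is Mode.READ_ONLY:
        return edited == original            # nothing may change
    if mode is Mode.ADD_BEGINNING:
        return edited.endswith(original)     # new text only in front of the segment
    if mode is Mode.ADD_END:
        return edited.startswith(original)   # new text only behind the segment
    if mode is Mode.ADD_WITHOUT_ALTER:
        return is_subsequence(original, edited)  # insertions anywhere, no deletions
    return True                              # ADD_WITH_ALTER: any change is permitted


# Hypothetical call record protected by its owner with Add Beginning.
record = "duration=120s"
ok = edit_allowed(Mode.ADD_BEGINNING, record, "bill#17;" + record)
rejected = edit_allowed(Mode.ADD_BEGINNING, record, record + ";bill#17")
```

In this sketch an empty addition is always accepted, which corresponds to the $\varphi$ alternative of the $S_0$ rules.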
3 Billing Frameworks

The proposed trusted third party billing mechanism is illustrated with a block diagram in figure 1. Data regarding a call must be acquired from the base station and the subscriber's handset by the third party billing section through messages. Before sending, a predefined authorization is imposed on the relevant data within a message by the base station as well as by the subscriber's handset. Messages are acquired and reconciled into one message by the third party billing site.

First, the message is checked to determine which source it has come from. After the type of the received message has been determined, housekeeping information is added in conformance with the predefined authorization as per MAMO. Then the associated message of the same call is searched for. If the associated message is found, both messages are reconciled into one. Otherwise, the received message is put into the log for further processing.

When the associated message arrives at the billing site later than its counterpart, an interrupt is raised to fetch the already received message from the log, and processing then continues as described in the previous paragraph. If the associated message is not received within the subscription period, a policy decision needs to be specified that is transparent to all parties. Human intervention is minimized in this framework. Addition and alteration within the received message and the reconciled message are allowed only as per MAMO. The reconciled message is archived for billing.
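The receive, log and reconcile steps described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the simulated Java system: the `Message` and `BillingSite` classes, the `call_id` field used to associate the two messages of a call, and the way the reconciled text is joined are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    call_id: int   # hypothetical identifier shared by both messages of a call
    source: str    # "base_station" or "handset"
    body: str


class BillingSite:
    def __init__(self) -> None:
        self.log: dict[int, Message] = {}   # messages waiting for their counterpart
        self.archive: dict[int, str] = {}   # reconciled records kept for billing

    def receive(self, msg: Message, housekeeping: str) -> Optional[str]:
        """Stamp the message, then reconcile it or log it until its counterpart arrives."""
        if msg.source == "base_station":
            body = housekeeping + msg.body   # Add Beginning: prepend only
        elif msg.source == "handset":
            body = msg.body + housekeeping   # Add End: append only
        else:
            raise ValueError("unknown message source")
        stamped = Message(msg.call_id, msg.source, body)

        waiting = self.log.get(msg.call_id)
        if waiting is None or waiting.source == msg.source:
            self.log[msg.call_id] = stamped  # counterpart not yet available
            return None

        del self.log[msg.call_id]
        if stamped.source == "base_station":
            base, hand = stamped, waiting
        else:
            base, hand = waiting, stamped
        reconciled = base.body + "\n" + hand.body   # base station data is primary
        self.archive[msg.call_id] = reconciled
        return reconciled


site = BillingSite()
first = site.receive(Message(7, "handset", "signal=-70dBm"), "[hk]")
second = site.receive(Message(7, "base_station", "duration=120s"), "[hk]")
```

In this example the handset message arrives first and is logged; it is reconciled only when the base station message for the same call is received.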
4 Implementation

The following characteristics are envisioned for the implementation of MAMO to make it professionally acceptable.
• Imperceptible - MAMO rules are invisible to users and owners. Thus rules need to be mapped to invisible characters that are not shown in the editor.
• Compatibility - The compatibility of authorization modes is shown in table 1.
• Robustness - Rules should not be distorted by text processing functionalities. If any rule is distorted by text processing functionalities, the editor is unable to open the segment and a semantic message is thrown.
• Protection - Extraneous data are embedded with the rules and then encryption is performed. The encrypted rules are inserted into the document.
[Figure 1 (flowchart): The base station builds text data with call records, sets Add Beginning authorization and sends the authorized data through a message to the billing section. The subscriber handset builds text data with call parameters, sets Read Only and Add End authorization and sends the authorized data through a message to the billing section. The third party billing site acquires the authorized data from both ends, adds housekeeping information at the beginning or at the end of the context depending on whether the message comes from the base station, and checks whether the counterpart record for the call is available: if so, the call details are reconciled and the reconciled call records are archived for billing; if not, the message is put into the LOG for further verification.]

Fig. 1. Block diagram of the billing mechanism
Table 1. Compatibility of Authorization Modes

AUTHORIZATION MODES | READ ONLY | ADD BEGINNING | ADD END | ADD WITHOUT ALTER | ADD WITH ALTER
READ ONLY | - | Yes | Yes | No | No
ADD BEGINNING | Yes | - | Yes | Yes | No
ADD END | Yes | Yes | - | Yes | No
ADD WITHOUT ALTER | No | Yes | Yes | - | No
ADD WITH ALTER | No | Yes | Yes | No | -
The application has been simulated in a Java environment. Rich text format is used as the message format and NetBeans 6.5 is used as the IDE on a Microsoft platform. Different authorizations on multiple messages in a single file are shown in figure 2. Authorization is imposed on a line, a paragraph, or a set of paragraphs. A message contains different types of information in different parts, referred to as sections in the rest of this document. Different access rights may be imposed on different sections by multiple sources. The source may be the log information of the handset or the call details of the base station. For example, call duration, source address, destination address and network availability are acquired from the base station and are all read only in nature. However, data could be added before all this information while maintaining the read only authorization. Data on signal strength, signal-to-noise ratio, etc. are acquired from the handset; these are also read only and are located in another section of the message/file, but data may be added at the end of this information. A small amount of information needs to be added for housekeeping purposes by the third party and will be located in a third section of the message (see figure 2). All the fields here are assumed for the sake of explaining the concept. The exact content of the messages would be determined by the applications and the service agreement.

Exception Handling: A few typical questions on failure scenarios may arise; a couple of these are considered here. What will happen if the handset message arrives before the base station message? What happens if the handset message is lost? A solution to the first problem could be that the handset message is put into the log book for further verification. In the second scenario, a fresh request for the data from the handset may be initiated, depending upon the terms and conditions of the connection. However, as the primary billing parameters are already available, the bill may still be generated without reconciliation with the handset message.

4.1 Result

The experiment is simulated in a Java environment. Mobile calls are generated in random order within a time range that is fixed at ten minutes.
[Figure 2: sample message with sections labeled Add beginning (information added at the beginning), Read Only, Add end (information added at the end of the segment) and Add without alter. Data acquired from the mobile handset are added to the message sent by the base station.]

Fig. 2. Instance of a sample message for reconciliation
A random number is assigned at the beginning of every message. The random number is the same in both the base station message and the handset message for every unique call. That random number is matched in both messages to determine whether they belong to the same phone call. Reconciliation is done on matched messages. The reconciled message is kept in a file for further use. The average file size and memory usage of the reconciled message are shown in table 2.

Table 2. Average Memory Usage and Average Message Size

No. of Messages in unit time | Average File Size in KB | Average Memory Usage in KB
1000 | 8.56 | 289.85
5000 | 51.5 | 283.75
10000 | 105 | 279.6
15000 | 168 | 272.55
20000 | 232 | 215.95
The file size varies because it is assumed that different hardware and software configurations of the handset may keep different and variable-length data. The reconciliation time is shown in figure 3.
Fig. 3. Average reconciliation time in nanoseconds (Y-axis) against the number of messages in thousands (X-axis).
Messages are generated in random order within a ten-minute period. The number of messages is determined by the number of phone calls made during that period. Here, for simplicity, the number of messages refers to the number of messages from either the base station or the subscribers’ handsets; the total number of messages is therefore double the number of calls held during the stipulated time. The reconciliation time of the billing framework grows linearly with the number of phone calls. The total number of phone calls at a time within a cell of the mobile network is constant, and the performance of the simulated billing framework is satisfactory for processing that constant number of phone calls at a time. In a real-life situation, however, input/output costs and network traversal costs are much higher than in our simulation.
5 Conclusions

MAMO ensures the trustworthiness of the billing system, which would improve the reputation of the business process. MAMO is also identified as a useful feature within a collaborative working environment.
References
[1] Hodel-Widmer, T.B., Dittrich, K.R.: Concept and Prototype of a Collaborative Business Process Environment for Document Processing. J. Data & Knowledge Engineering, Special Issue: Collaborative Business Process 52(1), 61–120 (2005)
[2] Chowdhury, S.A., Gregory, M.: Performance Evaluation of Heterogeneous Network for Next Generation Mobile. In: Proc. of 12th International Conference on Computer and Information Technology, pp. 100–104. IEEE press, Dhaka (2009)
[3] Benedetto, F., Giunta, G., Neri, A.: A Bayesian Business Model for Video-Call Billing for End-to-End QoS Provision. IEEE Transactions on Vehicular Technology 58(2), 836–842 (2009)
[4] Hu, J., Gong, C.: The 3G Mobile Learning Platform’s Building Based on AJAX Technology. In: 3rd International Conference on Advanced Computer Theory and Engineering, Chengdu, vol. 2, pp. 257–260 (2010)
[5] Bormann, C.F., Flake, S., Tacken, J., Zoth, C.: Third-party-initiated Context-aware Realtime Charging and Billing on an Open SOA Platform. In: 22nd International Conference on Advanced Information Networking and Applications – Workshops, Okinawa, pp. 1375–1380 (2008)
[6] Huang, H., Liao, J.-x., Zhu, X.-m., Zhang, L.-j., Qi, Q.: A Least Common Multiple Capacity Load-Balancing Algorithm for Content-Based Online Charging System in 3G Networks. In: 3rd International Conference on Communications and Networking, Hangzhou, pp. 548–552 (2008)
[7] Barachi, M.E., Glitho, R., Dssouli, R.: Charging for Multi-Grade Services in the IP Multimedia Subsystem. In: The Second International Conference on Next Generation Mobile Applications, Services, and Technologies, Cardiff, pp. 10–17 (2008)
[8] Hodel, T.B., Businger, D., Dittrich, K.R.: Supporting Collaborative Layouting in Word Processing. In: IEEE International Conference on Cooperative Information Systems, Larnaca, Cyprus. IEEE, Los Alamitos (2004)
[9] Hassan, M.H., Gilani, S.A.M.: A fragile watermarking scheme for color image authentication. Proceedings of World Academy of Science Engineering and Technology 13 (May 2006)
[10] Cox, I., Miller, M., Bloom, J., Fridrich, J., Kalker, T.: Digital Watermarking and Steganography. Morgan Kaufmann, USA (2008)
[11] Leone, S., Hodel, T.B., Gall, H.: Concept and Architecture of an Pervasive Document Editing and Managing system. In: Proceedings of the 23rd Annual International Conference on Design of Communication: Documenting & Designing for Pervasive Information, Coventry, UK, pp. 41–47 (2005)
[12] Jaspers, J.G.M., Erkens, G., Kanselaar, G.: Cosarr: Collaborative Writing of Argumentative Texts. In: Proceedings IEEE International Conference on Advanced Learning Technologies, Madison, WI, pp. 269–272 (2001)
[13] Satoh, S., Atsuhiro, T., Eishi, K.: A Collaborative Supporting Method between Document Processing and Hypertext Construction. In: Proceedings of the Second Int’l Conference on Document Analysis and Recognition, pp. 535–536. IEEE publisher, Tsukuba Science City (1993)
[14] Lowry, P.B., Nunamaker Jr., J.F.: Using Internet-Based, Distributed Collaborative Writing Tools to Improve Coordination and Group Awareness in Writing Teams. IEEE Transactions on Professional Communication 46(4), 277–297 (2003)
[15] Galime, M.P., Milligan, J.R.: Xaxis.: A Framework for Web-based Collaborative Applications. In: International Symposium on Collaborative Technologies and Systems, CTS 2007, pp. 389–395. IEEE press, Orlando (2007)
[16] Qingzhang, C., Yan, J., Peng, W.: A framework of awareness for Collaborative eCommerce. In: 14th International Conference on Computer Supported Collaborative Work in Design, pp. 33–36 (May 2010)
[17] Bao, T., Liu, S., Han, L.: Research on an Analysis Method for Software Trustworthiness Based on Rules. In: Proceedings of 14th International Conference on Computer Supported Cooperative Work in Design, pp. 43–47 (May 2010)
Random Variation in Coverage of Sensor Networks

Thaier Hayajneh and Samer Khasawneh

Department of Computer Engineering, Hashemite University, Zarqa, Jordan
{Thaier,samerkh}@hu.edu.jo
Abstract. Wireless sensor networks have become one of the most widely used forms of ad hoc networks, with a vast number of applications in modern life. Because the communication medium is unreliable and the sensors are failure-prone, coverage is an important functional property of a sensor network. Coverage can be represented as a function of sensor density, which in turn depends on the sensing range and the deployment function. Both the sensing range and the deployment function are random in nature, and this randomness can greatly affect sensor coverage. In this study, we subject a densely deployed sensor network to stochastic variations in sensing range and deployment. We capture deployment variations in the space dimension by a random deployment pattern and random noise, and we also consider time-dependent random variation in noise and environmental factors. More specifically, we study how this randomness leaves a certain percentage of the network area uncovered.

Keywords: Wireless Network, Coverage, Connectivity, Deployment, Node Density.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 31–41, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Recent advances in CMOS technologies and wireless communications have enabled the development of tiny, battery-operated, multifunctional sensor nodes [1]. These devices are characterized by low cost, short radio range, a limited energy supply and limited processing capabilities. Wireless sensor networks (WSNs), which support a very wide range of applications, are composed of a large set of sensors deployed randomly or uniformly in a field and communicating over the air. Typically, sensor nodes can monitor a wide variety of ambient conditions. Upon detecting an event, sensors cooperate to deliver the sensed data to the base station (sink); each sensor thus acts both as a data source and as a router for other sensors' data. Applications in which sensor networks are valuable include surveillance, agriculture, medicine, the military, fire fighting and many more.

Sensor networks are subject to node failure because of the limited power supply of the sensor nodes. Coverage is therefore one of the most important properties of a sensor network: it takes into account the sensing ability of individual sensors as well as the deployment strategy. Sensor coverage is the problem of deploying nodes for the purpose of sensing [2]. To see why coverage matters, consider a sensor network deployed for intrusion detection. Such a network must report every detected intrusion immediately so that suitable action can be taken. If at any time some part of the field is not covered by sensor nodes, undesirable consequences may follow. We must therefore guarantee continuous coverage of the whole field.

Depending on the constraints of a given application, the coverage problem can be studied as worst-case coverage [3], deterministic coverage [3], [4] or stochastic coverage [3], [4], [5], [6], [7]. Apart from sensing, a sensor network also needs to attain the required connectivity in order to relay data to the appropriate nodes. Coverage and connectivity therefore form the two basic density control properties of a sensor network. A region is fully covered if every point in it is covered by at least one sensor. Extending this definition, a region is K-covered if every point in it is covered by at least K sensors. Coverage can also be defined as the percentage of covered area, since some applications do not require 100% coverage.

To satisfy these coverage requirements, the density of the deployed sensors and the deployment strategy must be examined carefully. Apart from the deployment density, other questions are of vital importance, most notably energy efficiency and assured connectivity, which can be attained through protocols and algorithms at the MAC, topology control and routing layers [13][14][18]. Connectivity involves the sensors' transmission range, whereas coverage involves the sensing range. The two properties are related as shown in the following condition [5]:

$R_{transmission} \geq 2 R_{sensing}$ (1)

Thus, in order to maintain network connectivity, each sensor must have a transmission radius that is at least double its sensing radius, as shown in Fig. 1.
Fig. 1. The relation between the transmission range (Rt) and the sensing range (Rs)
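The relation between sensing and transmission range can be checked on a one-dimensional deployment. The following is a minimal sketch, assuming sensors on a line with a common sensing radius `r_s` and transmission radius `r_t`; the function names and the sample layout are illustrative and do not come from the paper.

```python
def is_fully_covered(positions, r_s, length):
    """True if every point of [0, length] is within r_s of some sensor."""
    reach = 0.0  # right end of the covered prefix of the line
    for p in sorted(positions):
        if p - r_s > reach:
            return False  # a gap opens before this sensor's range starts
        reach = max(reach, p + r_s)
    return reach >= length


def is_connected(positions, r_t):
    """True if every pair of adjacent sensors is within transmission range."""
    ordered = sorted(positions)
    return all(b - a <= r_t for a, b in zip(ordered, ordered[1:]))


positions = [20, 60, 100, 140, 180]  # hypothetical layout with spacing 40
r_s = 20
covered = is_fully_covered(positions, r_s, length=200)
# Full coverage forces adjacent sensors to be at most 2 * r_s apart,
# so a transmission radius of 2 * r_s keeps the chain connected.
connected_at_2rs = is_connected(positions, r_t=2 * r_s)
connected_at_rs = is_connected(positions, r_t=r_s)
```

In this layout the line is fully covered, the chain is connected when the transmission radius is twice the sensing radius, and it is disconnected when the two radii are equal.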
Coverage and sensor density also have significant implications for the design of MAC and routing protocols. Typically, in a dense system with a stringent K-coverage requirement, high sensor density may lead to scalability and coordination problems.

The remainder of this paper is organized as follows. The next subsection summarizes the related work. In Section 2, we introduce the concepts of deployment density, sensing radius, sensor footprint and network lifetime. Sections 3 and 4 present the system model and the experimental results that support our model. Finally, Section 5 presents the conclusion and directions for future work.
1.1 Previous Work

Coverage is one of the fundamental problems in sensor networks, and a large body of research has been conducted on it. Sensor networks are energy constrained, so a small duty cycle in their sensing and transmission functions is preferable. The majority of routing protocols focus on maintaining coverage (and thereby connectivity) by conserving node energy during routing and during sensing. For instance, the PEAS protocol proposes an energy conservation mechanism for sensing functions based on sleep cycles [15]. The protocol keeps a set of working sensors active and turns off the unnecessary ones. This assumes a dense network in which turning off some sensors does not affect network performance. The OGDC protocol, on the other hand, geometrically optimizes coverage by minimizing overlap [16].

In [8], the sensor network is partitioned into n non-intersecting cover sets. Depending on the deployment, each cover set contains a random number of sensors. Any sensor in a cover set can perform the sensing task for that set, so some sensors can be deactivated to save energy while others perform the monitoring task. The monitoring task is rotated among the sensors so that their energy is consumed fairly.

A distributed coverage algorithm based on a node scheduling scheme is proposed in [9]. The algorithm uses directional antennas to gather geographical information about the sensor nodes, in addition to angle-of-arrival information. This information is used to determine which sensors should be activated (the working set). GPS-based sensor networks are characterized by high cost and high energy consumption, and high energy consumption may worsen the coverage problem. The work presented in [10] showed that the algorithm based on node scheduling has low efficiency.

Kar and Banerjee studied node placement strategies for attaining connectivity in sensor networks [11]. Their study involved sensing the field of interest with a minimum number of sensor nodes while guaranteeing connectivity between these nodes. To achieve connectivity, the authors used deterministic node placement and assumed a disk model for the sensing area. The work in [12] demonstrates the possibility of improving sensor network coverage by using multi-hop routing. That paper also studied the effect of path length on network performance, and thereby on network coverage, and optimizes coverage subject to a limited path length. Our study of coverage, in contrast, focuses mainly on the effect of random variation, such as uniform and Poisson random node deployment, on the percentage of uncovered area.

Besides extending network lifetime through energy control, some studies include other constraints in order to determine sensor density. For instance, Shakkottai et al. [17] assumed a binomial probability of sensor death in an unreliable environment and analytically derived a critical value of sensor density in a unit sensing disc that ensures coverage. To the best of our knowledge, this work is the closest to ours. We, however, consider how random variations in the sensor network affect the sensor density.

The work proposed in [19] defines coverage as the ability to find a path through the sensor network nodes. It addresses two coverage problems: the best coverage problem and the worst coverage problem. The best coverage problem employs a localized algorithm that tries to find a path minimizing the maximal distance between all nodes and the closest sensor. The worst coverage problem, in turn, employs a localized algorithm that tries to find a path maximizing the minimal distance between all nodes and the closest sensor.

The majority of previous works focus on the influence of node deployment on coverage and connectivity. In our work, we also consider the effect of white noise, and the cumulative effect of other environmental factors, on the sensing range. To the best of our knowledge, none of the previous works has considered the effect of noise and environmental factors on sensor density. More specifically, we study how sensor density responds to stochastic factors in space and time. We also relate the notion of network lifetime to coverage and sensor density.
2 Sensor Coverage

In our study of sensor network coverage, we use the concepts of deployment density, sensing radius, sensor footprint and network lifetime. In this section we briefly describe each of these concepts.

2.1 Deployment Density

In our analysis, we take the deployment density to be the number of sensing elements per unit length of the sensing line. We quantify the deployment density by the parameter A, which is the mean distance between two adjacent sensors along the sensing line. The higher the deployment density, the lower the value of A, and vice versa. The value of A is tied to the deployment density through an appropriate random density function. Deployment can in general be either deterministic or random. In our study, we consider random placement of sensors along the sensing line, with the deployment chosen to follow a uniform or a Poisson distribution. Using the parameter A, the two deployments are written as uniform U(0, 2A) and Poisson P(A).

2.2 Sensing Radius

The sensing radius corresponds to the distance-dependent capacity of a sensor to perform its sensing function. We assume that the sensing radius consists of a deterministic constant and a stochastic variable. The deterministic constant is taken to be the sensing range specified by the sensor manufacturer, which we denote by S. The variable component may depend at any instant on several external factors such as noise and environmental effects. External noise on a sensor is a collection of several factors whose combined effect on the sensing radius is independent and identically distributed. This external white noise is assumed to be Gaussian with zero mean and unit variance, and it may either increase or decrease the sensing radius. The impact is independent for each sensor and for each point in time: the effect of noise on sensor i is independent of the effect on sensor j, and the effect of noise on a sensor at time t1 is independent of its effect at time t2. Under noise alone, the instantaneous radius therefore follows the distribution N(S, 1), that is, it fluctuates around S with a standard deviation of one unit.

2.3 Sensor Footprint

The footprint of a sensor is the area covered by a single sensor relative to the total area covered by the network. The footprint determines the weight of an individual sensor within the network. Sensor networks with a small footprint constitute a densely deployed system. In our study we assume a small sensor footprint, which helps to highlight the impact of noise on coverage.

2.4 Network Lifetime

The lifetime of a network is closely related to the functions that the network performs. Typically, the main functions of a sensor network are coverage and connectivity. While coverage concerns the sensing function, connectivity relates to several other network factors such as routing, energy conservation, topology control and medium access. In our case, we relate network lifetime to coverage, and we assume a strong relation between coverage and connectivity. A network is thus alive and functioning well if it attains the coverage that the application demands. From this perspective, we quantify the coverage-dependent network lifetime by the percentage of uncovered area along the one-dimensional sensing line. K-coverage adds another level of complexity to uncovered area and network lifetime: a certain percentage of the network area needs to attain K-coverage. K-coverage may be a redundancy requirement built into the application.
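The two random ingredients described in this section, spacing and sensing radius, can be sampled as follows. This is a minimal sketch, not the authors' simulator: the function names are hypothetical, and the Poisson deployment is assumed to mean exponentially distributed gaps with mean A.

```python
import random


def sample_spacing(a, deployment, rng):
    """Draw one spacing between adjacent sensors; the mean is a in both cases."""
    if deployment == "uniform":
        return rng.uniform(0.0, 2.0 * a)   # U(0, 2A)
    if deployment == "poisson":
        return rng.expovariate(1.0 / a)    # assumption: exponential gaps, mean A
    raise ValueError(f"unknown deployment: {deployment}")


def sample_radius(s, rng):
    """Instantaneous sensing radius: nominal S plus N(0, 1) white noise."""
    return s + rng.gauss(0.0, 1.0)


rng = random.Random(1)
spacings = [sample_spacing(40.0, "uniform", rng) for _ in range(20000)]
radii = [sample_radius(20.0, rng) for _ in range(20000)]
mean_spacing = sum(spacings) / len(spacings)
mean_radius = sum(radii) / len(radii)
```

With A = 40 and S = 20 the sample means should settle close to 40 and 20 respectively, which is a quick sanity check before building a coverage simulation on top of these draws.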
3 Experimental Design

3.1 Average Percentage Uncovered

Since the percentage of uncovered area depends on several parameters, we need to develop appropriate metrics and assumptions. We assume a small-footprint system with a fixed sensing capacity. The sensing capacity depends on the internal-noise-limited sensing radius specified by the sensor manufacturer. In our experiment, we assume a sensing radius of 20 units and a sensing line 1000 units long. As described earlier, the sensor deployment density is closely related to the sensing radius. For a sensing radius of 20 units, a sensing length of 1000 units and deterministic placement, the sensor spacing must be at most 40 units for full coverage. With stochastic placement, however, we cannot keep the spacing at exactly 40 units between every pair of sensors. For instance, with uniform deployment U(0, 80) the spacing can vary from zero to 80 units. To compensate for spacings of up to 80 units when each sensor covers a constant 40 units (twice the sensing radius), more sensors must be deployed to cover the entire line.
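The numbers in this setup can be worked through directly. The sketch below uses the values stated in the text (radius 20, line length 1000, mean spacing 40); reading the Poisson deployment as exponential gaps with mean A is an assumption, and noise is ignored.

```python
import math

S = 20.0         # sensing radius (units)
LENGTH = 1000.0  # length of the sensing line (units)
A = 40.0         # mean spacing, equal to one sensing diameter

# Deterministic placement: one sensor per sensing diameter.
sensors_needed = math.ceil(LENGTH / (2 * S))

# Probability that a random spacing exceeds the 2 * S that two
# adjacent noise-free sensors can cover between them.
p_gap_uniform = 1.0 - (2 * S) / (2 * A)      # X ~ U(0, 2A)
p_gap_exponential = math.exp(-(2 * S) / A)   # assumed exponential gaps, mean A
```

Under these assumptions, half of the uniformly drawn spacings leave a gap even though the mean spacing matches the deterministic design, which is why random deployment needs extra sensors.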
The number of sensors that must be deployed increases further if we consider the effects of environment and noise in addition to random spacing. The instantaneous sensing radius depends on time and space. Time-varying environmental effects and noise cause the uncovered length between sensors to change with time and space. Thus, at any instant, there is a certain uncovered length between any two adjacent sensors. We are interested in the expected value of the percentage of uncovered area under these three effects.

3.2 System Modelling

We use the distance between two adjacent sensors to model the uncovered area as a function of random deployment (uniform and Poisson), sensing radius, noise and environmental effects. We relate the spacing distance A to the sensing range, which is in turn related to the fixed sensing radius S. If the network is composed of N sensors, then each sensor with fixed sensing radius S has a stochastic sensing range $S_i(t)$ (i = 1, 2, 3, ..., N). The variation in sensing range is due to the superposition of normally distributed noise and environmental effects on the value of S. Thus, at any point in time, the instantaneous sensing range of the i-th sensor is given by

$S_i(t) = S + N(0,1) + N(0,V)$ (2)

Let $X_i$ be the spacing between sensors i-1 and i. Since we assume a uniform or Poisson distribution for sensor deployment with parameter A, we have $X_i = U(0, 2A)$ or $X_i = \mathrm{Exponential}(0, A)$. For this spacing to be covered completely by the two adjacent sensors, the following must hold, where k denotes the required coverage degree K.

For 1-sensor coverage:

$X_i = S_{i-1}(t) + S_i(t)$ (3)

For 2-sensor coverage:

$2 X_i = S_{i-1}(t) + S_i(t)$ (4)

For K-sensor coverage:

$2^{k-1} X_i = S_{i-1}(t) + S_i(t)$ (5)

Thus the uncovered length between sensors i-1 and i is given by

$U_i(t) = 2^{k-1} X_i - \left( S_{i-1}(t) + S_i(t) \right)$ (6)
The spacing between any two sensors depends on A, and $S_i(t)$ depends on S for a fixed value of V (constant environmental effects); hence A and S are dependent and the system can be fully characterized by A. Moreover, we control the environmental and noise effects on our system by controlling the deployment parameter A. In our study, we quantify the network lifetime by the mean value of $U_i(t)$, which relates to the system variable A:

Percentage of uncovered length = $E[U_i(t)]$ / Total length.
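The model of this section can be turned into a small Monte Carlo estimate. The sketch below is one possible reading, not the authors' simulator, and it rests on several assumptions that the text does not state: V is treated as a variance, Poisson deployment means exponential gaps with mean A, negative values of the uncovered length (overlap) count as zero, and the result is normalized by the total scaled spacing so that it stays between 0 and 1. The function name and its parameters are hypothetical.

```python
import random


def uncovered_fraction(a, s=20.0, v=0.0, k=1, deployment="uniform",
                       n=50_000, seed=0):
    """Estimate the uncovered fraction of the sensing line for mean spacing a."""
    rng = random.Random(seed)

    def radius():
        # Nominal radius + white noise N(0, 1) + environmental term N(0, V).
        # V is read as a variance, so the standard deviation is sqrt(V).
        return s + rng.gauss(0.0, 1.0) + rng.gauss(0.0, v ** 0.5)

    uncovered = 0.0
    total = 0.0
    prev = radius()
    for _ in range(n):
        if deployment == "uniform":
            x = rng.uniform(0.0, 2.0 * a)
        else:
            x = rng.expovariate(1.0 / a)    # assumed exponential gaps, mean A
        cur = radius()
        scaled = 2 ** (k - 1) * x           # left-hand side of the K-coverage condition
        u = scaled - (prev + cur)           # uncovered length between the two sensors
        uncovered += max(u, 0.0)            # assumption: overlap counts as zero
        total += scaled
        prev = cur
    return uncovered / total


ua_uniform = uncovered_fraction(40.0, deployment="uniform")
ua_poisson = uncovered_fraction(40.0, deployment="poisson")
```

Sweeping `a`, `k` and `v` with this function gives a rough way to explore how each factor moves the uncovered fraction under the stated assumptions; the absolute values should not be compared with the paper's figures without knowing the exact normalization used there.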
4 Results

In this section we present experimental results that validate the system model proposed in Section 3.2. The simulations analyze the impact of several factors on the uncovered area. Energy efficiency, network throughput and fault tolerance are generally considered the main performance metrics of sensor networks. Dense deployment (i.e. a low deployment factor and high K values) is expected to improve these metrics.

4.1 Studying the Effect of Random Deployment and White Noise

Figure 2 shows the percentage of uncovered area (UA) for various values of the deployment factor A under uniform node distribution. The uncovered length increases as the average deployment factor increases: as the deployment factor grows, the distance between adjacent sensors grows, and so does the percentage of uncovered area. For smaller values of the deployment factor, random excursions of the white noise compensate for variations in deployment. This is because the noise is modelled as N(0,1) and can therefore take negative as well as positive values. Hence, UA drops rapidly (almost linearly) for low values of A. For higher values of A, a given reduction in UA requires a larger decrease in A, and the compensating effect of noise is weaker than for lower values of A. Figure 3 shows the same result for the Poisson distribution. In that case, however, the percentage of UA increases faster with the average deployment factor.
Fig. 2. Percentage of uncovered length for uniform deployment (K=1, N(0,1))
Fig. 3. Percentage of uncovered length for Poisson deployment (K=1, N(0,1))
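The faster growth under Poisson deployment can be made plausible with a noise-free calculation. The following is a sketch under assumptions that are not stated in the paper: K = 1, the noise and environmental terms are ignored, overlap between adjacent sensors counts as zero uncovered length, the Poisson deployment is read as exponential spacing with mean A, and the uncovered fraction is normalized by the mean spacing E[X].

```latex
\[
U = \max(X - 2S,\ 0)
\]
\[
X \sim U(0, 2A),\ A \ge S:\qquad
E[U] = \int_{2S}^{2A} \frac{x - 2S}{2A}\,dx = \frac{(A - S)^2}{A},
\qquad
\frac{E[U]}{E[X]} = \Bigl(1 - \frac{S}{A}\Bigr)^{2}
\]
\[
X \sim \mathrm{Exponential},\ E[X] = A:\qquad
E[U] = \int_{2S}^{\infty} \frac{x - 2S}{A}\, e^{-x/A}\,dx = A\, e^{-2S/A},
\qquad
\frac{E[U]}{E[X]} = e^{-2S/A}
\]
\[
S = 20,\ A = 40:\qquad
\Bigl(1 - \tfrac{1}{2}\Bigr)^{2} = 0.25,
\qquad
e^{-1} \approx 0.368
\]
```

Because $1 - s \le e^{-s}$ for every $s$, the uniform fraction never exceeds the exponential one when $A \ge S$, which agrees with the trend reported for the Poisson case.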
4.2 Studying the Effect of K Coverage

Figure 4 shows the percentage of uncovered area for different values of K. For K=2 and K=4 the percentage of UA is clearly much larger than for K=1, because the K-coverage criterion is more likely to be violated for higher values of K. Thus, to provide K-coverage redundancy for some applications, a much higher sensor deployment density may be needed.
Fig. 4. Percentage of uncovered length for different values of K
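How quickly the K-coverage criterion becomes harder to meet can be seen from a noise-free calculation. The sketch below assumes uniform deployment U(0, 2A) and ignores noise and environmental effects; the function name is hypothetical.

```python
def violation_probability(k, a, s):
    """P(2**(k-1) * X > 2*s) for X ~ U(0, 2a), ignoring noise."""
    # Largest spacing that two adjacent sensors can still K-cover.
    threshold = 2.0 * s / 2 ** (k - 1)
    return 1.0 - min(1.0, threshold / (2.0 * a))


probabilities = {k: violation_probability(k, a=40.0, s=20.0) for k in (1, 2, 4)}
```

With A = 40 and S = 20, the probability that a single spacing violates the criterion rises from 0.5 for K = 1 to 0.75 for K = 2 and 0.9375 for K = 4, in line with the larger uncovered percentages observed for higher K.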
4.3 Studying the Environmental Effects

In contrast to noise, the environmental effects affect all sensors equally. The environmental effect is given by equation (7). Figure 5 shows the relationship between the percentage of UA and the environmental effects: the percentage of UA increases as the environmental effects increase. As discussed in the previous section, noise may make negative excursions from its mean value. A similar effect is unlikely to exist for the environmental effect.
[equation not recoverable from the source] (7)
Fig. 5. Environment effect for different values of UA using uniform distribution (A = 40 and K=1)
Generally, environmental effects contribute more to the stochastic variability than pure noise does, as reflected in the high values of V used for the environmental effects. Figure 6 shows the percentage of uncovered area as a function of the deployment factor in the presence of environmental effects (V=20). In this case, the percentage of UA increases faster with the deployment factor than in the previous cases. Figure 6 combines random uniform deployment, white noise and environmental effects.
Fig. 6. Percentage of uncovered area for different deployment factor using random distribution (V=20, K=1)
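The role of V can be illustrated with a short calculation. This is a sketch under assumptions that the paper does not make explicit: V is read as a variance, the noise and environmental terms of the two adjacent sensors are taken as independent, K = 1, and the spacing is fixed at its deterministic full-coverage value x = 2S. If the environmental term were instead common to both sensors, the variance below would become 2 + 4V and the growth with V would be stronger.

```latex
\[
Y = S_{i-1}(t) + S_i(t) \sim N\bigl(2S,\ \sigma^2\bigr),
\qquad \sigma^2 = 2(1 + V)
\]
\[
E\bigl[\max(2S - Y,\ 0)\bigr] = \frac{\sigma}{\sqrt{2\pi}} = \sqrt{\frac{1 + V}{\pi}}
\]
\[
V = 0:\ \approx 0.56
\qquad\qquad
V = 20:\ \approx 2.59
\]
```

Under these assumptions the expected uncovered length at the nominal spacing grows with $\sqrt{1 + V}$, so even a deployment that is exactly sufficient on average leaves more of the line uncovered as the environmental variability increases.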
5 Conclusion and Future Work

The results of our analytical and experimental study of random variations in sensor network coverage suggest that a high degree of over-engineering may be required for a system to attain homogeneous coverage. The degree of over-engineering is particularly crucial for small-footprint networks, where the number of sensors per unit length is very high. A large number of sensors is needed to achieve sensor network functionality such as coverage, which may also have significant implications for the design of MAC and routing protocols. In future work, we will extend the analysis to two-dimensional areas. We also plan to study non-stationary, time-dependent variation in sensing coverage, and to consider other deployment distributions such as the Gaussian distribution. Furthermore, we plan to study the impact of coverage on other performance metrics such as energy efficiency and packet delivery ratio.
References

[1] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless Sensor Networks: A Survey. Computer Networks Journal 38(4), 393–442 (2002)
[2] Lazos, L., Poovendran, R.: Coverage in Heterogeneous Sensor Networks. In: 4th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, April 03-06, pp. 1–10 (2006)
[3] Koushanfar, F., Meguerdichian, S., Potkonjak, M., Srivastava, M.: Coverage Problems in Wireless Ad-Hoc Sensor Networks. In: Proceedings of the IEEE INFOCOM 2001, pp. 1380–1387 (March 2001)
[4] Liu, B., Towsley, D.: A Study of the Coverage of Large-scale Sensor Networks. In: Proceedings of MASS 2004 (2004)
[5] Huang, C., Tseng, Y.: The Coverage Problem in a Wireless Sensor Network. In: Proceedings of WSNA 2003, pp. 115–121 (2003)
[6] Miorandi, D., Altman, E.: Coverage and Connectivity of Ad Hoc Networks in Presence of Channel Randomness. In: Proceedings of the IEEE INFOCOM 2005, pp. 491–502 (March 2005)
[7] Xing, G., Wang, X., Zhang, Y., Lu, C., Pless, R., Gill, C.: Integrated Coverage and Connectivity Configuration for Energy Conservation in Sensor Networks. ACM Transactions on Sensor Networks 1(1), 36–72 (2005)
[8] Cardei, M., MacCallum, D., Cheng, X., Min, M., Jia, X., Li, D., Du, D.: Wireless Sensor Networks with Energy Efficient Organization. Journal of Interconnection Networks 3(3-4), 213–229 (2002)
[9] Tilak, S., Abu-Ghazaleh, N.B., Heinzelman, W.: Infrastructure tradeoffs for sensor networks. In: Proceedings of First International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), Atlanta, USA, pp. 49–57 (September 2002)
[10] Younis, O., Fahmy, S.: Distributed Clustering in Ad-hoc Sensor Networks: A Hybrid, Energy-Efficient Approach. In: Proceedings of IEEE INFOCOM (March 2004)
[11] Kar, K., Banerjee, S.: Node Placement for Connected Coverage in Sensor Networks. In: Proceedings of WiOpt 2003 (March 2003)
[12] Haas, Z.J.: On The Relaying Capability Of The Reconfigurable Wireless Networks. In: IEEE 47th Vehicular Technology Conference, vol. 2, pp. 1148–1152 (May 1997)
[13] Estrin, D., Govindan, R., Heidemann, J., Kumar, S.: Next Century Challenges: Scalable Coordination in Sensor Networks. In: Proc. of ACM MobiCom 1999, Washington (August 1999)
[14] Tian, D., Georganas, N.D.: A coverage-preserving node scheduling scheme for large wireless sensor networks. In: Proc. of the 1st ACM International Workshop on Wireless Sensor Networks and Applications (2002)
[15] Ye, F., et al.: PEAS: A robust energy conserving protocol for long-lived sensor networks. In: ICNP (2002)
[16] Zhang, H., Hou, J.: Maintaining Sensing Coverage and Connectivity in Large Sensor Networks. In: Proceedings of Mobicom (2003)
[17] Shakkottai, S., Srikant, R., Shroff, N.B.: Unreliable Sensor Grids: Coverage, Connectivity and Diameter. In: Proceedings of IEEE Infocom (2003)
[18] Ye, W., Heidemann, J., Estrin, D.: An energy-efficient MAC protocol for wireless sensor networks. In: Proceedings of IEEE Infocom, New York, NY (June 2002)
Authors

Thaier Hayajneh

Thaier Hayajneh is Head and Assistant Professor of the Computer Engineering Department at the Hashemite University. At HU Dr. Hayajneh regularly teaches courses on network security, computer networks, network programming, artificial intelligence and C++ programming. Dr. Hayajneh received his Ph.D. and MS degrees in Computer and Network Security from the University of Pittsburgh, PA, USA in 2009 and 2005, respectively. He also received his MS and BS in Electrical and Computer Engineering from Jordan University of Science and Technology, Irbid, Jordan, in 1999 and 1997, respectively. He also received the Information Assurance certification in 2007 from the National Security Agency, USA. His current research interests include: computer networking, including mobile wireless ad hoc and sensor networks, information assurance and security, network security and privacy, wireless security, and system modeling and simulation.

Samer Khasawneh

He received his BSc and MSc in Computer Engineering from Jordan University of Science and Technology (JUST), Irbid, Jordan in 2006 and 2009 respectively. Currently he is working as a Full-Time Lecturer, Department of Computer Engineering, Hashemite University, Zarka, Jordan. His research interests focus on Computer Networks and Wireless Communications.
A Study of Applying SIP Mobility in Mobile WiMax Network Architecture

Yen-Wen Chen, Meng-Hsien Lin, and Hong-Jang Gu

Department of Communication Engineering, National Central University
320 No. 300, Jhong-Da Road, Jhong Li City, Taiwan ROC
[email protected]

Abstract. This paper studies the cooperation of ASN anchored mobility and SIP terminal mobility to improve handoff performance in WiMax networks. In addition to integrating the WiMax mobility and SIP mobility procedures by using the B2BUA concept, the proposed scheme takes the traffic load into account so that the handoff dropping rate can be reduced. As the B2BUA acts as a proxy that handles the link layer and application layer handoffs, the handoff decision and policy can be applied in a heuristic manner. Simulation experiments were performed, and the results show that the proposed scheme can effectively utilize the traffic links to achieve the above objective.

Keywords: WiMax, Session Initiation Protocol, Mobile Network, Anchor Mobility.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 42–53, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

In recent years, new applications such as Voice over IP (VoIP), Video on Demand (VoD), online games and Social Network Services (SNS) have been continually developed to meet emerging service demands. When these applications operate in a mobile environment, they need seamless handoff to achieve high service quality in practical wireless networks. Worldwide Interoperability for Microwave Access (WiMax) [1, 2] has been proposed to support broadband services in mobile environments. To provide a complete end-to-end service scenario, the network is divided into the access service network (ASN) and the connectivity service network (CSN), with several control modules inside them to handle the required signaling functions. The end-to-end architecture provides ASN anchored mobility and CSN anchored mobility for efficient handoff. ASN anchored mobility keeps the path prior to the anchor point unchanged and therefore offers smooth transmission and lower handover latency. However, the path between the mobile node and the correspondent node (CN) may not be optimal and may waste network resources. CSN anchored mobility, on the other hand, is more efficient in terms of resource utilization, but the end-to-end path needs to be re-routed [3,4], which introduces longer handoff latency.
The WiMax Forum Network Working Group proposed client-based and proxy-based Mobile IPv4 (MIPv4) and Mobile IPv6 (MIPv6) to cooperate with the existing link layer mobility. However, the mobile node must then be equipped with mobile IP capability to retain connectivity at the network layer. It is not practical to assume that every mobile device supports mobile IP in a real environment, because the network layer function is tightly coupled with the software kernel in most mobile devices and is not easy to modify. Providing handoff capability at the application layer may therefore be an alternative that requires no change to the network protocol.

The Session Initiation Protocol (SIP) [5] is basically a user-based mobility approach. The user identification, or alias, is the key to maintaining service connectivity, and IP mobility is not essential. Because the current location (e.g. IP address) of the user identification is registered at the SIP server, other users can easily query the user's location at the SIP server and communicate with the user in a peer-to-peer manner. Furthermore, the re-invite procedure lets the user inform the peer of a location change during movement. However, as SIP is an application layer protocol, SIP-based mobility cannot improve the latency and packet loss caused by connection handoff [6, 7]. A soft handoff approach, which establishes the new connection before breaking the old one, helps to reduce the latency.

In this paper, we adopt a hybrid SIP and anchor-based scheme as the basic concept for soft handoff and propose a back-to-back user agent (B2BUA) at the ASN node, named the B2BUA-ASN node, which acts as a proxy and sends the invite and re-invite messages to the CN to achieve efficient handoff. A B2BUA is a logical network entity in the SIP environment. It can receive a message from the user agent client (UAC) and use its call control logic to relay it to the other user agent (UA). The B2BUA can initiate a new call and modify SIP messages. Thus, the mobile node behind the B2BUA-ASN node does not need to be mobile IP aware. In addition, as the ASN node also plays the role of anchor point for connection handover [8, 9, 10], the B2BUA-ASN node can consider the traffic condition and decide heuristically whether the handover connection is ASN anchored or not, so that network resources are utilized more effectively. This paper studies the operational scenario of the B2BUA-ASN based on the existing WiMax handover procedure.
2 Overview of Mobility Management and Related Works
Mobility management is one of the important and critical issues in the wireless network environment because it covers several protocol layers and network interfaces, and handoff performance is measured by several criteria. Generally, the handoff latency and the packet loss during handoff are the performance indexes that are most frequently analyzed. The network load, including that of the network nodes and the link bandwidth, should also be considered in handoff. The WiMax Forum defines a system-wide architecture with several reference interfaces. Layer 2 (link layer) mobility has several scenarios and may occur at each reference interface, as shown in Figure 1.
Y.-W. Chen, M.-H. Lin, and H.-J. Gu
Fig. 1. WiMax Mobility Scenarios
In addition to the link layer handoff, the WiMAX end-to-end architecture documents [10, 11] also define the network layer mobility.
- Link Layer Mobility (CSN/ASN anchor)
The WiMAX network architecture proposes two scopes of mobility: ASN anchored mobility and CSN anchored mobility. ASN anchored mobility refers to a set of procedures associated with the movement between two BSs. The target BS can be in the same ASN or in a different one. The anchor ASN remains identical during ASN anchored mobility. ASN anchored mobility takes place when the MS moves to a neighboring BS that connects to a different ASN-GW, which may be within the serving ASN or not. If the target BS belongs to a different ASN, the target ASN-GW establishes an R4 data path to the anchor ASN. The MS can then avoid data loss, and the QoS can be guaranteed as well. During the ASN anchored handoff procedure, the MS does not change its CoA, and traffic to the MS comes from the anchor ASN. CSN anchored mobility involves anchor ASN relocation, which means that the anchor ASN is changed after CSN anchored mobility. After CSN anchored mobility is executed, the MS updates its CoA to the target ASN. The CoA renewal process uses the Client MIPv4 (CMIPv4), Client MIPv6 (CMIPv6) or Proxy MIPv4 (PMIPv4) protocol. The network triggers the relocation of the data path function so that the CSN transmits the data directly to the target ASN. The target ASN-GW now becomes both the anchor ASN-GW and the serving ASN-GW of the MS. The procedure is known as R3 re-anchoring, because the traffic to the MS comes from the new anchor point over a new R3 link. Executing CSN anchored mobility therefore involves a CoA update. Performing ASN anchored mobility prior to CSN anchored mobility minimizes the handoff delay and packet loss.
- Network Layer (CMIP, PMIP)
CMIP follows the Mobile IP standard (RFC 3344), so the MN must support Mobile IP functionality. When the MS discovers that it has moved, it uses Mobile IP to update the network status. Proxy Mobile IP (PMIP) provides mobility support without changing the network layer protocol of the device. It is used in the WiMAX, 3GPP and 3GPP2 architectures. In PMIP, the network infrastructure helps the MN achieve the purpose of MIP. The MN does not need MIP capability and simply follows its original usage of the network (such as DHCP). PMIP adds a MAG (Mobility Access Gateway) to the Mobile IP architecture, which interacts with the HA and the LMA (Local Mobility Anchor) on behalf of the mobile node. The MAG is an element on an access router that manages the mobility of an MS. The LMA is the home agent of the MS in PMIP.
- Application Layer (SIP)
SIP is an application layer protocol that is used to establish and tear down voice or video calls. SIP is highly flexible and extensible. SIP support for VoIP mobility is increasingly important, and SIP supports four mobility mechanisms:
1. Terminal mobility: allows a terminal device to move between domains or subnets while ensuring that packets are still delivered correctly and that the device keeps the current participants of the session.
2. Session mobility: allows the user to switch to a different terminal device during a SIP session while maintaining the call quality and keeping the connection.
3. Personal mobility: allows different terminal devices to use the same SIP address, or one terminal device to use multiple SIP addresses at the same time. For example, a SIP user can register both a PC and a mobile phone under one SIP URI, so that only this URI needs to be remembered.
4. Service mobility: allows the user to retain the original services when switching between different networks, network devices and Internet services.
The SIP mobility considered in this paper is terminal mobility, which allows a terminal device to move between domains or subnets. The user can use the re-invite method to ensure that packets are still delivered correctly and that the device keeps the current participants of the session. Although PMIP is a network-based approach that helps the MS achieve IP mobility without MIP capability, not all network infrastructures support PMIP; this depends on the operator. A VoIP application must use SIP to initialize communication in any case, and SIP mobility operates at the application layer. The advantage of SIP mobility is that the network protocols of the MS and the network do not require any modification, so it can be deployed on the network quickly and easily. The SIP register approach is similar to the MIP approach, but the MS does not need a fixed HoA. Consequently, we propose to combine ASN anchor, CSN anchor and SIP mobility and to build the B2BUA component into the existing WiMAX architecture to overcome the existing drawbacks.
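The re-invite used for terminal mobility can be illustrated with a minimal sketch. It assumes header conventions in the style of RFC 3261; the `Dialog` class, the `build_reinvite` helper, the URIs and the addresses (taken from documentation ranges) are hypothetical names chosen for illustration and are not part of the proposed architecture. The point it shows is that the dialog identifiers (Call-ID and tags) stay unchanged while the Contact header and the SDP connection line carry the new IP address.

```python
from dataclasses import dataclass


@dataclass
class Dialog:
    """Identifiers that stay fixed for the lifetime of a SIP dialog (illustrative)."""
    call_id: str
    local_uri: str      # address-of-record of the moving MS
    local_tag: str
    remote_uri: str     # address-of-record of the peer
    remote_tag: str
    remote_target: str  # peer Contact URI learned when the call was set up
    cseq: int = 1       # CSeq number of the last request sent in this dialog
    user: str = "ms"    # user part placed in the new Contact URI


def build_reinvite(dlg: Dialog, new_ip: str, rtp_port: int = 49170) -> str:
    """Return a re-INVITE advertising new_ip for both signalling and media."""
    dlg.cseq += 1  # each new request inside a dialog increments CSeq
    sdp = "\r\n".join([
        "v=0",
        f"o={dlg.user} 1 {dlg.cseq} IN IP4 {new_ip}",
        "s=-",
        f"c=IN IP4 {new_ip}",           # media is now expected at the new address
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
    ]) + "\r\n"
    headers = [
        f"INVITE {dlg.remote_target} SIP/2.0",
        # A real stack would generate a unique random branch per transaction.
        f"Via: SIP/2.0/UDP {new_ip}:5060;branch=z9hG4bK-{dlg.cseq}",
        "Max-Forwards: 70",
        f"From: <{dlg.local_uri}>;tag={dlg.local_tag}",
        f"To: <{dlg.remote_uri}>;tag={dlg.remote_tag}",
        f"Call-ID: {dlg.call_id}",      # unchanged: same dialog as before the move
        f"CSeq: {dlg.cseq} INVITE",
        f"Contact: <sip:{dlg.user}@{new_ip}:5060>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp
```

In the proposed scheme the peer of this message would be the B2BUA at the ASN node rather than the correspondent node itself.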
3 The Proposed Hybrid SIP/Anchor Mobility Management Schemes
This paper proposes a hybrid architecture, which dynamically adopts SIP mobility through the B2BUA and anchor mobility at the ASN component, to improve the handoff performance. When the MS powers on, it accesses the WiMAX network and completes ND&S (Network Discovery and Selection). The MS then performs the Station Basic Capability (SBC) negotiation and authentication and notifies the core network that it supports SIP mobility. We add two bits to the optional field on extended capability [9] to specify whether or not the MS supports SIP mobility and network-assisted SIP mobility. After completing the access authentication, the MS carries out the connection setup process. When the MS obtains its IP address, it registers with the SIP server. If the MS is equipped with CMIPv4 capability, it can perform network layer mobility on its own after the SIP connection is established, as shown in Figure 2.
Fig. 2. Connection setup – SIP mobility (CMIP4)
In the WiMAX Stage 2/3 definition, the core network can also support proxy mobile IP (PMIP) to assist an MS that has no mobile IP (MIP) capability. Both MSs of a SIP connection need MIP functions within the mobile environment if they want to hand off at the network layer. Alternatively, SIP mobility provides application layer mobility when the MS has no MIP capability. We propose to support the B2BUA function at the ASN component as shown in Figure 3. The B2BUA acts as a UA end point towards the correspondent node as well as a UA end point towards the MS.
Fig. 3. Network assistance SIP mobility in WiMAX network with B2BUA
Without network layer mobility capability, the MS initiates the SIP re-invite procedure when it moves to another routing domain and receives a new IP address from the DHCP server. The MS is only notified of the new IP address; the network layer does not report the CoA to its correspondent node because it lacks MIP capability. Instead, the MS sends the re-invite message carrying the new IP address to its peer, i.e. the B2BUA. Because a layer 2 handoff occurs whenever the routing domain changes, the cross-layer handoff decision is made at the ASN-GW. Upon receipt of the re-invite message, the ASN-GW can either choose ASN anchor mobility, in which case the re-invite message is not forwarded to the correspondent SIP client, or forward the re-invite message to the correspondent SIP client for SIP mobility. If ASN anchor mobility is chosen, nothing changes at the correspondent SIP client; only the layer 2 handoff occurs between the moving MS and the ASN-GW. In this case, the path between the correspondent SIP client and the previous ASN-GW stays the same, and a tunneling data path is established at the R4 interface (i.e. between the previous ASN-GW and the serving ASN-GW). In contrast, if SIP mobility is chosen, the B2BUA at the serving ASN-GW sends the re-invite message to the correspondent SIP client to report the new IP address. In this case, R3 interface mobility occurs and the tunneling data path is not necessary. Thus, the B2BUA has the flexibility to trade off the routing path during handoff. The handover decision proposed in the WiMax Stage 2/3 documents [11, 12] is to perform ASN anchored mobility first and CSN anchored mobility afterwards. The reason is that ASN anchored mobility does not change the IP address, so it completes the handover quickly and reduces the handover latency. However, ASN anchored mobility must establish a tunnel between the previous (old) ASN-GW and the serving (new) ASN-GW.
This increases the ASN-GW load and adds tunnel overhead to each packet. If the MS moves far away from the anchor ASN-GW, the tunneling path becomes longer and the transmission overhead increases. Regarding CSN anchored mobility (i.e. without a tunneling path), its obvious advantage is that it reduces the network resource consumption as well as the
load of the ASN-GW. Its disadvantage is that the MS needs a longer time to complete the handover. SIP mobility is regarded as a kind of CSN anchor architecture, which is also called the R3 re-anchor architecture. In addition to the proposed B2BUA architecture, a load-balance-based handoff decision scheme is proposed in this paper. Considering the downstream traffic, the traffic arriving at an ASN-GW may come either from the R3 interface or from the tunneled R4 interface. As mentioned above, the load of the R4 interface increases if ASN anchor mobility is applied. Therefore, the decision between ASN anchor handoff and SIP mobility should take the R3 and R4 loads into consideration to balance the link load. Let α and β be the load thresholds of the R3 and R4 interfaces, and let x and y be the current loads measured by the serving ASN-GW on R3 and R4, respectively. The link capacities of R3 and R4 are denoted by $C_{R3}$ and $C_{R4}$. The proposed heuristic decision approach is then as follows:
i. if $(x \ge \alpha) \cap (y < \beta) \rightarrow$ ASN anchor mobility
ii. if $(x < \alpha) \cap (y \ge \beta) \rightarrow$ SIP mobility
iii. if $(x < \alpha) \cap (y < \beta) \rightarrow$ ASN anchor mobility (if $(x/\alpha) > (y/\beta)$); SIP mobility (otherwise)
iv. if $(x \ge \alpha) \cap (y \ge \beta) \rightarrow$ ASN anchor mobility (if $(x/C_{R3}) > (y/C_{R4})$); SIP mobility (otherwise)
It is noted that the handoff connection will be dropped if there is no available bandwidth in the R3 and R4 interfaces.
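The four rules can be written down as a short sketch. The function name `choose_handoff` and its argument names are illustrative assumptions; loads, thresholds and capacities are assumed to be expressed in the same unit (for example Mbps). The sketch covers only the choice between the two mobility types and omits the dropping of connections when no bandwidth is available.

```python
def choose_handoff(x: float, y: float, alpha: float, beta: float,
                   c_r3: float, c_r4: float) -> str:
    """Return "ASN" (ASN anchor mobility) or "SIP" (SIP mobility).

    x, y        -- current R3 and R4 loads measured by the serving ASN-GW
    alpha, beta -- load thresholds of the R3 and R4 interfaces
    c_r3, c_r4  -- link capacities of the R3 and R4 interfaces
    """
    r3_over = x >= alpha
    r4_over = y >= beta
    if r3_over and not r4_over:       # rule i: only R3 is above its threshold
        return "ASN"
    if not r3_over and r4_over:       # rule ii: only R4 is above its threshold
        return "SIP"
    if not r3_over and not r4_over:   # rule iii: compare loads relative to thresholds
        return "ASN" if x / alpha > y / beta else "SIP"
    # rule iv: both above threshold, compare utilisation of the raw capacities
    return "ASN" if x / c_r3 > y / c_r4 else "SIP"
```

For instance, with thresholds of 160 and 12 and capacities of 200 and 15, loads of x = 80 and y = 3 fall under rule iii and select ASN anchor mobility, because R3 is relatively closer to its threshold than R4.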
4 Experiments and Simulation Results
In order to evaluate the performance of the proposed scheme, exhaustive simulations were carried out. Our simulation topology is shown in Figure 4. The environment includes seven ASNs, and every ASN has one ASN-GW and one BS. All ASNs connect to the Internet through the egress router. ASN profile B, in which the interfaces between the ASN-GW and the BS are a black box, was adopted in our simulations. Because we study the load balancing problem of the core network, we did not consider the R6 interface between the ASN-GW and the BS. We assume that each pair of ASN-GWs has its own separate R4 bandwidth, and that the bandwidth of the R3 interface from each ASN-GW to the egress router is independent of the others. Figure 5 illustrates the arrangement of the seven ASN-GWs from the top view. Every BS may adopt one of four modulations, 64QAM, 16QAM, QPSK and BPSK, according to the channel condition between the MS and the BS. The modulation regions of neighboring BSs partially overlap, and the MS may initiate a handoff when it travels in such an overlap area. The channel condition is assumed to be determined by the distance between the MS and the BS, and the modulation and coding scheme is dynamically adapted according to the channel condition as shown in Table 1. The MS is assumed to move according to the random waypoint mobility model in our simulations. The frame duration is 5 ms.
Fig. 4. The simulation architecture
The mobility-related parameters are listed in Table 2. The simulation results of the proposed hybrid ASN+SIP scheme are compared with those of the drastic algorithm, which ignores the link loads and adopts either ASN anchored mobility (ASN only) or SIP mobility (SIP only), to illustrate the performance of our approach.
Fig. 5. Top view of the simulation environment
Table 1. Modulation and BS coverage radius

Coverage radius     Inner circle   Second circle   Third circle   Fourth circle
Modulation/coding   64QAM          16QAM           QPSK           BPSK
Table 2. Parameters of mobility model

Parameter        Value
Position         Uniform distribution
Speed            0 ~ 120 km/h, uniform distribution
Direction        0 ~ 360 degrees, uniform distribution
Halt time        Exponential distribution (mean value: 15 s, upper bound: 30 s, lower bound: 3 s)
Max retry        3
Retry interval   200 ms
Model            Random waypoint mobility
Threshold        80%
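As a rough illustration of how the parameters of Table 2 could drive the movement of one MS, the sketch below samples a single movement leg. The helper names `sample_leg` and `move` are hypothetical, and clipping the exponential halt time to the stated bounds is an assumption; the simulator may apply the bounds differently (for example by resampling).

```python
import math
import random


def sample_leg(rng: random.Random) -> tuple[float, float, float]:
    """Draw speed (km/h), direction (degrees) and halt time (s) for one leg."""
    speed_kmh = rng.uniform(0.0, 120.0)      # uniform speed
    direction_deg = rng.uniform(0.0, 360.0)  # uniform direction
    halt_s = rng.expovariate(1.0 / 15.0)     # exponential with a mean of 15 s
    halt_s = min(30.0, max(3.0, halt_s))     # assumed: clip to [3 s, 30 s]
    return speed_kmh, direction_deg, halt_s


def move(pos: tuple[float, float], speed_kmh: float,
         direction_deg: float, duration_s: float) -> tuple[float, float]:
    """Advance a position given in metres along a straight line."""
    dist_m = speed_kmh / 3.6 * duration_s    # km/h to m/s, then to metres
    theta = math.radians(direction_deg)
    return (pos[0] + dist_m * math.cos(theta),
            pos[1] + dist_m * math.sin(theta))
```

A full random waypoint simulation would repeat these two steps, pausing for the sampled halt time between legs.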
As mentioned in the previous section, a handoff connection will be dropped if there is no available R3 or R4 bandwidth. However, the handoff connection will also be dropped if the bandwidth of the R1 air interface is saturated. Therefore, we performed two experiments to verify the performance of the proposed scheme. In experiment 1, the bandwidth of the R1 interface is assumed to be unlimited. This arrangement prevents the limited R1 capacity from affecting the dropping rate caused by insufficient R3 and R4 link bandwidth. In experiment 2, the bandwidth of the R1 interface is constrained, which is closer to a practical environment. The number of mobile stations is 100 in both experiments.
- Experiment 1: The requested bandwidth of each MS is Gaussian distributed with mean loads varying from 1 to 13 Mbps and a standard deviation of one. The bandwidth capacities of R3 and R4 are 200 Mbps and 15 Mbps, respectively. The values of α and β are set to 80% of the link capacity. As one ASN-GW connects to six other ASN-GWs, the total R4 bandwidth of the ASN-GW in ASN 3 is 90 Mbps. The simulation results are illustrated in Figure 6. Figure 6 indicates that the dropping rate of the ASN anchored mobility only scheme is the highest. The main reason is the limited bandwidth capacity of the R4 interface. The SIP mobility only scheme and the proposed ASN+SIP scheme have similar dropping rates. Because the capacity of the R3 interface is much larger than that of the R4 interface and the SIP only scheme always chooses SIP mobility, its dropping rate is lower. Because the proposed scheme can dynamically choose the R3 or R4 interface, its dropping rate is also lower accordingly.
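The parameter values of experiment 1 can be checked with a few lines of arithmetic. The variable names and the `flows_supported` helper below are illustrative only, and the flow count is a back-of-envelope estimate that ignores the Gaussian spread of the requests; it is not a result taken from the simulations.

```python
R3_CAPACITY_MBPS = 200.0
R4_CAPACITY_MBPS = 15.0
THRESHOLD_RATIO = 0.8   # alpha and beta are 80% of the link capacity
NEIGHBOUR_GWS = 6       # one ASN-GW connects to six other ASN-GWs

alpha = THRESHOLD_RATIO * R3_CAPACITY_MBPS         # 160 Mbps
beta = THRESHOLD_RATIO * R4_CAPACITY_MBPS          # 12 Mbps
aggregate_r4 = NEIGHBOUR_GWS * R4_CAPACITY_MBPS    # 90 Mbps in total


def flows_supported(capacity_mbps: float, mean_load_mbps: float) -> int:
    """Whole number of mean-sized flows that fit on one link (rough estimate)."""
    return int(capacity_mbps // mean_load_mbps)
```

At a mean load of 13 Mbps, a single R4 link holds one such flow while the R3 link holds fifteen, which is consistent with the explanation that the R4 capacity limits the ASN only scheme.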
Fig. 6. Experiment 1 simulation results (unlimited R1 bandwidth)
- Experiment 2: In this experiment, we constrain the R1 interface capacity to 100 Mbps, and the mean bandwidth requested by each MS varies from 1 to 6 Mbps. The mean load of each MS is reduced because of the limited bandwidth of the air interface. The other simulation parameters are the same as in experiment 1. The purpose of this experiment is to investigate whether the limitation on the R1 capacity affects the performance of the schemes. The simulation results of the three schemes are illustrated in Figure 7.
Fig. 7. Experiment 2 simulation results (R1 bandwidth=100 Mbps)
Figure 7 shows that the dropping rate of the ASN only scheme is still much higher than those of the SIP only and the proposed ASN+SIP schemes. Compared with Figure 6 at a mean load of 6 Mbps, the dropping rates of the SIP only and ASN+SIP schemes approach 0 in experiment 1 and are about 3% in experiment 2, whereas the dropping rate of the ASN only scheme increases from 10% to 15%. This indicates that, although the R1 capacity affects the dropping rate, the dropping rate increases dramatically if the selection between the R3 and R4 interfaces for the handoff connection is not properly considered.
It should be noted that, although the SIP only and the proposed ASN+SIP schemes have similar dropping rates in both experiments, the handoff delay and packet loss rate of the proposed scheme are expected to be lower than those of the SIP only scheme, because under the SIP only scheme the ASN always needs to initiate the re-invite procedure during handoff.
5 Conclusions
The paper studies the mobility scenarios in WiMax networks and proposes a feasible architecture to improve the mobility performance. There are two main contributions: (1) a B2BUA component is proposed in the WiMax architecture to achieve application layer mobility, so that the MS need not be equipped with mobile IP capability; (2) the proposed hybrid scheme provides flexibility in arranging network resources, so that they can be allocated evenly. The B2BUA in the WiMax network provides a proxy function that sends the re-invite message to the correspondent node. This approach reduces the transmission of packets over the air interface, so the handoff latency and the loss of control messages can be reduced. The simulation results also reflect the objectives of the proposed scheme. The parameters α and β adopted in this paper consider only the link capacity bandwidths when deciding the handoff type. These two parameters and the interface selection (the decision of the handoff type) can be designed in a more sophisticated way. For example, the B2BUA may switch a connection from ASN anchor mobility to SIP mobility when the link utilization is too high, even if no layer 2 handoff occurs. Owing to the B2BUA, more heuristic handoff considerations become feasible; this is our ongoing work.
Acknowledgments. This research work was supported in part by grants from the Ministry of Education and the National Science Council (NSC) (grant numbers: NSC 97-2221-E-008-033, NSC 98-2221-E-008-063, and NSC 99-2221-E-008-005).
References
1. IEEE Standard for Local and metropolitan area networks Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and Corrigendum 1, IEEE Std 802.16e-2005 and IEEE Std 802.16-2004 / Cor 1-2005 (2006)
2. Draft Standard for Local and metropolitan area networks Part 16: Air Interface for Broadband Wireless Access Systems, P802.16Rev2/D2 (2007)
3. Ray, S.K., Pawlikowski, K., Sirisena, H.: Handover in Mobile WiMAX Networks: The State of Art and Research Issues. IEEE Communications Surveys & Tutorials
4. Seyyedoshohadaei, S.M., Khatun, S., Ali, B.M., Othman, M., Anwar, F.: An integrated scheme to improve performance of fast mobile IPv6 handover in IEEE 802.16e network. In: IEEE 9th Malaysia International Conference on Communications (MICC) (2009)
5. Handley, M., Schulzrinne, H., Schooler, E., Rosenberg, J.: SIP: session initiation protocol. Request for Comments (Proposed Standard) 2543, Internet Engineering Task Force (March 1999)
6. Wedlund, E., Schulzrinne, H.: Mobility Support Using SIP. In: Proc. 2nd ACM/IEEE Int'l. Wksp. Wireless and Mobile Multimedia (August 1999)
7. Nakajima, N., Dutta, A., Das, S., Schulzrinne, H.: Handoff delay analysis and measurement for SIP based mobility in IPv6. In: ICC 2003, Communications (May 2003)
8. Chen, Y.-W., Huang, M.-J.: Study of Heuristic MAP Selection and Abstraction Schemes with Load Balance in HMIPv6. Wireless Personal Communications, on-line version (2010), doi: 10.1007/s11277-009-9854-5
9. Hsieh, F.-Y., Chen, Y.-W., Wu, P.-W.: Cross Layer Design of Handoffs in IEEE 802.16e Network. In: 2006 International Computer Symposium Conference, January 29 (2007)
10. Ke, S.-C., Chen, Y.-W., Hsu, S.-Y.: A Performance Comparison of Inter-ASN Handover Management Schemes over Mobile WiMAX Network. International Journal of Wireless Communications and Networking (IJWCN), 75–85 (June 2009)
11. WiMAX Network Architecture – Stage 2 – Release 1.1.0, WiMAX Forum (July 2007)
12. WiMAX Network Architecture – Stage 3 – Release 1.1.0, WiMAX Forum (July 2007)
IEEE 802.15.4/Zigbee Performance Analysis for Real Time Applications Using OMNET++ Simulator
Lamia Chaari and Lotfi Kamoun
SFAX University, National Engineering School (ENIS), Tunisia
Abstract. Wireless sensor networks (WSN) have been explored in various scenarios, and several protocols have been developed. With the standardization of the IEEE 802.15.4 protocol, sensor networks became interesting for applications in industrial automation. IEEE 802.15.4 specifies physical and media access control layers that can be optimized to ensure low power consumption. The focus of this paper is on real-time capabilities and reliability. We analyzed and compared the performance of the IEEE 802.15.4 standard using the OMNET++ simulator. One objective of our study is to outline to what degree the standard meets real-time requirements. Different application scenarios were evaluated. Performance parameters such as data delivery delay and goodput are the main factors considered in our study. We focused on single-sink scenarios and analyzed network performance as a function of the number of nodes. Our simulation results can be used for planning and deploying IEEE 802.15.4 based sensor networks with specific performance demands. In addition, specific protocol limitations in real-time environments can be identified and solutions can be suggested.
Keywords: IEEE 802.15.4; Zigbee; OMNET++; Real time.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 54–68, 2011. © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
Recent developments in communication technology have facilitated the expansion of wireless sensor networks (WSN) [1][2][3]. The application areas of WSN include military sensing, data broadcasting [4], environmental monitoring [6], intelligent vehicular systems [7], multimedia [8], patient monitoring [9], agriculture [10] [11], industrial automation [12] [13] [14] and audio [15]. WSNs have not yet achieved widespread deployment, although they have been proven capable of meeting the requirements of many categories of applications. WSNs have limitations such as low computing power, small storage devices, narrow network bandwidth and very low battery power. Real-time applications of a WSN [5] require the sensor network paradigm to provide mechanisms that guarantee a certain level of quality of service (QoS). Whereas the main objective in sensor network research is minimizing energy consumption, mechanisms that deliver application-level QoS efficiently and map these requirements to network-layer metrics, such as jitter and latency, have not yet received major attention. Data delivery delay in WSN presents specific system design challenges, which are the subject of this article. The remainder of this article is organized as follows. Section 2 reviews related work on real-time applications using WSN. In Section 3, we present a brief overview of the IEEE 802.15.4 protocol specifications. In Section 4, we consider scenarios implemented in OMNET++ to discuss performance trends and trade-offs for real-time applications. In Section 5, we offer some concluding remarks.
2 Related works
In [16], the authors studied the applicability of IEEE 802.15.4 based solutions in industrial automation, focusing on real-time capabilities. Their objective was to verify whether the protocol meets all the demands of industrial automation. They also evaluated the protocol with analytical methods, focusing on its capabilities for real-time operation. In [17], the authors elaborated an architecture that uses the 802.15.4/ZigBee wireless protocol in home automation, where it is necessary to transmit traffic flows of time-critical control data between sensors, actuators and the automation network. In [18], the authors demonstrate ZigBee's performance in several practical applications. For this purpose, they describe the setup and execution of an experimental testbed. The testbed is capable of measuring the minimum, maximum, and average received signal strength indicator (RSSI), the packet loss rate (PLR), the packet error rate (PER), the bit error rate (BER), and the bit error locations. In [19], the authors presented the results of a set of simulation experiments to better understand the protocol behaviour. Their results outline the capabilities and limitations of the protocol in the selected scenario. They considered the dependency of the protocol on different traffic loads and on protocol-inherent parameters such as the superframe order and the beacon order. In this paper, we evaluate the IEEE 802.15.4 performance with a special focus on industrial sensor network applications.
3 Overview of the IEEE 802.15.4 Protocol Specifications
3.1 Description
IEEE 802.15.4 [20] is part of the IEEE family of physical and link layer standards for wireless personal area networks (WPANs). The WPAN working group focuses on short-range wireless links. The main focus of IEEE 802.15.4 is to provide low data rate WPANs (0.01 - 250 kbit/s) with simple or no quality of service (QoS), low complexity and stringent power consumption requirements. The standard differentiates between the full function device (FFD) and the reduced function device (RFD), which is intended for use in the simplest devices. The upper layers of the protocol stack are defined by the ZigBee Alliance [21] [22]. These layers correspond to the Application Layer (APL) and the Network Layer (NWL), as illustrated in Fig. 1.
IEEE 802.15.4 supports two physical layer options. The 868/915 MHz PHY, known as the low band, uses binary phase shift keying (BPSK) modulation, whereas the 2.4 GHz PHY (high band) uses offset quadrature phase shift keying (OQPSK) modulation. The MAC layer provides a sleep mode feature based on superframes bounded by beacons. This feature is available only in a synchronized network. The ZigBee network (NWK) layer offers services for devices to join and leave a network, to discover and maintain routes between devices for unicast, multicast or broadcast packets, and to secure data frames at the network layer. The ZigBee application services (APS) layer provides the main functions that devices need to maintain bindings, which are device groupings based upon application communication needs. Finally, the ZigBee application framework (AF) layer identifies a device's potential services as dictated by a given AF profile. Each profile approved by the ZigBee Alliance describes the message formats and the network configuration parameters necessary for devices of similar interest to communicate successfully.
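[Figure: protocol stack, from top to bottom: Application, Application Profiles and Application Support (ZigBee); Network (ZigBee Network); SSCS, Link and MAC (IEEE 802.15.4); Physical, 868/915 MHz and 2450 MHz (IEEE 802.15.4)]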
Fig. 1. IEEE 802.15.4/Zigbee protocol stack
3.2 MAC Access Mechanisms & Frame Structure
The MAC protocol in IEEE 802.15.4 can operate in both beacon-enabled and non-beacon modes. In the non-beacon mode, the protocol is a simple Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA) scheme. This requires the receiver to listen constantly for possible incoming data. We have considered this mode in our simulations. In the beacon-enabled mode, all communications are carried out within a superframe structure. The standard allows the optional use of a superframe structure, which is sent by the coordinator and bounded by network beacons. It is divided into sixteen equally sized slots. A device can transmit at any time during a slot, but must finish before the next superframe beacon. Channel access during the time slots is contention based. For applications requiring specific data bandwidth and low latency, the
network coordinator may dedicate portions of the active superframe to that application. These portions are called guaranteed time slots (GTS). The GTSs form the contention free period (CFP), which always appears after the end of the contention access period (CAP). All contention-based transactions should be completed before the CFP begins. In ZigBee implementations, three additional headers are added to the outgoing data frames in order to perform these services. In total, the ZigBee headers occupy 15 octets of overhead for each data frame. The complete IEEE 802.15.4 frame structure is represented in Fig. 2.
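[Figure: the PHY packet consists of the preamble, the start of packet delimiter and the PHY header (6 bytes in total), followed by the PHY service data unit (PSDU, 0-127 bytes); the PSDU carries the MAC header (MHR), the MAC service data unit (MSDU) and the MAC footer (MFR)]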
Fig.2. IEEE 802.15.4 Frame structure
The physical packet consists of the preamble (32 bits) for synchronization, the start of packet delimiter (8 bits), which shall be equal to "11100101", the physical header (8 bits) and the data field (PSDU: PHY Service Data Unit), which has a variable length (0 to 1016 bits). The MAC frame structure is designed to keep the complexity at a minimum while ensuring that frames are sufficiently robust to be transmitted over a noisy channel. A MAC frame is composed of three fields: a MAC header (MHR), a MAC service data unit (MSDU), and a MAC footer (MFR). The first field of the MHR is the frame control field (2 bytes). It indicates the type of the MAC frame (acknowledgement frame, data frame, beacon frame or MAC command frame), specifies the format of the address field, and controls the acknowledgement. The second field is the sequence number, which is used to identify the successful transmission of a frame. The address field has a variable length of 0 to 20 bytes. Depending on the frame type, the address field may include both source and destination addresses, only a destination address, or no address. The payload (MSDU) length is variable, with a limit of 127 bytes for the complete MAC frame. The MFR is the frame check sequence (FCS), a 16-bit field used to ensure data integrity.
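The field sizes above determine how long a frame occupies the channel. The sketch below is a small worked calculation under the assumption of the 250 kbit/s data rate of the 2.4 GHz PHY; the constant and function names are illustrative, and the calculation ignores backoff, turnaround and inter-frame spacing.

```python
PREAMBLE_BYTES = 4        # 32-bit preamble
SFD_BYTES = 1             # 8-bit start of packet delimiter
PHR_BYTES = 1             # 8-bit physical header
MAX_PSDU_BYTES = 127      # 1016 bits
BIT_RATE_BPS = 250_000    # assumed: 2.4 GHz PHY data rate


def ppdu_airtime_ms(psdu_bytes: int, bit_rate_bps: int = BIT_RATE_BPS) -> float:
    """Transmission time in milliseconds of one physical packet."""
    if not 0 <= psdu_bytes <= MAX_PSDU_BYTES:
        raise ValueError("PSDU length must be between 0 and 127 bytes")
    total_bits = (PREAMBLE_BYTES + SFD_BYTES + PHR_BYTES + psdu_bytes) * 8
    return total_bits / bit_rate_bps * 1000.0
```

Under these assumptions, a maximum-size frame (127-byte PSDU, 133 bytes on air) takes 4.256 ms, which gives a lower bound on the per-frame delivery delay.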
4 IEEE 802.15.4 Performance Analysis for Real Time Application
In this section, we analyze the performance of wireless sensor networks for real-time applications. The performance evaluation is carried out using a simulation model of IEEE 802.15.4 under OMNeT++ [23][24]. We introduce this model and describe the simulation settings.
4.1 Simulation Modules
The principal modules constituting the IEEE 802.15.4 network (shown in Fig. 3) are the Sim module, the Host module and the Nic802154_TI_CC2420 module.
Fig. 3. IEEE802.15.4 network.ned module
- Host Module is the node that sends and receives messages.
- BlackBoard is an entity that allows inter-layer and inter-process communication. It holds the overall knowledge of the hosts, or information relevant to more than one layer, and makes it accessible to modules derived from BlackBoardAcces. Each module derived from BlackBoardAcces is able to publish information in BlackBoard.
- SimTracer records vectors in .vec files (simulation results).
- ChannelControl is a framework module that controls connections. It coordinates connections between all the nodes.
The Host module contains four simple modules and a compound module. Those modules are SensorApplayer, WiseRoute, ConstSpeedMobility, UbiquitousArp and Nic802154_TI_CC2420. Fig. 4 illustrates the Host module.
- SensorApplayer implements the higher layers.
- WiseRoute is the basic class for the network layer; it is a generic class for all network-layer modules.
- ConstSpeedMobility controls the movements of all nodes.
- UbiquitousArp is a module that implements the address resolution protocol.
- Nic802154_TI_CC2420 is a compound module whose structure is illustrated in Fig. 5. It contains 4 simple modules:
- MAC module is based on the CSMA/CA algorithm.
- SnrEval gathers information about the received messages; it creates the snrList used by SnrDecider.
IEEE 802.15.4/Zigbee Performance Analysis for Real Time Applications
Fig. 4. IEEE802.15.4 Host.ned module
- SnrDecider processes the received data to verify message integrity and then decides whether the message must be dropped. To make this decision, SnrDecider takes the snrList created by SnrEval and translates the SNR values into a bit state (with or without error).
- SingleChannelRadioAccNoise3 contains the Nic802154_TI_CC2420 parameters.
Fig. 5. Nic802154_TI_CC2420.ned
4.2 Simulation Parameters
We evaluated the performance of the 802.15.4 protocol in a star topology, where nodes transmit data directly to the coordinator. We assume that the wireless channel is free of interference, which means that each node is able to hear the transmissions of all the other nodes. Using the CSMA/CA mechanism implies that messages may fail to be delivered, either because of collisions or because a node cannot access an occupied channel. In this scenario, the sensors generate a 108-byte message every 30 ms. These messages are composed of 75 bytes of data, 2 bytes of control information added by the request, a 20-byte ZigBee header and a 9-byte header added by the 802.15.4 protocol. The delivery acknowledgement (ACK) sent by the coordinator is 5 bytes long. The sensors start generating data at a random moment between the beginning of the simulation and 30 ms. Each sensor sends 1000 messages to the coordinator. Tables 1 and 2 list, respectively, the CSMA/CA and Nic802154_TI_CC2420 parameters used during simulation.
Table 1. CSMA/CA Parameter
Parameter             Value
minBE                 3
maxBE                 5
MaxCSMABackoffs       4
MaxFrameRetries       3
AckWaitDuration       864 μs
SIFS                  192 μs
AUnitBackoffPeriod    320 μs
CCADetectionTime      128 μs
Table 2. Nic802154_TI_CC2420 Parameter
Parameter                Value
DelaySetupRx             1.792 ms
DelaySetupTx             1.792 ms
delayRxTx / delayTxRx    192 μs
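To give a feel for what the Table 1 values imply, the sketch below bounds the time a node can spend in backoff and clear-channel assessment before giving up on one frame. It assumes the usual unslotted CSMA/CA rule from the IEEE 802.15.4 standard (a random wait of 0 to 2^BE - 1 unit backoff periods, with BE incremented up to maxBE after each busy channel assessment); the paper does not spell this rule out, and the `channel_busy` callback is a hypothetical stand-in for the simulator's channel model.

```python
import random

# Values from Table 1.
MIN_BE = 3
MAX_BE = 5
MAX_CSMA_BACKOFFS = 4
UNIT_BACKOFF_US = 320
CCA_US = 128


def backoff_exponents():
    """Backoff exponent used on each access attempt (NB = 0..MaxCSMABackoffs)."""
    return [min(MIN_BE + nb, MAX_BE) for nb in range(MAX_CSMA_BACKOFFS + 1)]


def worst_case_access_delay_us():
    """Upper bound on backoff plus CCA time before the access attempt fails."""
    return sum((2 ** be - 1) * UNIT_BACKOFF_US + CCA_US for be in backoff_exponents())


def sample_access_delay_us(channel_busy, rng=random):
    """Draw one random access delay.

    channel_busy(attempt) -> bool is a hypothetical callback telling whether
    the channel is sensed busy on the given attempt.
    Returns (delay in microseconds, True if the channel was obtained).
    """
    delay = 0
    for attempt, be in enumerate(backoff_exponents()):
        delay += rng.randint(0, 2 ** be - 1) * UNIT_BACKOFF_US + CCA_US
        if not channel_busy(attempt):
            return delay, True
    return delay, False
```

Under these assumptions the bound is 37 440 μs, which is longer than the 30 ms message period used in this scenario; the figure ignores transmission, ACK and retry times.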
Several parameters can be defined to evaluate wireless network performance. The measurements below were selected to give an idea of the behaviour and reliability of ZigBee networks. The parameters considered are as follows:
- Throughput is the quantity of data correctly transmitted from the source to the destination within a time U (in seconds). The node rate is measured by counting the total number of data packets successfully received at the node, converting this count to a number of received bits and dividing it by the total simulation time. The network rate is defined as the average of the rates of all nodes involved in the data transmission.
- End-to-end delay is the time taken by a data packet to reach the destination node. Simulations are carried out with a star network topology, and the routing mechanism was deactivated. Consequently, the term end-to-end delay can be used interchangeably with delay: the packet delay is the time taken by the packet to reach the destination.
- Goodput is calculated as the ratio between the packets received correctly and the total transmitted packets. The number of lost packets does not take retransmissions into account. So the goodput is evaluated by the following relation: Goodput = (number of packets received correctly) / (total number of packets transmitted).
- Latency time is equal to the message arrival time minus the message creation time.
- Service time is the lifetime of a message, from its creation by the transmitter until reception of the positive ACK.
4.3 Scenarios
4.3.1 Scenario 1
In this scenario (star topology with one coordinator and n nodes), we first fixed the number of sensors and computed the goodput and the latency for different rates. We then changed the number of sensors and repeated the same process. Fig. 6 illustrates the simulation results for this scenario.
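The throughput, goodput, latency and service-time definitions given before the scenario description can be expressed as a short Python sketch. The `MessageRecord` structure and its field names are hypothetical; the OMNeT++ model used in the paper records its results in .vec files and its internal bookkeeping is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MessageRecord:
    """Hypothetical per-message log entry; all times are in seconds."""
    created: float                    # creation time at the sensor
    arrived: Optional[float] = None   # reception time at the coordinator, None if lost
    acked: Optional[float] = None     # positive ACK reception time, None if no ACK


def goodput(records: List[MessageRecord]) -> float:
    """Packets received correctly divided by packets transmitted."""
    if not records:
        return 0.0
    received = sum(1 for r in records if r.arrived is not None)
    return received / len(records)


def latencies(records: List[MessageRecord]) -> List[float]:
    """Arrival time minus creation time, for delivered messages only."""
    return [r.arrived - r.created for r in records if r.arrived is not None]


def service_times(records: List[MessageRecord]) -> List[float]:
    """Creation-to-positive-ACK time, for acknowledged messages only."""
    return [r.acked - r.created for r in records if r.acked is not None]


def throughput_bps(records: List[MessageRecord], packet_bytes: int, sim_time_s: float) -> float:
    """Received bits divided by the total simulation time."""
    received = sum(1 for r in records if r.arrived is not None)
    return received * packet_bytes * 8 / sim_time_s
```

Each record counts one message regardless of how many times it was retransmitted, which matches the statement that retransmissions are not taken into account.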
Fig. 6. Goodput according to the rate for various numbers of sensors
For the curve (nbc=2) we notice that the goodput reaches a threshold; increasing the rate further brings no advantage. For the curves (nbc=5 and nbc=10) the goodput converges to the ideal value as the rate increases. For the curves (nbc=20, nbc=30 and nbc=40), the goodput remains low even when the rate increases. Thus, depending on the network size, if the number of sensors is high, a high rate is necessary to ensure correct network operation.
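One way to read these curves is to compare the raw traffic offered by the sensors with the channel rate. The sketch below uses the 108-byte message size and 30 ms period from Section 4.2; it deliberately ignores ACK frames, backoff time and retransmissions, so it is a lower bound on the load rather than a model of the simulation, and the function names are hypothetical.

```python
MESSAGE_BYTES = 108   # message size from Section 4.2
PERIOD_MS = 30        # generation period from Section 4.2


def offered_load_kbps(n_sensors: int) -> float:
    """Aggregate application traffic in kbit/s (bits per millisecond)."""
    return n_sensors * MESSAGE_BYTES * 8 / PERIOD_MS


def exceeds_channel(n_sensors: int, channel_rate_kbps: float) -> bool:
    """True when the raw offered load alone is above the channel rate."""
    return offered_load_kbps(n_sensors) > channel_rate_kbps
```

Under these assumptions two sensors offer about 57.6 kbit/s, whereas ten sensors offer 288 kbit/s, which is already above a 250 Kbps channel before any protocol overhead is counted. This is consistent with, though not proof of, the observation that larger networks need a higher rate.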
For the curves in Fig. 7, which show the goodput as a function of the simulation time, we fixed the rate at 250 Kbps.
Fig. 7. Goodput according to the simulation time
The latency time variation according to the rate for various numbers of sensors is illustrated by Fig.8.
Fig. 8. Latency according to the rate for various numbers of sensors
For the curve (nbc=2, rate 100 Kbps) the latency time is bounded by 15 ms. For the curves (nbc=5 and nbc=10) the latency time varies with the rate. For the curves (nbc=20, nbc=30 and nbc=40) the rate does not have a great effect on the latency time. When the number of sensors increases, a higher rate does not guarantee a lower latency, because of collisions and message losses. Fig. 9 illustrates the variation of the latency time with the simulation time for various rate values (number of sensors = 10, number of packets = 100).
Fig. 9. Latency time according to the simulation time (panels: Rate = 10, 50, 100, 250, 500 and 1000)
To evaluate the variation of the service time with the rate, as shown in Fig. 10, we assumed that the sensors generate a message every second; each sensor sends 100 messages to the coordinator. The number of sensors was fixed at five.
Fig. 10. Service time according to the simulation time (panels: BitRate = 10, 20 and 75)
4.3.2 Scenario 2: Payload Size Analysis
In this scenario, we analysed the performance of IEEE 802.15.4/ZigBee using a star topology for different payload sizes and ten sensors. We seek to determine the performance as a function of the rate in order to estimate the suitable payload size for a given rate. We used the network parameters of the preceding scenario. The simulation results are illustrated in Figs. 11, 12 and 13, respectively. According to these results, there is no correlation between the latency and the useful information size in the real-time application case (when the information size does not exceed 100 bytes).
Fig. 11. Goodput variation according to payload information size (x-axis: information size in bytes; curves: rate 10, 50, 100, 500 and 1000 Kbps)
Fig. 12. Latency variation according to payload size (x-axis: information size in bytes; curves: rate 10, 50, 100, 500 and 1000 Kbps)
Fig. 13. Service time variation according to payload size (panels: payload size = 5, 20 and 75 bytes)
To evaluate the variation of the service time with the payload size, as shown in Fig. 13, we assumed that the sensors generate a message every second at a rate of 20 Kbps. The number of sensors was fixed at 5.
5 Conclusion
In this paper, we have presented and discussed a number of performance measures of the IEEE 802.15.4 protocol. We obtained these results in a simulation-based performance study using an IEEE 802.15.4 simulation model implemented under OMNeT++. We analysed two different scenarios for the CSMA/CA operation mode using a star topology. In the first scenario, we addressed the scalability problem by evaluating the performance metrics (goodput, latency) for different network sizes (different numbers of sensors). In the second scenario, we analysed the effect of payload size on IEEE 802.15.4/ZigBee performance. In future work, we will continue to study the applicability of IEEE 802.15.4 to low-latency and energy-aware applications, especially in real networks and for healthcare use.
References
[1] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38, 393–422 (2002)
[2] Baronti, P., Pillai, P., Chook, V.W., Chessa, S., Gotta, A., Hu, Y.F.: Wireless Sensor Networks: a Survey on the State of the Art and the 802.15.4 and ZigBee Standards. Computer Communications 30(7), 1655–1695 (2007)
[3] Sohraby, K., Minoli, D., Znati, T.: Wireless Sensor Networks Technology, Protocols, and Applications. John Wiley & Sons, Inc., Chichester (2007) ISBN 978-0-471-74300-2
[4] Sung, T.-W., Wu, T.-T., Yang, C.-S., Huang, Y.-M.: Reliable Data Broadcast For Zigbee Wireless Sensor Networks. International Journal on Smart Sensing and Intelligent Systems 3(3) (September 2010)
[5] Pagano, P., Chitnis, M., Lipari, G., Nastasi, C., Liang, Y.: Simulating Real-Time Aspects of Wireless Sensor Networks. EURASIP Journal on Wireless Communications and Networking 2010, article ID 107946, 19 pages (2010), Research article doi:10.1155/2010/107946
[6] Jang, W.-S., Healy, W.M.: Assessment of Performance Metrics for Use of WSNs in Buildings. In: 2009 International Symposium on Automation And Robotic in Construction (ISARC 2009), pp. 570–575 (2009)
[7] Mouftah, H.T., Khanafer, M., Guennoun, M.: Wireless Sensor Network Architectures for Intelligent Vehicular Systems. In: Symposium International for Telecommunication Techniques (2010)
[8] Suh, C., Mir, Z.H., Ko, Y.-B.: Design and implementation of enhanced IEEE 802.15.4 for supporting multimedia service in Wireless Sensor Networks. Computer Networks 52, 2568–2581 (2008)
[9] Golmie, N., Cypher, D., Rebala, O.: Performance analysis of low rate wireless technologies for medical applications. Computer Communications 28(10), 1266–1275 (2005) ISSN:0140-3664
[10] Zhoul, H., Chen, X., Liu, X., Yang, J.: Applications of Zigbee Wireless Technology to Measurement System in Grain Storage. In: Computer and Computing Technologies in Agriculture II. IFIP International Federation for Information Processing, vol. 3, pp. 2021–2029 (2009), doi:10.1007/978-1-4419-0213-9_52
[11] Mestre, P., Serôdio, C., Morais, R., Azevedo, J., Melo-Pinto, P.: Vegetation Growth Detection Using Wireless Sensor Networks. In: Proceedings of the World Congress on Engineering, WCE 2010, London, U.K, June 30-July 2, vol. I (2010)
[12] Willig, A.: Recent and emerging topics in wireless industrial communication. IEEE Transactions on Industrial Informatics 4(2), 102–124 (2008)
[13] Park, P.: Protocol Design for Control Applications using Wireless Sensor Networks, Thesis, Automatic Control Lab School of Electrical Engineering, KTH (Royal Institute of Technology), Stockholm, Sweden (2009) ISSN 1653-5146, ISBN 978-91-7415-441-5
[14] Chen, F., Talanis, T., German, R., Dressler, F.: Realtime Enabled IEEE 802.15.4 Sensor Networks in Industrial Automation. In: IEEE Symposium on Industrial Embedded Systems (SIES 2009), pp. 136–139. IEEE, Lausanne (July 2009)
[15] Meli, M., Sommerhalder, M.G.M.: Using IEEE 802.15.4 / ZigBee in audio applications. Embedded World, Nuremberg (February 2006)
[16] Chen, F., Talanis, T., German, R., Dressler, F.: Real-time enabled IEEE 802.15.4 sensor networks in industrial automation. In: IEEE International Symposium on Industrial Embedded Systems, SIES 2009 (2009) ISBN: 978-1-42444109-9, INSPEC Accession Number: 1081428
[17] Collotta, M., Salerno, V.M.: A real-time network based on IEEE 802.15.4 / ZigBee to control home automation environment. In: International forum “Modern Information Society Formation – Problems, Perspectives, Innovation Approaches”, St.-Petersburg, Russia, June 6-11 (2010)
[18] Casey, P.R., Tepe, K.E., Kar, N.: Design and Implementation of a Testbed for IEEE 802.15.4 (Zigbee) Performance Measurements, Research Article. EURASIP Journal on Wireless Communications and Networking 2010, article ID 10340611, 11 pages, doi:10.1155/2010/103406
[19] Chen, F., Wang, N., German, R., Dressler, F.: Simulation study of IEEE 802.15.4 LRWPAN for industrial application. Wirel. Commun. Mob. Comput. 10, 609–621 (2010), published online in Wiley InterScience http://www.interscience.wiley.com, doi: 10.1002/wcm.736
[20] IEEE 802.15.4-2006 Standard for Information technology- Telecommunications and information exchange between systems- Local and metropolitan area networks- Specific requirements Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs). IEEE Computer Society. (Revision of IEEE Std 802.15.4-2003) (September 8, 2006) ISBN 0-7381-4996-9
[21] ZigBee Alliance (2005), ZigBee specification (2004), http://www.zigbee.org.Zigbee
[22] Van Nieuwenhuyse, A., Alves, M., Koubâa, A.: On the use of the ZigBee protocol for Wireless Sensor Networks, HURRAY-TR-060603 (June 26, 2006)
[23] http://www.omnetpp.org/
[24] Chen, F.: OMNeT++ IEEE 802.15.4 Model for OMNeT++/INET Framework, http://www7.informatik.unierlangen.de/~fengchen/omnet/802154/
Honeynet Based Botnet Detection Using Command Signatures
J.S. Bhatia, R.K. Sehgal, and Sanjeev Kumar
Cyber Security Technology Division, CDAC Mohali, INDIA 160071
{jsb,rks}@cdacmohali.in, [email protected]
Abstract. Global Internet threats are undergoing a profound transformation from attacks designed solely to disable infrastructure to those that also target people and organizations. This alarming new class of attacks directly impacts the day-to-day lives of millions of people and endangers businesses and governments around the world. At the centre of many of these attacks is a large pool of compromised computers located in homes, schools, businesses, and governments around the world. Attackers use these zombies as anonymous proxies to hide their real identities and amplify their attacks. Bot software enables an operator to remotely control each system and group them together to form what is commonly referred to as a zombie army or botnet. A botnet is a network of compromised machines that can be remotely controlled by an attacker. In this paper we propose an approach that uses honeynet data collection mechanisms to detect IRC- and HTTP-based botnets. We have evaluated our approach using real-world network traces.
1 Introduction
'Bot' is a shortened form of ‘robot’: a program that operates as an agent enabling a user or another program to simulate a human activity. An attacker can control a large number of bots in a botnet using a single command. Botnets (or networks of zombies) are recognised as one of the most serious security threats today. Fig. 1 depicts the communication flow in a botnet. Our research makes several contributions. First, we propose a behaviour-based approach to identify both IRC and HTTP C & C in a port-independent manner by extracting command sequences from network traffic. Second, we develop a system based on our behaviour-based algorithm. The rest of the paper is organised as follows. In Section 2, we provide background on botnet C & C and the motivation for our botnet detection approach. In Section 3, we describe the usefulness of honeypots in our detection approach. In Section 4, we present the architecture of our system and describe its detection algorithm in detail. In Section 5, we present experiments and results, and we conclude in Section 6.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 69–78, 2011. © Springer-Verlag Berlin Heidelberg 2011
1.1 Background and Motivation
Bot
A bot is a piece of malware that installs itself on a weakly protected computer by exploiting the vulnerabilities available in the machine. By converting the victim into a zombie computer, a bot adds the machine to a network of zombies, called a botnet, which is remotely controlled by a set of masters known as botnet controllers.
Botnet
A botnet is a network of compromised computers maintained and controlled by a set of bot masters. These masters use bots to increase and control the number of zombies in the network. Bot masters control the botnet through a command-and-control (C&C) mechanism and are therefore often referred to as C&C servers. C&C servers often liaise with other C&C servers to achieve greater redundancy [Bot].
In the last few years, the botnet phenomenon has attracted the attention of the security research community. One of the first systematic studies was published in March 2005 by the Honeynet Project, which studied about 100 botnets over a period of four months [3]. A more methodical approach was introduced by Freiling et al., who used the same amount of botnet data for their study [4]. Cooke et al. outlined the origins and structure of botnets and presented some results based on a distributed network sensor system and honeypots [5]; they do not give detailed results that characterize the extent of the botnet problem. Compared with these studies, our solution automates the analysis of binary samples in a honeynet environment. We propose a fully automated prototype to detect the C & C server in IRC and HTTP botnets. This lets us observe trends and long-term effects of the botnet phenomenon, such as the average lifetime of a botnet, which was not possible in previous studies. A transport-layer-based botnet detection approach was introduced by Karasaridis et al. [7]. They use passive analysis based on flow data to characterize botnets and were able to detect several hundred controllers over a period of seven months. However, such a flow-based approach cannot provide insight into the botnet and the control structure itself. In our study, we can also observe the commands issued by botherders, the malware binary executables used, and similar internal effects of a botnet. Canavan [8] and Barford and Yegneswaran [9] presented an alternative perspective on IRC-based botnets based on in-depth analysis of bot source code. We also analysed the source code of several bot families, such as SdBot and Agobot, which can be freely downloaded from the Internet, to get a better understanding of some of the effects we monitored during our observations. In our study, we focus only on the C & C server and the commands exchanged between bot and botnet.
2 Our Proposed Approach
We have developed an algorithm based on the run-time network behaviour of bots and the corresponding command sequences used in the conversation between a bot and its C & C server. We propose an approach that uses network-based anomaly detection to identify C & C command sequences. Fig. 1 shows our malware collection prototype. The goal of malware collection is to collect as many binaries as possible. However, developing a scalable and robust infrastructure to achieve this goal is a challenging problem in its
own right, and has been the subject of numerous research initiatives (e.g., [10, 11]). In particular, any malware collection infrastructure must support a wide array of data collection endpoints and should be highly scalable. The goal of our collection prototype is to collect as many binaries as possible by varying the services and configurations of the honeypots. We have established the Distributed Honeynet Prototype using three different Internet service providers. Before we can discover the risks in the network, we need to discover how attack code interacts with the system. To this end, we propose a collection system that gathers malware to be dynamically analysed. This system also protects against significant involvement in attacks after the bot has been run on the system. It uses firewall and intrusion prevention techniques, such as limiting or dropping packets leaving the protected network. Our proposed architecture systematically collects malware over the Internet.
Fig. 1. Malware Collection Framework
2.1 Architecture and Algorithm
In this section we discuss the components of the algorithm without any specific tools in mind; any tool that can perform the tasks described here can be used as part of the solution. Fig. 2 illustrates the logical structure of the proposed solution. The input to our system is the set of bot binaries collected via the honeynet system, a malware collection platform. There are three main components: Honeynet Based Execution, Payload Parser and Correlation System Component. We collected the malware through a distributed deployment of the malware collection framework; the samples are then passed to the Symantec Anti-Virus engine, which classifies them as bot or non-bot samples. The bot binaries are then automatically passed to the honeynet-based open analysis environment.
2.2 Honeynet Based Execution Environment
We have developed a honeynet-based open analysis execution environment in which bot binaries are executed for 30 minutes at a time, using different timestamps. We set up a VMware environment on a server with an Intel processor running a fully patched instance of Windows XP, assigned it a static, public IP address and infected it with one bot for a period of 30 minutes. Our code reads the bot binaries from a file one by one and executes each of them in the open analysis environment; the results generated, including the complete payload, are sent to the central server as a file.
Fig. 2. System Architecture
Our open analysis system provides a connection to the Internet. The honeynet-based execution environment allows us to inject a malicious bot sample into a system and let it connect back to its original destination. This enables us to isolate the bot from the network and monitor its traffic in a more controlled way, instead of waiting to be infected and then monitoring the traffic passively. Its traffic is then segregated by application content so that network behaviour can be observed in terms of source IP address, destination IP address, source port, destination port and command sequences for each flow.
Network Monitoring
In this section, we discuss how we record the bot network traffic that our system needs in order to analyse it for the presence of response activity. Since no other applications run and generate traffic, the bot accounts for all network traffic under its host VM's IP address. Once the response activity is located, we can extract a snippet from the network traffic that precedes the start of the response and thus likely
contains the corresponding command. Moreover, we can collect behaviour profiles, which describe the properties of the bot response behaviour. Note that we have made the deliberate decision to observe the behaviour of the bots while they are connected to the actual botnet. This allows us to detect command and control traffic without any prior knowledge of the protocol and commands used between the bot and the botmaster. At the same time, we do not want the bots under analysis to engage in serious and destructive malicious activity such as denial-of-service attacks. We therefore use a firewall that rate-limits all outbound network traffic. After every 30-minute data capture period, the virtual machine is recreated in a clean state before the next bot sample is executed.
Payload Extractor
After the bot binary has been executed, the complete payload is extracted and sent to the central server, where the payload parser extracts the command token signatures for IRC and HTTP botnets. To capture all network traffic generated by the virtual environment, we use a Honeywall. The Honeywall is able to capture all network packets that are sent and received by the image. These packets are merged into a PCAP file and sent to the central server.
Packet Filter
The data set needs to be reduced stepwise to the meaningful subset of flows. Selecting the cut-off for quick filtering requires both quantitative statistical information and human judgement. The first filter selects TCP-based packets only. The second filter removes packets carrying SYN and RST flags. Flows containing only TCP packets with SYN and RST flags indicate that communication was never established, and so provide no information about botnet C & C flows; no application-level data was transferred by these flows. Unfortunately, on today's Internet, probes for system vulnerabilities are commonplace. While SYN-RST exchanges indicate suspicious activity that may be worth investigating, they do not help characterise botnet C & C flows.
Command Token Based Payload Parser
We feed the composite payload corresponding to a bot binary to our payload parser, which extracts the activity response (e.g., scanning, spamming, binary update) and message response (e.g., IRC PRIVMSG) command sequences from the payload. The payload parser uses a port-independent protocol matcher to find suspicious IRC and HTTP traffic. This port-independent property is important because many botnet C & C channels may not use the regular ports. IRC and HTTP connections are relatively simple to recognise. For example, an IRC session begins with connection registration (defined in RFC 1459), which usually has three messages: PASS, NICK and USER. We can easily recognise an IRC connection using lightweight payload inspection, e.g., inspecting only the first few bytes of the payload at the beginning of a connection. The HTTP protocol is even easier to recognise because the first few bytes of an HTTP request have to be “GET”, “POST” or “HEAD”.
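The packet filter and the port-independent protocol matcher described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation: the `Packet` structure is a hypothetical, already-decoded packet summary, and the rule "keep a flow if at least one TCP packet carries neither SYN nor RST" is one possible interpretation of the second filter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

IRC_REGISTRATION = (b"PASS", b"NICK", b"USER")   # RFC 1459 connection registration
HTTP_METHODS = (b"GET", b"POST", b"HEAD")        # request methods named in the text


@dataclass
class Packet:
    """Hypothetical decoded packet summary."""
    protocol: str                 # e.g. "tcp" or "udp"
    flags: FrozenSet[str]         # e.g. frozenset({"SYN", "ACK"})
    payload: bytes = b""


def is_candidate_flow(packets: List[Packet]) -> bool:
    """Keep TCP flows that contain at least one packet without SYN or RST."""
    tcp = [p for p in packets if p.protocol == "tcp"]
    return any(not (p.flags & {"SYN", "RST"}) for p in tcp)


def classify_payload(first_bytes: bytes) -> str:
    """Guess the protocol from the first token of a connection, ignoring ports."""
    parts = first_bytes.split(None, 1)
    if not parts:
        return "unknown"
    token = parts[0]
    if token in HTTP_METHODS:
        return "http"
    if token.upper() in IRC_REGISTRATION:
        return "irc"
    return "unknown"
```

Because only the first token of the first payload is inspected, the check is cheap enough to run on every new connection, whatever port it uses.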
Correlation System Component
Our next stage, correlation, looks for relationships between two or more bot binaries that suggest they are part of the same botnet. The question of whether one bot is correlated with another only makes sense if the two are connected to the same C & C server. There are several temporal correlation algorithms for this purpose, but all are computationally expensive. We therefore apply our own algorithm, which assigns such flows to the same cluster. We use the payload command signatures and the network fingerprint of the bot binaries' flows: if two bots are connected to the same C & C server and receive the same type of command sequence, they are clustered into one group.
2.3 Experiment and Analysis Results
In our experimental set-up, VMware Workstation is used to create the default installation of Windows XP. Network traffic and system traces are captured and analysed in a live execution environment. To capture all the data generated by the virtual environment, we use a Honeywall. The Honeywall is able to capture all network packets that are sent and received by the image. These packets are merged into a PCAP file and sent to the central server every hour. We currently find it useful to split the PCAP files into 1-hour segments, which makes it easier to locate suspicious data. We feed the PCAP data to our payload parser, which filters out the unused data and extracts the command sequences in the conversation between bot and C & C server. Our algorithm is able to detect IRC- and HTTP-based C & C servers. With the help of our malware collection framework we collected 650 unique malware samples from November 2009 to July 2010. Of these, 59 malware samples were classified as bots by the AV engine. Most of the bots that we actively examined use some type of systematic scan, presumably for propagation. Most of these were ICMP ping scans.
Approximately 37.5% of the samples performed ICMP ping scans on different subnets. Fig. 3 is a snapshot captured using Wireshark [17].
Applying to IRC: Most of the IRC communication is with specific IP addresses; one of the samples downloads abc.exe from 58.240.104.57 using port number 5751. Fig. 4 is a snapshot of the followed TCP stream, which includes the USER, NICK, MODE, JOIN and USERHOST commands. Sebek traces also show that most of the samples ran Cmd.exe, ping.exe, svchost.exe, HelpSvc.exe, explorer.exe, cndrive32.exe and msvmiode.exe. Most of the propagation scan activity is performed by the asc command. The command used for the propagation scan is .asc <port#><delay><switches>. For example, .asc exp_all 25 5 0 -b -r -e corresponds to a randomised (-r switch), Class B (-b switch) subnet scan using 25 threads with a 5-second delay for an infinite amount of time. Rarely does a piece of malware designate a time for the scan to finish, so the 0 is used to express an infinite amount of time.
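For analysis purposes, a captured scan command such as the example above can be split into named fields. The sketch below infers the field order (target, threads, delay, duration, switches) from the single worked example in the text; this order, the dictionary keys and the function name are assumptions, and the meaning of the -e switch is not given in the text, so it is only passed through.

```python
def parse_asc(command: str) -> dict:
    """Split a captured '.asc' scan command into labelled fields.

    Field order is inferred from the example '.asc exp_all 25 5 0 -b -r -e'.
    """
    parts = command.split()
    if len(parts) < 5 or parts[0] != ".asc":
        raise ValueError("not a recognised .asc command")
    target = parts[1]
    threads, delay, duration = int(parts[2]), int(parts[3]), int(parts[4])
    switches = set(parts[5:])
    return {
        "target": target,
        "threads": threads,
        "delay_s": delay,
        "duration": duration,
        "infinite": duration == 0,        # 0 expresses an unbounded scan
        "class_b": "-b" in switches,
        "randomised": "-r" in switches,
        "switches": sorted(switches),
    }
```

A parser of this kind only labels what was observed in the payload; it does not execute or reproduce any scanning behaviour.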
Fig. 3. ICMP Ping scans
Applying to HTTP: For HTTP botnets, we observed the communication of bots with web-based C & C servers by identifying the GET, HEAD and POST parameters. Most of our results show HTTP-based C & C communication and the download of some executable. The following snapshots show the HTTP communications.
Fig. 4. IRC Bot Communication
Below is a snapshot captured using the Wireshark tool, followed by the result for one of the samples, showing that it downloads an executable from 208.053.183.222:
203.129.220.214.01044-208.053.183.222.00080: GET /0calc.exe HTTP/1.1
203.129.220.214.01043-208.053.183.124.00080: GET /mjsn.exe HTTP/1.1
208.053.183.222.00080-203.129.220.214.01044: HTTP/1.1 200 OK
Figure 5 shows a snapshot of secondary infection using HTTP communication. From our experimental results and analysis, we conclude that most C&C servers of type IRC use ICMP scans; the IRC tokens found in the payload are PING, PONG, JOIN, USER, MODE, PRIVMSG and NICK, and the attack-specific commands found in the payload are DDOS and VSCAN. Most C&C servers of type HTTP also use ICMP scans; the HTTP tokens found in the payload are HTTP, GET and POST, and these bots download exe files such as /rbf.exe and /0calc.exe and send spam mails.
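The flow records listed above follow a fixed layout (zero-padded source address and port, a hyphen, zero-padded destination address and port, then the first line of the payload). As a minimal sketch, assuming that layout holds for every record, the following Python code parses such lines and lists the executables requested over HTTP. The function names are hypothetical and are not part of the authors' system.

```python
import re

# Zero-padded "src_ip.port-dst_ip.port: message", as in the listed records.
FLOW_LINE = re.compile(
    r"^(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5})-"
    r"(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5}):\s*(.*)$"
)


def _plain_ip(padded: str) -> str:
    """Turn '208.053.183.222' into '208.53.183.222'."""
    return ".".join(str(int(octet)) for octet in padded.split("."))


def parse_flow_line(line: str):
    """Return a dict describing one flow record, or None if it does not match."""
    match = FLOW_LINE.match(line.strip())
    if match is None:
        return None
    src, sport, dst, dport, message = match.groups()
    return {
        "src": _plain_ip(src), "sport": int(sport),
        "dst": _plain_ip(dst), "dport": int(dport),
        "message": message,
    }


def exe_downloads(lines):
    """List (server, port, path) for HTTP requests that fetch a .exe file."""
    found = []
    for line in lines:
        rec = parse_flow_line(line)
        if rec is None:
            continue
        fields = rec["message"].split()
        if (len(fields) >= 2 and fields[0] in ("GET", "POST", "HEAD")
                and fields[1].lower().endswith(".exe")):
            found.append((rec["dst"], rec["dport"], fields[1]))
    return found
```

Grouping the returned tuples by server address gives a simple view of which hosts serve secondary payloads to the samples.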
Fig. 5. HTTP snapshot
3 Conclusion
Botnets have become the most serious threat to Internet security, and many cybercrimes are botnet related. Botnet detection is a relatively new and very challenging research area. Our results show that the botnet problem is of global scale. We presented a system and architecture for network-based botnet detection. In future work, we are moving towards P2P botnets.
Acknowledgments. We would like to thank the Malware Collection team of the Cyber Security Technology Division at CDAC, Mohali, for their help in collecting the malware and making it available for further analysis. We are also very thankful to the Executive Director of CDAC, Mohali, for providing us with full support.
Bandwidth Probe: An End-to-End Bandwidth Measurement Technique in Wireless Mesh Networks
Ajeet Kumar Singh and Jatindra Kr Deka
Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati - 781039, India
{s.ajeet,jatin}@iitg.ernet.in
Abstract. End-to-end bandwidth estimation is important for network management and monitoring. It can also improve the effectiveness of congestion control mechanisms, audio/video stream adaptation and dynamic overlays. In recent years, many techniques have been developed for bandwidth estimation in wired as well as last-hop wireless networks, but they under-perform in Wireless Mesh Networks (WMNs) because the wireless multi-hop environment is not well understood. In this paper, we present an active bandwidth measurement technique called Bandwidth Probe, based on the packet dispersion principle. It measures the steady-state bandwidth of the system while considering the effects of FIFO cross traffic and CSMA/CA-based contending traffic. We also show how to reach the stationary-state behavior of the system so as to limit the number of probe packets. In simulation, Bandwidth Probe gives an accurate estimate of the available bandwidth with lower convergence time and lower intrusiveness.
Keywords: Bandwidth Probe, WMN, bandwidth estimation, packet train.
1 Introduction
Wireless Mesh Networks (WMNs) [1] are based on the IEEE 802.11s WLAN standard [3][21]. Because of their different architecture, WMNs can decrease the operational and infrastructure cost of traditional wireless networks: they are built entirely around the wireless medium and are self-organizing. They can also resolve problems of ad-hoc networks [24][22], such as loose connectivity and limited coverage area, by keeping some nodes stationary to provide a wireless backbone that serves the clients. In practice, however, there is a lack of end-to-end tools to estimate resources such as path capacity and available bandwidth, which are essential for congestion avoidance [2], video/audio stream adaptation and dynamic overlays [10]. Knowledge of these resources can improve the performance of WMNs. Compared with wired networks and last-hop wireless networks, WMNs have the following properties, which can cause errors in bandwidth estimation.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 79–90, 2011. © Springer-Verlag Berlin Heidelberg 2011
A.K. Singh and D.K. Jatindra
1 Fading and Interference. The properties of wireless channels are highly variable due to fading and interference. Other, potentially hidden, stations implemented on the same or different radio standards and using the same frequency band create interference on the wireless medium, which often affects WMNs because of their multiple-radio configuration. The resulting large changes in the signal-to-noise ratio lead to high bit error rates. Different coding schemes combined with rate adaptation may be used to compensate, so the available bandwidth can change dramatically.
2 Contention. Wireless nodes share the same medium and contend for access to the channel. To avoid collisions, stations listen to the channel to detect nearby transmissions. Access is controlled by the MAC protocol, and bandwidth estimation usually assumes that nodes get channel access in FIFO order [4]. This assumption may fail in the presence of hidden stations, so additional mechanisms such as CSMA/CA [23] are needed; these handle contention in a fully distributed manner and allocate the channel to nodes randomly.
3 Frequent packet loss. Wireless systems manage packet delivery with a stop-and-wait ARQ technique. Retransmissions consume channel capacity and lead to varying one-way delays, which can affect bandwidth estimation.
Two methods are available for bandwidth estimation: the direct method and the iterative method [5]. Spruce [6], WBest [7] and IGI [8] use direct probing, which assumes that the link effective capacity ($C_{Effective}$) is already known and then calculates the available bandwidth ($A$) using the rate response model [14]. WBest is a two-step algorithm: in the first step it evaluates the link $C_{Effective}$, and then it sends a packet train to evaluate $A$. Spruce assumes that $C_{Effective}$ is already known and directly applies the rate response model to calculate $A$. IGI uses probing trains with increasing gaps to evaluate $A$.
TOPP [9], DietTopp [12] and Pathchirp [13] use the iterative method, which does not require prior knowledge of the link $C_{Effective}$. They use multiple probing rates to observe the behavior of the rate response model and its turning point. TOPP uses trains of packet pairs with increasing rates and applies the rate response model to estimate $A$. DietTopp handles the multiple-node case with varying probing rates. Pathchirp increases the probing rate within a single probe train (chirp). These bandwidth estimation tools yield highly unreliable values in WMNs because their measurement models consider only the effect of cross traffic. As discussed, they are based on the rate response model, which assumes single bit-carrier multiplexing of several users in FIFO order; this does not apply to WMNs. Contention among users is handled by CSMA/CA, which often does not follow the FIFO assumption, and nodes get channel access in a distributed manner. Fig. 1 shows the two kinds of traffic in WMNs: cross traffic and contending traffic.
Fig. 1. Wired-Wireless Testbed Setup
Motivated by the challenges in the existing model and the properties of WMNs, we propose a new model in line with the rate response curve, discussed in the next section. Based on the proposed model, we have developed a new algorithm, "Bandwidth Probe", which is specially tailored to WMNs for the estimation of $A$. It is a single-stage algorithm that sends packet trains with a certain spacing between the packets. The spacing depends on the input rate of the packet trains, under the assumption that the input rate is always greater than the $C_{Effective}$ of the network. Our main objective is to consider the effect of both cross traffic and CSMA/CA-based contending traffic in the steady-state system and to reduce random wireless error during the bandwidth calculation. The rest of the paper proceeds as follows. Section 2 gives the background work and the proposed model. Section 3 describes the bandwidth estimation algorithm, Bandwidth Probe, and its dispersion model, which shows its actual behavior. Section 4 presents the analysis, experimental simulation and comparison with existing tools and methods.
2 Background and Proposed Model
The rate response curve [14][15] is one of the fundamental models for bandwidth estimation. It makes a fluid assumption for the cross traffic, namely that it traverses the same FIFO queue as the probing flow. Under this assumption, and with $C_{Effective}$ of the network already known, $A$ is given by

$$A = C_{Effective}\,(1 - \mu) \qquad (1)$$

where $\mu$ is the fraction of the capacity utilized by the cross traffic. If the input rate and output rate of the probe flow are $r_i$ and $r_o$ respectively, the rate response curve of the network in the presence of cross traffic is

$$r_o = \begin{cases} r_i & \text{if } r_i \le A \\ \dfrac{r_i\,C_{Effective}}{r_i + C_{Effective} - A} & \text{if } r_i > A \end{cases} \qquad (2)$$

We can also estimate the available bandwidth directly if $r_i = C_{Effective}$. In that case,

$$A = C_{Effective}\left(2 - \frac{C_{Effective}}{r_o}\right) \qquad (3)$$

The probe rates can be expressed in terms of the input gap ($\Delta_i$) and output gap ($\Delta_o$) between the packets of a pair and the packet size ($L$): $r_i = L/\Delta_i$ and $r_o = L/\Delta_o$.

The rate response curve above considers only the effect of cross traffic. Its assumption that all nodes get channel access under a FIFO mechanism no longer holds in a CSMA/CA-based MAC environment, because CSMA/CA handles contention in a fully distributed manner and nodes gain access to the channel randomly. So $A$ cannot be accurately derived from (2) in WMNs. Figure 2 shows the typical model of a CSMA/CA-based wireless system and its traffic behavior.

Fig. 2. Model of CSMA/CA based system

To deal with the effect of CSMA/CA we consider an extra parameter, the achievable throughput ($T_{Achievable}$). $T_{Achievable}$ is the average packet dispersion rate at the receiver side. It measures the bandwidth along the direction of the probe traffic, from which the accurate value of $A$ is later inferred. With the help of this new parameter we propose a new model for the rate response curve (2) that is suitable for WMNs. For probe traffic with input rate $r_i$, the output rate is $r_o = \min(r_i, T_{Achievable})$, and the dispersion measured at the receiver side is $\Delta_{Dis} = \max(\Delta_{Sender}, \Delta_{Receiver})$. These definitions make clear that $T_{Achievable} \le r_i$ always holds. The proposed model for WMNs is as follows:

$$T_{Achievable} = \begin{cases} r_i & \text{if } r_i \le A \\ \dfrac{r_i\,C_{Effective}}{r_i + C_{Effective} - A} & \text{if } r_i > A \end{cases} \qquad (4)$$
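As a rough illustration of model (4), the sketch below evaluates the curve and inverts its second branch to recover $A$. The function names are hypothetical and the rates are in arbitrary but consistent units; it is only a numerical check of the formula, not part of the authors' tool.

```python
def model_throughput(r_i, c_eff, a):
    """Model (4): achievable throughput for probe input rate r_i."""
    if r_i <= a:
        return r_i
    return r_i * c_eff / (r_i + c_eff - a)


def invert_for_a(r_i, c_eff, t_ach):
    """Solve the r_i > A branch of (4) for A, given a measured t_ach."""
    return r_i + c_eff * (1.0 - r_i / t_ach)


# Example with assumed values: capacity 6, available bandwidth 2, probe rate 8.
t = model_throughput(8.0, 6.0, 2.0)   # 8 * 6 / (8 + 6 - 2)
a = invert_for_a(8.0, 6.0, t)         # recovers the assumed available bandwidth
```

With $r_i = C_{Effective}$ the inversion reduces to the direct form $A = C_{Effective}(2 - C_{Effective}/T_{Achievable})$, which mirrors (3).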
3 Bandwidth Estimation Algorithm
Bandwidth Probe depends on the proposed model in eq. (4) and uses a similar mechanism, in which the rate of a packet train is converted into a certain spacing between the train's packets. This relates directly to the gap model of packet pair dispersion [16][20] at the receiver side. It uses the independent packet pairs within the packet trains to calculate $C_{Effective}$ and the packet trains as a whole to calculate $T_{Achievable}$. To reduce the effect of interference and the uncertain nature of the wireless environment, it uses the mean values of $C_{Effective}$ and $T_{Achievable}$ to calculate $A$. In a single probe, it sends $N$ packet trains of $n$ packets each to capture the steady-state behavior of the system and to detect packet queuing behavior and transmissions [15] at the wireless node and access point. It sends each packet train under the assumption $r_i \ge C_{Effective}$, which produces the smallest gap between the packets and so yields the narrow link capacity estimate [17].
3.1 Bandwidth Probe
Bandwidth Probe is a single-stage algorithm with three parts, which calculate $C_{Effective}$, $T_{Achievable}$ and $A$. $C_{Effective}$ and $T_{Achievable}$ are calculated in parallel. The sender sends the packet train with a time gap $\Delta_{Sender}$ between consecutive packets, and the receiver receives them with time gaps $\Delta_{Receiver}$. Let $T_{Send_i}$ and $T_{Send_{i+1}}$ be the sending times of the $i$th and $(i+1)$th packets, and $T_{Receive_i}$ and $T_{Receive_{i+1}}$ their receiving times.
Table 1. Bandwidth Probe Algorithm

1 Measuring Effective Capacity
    C_e and C_i: intermediate parameters
    Set C_e = 0
    for i ← 0 to N − 1 do
        Set C_i = 0
        for j ← 0 to (n − 2)/2 do
            if ΔSender_j ≥ ΔReceiver_j then
                ΔDis_j = ΔSender_j
            else
                ΔDis_j = ΔReceiver_j
            end
            C_i += L / ΔDis_j
        end
        C_e += C_i / (n/2)
    end
    C_Effective = C_e / N

2 Measuring Achievable Throughput
    ΔDis and T_A: intermediate parameters
    Set T_A = 0
    for i ← 0 to N − 1 do
        Set ΔDis = 0
        for j ← 0 to n − 2 do
            if ΔSender_j ≥ ΔReceiver_j then
                ΔDis_j = ΔSender_j
            else
                ΔDis_j = ΔReceiver_j
            end
            ΔDis += ΔDis_j
        end
        ΔTotal = ΔDis / (n − 1)
        if ΔSender ≥ ΔTotal then
            T_A += L / max(ΔSender + k(n), ΔTotal)
        else
            T_A += L / ΔTotal
        end
    end
    T_Achievable = T_A / N

3 Actual Available Bandwidth
    if C_Effective == r_i then
        if T_Achievable ≥ C_Effective / 2 and C_Effective ≠ 0 then
            A = C_Effective [2 − C_Effective / T_Achievable]
        else
            A = 0
        end
    else
        A = r_i + C_Effective [1 − r_i / T_Achievable]
    end
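The three parts of Table 1 can be sketched in Python as follows. This is one reading of the table, not the authors' code: it assumes each train is given as a pair of timestamp lists (seconds, strictly increasing), takes the independent pairs of part 1 to be packets (0,1), (2,3), and so on, uses the mean sender gap as $\Delta_{Sender}$ in part 2, and treats `k_n` as a supplied constant. The name `bandwidth_probe` and its signature are hypothetical.

```python
import math


def _gaps(times):
    """Gaps between consecutive timestamps."""
    return [b - a for a, b in zip(times, times[1:])]


def bandwidth_probe(trains, packet_bits, r_i, k_n=0.0):
    """trains: list of (send_times, recv_times); returns (C_eff, T_ach, A) in bit/s."""
    c_sum = 0.0
    t_sum = 0.0
    for send_times, recv_times in trains:
        sender = _gaps(send_times)
        receiver = _gaps(recv_times)
        # Dispersion per gap: the larger of sender gap and receiver gap.
        dis = [max(s, r) for s, r in zip(sender, receiver)]

        # Part 1: per-pair rate averaged over the independent pairs
        # (assumed here to be every second gap).
        pair_dis = dis[0::2]
        c_sum += sum(packet_bits / d for d in pair_dis) / len(pair_dis)

        # Part 2: rate derived from the mean dispersion of the whole train.
        delta_total = sum(dis) / len(dis)
        delta_sender = sum(sender) / len(sender)
        if delta_sender >= delta_total:
            t_sum += packet_bits / max(delta_sender + k_n, delta_total)
        else:
            t_sum += packet_bits / delta_total

    c_eff = c_sum / len(trains)
    t_ach = t_sum / len(trains)

    # Part 3: available bandwidth from the proposed model.
    if math.isclose(c_eff, r_i, rel_tol=1e-9):
        if c_eff != 0 and t_ach >= c_eff / 2:
            a = c_eff * (2.0 - c_eff / t_ach)
        else:
            a = 0.0
    else:
        a = r_i + c_eff * (1.0 - r_i / t_ach)
    return c_eff, t_ach, a


# Two identical 4-packet trains: 1 ms sender gaps, 2 ms receiver gaps, 8000-bit packets.
demo = [([0.0, 0.001, 0.002, 0.003], [0.010, 0.012, 0.014, 0.016])] * 2
c_eff, t_ach, a = bandwidth_probe(demo, packet_bits=8000, r_i=8e6)
```

In this synthetic case every gap is stretched uniformly, so $C_{Effective}$ and $T_{Achievable}$ coincide and the estimate of $A$ equals $C_{Effective}$, which corresponds to an idle network in the model.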
The sender-side and receiver-side gaps are
$\Delta_{Sender_i} = T_{Send_{i+1}} - T_{Send_i}$; $\Delta_{Receiver_i} = T_{Receive_{i+1}} - T_{Receive_i}$
so $\Delta_{Dis_i} = \max(\Delta_{Sender_i}, \Delta_{Receiver_i})$, where $\Delta_{Dis_i}$ is the dispersion measurement of the packet pair in the packet train at the receiver side. Assume $k(n)$ is the average service delay of the hop workload caused by cross traffic. The complete Bandwidth Probe algorithm is described in Table 1.
3.2 Bandwidth Estimation during Probe Packet Loss
If probe packets are lost, Bandwidth Probe runs only on the packet pairs of the train that are received successfully. It then passes $A$ through a low-pass filter, where $A_{old}$ and $A_{current}$ are the old and current available bandwidths respectively:
$$A = 0.93\,A_{current} + 0.07\,A_{old} \qquad (5)$$
To set the constants in (5) for the packet-loss situation, we performed an experiment in which we calculated the throughput of the running application.
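A minimal sketch of the two steps in this subsection is given below, assuming that received packets are recorded as a mapping from sequence number to receive time. The helper names `surviving_gaps`, `smooth` and `filter_series` are hypothetical; only the weights 0.93 and 0.07 come from (5).

```python
def surviving_gaps(recv_times):
    """Receiver gaps for consecutive packets that both arrived.

    recv_times maps sequence number -> receive time (seconds) for packets
    that were received; lost packets are simply absent from the mapping.
    """
    return {
        seq: recv_times[seq + 1] - t
        for seq, t in sorted(recv_times.items())
        if seq + 1 in recv_times
    }


def smooth(a_current, a_old, weight=0.93):
    """Low-pass filter of (5): weight on the current estimate, rest on the old one."""
    return weight * a_current + (1.0 - weight) * a_old


def filter_series(estimates, weight=0.93):
    """Apply (5) across successive estimates; the first estimate seeds the filter."""
    smoothed = []
    a_old = None
    for a_current in estimates:
        a_old = a_current if a_old is None else smooth(a_current, a_old, weight)
        smoothed.append(a_old)
    return smoothed


# Packet 2 was lost, so only the gaps starting at packets 0 and 3 survive.
gaps = surviving_gaps({0: 0.000, 1: 0.002, 3: 0.006, 4: 0.008})
series = filter_series([4.0, 2.0])
```

Because the weight on the current sample is high, the filter follows changes quickly and only mildly damps single outliers.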
3.3 Synchronization and Clock Skew Issues
For successful bandwidth estimation, the system needs to rely on a perfectly synchronized clock [25]. Suppose $\delta$ is the time offset between the sender and the receiver:
$$\Delta_{Sender_i} = T_{Send_{i+1}} - T_{Send_i}; \quad \Delta_{Receiver_i} = T_{Receive_{i+1}} - T_{Receive_i} - 2\delta; \quad \Delta_{Dis_i} = \max(\Delta_{Sender_i}, \Delta_{Receiver_i}) \qquad (6)$$
Eq. (6) gives the dispersion of the packet pair. Since it accounts for the time offset between the sender and receiver, the calculated $A$ is not affected by synchronization issues, and the measured dispersion samples produce a good estimate. Clock drift can be mitigated by taking the mean value of the dispersion samples.
3.4 Length of Packet Train and Input Gap between Packets
The length of the packet train is very important for accuracy, convergence and intrusiveness. We performed an experiment with the same scenario as in Section 4.1 to study the service delay and to determine the train length associated with the transient state. Service delay is the time a packet waits at the head of the transmission queue until it gains access to the channel and is completely transmitted. The transient state is the state in which the system is neither empty nor in backlog when the probing packet is transmitted. The transitory is maximum when the probe traffic and cross traffic send their fair share. To provide synchronization, we use a syn-server that connects both sender and receiver. From Fig. 3, we infer that the service delay of the packets is initially low, but its distribution gradually changes as more and more packets reach the queue. To obtain a practical train length, we repeated the experiment more than 50 times. After sending 140 packets, the system reached the steady state in every trial. The gap between the packets depends on $r_i$: $\Delta_{Sender} = L / r_i$. The probing sequence follows a Poisson distribution to ensure proper interaction with the system, and we assume no context switch within a packet train.
Fig. 3. Behavior of the system – Service Delay (sec) vs Number of Packets
3.5 What the Algorithm Really Does
In WMNs, dispersion due to both contending and cross traffic occurs at the wireless nodes and the access point, so traffic is not served in a FIFO manner. This causes a random delay between two successive packets. Hence, to trace the behavior of WMNs, Bandwidth Probe measures two variables, $C_{Effective}$ and $T_{Achievable}$. $C_{Effective}$ indicates the maximum capability of the wireless network to deliver network-layer traffic. Wireless networks adapt their sending rate dynamically, so $C_{Effective}$ is defined as a continuous function of packet size $L$ and time $t$:
$$C_{Effective} = \frac{\int_{t_1}^{t_2} \frac{L}{\Delta(t)}\,dt}{t_2 - t_1} \qquad (7)$$
where $\Delta(t)$ is the packet pair dispersion at time $t$. This can also be written in discrete form:
$$C_{Effective} = \frac{\sum_{i=0}^{n-1} \frac{L}{\Delta(i)}}{n} \qquad (8)$$
where $\Delta(i)$ is the dispersion of the $i$th packet pair. The second parameter, $T_{Achievable}$, measures the dispersion of the packets due to contention between the probe and contending traffic:
$$T_{Achievable} = \frac{L}{\frac{1}{n}\sum_{i=0}^{n-1} \Delta(i)} \qquad (9)$$
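To make the difference between (8) and (9) concrete, the sketch below computes both from the same list of dispersion samples. The function names and sample values are illustrative assumptions. Equation (8) averages the per-pair rates, while (9) divides by the average dispersion, so for identical samples the second value can never exceed the first (the harmonic mean of the rates is at most their arithmetic mean).

```python
def effective_capacity(dispersions, packet_bits):
    """Discrete form (8): mean of the per-pair rates L / delta(i)."""
    return sum(packet_bits / d for d in dispersions) / len(dispersions)


def dispersion_rate(dispersions, packet_bits):
    """Form (9): packet size divided by the mean dispersion."""
    return packet_bits / (sum(dispersions) / len(dispersions))


# Assumed samples: two packet pairs of 8000 bits with 1 ms and 3 ms dispersion.
samples = [0.001, 0.003]
c_eff = effective_capacity(samples, 8000)
t_ach = dispersion_rate(samples, 8000)
```

When all dispersions are equal the two quantities coincide; the more the samples vary, for example under contention, the further the value from (9) falls below the value from (8).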
$T_{Achievable}$ is also known as the average packet dispersion rate, i.e. the rate derived from the average time used to forward one single packet. It captures only the effect of contending traffic along the direction of the probing flow. As discussed before, the parameters measured by Bandwidth Probe and $A$ are related by $A \le T_{Achievable} \le C_{Effective}$. Under the Bandwidth Probe assumption $r_i \ge C_{Effective}$, the proposed model (4) can be written as follows. If $r_i = C_{Effective}$, then
$$T_{Achievable} = \frac{C_{Effective}^2}{2C_{Effective} - A} \qquad (10)$$
$$A = C_{Effective}\left(2 - \frac{C_{Effective}}{T_{Achievable}}\right) \qquad (11)$$
$A$ follows from (10) as shown in (11). If $r_i > C_{Effective}$, $A$ is instead given by
$$A = r_i + C_{Effective}\left(1 - \frac{r_i}{T_{Achievable}}\right)$$
To bound $A$: if $C_{Effective} = 0$ or $T_{Achievable} \le C_{Effective}/2$, then $A$ is 0; otherwise it is derived from the above equations. If $A$ equals $C_{Effective}$, the network is idle (no cross traffic). Otherwise, $A$ is the portion of $C_{Effective}$ left over by the cross and contending traffic.
3.6 Dispersion Model of Bandwidth Probe Algorithm
For the dispersion model, assume that the probing sequence enters the transmission queue at instants $\{T_{Send_i}, i = 0, 1, \ldots, n-1\}$; the related departure instants (i.e. the instants at which the packets completely leave the transmission queue) are $\{T_{Receive_i}, i = 0, 1, \ldots, n-1\}$.
Fig. 4. Interaction among TSend, TReceive and Cross Traffic related Process (x)
Bandwidth Probe assumes that the transmission time is negligible compared to the dispersion and service delay. If the cross-traffic-related process forms the sequence $\{x_i, i = 0, 1, \ldots, n-1\}$, Fig. 4 shows the inter-departure times at the output path. The output gap between the packets is defined as
$$\Delta_{Receiver} = \frac{T_{Receive_{n-1}} - T_{Receive_0}}{n-1} \qquad (12)$$
This can be expressed in terms of the cross-traffic-related process as
$$\Delta_{Receiver} = \frac{(n-1)\,\Delta_{Sender} - x_{n-1} + x_0}{n-1} \qquad (13)$$
The calculated values of $\Delta_{Receiver}$ and $\Delta_{Sender}$ relate the dispersion to $r_i$ and $r_o$ of the rate response curve in (2):
$$\Delta_{Sender} \approx \frac{L}{r_i}, \qquad \Delta_{Receiver} \approx \frac{L}{r_o} \qquad (14)$$
From the above discussion, we infer that the rate of the packet trains can be used to characterize the interaction with the traversed system. Considering the cross traffic as the offered load, the rate response can be modeled from a dispersion perspective, where $E[\cdot]$ is the limiting average of a sample path of a process.
If $\Delta_{Sender} \le \frac{1}{n-1}\sum_{i=1}^{n-1} E[\nu_i]$, the output dispersion is a function of the average utilization of the cross traffic in FIFO order ($\mu_{fifo}$) and the service delay ($\nu_i$) of the $i$th packet while contending for the medium:
$$E[\Delta_{Receiver}] = \frac{1}{n-1}\sum_{i=1}^{n-1} E[\nu_i] + \mu_{fifo}\,\Delta_{Sender} \qquad (15)$$
If $\Delta_{Sender} \ge \frac{1}{n-1}\sum_{i=1}^{n-1} E[\nu_i]$, and $k(n)$ is the average service delay of the hop workload caused by cross traffic, the aggregate bounds are
$$\begin{cases} E[\Delta_{Receiver}] \ge \max\left(\Delta_{Sender} + k(n),\ \dfrac{1}{n-1}\sum_{i=1}^{n-1} E[\nu_i] + \mu_{fifo}\,\Delta_{Sender}\right) \\ E[\Delta_{Receiver}] \le \min\left(\Delta_{Sender} + \dfrac{1}{n-1}\sum_{i=1}^{n-1} E[\nu_i] + k(n),\ (\mu_{fifo} + 1)\,\Delta_{Sender}\right) \end{cases} \qquad (16)$$
The above equations show the behavior of the rate response curve in the steady state. The output dispersion at the receiver side given by (15) and (16) is used to calculate $T_{Achievable}$ in (9).
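The case split between (15) and (16) can be evaluated numerically as sketched below. This is an illustration under stated assumptions: the sample mean of measured service delays stands in for the averaged expectation of $\nu_i$, the boundary case of equal values is assigned to (15), and the function name and all input values are hypothetical.

```python
def receiver_dispersion_bounds(delta_sender, service_delays, mu_fifo, k_n):
    """Return (lower, upper) for the expected receiver dispersion.

    service_delays: measured per-packet service delays (seconds), used here
    as a stand-in for the averaged expectation in (15) and (16).
    """
    mean_service = sum(service_delays) / len(service_delays)
    if delta_sender <= mean_service:
        # Equation (15): a single expected value, so both bounds coincide.
        expected = mean_service + mu_fifo * delta_sender
        return expected, expected
    # Equation (16): aggregate lower and upper bounds.
    lower = max(delta_sender + k_n, mean_service + mu_fifo * delta_sender)
    upper = min(delta_sender + mean_service + k_n, (mu_fifo + 1.0) * delta_sender)
    return lower, upper


# Assumed inputs: 2 ms mean service delay, 50% FIFO utilization, k(n) = 0.5 ms.
tight = receiver_dispersion_bounds(0.001, [0.002, 0.002], 0.5, 0.0005)
loose = receiver_dispersion_bounds(0.004, [0.002, 0.002], 0.5, 0.0005)
```

Dividing the packet size by each bound gives the corresponding range for the dispersion rate used in (9).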
4 Analysis and Experimental Results
In this section we present the simulation results and analysis of the proposed model. We used the 802.11b/g standards for simulation, and the nodes are configured for CSMA/CA, so simulated data packets are preceded by an RTS/CTS exchange [23]. The header sizes are as per the standard: RTS has 20 bytes, CTS has 14 bytes, ACK has 14 bytes and MAC has 34 bytes. In each of the subsections, we ran the simulation 50 times, and the reported result is the mean of all the estimates.
4.1 Measurement of Available Bandwidth by Different Packet Sizes
In this simulation, we created a topology with wired nodes, wireless nodes and an access point. The wired link capacity is 30 Mbps, and the wireless channel uses the CMU wireless extension [18]. The wireless channel is tuned to the IEEE 802.11b-based Lucent WaveLAN card at 5.5 Mbps with no mobility, an effective transmission range of 250 meters and an interference range of 550 meters. The probe packets originate at the wired node and are destined for the wireless node. Ad hoc On-Demand Distance Vector (AODV) [19] is used as the routing agent. Each wireless node is configured in the multi-hop scenario. The results estimated by Bandwidth Probe are as expected. Figure 5 shows that the estimated capacity is high when the probe packets are large and low when they are small. This is because, when the packet size is small, a larger number of ACKs contend for the medium with other packets at the link layer.
Fig. 5. Result of available bandwidth estimation (without interference)
Fig. 6. Bandwidth estimation along a chain of nodes with different packet lengths
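One simple way to see why small probe packets yield a lower estimate is to count the fixed control and header bytes that accompany every data packet. The sketch below uses only the header sizes quoted at the start of this section and deliberately ignores preambles, inter-frame spaces, backoff and any rate difference between control and data frames, so it is a byte-count illustration rather than a throughput model. The function name and the packet sizes tried are assumptions.

```python
# Header sizes in bytes, as listed at the start of this section.
OVERHEAD_BYTES = {"RTS": 20, "CTS": 14, "ACK": 14, "MAC": 34}


def payload_fraction(payload_bytes):
    """Share of transmitted bytes that is payload, counting only the listed headers."""
    overhead = sum(OVERHEAD_BYTES.values())
    return payload_bytes / (payload_bytes + overhead)


# Illustrative packet sizes (not taken from the experiments).
fractions = {size: payload_fraction(size) for size in (100, 500, 1000, 1500)}
```

The payload share grows with packet size, which points in the same direction as the trend reported for Fig. 5, although the measured effect also includes contention from the additional ACKs.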
4.2 Measurement of Available Bandwidth on Chain Topology
With the same scenario as in Section 4.1, all the wireless nodes are placed in a row. The result is as expected [17]: the effective end-to-end capacity decreases as the length of the chain grows. Bandwidth Probe achieves an end-to-end capacity estimate that closely matches the analytical prediction (of the single-hop capacity).

Table 2. Estimated Available Bandwidth by Different Techniques (in Mbps)

Tuned Capacity / Cross Traffic   54 / 2 Mbps   48 / 2 Mbps   36 / 2 Mbps   24 / 2 Mbps
IGI/PTR                          3.726         3.312         2.484         1.656
Pathload                         3.672         3.26          2.448         1.632
Pathchirp                        10.612        9.544         7.408         5.272
WBest                            6.696         5.952         4.464         2.976
Bandwidth Probe                  7.452         6.624         4.496         3.312
Ground Truth                     8.64          7.68          5.76          3.84
4.3 Comparison with the Existing Bandwidth Estimation Techniques
In this subsection, we created a test bed with two wired nodes with a capacity of 100 Mbps, one access point and four wireless nodes. The wireless nodes are placed equidistant from each other and from the access point. Each wireless node uses the 802.11g standard. Table 2 lists the link rates of the wireless nodes and the rate of cross traffic in the different cases. The effective transmission and interference ranges are 250 m and 550 m respectively. We set one of the wired nodes as the source and a wireless node as the destination. Cross traffic is generated by CBR UDP with a packet size of 1000 bytes, the same as the probe packet size. The ground-truth value of $A$ is obtained analytically. Table 2 clearly shows that Bandwidth Probe estimates $A$ more accurately than the other measurement techniques. IGI/PTR and Pathload always underestimate $A$, while Pathchirp overestimates it. WBest gives a good approximation of $A$, but it does not consider the steady-state behavior of the system, so its estimate can vary over time and produce inaccurate results. Figure 7 shows the mean relative error in all four cases of Table 2. IGI/PTR and Pathload give a high relative error. Pathchirp and WBest show better accuracy and lower relative error, but Bandwidth Probe has the best accuracy and the lowest relative error. It is also evident that Bandwidth Probe has larger variability than the other estimation techniques. Intrusiveness is the number of probe bytes sent by an estimation tool during measurement.
Fig. 7. Mean error while estimating available bandwidth
Fig. 8. Intrusiveness in different techniques
Fig. 9. Convergence of different techniques
Figure 8 shows that Bandwidth Probe has much lower intrusiveness than the other techniques, with values as low as 130 Kbytes. IGI/PTR and Pathchirp have an intrusiveness of 570 Kbytes and 450 Kbytes respectively. Pathload has the largest intrusiveness, around 1600 Kbytes. WBest has a comparatively good intrusiveness of 170 Kbytes. Convergence is the time spent by an estimation technique during measurement. Figure 9 shows that Bandwidth Probe has a much shorter convergence time than the others, at 0.48 seconds. IGI/PTR has a convergence time of 1.5 seconds. Pathchirp has the longest convergence time, 18 seconds, and Pathload takes around 15 seconds. WBest is also comparatively good, with a convergence time of 1.2 seconds.
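The accuracy ranking can be recomputed directly from the values in Table 2. The sketch below does this with a plain mean relative error against the ground-truth row; it uses only the numbers printed in Table 2 and is not the data behind Fig. 7, whose exact error definition is not given here. The variable and function names are illustrative.

```python
# Values copied from Table 2 (Mbps), columns 54/2, 48/2, 36/2, 24/2.
GROUND_TRUTH = [8.64, 7.68, 5.76, 3.84]
ESTIMATES = {
    "IGI/PTR": [3.726, 3.312, 2.484, 1.656],
    "Pathload": [3.672, 3.26, 2.448, 1.632],
    "Pathchirp": [10.612, 9.544, 7.408, 5.272],
    "WBest": [6.696, 5.952, 4.464, 2.976],
    "Bandwidth Probe": [7.452, 6.624, 4.496, 3.312],
}


def mean_relative_error(estimates, truth):
    """Average of |estimate - truth| / truth over the four cases."""
    return sum(abs(e - t) / t for e, t in zip(estimates, truth)) / len(truth)


errors = {name: mean_relative_error(vals, GROUND_TRUTH) for name, vals in ESTIMATES.items()}
ranking = sorted(errors, key=errors.get)  # most accurate technique first
```

On these inputs Bandwidth Probe has the smallest mean relative error and IGI/PTR and Pathload the largest, which agrees with the ordering described in the text.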
5 Conclusion
In this paper, we have proposed a new bandwidth estimation technique for WMNs and multi-hop wireless networks. To avoid estimation delay and the effect of random errors in the wireless channel, we use a statistical measurement technique in an iterative way. It is an estimation method based on the dispersion principle that uses probe packet trains for the measurement. Bandwidth Probe inserts a certain spacing between the packets of the trains, which relates directly to the gap of packet pairs. We have also given the experimental details and a comparison with existing techniques. The results clearly show that Bandwidth Probe deals with contending traffic, interference and mobility more efficiently.
References
1. Akyildiz, I.F., Wang, X.: A Survey on Wireless Mesh Networks. IEEE Radio Communication, 523–530 (2005)
2. Jacobson, V.: Congestion Avoidance and Control. Computer Communication Review, 314–329 (August 1988)
3. Parag, S.M.: Cross-layer bandwidth management and optimization in TDMA based wireless mesh networks using network coding. SIGMM Records (September 2010)
4. Prasad, R.S., Murray, M., et al.: Bandwidth Estimation: Metrics, Measurement Techniques and Tools. Proc. of IEEE Network 17, 27–35 (2003)
5. Bredel, M., Fidler, M.: A Measurement Study of Bandwidth Estimation in IEEE 802.11g Wireless LANs Using the DCF. In: Das, A., Pung, H.K., Lee, F.B.S., Wong, L.W.C. (eds.) NETWORKING 2008. LNCS, vol. 4982, pp. 314–325. Springer, Heidelberg (2008)
6. Strauss, J., et al.: A measurement study of available bandwidth estimation tools. In: IMC 2003: Proc. of the 3rd ACM SIGCOMM conf. on Internet measurement (October 2003)
7. Li, M., Claypool, M., Kinicki, R.: WBest: A Bandwidth Estimation tool for multimedia streaming application over IEEE 802.11 wireless networks. Technical Report WPI-CS-TR-06-14, Worcester Polytechnic Institute (March 2006)
8. Hu, N., Steenkiste, P.: Evaluation and characterization of available bandwidth probing techniques. IEEE JSAC 21(6), 879–894 (2003)
9. Melander, B., Björkman, M., Gunningberg, P.: Regression-based available bandwidth measurements. In: Proc. of the 2002 International Symposium on Performance Evaluation of Computer and Telecommunications Systems, San Diego (July 2002)
10. Zhu, Y., Dovrolis, C., Ammar, M.: Dynamic overlay routing based on available bandwidth estimation: A simulation study. Elsevier Computer Networks (2006)
11. Konda, V., Kaur, J.: RAPID: Shrinking the Congestion Control Timescale. In: Proc. of IEEE INFOCOM (2009)
12. Johnsson, A., Melander, B., et al.: DietTopp: A first implementation and evaluation of a simplified bandwidth measurement method. In: Proc. of SNCNW (November 2004)
13. Ribeiro, V.J., Riedi, R.H., Baraniuk, R.G., Navratil, J., Cottrell, L.: Pathchirp: Efficient available bandwidth estimation for network paths. In: Proc. of PAM (April 2003)
14. Melander, B., Björkman, M., Gunningberg, P.: Regression based available bandwidth measurements. In: Proc. of SPECTS (2002)
15. Liu, X., Ravindran, K., Liu, B., Loguinov, D.: A Queuing-Theoretic Foundation of Available Bandwidth Estimation: Single-Hop Analysis. In: ACM/SIG IMC (2005)
16. Li, M., et al.: Packet Dispersion in IEEE 802.11 Wireless Networks. In: Proc. of IEEE Local Computer Networks (2006)
17. Sun, T., Yang, G., Chen, L., Sanadidi, M.Y., Gerla, M.: A measurement study of path capacity in 802.11b based wireless networks. In: WiTMeMo 2005, Seattle, USA (2005)
18. Network simulator (ns2), http://www.mash.cs.berkeley.edu/ns/
19. Perkins, C.E., Royer, E.M., Das, S.: Ad Hoc On Demand Distance Vector (AODV) Routing, IETF Internet draft, draft-ietf-manet-aodv-08.txt (March 2001)
20. Dovrolis, C., Ramanathan, P., Moore, D.: Packet dispersion techniques and a capacity estimation methodology. IEEE/ACM Transaction on Networking (2004)
21. IEEE P802.11 Task Group S - Status of project IEEE 802.11s - Mesh Networking, http://www.grouper.ieee.org/groups/802/11/Reports/tgs_update.htm
22. Liu, X., Ravindran, K., Loguinov, D.: Multi-Hop Probing Asymptotics in Available Bandwidth Estimation: Stochastic Analysis. In: Proc. of ACM/SIG IMC (2005)
23. Xu, K., Gerla, M., Bae, S.: How Effective is the IEEE 802.11 RTS/CTS Handshake in Ad Hoc Networks? In: IEEE GLOBECOM (2002)
24. Xu, K., Gerla, M., et al.: An ad hoc network with mobile backbones. In: IEEE ICC (2002)
25. Zhang, L., Liu, Z., Xia, C.H.: Clock Synchronization Algorithm for Network Measurements. In: IEEE Infocom (2002)
An Efficient Mining of Dominant Entity Based Association Rules in Multi-databases V.S. Ananthanarayana Department of Information Technology National Institute of Technology Karnataka, Surathkal - 575 025, India
[email protected] Abstract. Today, large collections of data are organized as sets of relations partitioned across several databases. There can be implicit associations among various parts of this data. In this paper, we give a scheme for retrieving these associations using the notion of a dominant entity. We propose a scheme for mining dominant entity based association rules (DEBARs) which is not constrained to look for co-occurrence of values in tuples. We show the importance of such a mining activity through a practical example called personalized mining. We introduce a novel structure called the multi-database domain link network (MDLN), which can be used to generate DEBARs between the values of attributes belonging to different databases. We show that the MDLN structure is compact, and that this property permits it to be used for mining very large databases. Experimental results reveal the efficiency of the proposed scheme. Keywords: Multi-databases, Valueset, Personalized mining, Dominant entity, Association rules.
1 Introduction
In real-life situations, associations among data items are hidden, and mining for these associations is further complicated by the possible distribution of the data across several databases [1] [2]. For example, consider a person based information system, wherein information about a person is available as a customer in the SUPERMARKET database, as an employee in the COMPANY database, as a patient in the MEDICAL CENTRE database, and as a passenger in the TRAVEL database. There could be an implicit association among various parts of this data. For example, there may be an association between salary, region of living, mode of traveling and disease: people who have a salary in the range of US$ 1,000 to US$ 2,000, eat frequently at CENTRAL CALCUTTA hotels and travel by air in executive class have cardiovascular diseases, where CENTRAL CALCUTTA is marked with a high pollution rating. Such examples provide the associations of a person in different contexts. We call such a mining activity person-based association mining (or personalized mining), and the system under which the mining activity is performed is called a person warehouse. The primary property of such a warehouse is the possibility to generate a global schema linked by one entity, which we call the dominant entity. The associations between the values of attributes of interest using the dominant entity are called dominant entity based associations. In the above example, the dominant entity is person, which is characterized by the attributes name and address. The set of attributes which characterizes the dominant entity is called the dominant entity attributes (DEA). The sets of attributes between which we want to find the associations are called characteristic attributes (CA). In the person based information system example quoted above, {salary, region-of-living}, {mode-of-travel} and {diseases} are characteristic attributes. In this paper, we address the problem of ‘dominant entity based association mining in multiple databases’. For illustration purposes we use the ‘person warehouse’ throughout the paper. However, the notion suits any domain where we have a dominant entity and a data warehouse.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 91–100, 2011. © Springer-Verlag Berlin Heidelberg 2011
1.1 Problem Definition
The conventional association rules are mainly based on togetherness or co-occurrence of values in a tuple [3] [6]. However, in dominant entity based mining such as personalized mining, we want to mine rules of the following type: income > “Range X” ∧ age < “Range Y” ⇒ purchase = “Costly goods”. Note that in order to generate such rules from multiple relations/databases, the support for individual values is important and associations are built using the dominant entity. Such rules are called dominant entity based association rules (DEBARs). Further, in the above rule, all “Costly goods” need not be purchased together. (A purchased good is said to be “costly” if its cost is more than “Z Dollars”.) We propose a scheme for mining dominant entity based association rules which is not constrained to look for co-occurrence of values in tuples. It can look at attributes in one or more databases, which may be located at many places. We discuss this scheme in section 2. In order to generate DEBARs, there is a need to link the values of attributes of interest to the values of dominant entity attributes. We propose and develop a structure called the “multi-database domain link network (MDLN)” for mining activity that involves several databases. This structure links the values of characteristic attributes and the values of dominant entity attributes efficiently, both in the space needed to hold the structure and in the access time needed to generate DEBARs from it. The rest of this paper is organized as follows. We discuss the MDLN structure in section 3. We show how meaningful associations can be mined using MDLN in section 4. Experimental results are described in section 5. We conclude our study in section 6.
2 Dominant Entity Based Association Rules (DEBAR)
Let $v^{x}_{aij}$ and $v^{y}_{bkl}$ be the $j$-th and $l$-th values of the characteristic attributes $A^{x}_{ai}$ and $A^{y}_{bk}$, which belong to relations $R^{x}_{a}$ and $R^{y}_{b}$ of databases $D_x$ and $D_y$ respectively. Let $d_1, d_2, \ldots, d_N$ be the $N$ distinct values pertaining to DEA.
Let $C(v^{x}_{aij})$ be the count of $v^{x}_{aij}$ with respect to each $d_z$ ($1 \le z \le N$) to which it maps. This gives the support of $v^{x}_{aij}$. If the support of $v^{x}_{aij}$ is greater than or equal to the user defined minimum support ($\sigma$), then we call $v^{x}_{aij}$ a frequent value. Let the set of values $d_z$ ($1 \le z \le N$) on which the count of $v^{x}_{aij}$ depends be $P(v^{x}_{aij})$. This gives the set of elements which supports $v^{x}_{aij}$.
Similar to that of $v^{x}_{aij}$, let $C(v^{y}_{bkl})$ be the count of $v^{y}_{bkl}$ with respect to each of the $d_w$ ($1 \le w \le N$) to which it maps. Let the distinct $d_w$ ($1 \le w \le N$) on which the count of $v^{y}_{bkl}$ depends be a set, $P(v^{y}_{bkl})$. The dominant entity based association rule (DEBAR) $v^{x}_{aij} \Rightarrow v^{y}_{bkl}$ holds if
1. $v^{x}_{aij}$ is a frequent value,
2. $v^{y}_{bkl}$ is a frequent value,
3. the number of elements in $P(v^{x}_{aij}) \cap P(v^{y}_{bkl})$ is $\ge \sigma$, and
4. $|P(v^{x}_{aij}) \cap P(v^{y}_{bkl})| \,/\, |P(v^{x}_{aij})| \ge c$, the user defined minimum confidence. The L.H.S. of the inequality gives the strength of the rule.
For an illustrative example, see [5]. We present an algorithm to generate DEBARs in section 4. DEBARs differ from conventional association rules (AR) [3] in the following aspects:
1. In AR mining, the frequency (largeness) of an itemset is based on the number of times it appears in the transaction database. It accounts for every transaction made, ignoring the fact that the same transaction may be made by the same customer many times. In DEBAR, however, we are interested in the mapping of valuesets onto distinct DEA values. For example, in the person based information system, if many people who have two wheelers suffer from headaches, then an interesting DEBAR could be “two-wheeler user ⇒ has-headaches”. Here, we do not care whether the same person has many two wheelers, because in our framework the frequency is based on how many DEA values are involved in the mapping process.
2. In AR mining, even if items $i_a$ and $i_b$ each occur many times, $(i_a, i_b)$ is not a large itemset if they do not occur together in any transaction, because of the togetherness property. Note that people may not purchase several costly goods together. Therefore, costly goods may not occur together in the same transaction of a transaction database. So, a rule like “{Air-conditioner, Microwave, Refrigerator, TV} ⇒ Four wheeler” is not possible, because transactions that contain all four items are unlikely. In the case of DEBAR, we do not insist on the joint occurrence of the values/items; rather, what matters is how many DEA values are involved in the association process. So, the above rule is possible if we find many persons who bought those goods, perhaps even at different times.
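The four conditions of the DEBAR definition can be illustrated with ordinary sets of DEA values. The following is only a minimal sketch: the helper names `supporters` and `debar_holds`, the toy data and the thresholds are assumptions made for illustration and are not taken from the paper or from [5].

```python
from collections import defaultdict

def supporters(rows):
    """Map each attribute value to the set of distinct DEA values it occurs with."""
    p = defaultdict(set)
    for dea, value in rows:
        p[value].add(dea)  # a repeated (dea, value) pair is counted only once
    return p

def debar_holds(p_lhs, p_rhs, min_support, min_confidence):
    """Check conditions 1-4 for the rule lhs => rhs, given the two supporter sets."""
    if not p_lhs or len(p_lhs) < min_support or len(p_rhs) < min_support:
        return False  # conditions 1 and 2: both values must be frequent
    common = p_lhs & p_rhs
    if len(common) < min_support:
        return False  # condition 3
    return len(common) / len(p_lhs) >= min_confidence  # condition 4

# Hypothetical toy data: (person id, value) pairs from two different databases.
vehicles = [(1, "two-wheeler"), (1, "two-wheeler"), (2, "two-wheeler"),
            (3, "two-wheeler"), (4, "car")]
ailments = [(1, "headache"), (2, "headache"), (4, "headache")]

p_vehicle = supporters(vehicles)
p_ailment = supporters(ailments)
rule_ok = debar_holds(p_vehicle["two-wheeler"], p_ailment["headache"],
                      min_support=2, min_confidence=0.6)
```

Person 1 owns two two-wheelers but contributes only once to the support, which is the behaviour described in aspect 1 above.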
3 Multi-database Domain Link Network (MDLN)
As discussed in section 2, we want to find the associations between the values of characteristic attributes which may belong to several databases. These values are mapped to a range of values of DEA. So, there is a need to link the values of characteristic attributes using the DEA values, with the help of which we can find the required associations. We propose a structure called the multi-database domain link network (MDLN) to generate such links between the values of characteristic attributes and the values of DEA across the databases. Let $C_1, C_2, \ldots, C_m$ be $m$ characteristic attributes pertaining to any relations which may belong to different databases. Let the domain of DEA be $\{d_1, d_2, \ldots, d_N\}$. In the person warehouse, the values of DEA, $d_1, d_2, \ldots, d_N$, map to each of the groups pertaining to $N$ persons. [$d_1, d_2, \ldots, d_N$ could be social security numbers (SSN), if they are available.] The linking mechanism among the values of the $m$ characteristic attributes (CAs) through $d_1, d_2, \ldots, d_N$ is shown in Figure 1. In the figure, $B_1, B_2, \ldots, B_N$ are structures to hold the associations.
[Figure: the sets of all values corresponding to the attributes $C_1, \ldots, C_m$ are linked through the structures $B_1, B_2, \ldots, B_N$ to the DEA values $d_1, d_2, \ldots, d_N$.]
Fig. 1. Linking mechanism
There are four types of relationships that are possible between the values of characteristic attributes and DEA: one to many, one to one, many to one and many to many. We discuss the handling of each of these relationships in the following four sub-sections. In order to realize a compact structure, we treat the relationships one to many (one to one) and many to many (many to one) between the characteristic attributes and dominant entity attributes differently.
3.1 One to Many (1:N) Relationship
The relationship between the characteristic attribute $C_i$ and the dominant entity attribute DEA is 1:N if each value of $C_i$ maps to one or more values of the DEA and not vice-versa.
For example, consider the relation EMP_PERMANENT_INFORMATION shown in Figure 2A. Assume that the characteristic attribute is Age and the dominant entity attribute is Pid. There is a 1:N relationship between Age and Pid.

Figure 2A. EMP_PERMANENT_INFORMATION relation (the values of the Name, Location and Sex columns are elided in the figure):
Age   Pid
30    1
40    2
30    3
40    4
...   ...
30    6

Figure 2B. Part of the MDLN structure: vnode1n nodes with (Value, Count) = (30, 3) and (40, 2), and dnodes for the Pid values 1, 2, 3, 4, ..., 6 (as dominant entity), each with m Link pointers for the m characteristic attributes; nodes of the same kind are chained by Next pointers.

Fig. 2. Handling of 1:N relationship for m associations

Figure 2B shows the part of the MDLN structure which handles the 1:N relationship. The figure shows two types of nodes: vnode1n (value node) and dnode (DEA value node). A vnode1n holds a value of the characteristic attribute and the number of times the value is mapped to distinct dominant entity values. A dnode holds a value of the DEA. Both vnode1n and dnode also have two pointers, the Link pointer and the Next pointer.
1. Link pointer: This pointer helps in navigating, in the form of a circular list, through the values of the DEA actually associated with a value of the CA. In Figure 2B, from the DEA node (called dnode) having value 1, the link pointer points to the dnode having value 3, whose link pointer points to the dnode having value 6. Note that these three distinct Pid values are associated with the value 30 of the characteristic attribute Age. So, a circular list is constructed, using the link pointers, from the vnode1n whose value is 30 and all dnodes holding the values associated with Age = 30. Since there are three Pid values associated with Age = 30, the Count field of the corresponding vnode1n is set to 3. The Link pointer in the vnode1n is a single pointer. However, a dnode contains m link pointers, since m characteristic attributes are involved in the relationship with the DEA.
2. Next pointer: This is used to generate the list of nodes pertaining to the values of a characteristic attribute or DEA.
3.2 One to One (1:1) Relationship
This is a trivial version of 1:N relationship. So, it can be handled as a specialization of the 1:N relationship. Here, there is no linked list, but only a node to node linking.
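The circular-list linking described for the 1:N relationship (of which the 1:1 case is a specialization) can be sketched as follows. This is a simplified illustration, not the construction algorithm of [5]: the class and function names are hypothetical, the Next pointers are replaced by Python dictionaries, and the data are the (Age, Pid) pairs of Figure 2A.

```python
class VNode:
    """Value node (vnode1n): a CA value, its Count and a single Link pointer."""
    def __init__(self, value):
        self.value = value
        self.count = 0
        self.link = self  # an empty circular list points back to the vnode

class DNode:
    """DEA value node (dnode): one Link pointer per characteristic attribute."""
    def __init__(self, dea_value, m):
        self.dea_value = dea_value
        self.link = [None] * m

def attach(vnode, dnode, attr):
    """Insert dnode into the circular list of vnode for attribute index attr."""
    if dnode.link[attr] is not None:
        return  # in a 1:N relationship a DEA value joins at most one list per attribute
    dnode.link[attr] = vnode.link  # new dnode points at the old head (or the vnode)
    vnode.link = dnode
    vnode.count += 1

def dea_values(vnode, attr):
    """Walk the circular list from vnode and collect the linked DEA values."""
    out, node = [], vnode.link
    while node is not vnode:
        out.append(node.dea_value)
        node = node.link[attr]
    return out

rows = [(30, 1), (40, 2), (30, 3), (40, 4), (30, 6)]  # (Age, Pid) as in Figure 2A
vnodes, dnodes = {}, {}
for age, pid in rows:
    v = vnodes.setdefault(age, VNode(age))
    d = dnodes.setdefault(pid, DNode(pid, m=1))
    attach(v, d, attr=0)
```

Walking the list of the node for Age = 30 visits the dnodes for Pid 1, 3 and 6 and returns to the vnode, whose Count is 3.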
3.3 Many to Many (M:N) Relationship
The relationship between the characteristic attribute $C_i$ and the dominant entity attribute DEA is M:N if values of both $C_i$ and DEA have more than one mapping from each other. For example, consider the relation CUSTOMER_PURCHASE shown in Figure 3A. Assume that the characteristic attribute is Goods-purchased and the dominant entity attribute is Pid. There is an M:N relationship between Goods-purchased and Pid.

Figure 3A. CUSTOMER_PURCHASE relation:
Pid   Goods-purchased
4     bread
4     milk
2     coffee
2     butter
4     coffee
4     butter
4     coffee
...   ...

Figure 3B. Part of the MDLN structure: dnodes for the Pid values 4, 2, ..., each with m Link pointers for the m characteristic attributes; vnodemn nodes with (Value, Count) = (bread, 1), (coffee, 2), (milk, 1) and (butter, 2); cnodes with a dpointer and a vpointer connect the two kinds of nodes; nodes of the same kind are chained by Next pointers.

Fig. 3. Handling of M:N relationship for m associations

Figure 3B shows the part of the MDLN structure which handles the M:N relationship. The figure shows three types of nodes: dnode (DEA value node), vnodemn (value node) and cnode (connector node). Similar to a vnode1n, a vnodemn holds a value of the characteristic attribute and the number of times the value is mapped to distinct DEA values. A cnode is a dummy node which is used to represent an M:N relationship as two 1:N relationships. There are two types of pointers in dnode and vnodemn: the Link pointer and the Next pointer. The Link pointer and Next pointer in a dnode are similar to the Link pointer and Next pointer, respectively, in the dnode of the 1:N relationship. However, the Link pointer of a vnodemn points to a cnode, and the Next pointer of a vnodemn points to a vnodemn. A cnode has two pointers. One is the dpointer, pointing to a cnode or a dnode. This pointer generates a circular list involving dnodes and cnodes. The other is the vpointer, pointing to a cnode or a vnodemn. This pointer generates a circular list involving vnodemns and cnodes. The Count field of a vnodemn has the same functionality as the Count field of a vnode1n.
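As a rough illustration of how connector nodes turn one M:N relationship into two 1:N circular lists, the sketch below restricts itself to a single characteristic attribute. The class names, the helper `owner_dnode` and the duplicate check are assumptions made for this example; the actual construction algorithm is the one given in [5].

```python
class DNode:
    """DEA value node; `link` heads the ring of cnodes reached through dpointers."""
    def __init__(self, dea_value):
        self.dea_value = dea_value
        self.link = self

class VNodeMN:
    """Value node (vnodemn); `link` heads the ring of cnodes reached through vpointers."""
    def __init__(self, value):
        self.value = value
        self.count = 0
        self.link = self

class CNode:
    """Connector node sitting on one dnode ring and one vnodemn ring."""
    def __init__(self, dpointer, vpointer):
        self.dpointer = dpointer  # next cnode on the DEA ring, or the dnode itself
        self.vpointer = vpointer  # next cnode on the value ring, or the vnodemn itself

def owner_dnode(cnode):
    """Follow dpointers until the dnode that anchors this ring is reached."""
    node = cnode
    while isinstance(node, CNode):
        node = node.dpointer
    return node

def dea_values(vnode):
    """Collect the DEA values connected to a vnodemn through its cnodes."""
    out, c = [], vnode.link
    while isinstance(c, CNode):
        out.append(owner_dnode(c).dea_value)
        c = c.vpointer
    return out

def connect(vnode, dnode):
    """Link a value and a DEA value once; repeated pairs leave Count unchanged."""
    if dnode.dea_value in dea_values(vnode):
        return
    cnode = CNode(dpointer=dnode.link, vpointer=vnode.link)
    dnode.link = cnode
    vnode.link = cnode
    vnode.count += 1

rows = [(4, "bread"), (4, "milk"), (2, "coffee"), (2, "butter"),
        (4, "coffee"), (4, "butter"), (4, "coffee")]  # as in Figure 3A
dnodes, vnodes = {}, {}
for pid, goods in rows:
    connect(vnodes.setdefault(goods, VNodeMN(goods)),
            dnodes.setdefault(pid, DNode(pid)))
```

Although coffee appears three times in the relation, its Count ends at 2 because only two distinct Pid values are connected to it.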
3.4 Many to One (M:1) Relationship
It can be handled as an M:N relationship, which is discussed in the previous section. See [5] for the detailed algorithm for constructing the MDLN structure and for its space and time complexity analysis.
4 DEBAR Generation Using MDLN Structure
From the definition of DEBARs, it is clear that we are interested in the associations between valuesets which are mapped to a range of values of DEA such that the cardinality of the range is greater than or equal to the user defined threshold value. Note that repeated mappings between a value of a characteristic attribute and a value of DEA are ignored. In the MDLN structure, we do not store any information needed to obtain the togetherness/co-occurrence of values in a tuple. Also, the Count field of the node pertaining to a value of the CA is updated only if the value maps to a different DEA value. For example, in Figure 3B, even though the relation has three instances of coffee, the Count value of the node is set to 2, indicating that two distinct Pids are associated with coffee. We give a general outline of the DEBAR generating algorithm using the MDLN structure below.
Algorithm
INPUT:
1. Two sets of characteristic attributes $A = \{A_1, A_2, \ldots, A_n\}$ and $B = \{B_1, B_2, \ldots, B_m\}$, where $A, B \subset C$.
2. MDLN structure.
3. User defined minimum support $\sigma$ and confidence $c$.
OUTPUT: All possible DEBARs between valuesets of $A$ and $B$.
STEPS:
1. For each valueset $V_A = (v_{A_1}, v_{A_2}, \ldots, v_{A_n})$, an $n$-valueset of $A$, find the frequency as follows:
(a) Find the frequency of $v_{A_x}$, i.e., $Count(v_{A_x})$ ($1 \le x \le n$), using the MDLN structure.
(b) If $Count(v_{A_x}) \ge \sigma$ for all $v_{A_x}$, then find $P^{V_A} = P^{v_{A_1}} \cap P^{v_{A_2}} \cap \cdots \cap P^{v_{A_n}}$. If the cardinality of $P^{V_A}$ is $\ge \sigma$, then $V_A$ is a frequent (large) valueset.
2. For each valueset $V_B = (v_{B_1}, v_{B_2}, \ldots, v_{B_m})$, an $m$-valueset of $B$, find the frequency as follows:
(a) Find the frequency of $v_{B_x}$, i.e., $Count(v_{B_x})$ ($1 \le x \le m$), using the MDLN structure.
(b) If $Count(v_{B_x}) \ge \sigma$ for all $v_{B_x}$, then find $P^{V_B} = P^{v_{B_1}} \cap P^{v_{B_2}} \cap \cdots \cap P^{v_{B_m}}$. If the cardinality of $P^{V_B}$ is $\ge \sigma$, then $V_B$ is a frequent (large) valueset.
3. For any two frequent valuesets $V_A$ and $V_B$: if $|P^{V_A} \cap P^{V_B}| \ge \sigma$, then $\{V_A, V_B\}$ is a large valueset.
4. For any frequent valueset $\{V_A, V_B\}$:
– if $|P^{V_A} \cap P^{V_B}| \,/\, |P^{V_A}| \ge c$, then output the DEBAR $V_A \Rightarrow V_B$;
– if $|P^{V_A} \cap P^{V_B}| \,/\, |P^{V_B}| \ge c$, then output the DEBAR $V_B \Rightarrow V_A$.
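The steps above can be sketched in a few lines once the supporter sets are available. In this illustration the MDLN structure is replaced, purely as an assumption, by a nested dictionary `supporters[attribute][value]` holding the set of DEA values linked to each value; the function names, the toy data and the thresholds are likewise hypothetical.

```python
from itertools import product

def frequent_valuesets(attrs, supporters, min_support):
    """Steps 1-2: return {valueset: supporting DEA values} for frequent valuesets."""
    # Keep only the frequent single values of each attribute.
    per_attr = [
        [(v, p) for v, p in supporters[a].items() if len(p) >= min_support]
        for a in attrs
    ]
    result = {}
    for combo in product(*per_attr):
        common = set.intersection(*(p for _, p in combo))
        if len(common) >= min_support:
            result[tuple(v for v, _ in combo)] = common
    return result

def generate_debars(attrs_a, attrs_b, supporters, min_support, min_conf):
    """Steps 3-4: return (lhs, rhs, confidence) for every rule that holds."""
    if min_support < 1 or not attrs_a or not attrs_b:
        raise ValueError("need min_support >= 1 and non-empty attribute sets")
    rules = []
    freq_a = frequent_valuesets(attrs_a, supporters, min_support)
    freq_b = frequent_valuesets(attrs_b, supporters, min_support)
    for va, pa in freq_a.items():
        for vb, pb in freq_b.items():
            common = pa & pb
            if len(common) < min_support:
                continue  # {VA, VB} is not a large valueset
            if len(common) / len(pa) >= min_conf:
                rules.append((va, vb, len(common) / len(pa)))
            if len(common) / len(pb) >= min_conf:
                rules.append((vb, va, len(common) / len(pb)))
    return rules

# Hypothetical supporter sets keyed by attribute and value.
supporters = {
    "salary": {"high": {1, 2, 3, 4}, "low": {5}},
    "travel": {"air": {1, 2, 3}, "rail": {4, 5}},
    "disease": {"cardio": {1, 2, 6}},
}
rules = generate_debars(["salary", "travel"], ["disease"], supporters,
                        min_support=2, min_conf=0.6)
```

With these toy values only the valueset (high, air) is frequent on the A side, and both directions of the rule with (cardio) reach a confidence of 2/3.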
5 Experimental Results
We conducted experiments to study the storage requirement patterns of the MDLN structure. We compare the size requirements of MDLN with those of the PC-tree structure [4] for multiple databases. “PC-tree is a data structure which is used to store all the patterns occurring in the tuples of a database, where a count field is associated with each value in every pattern.” [4]. PC-tree is chosen for this comparison because navigation across a collection of PC-trees (each corresponding to a database) can be done using the DEA values, as in the MDLN structure. It is already shown in [4] that the size of the PC-tree structure is less than that of the database. Here, we examine whether the MDLN structure is more compact than the PC-tree. For this, we assumed a disk block size of 2K bytes and used three parameters: the number of tuples (projected w.r.t. CA), the domain size of DEA and the number of characteristic attributes (linkages). In order to make the PC-tree attribute-order independent, we constructed PC-trees each with values of DEA and values of a characteristic attribute. So, a value of DEA together with a value of the characteristic attribute constitutes a pattern based on which the PC-tree is constructed. In order to handle multiple databases, there is a need to link across the PC-trees through the DEA values. So, the parameter number of attributes (linkages) typically indicates the number of PC-trees constructed, and in the case of the MDLN structure it represents the number of linkages. We have two cases.
Case 1: One to Many relationship between Characteristic and Dominant entity attributes: Figure 4 shows the size of the MDLN structure with the corresponding PC-tree size. Here, the number of tuples (projected w.r.t. CA) is varied while the size of the domain of DEA is fixed at 999 and the number of characteristic attributes (linkages) is changed from 5 to 10. It is clear from both graphs that there is no further increase in the sizes of the structures with an increase in the number of tuples. Further, it is also clear from Figure 4 that, since a) the MDLN structure depends on the size of the domain of the attributes and not on the number of tuples, and b) only one node is created for each value of an attribute, its size does not increase as that of the PC-tree does, and the storage requirement is about 50% of that of the PC-tree.
[Figure 4: two panels (No. of Attributes = 5 and No. of Attributes = 10) plotting storage space in bytes against the number of tuples for PC-tree size and MDLN size.]
Fig. 4. size of domain of DEA = 999
Case 2: Many to Many relationship between Characteristic and Dominant entity attributes: Figure 5 shows the sizes of the MDLN structure and the corresponding PC-tree, where the number of tuples (projected w.r.t. CA) is varied while the size of the domain of DEA is fixed at 99 and the number of characteristic attributes (linkages) is changed from 5 to 10. Here, we chose a smaller domain to show that, as in 1:N relationships, there is no increase in the sizes of the structures as the number of tuples increases. In this case, the storage requirement for the MDLN structure is about 75% of that of the PC-tree.
[Figure 5: two panels (No. of Attributes = 5 and No. of Attributes = 10) plotting storage space in bytes against the number of tuples for MDLN size and PC-tree size.]
Fig. 5. size of domain of DEA = 99
From the experimental results it is clear that MDLN structure handles 1:N relationship between CA and DEA in a more compact manner than M:N relationship between CA and DEA.
6 Conclusions
We introduced the notion of mining for Dominant Entity Based Association Rules (DEBARs). These rules are oriented towards dominant entity attributes, do not depend on co-occurrences of values in tuples, capture n-ary relationships and can be generated from values of characteristic attributes across several relations/databases. We showed the importance of such a mining activity through a practical example, personalized mining. We proposed a novel structure called the Multi-database Domain Link Network (MDLN), which can be used to explore associations between values of various attributes which belong to the same or different relations, which in turn belong to the same or different databases. We gave a concrete framework for mining DEBARs in single and multiple relations belonging to one or more databases using the MDLN structure. Our experiments reveal that the storage requirement for the MDLN structure varies from 50% to 75% of that of the PC-tree based representation.
References 1. Shang, S.-j., Dong, X.-j., Li, J., Zhao, Y.-y.: Mining Positive and Negative Association Rules in Multidatabase Based on Minimum Interestingness. In: International Conference on Intelligent Computation Technology and Automation, pp. 791–794 (2008) 2. Li, H., Hu, X., Zhang, Y.: An Improved Database Classification Algorithm for Multidatabase Mining. In: Deng, X., Hopcroft, J.E., Xue, J. (eds.) FAW 2009. LNCS, vol. 5598, pp. 346–357. Springer, Heidelberg (2009) 3. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: Proc. 20th Int’l Conf. on VLDB, pp. 487–499 (1994) 4. Ananthanarayana, V.S., Subramanian, D.K., Narasimha Murty, M.: Scalable, distributed and dynamic mining of association rules. In: Prasanna, V.K., Vajapeyam, S., Valero, M. (eds.) HiPC 2000. LNCS, vol. 1970, pp. 559–566. Springer, Heidelberg (2000) 5. Ananthanarayana, V.S.: An Efficient Mining of Dominant Entity Based Association Rules in Multi-databases Using MDLN Structure, Technical Report, NITK (2011) 6. Zaki, M.J., Parthasarathy, S., Ogihara, M.: New algorithms for fast discovery of association rules. In: Int’l Conference on Knowledge Discovery and Data Mining, pp. 283–286 (1997)
Mitigation of Transient Loops in IGP Networks Mohammed Yousef and David K. Hunter School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK {mayous,dkhunter}@essex.ac.uk
Abstract. Routing loops have recently re-emerged as an important issue in new carrier class Ethernet technologies such as IEEE 802.1aq. While they can waste resources through full link utilization, routing loops were originally mitigated in IP networks by TTL expiration, resulting in wasted resources before packets were dropped after a few seconds. In this paper a new mitigation approach based upon Early Packet Dropping (EPD) is developed, in which a router can drop looping packets earlier than expected if full link utilization will occur before TTL expiration. While the EPD algorithm deals only with loops between two nodes, a proposed solution known as TTL Relaxation is also combined with this loop mitigation mechanism in order to minimize the use of EPD and minimize the performance degradation introduced by loops between more than two nodes. Keywords: routing loops, network performance.
1 Introduction
Routing loops are a potential issue in data networks which employ dynamic routing protocols. Recently, this has re-emerged as an important consideration in new carrier Ethernet technologies such as IEEE 802.1aq, where a link-state routing protocol determines the shortest path between Ethernet bridges. Routing loops which arise because of unsynchronized routing databases in different nodes are known as transient loops, because they exist for only a specific period before the network converges again. Such routing loops have always existed within Interior Gateway Protocol (IGP) networks; in one study, loops occurring specifically between two nodes comprised 90% of the total [1]. The Time-to-Live (TTL) field within an IP packet implements the basic loop mitigation process. It ensures that packets won’t loop forever, as the TTL will eventually expire, causing the packet to be dropped. Packets may still circulate for a long time before expiring, during which the link is fully utilized and router resources are wasted, delaying other traffic passing through the affected links. This paper introduces a new loop mitigation mechanism based upon Early Packet Dropping (EPD), where packets are dropped only when a loop traversing two nodes is expected to utilize the link fully before the network converges and before the circulating packets get dropped as a result of TTL expiration. The overall proposed solution is divided into two parts. The first part of the mitigation mechanism is known as TTL Relaxation, which is a slight modification to the “exact hop count” approach proposed in [2]. The basic idea is that once an ingress edge router receives an IP packet from the clients (i.e. from the Ethernet port), the TTL of the packet is relaxed to a new TTL which is two hops greater than the diameter of the network (the longest shortest path in terms of hop counts) from the perspective of the ingress edge router. Two more hops are added so that traffic can tolerate rerouting. In the event of a loop, packets with the new TTL will be dropped sooner, relieving the problem and reducing bandwidth consumption. Conversely, when an egress edge router forwards an IP packet to a client, it sets the TTL to a reasonable value (64 in this paper). The remainder of this paper is organized as follows. Section 2 discusses the related studies. Section 3 introduces the methodology, while section 4 provides a full discussion of the EPD algorithm. Section 5 and section 6 provide the results and the conclusions respectively.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 101–115, 2011. © Springer-Verlag Berlin Heidelberg 2011
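The TTL Relaxation rule described in the introduction (network diameter seen from the ingress edge router, plus two hops) can be illustrated with a breadth-first search over hop counts. This is only a sketch under stated assumptions: the function name `relaxed_ttl`, the adjacency-list representation and the five-router topology are hypothetical and are not the GEANT2 topology used in the paper.

```python
from collections import deque

def relaxed_ttl(adjacency, ingress, slack=2):
    """Longest shortest path (in hops) from `ingress`, plus `slack` extra hops."""
    dist = {ingress: 0}
    queue = deque([ingress])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values()) + slack

# Hypothetical five-router topology given as adjacency lists.
topology = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
EGRESS_TTL = 64  # value restored by the egress edge router, as stated in the text

ttl_from_a = relaxed_ttl(topology, "A")
ttl_from_d = relaxed_ttl(topology, "D")
```

Router A is three hops from its farthest router and would therefore write a TTL of 5, while router D, being more central, would write 4.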
2 Related Studies
While minimizing the convergence time by altering the default routing protocol timers has been proposed to minimize the duration of these routing loops [3], [4], other approaches aim to avoid such loops completely [5], [6], [7] by developing routing algorithms to ensure loop-free convergence. Another loop prevention approach for IGP networks performed Forwarding Information Base (FIB) updates in a specific order, so that a node does not update its FIB until all the neighbours that use this node to reach their destinations through the failed path have updated their FIBs [8], [9], [10]. IP fast reroute (IPFRR) solutions, which include the installation of backup routes, considered loop-free convergence [11]. Another technique was proposed where loops are mitigated, but not avoided, by maintaining an interface-specific forwarding table [12], [13]; packets are forwarded according to both their destination and the incoming interface, so that looped packets can be identified and dropped at the FIB because they arrive from an unexpected interface. Other studies aimed to avoid routing loops only in cases where network changes are planned [14], [15]. A more general overview of the routing loop problem and existing techniques to mitigate or avoid such loops is given in [16].
3 Methodology
Part of the GEANT2 network (Fig. 1) is simulated with the OPNET simulator as a single-area Open Shortest Path First (OSPF [17]) network. In Fig. 1, the numbers on the links represent the routing metrics. Full segment File Transfer Protocol (FTP) traffic is sent from the servers to the clients (150 flows), using Transmission Control Protocol (TCP). The servers and clients are connected to the backbone through access nodes. Traffic from the servers follows the shortest path through nodes TR, RO, HU, SK, CZ, DE, DK, SE, and FI. Backbone optical links are configured as OC1 (50 Mbps) with utilization maintained at below 30%. The transmission buffer capacity (in packets) in all routers is equal to Backbone Link Capacity × TCP Round Trip Time (RTT) with an estimated RTT of 250 ms, hence router buffers have a capacity of 1020 packets. IP packets of 1533 bytes are assumed throughout the paper (including headers for IP, TCP, Ethernet and PPP).
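The quoted buffer capacity follows from the bandwidth-delay product of the backbone link. The short calculation below reproduces it from the figures given in the text; the variable names are illustrative, and rounding the result up to a whole packet is an assumption about how the value of 1020 was obtained.

```python
import math

LINK_CAPACITY_BPS = 50e6  # backbone link rate quoted in the text (50 Mbps)
RTT_SECONDS = 0.250       # estimated TCP round trip time
PACKET_BYTES = 1533       # IP packet size including all headers

# Bandwidth-delay product in bits, then expressed in packets.
bdp_bits = LINK_CAPACITY_BPS * RTT_SECONDS
bdp_packets = bdp_bits / (PACKET_BYTES * 8)

# Assumed rounding: a buffer holds a whole number of packets.
buffer_packets = math.ceil(bdp_packets)
```

The product is 12.5 Mbit, which corresponds to slightly more than 1019 packets of 1533 bytes and hence to a buffer of 1020 packets.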
Fig. 1. Simulated network – part of GEANT2
The looped arrows in Fig. 1 identify the six links containing loops. Such loops involve TCP traffic and arise because of link failure followed by delayed IGP convergence. Every simulation run involves only one link failure, creating one of the loops shown in Fig. 1. The shaded nodes delay shortest path calculation after the corresponding link failure, thus creating the loop. Every case was simulated with loop durations varying from 200 ms to 1 second in steps of 200 ms. Voice over IP (VoIP) traffic passed through each loop; this was carried by User Datagram Protocol (UDP) and consumed 10% of the link capacity. However, the UDP traffic was not caught in the loop because it was not originally going through the failed link; only TCP traffic is caught in the loop. Since in every case the loop takes place at a different “hop count distance” from the source, TCP traffic enters it with a different Relaxed TTL, as shown in Table 1. This is because node TR rewrites the TTL of the source traffic with a Relaxed TTL of 11, which is decremented by one every time the traffic passes a backbone node. During the loop, the utilization of the corresponding link and the performance of the VoIP traffic passing through it were monitored and analyzed with and without the EPD mechanism.
Table 1. Relaxed TTL of the traffic entering the loop for each case in Fig. 1
Case                            1    2    3    4    5    6
Relaxed TTL entering the loop   10   9    8    7    6    5
4 Early Packet Dropping Loop Mitigation The EPD algorithm has been developed to mitigate loops arising between two nodes due to link failure, node failure and increase in link metric. However due to lack of space, this paper introduces the part of the algorithm which mitigates loops that arise between the node that detects the link failure and performs the mitigation process (Master), and the node to which the traffic was rerouted (Neighbour). As the node detecting the failure is the first to converge, it’s most likely that a routing loop will form with the Neighbour node which includes the failed link in its Shortest Path Tree and to which traffic is rerouted, because the Neighbour node takes longer to converge. Before describing the details of the algorithm, the following timers and parameters are considered: : The time between detecting the link failure • Link Failure Detection Delay at hardware level (known in advance) and notifying the routing protocol stack (configurable timer). : The default delay between receiving the Link-State • SPF Delay Advertisement (LSA) and starting the Shortest Path First (SPF) calculation (configurable timer). • Neighbour Dummy SPT Calculation Delay : Delay due to Shortest Path Tree (SPT) calculation and routing table update performed at the Neighbour node, assuming the link to the Master node fails. : Delay due to SPT calculation • Master Dummy SPT Calculation Delay and routing table update performed at the Master node assuming the link to the Neighbour node fails. : The time required for any node to upload the original routing • FIB Delay table to the FIB. : . • Neighbour Total Convergence Delay • : Average value of the TTL of packets transmitted by each interface on the Master node. : The throughput in packets per second (pps) of every interface on the Master • node. 
All delays are measured in seconds, while the dummy SPT calculations and routing updates do not take part in forwarding decisions; their only purpose in the model is to measure the time consumed. As shown in the algorithm in Fig. 2, in the negotiation phase each Neighbour node sends its Neighbour Total Convergence Delay to the corresponding Master node; at this time, each node takes on the role of both Master and Neighbour. This provides the Master node with the approximate maximum time required by any of its Neighbours to converge when each receives an update (LSA) from the Master node informing it of link failure. During the monitoring phase, the Master node registers the Link Capacity in pps (C), while monitoring S and the Link Utilization for every interface (Neighbour).
Mitigation of Transient Loops in IGP Networks
Fig. 2. The Early Packet Dropping Loop Mitigation Algorithm. Int denotes Interface
If a failure takes place forming a loop that traps TCP traffic, the TCP source sends packets up to its Congestion Window (CWND) and waits until the Retransmission Timeout (RTO) has expired because no Acknowledgments (ACKs) are received from the destination. The utilization of the link increases while these packets bounce between both nodes, until either the routing protocol converges or the TTL expires. The sum of the CWND of all TCP sources, measured in packets, will be equal to the capacity of the bottleneck buffer in the access node, which is 1020 packets. If such traffic bounces over an OC1 link more than 3 times, the link will be fully utilized. If the Master node knows this total CWND in advance, it can estimate the amount of link capacity consumed by the rerouted and looped traffic. Although the nodes cannot know the exact total CWND in advance, no more than S packets per second (throughput) will enter the loop if the interface on which S was registered has failed and S packets per second were rerouted. This is why S needs to be registered for every interface. The average TTL and S of the Master node's interface on which the link failed are treated as separate parameters. Once the Master node detects a link failure, it registers these two values and calculates the Master Total Convergence Delay from its own delays defined above, using the value that belongs to the failed interface. In the calculation phase, the Master node performs a set of calculations (equations 1-8) for every active interface (Neighbour). All parameters other than the average TTL and S of the failed interface belong to the interface on which the calculation is performed. The difference in the convergence time between the Master and the Neighbour indicates the maximum loop duration in seconds:
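$$ LD = \text{Neighbour Total Convergence Delay} - \text{Master Total Convergence Delay} \qquad (1) $$

Assuming that all the packets will be rerouted (alternative path exists) to a single interface, the maximum Queuing Delay in seconds at that interface is computed from the rerouted throughput and the capacity of that interface (2). As justified later, the first second of the loop is being considered, which is why the rerouted throughput is effectively in units of packets, not packets per second. We define LTT as the Loop Trip Time, which is made up of the queuing delay, the transmission delay and the propagation delay (all in seconds). The Master node can estimate the duration of a loop trip with any of its Neighbours (Master→Neighbour→Master) by counting these one-way delays twice:

$$ LTT = 2 \times (\text{queuing delay} + \text{transmission delay} + \text{propagation delay}) \qquad (3) $$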
In the backbone network, where the transmission speed is high, the delay terms that are negligible can be dropped from the LTT calculation. LT is defined as the Loop Traversals, which indicates the maximum total number of times the packet travels round the loop before being dropped:

$$ LT = \frac{TTL_{avg}}{2} \qquad (4) $$

where $TTL_{avg}$ is the average TTL registered for the failed interface. $LT_{LD}$ is the number of times packets travel round the loop while considering the loop duration $LD$ that the Master calculated:

$$ LT_{LD} = \frac{LD}{LTT} \qquad (5) $$

LT' is the number of Loop Traversals during the first second of the loop. The rationale for considering the first second is discussed later.

$$ LT' = \frac{1}{LTT} \qquad (6) $$
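Hence, the Master node is able to calculate the utilization that will be introduced by a loop with any Neighbour, $U_{loop}$, and the total utilization during the loop, $U_{total}$:

$$ U_{loop} = \frac{\min(LT,\; LT_{LD},\; LT') \cdot S_f}{C} \qquad (7) $$

$$ U_{total} = U + U_{loop} \qquad (8) $$

where $LT_{LD}$ is the number of loop traversals within the calculated loop duration, $S_f$ is the throughput S registered on the failed interface, and C and U are the Link Capacity and Link Utilization of the interface on which the calculation is performed.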
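The calculation phase can be illustrated with a short sketch. It assumes one reading of equations (4) to (8): LT is half the average TTL, the traversal counts for the loop duration and for the first second are LD/LTT and 1/LTT, the loop utilization is the smallest traversal count multiplied by the rerouted throughput and divided by the link capacity, and an interface whose total utilization reaches 1 is treated as an LMI. The function and field names, and the Loop Trip Time of 0.1 s used in the test values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Values the Master is assumed to register for one interface."""
    name: str
    capacity_pps: float   # C: link capacity in packets per second
    utilization: float    # current link utilization as a fraction (0..1)


def effective_traversals(avg_ttl, loop_duration_s, ltt_s):
    """Smallest of the three traversal counts, as used in equation (7)."""
    lt = avg_ttl / 2                   # eq. (4): a two-node round trip costs 2 TTL
    lt_ld = loop_duration_s / ltt_s    # eq. (5): trips that fit in the loop duration
    lt_first = 1.0 / ltt_s             # eq. (6): trips that fit in the first second
    return min(lt, lt_ld, lt_first)


def loop_utilization(iface, avg_ttl, rerouted_pps, loop_duration_s, ltt_s):
    """Return (loop utilization, total utilization) for one interface."""
    trips = effective_traversals(avg_ttl, loop_duration_s, ltt_s)
    u_loop = trips * rerouted_pps / iface.capacity_pps   # eq. (7)
    return u_loop, iface.utilization + u_loop            # eq. (8)


def is_lmi(u_total):
    """An interface is marked as an LMI when the predicted utilization reaches 100%."""
    return u_total >= 1.0
```

With an interface of 4080 pps carrying 10% VoIP load, 1020 rerouted packets and an average TTL of 10, the sketch predicts a loop utilization of 1.25 and marks the interface as an LMI; with an average TTL of 5 the prediction stays below 1 and no mitigation is triggered.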
Any interface whose utilization during the loop is equal to or greater than 1 will be marked as a Loop Mitigation Interface (LMI). A timer equal to the largest loop duration of any LMI interface is invoked, and packets received by and destined to the same LMI interface are dropped (mitigation phase). In equation (7), the minimum of the three loop traversal values is considered for the following reason. If LT is the minimum value then all the packets will be dropped during the loop (TTL expiration). If the number of traversals within the calculated loop duration is the minimum then the loop will end before the TTL expires. If LT' is the minimum then the utilization will reach 100% (assuming that the rerouted throughput is large) except when the propagation delay is large. While the transmission delay is known in advance and is normally very small, the propagation delay would have to be large enough to make the packets take longer to traverse between the nodes, hence maximizing LTT and minimizing LT'. As the propagation delay between any two nodes is not expected to be large in current optical backbone networks, traffic bouncing inside the loop can easily utilize the link fully in the one-second period.

4.1 The Algorithm's Complexity and Applicability
In the EPD algorithm, every node runs the SPT algorithm once for every link connected to it, which amounts to N SPT computations per node, where N is the average node degree. However, this computational complexity only arises after the network is fully converged and hence does not affect the convergence time, and the nodes can perform the SPT calculations while traffic is being forwarded normally. The complexity of the calculations performed between a link failure or an LSA being received, and convergence taking place, is only linear in the number of interfaces. Because these calculations do not involve any complex computational operations like calculating a tree or building a complex data structure, we expect that they will consume negligible processing time. Measuring through simulation the time taken for these calculations on a router with 10 interfaces revealed that it takes less than 1 millisecond. The requirements for implementing the EPD algorithm in today's routers are divided into two categories, namely protocol modification and router modification. The protocol modifications do not require changing the way shortest paths are calculated, but they require the link-state protocol to register and keep track of different timers and be able to send these timers to the router's neighbours as an LSA/LSP. Router modification must permit monitoring of the traffic parameters at every interface, and performance of the calculations presented earlier.
Once the router marks an interface as an LMI, there should be a way for the router to determine whether the packets have the same incoming and outgoing interfaces (packets arrive from a next hop node) before dropping them. This requires the router to maintain interface-specific forwarding tables instead of interface-independent forwarding tables (the same FIB for all interfaces). These modifications were proposed in [12]. The results from [12] showed no extra convergence delay compared with the convergence delay of a traditional link-state protocol. Table 2 shows the interface-independent forwarding table while Table 3 shows the interface-specific forwarding table for node RO in the network of Fig. 3. These forwarding tables are produced after failure of link RO↔HU. Table 3 is produced after node RO has carried out all the calculations in equations 1 to 8, in order to determine whether the loop utilization on any interface is equal to or greater than 1 (100% utilization). The entries in Table 2 are all marked with the next hop while the entries in Table 3 are marked with the next hop, '−', or the next hop referenced with X. Entries with '−' will never be used; for example, node RO will never receive traffic from node BG destined to node BG itself. Entries marked with the next hop and referenced with X indicate a possibility of a loop; for example, if node RO received traffic from node BG with destination HU and sent it back again to node BG, a loop will be triggered as node BG has not yet converged and still believes that node RO is the next hop to destination HU.
Fig. 3. Example network illustrating the interface-specific forwarding table to be implemented with the EPD mechanism

Table 2. Interface-independent forwarding table at node RO. Int denotes Interface.

Int         Destination
            HU      BG      TR
BG → RO     BG      BG      TR
TR → RO     BG      BG      TR
Table 3. Interface-specific forwarding table at node RO when implementing the EPD algorithm. Int denotes Interface.

Int         Destination
            HU        BG      TR
BG → RO     BG (X)    −       TR
TR → RO     BG        BG      −
If any of the interfaces that are referenced with an 'X' for any of the destinations in the interface-specific forwarding table has a loop utilization equal to or greater than 1, that interface will be referenced with an LMI (instead of 'X') for those destinations. Packets that are now received by such an interface, and destined to the same interface, will be dropped. The difference between our approach and the approach introduced in [12] is that the forwarding table may contain loops but these loops are not harmful. Once the loop is predicted as being harmful, the interfaces which were referenced as X will be referenced as LMI. The establishment of this interface-specific FIB will only require the original SPT which was computed with Dijkstra's algorithm.
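The forwarding decision at node RO can be sketched as a lookup keyed by incoming interface and destination, using the entries of Table 3. This is only an illustrative sketch: the dictionary layout, the function name `forward` and the use of `None` for both unused entries and dropped packets are assumptions, not part of the original mechanism. An entry whose next hop equals the incoming interface corresponds to an X entry.

```python
# Interface-specific forwarding table for node RO (entries from Table 3).
# Outer key: neighbour the packet arrived from; inner key: destination.
# None stands for the '-' entries, which are never used.
FIB_RO = {
    "BG": {"HU": "BG", "BG": None, "TR": "TR"},
    "TR": {"HU": "BG", "BG": "BG", "TR": None},
}


def forward(fib, lmi_interfaces, incoming, destination):
    """Return the next hop, or None when the packet is not forwarded.

    A packet is dropped when it would leave through the interface it
    arrived on and that interface is currently marked as an LMI.
    """
    next_hop = fib[incoming][destination]
    if next_hop is None:
        return None                       # unused entry
    if next_hop == incoming and incoming in lmi_interfaces:
        return None                       # early packet drop on an LMI
    return next_hop
```

While the interface towards BG is only referenced with X, traffic from BG destined to HU is still sent back to BG (a loop that is tolerated); once BG is in the LMI set, the same packet is dropped, whereas traffic arriving from TR is unaffected.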
5 Implementations and Results
5.1 TTL Relaxation
Fig. 4 shows the utilization of the link containing the loop for all six cases, for varying loop durations, and without consideration of EPD. In the first four cases (1-4), utilization increases with the duration of the loop in a linear fashion until it reaches 100% when the loop duration is one second. In cases 1-4, the packets forming the total CWND of all TCP sources traverse the link four or more times, resulting in full link utilization. The benefit of TTL Relaxation is shown clearly in Case 5 and Case 6, where almost all packets were dropped (TTL expired) before full link utilization took place, because of the low Relaxed TTL of TCP packets in the loop. In both cases (Case 5 and Case 6) the packets, which are bottlenecked by the access router's buffer, do not traverse the link more than 3 times; hence, full utilization does not occur.
Fig. 4. Utilization of the link containing the loop for all cases with different loop durations and different Relaxed TTL when entering the loop in each case
Fig. 5. Total TTL expirations during different loop durations
Fig. 5 shows the total number of dropped packets (TTL expiration) in all six cases. The graph shows that when the Relaxed TTL was small, more packets were dropped. Because the Relaxed TTL in Case 1 was the largest, almost half the packets were not dropped, which made the utilization reach 100%. We expect that when the link is fully utilized during a loop, any traffic passing through it will be delayed as the node's buffer becomes congested. To verify this, the VoIP traffic which passes through the loop (but is not looping) was analyzed during the loop in order to see whether such real-time traffic is affected and delayed by the loop. The Mean Opinion Score (MOS), which indicates the quality of the call at the destination as a numeric value ranging from 1 to 5, was analysed for the VoIP traffic. Fig. 6 shows the average MOS value (as generated by the simulator) of the VoIP traffic as received by the VoIP receivers in all six cases during the loop. The graph shows that for Case 5 and Case 6 the MOS value stayed almost constant during the loop at approximately 3.7 (high quality) as the link was not fully utilized. The MOS value for the other four cases decreased as the duration of the loop increased until it reached about 3.1, at which point the quality of the VoIP call is degraded. Although in this case the VoIP traffic was not very greatly affected by the loop (because of the low Relaxed TTL), there are cases during which the quality of such traffic may become greatly compromised by the loop. In order to show a worst-case scenario, the same simulation was repeated for Case 1 except that the TTL was set to 64 and the loop duration was set to 1, 3 and 5 seconds respectively. Fig. 7 shows the average MOS value of the VoIP traffic for this scenario. While the VoIP quality is still acceptable during the first second of the loop, it gets as low as 2 when the loop continues to exist.
Such a low MOS value indicates that almost all the call receivers will be dissatisfied with the call quality, which degrades noticeably during the loop.
Fig. 6. The MOS value for the VoIP traffic that passes through the loop in all six cases for different loop durations
Fig. 7. VoIP MOS for Case 1 when the TTL is 64. The loop took place when the simulation time was 210 seconds.
5.2 TTL Relaxation with Big Loops
As the lifetime of the packet depends upon the number of nodes that it travels through, increasing the number of nodes participating in the loop should cause packets to be dropped sooner and hence reduce bandwidth consumption during the loop. Through changing the link metrics, a three-node loop was created between nodes TR, RO and BG and another between nodes HR, AT and SI, with no traffic passing through the loop. In the first case, where packets entered the loop with a Relaxed TTL of 11, the utilization of links TR→RO and RO→BG was almost 100% while it decreased to 75% on link BG→TR. This is because both nodes TR and RO sent the traffic four times while node BG sent the traffic only three times. In the second case, where traffic enters the loop with a Relaxed TTL of 8, the utilization of links HR→AT and AT→SI was almost 65% while it decreased to 45% on link SI→HR. The results show that TTL Relaxation reduces the bandwidth consumed in larger loops while limiting the number of links that are fully utilized by them.

5.3 TTL Relaxation and EPD
After implementing the algorithm in OPNET according to the analysis and equations provided earlier, the algorithm performed as expected and mitigated the loop in Cases 1, 2, 3 and 4 but not in Cases 5 and 6, where the loop was not expected to fully utilize the link according to the calculations. Fig. 8 shows the utilization in the link containing the loop for Case 1 when the loop duration was 1 second. The utilization was expected to be 100% during the loop (Fig. 4), but the graph shows that the packets that bounced back from node BG were dropped by the Master node RO, which marked the interface connected to node BG as an LMI interface. When the loop took place, node RO sent the traffic only once, and this is why the utilization at that time was approximately 35%; 10% was VoIP traffic and 25% was TCP traffic, which is determined by the total CWND and is limited by the access router buffer.
The algorithm was extended further to drop a certain number of packets during the loop (instead of dropping all the packets) in order to minimize the packet loss, although this is not discussed here due to space limitations.
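The per-node transmission counts reported for the three-node loops in Section 5.2 can be reproduced with a small simulation. It assumes that a packet entering the loop with a given TTL is forwarded once per remaining TTL unit, the TTL being decremented on each forwarding and the packet dropped when it reaches zero; this forwarding rule and the function name are assumptions chosen because they match the reported counts.

```python
def transmissions_per_node(ttl_entering, nodes):
    """Count how many times each node in the loop forwards one packet.

    nodes lists the loop members in forwarding order, starting with the
    node at which the packet enters the loop.
    """
    counts = {node: 0 for node in nodes}
    ttl = ttl_entering
    position = 0
    while ttl > 0:
        counts[nodes[position % len(nodes)]] += 1   # node forwards the packet
        ttl -= 1                                    # one TTL unit per hop
        position += 1
    return counts
```

For a TTL of 11 on the loop TR→RO→BG this gives four transmissions for TR and RO and three for BG, and for a TTL of 8 on the loop HR→AT→SI it gives three, three and two, which is in line with the lower utilization observed on the last link of each loop.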
Fig. 8. Link utilization in Case1 when the loop duration is 1 second with EPD applied. The loop took place when the simulation time was 210 seconds.
5.4 Efficient Early Packet Dropping
While the EPD algorithm mitigates the harmful loops by dropping all looped packets, it might be more efficient in some cases to drop only a certain number of looped packets so that utilization during the loop does not reach 100%, while some of the looped packets still reach their destination. Such an increase in the efficiency of the algorithm will provide better quality for UDP Video-on-Demand and UDP video downloading. In these two cases dropping all the looped packets will increase the probability of the I-frames (compressed video frames that all the following frames depend on) being dropped and hence, noticeable quality degradation will take place at the video receiver. Therefore, equation (7) can be altered so that:

$$ S_{loop} = \frac{U_{loop} \cdot C}{LT} \qquad (9) $$

where $S_{loop}$ is the number of packets that can loop in the link, $U_{loop}$ is the utilization during the loop and C is the link capacity. Because the Master node knows the original utilization of the link that contains the loop, the Master node can calculate the number of packets that can loop in the link without causing the utilization to reach 100%, while dropping the rest of the packets. In this case only LT is considered (maximum Loop Traversals) because once some packets have been dropped, the queuing delay will decrease, causing the packets to circulate up to their maximum limit (half the average TTL). For example, in Case 1 when the loop duration is 1 second and a harmful loop is expected, the number of packets that may loop without causing the link to be fully utilized is:

0.8 × 4080 / 5 ≈ 653

The number of packets that need to be dropped is:

1143 − 653 = 490

which corresponds to a Drop Ratio of 1143 / 490 ≈ 2.3.
The Drop Ratio can be rounded up to the next integer so that the Master node will drop 1 packet out of each 3 packets. After running the Efficient EPD algorithm, the utilization of the link containing the loop for Cases 1, 2, 3 and 4 was approximately 90% instead of 100% without EPD or 35% with EPD. Fig. 9 shows the total number of packet drops during the loop in all four cases with the implementation of either the EPD algorithm or the Efficient EPD algorithm. The graph shows that when packets enter the loop with a large Relaxed TTL, the efficient algorithm saves many packets from being dropped and allows them to be delivered to the destination. With a small Relaxed TTL, there are normally the same number of packet drops, because the remaining packets that were not dropped by the efficient algorithm will circulate faster because of the decreased queuing delay, and will be dropped once their TTLs expire.
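The Case 1 arithmetic of Section 5.4 can be checked with a few lines. The sketch assumes that 0.8 is the utilization allowed for looped traffic, 4080 pps the link capacity, 5 the maximum Loop Traversals and 1143 the number of packets entering the loop; the parameter names are hypothetical labels for these figures.

```python
import math


def efficient_epd(allowed_utilization, capacity_pps, loop_traversals, looped_packets):
    """Return (packets allowed to loop, packets to drop, drop 1 out of every n)."""
    allowed = round(allowed_utilization * capacity_pps / loop_traversals)
    to_drop = looped_packets - allowed
    if to_drop <= 0:
        return allowed, 0, 0              # nothing needs to be dropped
    drop_ratio = looped_packets / to_drop
    return allowed, to_drop, math.ceil(drop_ratio)
```

Calling `efficient_epd(0.8, 4080, 5, 1143)` yields 653 packets allowed to loop, 490 packets to drop, and a rounded Drop Ratio of 3, i.e. one packet dropped out of every three.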
Fig. 9. Total number of packet drops for loop duration of 1 second when applying both algorithms
6 Conclusion
This paper introduced an easily implementable loop mitigation mechanism which mitigates routing loops that take place between two nodes in IGP networks only when the loop is predicted to utilize the link fully, which would congest router buffers and delay any traffic sharing links on the loop. The overall mechanism was divided into two sub-solutions; the first solution was based upon TTL Relaxation, where the TTL of the packets entering the network is relaxed so that the packets caught in the loops are dropped sooner than before. The second part of the mechanism is the Early Packet Dropping mechanism, which takes place only in cases where the loop is expected to utilize the link fully.
The TTL Relaxation scheme reduces bandwidth consumption in loops of any size, and minimizes the delay experienced by traffic passing through them, especially when the loop is close to the egress edge router. The VoIP traffic passing through loops closer to the source was slightly degraded (delay up to 130 ms) because of the small network diameter, which implies that the Relaxed TTL is small and hence packets are dropped sooner. In networks with large diameters, the Relaxed TTL can be large, which in turn causes the packets to circulate longer, potentially delaying VoIP traffic by more than 150 ms. While the EPD mitigation mechanism mitigates the loop by dropping all the looped packets, the Efficient EPD mechanism showed better performance by dropping a certain number of packets, thus ensuring that the link will not be fully utilized while delivering the rest of the packets to their final destination.
References
1. Hengartner, U., Moon, S., Mortier, R., Diot, C.: Detection and analysis of routing loops in packet traces. In: IMW 2002: Proceedings of the Second Internet Measurement Workshop, pp. 107–112 326 (November 2002)
2. Seaman, M.: Exact hop count (December 2006), [Online] http://www.ieee802.org/1/files/public/docs2006/aq-seaman-exact-hop-count-1206-01.pdf
3. Francois, P., Filsfils, C., Evans, J., Bonaventure, O.: Achieving sub-second IGP convergence in large IP networks. Computer Communication Review 35, 35–44 (2005)
4. Iannaccone, G., Chuah, C.N., Bhattacharyya, S., Diot, C.: Feasibility of IP restoration in a tier 1 backbone. IEEE Network 18, 13–19 (2004)
5. Murthy, S., Garcia-Luna-Aceves, J.J.: A Loop Free Algorithm Based On Predecessor Information. In: Proceedings of IEEE ICCCN (October 1994)
6. Garcia-Luna-Aceves, J.J., Murthy, S.: A path-finding algorithm for loop-free routing. IEEE/ACM Transactions on Networking 5, 148–160 (1997)
7. Bohdanowicz, F., Dickel, H., Steigner, C.: Detection of routing loops. In: Proceedings of the 23rd International Conference on Information Networking. IEEE Press, Chiang Mai (2009)
8. Francois, P., Bonaventure, O.: Avoiding transient loops during the convergence of link-state routing protocols. IEEE/ACM Transactions on Networking 15, 1280–1292 (2007)
9. Francois, P., Bonaventure, O., Shand, M., Bryant, S., Previdi, S., Filsfils, C.: Internet Draft: Loop-free convergence using oFIB (March 2010), [Online] http://tools.ietf.org/html/draft-ietf-rtgwg-ordered-fib-03
10. Fu, J., Sjödin, P., Karlsson, G.: Loop-free updates of forwarding tables. IEEE Transactions on Network and Service Management 5(1) (March 2008)
11. Gjoka, M., Ram, V., Yang, X.W.: Evaluation of IP fast reroute proposals. In: 2007 2nd International Conference on Communication Systems Software & Middleware, vols. 1 and 2, pp. 686–693 994 (2007)
12. Nelakuditi, S., Zhong, Z.F., Wang, J.L., Keralapura, R., Chuah, C.N.: Mitigating transient loops through interface-specific forwarding. Computer Networks 52, 593–609 (2008)
13. Lei, S., Jing, F., Xiaoming, F.: Loop-Free Forwarding Table Updates with Minimal Link Overflow. In: ICC 2009: IEEE International Conference on Communications, pp. 1–6 (2009)
14. Ito, H., Iwama, K., Okabe, Y., Yoshihiro, T.: Avoiding routing loops on the Internet. Theory of Computing Systems 36, 597–609 (2003)
15. Francois, P., Shand, M., Bonaventure, O.: Disruption free topology reconfiguration in OSPF networks. In: Infocom 2007, vol. 1-5, pp. 89–97 (2007)
16. Shand, M., Bryant, S.: RFC 5715: A Framework for Loop-Free Convergence (January 2010), [Online] http://www.ietf.org/rfc/rfc5715.txt
17. Moy, J.: RFC 2328: OSPF version 2. (April 1998), [Online] http://www.ietf.org/rfc/rfc2328.txt
An Efficient Algorithm to Create a Loop Free Backup Routing Table
Radwan S. Abujassar and Mohammed Ghanbari
School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
{rabuja,ghan}@essex.ac.uk
Abstract. Traditional routing protocol schemes allow traffic to pass via a single shortest path. Pre-computing an alternative path makes it possible to route the traffic through it when the primary path goes down. This paper proposes a new algorithm that finds an alternative path disjoint from the primary one by creating a new backup routing table based on the adjacent nodes of each node connected to the primary path. The results show that packet loss, reroute time and end-to-end delay between source and destination are improved, while loops in the network are avoided. The results also show that the new scheme does not degrade network performance through the additional messages it sends to create the new backup routing table. The simulation results (using the NS2 simulator) compare the existing link-state protocol with the proposed algorithm.
Keywords: component, Link state, ART (Alternative Routing Table), Dijkstra algorithm.
1 Introduction
Network communication continues to increase, and networks are therefore required to carry large volumes of traffic over high-capacity links. Network communication is affected by frequent failures, which creates the need for an efficient recovery mechanism. In current networks, failures occur frequently and affect the stability of the network. When a link or node fails, the nodes connected to the failure need to re-compute their routing tables and propagate the updates to all nodes concerned with this failure. The recovery mechanism is motivated by two delays: first, the time required to detect the failure; second, the time to compute a new shortest path, which takes roughly 70 ms. The slow convergence of the routing protocol after a failure has motivated the search for a solution that carries all traffic over an alternative path until the routing protocol updates the routing table and re-computes a new shortest path. The Alternative Routing Table (ART) algorithm aims to recover the network from failure within a short period of time. Specifically, when a link or node goes down, it aims to reduce delays and improve throughput in the network. Hence, the pre-computed alternative path can be used when a link or node fails on the primary path, without waiting for the routing protocol to re-compute a new shortest path [11]. The Open Shortest Path First (OSPF) routing protocol is used as a dynamic link-state protocol for TCP/IP or UDP traffic and is designed to update the topology information by sending a Link State Advertisement (LSA) when a failure is present. The convergence time of the recovery mechanism is still too large for real applications. The convergence time can be of the order of hundreds of milliseconds, or even tens of seconds in BGP networks. Table 1 lists the default and minimum times for the routing protocol to re-compute a new shortest path. Hence, while the routing protocol is converging, micro-loops may be created. This can increase packet loss and end-to-end delay in most applications, such as video or VoIP traffic, because the source will keep sending packets until it receives a notification that a failure has occurred. In this paper, we concentrate on the original routing table, which is computed by the link-state protocol. We propose a new algorithm that creates a new backup routing table excluding all primary paths in the original routing table. The main contribution of this paper is an alternative, fully disjoint path computed from the original routing table, which guarantees that the backup path does not share any node or link with the primary path between source and destination, so as to avoid loops in the network. The rest of the paper is organised as follows. Section 2 discusses the related work. Section 3 illustrates the basic concept of our algorithm. Section 4 explains the mechanism of the ART algorithm. Section 5 explains the results, and Section 6 concludes with how we can improve our algorithm and future work.

S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 116–126, 2011. © Springer-Verlag Berlin Heidelberg 2011
Table 1. Components of the Failure Restoration Time [6]

Timer                                     Default Value    Minimum Value
Notification timer                        2 s              10 ms
Link State Packet (LSP) generation timer  50 ms            1 ms
Shortest path computation timer           5.5 s            1 ms

Processing phase                          Typical values
LSP processing                            10 ms/hop
SPF computation                           100 - 400 ms
Forwarding information update             20 entries/ms

2 Related Work
Efficient routing protocol algorithms have been built to achieve robustness and fast convergence within a short time in case of failure. Classifications of failure are given in [5]. The results show that 80% of all failures are unplanned. Of these, 70% affect a single link at a time, and the remainder affect shared risk link groups. The protection scheme is a proactive mechanism, which calculates backup routes in advance, while the restoration scheme is reactive, calculating the backup routes when a failure has been detected [1,9]. The restoration scheme offers more flexibility with regard to the location of failures. A disjoint path between source and destination is considered the best solution for recovering the network from failure, as traffic is guaranteed to pass through it to the destination with reduced packet loss [14]. In [4,3] the authors discuss the cost of the links in the network, which is considered to be an important parameter in determining the best path through the routing protocol algorithm. The minimum path cost is determined by comparing it with other candidate paths. There are two kinds of Dijkstra algorithm. Firstly, there is a Dijkstra algorithm that computes the best path by removing the links with bandwidths less than the threshold. Secondly, there is an on-demand Dijkstra algorithm, which generates the shortest path tree to the pre-computation node [2]. A node is added to the tree depending on the bandwidth request [15]. In [10,13] the authors proposed a new mechanism, termed Failure Insensitive Routing (FIR). FIR uses interface-specific forwarding to provide a loop-free backup next hop. The FIR mechanism makes the node that is connected to the failure add a new header by re-encapsulating the packets and then re-send them to the adjacent nodes, which infer the fault from the interface on which the packets arrive. Hence, based on the interface on which packets arrive when a failure has occurred, the adjacent node will reroute the affected packets, and the other nodes will not know about the failure and will keep sending packets according to their pre-computed routing tables. FIR has several drawbacks: the encapsulation of packets is not desirable because it reduces the throughput and makes the end-to-end delay longer. In addition, FIR cannot provide protection against node failure. Internet Protocol Fast Re-Route (IPFRR) is an applicable technique. It includes LFA, U-turn and not-via addresses [8,12]. The drawback of the IPFRR technique is that loop freedom is not guaranteed, because the packet can be returned to the source owing to the specific pre-computed forwarding table of each node on the network. In addition, not-via addresses need to encapsulate and de-capsulate packets, which affects network performance [7].
3 Algorithm Overview
The basic idea of the ART algorithm is to consider a link or node failure on the primary path. Fig. 2 shows the flowchart of the ART algorithm used to find the backup path between source and destination. The ART algorithm creates a new backup routing table in four steps:
– First: All nodes on the topology act as source and destination.
– Second: All adjacent nodes check whether they have a disjoint path to the destination that is not connected with the primary path.
– Third: Each adjacent node sends an acknowledgement to the source stating whether or not it has a disjoint path.
– Fourth: If the neighbour of the adjacent node has a disjoint path, then the source node adds the adjacent node as the first backup hop and its neighbour as the second hop in the new routing table.
The OSPF protocol computes the shortest path depending on the cost of the link. The ART algorithm computes a backup routing table based on the original routing table, which is computed by the OSPF protocol. Each node on the topology sends a packet to all adjacent nodes that are not on the primary path between the source and destination in the network. Each adjacent node checks its routing table for a path to the destination that is disjoint from the primary one. If the adjacent node finds that its path shares any node or link with the primary path, it sends an acknowledgement message that informs the source node "I cannot be a backup node in case of failure". When the source node receives a negative acknowledgement, it checks the acknowledgements it has received from the other adjacent nodes. On the other hand, if the source node receives "Yes", it adds this node to the new routing table as the first next hop in case of failure.
Fig. 1. The primary path and backup path
However, if source node receives an answer from all adjacent nodes that there is no disjoint path to the destination then the source node will send a packet to each one not on the primary path to check whether their neighbours have a disjoint route to the destination or not. Hence, the adjacent node will receive an acknowledgement from its neighbours that inform it if there is a disjoint path to the destination. In this case, if any acknowledgement has a positive answer the adjacent node will send an acknowledgement to the source, ”Yes, I have a disjoint path via me and my neighbours”. The source node will then add the adjacent node as a first hop and her neighbours as second one in the new backup routing table. Our algorithm will repeat these steps until the backup routing table completed. For example in fig.1 illustrates how ART algorithm works to create a new backup routing table. The source node will send packets to the adjacent nodes {2 and 3}excluding node 1 because it is the first hop on the primary path. Node 2 will check if there is any disjoint path to the destination node but not via {S, 1, 6, 9} from the routing table. Node 2 will send an acknowledgement to inform the source if it can act as a backup or not. Node 2 cannot be a backup because there is a common node in its path to the destination, which is node 6. Hereafter, the source node will check the other answers from other adjacent nodes. Node 3 will send acknowledgement to the source
R.S. Abujassar and M. Ghanbari
Fig. 2. Algorithm Flowchart
that, "I have a disjoint path to the destination without any nodes/links joined to the primary path". When the source node receives this answer, it adds node 3 to the backup routing table as the first next hop in case of failure. On the other hand, if neither node 3 nor node 2 can be a backup adjacent node, because neither has a path to the destination that is disjoint from the primary path, then the source node sends a packet to nodes {3 and 2} asking them to check whether they have an adjacent node with a disjoint path to the destination. Node 2 then sends a packet to node 7 to ask whether it has a path to the destination that is disjoint from the primary path. Node 7 checks its routing table for an available path. As Fig. 1 shows, node 7 has a different route to the destination, so node 2 sends an acknowledgement that it can be a backup adjacent node via node 7. The source node adds node 2 to the backup routing table as the first hop and node 7 as the second hop.
4 Our Proposition
The originality of the ART algorithm lies in how it finds a backup path from the original routing table. ART selects the backup adjacent node with respect to the original routing table. This gives each node a high probability of anticipating the second shortest path, which is then used to actively re-route packets in case of failure. In addition, when any node on the primary path goes down, all links connected
An Efficient Algorithm to Create a Loop Free Backup Routing Table
Algorithm 1. AlternativePath returns a set of alternative paths for every possible path in the routing table. G(V, E) is an oriented graph with two sets, a set of vertices V and a set of edges E, where an edge e = (v, u), e ∈ E, v, u ∈ V, is a connection from vertex v to vertex u. A path P is a set of edges e1, e2, ..., en such that, for v, u, x ∈ V, ei = (v, x) and ei+1 = (x, u) for all 1 ≤ i ≤ n − 1.

procedure AlternativePath(Tr)
  Tr: The routing table
  V: The vertex set in graph G(V, E)
  Γ(v): The set of adjacent vertices to a vertex v
  Pr(Tr, s, d): The path connecting the vertex s to d as in Tr
  Pa(s, d): An alternative path such that Pr(Tr, s, d) ∩ Pa(s, d) = ∅
  SP: The set of all generated alternative paths
  qsub: A path
  Q: A queue of couples (path, vertex)
  Enqueue: Inserts an element in a queue
  Dequeue: Removes an element from a queue
  Front: The element at the front of a queue

  SP ← ∅
  for all s ∈ V do
    for all d ∈ V do
      if s ≠ d then
        qsub ← ∅
        Q ← ∅
        Enqueue(Q, (qsub, s))
        while Q ≠ ∅ and Pa(s, d) = ∅ do
          (qsub, x) ← Front(Q)
          for all k ∈ Γ(x) do
            e ← (x, k)
            if (qsub ∪ e) ∩ Pr(Tr, s, d) = ∅ then
              if Pr(Tr, k, d) ∩ Pr(Tr, s, d) = ∅ then
                Pa(s, d) ← qsub ∪ e ∪ Pr(Tr, k, d)
                SP ← SP ∪ Pa(s, d)
                break
              else
                Enqueue(Q, (qsub ∪ e, k))
              end if
            end if
          end for
          Dequeue(Q)
        end while
      end if
    end for
  end for
  return SP
end procedure
to that node will also lose connectivity. This forces the routing protocol to take more time to re-converge the network and re-compute a new routing table. The ART algorithm is shown in Algorithm 1. The ART algorithm is a pre-active mechanism (PAM) because it computes, in advance, a new routing table that includes a backup path for all nodes on the topology. However, we assume the source node has at least one adjacent node not connected to the primary path. Our technique begins to operate once the protocol has converged and the routing protocol has built the routing table for each node on the topology. Depending on the routing table, each node on the topology sends a special packet to ask its adjacent nodes whether they have an alternative route to the destination. The ART algorithm excludes the first next hop of each node, as given by the routing table. When the adjacent nodes receive the packets, they check whether they have an alternative route to each destination. They reply with an acknowledgement that contains the answer "Yes, I have a different route and I can be a backup node in case of failure" or "No, I cannot be a backup adjacent to the destination". If the source node checks the answer and it is "Yes", it adds this node to the new routing table as a backup. If the answer is "No", the source checks the answers from the other adjacent nodes. If the answers from all adjacent nodes are "No", our algorithm sends another packet to ask each adjacent node whether it has a neighbour, not connected to the primary path, with a disjoint route to the destination (that is, not via the primary path). If the answer is "Yes", the adjacent node sends an acknowledgement that contains "Yes, I can be a backup node via my adjacent node".
When the source node receives a "Yes", it adds the adjacent node as the first backup hop and the node adjacent to that first hop as the second next hop.
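The search described above and in Algorithm 1 can be sketched in Python for a single source and destination pair. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `table_path` and `backup_path`, the topology in `adj` and the next-hop table `next_hop` are all hypothetical (Fig. 1 is not reproduced in the text, so the links are only loosely modelled on its description), the next-hop table is taken as given rather than computed from link costs, and disjointness is checked on nodes, with one extra check for the primary link into the destination.

```python
from collections import deque


def table_path(next_hop, node, dest):
    """Follow next-hop entries from `node` until `dest` is reached."""
    path = [node]
    while node != dest:
        node = next_hop[node]
        path.append(node)
    return path


def backup_path(adj, next_hop, src, dest):
    """Breadth-first search for a backup path that avoids the primary path."""
    primary = table_path(next_hop, src, dest)
    primary_links = set(zip(primary, primary[1:]))
    forbidden = set(primary) - {dest}   # source and intermediate primary nodes
    queue = deque([[src]])              # partial backup paths, shortest first
    seen = {src}
    while queue:
        partial = queue.popleft()
        for k in adj[partial[-1]]:
            if k in forbidden or k in seen or (partial[-1], k) in primary_links:
                continue
            seen.add(k)
            tail = table_path(next_hop, k, dest)   # k's own routing-table path
            if forbidden.isdisjoint(tail):
                return partial + tail              # k answers "Yes"
            queue.append(partial + [k])            # k answers "No": ask its neighbours
    return None                                    # no disjoint backup exists


# Hypothetical topology and next-hop table toward destination "D" (assumptions).
adj = {
    "S": ["1", "2", "3"], "1": ["S", "6", "3"], "2": ["S", "6", "7"],
    "3": ["S", "1", "5"], "5": ["3", "D"], "6": ["1", "2", "9"],
    "7": ["2", "D"], "9": ["6", "D"], "D": ["5", "7", "9"],
}
next_hop = {"S": "1", "1": "6", "6": "9", "9": "D",
            "2": "6", "3": "5", "5": "D", "7": "D"}

first_case = backup_path(adj, next_hop, "S", "D")
# Second case: node 3 now routes via node 1, so it can no longer act as a backup.
second_case = backup_path(adj, dict(next_hop, **{"3": "1"}), "S", "D")
```

In the first case node 3 answers "Yes" directly; in the second case both adjacent nodes answer "No" and the search continues one hop further, reaching node 7 through node 2, which mirrors the two scenarios walked through for Fig. 1.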
5 Results
We used the NS2 simulator to evaluate the ART algorithm. In this experiment, we ran the simulator 50 times, each with a randomly configured failure between source and destination. Failures can occur instantaneously and at any time. In addition, we configured the source and destination randomly. The duration of each simulation was 50.0 s. We configured the LS protocol to work on this topology. The CBR traffic from all source nodes is sent from 0.5 s to 50.0 s. During that period, we caused many failures on different links by creating [Random/Uniform] variables, so that failures occurred arbitrarily and links went down haphazardly. After each failure, the best route was selected as computed by the routing algorithm (Dijkstra's algorithm). We configured the simulation with respect to these values. Figure 3(a) shows the average packet loss versus the number of hops to the destination. The traffic is re-routed along an alternative path computed by our algorithm until the LS protocol updates the information in the routing table and computes the new shortest path. Fig. 3(b) shows the throughput in the network, which improves when the link state protocol works with our algorithm. When the failure occurs far from the source, the throughput is reduced because the source node keeps sending traffic until it receives a notification informing it of the failure. When failure occurs, the source node will
Fig. 3. Loss of Packets and Throughput: (a) Loss of Packets, (b) Throughput
Fig. 4. Load
wait to receive a notification from other nodes about it, and then the routing protocol will start to re-compute and update the routing table. Fig. 6(b) shows that the rerouting time for the link state protocol with the ART algorithm is reduced, because the source node reroutes the traffic onto the backup path as soon as it receives this notification, without waiting for the routing protocol to re-compute and update the routing table. In addition, heavy traffic in the network delays the arrival of the notification at the source node because of congestion, queueing and link delay. Figure 4 shows the load for the link state protocol with and without the ART algorithm. The load of the link state with the ART algorithm is higher than the load of the link
Fig. 5. The Utilisation of link Exceed in the network topology
Fig. 6. End to End delay and Reroute Time in Sec: (a) End to End Delay, (b) Reroute Time in Sec
state alone, because our mechanism makes all nodes on the topology send small packets to ask all adjacent nodes whether they have a route to the destination that is disjoint from the primary path. Nevertheless, the comparison shows that the ART algorithm does not create a load high enough to degrade network performance. Additionally, the load increases when a failure occurs far from the source node, owing to the LSA messages sent by the link state protocol to update the routing table. Figure 5 shows the utilization of the links in the network. With the link state protocol, utilization can exceed the limit size of the links because a loop can occur when a failure occurs, which leads
to increased packet loss and degraded network performance. On the other hand, the link state with the ART algorithm shows high utilization while avoiding exceeding the limit size of the link, because the ART algorithm offers a backup path for re-routing packets in case of failure. This avoids loops in the network. Fig. 6(a) shows the end-to-end delay between source and destination. In some cases the delay with our algorithm is better than with the link state protocol, because our algorithm can select an alternative path that is also the shortest path in the new routing table. In other cases, our algorithm chooses an alternative path that is not necessarily the shortest one.
6 Conclusions
This paper presented a new algorithm for computing an alternative path for each source on the network. The ART algorithm uses the routing table computed by a link state protocol to find an alternative disjoint path to the destination by creating a new backup routing table. Although the ART algorithm gives a recovery path that can be the shortest one, we have shown that backup paths contain a number of hops to the destination. For real traffic, the results show that our algorithm reduces packet loss and delay between the source and destination nodes. Additionally, we have shown that the ART algorithm can avoid loops in the network by keeping the utilization stable compared with the link state protocol. The ART algorithm has its own messages, which are sent between nodes to create the new shortest path, and we have shown that these packets do not affect the performance of the network. In future work, we will make the ART algorithm re-route the traffic from the node that is directly connected to the failed link.
A Low Cost Moving Object Detection Method Using Boundary Tracking
Soharab Hossain Shaikh and Nabendu Chaki
A.K. Choudhury School of Information Technology, University of Calcutta, India
[email protected],
[email protected]
Abstract. Moving object detection techniques have been studied extensively for purposes such as video content analysis and remote surveillance. Video surveillance systems rely on the ability to detect moving objects in the video stream, which is a relevant information extraction step in a wide range of computer vision applications. There are many ways to track a moving object. Most of them use frame differences to analyze the moving object and obtain the object boundary. This may be quite resource hungry, in the sense that such approaches require a large amount of memory and a lot of processing time. This paper proposes a new method for moving object detection from video sequences that performs frame-boundary tracking and active-window processing, leading to improved performance with respect to computation time and memory requirements. A stationary camera with a static background is assumed.
Keywords: Moving Object Detection, Frame Differencing, Boundary Tracking, Bounding Box, Active Window.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 127–136, 2011. © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
Moving object detection from video sequences is an important field of research, as security has become a prime concern for every organization and individual nowadays. Surveillance systems have long been in use to monitor security sensitive areas such as banks, department stores, highways, public places and borders. In commercial sectors, surveillance systems are also used to ensure the safety and security of employees, visitors, premises and assets. Most such systems use different techniques for moving object detection. Detecting and tracking moving objects have widely been used as low-level tasks of computer vision applications, such as video surveillance, robotics, authentication systems, user interfaces by gestures and image compression. Software implementation of low-level tasks is especially important because it influences the performance of all higher levels of various applications. Motion detection is a well-studied problem in computer vision. There are two types of approaches: the region-based approach and the boundary-based approach. In the case of motion detection without using any models, the most popular region-based approaches are background subtraction and optical flow. Background subtraction detects moving objects by subtracting estimated background models from images. This method is sensitive to illumination changes
and small movement in the background, e.g. leaves of trees. Many techniques have been proposed to overcome this problem. The mixture of Gaussians is a popular and promising technique to estimate illumination changes and small movement in the background. Many algorithms have been proposed for object detection in video surveillance applications. They rely on different assumptions, e.g. statistical models of the background, minimization of Gaussian differences, minimum and maximum values, adaptability, or a combination of frame differences and statistical background models [1]. A local neighborhood similarity based approach [2] and a contour based technique [3] for moving object detection are also found in the literature. However, a common problem of background subtraction is that it requires a long time for estimating the background models. Optical flow also has a problem caused by illumination changes, since its approximate constraint equation basically ignores temporal illumination changes. This paper presents a novel approach for detecting moving objects from video sequences. It improves on the conventional background subtraction approach in that it is computationally efficient and requires less memory for its operation.
Fig. 1. Major Functional Steps (flowchart: Frame Extraction; Boundary Tracking; Frame Differencing (Around Boundary Only); Detection of Moving Object; Defining Bounding Box and Active-Window (AW); Track the Object (Processing Local to AW); Repeat)
2 Methodology
The proposed method is a low complexity solution to moving object detection. It offers an improvement and enhancement of the basic background subtraction method [1, 4]. The authors assume a stationary video camera. As there is no movement of the camera, the background it captures is fixed with respect to the camera frame. The proposed method is a two part solution for detecting moving objects in the assumed scenario. It detects those moving objects that appear in the frame by crossing the frame boundary from any possible direction. This would be the scenario in many real life situations for most surveillance cameras. However, a moving object may pop up in the middle of a frame, without crossing the camera frame boundary, when it first appears in the video. The proposed method has to be customized to handle this issue. The major steps are as follows: Frame
Extraction, Boundary Tracking, Frame Differencing (Around Boundary Only), Detection of Moving Object, Defining Bounding Box and Active-Window (AW), Track the Object (Processing Local to AW). Finally, all the steps are repeated around the frame boundary, and inside the active-window if any moving object is detected inside the frame. A block diagram of the major functional steps is given in Fig. 1. All the steps are detailed in the following subsections.
2.1 Frame Extraction
This is the first step, in which a frame is extracted from a video sequence. It is assumed that the camera is stationary and there is no change in the background. Objects enter the frame by crossing the frame boundary. The conventional method of background subtraction needs complete information of the entire frame in order to apply frame differencing and detect the position of the moving object. In the proposed method, only the information of the ideal background frame is needed. Subsequent processing is done around the frame boundary or in the active window itself, as explained in the following sections.
2.2 Boundary Tracking
The frame is divided into small grids as depicted in Fig 2. This figure also highlights an area around the frame boundary (the four possible directions). This is the portion of the frame that should be tracked for detecting the initial movements of a moving object entering the camera frame by crossing the frame boundary. The whole frame need not be considered for the frame differencing. This significantly reduces the computation time. Moreover, as frame differencing is performed on a sub-image of the actual frame, less memory is required for storing the sub-images.
Fig. 2. Boundary Tracking
2.3 Frame Differencing
Frame differencing is applied for detecting the existence and position of a moving object. An ideal background image (ground image) is considered. The shaded portion in Fig 2 shows the boundary of the frame. Each extracted sub-image (the boundary portion of an extracted frame) is subtracted from the respective portion of the ground image to determine the existence of an object near the boundary of the camera frame.
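Frame differencing restricted to the boundary region can be sketched as follows. This is a minimal illustration rather than the authors' code: frames are plain nested lists of grayscale intensities, and the function names, the band width of 2 pixels and the intensity threshold of 30 are all assumptions chosen for the example (the paper does not state these values). The frame is assumed to be larger than twice the band width in both dimensions.

```python
def boundary_coords(height, width, band):
    """Yield (row, col) for every pixel within `band` pixels of the frame edge."""
    for r in range(height):
        if r < band or r >= height - band:
            cols = range(width)                       # full top and bottom strips
        else:
            cols = list(range(band)) + list(range(width - band, width))
        for c in cols:
            yield r, c


def boundary_changes(frame, background, band=2, threshold=30):
    """Return boundary pixels whose intensity differs from the ground image."""
    height, width = len(frame), len(frame[0])
    return [(r, c) for r, c in boundary_coords(height, width, band)
            if abs(frame[r][c] - background[r][c]) > threshold]


# Toy 8x8 example: an object enters across the left edge of the frame.
background = [[0] * 8 for _ in range(8)]
frame = [row[:] for row in background]
for r, c in [(3, 0), (3, 1), (4, 0), (4, 1)]:
    frame[r][c] = 200
frame[4][4] = 200      # interior change, outside the tracked band, never examined
changes = boundary_changes(frame, background)
```

Only 48 of the 64 pixels of the toy frame are visited here; on a full-size frame with a narrow band the visited fraction is far smaller, which is the source of the savings claimed for boundary tracking.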
2.4 Detection of Moving Object
The initial position of any moving object can be detected as explained in Section 2.3. When it is found that an object is entering the camera frame by crossing the frame boundary, the object can be tracked to get its subsequent position inside the frame. For this purpose, the sub-image of the previous frame is subtracted from that of the current frame. The difference indicates the changes in pixel intensities where there is motion. This processing is performed on sub-images of the frame obtained by defining an active-window around the object, as explained in the next section.
2.5 Defining Bounding Box and Active Window
When a moving object has entirely entered the camera frame, a bounding box is defined around it. The bounding box is the rectangular window that most tightly encloses the object. The concept is depicted in Fig 3. The active-window is another rectangular window around the bounding box. The size of this window is taken slightly larger than that of the bounding box, because all the processing needs to be concentrated within this window only. Any subsequent movement of the object is possible in the lightly-shaded portion of the active-window. So, if the frame differencing is performed only inside the active-window, the motion of the object can be successfully tracked in all subsequent video frames. The width of this shaded portion may be 10 or 15 pixels.
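The relation between the bounding box and the active-window can be sketched as below. This is an illustrative sketch under assumptions: the binary motion mask is a nested list, boxes are stored as (top, left, bottom, right) with inclusive pixel indices, and the function names are hypothetical. The margin of 10 pixels follows the width suggested in the text; clamping to the frame edges is an added assumption.

```python
def bounding_box(mask):
    """Tightest (top, left, bottom, right) box around non-zero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None                                   # no moving pixels found
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def active_window(box, height, width, margin=10):
    """Grow the bounding box by `margin` pixels, clamped to the frame."""
    top, left, bottom, right = box
    return (max(0, top - margin), max(0, left - margin),
            min(height - 1, bottom + margin), min(width - 1, right + margin))


# Toy 40x60 mask with a 6x6 moving object.
mask = [[0] * 60 for _ in range(40)]
for r in range(15, 21):
    for c in range(25, 31):
        mask[r][c] = 1
box = bounding_box(mask)
window = active_window(box, 40, 60)
```

Subsequent frame differencing would then be limited to the pixels inside `window`, so the object has a 10 pixel band in every direction in which it can move before the window has to be repositioned.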
Fig. 3. Bounding Box, Active Window, Object and all together inside the frame
2.6 Track the Object
To track the object throughout the frame, frame differencing has to be applied over successive frames of the video. All the processing should be confined to the active window itself.
3 The Proposed Methodology for Moving Object Detection
The proposed algorithm for moving object detection involves several independent but associated sub-tasks. The components of the proposed algorithm are explained in the following sections, and the complete algorithm is described in Section 3.4.
3.1 Modeling the Background
A common approach to moving object detection is to perform background subtraction, which identifies moving objects from the portion of a video frame that differs significantly from a background model. Modeling the background is an important task. The model must be robust against changes in illumination. It should avoid detecting non-stationary changes in the background such as moving leaves, rain, snow, shadows cast by moving objects, or noise introduced by the camera or the environment. It should react quickly to changes in the background such as the starting and stopping of vehicles [5, 6]. Color is an important feature of objects. Although background subtraction can be performed on binary frames for better performance, the processing can also be performed on colored frames without binarizing the frame images. In a color image, each pixel can be represented by a 3x1 vector holding the pixel intensities of the three basic color components (Red, Green and Blue). The actual intensity of a color pixel can be calculated as follows:
I(x,y) = 0.299*R(x,y) + 0.587*G(x,y) + 0.114*B(x,y)
The rest of the processing can be done on this calculated pixel value. The most important aspect of the proposed method is that the processing need not be done on the whole image frame. It is required only in the boundary region (the shaded portion of the frame in Fig 2). Once an object is detected at the boundary and the bounding-box and active-window have been defined for the moving object, the processing can be confined to the active-window only.
3.2 Handling Objects Inside the Frame without Crossing the Frame Boundary
The proposed method performs boundary tracking of the frame. So, any moving object that appears inside the frame without crossing the frame boundary (e.g. an object that pops up inside the frame from behind a wall, or comes out of a building) should be taken into consideration.
The proposed method does not directly deal with objects appearing in the middle of the frame without crossing the frame boundary. However, this may be handled in a different way. Let us take an arbitrary situation where the camera is fixed for surveillance of a campus. The method assumes fixed camera and static background. Fig 4 (a) shows the campus of an organization having two buildings. The camera frame contains two buildings and the premises around the buildings. The possible sources of moving objects and the portions of the frame that need to be tracked for detecting these objects are as follows: i) Objects can pop up from inside the two buildings: The entrance/gate/doors of the buildings are to be tracked for detecting these objects. ii) Objects can appear inside the frame from behind the buildings: The entire building contours are to be tracked for detecting these objects. The system installed for the surveillance of the campus can be customized to deal with these possibilities. So, in addition to tracking the boundary of the frame as addressed in the proposed method, the building contours and the building entrance can be tracked for detecting the presence of the moving objects appearing in the frame without crossing the frame boundary. The portion around the entrance of the building is shown using dotted lines. This solution is better than considering the whole
Fig. 4. Boundary Tracking of a campus of an organization, panels (a) to (d)
image-frame pixel-by-pixel for frame differencing. The dotted rectangle around the person is the active-window that has to be tracked for detecting the motion/position of the person. Fig 4(b) shows a situation where a person comes out of Building 1 and goes
to Building 2. The surveillance system can be customized to monitor this kind of movement of objects. As the person crosses the boundary of the dotted rectangle across the entrance of Building 1, the system starts tracking that person. It determines the active-window around the person, as shown by the dotted rectangle. As the person approaches Building 2, the system continuously tracks him until he disappears inside the building. In the case of a stationary object inside the frame, it is only required to identify the potential areas where a moving object can appear. Fig 4 (c) shows the camera frame subdivided into small equal-sized grids. Fig 4 (d) shows the grids that have to be processed for tracking the boundary of the building and the entrance.
3.3 Defining the Bounding Box and the Active Window
It is important to define the bounding-box and the active-window properly. As any object enters the camera frame, frame subtraction can be continued for a few frames on the whole image frame until the moving object has entirely entered the camera frame. Once the moving object is totally inside the frame, the bounding box that most tightly encloses the object is defined. The active-window can be calculated accordingly. Once this has been done, all subsequent frame differencing can be applied only inside the active-window. The position of the active window changes over the camera frame as the object moves in different directions inside the frame, so it needs to be updated in subsequent frames. This is done by finding the centroid of each bounding-box and then measuring the difference between two consecutive centroids. If this is less than a predefined threshold, the position of the active-window is left unchanged in the subsequent frame. Otherwise, the boundary of the active-window can be updated by the horizontal and vertical separation of the two centroids from the two consecutive frames.
Another alternative is to fix the size of the active window for all frames once it has been computed. This is suitable for objects moving slowly and predictably in the frame, as shown in Fig 5 (b). If there is a sudden change in the speed of the object, then an active-window defined very close to the bounding-box will fail to track the moving object correctly.
3.4 Algorithm for Moving Object Detection
The proposed algorithm for moving object detection is outlined as follows:
Begin
Step 1. Boundary Tracking
  1.1. Apply frame differencing and subtract the background frame from each incoming frame.
  1.2. If there is no change, continue with step 1.1.
Step 2. Apply frame differencing (subtract the (i-1)-th frame from the i-th frame) until the object entirely enters the camera frame.
Step 3. Define the initial bounding-box around the moving object and estimate the size of the initial active-window.
Step 4. Active-window processing
  4.1. Apply frame differencing on the active-window area only in successive frames.
  4.2. Find the bounding-box around the moving object.
  4.3. Calculate and update the boundary of the active window.
  4.4. Continue steps 4.1 through 4.3 until the object moves out of the frame.
End.
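Step 4.3, updating the active-window from the centroids of two consecutive bounding-boxes, can be sketched as follows. This is a hedged illustration, not the authors' implementation: boxes and windows are (top, left, bottom, right) tuples, the function names are hypothetical, the distance between centroids is taken as Euclidean and the threshold of 2 pixels is an assumed value, since the paper specifies neither. Clamping the shifted window to the frame is omitted for brevity.

```python
import math


def centroid(box):
    """Centre (row, col) of a (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (top + bottom) / 2, (left + right) / 2


def update_window(window, prev_box, curr_box, threshold=2.0):
    """Shift the active-window by the centroid displacement if it is large enough."""
    (pr, pc), (cr, cc) = centroid(prev_box), centroid(curr_box)
    dr, dc = cr - pr, cc - pc
    if math.hypot(dr, dc) < threshold:
        return window                     # movement below threshold: keep window
    top, left, bottom, right = window
    dr, dc = round(dr), round(dc)         # vertical and horizontal separation
    return top + dr, left + dc, bottom + dr, right + dc


window = (5, 15, 30, 40)
prev_box = (15, 25, 20, 30)
small_move = update_window(window, prev_box, (15, 26, 20, 31))   # 1 pixel right
large_move = update_window(window, prev_box, (17, 30, 22, 35))   # 2 down, 5 right
```

A small threshold keeps the window stable against jitter in the detected box, at the cost of letting the object drift toward the window edge before the window follows it.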
4 Experimental Results
The authors have used sample color videos recorded at 30 fps and 480x640 resolution for the experiments. For processing, the video is converted to grayscale, and ultimately the binary frames are used for background subtraction. Fig 5 (a) shows some frames of one of the test videos. Fig. 5 (b) shows some frames of the same video after applying frame differencing. The bounding-box has been outlined using a green rectangle around the moving object, and the active-window has been depicted as a red rectangle around the bounding-box. The centroid of each Active-window has been marked as a red dot inside the green rectangle.
Fig. 5. (a) A few frames from Original Video (b) Applying Frame Differencing using Bounding-box and Active-window
Table 1. Some Results using the proposed method

Frame No. | AW Height (H) | AW Width (W) | Total no. of Pixels used for background Subtraction using AW (HxW) | Saving in Pixels (480x640)-(HxW) | Percentage of Saving
13 | 316 | 145 | 45820 | 261380 | 85%
14 | 339 | 160 | 54240 | 252960 | 82.34%
15 | 341 | 144 | 49104 | 258096 | 84%
16 | 341 | 175 | 59675 | 247525 | 80.57%
17 | 339 | 187 | 63393 | 243807 | 79.36%
18 | 339 | 170 | 57630 | 249570 | 81.24%
19 | 336 | 160 | 53760 | 253440 | 82.5%
An overall saving of 82.59% has been achieved for the test video shown in Fig 5. The method has also been tested with other sample videos. In most cases, a saving of more than 80% has been achieved. However, the performance improvement depends strongly on the type of video used for testing. In all the samples, the moving objects occupy a very small proportion of the overall frame. This is true for surveillance video, where the size of moving objects inside the frame is insignificant compared to the size of the whole camera frame. There could be situations where the moving object passes very close to the camera and occupies a significant portion of the camera frame. Even in such scenarios, the proposed approach will perform better than the conventional frame differencing approach that takes the entire frame for processing. The concept of the active-window not only reduces the processing time and space requirements of the conventional frame differencing approach for moving object detection, it also reduces susceptibility to non-stationary changes in the background. If the active-window is used, all necessary processing in successive frames needs to be confined to this window only. So, any non-stationary change outside this window will not be considered in frame differencing. This strongly reduces the chances of treating non-stationary changes in the background as motion of objects.
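The per-frame figures in Table 1 follow from simple arithmetic on the active-window size, which can be reproduced with the short sketch below. The function name `saving` and the dictionary `table_rows` are illustrative assumptions; the heights, widths and the 480x640 frame size are taken from the table.

```python
FRAME_PIXELS = 480 * 640   # pixels processed by whole-frame differencing


def saving(aw_height, aw_width, frame_pixels=FRAME_PIXELS):
    """Return (pixels used, pixels saved, percentage saved) for one frame."""
    used = aw_height * aw_width
    saved = frame_pixels - used
    return used, saved, 100.0 * saved / frame_pixels


# Frame number -> (AW height, AW width), as listed in Table 1.
table_rows = {13: (316, 145), 14: (339, 160), 15: (341, 144), 16: (341, 175),
              17: (339, 187), 18: (339, 170), 19: (336, 160)}

for frame_no, (h, w) in table_rows.items():
    used, saved, pct = saving(h, w)
    print(frame_no, used, saved, f"{pct:.2f}%")
```

The computed values agree with the table up to rounding (frames 13 and 15 are reported there as whole percentages). The overall figure quoted in the text presumably covers the whole test video rather than only these seven frames.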
5 Conclusion
The proposed methodology successfully detects moving objects while maintaining low computational complexity and low memory requirements. This opens up the scope of its usage for many interesting applications. It is particularly suitable for online detection of moving objects in embedded mobile devices limited by low computation power and little preinstalled memory. This approach can also be adopted for transmitting data of moving objects over a low bandwidth line for a surveillance system. In future, the proposed methodology may be extended towards efficient modeling of the background. There are also issues of illumination changes and non-stationary transitions in the background. Identification of shadows is another important task in this scenario. Addressing these issues while still providing a low complexity solution is a real challenge.
S.H. Shaikh and N. Chaki
References
1. Nascimento, J., Marques, J.: Performance Evaluation of Object Detection Algorithms for Video Surveillance. IEEE Transactions on Multimedia 8, 761–774 (2006)
2. Li, W., Yang, K., Chen, J., Wu, Q., Zhang, M.: A Fast Moving Object Detection Method via Local Neighborhood Similarity. In: Proceedings of the IEEE International Conference on Mechatronics and Automation, Changchun, China, August 9-12 (2009)
3. Yokoyama, M., et al.: A Contour-Based Moving Object Detection and Tracking. In: ICCV (2005)
4. Elhabian, S.Y., El-Sayed, K.M., Ahmed, S.H.: Moving Object Detection in Spatial Domain using Background Removal Techniques - State-of-Art. Recent Patents on Computer Science 1(1), 32–54 (2008)
5. Leone, A., Distante, C.: Shadow Detection for Moving Objects Based on Texture Analysis. Pattern Recognition 40(4), 1222–1233 (2007)
6. Cucchiara, R., Grana, C., Piccardi, M., Prati, A.: Detecting Moving Objects, Ghosts, and Shadows in Video Streams. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10) (October 2003)
Mitigating Soft Error Risks through Protecting Critical Variables and Blocks Muhammad Shaikh Sadi1, Md. Nazim Uddin1, Md. Mizanur Rahman Khan1, and Jan Jürjens2 1
Khulna University of Engineering and Technology (KUET), Khulna, Bangladesh
[email protected],
[email protected],
[email protected] 2 TU Dortmund, Germany
[email protected] Abstract. Downscaling of CMOS technologies has resulted in high clock frequencies, smaller feature sizes and low power consumption, but it reduces the soft error tolerance of VLSI circuits. Safety-critical systems are very sensitive to soft errors. A bit flip due to a soft error can change the value of a critical variable, and consequently the system control flow can change completely, which may lead to system failure. To minimize the risks of soft errors, this paper proposes a novel methodology to detect and recover from soft errors that considers only ‘critical code blocks’ and ‘critical variables’ rather than all variables and/or blocks in the whole program. The proposed method reduces space and time overhead in comparison to the existing dominant approaches. Keywords: Soft Errors, Safety Critical System, Critical Variable, Critical Block, Criticality analysis, Risk mitigation.
1 Introduction

A temporary, unintended change of state resulting from the latching of a single-event transient creates a transient fault in a circuit; when such a fault is executed in the system, the resulting error is called a soft error. A soft error involves a change to data, for instance the charge in a storage circuit, but no physical damage to the circuit [1], [2], [3]. The undesired change these errors cause to the control flow of the system software may prove catastrophic for the desired functionality of the system. Soft errors are of particular concern in systems where high reliability is necessary [4], [5], [6]. Space programs (where a system cannot malfunction while in flight), banking transactions (where a momentary failure may cause a huge difference in balance), automated railway systems (where a single bit flip can cause a drastic accident) [7], and mission-critical embedded applications are a few examples where soft errors are severe.

Many static approaches [33] have been proposed so far to find soft errors in programs. These approaches have proven effective in finding errors of known types only. There is still a large gap in providing high-coverage, low-latency (rapid) error detection to protect systems from soft errors while the system is in operation. Soft error mitigation techniques mostly focus on post-design phases, i.e. circuit-level solutions, logic-level solutions, error-correcting codes, spatial redundancy, etc. Prior research also covers software-based solutions involving duplication of the whole program or duplication of instructions [8], critical variable re-computation in the whole program [9], etc. Duplication seems to provide high coverage at runtime for soft errors, but to prevent error propagation and the resulting system crashes it makes a comparison after every instruction, which incurs a high performance overhead.

This paper proposes a novel approach to soft error detection and recovery. It works with critical blocks and/or critical variables rather than the whole program. Critical blocks and/or variables are defined as the code segments that affect the overall control flow of the program. Identifying these blocks and working with them is the key concept of the proposed method. Only critical blocks and variables are considered, since errors in other variables arise from benign faults and from faults that are not severe for the system. The main contribution of this paper is that it detects and recovers from soft errors with lower time and space overhead than existing dominant approaches.

The paper is organized as follows: Section 2 describes related work. Section 3 outlines the methodology. Experimental analysis and conclusions are presented in Sections 4 and 5, respectively.

S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 137–145, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 Related Works

Three types of soft error mitigation techniques have been highlighted so far: (i) software-based approaches, (ii) hardware-based approaches and (iii) combined hardware and software (hybrid) approaches.

Software-based approaches to tolerating soft errors include redundant programs to detect and/or recover from the problem, duplicating instructions [12], [13], task duplication [14], dual use of superscalar data paths, and Error Detection and Correction Codes (ECC) [15]. Chip-level Redundant Threading (CRT) [11] uses a load value queue so that redundant executions always see an identical view of memory. Walcott et al. [29] used redundant multithreading to determine the architectural vulnerability factor, and Shye et al. used process-level redundancy to detect soft errors. In redundant multithreading, two identical threads are executed independently over some period and their outputs are compared to verify correctness. EDDI [28] and SWIFT [13] duplicate instructions and program data to detect soft errors. Both redundant programs and duplicated instructions create higher memory requirements and increase register pressure. Error Detection and Correction Codes (ECC) [15] add extra bits to the original bit sequence to detect errors. Applying ECC to combinational logic blocks is complicated, and requires additional logic and calculations on paths that are already timing critical.

Hardware solutions for soft error mitigation mainly emphasize circuit-level, logic-level and architectural solutions. At the circuit level, gate sizing techniques [16], [17], [18], increased capacitance [19], [20] and resistive hardening [21] are commonly used to raise the critical charge (Qcrit) of the circuit node as high as possible. However, these techniques tend to increase power consumption and lower the speed of the circuit. Logic-level solutions [30], [31] mainly propose detection and recovery in combinational circuits by using redundant
or self-checking circuits. Architectural solutions mainly introduce redundant hardware to make the whole system more robust against soft errors. They include the dynamic implementation verification architecture (DIVA) [22].

Combined hardware and software approaches [23], [24], [25], [26] use the parallel processing capacity of chip multiprocessors (CMPs) and redundant multithreading to detect and recover from the problem. Mohamed et al. [32] present the Chip-level Redundantly Threaded multiprocessor with Recovery (CRTR), where the basic idea is to run each program twice, as two identical threads, on a simultaneous multithreaded processor. A notable limitation of the CRTR scheme is that there are certain faults from which it cannot recover: if a register value is written before an instruction is committed, and a fault corrupts that register after the instruction is committed, CRTR fails to recover. In the Simultaneously and Redundantly Threaded processors with Recovery (SRTR) scheme [26], a fault may corrupt both threads, since the leading thread and the trailing thread execute on the same processor.

In all these cases, however, the system remains vulnerable to soft errors in key areas. In software-based approaches, the complex use of threads presents a difficult programming model, while in hardware-based approaches duplication suffers not only from the overhead of synchronizing duplicate threads but also from the inherent performance overhead of additional hardware. Moreover, these post-functional-design-phase approaches can increase time delays and power overhead without offering any performance gain.
3 Methodology to Detect and Recover from Soft Errors

This paper proposes a novel methodology to mitigate soft error risk. The method works as a two-phase approach. During
Fig. 1. Soft Error Detection and Recovery
the first phase, the proposed method detects soft errors only at critical blocks and critical variables. In the second phase, the recovery mechanism replaces the erroneous variable or code block with the original. The program to be executed is split into blocks, among which the ‘critical blocks’ are identified (the definition of a critical block is detailed in Section 3.2). As Figure 1 illustrates, when a critical block is encountered during execution, it is treated specially: a procedure is invoked that computes the critical variables within it twice to obtain two outcomes. If the comparison mechanism finds the results identical, the ordinary execution flow continues from the next block. Otherwise, the variable is considered erroneous due to a soft error. The recovery process replaces the erroneous critical block with the corresponding block of the original program, which was backed up earlier.

3.1 Backing Up the Program That Is in Operation

A backup of the running program is kept in memory for later use. This backup is assumed to be free of soft errors. Various techniques (ECC, parity, RAID, CRC, etc.) exist to preserve the consistency of the backup.

3.2 Identification of the Critical Variables and Blocks

Critical variables provide high coverage for data value errors. Critical variables determine the program execution flow; if one of them takes an erroneous value that differs from the original, the program produces an erroneous outcome.
if (move == true) { path A } else { path B }
path A: run_train();
path B: stop_train();
if (move == 1) { move_left(); }
if (move == 2) { move_right(); }

Fig. 2. Critical Block
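As a rough illustration of how branching statements such as those in Fig. 2 might be located automatically, the following sketch scans a small Python program with the standard `ast` module. It is only an approximation under stated assumptions: the paper does not give an identification algorithm, the analysed program here is Python rather than the C-like code of the figure, and "critical" is reduced to "used in the condition of an `if` or `while`". The function name `find_critical` is hypothetical.

```python
import ast

# Illustrative sketch only: "critical" is approximated here as "appears in
# the condition of an if/while statement", one of several possible criteria.

def find_critical(source):
    """Return (line numbers of branching statements, names used in their tests)."""
    tree = ast.parse(source)
    lines, names = [], set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While)):
            lines.append(node.lineno)
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Name):
                    names.add(sub.id)
    return sorted(lines), sorted(names)


program = """
speed = 40
if move:
    run_train()
else:
    stop_train()
log = speed * 2
"""

critical_lines, critical_vars = find_critical(program)
```

Here only the `if` statement and the variable `move` are flagged; `speed` and `log` do not influence the branch and would not need recomputation under this simplified criterion.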
Critical blocks are defined as the program segments that control the overall program flow. Criticality is determined by analyzing various criteria (‘fan in’ and ‘fan out’, lifetime, functional dependencies, weight in conditional branches [34], etc.). Identifying these blocks, and then working with the critical blocks and/or the critical variables found within them, is the key concept of the proposed method. The code blocks responsible for branching the program control flow are recognized as critical blocks. The dashed block in Figure 2 is an example of a critical block within a program segment. Critical blocks decide which of the distinct paths will be followed.

3.3 Computation of Critical Blocks and Comparing Outcome

While the full program code is executed, each critical block and/or variable is computed twice, and the two outcomes are compared to determine their consistency. The basic steps can be stated as follows:
Step 1: Each critical block is recomputed (executed twice, and both outcomes are stored).
Step 2: The recomputed results are compared to check that they are identical.
Step 3: If the recomputed values are consistent, the program continues from the next code block and no soft error is reported.
Step 4: If the recomputed values are inconsistent, the program block is identified as erroneous and a soft error is reported; the recovery procedure is then called. The erroneous critical block is replaced by the corresponding critical block of the original program, and program execution continues from the current block.
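The four steps above can be sketched as follows. This is a minimal illustration, not the authors' tool: a critical block is modelled as a dictionary holding the critical variable it branches on, and all names (`evaluate`, `execute_critical_block`, `clean_twice`, `faulty_twice`) are hypothetical. The transient fault is simulated by flipping the variable between the two computations.

```python
import copy

# Illustrative sketch only: a "block" is modelled as a dict holding the
# critical variable it branches on; all names here are hypothetical.

def evaluate(block):
    """Outcome of the critical block: which path the branch would take."""
    return "run_train" if block["move"] else "stop_train"


def execute_critical_block(live, backup, name, compute_twice):
    """Steps 1-4: recompute, compare, and restore from backup on mismatch.

    `compute_twice` returns the two outcomes; it is passed in so that a
    transient fault can be simulated between the two computations.
    Returns (outcome, soft_error_detected).
    """
    first, second = compute_twice(live[name])      # Step 1: compute twice
    if first == second:                            # Steps 2-3: compare
        return first, False
    live[name] = copy.deepcopy(backup[name])       # Step 4: replace from backup
    return evaluate(live[name]), True              # continue from current block


def clean_twice(block):
    return evaluate(block), evaluate(block)


def faulty_twice(block):
    first = evaluate(block)
    block["move"] = not block["move"]              # simulated bit flip
    second = evaluate(block)
    return first, second


backup = {"branch": {"move": True}}                # assumed soft-error free
live = copy.deepcopy(backup)

ok_outcome, ok_error = execute_critical_block(live, backup, "branch", clean_twice)
bad_outcome, bad_error = execute_critical_block(live, backup, "branch", faulty_twice)
```

In the faulty run the two outcomes disagree, the block is restored from the backup, and execution resumes with the original value of `move`.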
4 Experimental Analysis

The proposed method was evaluated through a multi-phase simulation process: the detection phase detects the soft errors that occur, and its counterpart, the recovery process, recovers from them so that the program reaches its expected output. A backup of the operational program is kept in memory for the recovery process. A candidate program is checked by the simulator, which reports any soft error it detects. The backup supports block-wise execution of the core program, with twin computation of the critical blocks, by ensuring that the blocks hold their actual values. A binary representation of the executable program is formed and launched in the simulator editor in hexadecimal format, where it is handled as a bit stream. Errors are injected manually by flipping one or more bits to change the original code sequence.

4.1 Fault Injection

Errors are injected manually to change the original program. Error injection involves bit flipping, that is, changing a binary ‘0’ (zero) to a binary ‘1’ (one) or vice versa at any bit position of a particular byte. Faults are injected at variables and/or at any random position (instructions or variables) of the program’s binary file to change the value of the variables or the instructions. Suppose the binary representation of a variable is 01000110; a soft error may flip any of its bits. If the bit at position 5 (counting from the left) flips from 0 to 1, the value of the variable becomes 01001110, a different value from the original.

4.2 Soft Error Detection

Among all program blocks, the critical blocks are the main focus. While executing, they are computed twice and the resulting values are compared. An error is detected if the compared values differ. As a soft error is transitory, it neither recurs nor has a long-lasting effect.
Hence, consecutive computations of a variable give different results if either of them takes an erroneous value. If no such case is encountered, ‘no soft error’ is reported.

4.3 Recovery from the Effect of Soft Error

The erroneous critical variable noticed earlier and its location are traced; the backup is then invoked to replace the erroneous critical code block with the corresponding original. The recovery tool activates the mechanisms that perform the recovery tasks.
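The bit flip used for fault injection in Section 4.1 can be reproduced with a few lines of code. The sketch below assumes, consistently with the example in the text, that bit positions are counted from the left starting at 1; the helper name `flip_bit` is hypothetical.

```python
# Illustrative sketch only: positions are counted from the left, starting
# at 1, which is the convention that matches the example in Section 4.1.

def flip_bit(byte, position_from_left):
    """Flip one bit of an 8-bit value; position 1 is the leftmost bit."""
    if not 1 <= position_from_left <= 8:
        raise ValueError("position must be between 1 and 8")
    mask = 1 << (8 - position_from_left)
    return byte ^ mask


original = 0b01000110
corrupted = flip_bit(original, 5)
print(format(original, "08b"), "->", format(corrupted, "08b"))
```

Applying the same flip a second time restores the original byte, which is why recomputing a value and comparing the two outcomes can expose a transient fault.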
4.4 Result Analysis

The methodology deals with critical blocks and/or critical variables rather than going through the whole program. This makes the computation less time consuming (as shown in Figure 3).
Fig. 3. Comparison of Execution Time

Table 1. Review of different soft-error tolerance techniques

| Approaches | Adopted methodology | Memory space overhead | Execution time overhead | Number of variables to be executed | Drawback |
|---|---|---|---|---|---|
| Diverse Data and Duplicated Instruction ED4I [8] | Original program and transformed program are both executed on same processor and results are compared. | At least double | Longer than usual since comparison | Double | Program may crash before reaching comparison point. Both the programs may be erroneous. |
| Critical Variables Recomputation for Transient Error Detection [9] | Re-computes only critical variables (not the instructions) to detect and recover from soft errors. | Depends on no. of critical variables | Longer than usual since CV re-computation | Depends on no. of critical variables | Unable to detect all severe errors since it works only with critical blocks. Instructions may also be erroneous. |
| Source Code Modification [10], [27] | Based on modifications of source code. Protection methods are applied at intermediate representation of source code. | Larger than usual based on modification scheme | High running time since source code modification | Depends on modification techniques | System may crash before reaching the control flow checking point. |
| Proposed Method | Detection is performed by critical blocks re-computation. Erroneous blocks are replaced by relevant backed up blocks. | Depends on no. of critical blocks | Relatively much lower | Depends on no. of critical blocks | Improper identification of critical blocks leads to inefficiency. |
Table 1 gives a theoretical comparison of different soft-error tolerance techniques. Working with fewer blocks/variables requires less memory space. Figure 4 illustrates the memory utilization of the proposed method and of whole-program duplication methods with respect to time. The proposed approach consumes less memory space than the existing leading whole-program duplication approaches.
Fig. 4. Comparison of memory utilization
5 Conclusions

This paper shows that existing methodologies can be modified to lower the criticality of program blocks and thereby minimize the risk of soft errors. The significant contribution of the paper is to lower soft error risks with minimal time and space complexity, since the method works only with critical variables in critical blocks. The remaining variables in the program code are not recomputed or replicated, although they may also cause errors; such errors come from benign faults and from faults that are not severe for the system, and they do not interfere with system performance. Only soft errors induced in critical variables affect the program flow to an extent that makes the system malfunction. Hence, by leaving some ineffective errors unpursued, the proposed method achieves its goal in a cost-effective way with respect to time and space.

Some steps could be taken to enhance the performance of the method. Storing the recomputed outcomes of the critical blocks in cache memory would reduce memory access time, which can be significant where memory latency is an issue. This could make the proposed method less time consuming by compensating for the time spent on double computation. Keeping the backup of the original program free of soft errors is a major concern. For storage, enhancements can be explored beyond existing techniques such as Error-Correcting Codes (ECC) and Redundant Array of Inexpensive Disks (RAID). Another considerable concern is the identification of critical blocks and critical variables. Identifying them optimally among the numerous code blocks of a running program is a great challenge. The efficiency of the proposed method depends mostly on proper identification of critical blocks and critical variables. They can be grouped according
to their criticality, which is treated differently from different viewpoints. Several criteria remain wide open for determining criticality: “fan out”, that is, the number of dependencies/branches that exist; “recursion”, that is, looping or how many times a call is repeated; and “severity of blocks”, that is, blocks containing more heavily weighted variables. Hence, there is much scope for further work on critical block and variable identification. Proper identification of these critical blocks and variables can mitigate soft error risks with optimal time and space requirements.
References
[1] Timor, A., Mendelson, A., Birk, Y., Suri, N.: Using under utilized CPU resources to enhance its reliability. IEEE Transactions on Dependable and Secure Computing 7(1), 94–109 (2010)
[2] Rhod, E.L., Lisboa, C.A.L., Carro, L., Reorda, M.S., Violante, M.: Hardware and Software Transparency in the Protection of Programs Against SEUs and SETs. Journal of Electronic Testing 24, 45–56 (2008)
[3] Mukherjee, S.S., Emer, J., Reinhardt, S.K.: The soft error problem: an architectural perspective. In: 11th International Symposium on High-Performance Computer Architecture, San Francisco, CA, USA, pp. 243–247 (2005)
[4] Iyer, R.K., Nakka, N.M., Kalbarczyk, Z.T., Mitra, S.: Recent advances and new avenues in hardware-level reliability support. IEEE Micro 25, 18–29 (2005)
[5] Narayanan, V., Xie, Y.: Reliability concerns in embedded system designs. Computer 39, 118–120 (2006)
[6] Tosun, S.: Reliability-centric system design for embedded systems, Ph.D. Thesis, Syracuse University, United States – New York (2005)
[7] Sadi, M.S., Myers, D.G., Sanchez, C.O., Jurjens, J.: Component Criticality Analysis to Minimizing Soft Errors Risk. Comput. Syst. Sci. & Eng. 26(1) (September 2010)
[8] Oh, N., Mitra, S., McCluskey, E.J.: ED4I: Error Detection by Diverse Data and Duplicated Instructions. IEEE Transactions on Computers 51(2) (February 2002)
[9] Pattabiraman, K., Kalbarczyk, Z., Iyer, R.K.: Critical Variable Recomputation for Transient Error Detection (2008)
[10] Piotrowski, A., Makowski, D., Jabłoński, G., Andrzej, N.: The Automatic Implementation of Software Implemented Hardware Fault Tolerance Algorithms as a Radiation-Induced Soft Errors Mitigation Technique. In: Nuclear Science Symposium Conference Record. IEEE, Los Alamitos (2008)
[11] Mukherjee, S.S., Kontz, M., Reinhardt, S.K.: Detailed design and evaluation of redundant multi-threading alternatives. In: 29th Annual International Symposium on Computer Architecture, pp. 99–110 (2002)
[12] Oh, N., Shirvani, P.P., McCluskey, E.J.: Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability 51, 63–75 (2002)
[13] Reis, G.A., Chang, J., Vachharajani, N., Rangan, R., August, D.I.: SWIFT: software implemented fault tolerance, Los Alamitos, CA, USA, pp. 243–254 (2005)
[14] Xie, Y., Li, L., Kandemir, M., Vijaykrishnan, N., Irwin, M.J.: Reliability-aware cosynthesis for embedded systems. In: 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 41–50 (2004)
[15] Chen, C.L., Hsiao, M.Y.: Error-Correcting Codes for Semiconductor Memory Applications: A State-Of-The-Art Review. IBM Journal of Research and Development 28, 124–134 (1984)
[16] Park, J.K., Kim, J.T.: A soft error mitigation technique for constrained gate-level designs. IEICE Electronics Express 5, 698–704 (2008)
[17] Miskov-Zivanov, N., Marculescu, D.: MARS-C: modeling and reduction of soft errors in combinational circuits, Piscataway, NJ, USA, pp. 767–772 (2006)
[18] Quming, Z., Mohanram, K.: Cost-effective radiation hardening technique for combinational logic, Piscataway, NJ, USA, pp. 100–106 (2004)
[19] Oma, M., Rossi, D., Metra, C.: Novel Transient Fault Hardened Static Latch, Charlotte, NC, United States, pp. 886–892 (2003)
[20] P. R. STMicroelectronics Release, New chip technology from STmicroelectronics eliminates soft error threat to electronic systems, http://www.st.com/stonline/press/news/year2003/t1394h.htm
[21] Rockett Jr., L.R.: Simulated SEU hardened scaled CMOS SRAM cell design using gated resistors. IEEE Transactions on Nuclear Science 39, 1532–1541 (1992)
[22] Austin, T.M.: DIVA: a reliable substrate for deep submicron microarchitecture design. In: 32nd Annual International Symposium on Microarchitecture, pp. 196–207 (1999)
[23] Gold, B.T., Kim, J., Smolens, J.C., Chung, E.S., Liaskovitis, V., Nurvitadhi, E., Falsafi, B., Hoe, J.C., Nowatzyk, A.G.: TRUSS: a reliable, scalable server architecture. IEEE Micro 25, 51–59 (2005)
[24] Krishnamohan, S.: Efficient techniques for modeling and mitigation of soft errors in nanometer-scale static CMOS logic circuits, Ph.D. Thesis, Michigan State University, United States – Michigan (2005)
[25] Mohamed, A.G., Chad, S., Vijaykumar, T.N., Irith, P.: Transient-fault recovery for chip multiprocessors. IEEE Micro 23, 76 (2003)
[26] Vijaykumar, T.N., Pomeranz, I., Cheng, K.: Transient-fault recovery using simultaneous multithreading. In: 29th Annual International Symposium on Computer Architecture, pp. 87–98 (2002)
[27] Piotrowski, A., Tarnowski, S.: Compiler-level Implementation of Single Event Upset Errors Mitigation Algorithms
[28] Oh, N., Shirvani, P.P., McCluskey, E.J.: Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability 51, 63–75 (2002)
[29] Walcott, K.R., Humphreys, G., Gurumurthi, S.: Dynamic prediction of architectural vulnerability from microarchitectural state, New York, NY 10016-5997, United States, pp. 516–527 (2007)
[30] Mitra, M.Z.S., Seifert, N., Mak, T.M., Kim, K.: Soft and IFIP, Soft Error Resilient System Design through Error Correction. VLSI-SoC (January 2006)
[31] Zhang, M.: Analysis and design of soft-error tolerant circuits, Ph.D. Thesis, University of Illinois at Urbana-Champaign, United States – Illinois (2006)
[32] Mohamed, A.G., Chad, S., Vijaykumar, T.N., Irith, P.: Transient-fault recovery for chip multiprocessors. IEEE Micro 23, 76 (2003)
[33] Chidamber, S.R., Kemerer, C.F.: A metrics suite for object oriented design. IEEE Transactions on Software Engineering 20, 476–493 (1994)
[34] Bergaoui, S., Vanhauwaert, P., Leveugle, R.: A New Critical Variable Analysis in Processor-Based Systems. IEEE Transactions (2010)
Animations and Simulations: Learning and Practice of Algorithmic Brika Ammar Research professor Graduate School of Computer Science, Algiers, Algeria
[email protected] Abstract. In this paper we try to show that animations and simulations are often presented as learning tools, even though not all animations can really fulfil this role. We define graphic animation and simulation in order to show that, in educational systems, they should be used only if they relate to the material to be learned, the learning process and/or the characteristics of the learner, thus leaving aside animations used solely for marketing purposes. We then present and discuss a simple but pedagogically relevant simulation included in a help system for learning and practising algorithms. Finally, we propose a guideline establishing general relationships between types of learning activities, the types of knowledge involved and the types of animations and simulations. Keywords: multimedia system, motion graphics, graphical simulation, interaction, teaching purposes (for animation), the purpose of learning, learning activity.
1 Introduction

To improve learners' ability to visualize, teachers first put images in books; transparencies and slides then came, in black and white and in color. To make these images move, there were "animated" books (children's books made of overlapping images that slide over one another), the magic lantern, and then film and video; computer animation has finally brought a new dimension to what was hidden and had been left to our imagination. Educational systems have evolved dramatically in recent years [8]. The rise of multimedia and hypermedia technology has made such systems more attractive. But information is not necessarily education: hypermedia are information tools, not educational tools in themselves, although they may be suitable as learning systems. Alongside this flexible structure for accessing information, one should add the possibility of taking notes, saving a result, accessing simulation programs and inserting new data. There should also always be a supervisor (automatic or not) to compensate for inexperience and to keep the learner from getting lost in the jungle of information. This is the risk of hypermedia: the learner's freedom has a downside, since the learner may get lost in hyperspace or show no intellectual curiosity. Hypermedia must therefore be accompanied by navigational aids (indexes, history), by the teacher's instructions, and by knowledge of the audience. S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 146–157, 2011. © Springer-Verlag Berlin Heidelberg 2011
These techniques have come into broad use across the Internet, and the web in particular. A notable application is the use of simulations and animated graphics, which were already widely used for a long time independently of the Internet. Outside the Internet, their effectiveness is even greater: realistic animations, for example, are found particularly in computer video games which, even when available on the Internet, are more often run from a CD-ROM or a local hard disk than directly from the Internet. Besides games, another major use of animation is in educational systems, where it has a broader impact. In what follows, we examine the roles and effects of graphic animations and simulations in educational systems. The first point recalls what animations and simulations are. The second point presents several general aspects of the use of animation in educational systems, with appropriate examples. The third point gives a more detailed example of animation in a simple system built for acquiring programming skills. The fourth and last point establishes some guidelines linking various types of animation to various types of educational objectives.
2 What Animations and Simulations Are

Though often associated with filmmaking, animation is a branch of computer graphics [9]. However, not every animation is graphic (some use only ASCII text), and not every graphic is an animation. This article therefore focuses on graphic animations. Animations are precisely those graphic displays that show something changing over time. Depending on the particular case, that something may appear or disappear, move, or change (e.g., change shape and/or color). The graphic can be a drawing, a cartoon or a picture; when animated, it gives rise to animated drawings, animated cartoons and video movies (e.g., videotape), respectively. Regarding simulations, we first exclude simulation as a discipline [18], which is based on mathematics rather than graphics (although simulation programs often include animated simulations for marketing reasons), and take an interest in graphic simulations. These are of two types:
• Static simulations: these display images (real or computer generated) and are therefore not animations (nothing changes over time); their interest lies in the quality of the rendering (surface texture, landscapes).
• Dynamic simulations: these are animations.
What these two categories of simulation share is their content and their interactive nature. As to content, they show a certain property, real or not, of the displayed scene: a body, an event, a phenomenon, etc. As to their interactive nature, they allow the user to act on the displayed scene by varying its speed or pausing it if it is animated, by focusing on a particular area (by zooming, etc.), or even by influencing the outcome of the animation itself (as in some simulation-based games).
B. Ammar
Both categories of simulation are used in educational systems, where the displayed scene and the demonstrated properties are directly connected to the subject concerned. This article does not deal with static simulations, which amount simply to good images. From a technical point of view, animations and simulations can be either prepared in advance or created on demand. The first approach requires a large memory capacity. The second requires a fast processor and a large memory. An animation can combine both approaches.
3 Animations in Educational Systems

Animations are very popular because of their appeal, in particular in spectacular games and movies. But what do they bring to educational systems? Do they improve them? If so, in which respect? To be educationally relevant, they should not be like the highlighted word "free", or other more loaded words, written at the top of certain advertising hoardings. The proof: you have certainly read such an advertisement, but you have never called the person who sent it to you. We now examine the relations of animations with the subject concerned, with the learning process, with the interests of the learner and finally with marketing.

3.1 The Animations and the Subject Concerned

In an educational application, the first purpose of any object presented to the learner is to describe the content to be learned (a course). This is certainly true of animated scenes. Many examples of such animations can be found:
• in physics (kinematics and dynamics),
• in engineering (the operation of a four-stroke engine, the creation or absorption of a photon),
• in astronomy (the movements of planets and stars),
• in molecular biology (the human genome) and in certain areas of computing (sorting algorithms) [12].
This is also true of static simulations, which are very useful in some descriptive natural sciences (botany, zoology) and physical sciences (mechanical stresses in statics or materials science). Such animations or simulations are certainly an appropriate (if not the most appropriate) means, because they help the learner understand the subject, the process or the law presented. However, to serve this goal effectively, such an animation or simulation should focus on what must be learned and nothing else. If it is too realistic, it can distract the learner from the actual subject. Take for example the presentation of the operation of a four-stroke engine to a mechanics learner: a pencil drawing or a cartoon can be effective. By contrast, a film showing a real engine in a real car may accidentally focus the learner's attention on an irrelevant spot (e.g., a brightly colored label carelessly left on a moving part). A similar situation can occur when driving a car. At night, a driver's attention may be drawn by a
set of flashing lights. If these lights indicate a dangerous place, they warn of danger, and so much the better. If instead they advertise a restaurant or something equally irrelevant, the driver may feel frustrated at having had to focus unnecessarily on something that does not concern his driving. In the same way, too much realism in an animation may lead the learner into mistakes.
3.2 Animations and the Learning Process
Beyond having a relationship with the subject concerned, a good educational animation should actually increase, or at least facilitate, learning. It certainly does so if it makes the content easier to understand, as in the examples above that depict the phenomenon itself or the law that the learner is supposed to understand. Other animations may simply illustrate the subject without being critical to the learning process; they merely make the content more palatable and so easier to absorb. While understanding a concept, such as how a phenomenon works, is important, the learning process involves two other aspects: motivation to learn and memorization of what is learned. Animation software can also help with these, although not as effectively as it helps understanding. Finally, experimentation, which may contribute to all three aspects (understanding, motivation and memorization), is mainly found in graphical simulations. So, when we assess the contribution of an existing or proposed animation or simulation to the learning process, we should examine how it addresses understanding, motivation, memorization and experimentation. However, we should also weigh the time the learner spends learning the simulation environment against what that environment allows the learner to learn, that is, its functional capacities.
3.3 Animations that Benefit the Learner
The learning process described in general terms above involves at least two entities: the subject concerned and the learner. Thus, in addition to the content discussed in Section 3.1, an animation may try to relate to the learner: his psychology, experiences, interests, etc. This is especially true when the "knowledge" to learn involves the learner himself, who must then learn how to be (or learn how to decide what to do). Examples of such situations are driving a car or piloting an aircraft, managing resources, socio-political behavior, or the integration of persons with special needs into their social environment. An example of the latter situation is the Brazilian project AVIRC, an integrated virtual reality environment for cognitive rehabilitation, where the learner is a person with a brain injury or a neuropsychiatric disorder [13]. Another example [14] deals with a multimedia interface where the modalities of the comprehension task are adapted to the characteristics of the learner, a child with intellectual and cognitive disabilities. A more recent example is the automatic generation of learning scenarios in a healthy virtual environment, currently used for students in politics but usable in other areas. In these types of learning situations, very realistic simulations and animations are appropriate because the learner must be able to relate the process to reality. Indeed, when we
learn how to be, the important details to explain may not be as consistent or as visible as when studying a law of physics, for example. Such realistic software can then be an appropriate substitute for real-life scenarios. A virtual reality environment can sometimes even be the best, or the only, way to learn, not only because of cost savings but also because other people are involved. This is the case for flight simulators, especially when they are developed for aircraft not yet built, for virtual surgery for medical students, and for environments for learners with special needs. However, in our opinion, animations focused on the particular interests of individuals are more difficult to assess objectively than the previous kinds. Indeed, a person (here the learner) is much more complex than the subject matter or even the learning process, both of which have been studied more extensively. The evaluation questions to ask would be along the following lines:
• What relationship between the animation or simulation software and learning has been established?
• Is a particular type of learning targeted?
• If so, has it been modeled?
• How is the animation or simulation relevant to the learner model?
3.4 Animations and Marketing
Only a fine line separates truly useful educational animations and simulations from those whose main virtue is their attractiveness (marketing). Indeed, many software products boast of their "breathtaking" animations, and modern rendering techniques are quite fabulous [14]. A learner may well have a lot of fun experimenting with an especially attractive animation; but if he does not learn better or more effectively because of it, the animation should be called into question.
[Diagram labels: Animations; Subject concerned; Learning process; Interests of the learner; Marketing]
Fig. 1. Animations and simulations in an educational system
Thus, when assessing animation software, we should properly assess what the intentions and the primary effects of the animation are. Often its purpose seems simply to cause surprise. The point is not to criticize sparkling animations as such; rather, when proposing or evaluating them, one should ask whether and how they are connected with the subject concerned, the general learning process and the learner.
3.5 Conclusion
In an educational context, an animation should be connected primarily with the subject concerned, with the learning process or with the learner (Fig. 1). Rendering effects such as realism, speed or responsiveness are only secondary if they do not strengthen these primary relations. As David Merrill [10] put it: "evaluations based mainly on the value of production and appearance inform you very moderately about the instructional quality of a product"; this applies in particular to animations and simulations. Naturally, the present discussion, and Fig. 1 in particular, treats animations on the assumption that the learner is alone with the educational system, that is, independently of the environment in which he is placed. If, for example, a teacher is present and can refocus the learner's attention on the relevant points (the learning goals), the drawbacks or inadequacy of an otherwise inappropriate animation can be reduced or even eliminated.
4 Animations in the Algorithmics Simulation Software
The Algorithmics system [8] was designed for students in introductory programming classes. The main objective of the tutor is to help the student learn and correctly use the basic control structures (IF THEN, IF THEN ELSE, WHILE DO) and procedure calls. For that purpose, Algorithmics uses a macro-world called "Karel the Robot" [16], in which the student has to program a robot to carry out various tasks requiring the use of the control structures being learned. This helps the student understand how the power of the machine and the brain of the programmer (or of the user) can cooperate. The major advantage of this macro-world is that it requires no data structures, thus separating two levels of difficulty encountered by beginners in programming: data structures and control structures. Algorithmics provides the learner with three interactive components:
• a structure editor for code entry,
• a graphical situation editor,
• a graphical simulator for executing and tracing a program.
To help the learner solve his problem, it also includes an intelligent tutoring system, which can be defined as software that is expert in both teaching and pedagogy and can play the role of a tutor in dialogue with a learner, especially about solving a problem. An intelligent tutoring system is an educational system whose main objective is to transmit not only knowledge but also expertise. More
precisely, it is an active system that can guide the interaction, judge the learner's actions and offer explanations so that detected errors are not repeated. This principle rests on three functions of interest to education: handling the learner's problems, explaining the solutions found, and managing a teaching session while taking the learner and the context into account. The available problems are categorized according to their presumed difficulty: decision structures, iterative structures or control structures. When starting a session, the learner may choose a category and a first problem in it; he is then guided in his work towards more difficult problems. Alternatively, he may stop using the tutor and devise his own problem to work on. In this case, since the system does not "know" anything about the task, the learner is left alone (without a tutor). Suppose the student decides to work on a problem from the category "all control structures", the super obstacle course problem [8]. He is presented with the following task description: [Karel must run an obstacle course. The hurdles have a variable height and zero width. The end of the race is marked by a beeper [8]. Karel starts the race facing east; at the end of the race, he also faces east and is supposed to have picked up the beeper ending the race]. This comes with a display like that shown in Fig. 2, showing a program window where he can work on his program using the structure editor, and an example of an initial situation. If the student is not happy with the proposed situation, he can change it using the graphical situation editor, which allows him to place the robot at the desired intersection facing the desired direction, to place beepers, and to move wall sections around as he wishes.
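The obstacle-course task described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Algorithmics implementation: the `Karel` class, its method names, and the encoding of hurdles as a mapping from column to wall height are all hypothetical. The sketch only aims to show that the task exercises WHILE and IF THEN ELSE without requiring any data structure in the learner's program (`jump_hurdle` and `run_race`).

```python
# Hypothetical mini-world; class, method names and wall encoding are assumptions.
EAST, NORTH, WEST, SOUTH = range(4)
STEP = {EAST: (1, 0), NORTH: (0, 1), WEST: (-1, 0), SOUTH: (0, -1)}


class Karel:
    def __init__(self, hurdles, beeper_x):
        # hurdles[x] = height of the zero-width wall between columns x and x + 1
        self.hurdles = hurdles
        self.beeper_x = beeper_x          # the finish beeper lies on the ground row
        self.x, self.y, self.facing = 0, 0, EAST
        self.has_beeper = False

    def _is_clear(self, direction):
        if direction == NORTH:
            return True                   # no ceiling in this sketch
        if direction == SOUTH:
            return self.y > 0             # the ground blocks downward moves
        wall_x = self.x if direction == EAST else self.x - 1
        return self.y >= self.hurdles.get(wall_x, 0)

    def front_is_clear(self):
        return self._is_clear(self.facing)

    def right_is_clear(self):
        return self._is_clear((self.facing - 1) % 4)

    def move(self):
        if not self.front_is_clear():
            raise RuntimeError("Karel ran into a wall")
        dx, dy = STEP[self.facing]
        self.x, self.y = self.x + dx, self.y + dy

    def turn_left(self):
        self.facing = (self.facing + 1) % 4

    def turn_right(self):
        self.facing = (self.facing - 1) % 4

    def next_to_beeper(self):
        return not self.has_beeper and (self.x, self.y) == (self.beeper_x, 0)

    def pick_beeper(self):
        if not self.next_to_beeper():
            raise RuntimeError("no beeper here")
        self.has_beeper = True


# The learner's program: only control structures and procedure calls.
def jump_hurdle(karel):
    karel.turn_left()                     # face north
    while not karel.right_is_clear():     # WHILE: climb to the top of the hurdle
        karel.move()
    karel.turn_right()                    # face east
    karel.move()                          # cross the zero-width hurdle
    karel.turn_right()                    # face south
    while karel.front_is_clear():         # WHILE: come back down to the ground
        karel.move()
    karel.turn_left()                     # face east again


def run_race(karel):
    while not karel.next_to_beeper():     # WHILE: until the finish beeper
        if karel.front_is_clear():        # IF THEN ELSE: run or jump
            karel.move()
        else:
            jump_hurdle(karel)
    karel.pick_beeper()


karel = Karel(hurdles={1: 2, 3: 1}, beeper_x=5)
run_race(karel)
print(karel.x, karel.y, karel.facing == EAST, karel.has_beeper)
```

Changing the `hurdles` mapping plays the role of the situation editor: the same program should finish the race for any hurdle heights, which is what makes the exercise a test of control structures rather than of a particular layout.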
Although they do not involve animation or simulation, the structure editor and the situation editor are convenient interactive graphical tools, especially for varying the situational parameters of the program through experimentation. The execution simulator, which can be activated using the Execution menu (Fig. 2), involves much animation. It displays the situation as it evolves while the program in the active window is executed; at the same time, the instruction currently being executed is marked in the program text.
[Figure callout: button indicating the instruction currently being executed]
So, at any time, the relationship between the active statement of the program and the result of its execution is visible. In addition, at any time during execution, the student can change the execution mode (fast, slow, or step by step), or activate the Pause or Cancel execution commands (Fig. 2).
[Execution menu items: Execute, Fast speed, Slow speed, By step, Pause, Cancel execution]
Fig. 2. Algorithm execution menu
All these features help learners monitor the execution of the program. In addition, by slowing or even stopping the execution at any point that is unclear to him, the learner can collect information about the local situation, possibly modified (the position and orientation of Karel, the beepers and the wall sections), in order to better understand how the program actually works and thus correct any mistakes. The capabilities of the execution simulator are particularly useful for problems invented by the learner or the teacher, where the help of the tutor is not available. The execution simulator appears to be a simple and interesting alternative to more complex tools that use, for example, a more realistic robot. Indeed, the Algorithmics software is simple to learn and to master because only what is important is displayed and included in the commands available to the learner. These simple commands allow him to concentrate on the design and development of the algorithm (rather than on manipulating more sophisticated tools or displays).
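The execution modes discussed above (fast, slow, step by step, pause, cancel) can be illustrated with a small controller. The following Python sketch is an assumption-laden illustration, not the simulator described in the paper: the `Simulator` class, the `Mode` enumeration, the delay values and the `on_step` callback (standing in for marking the current instruction in the program text) are all hypothetical.

```python
import time
from enum import Enum


class Mode(Enum):
    FAST = "fast"
    SLOW = "slow"
    STEP = "step"


# Illustrative delays in seconds between two instructions (assumed values).
DELAY = {Mode.FAST: 0.0, Mode.SLOW: 0.5}


class Simulator:
    def __init__(self, instructions, on_step, sleep=time.sleep):
        self._pending = iter(instructions)   # pairs (label, action)
        self.on_step = on_step               # e.g. mark the label in the program text
        self.sleep = sleep                   # injectable so tests need not wait
        self.mode = Mode.FAST
        self.cancelled = False
        self.finished = False

    def step(self):
        """Execute exactly one instruction; return False if nothing was executed."""
        if self.cancelled or self.finished:
            return False
        try:
            label, action = next(self._pending)
        except StopIteration:
            self.finished = True
            return False
        self.on_step(label)                  # show which instruction is active
        action()                             # then apply its effect to the world
        return True

    def run(self):
        """Run until the end, a cancel, or a switch to STEP mode (used as pause)."""
        while self.mode is not Mode.STEP and self.step():
            self.sleep(DELAY.get(self.mode, 0.0))

    def cancel(self):
        self.cancelled = True


# Usage: two single steps, then the rest of the program at full speed.
world = {"x": 0}
trace = []


def move():
    world["x"] += 1


sim = Simulator([("move", move)] * 4, on_step=trace.append, sleep=lambda s: None)
sim.mode = Mode.STEP
sim.step()
sim.step()
sim.mode = Mode.FAST
sim.run()
print(world["x"], trace)
```

In this sketch, pausing is modeled simply as switching to `Mode.STEP`, after which the learner can inspect the world state between calls to `step()`.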
[Screenshot labels: "Execution of the function in continuous mode"; "Display of the results in a continuous way"]
Example 1: An example of a "continuous" execution simulator
The instructions of the program are executed in groups. The execution of these groups and the display of the corresponding results proceed without the intervention of the learner.
This mode is intended in particular for learners who have a minimum of programming knowledge.
[Screenshot labels: "Current value"; "Initial value"]
Example 2: An example of a simulator for execution of the loop "Repeat"
The field is initialized with the loop bound when the first iteration is executed and is incremented at every new iteration. It can even be incremented when execution returns to previous instructions. Furthermore, there is a field for entering the step of the loop, which remains constant during the simulation.
5 Which Animations and Simulations for Which Objectives
From what has been said and the examples given in the previous sections, several general rules or guidelines can be drawn. In the table of Fig. 3, we try to highlight the roles played by animations and simulations in various learning situations.
• On the horizontal axis, with each learning task, identified by the type of knowledge to acquire, we associate the type of animation typically most appropriate for it. We document the association with the educational capacities of the animation that are useful for this task, and with examples of typical domains. Note that animations can only operate at the level of the task, which is much lower than the learner's more general learning goal.
• On the vertical axis, our table shows a progression from the lowest levels of learning to the highest, and a parallel progression of animations or simulations from simple to sophisticated. Naturally, the second progression is justified by the fact that the tasks to perform, and the domains they come from, are increasingly complex, so they need to be displayed with more realism.
Naturally, animations, and even simulations, are never sufficient in themselves: they can only be one part of learning and one part of teaching.
With regard to learning, they are inevitably tied to the specific tasks to be executed, tasks which may be far removed from learning goals connected with more general domains; animation-based activities therefore need to be complemented by activities more directly oriented towards learning, for example problem solving. With regard to teaching, even with animation-based activities, the capacities of the animation need to be complemented by the other capacities of the system's tutor or of the teacher. For this, detailed explanations must be supplied of the subject, of the phenomenon or of the behavior to follow. Diverse types of questions from the learner must also be answered (which becomes harder and harder to do appropriately as the type of knowledge moves from knowing what to knowing how, etc.).

| Learning goal (task) | Type of knowledge | Animation type | Useful educational capacity | Examples of domains |
|---|---|---|---|---|
| Knowledge of the subject concerned | Know what it is | Static simulation or representation | Emphasis on the distinctive characteristics | Classification domains (botany, zoology, etc.) |
| Understand a phenomenon or a law | Know how it works | Comic strip or cartoon | Speed control, zoom effects | Theoretical domains (geometry, physics, mechanics) |
| Execute a procedure | Know how to do | Realistic animation, simulation, drawing, image | Show the procedure being executed, effect of modifying parameters, playback | Practical skill domains (driving, construction, problem solving) |
| Handle a situation involving others | Know how to be (capacity and behavior) | Realistic simulation, virtual reality | Immersion in the situation to be mastered | Human domains (human resources management, politics, medical domain, learners with special needs) |
Fig. 3. Relationships between educational goals and types of animation
Above all, links must be made between the tasks supported by the animation or simulation and the higher-level capabilities that the student must eventually acquire. One may note the absence from our table of animation properties such as arousing motivation or attraction. This is because such properties can apply to all types of animation and because they cannot be measured objectively (at best through opinion questionnaires). More importantly, it is our strong belief that motivation or attraction alone cannot be a source of learning; it can only facilitate a learning process whose source necessarily lies elsewhere.
Although realism does increase attractiveness, we have shown in Section 3.1 that an overly realistic animation can distract the learner, make the learning problems harder, or even hide them (by accidentally focusing the learner's attention on some irrelevant aspects of the animation), because it does not focus on what needs to be learned. Similarly, despite their current popularity as a research trend, we believe that, although biological agents have their advantages, they also have their limits, especially if they distract the learner rather than focus him on what needs to be learned.
6 Conclusion
In this paper, we have tried to show that animations or simulations are not a feature that inevitably increases the educational value of educational software. They do so only if a definite learning goal underlies them and if their type is suited to helping the learner reach this goal. We hope that the guidelines presented can help designers build more meaningful animations and simulations, and can help those in charge of purchasing or evaluating educational software not to be unduly impressed by attractive animations in which the learning objective is absent. It would certainly be more rigorous to design and build a real computer-based instrument to assess the educational merits of a given animation, considering the learning contexts, the materials and the objectives as well as the specific characteristics of the learner, but this undertaking is beyond the scope of this paper.
References
1. Brummund, P.: The Animator. Hope College, Holland, Michigan, USA (1997)
2. Dion, P.: Conception et implantation d'un système de tutorat pour l'enseignement de l'algorithmique. Master's thesis. U. Laval, Québec, Canada
3. Foley, J., van Dam, A., Feiner, S., Hughes, J.: Computer Graphics, 2nd edn. Addison-Wesley, Reading
4. Furniss, M.R. (ed.): Animation Journal. Published since 1991, now at AJ Press, Irvine, CA, U.S.A. (1991)
5. Harrison, J. (ed.): Sorting Algorithms. University of British Columbia, Vancouver (2001)
6. Henderson, G., Rittenhouse, R.C., Wright, J.C., Holmes, J.L.: How a photon is created or absorbed. Journal of Chemical Education 56, 631–635 (1979)
7. Laybourne, K.: The Animation Book: A Complete Guide to Animated Film-Making, from Flip-Books to Sound Cartoons to 3-D Animation. Three Rivers Pr., N.-Y., U.S.A.
8. Lelouche, R.: How education can benefit from computers: a critical survey. Invited conference presented at CALISCE 1998, The 4th International Conference on Computer-Aided Learning and Instruction in Science and Engineering, Goteborg (Sweden), June 15-17, pp. 19–32 (1998)
9. Lelouche, R.: The animation Book: a complete Guide to Animated film-Marketing – from Flip-Books to sound Cartoons to 3-D Animation. Trees Rivers Pr., N-Y (1999)
10. Merrill, D.: Does your instruction rate 5 stars? Keynote address. In: Proceedings of IWALT 2000, The International Workshop on Advanced Learning Technology, Palmerston North, New Zealand, December 4-6, pp. 8–11. IEEE Computer Society, Los Alamitos (2000)
11. Mc Cauley, R. (ed.): Algorithms and Algorithm Animation. Computer Science Education Links. Department of Computer Science, College of Charleston, Charleston, SC, U.S.A.
12. Moreira da Costa, R.M., Vidal de Carvalho, L.A.: Virtual Reality in Cognitive retaining. In: Proceedings of IWALT 2000, The International Workshop on Advanced Learning Technology, Palmerston North, New Zealand, December 4-5, pp. 221–224 (2001)
13. Moreno, L., Gonzalez, M.C., Aguilar, R.M., Estévez, J., Sanchez, J., Barroso, C.: Adaptative multimedia interface for users with intellectual and cognitive handicaps. Intelligent Tutoring Systems, 363–372 (2000)
14. Merrill, D.: Does your instruction rate 5 stars? In: Proceedings of IWALT 2000, The International Workshop on Advanced Learning Technology, Palmerston North, New Zealand, December 4-5, pp. 8–11 (2000)
15. Pattis, R.: Karel the Robot: A Gentle Introduction to the Art of Programming (1981)
16. Watt, A., Watt, M.: Advanced Animation and Rendering Techniques: Theory and Practice
17. Yurcik, W.: The simulation education home page. Applied Computer Science Department, Illinois State University (1999)
Dynamic Task Allocation in Networks-on-Chip
Agarwalla Bindu1 and Deka Jatindra Kumar2
1 Kalinga Institute of Industrial Technology, Bhubaneswar, India
[email protected]
2 Indian Institute of Technology Guwahati, Guwahati, India
[email protected]
Abstract. The run-time strategy for allocating application tasks to platform resources in homogeneous Networks-on-Chip (NoCs) is enhanced with respect to minimising the communication energy consumption. As a novel contribution, the entity Life-Time (LT) is attached to each node of an application characteristics graph (ACG), which is used to describe an incoming application. Several algorithms are then modified to solve the task allocation problem while minimizing the communication energy consumption and network contention. When Life-Time (LT) is taken into consideration, it is observed that after each allocation the nodes with lower life-time values have more available neighbors than in the existing solution. This helps in allocating an application as the system load increases, whereas the existing solution may fail to allocate the new application onto the current system configuration. Furthermore, a node (used to describe a task in an application) belonging to an ACG is deallocated when its Life-Time expires, which helps minimize the communication energy consumption.
Keywords: NoC, Life-Time, SoC, Communication Energy Consumption.
1 Introduction
Today's semiconductor fabrication technologies can integrate billions of transistors on one chip. System complexity increases dramatically as the number of transistors on a single chip proliferates, and the concept of the System-on-Chip (SoC) has emerged in response. SoCs can be fabricated with several technologies, including full custom, standard cell and FPGA. With the increase in system complexity, the design of future SoCs faces major challenges. One of them is the design of the communication infrastructure. In fact, traditional bus-based or more complex hierarchical bus structures (e.g., AMBA, STBus) cannot satisfy the performance, scalability, flexibility and energy-efficiency requirements of future systems [1]. Network-on-Chip or Network-on-a-Chip (NoC or NOC) is a new approach to designing the communication subsystem of a System-on-a-Chip (SoC). NoC brings networking theories and systematic networking methods to on-chip communication and offers notable improvements over conventional bus systems.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 158–170, 2011. © Springer-Verlag Berlin Heidelberg 2011
Until now, the techniques proposed for resource management in SoCs have depended on a resource manager working under the control of an operating system [7]. Since SoC design is moving towards a communication-centric paradigm [8], the resource management process needs to consider new parameters and metrics (e.g., physical links with limited bandwidth, communication-energy minimization). Indeed, with improper task assignment, system performance degrades. For example, the "Random Allocation" scheme results in severe internal/external contention, which causes longer transmission latency and hence lower system throughput. The incoming applications, described by Application Characteristic Graphs (ACGs), are mapped onto a multi-processing-element SoC where communication happens via the Network-on-Chip (NoC) approach. Our work improves the strategies proposed in [9] for minimizing internal and external contention. These modified approaches are then combined from a user perspective by a hybrid allocation scheme [9]. The new entity added to the allocation process is the Life-Time (LT) of a node; details on LT are given later. LT improves the allocation (reduces the communication cost) of the application whose ACG vertices carry LT values, and it also reduces the communication cost of additional mappings. Reducing the communication cost is considered one of the important quality parameters of an allocation strategy. In Section 2, we explain how this paper relates to other work. In Section 4 we describe the proposed work with the details of all the algorithms, and finally we draw conclusions in Section 6.
2 Related Work
Task allocation is an important factor that influences performance optimization in parallel and distributed systems. To date, contiguous and non-contiguous allocation techniques have been proposed for resource assignment in order to i) utilize the system resources completely and ii) maximize job performance [2,3,4,5,6]. In contiguous allocation, the processing elements are constrained to be physically adjacent, whereas the non-contiguous allocation scheme lifts this restriction [4]. Contiguous allocation can achieve only 34% to 66% resource utilization, while non-contiguous allocation can reach up to 78%. However, the performance of non-contiguous allocation may degrade due to internal and external contention [2], [4]. Internal contention occurs when two communication flows from the same application contend for the same link, whereas external contention occurs when two communication flows from two different applications contend for the same link [5]. Pastrnak et al. [10] have shown that by choosing the correct granularity of processing for enhanced parallelism and by splitting time-critical tasks, a substantial improvement in processing efficiency can be obtained. Chang et al. [11] solve the coarse-grain hardware/software partitioning and mapping of communicating processes in a generalized task graph using an algorithm based on dynamic programming with binning. Murali et al. [13] propose an off-line
methodology for mapping multiple use-cases onto NoCs, where each use-case has different communication requirements and traffic patterns. In terms of on-line management, Nollet et al. [7] apply task migration, by reconfiguring the hardware tiles, to improve system performance. Their paper features an OS that can manage communication on a packet-switched NoC. Moreira et al. propose an online resource allocation heuristic for multiprocessor SoCs which can execute several real-time, streaming media jobs simultaneously. They achieve resource utilization values of up to 95% while handling a large number of job arrivals and departures [14]. Smit et al. [12] propose a run-time task assignment algorithm on heterogeneous processors, with the constraint that the task graphs either consist of a small number of tasks or have a task degree of no more than two. The dynamic task allocation problem on NoC-based platforms is addressed in [1] and [16], considering the future addition of new applications. User behavior information is used in the resource allocation process in [9] and [17], which allows the system to respond better to real-time changes and adapt dynamically to user needs. If user behavior is taken into account, about 60% of the communication energy is saved (with negligible energy and run-time overhead) compared to an arbitrary task allocation strategy. A detailed discussion of the work in [9] and [17] is given in Section 3.
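The distinction between internal and external contention drawn earlier in this section can be made concrete with a small sketch. It assumes a 2-D mesh with XY (dimension-ordered) routing, which is one possible minimal-path scheme chosen here only for illustration; the function names `xy_route` and `contention` and the tuple encoding of flows are hypothetical, not taken from the cited works.

```python
from collections import defaultdict
from itertools import combinations


def xy_route(src, dst):
    """Directed links used by XY routing: first along x, then along y."""
    (x, y), (tx, ty) = src, dst
    links = []
    while x != tx:
        nx = x + (1 if tx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != ty:
        ny = y + (1 if ty > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def contention(flows):
    """Count pairs of flows sharing a link.

    flows: list of (application, source tile, destination tile).
    Returns (internal, external): pairs from the same application and
    pairs from different applications, summed over all links.
    """
    users = defaultdict(list)                 # link -> applications using it
    for app, src, dst in flows:
        for link in xy_route(src, dst):
            users[link].append(app)
    internal = external = 0
    for apps in users.values():
        for a, b in combinations(apps, 2):
            if a == b:
                internal += 1
            else:
                external += 1
    return internal, external


flows = [
    ("A", (0, 0), (2, 0)),
    ("A", (1, 0), (2, 0)),
    ("B", (1, 0), (2, 1)),
]
print(contention(flows))
```

In this example all three flows traverse the link from tile (1, 0) to tile (2, 0): the two flows of application A contend internally, and each of them contends externally with the flow of application B.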
3 Existing Solution
In the existing solution, dynamic task allocation is performed on NoC-based platforms by considering the user behavior [9]. The behavior of any user is defined as a set of consecutive, overlapping events spanning a given period of interaction between the user and the system. Let [t_1, Q, t_2] characterize an event where an application Q enters and then leaves the system during a specific time period between two arbitrary moments in time t_1 and t_2. The three different approaches used by [9] are as follows:
– Approach 1: The primary goal is to minimize the internal contention and communication cost; minimizing the external contention is always a secondary goal.
– Approach 2: The primary goal is to minimize the external contention; minimizing the internal contention and communication cost is always a secondary goal.
– Hybrid approach: A hybrid method combines Approaches 1 and 2 while taking user behavior into consideration. Here, a master processor keeps track of the user's behavior for a long sequence of events.
In the Hybrid approach, Approach 1 is applied only to the critical applications (those having a higher communication rate or present in the system for a longer duration); other applications are mapped onto the system using Approach 2. Minimal-path routing is used to transmit data. For simplicity, the communication cost (i.e., energy consumption in bits) E_Q of the event [t_1, Q, t_2] is defined as follows:
$$E_Q = \Big( \sum_{\forall (i,j) \in App_Q} MD(v_i, v_j) \times r(e_{ij}) \Big) \times |t_2 - t_1| \qquad (1)$$
where MD(v_i, v_j) represents the Manhattan distance between any two vertices, v_i and v_j, connected to each other in application Q.
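As a worked illustration of Eq. (1), the following Python sketch evaluates the communication cost of one event for two placements of the same application. It rests on assumptions that go beyond this passage: r(e_ij) is taken to be the communication rate of the edge between v_i and v_j, vertices are placed on mesh tiles given as (x, y) pairs, and the function names and sample numbers are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance MD between two mesh tiles given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_energy(edges, placement, t1, t2):
    """Eq. (1): (sum over edges of MD(v_i, v_j) * r(e_ij)) * |t2 - t1|.

    edges: {(v_i, v_j): rate}, with rate standing for r(e_ij) (assumed meaning).
    placement: {vertex: (x, y)} tile assigned to each vertex.
    """
    weighted_distance = sum(
        manhattan(placement[i], placement[j]) * rate
        for (i, j), rate in edges.items()
    )
    return weighted_distance * abs(t2 - t1)


# Made-up application Q with three vertices and two edges.
edges = {("v1", "v2"): 10, ("v2", "v3"): 4}
contiguous = {"v1": (0, 0), "v2": (0, 1), "v3": (1, 1)}
scattered = {"v1": (0, 0), "v2": (2, 2), "v3": (0, 3)}

print(communication_energy(edges, contiguous, t1=0, t2=5))   # (1*10 + 1*4) * 5 = 70
print(communication_energy(edges, scattered, t1=0, t2=5))    # (4*10 + 3*4) * 5 = 260
```

The comparison shows why placement matters under this cost model: with the same rates and the same duration, placing communicating vertices on adjacent tiles keeps every Manhattan distance at 1 and the cost at its minimum for this graph.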
4 Proposed Solution
4.1 Platform Description/Architecture
Our NoC is a 2-D mesh architecture consisting of processing resources and network elements (Figure 1(a)). The master processor captures the user behavior and acts as a manager responsible for process management. The slave processors are responsible for executing the tasks assigned by the master. The communication infrastructure consists of a Data Network and a Control Network, each containing routers (Ra and Rc) and channels connected to the processing resources via standard network interfaces (Dni or Cni in Figure 1(b)). In terms of functionality, the Data Network transmits data under a minimal-path routing scheme and wormhole switching. The Control Network delivers control messages from the slave processors back to the master processor, notifying it that the tasks assigned to them are done and that they have become available. Each slave processor can be considered an independent sub-system, including a processing core (control unit and datapath) and its local memory (see Figure 1(b)). It is assumed that the bandwidth of the links is sufficient to carry the load transmitted through them.

Application Model. Applications are encoded by the Application Characteristics Graph (ACG). Each ACG = G(V, A) is represented as a directed graph and contains the following:
1. Nodes. Each node vk ∈ V contains a set of tasks obtained from an offline task partitioning process [10], [11]. The tasks belonging to the same node are allocated to the same idle PE. Each task has a worst-case execution time (WCET) in order to meet the application deadline. In our proposed work, one more parameter, LT(vk), is associated with each node vk ∈ V. LT(vk) represents the time interval between the arrival of the node in the system and its departure from it. The calculation of LT(vk) is defined later.

Fig. 1. (a) The logical view of the control algorithm. (b) The on-chip router microarchitecture that handles the control network (source: [9]).
A. Bindu and D.K. Jatindra
Fig. 2. ACG of an application
2. Edges. Each edge ei,j ∈ E represents the inter-node communication from vi to vj, where node vi is any neighbor of node vj. Weights w(ei,j) characterize the communication volume (bits per time unit) between nodes i and j.

When an event occurs (e.g., an application enters the system), our aim is to allocate the tasks belonging to this application to the available resources on the platform such that the internal/external network contention and communication cost can be minimized.

Communication Energy Model. The NoC platform follows wormhole switching and minimal-path routing. The communication cost (i.e. energy consumption in bits), EQ, of the event [t1, Q, t2] is defined as follows:

\[ E_Q = \sum_{\forall (i,j) \in App_Q} MD(v_i, v_j) \times r(e_{ij}) \times \min(LT(v_i), LT(v_j)) \tag{2} \]
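To make the difference between the two cost definitions concrete, the following Python sketch evaluates Equation 1 and Equation 2 for a small, made-up application. The vertex names, rates, placement and lifetimes are illustrative assumptions and are not taken from the experiments.

```python
# Illustrative sketch only: every number below is an assumption for the example.

def manhattan(p, q):
    """Manhattan distance between two mesh locations given as (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def cost_event_window(edges, placement, t1, t2):
    """Equation 1: every edge is charged for the whole event window |t2 - t1|."""
    per_time_unit = sum(manhattan(placement[i], placement[j]) * rate
                        for (i, j), rate in edges.items())
    return per_time_unit * abs(t2 - t1)

def cost_with_lifetime(edges, placement, lifetime):
    """Equation 2: an edge is charged only while both of its end nodes are alive."""
    return sum(manhattan(placement[i], placement[j]) * rate
               * min(lifetime[i], lifetime[j])
               for (i, j), rate in edges.items())

# Hypothetical three-node application placed on a mesh.
edges = {("v1", "v2"): 10, ("v2", "v3"): 4}          # r(e_ij), bits per time unit
placement = {"v1": (0, 0), "v2": (0, 1), "v3": (2, 1)}
lifetime = {"v1": 8, "v2": 5, "v3": 3}               # LT(v), time units

e_existing = cost_event_window(edges, placement, t1=0, t2=8)
e_proposed = cost_with_lifetime(edges, placement, lifetime)
print(e_existing, e_proposed)
```

In this example no lifetime exceeds the event window, so the lifetime-based cost cannot be larger than the window-based one.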
The communication energy consumption is modeled using the bit energy metric in [15]. The total communication energy consumption of any application Q per time unit is calculated as follows:

\[ P_{App_Q} = \sum_{(\forall e_{ij} \in E) \in App_Q \text{ at time instant } t} r(e_{ij}) \times E_{bit}(e_{ij}) \tag{3} \]
where r(eij) is the communication rate of an arc in App Q (unit: bit per time unit), and Ebit(eij) stands for the energy consumed to send one bit between the processors to which vertices i and j are allocated (unit: Joule per bit). More precisely,

\[ E_{bit}(e_{ij}) = (MD(e_{ij}) + 1) \times E_{Rbit} + MD(e_{ij}) \times E_{link} \tag{4} \]
The term MD(eij) is the minimization objective; it represents the Manhattan Distance between the processors to which vertices i and j are mapped. The parameter ERbit stands for the energy consumed in routers, including the crossbar switch and buffers, while Elink represents the energy consumed in one unit link, for one bit of data; these parameters are assumed to be constant. The total communication energy consumed by all events in the system during the time interval t = 0 to T is denoted by:

\[ E_{comm\_total}(T) = \sum_{t=0}^{T} \sum_{i=1}^{N(t)} P_{App(i)} \tag{5} \]
where N(t) is the total number of applications active in the system between time t − 1 and t. At each time instant t, PApp(i) is calculated taking only its live edges into account.
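The energy model of Equations 3 to 5 can be sketched in a few lines of Python. This is only a minimal illustration: the constants E_RBIT and E_LINK and the sample timeline are hypothetical values chosen for the example, and each live edge is represented simply as a (rate, Manhattan distance) pair.

```python
# Minimal sketch of the bit-energy model; constants are hypothetical.
E_RBIT = 1.0   # assumed energy per bit in one router (Joule per bit)
E_LINK = 0.5   # assumed energy per bit on one unit link (Joule per bit)

def bit_energy(md):
    """Equation 4: a path of md links crosses md + 1 routers."""
    return (md + 1) * E_RBIT + md * E_LINK

def app_power(live_edges):
    """Equation 3: sum over the edges of one application that are live now.

    live_edges is a list of (rate, manhattan_distance) pairs.
    """
    return sum(rate * bit_energy(md) for rate, md in live_edges)

def total_energy(timeline):
    """Equation 5: sum over time steps and over the applications active then.

    timeline[t] is a list of applications; each application is its list of
    live edges at time step t.
    """
    return sum(app_power(app) for apps in timeline for app in apps)

# One application with two edges; the second edge disappears at t = 1
# because the lifetime of one of its end nodes has expired.
timeline = [
    [[(10, 1), (4, 2)]],   # t = 0
    [[(10, 1)]],           # t = 1
]
print(bit_energy(2), app_power(timeline[0][0]), total_energy(timeline))
```

Dropping an edge from the live set as soon as one of its end nodes expires is what lowers the accumulated energy in this model.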
Dynamic Task Allocation in Networks-on-Chip
User Behavior Model. The resource manager predicts the probability of a certain application being critical by recording and analyzing the events sequence for a specific user. When an event occurs, or an incoming application Q enters the system at time T, two ratios are used to determine whether or not this application is critical.
– Instantaneous communication rate ratio (α): the ratio between the communication rate of application Q (bits per time unit) and that of all applications occurring from time 0 to T. Application Q is set as critical if it has a higher communication rate than other applications.

\[ \alpha = \frac{\sum_{(\forall a_{IJ} \in A) \in App(Q)} r(a_{IJ})}{\sum_{\forall App(Q) \text{ ever occurred}} \Bigl( \sum_{(\forall a_{IJ} \in A) \in App(Q)} r(a_{IJ}) \Bigr)} \tag{6} \]
– Cumulative communication energy ratio (β): the ratio between the communication energy consumption of application Q while active in the system (Δ(t) = 1 if Q is active between t − 1 and t, and 0 otherwise) and the energy of all events in the system from time 0 to T. When calculating PAppQ, only the live edges of Q are considered at each time instant t, because when the LT of a node expires, its corresponding edges are also removed. If application Q has a higher event cost, then it is set as critical.

\[ \beta = \frac{\bigl( \sum_{t=1}^{T} \Delta(t) \bigr) \times P_{App_Q}}{E_{comm\_total}(T)} \tag{7} \]

4.2 Run-Time Task Allocation Approaches
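As a rough illustration of the two ratios from the user behavior model and of the threshold test that selects between the approaches, consider the Python sketch below. The function names and sample numbers are assumptions made for the example; in particular, the history of applications is assumed to include application Q itself, and the thresholds 0.8 and 0.7 are used here only as sample values.

```python
# Hedged sketch of the criticality test; names and data are illustrative.

def rate_ratio(rates_q, rates_history):
    """Equation 6: rate of Q over the rate of all applications seen so far.

    rates_history is a list of per-application edge-rate lists and is
    assumed to include application Q itself.
    """
    return sum(rates_q) / sum(sum(rates) for rates in rates_history)

def energy_ratio(active_flags, p_app_q, e_comm_total):
    """Equation 7: (number of time steps Q was active) * P_AppQ / E_comm_total."""
    return sum(active_flags) * p_app_q / e_comm_total

def choose_approach(alpha, beta, alpha_th, beta_th):
    """Approach 1 for critical applications, Approach 2 otherwise."""
    return "approach1" if alpha > alpha_th or beta > beta_th else "approach2"

alpha = rate_ratio([10, 4], [[10, 4], [3, 3]])
beta = energy_ratio([1, 1, 0, 1], p_app_q=41.0, e_comm_total=246.0)
print(alpha, beta, choose_approach(alpha, beta, alpha_th=0.8, beta_th=0.7))
```

The sketch leaves out the warm-up phase in which Approach 2 is always used because too few events have been recorded.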
Two threshold values, αth and βth, are set to solve the task allocation problem while including the user behavior. These values depend on the application communication rate and the cost of the events entering the system. If (i) the rate ratio of an application, α, is higher than αth or (ii) the energy ratio of an application, β, is higher than βth, then Approach 1 is used; otherwise Approach 2 is applied. Since building exact models of user behavior requires a long sequence of events, Approach 2 is applied at the beginning of a session. Next, four related sub-problems are formulated (namely, region forming, region rotation, region selection and application mapping) and efficient algorithms are presented for solving them.

Problem Formulation of Run-time Task Allocation Process. Before defining the problem statement, some related terms are defined below:
– PL(vi): Particular location for vertex vi.
– MD(PL(vi), PL(vj)): Manhattan Distance between locations PL(vi) = (xi, yi) and PL(vj) = (xj, yj), where xi, xj, yi and yj are the x- and y-coordinates in the mesh system, i.e. MD(PL(vi), PL(vj)) = |xi − xj| + |yi − yj|.
– ED(PL(vi), PL(vj)): Euclidean Distance between locations PL(vi) and PL(vj), i.e. ED(PL(vi), PL(vj)) = (|xi − xj|^2 + |yi − yj|^2)^(1/2).
– R: a region containing several locations. This region can be contiguous or non-contiguous.
– L(R): summation of the pairwise Manhattan distances between all locations within R.
– LT(vi): The lifetime of a node is the time interval between the instant at which the node is allocated a computing resource and the moment at which the node leaves the system. At the very first arrival of an application Q, for ∀vi ∈ ACG(Q), LT(vi) = LT1(vi) = worst-case execution time taken from off-line task partitioning. At the kth arrival of an application Q, for ∀vi ∈ ACG(Q),

\[ LT(v_i) = LT_k(v_i) = \frac{1}{k-1} \sum_{j=1}^{k-1} LT_j(v_i) \]
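The lifetime estimate can be sketched as a running average. The Python fragment below is one possible reading, under the assumption that LT_j for j < k denotes the lifetime actually observed at the j-th arrival (otherwise the average would never move away from the WCET). The freeze_at parameter mirrors the statement in the text that the value is no longer recomputed after the 10th arrival; the function name and sample data are hypothetical.

```python
# One possible reading of the LT(v_i) update rule; names are illustrative.

def estimate_lifetime(wcet, observed, freeze_at=10):
    """Return LT for the next arrival of a node.

    wcet      -- worst-case execution time from off-line partitioning
    observed  -- lifetimes measured at the previous arrivals (assumption)
    freeze_at -- arrival index after which the estimate is no longer updated
    """
    k = len(observed) + 1            # index of the current arrival
    if k == 1:
        return wcet                  # first arrival: fall back to the WCET
    k = min(k, freeze_at)            # later arrivals reuse LT_10
    history = observed[:k - 1]       # lifetimes of arrivals 1 .. k-1
    return sum(history) / len(history)

print(estimate_lifetime(11, []))                       # first arrival
print(estimate_lifetime(11, [8, 6]))                   # third arrival
print(estimate_lifetime(11, [9] * 9 + [1, 1, 1]))      # beyond the 10th arrival
```

Freezing the estimate trades some accuracy for a bounded amount of bookkeeping per node.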
To limit the run-time overhead, LT(vi) of a node is recomputed only up to its 10th arrival; for subsequent arrivals, LT(vi) = LT10(vi).

Region Forming Sub-problem (P1)
Given the ACG of an incoming application and the current system configuration.
Find a region R and the corresponding location PL(vi) inside R, ∀vi ∈ V in the ACG, which minimizes the communication cost:

\[ \min \Bigl\{ Comm.cost = \sum_{\forall e_{ij} \in E} r(e_{ij}) \times MD(PL(v_i), PL(v_j)) \Bigr\} \tag{8} \]

Such that: ∀vi ≠ vj ∈ V, PL(vi) ≠ PL(vj)
If more than one region satisfies Equation 8, then the region R with L(R) minimized is selected, i.e. we select a region that is as convex as possible, since this helps reduce external contention.

Region Rotation Sub-problem (P2)
Given an already formed region R (derived from P1) and the current configuration with region R′ containing the available resources.
Find a placement for the region R within R′ which:
min {L(R′ − R)}

Region Selection Sub-problem (P3)
Given the number of resources m required by the incoming application and the current configuration with region R′ containing the available resources.
Find a region R inside R′ with the number of locations in R equal to |V| which:
min {L(R) + L(R′ − R)}

Application Mapping Sub-problem (P4)
Given a selected region R (as derived from P3) and the ACG of the incoming application.
Find PL(vi) inside R, ∀vi ∈ V in the ACG, which:

\[ \min \Bigl\{ Comm.cost = \sum_{\forall e_{ij} \in E} r(e_{ij}) \times MD(PL(v_i), PL(v_j)) \Bigr\} \tag{9} \]

Such that: ∀vi ≠ vj ∈ V, PL(vi) ≠ PL(vj)
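The objective functions used in the four sub-problems can be written down directly. The sketch below, in Python, shows the dispersion measure L(R), the communication cost of Equations 8 and 9 with its distinct-location constraint, and the region selection objective of P3. Function names and the small 2 × 3 example are assumptions made for illustration.

```python
# Sketch of the objective functions of P1-P4; names and data are illustrative.
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def dispersion(region):
    """L(R): sum of pairwise Manhattan distances between locations of R."""
    return sum(manhattan(p, q) for p, q in combinations(sorted(region), 2))

def comm_cost(edges, placement):
    """Equations 8/9: sum of r(e_ij) * MD(PL(v_i), PL(v_j)).

    Raises ValueError if two vertices share a location, which the
    constraint PL(v_i) != PL(v_j) forbids.
    """
    if len(set(placement.values())) != len(placement):
        raise ValueError("two vertices are mapped to the same location")
    return sum(rate * manhattan(placement[i], placement[j])
               for (i, j), rate in edges.items())

def selection_objective(region, available):
    """P3: L(R) + L(R' - R) for a region R inside the available set R'."""
    return dispersion(region) + dispersion(set(available) - set(region))

available = {(x, y) for x in range(2) for y in range(3)}     # free 2 x 3 block
square = {(0, 0), (0, 1), (1, 0), (1, 1)}
l_shape = {(0, 0), (0, 1), (0, 2), (1, 0)}
print(selection_objective(square, available),
      selection_objective(l_shape, available))
```

In this example the compact square scores lower than the L-shaped region, which matches the preference for near-convex regions stated in the text.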
Solving the Region Forming Sub-problem (P1)
The region forming procedure is given in Algorithm 1. Where the existing solution [9] finds more than one node eligible for allocation, the proposed solution selects the node with the highest life-time. The same definitions as in [1] are used for the data structures of Algorithm 1. The life-time of each vertex vi ∈ V is stored in the variable LT[vi]. An illustrative example is shown in Figure 3.
input : (1) current system configuration.
        (2) ACG = (V, E) of the incoming application.
output: A region R(PL) and its corresponding mapping PL() for each vertex, i.e. R with the locations pl(x1,y1) = (x1, y1) = PL(v1), pl(x2,y2), ···, pl(x|V|,y|V|)
begin
    Set col[vi] ← WHITE for each vertex vi ∈ V; R ← ∅;
    Choose vi ∈ V such that Adj_w[vi] is maximized;
    col[vi] ← BLACK;
    PL[vi] = Centre[R] ← (0, 0);
    R = R ∪ PL[vi];
    repeat
        Update Adj_b[vi] for each vertex vi;
        Choose all vi with col[vi] == WHITE and Adj_b[vi] maximized;
            a. Choose the vi with LT[vi] maximized;
        col[vi] ← BLACK;
        Choose available location pl(x,y) such that
            Σ over ∀vj ∈ V with col[vj] == BLACK of (r(e(vi,vj)) × MD(pl(x,y), PL[vj])) is minimized,
            and then ED(Centre[R], pl(x,y)) is minimized;
        PL[vi] ← pl(x,y);
        R = R ∪ pl(x,y);
        Update Centre[R];
    until Σ over ∀vi ∈ V of (col[vi] == BLACK) == |V|;
end

Algorithm 1. Region Forming Algorithm
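A compact Python sketch of the greedy loop in Algorithm 1 is given below. It is not the authors' implementation and rests on several assumptions that the listing leaves open: Adj_w is taken as the total rate between a vertex and all other vertices, Adj_b as the total rate to already placed (BLACK) vertices, candidate locations are the free cells 4-adjacent to the current region on an unbounded relative grid, Centre[R] is the centroid of the region, and remaining ties are broken by coordinate order. Squared Euclidean distance is used because it orders candidates exactly as ED does.

```python
# Hedged sketch of Algorithm 1 (region forming); see assumptions in the text.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def form_region(vertices, rate, lifetime):
    """Return a dict vertex -> (x, y) in coordinates relative to the first vertex."""
    def r(u, v):
        # Communication rate between u and v, regardless of edge direction.
        return rate.get((u, v), 0) + rate.get((v, u), 0)

    def adjacency(v, others):
        return sum(r(v, u) for u in others if u != v)

    # First vertex: the one communicating most with all others (Adj_w).
    first = max(vertices, key=lambda v: adjacency(v, vertices))
    placed = {first: (0, 0)}

    while len(placed) < len(vertices):
        white = [v for v in vertices if v not in placed]
        # Highest Adj_b first; among ties, the longest life-time (step a).
        v = max(white, key=lambda w: (adjacency(w, placed), lifetime[w]))

        occupied = set(placed.values())
        candidates = {(x + dx, y + dy)
                      for (x, y) in occupied
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))} - occupied
        cx = sum(x for x, _ in occupied) / len(occupied)
        cy = sum(y for _, y in occupied) / len(occupied)

        def score(loc):
            comm = sum(r(v, u) * manhattan(loc, placed[u]) for u in placed)
            dist2 = (loc[0] - cx) ** 2 + (loc[1] - cy) ** 2
            return (comm, dist2, loc)   # loc is only a deterministic tie-break

        placed[v] = min(candidates, key=score)
    return placed

# Hypothetical ACG: "a" talks to everyone; "b" and "c" tie on rate, "c" lives longer.
vertices = ["a", "b", "c", "d"]
rate = {("a", "b"): 10, ("a", "c"): 10, ("a", "d"): 2}
lifetime = {"a": 9, "b": 3, "c": 7, "d": 5}
region = form_region(vertices, rate, lifetime)
print(region)
```

In the example, "c" is placed before "b" although both have the same rate to the placed vertex, because "c" has the longer life-time.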
Solving the Region Rotation Sub-problem (P2)
The solution to sub-problem P2 finds a placement, on the current configuration, for the region found in the solution of sub-problem P1, with the objective of fitting the region within the configuration as well as possible. The steps of the region rotation algorithm are shown in Algorithm 2. Here, too, we use the same definitions for the various terms and metrics as given in [9]. A simple example is illustrated in Fig. 4. The resulting solution using the proposed technique is shown in Fig. 5(b). The benefit of this technique is that vertices v14 and v15 have the lowest LT, so they will leave the system earlier and their locations can be used by any new incoming application with lower contention; if the solution of Fig. 5(a) is used instead, it incurs a higher communication cost for an incoming application, resulting in higher external contention.

Fig. 3. Example showing the region forming algorithm on an ACG

input : (1) Current system configuration, Conf(P×Q).
        (2) A region R(PL) and its corresponding mapping PL() for each vertex, i.e. R with the locations pl(x1,y1) = (x1, y1) = PL(v1), pl(x2,y2), ···, pl(x|V|,y|V|).
output: A matching function map for mapping region R to Conf.
begin
    calculate the size Rrec(p×q) where m = max(xi, 1 ≤ i ≤ |V|);
    calculate grid_status(Rrec_all) where (Rrec_all) = Rotation(Rrec);
    search the possible available submeshes Rs(p×q) from the current system configuration;
    calculate system_status(Rs);
    calculate diff = subtract(grid_status(Rrec_all), system_status(Rs));
    if more than one orientation of Rrec(p×q) results in the same lowest diff value and greater than 0 then
        choose the one such that the PE with the lowest life-time among the boundary PEs has the maximum number of idle neighbors;
    end
end

Algorithm 2. Region Rotation Algorithm

Fig. 4. The subtraction process during the region rotation process

Fig. 5. Region Rotation final solution into the current configuration using proposed and existing algorithms

Solving the Region Selection Sub-problem (P3)
The main goal of this sub-problem is to select a near-convex region of resources to minimize the external contention. The solution of problem P3 is the same as in [9].

Solving the Application Mapping Sub-problem (P4)
Here, the main goal is to map the application tasks in the ACG to the resource locations in R, obtained from the solution of P3, such that the communication energy consumption is minimized. In the proposed solution, if more than one node is eligible for mapping, the eligible nodes are stored in an array in ascending order of life-time and the node with the highest life-time is selected for allocation. If the location found for this node belongs to the convex area of the region obtained from P3, the node is mapped onto that location. Otherwise, if all possible locations for the selected node belong to the non-convex area of that region, the location search is repeated for the eligible nodes with lower life-time; if such a node shares the location found for the first selected node, then that node is the one finally mapped onto the current system configuration.

Fig. 6.i illustrates the process of mapping the application ACG (see Fig. 6.i(a)) onto a given region. As shown in Fig. 6.i(a), the first vertex v4, which has the largest communication with its neighbors, is allocated to the center of the region. Next, vertices v7 and v8 are the candidate vertices for mapping, as both of them have the highest LT value; say v8 is chosen first. The location N2,0 is the only location that fits v8. Here N2,0 ∈ Nonconvex(N), so before mapping v8 onto N2,0, the probable locations for v5, the node with the lowest LT value among the nodes satisfying the mapping criteria, are determined. If the location found is N2,0, then node v5 is mapped onto N2,0 first. This step maps the node with the lowest LT value onto the non-convex part of N, so that the cost of a new incoming application can be reduced.
Fig. 6. < i > Steps of Proposed Application Mapping Algorithm
Fig. 7. < ii > Existing Application Mapping Algorithm
In the existing solution, v8 is mapped onto the best possible location N2,0, and nodes with lower LT values are mapped onto the convex region of N. As a result, the locations freed by the nodes with lower LT values cannot be utilized efficiently by a new incoming application.
5 Experimental Results
The Dynamic Task Allocator for Networks-on-Chip is implemented in Java 6. The existing and proposed methodologies are applied to random applications and to real applications, i.e. the Automobile/Industrial and Telecom applications of the embedded system benchmark suite (E3S) [18]. Three categories of random applications are generated with TGFF [19]. Each category contains 12 applications, with the number of vertices in the ACG ranging from 8 to 12 and the execution time of each node in the range of 1 to 11 clock ticks. For the experiments, the value of α is taken to be 0.8 and that of β to be 0.7. The three random categories are generated using communication volumes in the ranges 10−110, 100−1100 and 1000−11000. The experiments are conducted on an 8 × 8 homogeneous mesh-based NoC. The proposed allocation policy reduces the average communication energy consumption compared to the existing allocation policy for all three approaches (Approach 1, Approach 2 and the Hybrid approach) on both the random applications and the embedded system benchmark suites.
Table 1. Energy for Random Category (energy in Joule; columns give the communication volume in bits)

             Existing Solution                     Proposed Solution
             10-110    100-1100   1000-11000      10-110    100-1100   1000-11000
Approach1    2083.51   23696.03   267226.9        1294.65   14085.93   133474.5
Approach2    2793.35   27241.31   283798.5        1556.20   14445.38   148368
Hybrid       2103.96   23697.91   267326.9        1517.52   14257.04   142162.7
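The relative saving implied by Table 1 can be derived with a short calculation. The Python sketch below only post-processes the values printed in the table; the helper name reduction_percent is introduced here for illustration.

```python
# Derived from the values of Table 1; no new measurements are introduced.
volumes = ["10-110", "100-1100", "1000-11000"]
existing = {
    "Approach1": [2083.51, 23696.03, 267226.9],
    "Approach2": [2793.35, 27241.31, 283798.5],
    "Hybrid":    [2103.96, 23697.91, 267326.9],
}
proposed = {
    "Approach1": [1294.65, 14085.93, 133474.5],
    "Approach2": [1556.20, 14445.38, 148368],
    "Hybrid":    [1517.52, 14257.04, 142162.7],
}

def reduction_percent(old, new):
    """Energy saved by the proposed solution, as a percentage of the existing one."""
    return 100.0 * (old - new) / old

reductions = {
    approach: [reduction_percent(o, n)
               for o, n in zip(existing[approach], proposed[approach])]
    for approach in existing
}
for approach, values in reductions.items():
    print(approach, dict(zip(volumes, (f"{v:.1f}%" for v in values))))
```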
Table 2. Energy for ES Benchmark Suites (energy in Joule)

             Existing Solution           Proposed Solution
             Automobile   Telecom        Automobile   Telecom
Approach1    100608       71613          59689.8      48830.17
Approach2    100611.9     71770.95       60443.41     48964.65
Hybrid       100609.5     71692.8        59721.11     48887

6 Conclusion
In this paper we proposed an alternative solution for the dynamic allocation of tasks in NoCs by incorporating a new parameter called Life-Time. This parameter enables us to deallocate a node as soon as its Life-Time expires, which reduces the communication energy consumption of an application. In the existing allocation policy, by contrast, a node belonging to a particular application is not deallocated until the whole application has completed [9]. Because our allocation policy takes the Life-Time of a task into account, after each allocation the nodes with lower Life-Time values have more available neighbors than in the existing solution [9]. Since a node with a lower Life-Time generally finishes early and is deallocated, incoming tasks are more likely to find available neighbours and possibly obtain a better placement. From the experiments conducted, we have observed that the proposed solution performs better than the existing solution. The benefit becomes apparent as the system load increases, where the proposed algorithm succeeds in allocating resources to an application whereas the existing allocation policy may fail. This also helps to minimize the communication energy consumed by the applications.
References
1. Chou, C.L., Marculescu, R.: Incremental Run-time Application Mapping for Homogeneous NoCs with Multiple Voltage Levels. In: CODES+ISSS (September 2007)
2. Chang, C., Mohapatra, P.: Improving performance of mesh connected multicomputers by reducing fragmentation. Journal of Parallel and Distributed Computing 52(1), 40–68 (1998)
3. Bender, M.A., Bunde, D.P., Demaine, E.D., Fekete, S.P., Leung, V.J., Meijer, H., Phillips, C.A.: Communication-Aware Processor Allocation for Supercomputers. In: Dehne, F., López-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 169–181. Springer, Heidelberg (2005)
4. Lo, V., Windisch, K., Liu, W., Nitzberg, B.: Non-contiguous processor allocation algorithms for mesh-connected multicomputers. IEEE Trans. on Parallel and Distributed Computing 2(7) (1997)
5. Li, K., Cheng, K.H.: A two-dimensional buddy system for dynamic resource allocation in a partitionable mesh connected system. Journal of Parallel and Distributed Computing 12, 79–83 (1991)
6. Windisch, K., Lo, V.: Contiguous and non-contiguous processor allocation algorithms for k-ary n-cubes. In: Proc. Intl. Conference on Parallel Processing, pp. 164–168 (1995)
7. Nollet, V., Marescaux, T., Verkes, D.: Operating-system controlled network on chip. In: Proceedings of 41st Design Automation Conference, pp. 256–259 (June 2004)
8. Benini, L., De Micheli, G.: Networks on chip: a new paradigm for systems on chip design. In: Proc. DATE, pp. 418–419 (March 2002)
9. Chou, C.L., Marculescu, R.: User-aware dynamic task allocation in Networks-on-Chip. In: Design, Automation and Test in Europe Conference (DATE) (March 2008)
10. Pastrnak, A.M., de With, P.H.N., Stuijk, S., van Meerbergen, J.: Parallel implementation of arbitrary-shaped MPEG-4 decoder for multiprocessor systems. In: Proc. Visual Commun. Image Process, pp. 732–744 (January 2006)
11. Chang, J.M., Pedram, M.: Codex-dp: Co-design of communicating systems using dynamic programming. IEEE Trans. Computer Aided Design Integr. Circuits Syst. 19(7), 732–744 (2000)
12. Smit, L.T., Smit, G.J.M., Hurink, J.L., Broersma, H., Paulusma, D., Wolkotte, P.T.: Run-time assignment of tasks to multiple heterogeneous processors. In: 5TH PROGRESS Symposium on Embedded Systems, pp. 185–192 (October 2004)
13. Murali, S., Martijn, C., Radulescu, A., Goossens, K., De Micheli, G.: A methodology for mapping multiple use cases onto networks on chips. In: Proc. DATE, Munich, Germany, pp. 118–123 (March 2006)
14. Moreira, O., Mol, J.J.D., Bekooij, M.: Online resource management in a multiprocessor with a network-on-chip. In: Proc. ACM Symp. on Applied Computing, Seoul, Korea, pp. 1557–1564 (March 2007)
15. Ye, T.T., De Micheli, G., Benini, L.: Analysis of power consumption on switch fabrics in network routers. In: Proceedings of the 39th Annual Design Automation Conference, pp. 524–529 (June 2002)
16. Chou, C.L., Ogras, U.Y., Marculescu, R.: Energy and Performance aware Incremental Mapping for Networks-on-Chip with Multiple Voltage Levels. IEEE Trans. on Computer Aided Design 27(10), 1866–1879 (2008)
17. Chou, C.L., Marculescu, R.: Run-Time Task Allocation Considering User Behavior in Embedded Multiprocessor Networks-on-Chip. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems 29(1), 78–91 (2010)
18. Dick, R.: Embedded system synthesis benchmarks suites (E3S), http://www.ece.northwestern.edu/~dickrp/e3s
19. Vallerio, K.: Task Graphs for free, TGFF v3.0 (2003), http://ziyang.ece.northwestern.edu/tgff/
Congruence Results of Weak Equivalence for a Graph Rewriting Model of Concurrent Programs with Higher-Order Communication

Masaki Murakami
Department of Computer Science, Graduate School of Natural Science and Technology, Okayama University
3-1-1, Tsushima-Naka, Kita-ku, Okayama, 700-0082, Japan
[email protected]

Abstract. This paper presents congruence results for a weak equivalence on a graph rewriting model of concurrent processes with higher-order communication. In our model, a bipartite directed acyclic graph represents a concurrent system that consists of a number of processes and messages. The model makes it possible to represent local names whose scopes are not nested. We show that weak bisimulation equivalence is a congruence relation w.r.t. the operations that correspond to τ-prefix, input prefix, new-name, replication, composition and application, respectively.
1 Introduction
LHOπ (Local Higher Order π-calculus) [12] is a formal model of concurrent systems with higher-order communication. It is a subcalculus of the higher-order π-calculus [11] with asynchronous communication capability. The calculus has the expressive power to represent many practically and/or theoretically interesting examples that include program code transfer. On the other hand, as we reported in [5,7,6], it is difficult to represent the scopes of names of communication channels using models based on process algebra. As a solution to this problem of representing the scopes of names, we presented a model for the first-order case that is based on graph rewriting instead of process algebra [5]. We defined an equivalence relation on processes called "scope equivalence" based on the scopes of names. We showed congruence results for strong bisimulation equivalence [6] and for scope equivalence [8] in the first-order case. We extended the model to systems with higher-order communication [7,9], and we showed congruence results w.r.t. strong bisimulation equivalence for the higher-order case [10]. This paper presents congruence results for weak bisimulation equivalence for the extended model with higher-order communication. Congruence results on bisimilarity based on graph rewriting models are reported in [1,13]. Those studies adopt the graph transformation approach for proof techniques. In this paper, graph rewriting is introduced to extend the model for the representation of scopes of names.

S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 171–180, 2011. © Springer-Verlag Berlin Heidelberg 2011
The model presented here is based on graph rewriting systems such as [1,2,3,4,13]. We represent a concurrent program consisting of a number of processes (and messages on the way) using a bipartite directed acyclic graph. An intuitive description of the model is presented in [9,10].
2 Formal Definitions
2.1 Programs
First, a countably infinite set of names is presupposed, as in other formal models based on process algebra.

Definition 1 (program). Programs and behaviours are defined recursively as follows.
(i) Let a1, ..., ak be distinct names. A program is a bipartite directed acyclic graph with source nodes b1, ..., bm and sink nodes a1, ..., ak such that
– Each source node bi (1 ≤ i ≤ m) is a behaviour. Duplicated occurrences of the same behaviour are possible.
– Each sink node is a name aj (1 ≤ j ≤ k). All aj's are distinct.
– Each edge is directed from a source node to a sink node. Namely, an edge is an ordered pair (bi, aj) of a source node and a name. For any source node bi and name aj there is at most one edge from bi to aj.
For a program P, we denote the multiset of all source nodes of P as src(P), the set of all sink nodes as snk(P) and the set of all edges as edge(P). Note that the empty graph 0, such that src(0) = snk(0) = edge(0) = ∅, is a program.
(ii) A behaviour is an application, a message or a node consisting of the epidermis and the content, defined as follows. In the rest of this definition, we assume that neither any element of snk(P) nor x occurs anywhere else in the program.
1. A node labeled with a tuple of a name n (called the subject of the message) and an object o is a message, denoted as n⟨o⟩.
2. A tuple of a variable x and a program P is an abstraction, denoted as (x)P. An object is a name or an abstraction.
3. A node labeled with a tuple of an abstraction and an object is an application. We denote an application as A⟨o⟩, where A is an abstraction and o is an object.
4. A node whose epidermis is labeled with "!" and whose content is a program P is a replication, denoted as !P.
5. An input prefix is a node (denoted as a(x).P) whose epidermis is labeled with a tuple of a name a and a variable x and whose content is a program P.
6. A τ-prefix is a node (denoted as τ.P) whose epidermis is labeled with a silent action τ and whose content is a program P.
Definition 2 (locality condition). A program P is local if, for any input prefix c(x).Q and abstraction (x)Q occurring in P, x does not occur in the epidermis of any input prefix in Q. An abstraction (x)P is local if P is local. A local object is a local abstraction or a name.

Definition 3 (free/bound name)
1. For a behaviour or an object p, the set of free names of p, fn(p), is defined as follows: fn(0) = ∅, fn(a) = {a} for a name a, fn(a⟨o⟩) = fn(o) ∪ {a}, fn((x)P) = fn(P) \ {x}, fn(!P) = fn(P), fn(τ.P) = fn(P), fn(a(x).P) = (fn(P) \ {x}) ∪ {a} and fn(o1⟨o2⟩) = fn(o1) ∪ fn(o2).
2. For a program P where src(P) = {b1, ..., bm}, fn(P) = (⋃i fn(bi)) \ snk(P). The set of bound names of P (denoted as bn(P)) is the set of all names that occur in P but not in fn(P) (including elements of snk(P) even if they do not occur in any element of src(P)).

Definition 4 (normal program). A program P is normal if, for any b ∈ src(P) and any n ∈ fn(b) ∩ snk(P), (b, n) ∈ edge(P), and any program occurring in b is also normal.
It is quite natural to assume normality for programs, so in this paper we consider normal programs only.

Definition 5 (composition). Let P and Q be programs such that src(P) ∩ src(Q) = ∅ and fn(P) ∩ snk(Q) = fn(Q) ∩ snk(P) = ∅. The composition P ∥ Q of P and Q is the program such that src(P ∥ Q) = src(P) ∪ src(Q), snk(P ∥ Q) = snk(P) ∪ snk(Q) and edge(P ∥ Q) = edge(P) ∪ edge(Q).
Intuitively, P ∥ Q is the parallel composition of P and Q. Note that we do not assume snk(P) ∩ snk(Q) = ∅. Obviously, P ∥ Q = Q ∥ P and ((P ∥ Q) ∥ R) = (P ∥ (Q ∥ R)) for any P, Q and R from the definition. The empty graph 0 is the unit of "∥". Note that src(P) ∪ src(Q) and edge(P) ∪ edge(Q) denote multiset unions, while snk(P) ∪ snk(Q) denotes the set union. It is easy to show that for normal and local programs P and Q, P ∥ Q is normal and local.

Definition 6 (N-closure). For a normal program P and a set of names N such that N ∩ bn(P) = ∅, the N-closure νN(P) is the program such that src(νN(P)) = src(P), snk(νN(P)) = snk(P) ∪ N and edge(νN(P)) = edge(P) ∪ {(b, n) | b ∈ src(P), n ∈ N}. We sometimes denote νN1(···(νNi(P))···) as νN1···νNi P for a program P and sets of names N1, ..., Ni.

Definition 7 (deleting a behaviour). For a normal program P and b ∈ src(P), P \ b is the program such that src(P \ b) = src(P) \ {b}, snk(P \ b) = snk(P) and edge(P \ b) = edge(P) \ {(b, n) | (b, n) ∈ edge(P)}. Note that src(P) \ {b} and edge(P) \ {(b, n) | (b, n) ∈ edge(P)} denote multiset subtractions.
We can show the following propositions from the definitions.
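The graph operations of Definitions 5 to 7 can be prototyped with multisets. The Python sketch below is only an illustration under simplifying assumptions: behaviours are opaque string labels (so their inner structure, free names and the side conditions of the definitions are not checked), multisets are collections.Counter objects, and the names Program, compose, closure and delete are introduced here and do not come from the paper. The last lines check two of the identities listed in the propositions that follow on one small instance.

```python
# Illustrative model of programs as bipartite graphs; labels are opaque strings.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Program:
    src: Counter      # multiset of behaviours (source nodes)
    snk: frozenset    # set of names (sink nodes)
    edge: Counter     # multiset of (behaviour, name) pairs

ZERO = Program(Counter(), frozenset(), Counter())   # the empty graph 0

def compose(p, q):
    """Definition 5: multiset union of sources and edges, set union of sinks."""
    return Program(p.src + q.src, p.snk | q.snk, p.edge + q.edge)

def closure(names, p):
    """Definition 6: add the names as sinks and link every source node to them."""
    extra = Counter({(b, n): count for b, count in p.src.items() for n in names})
    return Program(p.src.copy(), p.snk | frozenset(names), p.edge + extra)

def delete(p, b):
    """Definition 7: remove one occurrence of b together with its edges."""
    removed = Counter({(b2, n): 1 for (b2, n) in p.edge if b2 == b})
    return Program(p.src - Counter([b]), p.snk, p.edge - removed)

# Two tiny programs; the behaviour labels are only suggestive text.
P = Program(Counter(["x<a>"]), frozenset({"a"}), Counter([("x<a>", "a")]))
Q = Program(Counter(["tau.0"]), frozenset(), Counter())
N = {"m"}

print(closure(N, compose(P, Q)) == compose(closure(N, P), closure(N, Q)))
print(closure(N, delete(P, "x<a>")) == delete(closure(N, P), "x<a>"))
```

Representing source nodes by their labels is a deliberate simplification: duplicated behaviours become multiplicities in the Counter rather than distinct node objects.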
Proposition 1. For normal programs P and Q and a set of names N, νN(P ∥ Q) = νN P ∥ νN Q, νN ν{m | (b, m) ∈ edge(P)}Q = ν{m | (b, m) ∈ edge(νN P)}Q, and νN(P \ b) = (νN P) \ b for b ∈ src(P).

Definition 8 (context). Let P be a program and b ∈ src(P), where b is an input prefix, a τ-prefix or a replication and the content of b is 0. A simple first-order context is the graph P[ ] in which the content 0 of b is replaced with a hole "[ ]". We call a simple context a τ-context, an input context or a replication context if the hole is the content of a τ-prefix, of an input prefix or of a replication, respectively. Let P be a program such that b ∈ src(P) and b is an application (x)0⟨Q⟩. A simple application context P[ ] is the graph obtained by replacing the behaviour b with (x)[ ]⟨Q⟩. A simple context is a simple first-order context or a simple application context. A context is a simple context or the graph P[Q[ ]] obtained by replacing the hole of P[ ] with Q[ ], for a simple context P[ ] and a context Q[ ] (with some renaming of the names occurring in Q if necessary). For a context P[ ] and a program Q, P[Q] is the program obtained by replacing the hole in P[ ] with Q (with some renaming of the names occurring in Q if necessary).

2.2 Operational Semantics
Definition 9 (substitution). Let p be a behaviour, an object or a program and o be an object. For a name a, we assume that a ∈ fn(p). The mapping [o/a] is a substitution if p[o/a] is defined as follows, respectively.
– For a name c, c[o/a] = o if c = a, and c[o/a] = c otherwise.
– For behaviours, ((x)P)[o/a] = (x)(P[o/a]), (o1⟨o2⟩)[o/a] = o1[o/a]⟨o2[o/a]⟩, (!P)[o/a] = !(P[o/a]), (c(x).P)[o/a] = c(x).(P[o/a]) and (τ.P)[o/a] = τ.(P[o/a]).
– For a program P and a ∈ fn(P), P[o/a] = P′, where P′ is a program such that src(P′) = {b[o/a] | b ∈ src(P)}, snk(P′) = snk(P) and edge(P′) = {(b[o/a], n) | (b, n) ∈ edge(P)}.
For the cases of abstraction and input prefix, note that we can assume x ≠ a without loss of generality, because a ∈ fn((x)P) or a ∈ fn(c(x).P). (We can rename x if necessary.)

Definition 10 (acceptable substitution). Let p be a local program or a local object. A substitution [a/x] is acceptable for p if, for any input prefix c(y).Q occurring in p, x ≠ c.
In any execution of local programs, if a substitution is applied by one of the rules of the operational semantics, then it is acceptable. Thus, in the rest of this paper, we consider only acceptable substitutions for a program, a context or an abstraction. Namely, we assume that [o/a] is applied only to objects in which a does not occur as the subject of an input prefix. This is the reason why (c(x).P)[o/a] = c(x).(P[o/a]) rather than (c(x).P)[o/a] = c[o/a](x).(P[o/a]) in Definition 9.
Proposition 2. Let $P$ and $Q$ be normal programs and $o$ be an object. If $M$ is a set of names such that $M \subset \mathrm{fn}(o)$ and $x \in (\mathrm{bn}(P) \cup \mathrm{bn}(Q))$, then $\nu M (P \parallel Q)[o/x] = (\nu M P[o/x]) \parallel (\nu M Q[o/x])$ and $(P \setminus b)[o/x] = P[o/x] \setminus b[o/x]$.

Definition 11 (action). For a name $a$ and an object $o$, an input action is a tuple $a(o)$ and an output action is a tuple $a\langle o\rangle$. An action is a silent action $\tau$, an output action or an input action.

Definition 12 (labeled transition). For an action $\alpha$, $\xrightarrow{\alpha}$ is the least binary relation on normal programs that satisfies the following rules.
input: If $b \in \mathrm{src}(P)$ and $b = a(x).Q$, then $P \xrightarrow{a(o)} (P \setminus b) \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, \nu M\, Q[o/x]$ for an object $o$ and a set of names $M$ such that $\mathrm{fn}(o) \cap \mathrm{snk}(P) \subset M \subset \mathrm{fn}(o) \setminus \mathrm{fn}(P)$.
β-conversion: If $b \in \mathrm{src}(P)$ and $b = (y)Q\langle o\rangle$, then $P \xrightarrow{\tau} (P \setminus b) \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, Q[o/y]$.
τ-action: If $b \in \mathrm{src}(P)$ and $b = \tau.Q$, then $P \xrightarrow{\tau} (P \setminus b) \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, Q$.
replication 1: $P \xrightarrow{\alpha} P'$ if $!Q = b \in \mathrm{src}(P)$ and $P \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, Q' \xrightarrow{\alpha} P'$, where $Q'$ is a program obtained from $Q$ by renaming all names in $\mathrm{snk}(R)$ to distinct fresh names that do not occur anywhere else, for every program $R$ that occurs in $Q$ (including $Q$ itself).
replication 2: $P \xrightarrow{\tau} P'$ if $!Q = b \in \mathrm{src}(P)$ and $P \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, (Q'_1 \parallel Q'_2) \xrightarrow{\tau} P'$, where each $Q'_i$ ($i = 1, 2$) is a program obtained from $Q$ by renaming all names in $\mathrm{snk}(R)$ to distinct fresh names that do not occur anywhere else, for every program $R$ that occurs in $Q$ (including $Q$ itself).
output: If $b \in \mathrm{src}(P)$ and $b = a\langle v\rangle$, then $P \xrightarrow{a\langle v\rangle} P \setminus b$.
communication: If $b_1, b_2 \in \mathrm{src}(P)$, $b_1 = a\langle o\rangle$ and $b_2 = a(x).Q$, then $P \xrightarrow{\tau} ((P \setminus b_1) \setminus b_2) \parallel \nu\{n \mid (b_2, n) \in \mathrm{edge}(P)\}\, \nu(\mathrm{fn}(o) \cap \mathrm{snk}(P))\, Q[o/x]$.

We can show that for any programs $P$, $P'$ and any action $\alpha$ such that $P \xrightarrow{\alpha} P'$, if $P$ is local then $P'$ is local, and if $P$ is normal then $P'$ is normal. The following propositions are straightforward from the definitions.

Proposition 3. For any normal programs $P$, $P'$ and $Q$, and any action $\alpha$, if $P \xrightarrow{\alpha} P'$ then $P \parallel Q \xrightarrow{\alpha} P' \parallel Q$.

Proposition 4. For any programs $P$, $Q$ and $R$ and any action $\alpha$, if $P \parallel Q \xrightarrow{\alpha} R$ is derived immediately by one of input, β-conversion, τ-action or output, then $R = P' \parallel Q$ for some $P \xrightarrow{\alpha} P'$ or $R = P \parallel Q'$ for some $Q \xrightarrow{\alpha} Q'$.

Definition 13 (substitution for actions). For a substitution $[o/x]$, $\tau[o/x] = \tau$, $a(n)[o/x] = a(n[o/x])$ and $a\langle n\rangle[o/x] = a[o/x]\langle n[o/x]\rangle$. Note that $a(n)[o/x] \neq a[o/x](n[o/x])$, as we consider local programs.
M. Murakami
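Definition 14 (strong bisimulation equivalence). A binary relation $\mathcal{R}$ on normal programs is a strong bisimulation if, for any $P$ and $Q$ such that $(P, Q) \in \mathcal{R}$ (or $(Q, P) \in \mathcal{R}$), for any $\alpha$ and $P'$, if $P \xrightarrow{\alpha} P'$ then there exists $Q'$ such that $Q \xrightarrow{\alpha} Q'$ and $(P', Q') \in \mathcal{R}$ ($(Q', P') \in \mathcal{R}$, respectively). Strong bisimulation equivalence $\sim$ is defined as follows: $P \sim Q$ iff $(P, Q) \in \mathcal{R}$ for some strong bisimulation $\mathcal{R}$.

Definition 15 (strong bisimulation up to $\sim$). A binary relation $\mathcal{R}$ on normal programs is a strong bisimulation up to $\sim$ if, for any $P$ and $Q$ such that $(P, Q) \in \mathcal{R}$ (or $(Q, P) \in \mathcal{R}$), for any $\alpha$ and $P'$, if $P \xrightarrow{\alpha} P'$ then there exists $Q'$ such that $Q \xrightarrow{\alpha} Q'$ and $(P', Q') \in {\sim}\mathcal{R}{\sim}$ ($(Q', P') \in {\sim}\mathcal{R}{\sim}$, respectively).

Proposition 5. If $\mathcal{R}$ is a strong bisimulation up to $\sim$, then ${\sim}\mathcal{R}{\sim}$ is a strong bisimulation.

We write $Q \stackrel{\hat{\alpha}}{\Rightarrow} Q'$ if and only if $Q \xrightarrow{\tau} \cdots \xrightarrow{\tau} \xrightarrow{\alpha} \xrightarrow{\tau} \cdots \xrightarrow{\tau} Q'$, or $\alpha = \tau$ and $Q = Q'$.

Definition 16 (weak bisimulation equivalence). A binary relation $\mathcal{R}$ on normal programs is a weak bisimulation if, for any $P$ and $Q$ such that $(P, Q) \in \mathcal{R}$ (or $(Q, P) \in \mathcal{R}$), for any $\alpha$ and $P'$, if $P \xrightarrow{\alpha} P'$ then there exists $Q'$ such that $Q \stackrel{\hat{\alpha}}{\Rightarrow} Q'$ and $(P', Q') \in \mathcal{R}$ ($(Q', P') \in \mathcal{R}$, respectively). Weak bisimulation equivalence $\approx$ is defined as follows: $P \approx Q$ iff $(P, Q) \in \mathcal{R}$ for some weak bisimulation $\mathcal{R}$.

We have the following proposition from the definitions. The outline of its proof is similar to that of Exercise 2.4.64 in [12].

Proposition 6. Suppose $\mathcal{R}$ is such that for any $P$, $Q$ with $(P, Q) \in \mathcal{R}$ (or $(Q, P) \in \mathcal{R}$),
1. if $P \xrightarrow{\alpha} P'$ where $\alpha \neq \tau$, then $Q \stackrel{\alpha}{\Rightarrow} Q'$ and $P' \approx \mathcal{R} \approx Q'$ ($Q' \approx \mathcal{R} \approx P'$, respectively), and
2. if $P \xrightarrow{\tau} P'$, then $Q \stackrel{\hat{\tau}}{\Rightarrow} Q'$ and $P' \sim \mathcal{R} \approx Q'$ ($Q' \approx \mathcal{R} \sim P'$, respectively).
Then $(P, Q) \in \mathcal{R}$ implies $P \approx Q$.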
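As an illustration of Definition 16, the following is a minimal sketch of a weak bisimilarity check. It assumes an abstract finite labeled transition system given as a dictionary, not the graph rewriting model of this paper; the names `TAU`, `tau_closure`, `weak_succ`, `weak_bisimilar` and `EXAMPLE` are hypothetical. It starts from all pairs of states and repeatedly removes pairs that violate the transfer condition, so what remains is the largest weak bisimulation on that system.

```python
from itertools import product

TAU = "tau"  # assumed label for the silent action


def tau_closure(lts, s):
    """States reachable from s by zero or more tau steps."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for label, v in lts.get(u, ()):
            if label == TAU and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def weak_succ(lts, s, label):
    """States reachable by tau* label tau* (only tau* when label is tau)."""
    before = tau_closure(lts, s)
    if label == TAU:
        return before
    after = set()
    for u in before:
        for l, v in lts.get(u, ()):
            if l == label:
                after |= tau_closure(lts, v)
    return after


def weak_bisimilar(lts, p, q):
    """Remove violating pairs until the relation is a weak bisimulation."""
    states = set(lts) | {v for ts in lts.values() for _, v in ts}
    rel = set(product(states, states))
    changed = True
    while changed:
        changed = False
        for s, t in list(rel):
            left = all(any((s2, t2) in rel for t2 in weak_succ(lts, t, a))
                       for a, s2 in lts.get(s, ()))
            right = all(any((s2, t2) in rel for s2 in weak_succ(lts, s, a))
                        for a, t2 in lts.get(t, ()))
            if not (left and right):
                rel.discard((s, t))
                changed = True
    return (p, q) in rel


# Hypothetical example: p0 = a.tau.b, q0 = a.b, r0 = a
EXAMPLE = {
    "p0": [("a", "p1")], "p1": [(TAU, "p2")], "p2": [("b", "p3")],
    "q0": [("a", "q1")], "q1": [("b", "q2")],
    "r0": [("a", "r1")],
}
print(weak_bisimilar(EXAMPLE, "p0", "q0"))
print(weak_bisimilar(EXAMPLE, "r0", "q0"))
```

In this example the internal step of `p0` is absorbed by the weak transition, so `p0` and `q0` are related, whereas `r0` cannot match the `b` step and is not related to `q0`.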
3 Congruence Results
Proposition 7. If $\mathrm{src}(P_1) = \mathrm{src}(P_2)$ then $P_1 \sim P_2$.
Proof (outline). We can show from the definitions that $\{(P_1, P_2) \mid \mathrm{src}(P_1) = \mathrm{src}(P_2)\}$ is a strong bisimulation.

Proposition 8. Let $P$ be a normal program, $\alpha$ be an action, $[o/x]$ be a substitution that is acceptable for $P$ with $(\mathrm{fn}(o) \cup \mathrm{bn}(o)) \cap \mathrm{bn}(P) = \emptyset$, $x \in \mathrm{fn}(P)$ be a name that does not occur elsewhere, and $M$ be a set of names such that $M \subset \mathrm{fn}(o)$. If $\nu M P[o/x] \xrightarrow{\alpha} P'$, then one of the following holds.
1. There exists $P''$ such that $P \xrightarrow{\alpha'} P''$, $\alpha'[o/x] = \alpha$ and $P''[o/x] \sim P'$,
2. there exists $P''$ such that $P \xrightarrow{x\langle b\rangle} P''$, $\alpha$ is a silent action, $o = (y)R$ and $P' \sim (P''[(y)R/x] \parallel R[b/y])$, or
3. $\alpha$ is a silent action and there exists $P''$ such that $P \xrightarrow{a\langle n\rangle} \xrightarrow{b(m)} P''$, $a[o/x] = b$, $n[o/x] = m[o/x]$ and $P''[o/x] \sim P'$.
Proof (outline). By induction on the number of replication 1/2 rules used to derive $\nu M P[o/x] \xrightarrow{\alpha} P'$, and Propositions 1, 2 and 7. Note that the condition is $a[o/x] = b$ because we assume $b \neq x$, as $[o/x]$ is acceptable for $P$.

Proposition 9. Let $P$ be a normal program, $\alpha$ be an action, $o$ be an object such that $(\mathrm{fn}(o) \cup \mathrm{bn}(o)) \cap \mathrm{bn}(P) = \emptyset$ and $M$ be a set of names such that $M \subset \mathrm{fn}(o)$. If $P \xrightarrow{\alpha} P'$ and $x$ is a free name of $P$ that does not occur anywhere else, then $P[o/x] \xrightarrow{\alpha[o/x]} \sim P'[o/x]$ if $\alpha[o/x]$ is an action. Furthermore, if $\alpha$ is an output action $x\langle n\rangle$ and $o$ is an abstraction $(y)R$, then $\nu M P[o/x] \xrightarrow{\tau} \sim \nu M (P'[o/x] \parallel R[n/y])$.
Proof (outline). By induction on the number of replication 1/2 rules used to derive $P \xrightarrow{\alpha} P'$, and Propositions 1, 2 and 7.

Proposition 10. Let $P$ be a normal program, $\alpha$ be an output action, $\beta$ be an input action, $[o/x]$ be a substitution such that $\alpha[o/x] = a\langle n\rangle$, $\beta[o/x] = a(n)$ and $(\mathrm{fn}(o) \cup \mathrm{bn}(o)) \cap \mathrm{bn}(P) = \emptyset$, and $M$ be a set of names such that $M \subset \mathrm{fn}(o)$. If $P \xrightarrow{\alpha} \xrightarrow{\beta} P'$, then $\nu M P[o/x] \xrightarrow{\tau} \sim P'[o/x]$.
Proof (outline). By induction on the number of replication 1/2 rules used to derive $P \xrightarrow{\alpha} \xrightarrow{\beta} P'$, and Propositions 1 to 3 and 7.

Proposition 11. For any programs $P$ and $Q$, if $P \xrightarrow{a\langle n\rangle} P'$ and $Q \xrightarrow{a(n)} Q'$ then $P \parallel Q \xrightarrow{\tau} \sim P' \parallel Q'$.
Proof (outline). By induction on the number of replication 1/2 rules used to derive $P \xrightarrow{a\langle n\rangle} P'$ and $Q \xrightarrow{a(n)} Q'$, and Proposition 1.

Proposition 12. If $b \in \mathrm{src}(P)$ and $!Q = b$, then $P \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, Q' \sim P$, where $Q'$ is a program obtained from $Q$ by renaming the names in $\mathrm{bn}(Q)$ to fresh names.
Proof (outline). We have the result by showing that the relation $\{(P \parallel \nu\{n \mid (b, n) \in \mathrm{edge}(P)\}\, Q', P) \mid {!Q} \in \mathrm{src}(P),\ Q' \text{ is obtained from } Q \text{ by fresh renaming of } \mathrm{bn}(Q)\} \cup {\sim}$ is a strong bisimulation up to $\sim$.

Now we have the congruence result w.r.t. "$\parallel$".
Proposition 13. For any program $R$, if $P \approx Q$ then $P \parallel R \approx Q \parallel R$.
Proof (outline). From Proposition 6, it is enough to show that $\mathcal{R} = \{(P \parallel R, Q \parallel R) \mid P \approx Q\}$ satisfies the following 1 and 2 for any $P$, $Q$ such that $(P \parallel R, Q \parallel R) \in \mathcal{R}$ or $(Q \parallel R, P \parallel R) \in \mathcal{R}$:
1. if $P \parallel R \xrightarrow{\alpha} P'$ where $\alpha \neq \tau$, then $Q \parallel R \stackrel{\alpha}{\Rightarrow} Q'$ and $P' \approx \mathcal{R} \approx Q'$, and
2. if $P \parallel R \xrightarrow{\tau} P'$, then $Q \parallel R \stackrel{\hat{\tau}}{\Rightarrow} Q'$ and $P' \sim \mathcal{R} \approx Q'$.
For 1, we use induction on the number of replication 1/2 rules used to derive $P \parallel R \xrightarrow{\alpha} P'$. If $P \parallel R \xrightarrow{\alpha} P'$ is derived immediately with input or output, the result follows easily from Propositions 4 and 3. The induction case follows from Proposition 12 and the inductive hypothesis. For 2, we also use induction on the number of replication 1/2 rules used to derive $P \parallel R \xrightarrow{\tau} P'$, with Propositions 1 to 4 and 11 for the base case and Proposition 12 for the induction case.

Proposition 14. For any $P$ and $Q$ such that $P \approx Q$ and for any $\tau$-context $R[\ ]$, $R[P] \approx R[Q]$.
Proof (outline). By showing that $\{(R[P], R[Q]) \mid P \approx Q,\ R[\ ] \text{ is a } \tau\text{-context}\} \cup {\approx}$ is a weak bisimulation, using Propositions 3, 7 and 13.

Proposition 15. For any $P$ and $Q$ such that $P \approx Q$ and for any replication context $R[\ ]$, $R[P] \approx R[Q]$.
Proof (outline). From Propositions 12 and 13, we can show that $\{(R[P], R[Q]) \mid P \approx Q,\ R[\ ] \text{ is a replication context}\} \cup {\approx}$ is a weak bisimulation.

Proposition 16. For any $P$ and $Q$ such that $P \approx Q$, an object $o$ such that $\mathrm{fn}(o) \cap (\mathrm{snk}(P) \cup \mathrm{snk}(Q)) = \emptyset$ and a set of names $M \subset \mathrm{fn}(o)$, $\nu M P[o/x] \approx \nu M Q[o/x]$.
Proof (outline). From Proposition 6, it is enough to show that $\mathcal{R} = \{(\nu M P[o/x], \nu M Q[o/x]) \mid P \approx Q,\ \mathrm{fn}(o) \cap (\mathrm{snk}(P) \cup \mathrm{snk}(Q)) = \emptyset\}$ satisfies the following 1 and 2 for any $P$, $Q$ such that $(\nu M P[o/x], \nu M Q[o/x]) \in \mathcal{R}$ or $(\nu M Q[o/x], \nu M P[o/x]) \in \mathcal{R}$:
1. if $\nu M P[o/x] \xrightarrow{\alpha} P'$ where $\alpha \neq \tau$, then $\nu M Q[o/x] \stackrel{\alpha}{\Rightarrow} Q'$ and $P' \approx \mathcal{R} \approx Q'$, and
2. if $\nu M P[o/x] \xrightarrow{\tau} P'$, then $\nu M Q[o/x] \stackrel{\hat{\tau}}{\Rightarrow} Q'$ and $P' \sim \mathcal{R} \approx Q'$.
For 1, as $\alpha \neq \tau$, it is enough to consider case 1 of Proposition 8. We have the result from the former part of Proposition 9 and ${\sim} \subset {\approx}$. For 2, if case 1 of Proposition 8 holds, then we can show the result from the former part of Proposition 9. If case 2 of Proposition 8 holds, we have the result from the latter part of Proposition 9, 3, the former part of 9 and 13. If case 3 of Proposition 8 holds, we can show the result from Proposition 10.
Proposition 17. For any $P$ and $Q$ such that $P \approx Q$ and for any input context $R[\ ]$, $R[P] \approx R[Q]$.
Proof (outline). From Proposition 6, it is enough to show that $\mathcal{R} = \{(R[P], R[Q]) \mid P \approx Q,\ R[\ ] \text{ is an input context}\} \cup {\approx}$ satisfies the following 1 and 2 for any $P$, $Q$ such that $(R[P], R[Q]) \in \mathcal{R}$ or $(R[Q], R[P]) \in \mathcal{R}$:
1. if $R[P] \xrightarrow{\alpha} P'$ where $\alpha \neq \tau$, then $R[Q] \stackrel{\alpha}{\Rightarrow} Q'$ and $P' \approx \mathcal{R} \approx Q'$, and
2. if $R[P] \xrightarrow{\tau} P'$, then $R[Q] \stackrel{\hat{\tau}}{\Rightarrow} Q'$ and $P' \sim \mathcal{R} \approx Q'$.
Item 1 follows from Propositions 16 and 13. Item 2 follows by induction on the number of replication 1/2 rules, with Propositions 16 and 13.

Proposition 18. For any $P$ and $Q$ such that $P \approx Q$ and for any simple application context $R[\ ]$, $R[P] \approx R[Q]$.
Proof (outline). We can show that $\{(R[P], R[Q]) \mid P \approx Q\} \cup {\approx}$ is a weak bisimulation from Propositions 4, 3, 13 and 16.

From Propositions 14, 15, 17 and 18, we have the following result by induction on the definition of context.

Theorem 1. For any $P$ and $Q$ such that $P \approx Q$ and for any context $R[\ ]$, $R[P] \approx R[Q]$.

For the congruence results, we considered contexts that consist of composition, prefix, replication and application. In the asynchronous π-calculus, the congruence result w.r.t. name restriction, "$P \approx Q$ implies $\nu x P \approx \nu x Q$", is also reported. We can show the corresponding result with an argument similar to that of the first-order case [6].

Proposition 19. For any $P$ and $Q$ such that $P \approx Q$ and for any set of names $M$ such that $M \cap (\mathrm{bn}(P) \cup \mathrm{bn}(Q)) = \emptyset$, $\nu M (P) \approx \nu M (Q)$.
4 Conclusions and Discussions
This paper presented congruence results of weak bisimulation equivalence w.r.t. composition, τ-context, input context, replication context and application context for the graph rewriting model of concurrent systems. We did not consider the cases in which a hole occurs in the object part of a message or of an application, because weak bisimulation equivalence is not a congruence in these cases. For example, for a context $R_1[\ ]$ that has just one behaviour node, which is a message node with a hole $m\langle[\ ]\rangle$, it is obvious that $R_1[P] \not\approx R_1[Q]$ for $P$ and $Q$ such that $P \approx Q$ but $P \neq Q$, because $R_1[P] \xrightarrow{m\langle P\rangle}$ whereas $R_1[Q] \stackrel{m\langle P\rangle}{\nRightarrow}$. For the case of application, consider the context $R_2[\ ]$ that has just one behaviour $(x)(m\langle x\rangle)\langle[\ ]\rangle$. Then again $R_2[P] \not\approx R_2[Q]$ for $P$ and $Q$ such that $P \approx Q$ but $P \neq Q$, because $R_2[P] \xrightarrow{\tau} R_1[P]$, whereas $R_2[Q] \xrightarrow{\tau} R_1[Q]$ is the unique transition for $R_2[Q]$.
References
1. Ehrig, H., König, B.: Deriving Bisimulation Congruences in the DPO Approach to Graph Rewriting with Borrowed Contexts. Mathematical Structures in Computer Science 16(6), 1133–1163 (2006)
2. Gadducci, F.: Term Graph Rewriting for the π-calculus. In: Ohori, A. (ed.) APLAS 2003. LNCS, vol. 2895, pp. 37–54. Springer, Heidelberg (2003)
3. König, B.: A Graph Rewriting Semantics for the Polyadic π-Calculus. In: Proc. of GT-VMT 2000, Workshop on Graph Transformation and Visual Modeling Techniques, pp. 451–458 (2000)
4. Milner, R.: The Space and Motion of Communicating Systems. Cambridge University Press, Cambridge (2009)
5. Murakami, M.: A Formal Model of Concurrent Systems Based on Bipartite Directed Acyclic Graph. Science of Computer Programming 61, 38–47 (2006)
6. Murakami, M.: Congruence Results of Behavioral Equivalence for A Graph Rewriting Model of Concurrent Programs. In: Proc. of ICITA 2008, pp. 636–641 (2008)
7. Murakami, M.: A Graph Rewriting Model of Concurrent Programs with Higher-Order Communication. In: Proc. of TMFCS 2008, pp. 80–87 (2008)
8. Murakami, M.: Congruence Results of Scope Equivalence for a Graph Rewriting Model of Concurrent Programs. In: Fitzgerald, J.S., Haxthausen, A.E., Yenigun, H. (eds.) ICTAC 2008. LNCS, vol. 5160, pp. 243–257. Springer, Heidelberg (2008)
9. Murakami, M.: On Congruence Property of Scope Equivalence for Concurrent Programs with Higher-Order Communication. In: Proc. of CPA 2009, pp. 49–66. IOS Press, Amsterdam (2009)
10. Murakami, M.: Congruence Results of Behavioral Equivalence for A Graph Rewriting Model of Concurrent Programs with Higher-Order Communication. Journal of Networking Technology 1(3), 106–112 (2010)
11. Sangiorgi, D., Walker, D.: The π-calculus, A Theory of Mobile Processes. Cambridge University Press, Cambridge (2001)
12. Sangiorgi, D.: Asynchronous Process Calculi: The First- and Higher-order Paradigms. Theoretical Computer Science 253, 311–350 (2001)
13. Sassone, V., Sobociński, P.: Reactive systems over cospans. In: Proc. of LICS 2005, pp. 311–320. IEEE, Los Alamitos (2005)
Performance Study of Fluid Content Distribution Model for Peer-to-Peer Overlay Networks
Salah Noori Saleh, Maryam Feily, Sureswaran Ramadass, and Ayman Hannan
National Advanced IPv6 Centre (NAv6), Universiti Sains Malaysia (USM), 11800 Penang, Malaysia
{salah,maryam,sures,ayman}@nav6.usm.my
Abstract. Overlay networks have recently been used to serve high-concurrency applications ranging from live streaming to reliable delivery of popular content. Compared to traditional communication mechanisms, overlay networks offer an enhanced alternative for content delivery in terms of flexibility, scalability, and ease of deployment. The content distribution process in overlay networks is facilitated by leveraging the uploading capacity of the receiving nodes. Content distribution in overlay networks is generally based on the Chunk and Fluid models. The Fluid model provides continuous transfer of the content from the source to multiple receivers. However, deploying the Fluid model in heterogeneous peer-to-peer overlay networks requires special consideration due to the incorporation of tightly coupled connections between adjacent peers. The aim of this paper is to study the performance of different Fluid content distribution models for peer-to-peer overlay networks. The paper investigates three classes of Fluid content distribution models: the Fluid model with scheduling, with backpressure, and with encoding. Moreover, the performance of the Fluid model with backpressure and with encoding has been evaluated and compared based on download time, a critical performance metric in peer-to-peer overlay networks. The performance tests were carried out through real implementation tests over PlanetLab.
Keywords: Peer-to-Peer, Overlay Networks, Content Distribution.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 181–190, 2011. © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
Over the past decades, users have witnessed the enormous expansion and development of the Internet, which has resulted in a corresponding growth in network traffic. Specifically, content delivery and management have become more challenging due to the evolving nature of the Internet. Recently, overlay networks have become a popular alternative for serving high-concurrency applications ranging from live streaming to reliable delivery of popular content and file sharing. An overlay network is a virtual network built on top of an existing physical network. It comprises nodes and logical links connecting the overlay nodes in several routing domains. The main purpose of building overlay networks is to implement extra network services that are not provided by existing physical networks. Content distribution is considered a component of all overlay networks. Compared to traditional communication mechanisms, overlay networks offer an enhanced alternative for content delivery in terms of flexibility, scalability, and ease of deployment. Content distribution in overlay networks is facilitated by leveraging the uploading capacity of the receiving nodes. Specifically, once a node has received any portion of the content, it can redistribute that portion to any of the other receiving nodes [1]. Content distribution in overlay networks is generally based on two models: the Chunk model and the Fluid model. The Fluid model provides continuous transfer of the content from the source to multiple receivers. However, deploying the Fluid model in heterogeneous peer-to-peer overlay networks requires special consideration due to the incorporation of tightly coupled connections between adjacent peers. Unlike in the Fluid content distribution model, in the Chunk model all connections among the peers are loosely coupled. The aim of this paper is to study the performance of different Fluid content distribution models for heterogeneous peer-to-peer overlay networks. The remainder of this paper is organized as follows. First, as a preface to our research, the Fluid content distribution model is introduced briefly in Section 2. Then, in Section 3, Fluid content distribution models are classified into three classes: the Fluid model with scheduling, with backpressure, and with encoding. The problem with each class is also discussed in that Section. The performance of the Fluid content distribution model with backpressure and with encoding is evaluated and compared in Section 4, where the performance test and the experimental setup over PlanetLab are described in detail. Furthermore, the performance results of the Fluid content distribution model with backpressure and with encoding are compared with the Chunk content distribution model based on download time, a critical performance metric in peer-to-peer overlay networks. Finally, the paper is concluded in Section 5.
2 Fluid Content Distribution Model
This Section provides a brief overview of the Fluid content distribution model, with a primary focus on the delay in this model. A Fluid model system denotes a stream distribution approach in which the streaming information is organized in multiple sub-streams and continuously delivered across one or more overlay network paths. However, since the streaming information is delivered in the form of IP packets in the physical IP network, the continuous delivery is just an abstraction. Yet, the small size of IP packets yields marginal transmission times at each node. As such, propagation delay and path delay (which is due to queuing in the underlying network path) are considered the predominant delays in overlay networks. Therefore, the delay in Fluid model systems ultimately depends on the delay characterizing a path between the source node and a generic end node. However, deploying the Fluid content distribution model in heterogeneous peer-to-peer overlay networks requires special consideration due to the incorporation of tightly coupled connections between adjacent peers. In fact, the tightly coupled connections significantly degrade the performance of all peers in a heterogeneous peer-to-peer overlay network where participating peers have different download times and bandwidths.
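To make the remark about marginal per-node transmission times concrete, the sketch below compares the time needed to push one IP packet and one whole chunk onto a link. The packet size (1500 bytes) and the link rate (10 Mbit/s) are assumptions chosen for illustration; only the 256 Kbyte chunk size is taken from the experimental setup in Section 4.1, and the function name is hypothetical.

```python
def transmission_time(size_bytes: int, rate_bps: float) -> float:
    """Seconds needed to push size_bytes onto a link of rate_bps bits/s."""
    return size_bytes * 8 / rate_bps


PACKET_BYTES = 1500        # assumed: a typical Ethernet-sized IP packet
CHUNK_BYTES = 256 * 1024   # chunk size used in the experiments of Section 4.1
RATE_BPS = 10_000_000      # assumed: a 10 Mbit/s uplink

packet_delay = transmission_time(PACKET_BYTES, RATE_BPS)
chunk_delay = transmission_time(CHUNK_BYTES, RATE_BPS)

print(f"per-hop delay, packet: {packet_delay * 1000:.1f} ms")
print(f"per-hop delay, chunk:  {chunk_delay * 1000:.1f} ms")
```

Under these assumed numbers a relay that forwards packet by packet adds about 1.2 ms per hop, whereas a relay that must receive a whole chunk before forwarding adds about 210 ms per hop.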
Initially, ALM protocols based on the Fluid content distribution model proposed the use of a tree topology to distribute streams among peers, where all peers are arranged into a tree rooted at the source. The content is streamed down from the root of the tree (the source) to every peer along the tree edges using a push-based approach. Although this topology is simple and achieves low delay, it cannot accommodate network dynamics and asymmetric bandwidth in heterogeneous peer-to-peer overlay networks. In fact, in a single tree topology, the failure of a node can seriously affect the streaming quality of all its descendant nodes due to tree reconstruction. Moreover, it is not possible to guarantee the streaming rate, as it is limited by the least uplink bandwidth of a node in the tree. Furthermore, the leaf nodes in the tree do not participate in the distribution process [2, 3]. Single tree performance can be significantly improved by using a parallel tree (multi-tree) topology, which organizes the peers in k different trees such that each peer is an interior peer in at most one tree and a leaf peer in the remaining k − 1 trees. The content is then striped into k stripes, and each stripe is distributed over a different tree. Nevertheless, this approach also has important limitations. First, maintaining multiple tree-shaped overlays with the desired properties can be very challenging if churn occurs. Second, the minimum throughput among the upstream connections limits the rate of content delivery to each peer through individual trees. The minimum throughput among the upstream connections can be even smaller than the bandwidth of a single sub-stream. Finally, the peers cannot share the content more than once [4]. In order to overcome the limitations of the Fluid content distribution model, several approaches have been proposed in the literature. These approaches are discussed further in Section 3.
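The multi-tree idea above can be sketched in a few lines. This is an illustrative sketch only: the round-robin striping rule, the dictionary representation of a tree, the peer names and the function names `stripe` and `interior_tree_count` are assumptions, not the construction used by any particular protocol cited here.

```python
from collections import Counter


def stripe(blocks, k):
    """Split a sequence of blocks into k stripes, round-robin."""
    return [blocks[i::k] for i in range(k)]


def interior_tree_count(trees, source):
    """For each peer, count the trees in which it forwards data to children."""
    counts = Counter()
    for tree in trees:
        for parent, children in tree.items():
            if parent != source and children:
                counts[parent] += 1
    return counts


# Hypothetical overlay: source S, peers A, B, C, and k = 2 trees.
# Each tree maps a node to the list of its children.
TREES = [
    {"S": ["A"], "A": ["B", "C"]},   # tree 0: A is interior, B and C are leaves
    {"S": ["B"], "B": ["A", "C"]},   # tree 1: B is interior, A and C are leaves
]

stripes = stripe(list(range(6)), k=len(TREES))
counts = interior_tree_count(TREES, source="S")

print(stripes)
print(dict(counts))
```

In this toy overlay every peer is interior in at most one tree, so each peer's uplink is used for at most one stripe, while peer C never forwards anything.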
3 Classification of Fluid Content Distribution Models
Several approaches have been proposed to overcome the limitations of the Fluid model. The proposed solutions employ different strategies, such as scheduling, backpressure, and encoding, to enhance the performance of the Fluid content distribution model in heterogeneous environments. In this Section, the enhancements of the Fluid content distribution model are classified into three classes: the Fluid model with scheduling, with backpressure, and with encoding. In addition, the problem with each class is discussed.
3.1 Fluid Model with Scheduling
The simplest Fluid content distribution model comprises a set of independent direct unicast connections from the data source to all other homogeneous participating nodes, without any scheduling. Using this model, all receiving nodes complete their downloads at the same time, which is the optimal file distribution time. However, in a real heterogeneous overlay network, the minimum time needed for a receiving node to complete the download is related to its download capacity. The smaller the download capacity, the longer it takes to complete the download. Obviously, a node with extremely limited download capacity will affect the file distribution time. Therefore, this model should be enhanced to reduce the download time.
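The dependence of the download time on the download capacity can be illustrated with a small calculation. The sketch below assumes that the only limit is each receiver's own download capacity (the source upload capacity is ignored), that 1 Mbyte is 10^6 bytes, and that the three capacities are hypothetical values. The 50 Mbyte file size is borrowed from the experiments in Section 4.1, and the function names are hypothetical.

```python
def min_download_time(file_bits, download_bps):
    """Lower bound on one node's download time: file size / download capacity."""
    return file_bits / download_bps


def distribution_lower_bound(file_bits, download_caps_bps):
    """The file is fully distributed only when the slowest node has finished."""
    return max(min_download_time(file_bits, d) for d in download_caps_bps)


FILE_BITS = 50 * 8 * 10**6                 # 50 Mbyte file, 1 Mbyte = 10^6 bytes (assumed)
CAPS_BPS = [10_000_000, 2_000_000, 500_000]  # assumed download capacities in bits/s

times = [min_download_time(FILE_BITS, d) for d in CAPS_BPS]
bound = distribution_lower_bound(FILE_BITS, CAPS_BPS)

print(times)
print(bound)
```

With these assumed capacities the three nodes need at least 40 s, 200 s and 800 s, so the slowest node alone determines the 800 s lower bound on the file distribution time.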
Scheduling is a common method to reduce the file distribution time. A few studies employ scheduling to improve the performance of the Fluid content distribution model [5, 6, 7, 8]. In fact, scheduling can be used to prevent slow peers from affecting the download speed of fast peers. Nevertheless, this solution is not practical due to the lack of knowledge of the underlying topology and its network metrics, such as delay and bandwidth. In other words, knowledge of the underlying topology and its network metrics is required to build an optimal overlay. Therefore, application layer multicast (ALM) protocols need to use end-to-end measurement techniques to infer such information. However, it is impractical to gather the metrics for all node pairs in an n-node overlay, as this requires O(n²) measurements. Consequently, ALM protocols have to work with limited topology information. In addition, limited access bandwidth at end systems restricts the branching degree of a delivery tree. Unfortunately, most degree-constrained spanning tree optimization problems (e.g., minimum cost or delay) have been proven to be NP-hard [9, 10, 11]. On the other hand, although scheduling reduces the file distribution time, the enhancement of the Fluid content distribution model based on scheduling does not have a direct effect on the content distribution behavior itself.
3.2 Fluid Model with Backpressure
One of the limitations of the Fluid content distribution model is the reliability issue in heterogeneous peer-to-peer overlay networks. For instance, consider a high-bandwidth upstream TCP flow relaying content through an end system to a low-bandwidth downstream TCP flow. During the transmission process, there might be packets delivered by the upstream flow that have not yet been sent to the downstream flow. Therefore, the intermediate end system is forced to buffer these packets. The growing number of such packets leads to buffer overflow, as the set of incoming packets will soon exceed the finite application-level buffers available for relaying data at the intermediate end system. One solution to overcome this limitation is to use push-back flow control to limit the upstream link coming from the sender [12, 13, 14]. The basic operation of this approach is to dequeue a packet from the incoming buffer only after it has been copied into all of the outgoing buffers. In addition, coupling the flow control and the congestion control avoids any buffer overflow due to the different speeds of the downstream links. Once the arriving packet has been copied into all outgoing buffers, the intermediate host sends an acknowledgment back to its parent. If there is insufficient space in any outgoing buffer, the host stalls. Consequently, a queue builds up at the incoming buffer and the advertised window decreases. The effect of this decreased advertised window ultimately propagates all the way back to the source. This is known as single-rate congestion control, where all nodes in the tree send packets to downstream links at approximately the speed of the slowest link. However, this backpressure or single-rate scheme has known limitations in the presence of a large number of groups, in that a single slow receiver can drag down the data transfer rate for the whole group. It is confirmed in [15] that there are serious performance and scalability problems with source-based single-rate end-to-end multicast congestion control, even in homogeneous environments. In general, for a backpressure system to become adaptive to the bandwidth variation, and to keep a
maximum throughput, the system should regularly recalculate the available link capacity using real-time traffic measurements in order to avoid temporary localized congestion. This adds extra burden to the overlay system in terms of calculation and network traffic.
3.3 Fluid Model with Encoding
As mentioned earlier, the incorporation of tightly coupled connections between adjacent peers in the Fluid content distribution model significantly degrades the performance of all peers in a heterogeneous peer-to-peer overlay network where participating peers have different download times and bandwidths. For instance, when a high-bandwidth upstream TCP flow relays content through an end system to peers with low- and high-bandwidth downstream TCP flows, the intermediate end system is forced to slow down and accommodate the rate of the slowest downstream link. Consequently, the whole distribution process slows down. One solution to this problem is to skip slow nodes and continue forwarding data to the fast nodes. However, this causes reliability issues for the slow nodes. In order to overcome this issue, network encoding can be used in exactly the same way as it is used with IP multicast to solve the reliability issue, whereby the content is first encoded and then transmitted to clients. In addition, this approach accommodates asynchronous client arrivals and provides resilience to packet loss. Moreover, this approach enables heterogeneous client transfer rates in layered multicast. Under this mechanism, slow peers can recover missing blocks after they receive sufficient blocks. The proposals in [16, 17, 18] employ network encoding to solve the reliability issue of the Fluid content distribution model in heterogeneous networks. However, network encoding also has weaknesses due to the overhead associated with encoding and decoding operations. For instance, a peer may need to spend a large amount of time decoding the data it receives. Moreover, coding adds extra information, such as XOR operation symbols, to the original data packets, which results in redundant information and reception overhead.
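The push-back flow control described in Section 3.2 can be sketched as a relay with bounded buffers. This is a simplified, single-threaded illustration under stated assumptions: the class name `RelayNode`, its methods and the buffer sizes are hypothetical, packets are plain integers, and TCP itself is not modelled. A packet leaves the incoming buffer only after it has been copied into every outgoing buffer, and the relay stalls as soon as one outgoing buffer is full.

```python
from collections import deque


class RelayNode:
    """Toy relay: one incoming buffer, one bounded outgoing buffer per child."""

    def __init__(self, n_children, out_capacity, in_capacity):
        self.incoming = deque()
        self.outgoing = [deque() for _ in range(n_children)]
        self.out_capacity = out_capacity
        self.in_capacity = in_capacity

    def advertised_window(self):
        """Free space in the incoming buffer, as reported to the parent."""
        return self.in_capacity - len(self.incoming)

    def receive(self, packet):
        """Accept a packet from the parent unless the incoming buffer is full."""
        if self.advertised_window() == 0:
            return False
        self.incoming.append(packet)
        return True

    def forward(self):
        """Copy head packets to all children; stall if any child buffer is full."""
        relayed = 0
        while self.incoming and all(len(q) < self.out_capacity
                                    for q in self.outgoing):
            packet = self.incoming[0]
            for q in self.outgoing:
                q.append(packet)
            self.incoming.popleft()   # dequeue only after copying to all children
            relayed += 1
        return relayed

    def drain(self, child, n):
        """The link to `child` sends up to n buffered packets downstream."""
        q = self.outgoing[child]
        sent = min(n, len(q))
        for _ in range(sent):
            q.popleft()
        return sent


node = RelayNode(n_children=2, out_capacity=2, in_capacity=8)
for seq in range(5):
    node.receive(seq)

first = node.forward()     # both child buffers fill up
node.drain(0, 2)           # the fast child empties its buffer
stalled = node.forward()   # the slow child's buffer is still full
node.drain(1, 1)           # the slow child sends one packet
resumed = node.forward()   # exactly one more packet can be relayed
print(first, stalled, resumed, node.advertised_window())
```

The run shows the single-rate effect: although the fast child has drained its buffer, nothing is relayed until the slow child makes room, and the advertised window towards the parent stays reduced in the meantime.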
4 Performance Study of Fluid Model
File transfer time and scalability are the defining attributes of a reliable content distribution system. Faster transfer times and a more graceful degradation under heavier loads equate to better performance. In addition, file size, bandwidth heterogeneity (loss rate), and number of peers are three important test parameters for any content distribution system. The performance of Fluid content distribution with backpressure and with encoding has been evaluated and compared with the performance of the Chunk content distribution model based on download time, a critical performance metric in peer-to-peer overlay networks. Among the three classes of enhancements, the Fluid content distribution model with backpressure and with encoding is examined, as these enhancements directly affect the behavior of the Fluid content distribution model and enable content distribution in heterogeneous environments. The performance of the Fluid content distribution model with scheduling is not evaluated in this study, because scheduling does not change the content distribution behavior itself. The following is a brief description of the three content distribution models examined in this research.
• Chunk Model: Data is broken into many chunks that can be downloaded independently. These chunks are then redistributed by the receiving peers. A peer must complete the reception of a chunk before forwarding it to other peers.
• Fluid Model with Backpressure: Data is kept intact, and a continuous flow is used to distribute the data between peers. The data transfer rate of all peers is dragged down to the rate of the slowest peer.
• Fluid Model with Encoding: The encoded data is broken into several blocks. A receiving peer forwards any received bits only to downstream peers that can accommodate the downloading rate. Any (1 + ψ) · C encoded blocks that are received correctly are sufficient to reconstruct the original C blocks. That is, a typical encoding overhead ψ adds ψ · C redundant blocks to the blocks needed to reconstruct the original C blocks.
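The reception overhead of the Fluid model with encoding can be made concrete with a short calculation. The sketch below takes the 256 Kbyte block size, the 3% overhead and the two file sizes from Section 4.1, and assumes that 1 Mbyte is 2^20 bytes and that the number of blocks is rounded up. The function names are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
import math
from fractions import Fraction

BLOCK_BYTES = 256 * 1024          # block size from Section 4.1
PSI = Fraction(3, 100)            # fixed 3% encoding overhead from Section 4.1


def original_blocks(file_bytes, block_bytes=BLOCK_BYTES):
    """Number C of original blocks the file is split into."""
    return math.ceil(Fraction(file_bytes, block_bytes))


def blocks_to_receive(c, psi=PSI):
    """Encoded blocks a peer must receive: (1 + psi) * C, rounded up."""
    return math.ceil((1 + psi) * c)


for size_mbytes in (50, 300):
    file_bytes = size_mbytes * 1024 * 1024   # assumed: 1 Mbyte = 2**20 bytes
    c = original_blocks(file_bytes)
    print(size_mbytes, c, blocks_to_receive(c))
```

Under these assumptions the 50 Mbyte file consists of 200 blocks and a peer needs 206 encoded blocks, while the 300 Mbyte file consists of 1200 blocks and requires 1236 encoded blocks.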
The performance of each content distribution model has been examined by real implementation and through series of carefully designed experiments over PlanetLab [19]. In this Section, the experimental setup over the PlanetLab environment is described, and a set of representative results is presented. 4.1 Experimental Setup In order to examine the performance of enhanced Fluid content distribution model with backpressure and encoding in a heterogeneous overlay network, and to compare the efficiency of this model with Chunk content distribution model, all three models were deployed on the PlanetLab wide-area test-bed using 80 nodes located in 28 sites that are geographically distributed around the world. The sites are selected in a way that they represent a heterogeneous environment in terms of bandwidth assigned to each node. Some of the sites offer high throughput similar to typical corporate and university networks, whereas others provide much lower throughput similar to typical home and dial-up users. In other words, the performance of all three content distribution models has been examined under different network conditions and bandwidth. Moreover, all experiments were performed on an uncompleted/asymmetric tree topology where each peer in the overlay is directly connected to two or three other peers (out-degree = {2, 3}), except leaf peers. This uncompleted/asymmetric tree topology almost reflects actual Internet topologies. It allows maximum flexibility to create a heterogeneous environment with different bandwidth and different rates between any two peers. In this tree topology the branches become thicker near to the root. The idea of emulating Fat-Trees in overlays [20] is partially followed to overcome the “root bottleneck” in regular trees. The connections among peers were chosen randomly, but we have tried to keep fast peers at the root of the tree. 
Otherwise, slow peers located at the root of the distribution tree would drag down the distribution speed of the whole tree. Moreover, we have tried to keep the number of tree levels high enough to give clear results, especially for the Chunk model, where the chunk size is 256 Kbytes. The Fluid model is likewise configured to use a 256 Kbytes block size, equal to the chunk size in the Chunk model. However, instead of performing file
Performance Study of Fluid Content Distribution Model
encoding and decoding when testing the Fluid model with encoding, downloads were marked as successful once the required number of distinct file blocks had been received. In addition, a fixed 3% overhead was included to emulate the overhead that an actual encoding scheme would incur according to [17, 21]. Two files of 50 Mbytes and 300 Mbytes were used in these experiments, and the total download time was measured. The overall performance results are presented in the following part of this Section.
4.2 Overall Performance
In order to examine the performance of the aforementioned content distribution models, identical experiments were conducted over PlanetLab for three rounds, and the results were averaged. These experiments quantify the speed of the Fluid content distribution model with backpressure and with encoding in a heterogeneous overlay network, and compare the performance of these models with the Chunk model in a predictable environment. According to the experiments over PlanetLab, some peers may slow down significantly due to congestion. This is most noticeable with the Fluid model with encoding, since all the peers share the same queue. Consequently, the slowest peer blocks other peers and causes congestion. Representative results of the experiments with a relatively small file (50 Mbytes) and a large file (300 Mbytes) are presented in Fig. 1(a) and Fig. 1(b), respectively. The results in Fig. 1(a) show that, at the beginning of downloading a small file, peers in the Chunk model and the Fluid model with encoding have almost the same download time, which is far less than the peer download time in the Fluid model with backpressure. However, since the Fluid model with encoding adds some extra overhead, the download time of the first peer is slightly higher than in the Chunk model. Moreover, for the range of 30-40 peers, peer download times are almost constant in all models.
This usually happens when all the peers are located in the same physical network, where the packet transition delay is very small. According to the results in Fig. 1(a), for small content distribution Fluid model with encoding outperforms the Chunk model. In fact, peers in Chunk model suffer from the large delay in chunk transition from one peer to another (overlay hop-count). Especially the high tree level in this topology results in high overlay hop-count. As it is shown in Fig. 1(b) for Chunk model and Fluid model with backpressure, the results of experiment with large-size file (300Mbytes) are almost similar to the results of experiment with small-size file (50Mbytes). In other words, content size does not have noticeable impact on content distribution behavior in Chunk model and Fluid model with backpressure. In fact, the delay encountered in Chunk model is due to chunk transition from one level to another which remains the same whether we are using small or large file. Also, in Fluid model with backpressure, the single rate scheme causes the delay which is not related to content size either. On the other hand, the download time of Fluid model with encoding increases for 48 peers and above. The increase of download time in Fluid model with encoding is due to the extra encoding overhead added to the file. In fact, the encoding overhead amplifies with larger file size and larger number of peers. Thus, Chunk content distribution model is more scalable than Fluid model with encoding for large content distribution to large number of peers.
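The overlay hop-count argument can be made concrete with an idealised back-of-the-envelope model. The sketch below is not the authors' measurement code: it assumes one uniform link rate, ignores propagation delay and encoding overhead, and all function names and sample numbers are hypothetical. Under the Chunk model a peer must hold a full chunk before forwarding it, so each extra overlay hop adds one chunk transmission time; under the Fluid model with backpressure, every peer is limited by the slowest peer.

```python
def chunk_model_time(file_kb, chunk_kb, rate_kb_per_s, depth):
    """Pipelined store-and-forward: the last chunk reaches a peer `depth`
    hops from the source after (chunks + depth - 1) chunk transmissions."""
    chunks = -(-file_kb // chunk_kb)  # ceiling division
    return (chunks + depth - 1) * chunk_kb / rate_kb_per_s

def backpressure_time(file_kb, peer_rates_kb_per_s):
    """Single-rate fluid flow: every peer downloads at the slowest peer's rate."""
    return file_kb / min(peer_rates_kb_per_s)

file_kb = 50 * 1024  # assumed: 50 Mbytes expressed in Kbytes
print(chunk_model_time(file_kb, 256, 512, depth=1))  # peer next to the source
print(chunk_model_time(file_kb, 256, 512, depth=6))  # peer six overlay hops away
print(backpressure_time(file_kb, [512, 512, 128]))   # one slow peer in the tree
```

In this toy model the per-hop penalty of the Chunk model is a fixed number of chunk times regardless of file size, whereas a single slow peer scales the whole backpressure download.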
S.N. Saleh et al.
Fig. 1. Performance test results: (a) File size = 50Mbytes, (b) File size = 300Mbytes
5 Performance Study of Fluid Model Content distribution is considered as a component in all overlay networks. Comparing to traditional communication mechanism overlay networks offer an enhanced alternative for content delivery in terms of flexibility, scalability, and ease of deployment. There are two general models for content distribution in overlay networks: the Chunk model and the Fluid model. In Chunk content distribution model all connections between peers are completely loosely coupled, whereas Fluid model incorporates tightly coupled connections between adjacent peers. The tightly coupled connections in Fluid model significantly degrade the performance of all peers in a heterogeneous peer-to-peer overlay network where participating peers have different download time and bandwidth. In order to overcome the Fluid model limitations several approaches have been proposed. Different strategies such as scheduling, backpressure, and encoding have been employed to enhance the performance of Fluid content distribution model. Fluid content distribution model with backpressure and encoding directly affect the behavior of Fluid content distribution model and enable reliable content distribution in heterogeneous environments; whereas the enhancement based on scheduling does not really change the content distribution behavior itself. In this paper the performance of enhanced Fluid content distribution model with backpressure and encoding in a heterogeneous overlay network were examined by real implementation over PlanetLab. Moreover, the efficiency of enhanced Fluid content distribution model with backpressure and encoding were compared with Chunk content distribution model. In overall, the test results show that the single-rate scheme in Fluid model with backpressure is not suitable for heterogeneous peer-topeer overlay networks. On the other hand, the encoding improves the performance of Fluid content distribution model in heterogeneous peer-to-peer overlay network. 
It considerably reduces total download time of content and offers higher speed for data dissemination in heterogeneous peer-to-peer overlay networks. However, for large content distribution to large number of peers, Chunk content distribution model is still more scalable. This performance study has encouraged designing a new content
distribution model for fast data dissemination in heterogeneous peer-to-peer overlay networks by combining both Fluid and Chunk content distribution models. In fact, it is desired to design a viable content distribution model that can eliminate both backpressure caused by Fluid content distribution model, as well as chunk transition delay caused by Chunk content distribution model. Acknowledgments. The authors graciously acknowledge the support from the Universiti Sains Malaysia (USM) through the USM Fellowship awarded to Maryam Feily.
References 1. Tarkoma, S.: Overlay Networks: Toward Information Networking. CRC Press, New York (2010) 2. Ren, D., Li, Y.T.H., Chan, S.H.G.: On reducing mesh delay for peer to-peer live streaming. In: The 27th Conference on Computer Communications (INFOCOM 2008), pp. 1058–1066. IEEE Press, Los Alamitos (2008), doi:10.1109/INFOCOM.2008.160 3. Ren, D., Li, Y.T.H., Chan, S.H.G.: Fast-Mesh: A Low-Delay High-Bandwidth Mesh for Peer-to-Peer Live Streaming. J. IEEE Transaction on Multimedia 11(8), 1446–1456 (2009), doi:10.1109/TMM.2009.2032677 4. Magharei, N., Rejaie, R.: PRIME: Peer-to-Peer Receiver-Driven Mesh-Based Streaming. J. IEEE/ACM Transaction on Networking 17(4), 1052–1065 (2009), doi:10.1109/ TNET.2008.2007434 5. Kumar, R., Ross, K.W.: Optimal peer-assisted file distribution: Single and multi-class problems. Citeseer (2006), doi:10.1.1.63.5807 6. Kumar, R., Ross, K.W.: Peer assisted file distribution: The minimum distribution time. Citeseer (2006), doi:10.1.1.82.426 7. Ma, L., Wang, X., King-Shan, L.: A novel peer grouping scheme for P2P file distribution networks. In: IEEE International Conference on Communications (ICC20 2008), pp. 5598–5602. IEEE Press, Los Alamitos (2008), doi:10.1109/ICC.2008.1049 8. Ma, L., King-Shan, L.: Scheduling in P2P file distribution - On reducing the average distribution time. In: The 5th IEEE Consumer Communications and Networking Conference (CCNC 2008), pp. 521–522. IEEE Press, Los Alamitos (2008), doi:10.1109/ ccnc08.2007.121 9. Sherlia, Y.S., Jonathan, S.T., Marcel, W.: Dimensioning server access bandwidth and multicast routing in overlay networks. In: The 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), pp. 83–91. ACM Press, New York (2001), doi:10.1145/378344.378357 10. Malouch, N., Liu, Z., Rubenstein, D., Sahu, S.: A graph theoretic approach to bounding delay in proxy-assisted, end-system multicast. Citeseer (2002), doi:10.1.1.16.9072 11. 
Könemann, J.: Approximation algorithms for minimum-cost low-degree subgraphs. Doctoral Thesis. Graduate School of Industrial Administration, Carnegie Mellon University (2003) 12. Tellium, D.: ALMI: An application level multicast infrastructure. Citeseer (2001), doi:10.1.1.25.4223 13. Urvoy-Keller, G., Biersack, E.: A congestion control model for multicast overlay networks and its performance. In: The 4th International Workshop on Networked Group Communication (NGC 2002). Citeseer (2002), doi:10.1.1.19.8008
14. Templemore-Finlayson, J., Budkowski, S.: REALM: A Reliable Application Layer Multicast protocol. Citeseer (2003), doi:10.1.1.86.7643 15. Chaintreau, A., Baccelli, F.c., Diot, C.: Impact of network delay variations on multicast sessions with TCP-like congestion control. In: The 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2001), pp. 1133–1142. IEEE Press, Los Alamitos (2001), doi:10.1109/INFCOM.2001.916307 16. Dejan, K., Adolfo, R., Jeannie, A., Amin, V.: Bullet: high bandwidth data dissemination using an overlay mesh. In: The 19th ACM Symposium on Operating Systems Principles (SOSP 2003), pp. 282–297. ACM Press, New York (2003), doi:10.1145/945445.945473 17. Kwon, G.I., Byers, J.W.: ROMA: Reliable overlay multicast with loosely coupled TCP connections. In: The 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2004), pp. 385–395. IEEE Press, Los Alamitos (2004), doi:10.1109/INFCOM.2004.1354511 18. Ying, Z., Baochun, L., Jiang, G.: Multicast with network coding in application-layer overlay networks. IEEE Journal on Selected Areas in Communications 22(1), 107–120 (2004), doi:10.1109/JSAC.2003.818801 19. PlanetLab, http://www.planet-lab.org 20. Birrer, S., Lu, D., Bustamante, F.E., Qiao, Y., Dinda, P.: FatNemo: Building a resilient multi-source multicast fat-tree. In: Chi, C.-H., van Steen, M., Wills, C. (eds.) WCW 2004. LNCS, vol. 3293, pp. 182–196. Springer, Heidelberg (2004), doi:10.1007/b101692 21. Chou, P.A., Wu, Y., Jain, K.: Network coding for the internet. Communication Theory. Microsoft Research website (2004), http://research.microsoft.com/pubs/78173/ChouWJ04.pdf
The Hybrid Cubes Encryption Algorithm (HiSea) Sapiee Jamel1, Mustafa Mat Deris1, Iwan Tri Riyadi Yanto1, and Tutut Herawan2 1
Faculty of Computer Science and Information Technology Universiti Tun Hussein Onn Malaysia Parit Raja, Batu Pahat 86400, Johor, Malaysia {sapiee,mmustafa}@uthm.edu.my,
[email protected] 2 Faculty of Computer System and Software Engineering Universiti Malaysia Pahang, Lebuh Raya Tun Razak, Gambang 26300, Pahang, Malaysia
[email protected]
Abstract. Hybrid cubes are generated from combinations and permutations of integers, as in Latin squares and orthogonal Latin squares. In this paper we extend our earlier non-binary block cipher by using all possible combinations of hybrid cube layers as the source for the encryption and decryption keys. The overall security of the cipher is improved by the inclusion of a substitution box (SBox) and diffusion functions. The experimental results indicate that the proposed generator matrices from hybrid cube layers are suitable candidates for a key schedule algorithm. The large key space makes a brute-force attack on the key space difficult and time consuming.
Keywords: Non binary cipher; Block cipher; Hybrid cubes.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 191–200, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction
The rapid advancement of computer systems and telecommunications infrastructure has increased the need to ensure the secrecy of transmitted data using reliable and secure protection mechanisms. Ciphers became an important technique for protecting the secrecy of messages as early as the mid-19th century, with the advancement of telegraphy. Currently, ciphers offer a higher level of security for modern digital transmission because they give the user the flexibility to change the encryption keys regularly, in contrast to the use of a codebook [1]. Permutation of a finite set of numbers or symbols plays an important role in the development of block ciphers, as shown in [2], [3], [4], [5], [6] and [7]. In a simple transposition cipher [2], permutation is used to mix up symbols or numbers to create the ciphertext. This technique preserves the number of symbols of a given type within a block, which makes it easy for a cryptanalyst to analyze if the block size is small. The development of efficient computer hardware and software makes permutation-only algorithms more prone to attack by cryptanalysts. Modern binary cryptographic algorithms such as Rijndael, Twofish and SAFER+ [9] use a combination of substitution and transposition to enhance the complexity of the ciphertext. Substitution creates confusion in the ciphertext, so that the encryption or decryption keys cannot be recovered simply by applying permutations of symbols. Substitution Boxes
S. Jamel et al.
or SBoxes are used as the confusion element in Rijndael [3] and Twofish [4]. Diffusion functions for binary ciphers such as MDS and PHT are used in Rijndael and Twofish, respectively. A non-binary block cipher, on the other hand, is a block cipher that uses integers as a block. In a non-binary cipher, the message, ciphertext and keys exist in decimal form {0, …, 9}, as in TOY100 [6], DEAN18 [7] and A Large Block Cipher [8]. These ciphers provide an alternative to the existing binary block ciphers. TOY100 [6] is used to encrypt 32 decimal digits. Its diffusion function consists of two similar components called MixColumns and MixRows. MixColumns applies the Mix function to each column of the matrix, and MixRows applies it to each row. These two functions provide diffusion in the decimal ciphertext. DEAN18 is used to encrypt 18 decimal digits and has a fixed bijective substitution box (S-Box) that provides a non-linear relation between the message and the ciphertext [7]. The S-Box is applied to each 2-digit element of the ciphertext, where an element $(a, b) \in \mathbb{Z}_{10}^{2}$ is represented as the integer $10 \cdot a + b \in [0, 99]$.
In this paper we extend our earlier proposed algorithm [10] to include confusion and diffusion functions in the encryption algorithm, a key schedule algorithm based on hybrid cubes of order 4, and a decryption algorithm. The overall security of the cipher is improved by the inclusion of a substitution box (SBox) and diffusion functions. The experimental results indicate that the proposed generator matrices from hybrid cube layers are suitable candidates for a key schedule algorithm. Furthermore, the large key space makes a brute-force attack on the key space difficult and time consuming. The rest of the paper is organized as follows: Section 2 describes the construction of hybrid cubes from combinations and permutations of integers. Section 3 outlines the proposed Hybrid Cubes Encryption Algorithm (HiSea), which consists of a key schedule algorithm, an encryption algorithm and a decryption algorithm. Section 4 discusses the experimental results. Section 5 presents the conclusion and future work of this research.
2 Construction of Hybrid Cubes
A series of permutations and combinations of the set of integers {1, 2, 3, 4} is used as the foundation for constructing all 576 Latin squares of order 4. All these Latin squares are then used to construct 3,456 pairs of orthogonal Latin squares. The 880 magic squares of order 4 are available in [11]. Adopting Trenkler's formulation [12], we can then construct magic cubes whose layer entries fall within the set of integers {1, 2, …, 63, 64}. Using layers of magic cubes, we developed a new cube structure called the hybrid cube, whose layer entries lie within the set of integers {1, 2, …, 4095, 4096} [10]. This new combination of layer entries can be used to add complexity to the design of our encryption algorithm. Furthermore, all layers of a hybrid cube are invertible, so their inverses can be used as the decryption keys for our new design. The following sub-sections discuss each step in the construction of hybrid cubes.
2.1 Latin Squares of Order 4
A Latin square of order n is an n × n matrix in which each element occurs exactly once in each row and each column, as stated in Definition 1.
Definition 1 ([See 13]). A Latin square of order n, denoted as $R_n = [r(i, j) : 1 \le i, j \le n]$, is a two-dimensional (n × n) matrix such that every row and every column is a permutation of the set of natural numbers $\{1, \ldots, n\}$.
Thus, a Latin square is a square array in which each row and each column consists of the same set of entries without repetition. Based on Definition 1, Latin squares of order 4 are generated using a series of combinations and permutations of {1, 2, 3, 4}, as described in the following steps.
a. Generate all possible combinations of 4 elements from {1, 2, …, 24}, which are used as indices for selecting the possible sequences for generating Latin squares of order 4.
b. Generate the permutations of the set {1, 2, 3, 4}. These permutations are then used to build the entries for constructing Latin squares of order 4.
c. Generate the permutations of each entry in step a and use them as indices to select and sort the entries from step b by column. This step generates all candidate 4-by-4 matrices.
d. Select only those matrices for which the intersection of every row and every column with the standard set {1, 2, 3, 4} results in a matrix whose entries are all 4. This characteristic of a Latin square, which follows from Definition 1, is used to select Latin squares in our implementation.
e. This method generates 576 Latin squares of order 4, which can be used to generate orthogonal Latin squares.
Only Latin squares are selected in this process; all other matrices are excluded because the intersection of their rows and columns with the standard set does not give a matrix whose entries are all 4.
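The enumeration described in this sub-section can be sketched in a few lines. The code below is an illustrative reading of the steps, not the authors' implementation: it picks 4 of the 24 row permutations, tries every ordering of them, and keeps a candidate when each column is also a permutation of {1, 2, 3, 4}. The column test with `set` equality stands in for the "intersection" test in the text, and all function names are hypothetical.

```python
from itertools import combinations, permutations

SYMBOLS = (1, 2, 3, 4)

def is_latin(rows):
    """Rows are permutations by construction, so only columns need checking."""
    target = set(SYMBOLS)
    return all(set(column) == target for column in zip(*rows))

def latin_squares_order4():
    perms = list(permutations(SYMBOLS))            # the 24 candidate rows
    squares = []
    for chosen in combinations(range(len(perms)), 4):   # 4 indices out of 24
        for order in permutations(chosen):              # every row ordering
            rows = tuple(perms[i] for i in order)
            if is_latin(rows):
                squares.append(rows)
    return squares

squares = latin_squares_order4()
print(len(squares))  # number of Latin squares of order 4 found
```

Choosing distinct row indices loses nothing, because two equal rows can never form a Latin square.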
2.2 Orthogonal Latin Squares of Order 4
Definition 2 ([See 13]). Two Latin squares $R_n = [r(i, j)]$ and $S_n = [s(i, j)]$ are said to be orthogonal if, whenever $i, i', j, j' \in \{1, \ldots, n\}$ are such that $r(i, j) = r(i', j')$ and $s(i, j) = s(i', j')$, then $i = i'$ and $j = j'$.
Thus, two Latin squares $R_n = [r(i, j)]$ and $S_n = [s(i, j)]$ of order n are orthogonal if and only if the $n^2$ pairs $(r(i, j), s(i, j))$ are all different.
The following steps are used for generating orthogonal Latin squares as in Definition 2.
a. Compare Latin square L1 with Latin square L2.
b. Calculate the entries of the superimposed Latin square using the formula $c_{i,j} = a_{i,j} \times 10 + b_{i,j}$. This step creates a superimposed matrix in which every entry combines the corresponding entries of the two Latin squares.
c. Check whether the two squares are orthogonal: superimposing two orthogonal Latin squares gives the sixteen unique elements of the set {11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44}. A superimposed matrix whose elements match this set identifies an orthogonal pair of Latin squares of order 4.
d. Repeat these steps for all the Latin squares generated in sub-section 2.1.
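The steps above can be sketched as follows. This is a minimal illustration with hypothetical function names; the two squares used as input are the pair L1 and L2 from the worked example in this sub-section, and the uniqueness test is simply a count of distinct superimposed entries.

```python
def superimpose(a, b):
    """Step b: c[i][j] = a[i][j] * 10 + b[i][j]."""
    return [[10 * x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def are_orthogonal(a, b):
    """Step c: orthogonal iff all n*n superimposed entries are distinct."""
    n = len(a)
    entries = [value for row in superimpose(a, b) for value in row]
    return len(set(entries)) == n * n

L1 = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 4, 1, 2], [4, 3, 2, 1]]
L2 = [[1, 2, 3, 4], [3, 4, 1, 2], [4, 3, 2, 1], [2, 1, 4, 3]]

for row in superimpose(L1, L2):
    print(row)
print(are_orthogonal(L1, L2))
```

A square superimposed on itself only ever yields the entries 11, 22, 33 and 44, so `are_orthogonal(L1, L1)` is false.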
For example, a superimposed matrix (S) of the following Latin squares L1 and L2 is produced using the formula in step b.
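\[
L_1 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \end{bmatrix}, \quad
L_2 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \\ 2 & 1 & 4 & 3 \end{bmatrix}, \quad \text{and} \quad
S = \begin{bmatrix} 11 & 22 & 33 & 44 \\ 23 & 14 & 41 & 32 \\ 34 & 43 & 12 & 21 \\ 42 & 31 & 24 & 13 \end{bmatrix}
\]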
Each element of matrix S is then checked for uniqueness, because a pair of orthogonal Latin squares should produce the sixteen unique entries of the set {11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34, 41, 42, 43, 44}. In matrix S, every element of this set appears exactly once. This result indicates that L1 and L2 are orthogonal.
2.3 Magic Square of Order 4
A magic square is described in Definition 3.
Definition 3 (See [13]). A magic square of order n, denoted as $M_n = [m(i, j) : 1 \le i, j \le n]$, is a two-dimensional n × n matrix (square table) containing the natural numbers $1, \ldots, n^2$ in some order such that the sum of the numbers along every row, column and main diagonal is the fixed constant $\frac{n(n^2 + 1)}{2}$.
A complete list of Magic squares of order 4 is adopted from [11].
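Definition 3 translates directly into a checker. The sketch below is illustrative only; the function name is hypothetical, and the test square is the well-known Dürer square rather than an entry taken from the list in [11]. For n = 4 the magic constant is 4 · 17 / 2 = 34.

```python
def is_magic_square(square):
    """Check Definition 3: entries 1..n^2, all lines sum to n(n^2 + 1)/2."""
    n = len(square)
    constant = n * (n * n + 1) // 2
    entries = sorted(value for row in square for value in row)
    if entries != list(range(1, n * n + 1)):
        return False
    lines = [list(row) for row in square]                      # rows
    lines += [list(column) for column in zip(*square)]          # columns
    lines.append([square[i][i] for i in range(n)])              # main diagonal
    lines.append([square[i][n - 1 - i] for i in range(n)])      # anti-diagonal
    return all(sum(line) == constant for line in lines)

durer = [[16, 3, 2, 13], [5, 10, 11, 8], [9, 6, 7, 12], [4, 15, 14, 1]]
print(is_magic_square(durer))
```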
2.4 Magic Cubes
A magic cube of order n (n ≠ 2, 6) can be viewed as layers of (n × n) matrices whose elements are a permutation of the numbers in $\{1, \ldots, n^3\}$. In [14], a magic cube of order 4 is sub-divided into fifty-two columns:
a. sixteen horizontal columns from the front to the back,
b. sixteen vertical columns from the top of the cube to the bottom,
c. sixteen horizontal columns from right to left, and
d. four main diagonal columns uniting the four pairs of opposite corners.
Adopting an approach similar to [14], we can sub-divide a magic cube of order 4 into layers (groups of four columns), as in [15]. Using the axes (i, j, k) as the reference, we obtain twelve different layers. These 12 layers form twelve (4 × 4) matrices whose elements are in $\{1, \ldots, n^3\}$. The layers can be generalized into a triple of matrices (A, B, C), where A is based on the ith layers, B on the jth layers and C on the kth layers. With this method, eight matrices share column values, which occur at the diagonal intersection between axis j and axis k, as defined below.
Definition 4. Grouping a "magic cube" of order 4 into layers based on axes i, j and k results in equal column values when the coordinates satisfy $j = k$, with $i = 1, \ldots, 4$, as follows:
\[
\begin{bmatrix} b_{1,1,1} \\ b_{2,1,1} \\ b_{3,1,1} \\ b_{4,1,1} \end{bmatrix} =
\begin{bmatrix} c_{1,1,1} \\ c_{2,1,1} \\ c_{3,1,1} \\ c_{4,1,1} \end{bmatrix}, \quad
\begin{bmatrix} b_{1,2,2} \\ b_{2,2,2} \\ b_{3,2,2} \\ b_{4,2,2} \end{bmatrix} =
\begin{bmatrix} c_{1,2,2} \\ c_{2,2,2} \\ c_{3,2,2} \\ c_{4,2,2} \end{bmatrix}, \quad
\begin{bmatrix} b_{1,3,3} \\ b_{2,3,3} \\ b_{3,3,3} \\ b_{4,3,3} \end{bmatrix} =
\begin{bmatrix} c_{1,3,3} \\ c_{2,3,3} \\ c_{3,3,3} \\ c_{4,3,3} \end{bmatrix}, \quad \text{and} \quad
\begin{bmatrix} b_{1,4,4} \\ b_{2,4,4} \\ b_{3,4,4} \\ b_{4,4,4} \end{bmatrix} =
\begin{bmatrix} c_{1,4,4} \\ c_{2,4,4} \\ c_{3,4,4} \\ c_{4,4,4} \end{bmatrix}
\]
Because hybrid cubes are formed using inner matrix multiplication of elements, we omit the layers based on axis k and use only the eight layers from axis i and axis j to generate a unique hybrid cube.
2.5 Hybrid Cubes [See [10]]
Hybrid cube is formed using inner matrix multiplication of layers between two “magic cubes”. For example, hybrid 1 is based on inner matrix multiplication of layer in the same coordinate (i = 1,2,3,4) of “magic cube” 1 and layer (i = 1,2,3,4) of “magic cube” 2. Hybrid cube 2 is based on matrices multiplication of cubes 2 and 3, and so on.
3 Hybrid Cubes Encryption Algorithm (HiSea) The implementation of Hybrid Cubes Encryption Algorithm consists of three algorithms: key schedule, encryption and decryption algorithms. The detail design of each algorithm is described in the following sub-section.
3.1 Key Schedule Algorithm
Hybrid cubes generated as in Section 2.5 are used to construct the Key Table depicted in Figure 1.

Rows \ Columns       1–4     5–8     9–12    13–16
1–4                  H1L1    H1L2    H1L3    H1L4
5–8                  H2L1    H2L2    H2L3    H2L4
9–12                 H3L1    H3L2    H3L3    H3L4
13–16                H4L1    H4L2    H4L3    H4L4
…                    …       …       …       …
((n*4)-3)–(n*4)      HnL1    HnL2    HnL3    HnL4

Fig. 1. Structure of Key Table
Based on Figure 1, hybrid cube layers are ordered by rows and columns. For example, the layers of hybrid cube 1 are placed in rows 1 to 4: layer 1 in columns 1 to 4, layer 2 in columns 5 to 8, layer 3 in columns 9 to 12 and layer 4 in columns 13 to 16. This process is repeated for the remaining hybrid cubes, 30,000 in total. The master key for the encryption is selected from the Key Table based on the value of the password (a secret integer) entered by the user. The steps for obtaining the master key are described below.
a. Calculate the first value from the seed using (modulo 30000) + 1, and calculate an index for accessing the Key Table from the second value using (modulo 4) + 1. In our implementation, 30,000 hybrid cubes are used to construct the Key Table.
b. The first value locates the row number in the Key Table.
c. The second value gives the column number in the Key Table.
d. Extract the master key from the Key Table using the row and column numbers.
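The Key Table lookup can be sketched as index arithmetic. The code below is a hedged illustration, not the authors' implementation: the text does not say how the first and second values are derived from the password, so they are left as inputs, and the example simply assumes the password is used for both. It also assumes that the first value selects a hybrid cube (a block of four rows) and the second a layer (a block of four columns), following Figure 1. All names are hypothetical.

```python
NUM_CUBES = 30000   # hybrid cubes in the Key Table, as stated in the text
NUM_LAYERS = 4      # layers stored per hybrid cube

def key_table_block(cube: int, layer: int):
    """1-based (first_row, last_row, first_col, last_col) of entry HcLl."""
    return (4 * cube - 3, 4 * cube, 4 * layer - 3, 4 * layer)

def select_master_key_block(first_value: int, second_value: int):
    cube = first_value % NUM_CUBES + 1      # (modulo 30000) + 1
    layer = second_value % NUM_LAYERS + 1   # (modulo 4) + 1
    return cube, layer, key_table_block(cube, layer)

# Assumption for illustration only: the password feeds both values.
print(select_master_key_block(298765, 298765))
```

Adding 1 after the modulo keeps both indices in the 1-based ranges 1..30000 and 1..4 used by Figure 1.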
Four sub-keys for the encryption algorithm are generated using the following simple steps:
a. Sub-Key1 = apply permutation 1 to the master key.
b. Sub-Key2 = apply permutation 2 to the master key.
c. Sub-Key3 = apply permutation 3 to the master key.
d. Sub-Key4 = apply permutation 4 to the master key.
Permutation 1 to 4 are based on rows and column permutation as applied in Arkin et al. [16] for defining orthogonality of Latin cubes based on relation among three cubes or in general among k Latin k-cubes. These sub-keys are used to encrypt message 1 (M1) to message 4 (M4) in the encryption algorithm. 3.2 Encryption Algorithm
The overall design for the encryption algorithm is depicted in Figure 2. In the encryption process, messages, keys and ciphertext are formatted in (4 × 4 ) matrix . In
our design, the intermediate result (message 1’) for message 1 is used in the process of encrypting message 2. The intermediate result (message 2’) for message 2 is used in the process of encrypting message 3. This process is repeated for message 4. The primary reason for incorporating this technique is to ensure that any change to message 1 is reflected in the other ciphertexts, thus adding complexity to the overall ciphertext. Ciphertext diffusion is performed using MixRow and MixCol from TOY100 [6], while confusion comes from the SBox of DEAN18 [7]. The ADDITION and MULTIPLICATION of matrices are the same as in [10].
Fig. 2. Encryption Algorithm
3.3 Decryption Algorithm
In decryption algorithm, Inverse SBox, Inverse MixRow and Inverse MixCol, Inverse K1 to K4 are used to get the original message from Ciphertext 1 (C1) to ciphertext 4 (C4).
4 Experimental Result and Discussion For our approach, we demonstrate the experimental result of HiSea by encrypting the following message:
Cryptography is fun and interesting. Lets learn cryptography.
The integer 298765 is used as the password. The message is formatted into four (4 × 4) matrices $M_i$, $i \in \{1, \ldots, 4\}$ (the character ‘X’, ASCII code 88, is padded to the message if the message is shorter than 64 characters).
\[
M_1 = \begin{bmatrix} 67 & 114 & 121 & 112 \\ 116 & 111 & 103 & 114 \\ 97 & 112 & 104 & 121 \\ 32 & 105 & 115 & 32 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 102 & 117 & 110 & 32 \\ 97 & 110 & 100 & 32 \\ 105 & 110 & 116 & 101 \\ 114 & 101 & 115 & 116 \end{bmatrix},
\]
\[
M_3 = \begin{bmatrix} 105 & 110 & 103 & 46 \\ 32 & 76 & 101 & 116 \\ 115 & 32 & 108 & 101 \\ 97 & 114 & 110 & 32 \end{bmatrix}, \quad \text{and} \quad
M_4 = \begin{bmatrix} 99 & 114 & 121 & 112 \\ 116 & 111 & 103 & 114 \\ 112 & 104 & 121 & 46 \\ 88 & 88 & 88 & 88 \end{bmatrix}.
\]
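The message formatting step can be sketched as below. This is an assumption-laden illustration rather than the authors' code: it pads with ‘X’ up to 64 characters, converts each character to its ASCII code, and fills each 4 × 4 matrix row by row, which is the layout that reproduces the first three blocks printed above. The function name is hypothetical, and messages longer than 64 characters are outside the scope of the sketch.

```python
def format_message(message: str, pad_char: str = "X"):
    """Pad to 64 characters and split into four 4x4 matrices of ASCII codes."""
    if len(message) > 64:
        raise ValueError("this sketch handles a single 64-character block only")
    codes = [ord(ch) for ch in message.ljust(64, pad_char)]
    return [
        [codes[start + 4 * r: start + 4 * r + 4] for r in range(4)]  # 4 rows
        for start in range(0, 64, 16)                                # 4 blocks
    ]

message = "Cryptography is fun and interesting. Lets learn cryptography."
m1, m2, m3, m4 = format_message(message)
for row in m1:
    print(row)
```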
The following ciphertexts $C_i$, $i \in \{1, \ldots, 4\}$ are obtained from the encryption algorithm.
\[
C_1 = \begin{bmatrix} 81439023 & 85984594 & 88156959 & 12617288 \\ 52098695 & 88866648 & 05263548 & 81905366 \\ 88240646 & 41325575 & 85005304 & 52734113 \\ 52569488 & 85414806 & 12980045 & 88137868 \end{bmatrix},
\]
\[
C_2 = \begin{bmatrix} 85934669 & 81260116 & 04033484 & 22789283 \\ 00638453 & 52224352 & 22958824 & 94826864 \\ 84961294 & 04687033 & 05034909 & 81076955 \\ 04210709 & 84497475 & 85439974 & 00253336 \end{bmatrix},
\]
\[
C_3 = \begin{bmatrix} 84006016 & 92384327 & 42766800 & 22920346 \\ 22405007 & 84653420 & 53794852 & 66368942 \\ 66582567 & 21831162 & 21547974 & 84294874 \\ 22418214 & 42768011 & 21136309 & 22292707 \end{bmatrix},
\]
and
\[
C_4 = \begin{bmatrix} 04951346 & 11011710 & 10302479 & 70737692 \\ 42320654 & 11736949 & 09433854 & 10237061 \\ 10341061 & 70196185 & 53197230 & 11964287 \\ 09313848 & 10236734 & 11993172 & 53826153 \end{bmatrix}.
\]
The ciphertexts C1 to C4 and the key schedule algorithm are then evaluated with respect to brute force attack and entropy, as described in the following sub-sections.
4.1 Brute Force Attack
The encryption key used in this algorithm is represented as a 4 × 4 matrix of integers, where each entry takes one of $2^{12}$ values. The key space for the encryption and decryption keys is
\[
2^{12} \times 2^{12} \times \cdots \times 2^{12} = \left(2^{12}\right)^{16} = 2^{192},
\]
or approximately $6.3 \times 10^{57}$ keys.
This large key space will make brute force attack on the key space difficult and time consuming. 4.2 Entropy
We can evaluate the randomness of the session keys and the ciphertext using entropy [17]. Table 1 shows the entropy of the Initial Matrix (IM), the session keys and the ciphertexts.

Table 1. Entropy for session keys and ciphertext

Items                   Entropy (H)
Initial Matrix (IM)     0.8199
Session keys            0.8632
Ciphertext 1 (C1)       0.9477
Ciphertext 2 (C2)       0.8382
Ciphertext 3 (C3)       0.9447
Ciphertext 4 (C4)       0.8658
From Table 1, the entropy of the session keys is 0.8632. This result indicates that session keys generated using hybrid cubes are 86.32% random. The Initial Matrix (IM), which is used to mix message 1 in the encryption process, has an entropy of 0.8199, or 81.99% random. These key components create ciphertexts that are each more than 83% random. Our experimental results indicate that a ciphertext block, which consists of sixteen decimal numbers, has an average entropy of 0.8991, or 89.91% random, which hides the relationship between the message, the key and the ciphertext.
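The averaging of the ciphertext entropies in Table 1 can be checked with a short worked computation. The second function is only a plausible sketch of a normalized entropy measure: the paper does not spell out how H is computed, so scaling the Shannon entropy of the decimal digits by log2(10) is an assumption, as are the function names.

```python
import math
from collections import Counter

def normalized_digit_entropy(digits: str) -> float:
    """Shannon entropy of a digit string, scaled to [0, 1] by log2(10).
    Assumed measure; the paper's exact procedure follows [17]."""
    counts = Counter(digits)
    total = len(digits)
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / math.log2(10)

# Entropy values of C1..C4 as listed in Table 1.
table_1_ciphertexts = [0.9477, 0.8382, 0.9447, 0.8658]
average = sum(table_1_ciphertexts) / len(table_1_ciphertexts)
print(round(average, 4))

print(normalized_digit_entropy("0123456789"))  # every digit equally frequent
print(normalized_digit_entropy("0000"))        # a single repeated digit
```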
5 Conclusion
In this paper, we have extended our earlier algorithm [10] using 3,456 pairs of orthogonal Latin squares of order 4 and 880 magic squares to generate magic cubes. These magic cubes are then used to generate hybrid cubes, which are used in the key schedule algorithm. The entropy of the encryption keys is more than 85%, which can be used to strengthen our encryption algorithm. The inclusion of a confusion function and diffusion functions increases the randomness of the ciphertext to an average of 89.91%. In the future, the results of this research can be analyzed further against linear cryptanalysis, differential cryptanalysis and other cryptanalysis techniques suitable for decimal ciphertext.
Acknowledgements The authors would like to thank the Ministry of Higher Education Malaysia (MyBrain15) and Universiti Tun Hussein Onn Malaysia for supporting this research.
References
1. Wrixon, F.B.: Codes, Ciphers, Secrets and Cryptic Communications. Black Dog and Leventhal Publisher Inc. (1998)
2. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1996)
3. Daemen, J., Rijmen, V.: The Design of Rijndael: AES – The Advanced Encryption Standard. Springer, Heidelberg (2002)
4. Schneier, B., Kelsey, J., Whiting, D., Wagner, D., Hall, C., Ferguson, N.: The Twofish Encryption Algorithm. John Wiley and Sons, New York (1999)
5. Massey, J.L.: On the Optimality of SAFER+ Diffusion, Cylink Corporation, Sunnyvale, California, USA (1999)
6. Granboulan, L., Levieil, É., Piret, G.: Pseudorandom permutation families over abelian groups. In: Robshaw, M.J.B. (ed.) FSE 2006. LNCS, vol. 4047, pp. 57–77. Springer, Heidelberg (2006)
7. Baignères, T., Stern, J., Vaudenay, S.: Linear Cryptanalysis of Non Binary Ciphers with an Application to SAFER. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 184–211. Springer, Heidelberg (2007)
8. Sastry, V.U.K., Murthy, D.S.R., Bhavani, S.D.: A Large Block Cipher Involving a Key Applied on Both the Sides of the Plain Text. International Journal of Computer and Network Security (IJCNS) 2(2) (2010)
9. National Institute of Standards (NIST). FIPS Pub. 197: Advanced Encryption Standard, AES (2001), http://csrc.nist.gov/
10. Jamel, S., Herawan, T., Mat Deris, M.: A Cryptographic Algorithm Based on Hybrid Cubes. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds.) ICCSA 2010. LNCS, vol. 6019, pp. 175–187. Springer, Heidelberg (2010)
11. Suzuki, M.: Magic Squares (2001), http://mathforum.org/te/exchange/hosted/suzuki/MagicSquare.html (retrieved September 9, 2010)
12. Trenkler, M.: A Construction of Magic Cubes. The Mathematical Gazette, 36–41 (2000)
13. Trenkler, M.: Magic Cubes. The Mathematical Gazette 82, 56–61 (1998)
14. Andrews, W.S.: Magic Squares and Cubes. Cosimo Inc., New York (2004)
15. Trenkler, M.: An Algorithm for making Magic Cubes. The ΠME Journal 12(2), 105–106 (2005)
16. Arkin, J., Straus, E.G.: Latin k-Cubes. The Official Journal of The Fibonacci Association 12(3) (1974)
17. Newton, P.K., Desalvo, S.A.: The Shannon Entropy of Sudoku matrices. In: Proceedings of the Royal Society A. First Cite e-publishing (2010)
Reducing Handover Latency in Mobile IPv6-Based WLAN by Parallel Signal Execution at Layer 2 and Layer 3 Muhammad Arif Amin, Kamalrulnizam Bin Abu Bakar, Abdul Hanan Abdullah, and Rashid Hafeez Khokhar Department of Computer Science and Information Systems Universiti Teknologi Malaysia
[email protected], {knizam,hanan}@utm.my,
[email protected]
Abstract. The emergence of wireless networking demands continuous connectivity to support end-to-end TCP or UDP sessions. Wireless networking does not provide reliable connections to mobile users for real-time traffic such as voice-over IP, audio streaming and video streaming. Handover latency in Mobile IPv6 is one factor that disconnects users while roaming. Most of the proposed methods to reduce handover latency include either layer 2 or layer 3 design considerations. This paper discusses the Mobile IPv6 handover process and proposes an efficient handover scheme for reducing the overall handover latency in Mobile IPv6. The proposed scheme consists of parallel signal execution at the access router while the mobile node is performing layer 2 handover. Simulation results show that the proposed scheme achieves lower handover delays than standard MIPv6.
Keywords: Handover, WLAN, MIPv4, MIPv6, FMIPv6 and HMIPv6.
1 Introduction
Over the last decade, Internet technology has started offering a variety of services to users with wireless devices. Traditionally, people used these services through cables plugged into a wall jack, and they were considered to be stationary [1]. This led researchers to invent wireless technology that allows users to move from one place to another; such users are called mobile users. Wireless technology allows users to move within set boundaries of the network, and with the advancement in technology and user requirements, organizations have started to deploy wireless technology across campus and enterprise networks. However, this facility does not provide uninterrupted real-time communication such as voice-over IP (VoIP), audio streaming and video streaming when users move between subnets, between logical networks called virtual Local Area Networks (VLANs), or between Internet Service Providers (ISPs). Uninterrupted service also means that an ongoing session should not end when the user moves from one point of attachment to another.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 201–211, 2011. © Springer-Verlag Berlin Heidelberg 2011
Internet Protocol (IP) [2] was originally designed for fixed networks. IP addresses were associated with fixed network computers and were required to be unchanged for the session in progress; however, if the user moved to a different network, the computer was rebooted to gain network connectivity by obtaining a new IP address. Therefore, to satisfy the requirements of mobile users, Mobile IP [3] was proposed by the Internet Engineering Task Force (IETF). Mobile IP provides a solution to mobility for fixed nodes but is not suitable for mobile users with wireless computers. Mobile IPv4 [4] was proposed to overcome the mobility issues in IPv4 networks; however, due to the address space restriction and non-hierarchical nature of IPv4, it did not succeed in many deployments. Mobile IPv6 (MIPv6) [5] has been proposed by IETF to support mobility for nodes in IPv6 networks. Though MIPv6 proposes many solutions to provide mobility, it also poses many challenges to the network world, mainly delay, packet loss and signaling overhead caused during the movement of the node. These factors result in the disturbance or disconnection of real-time applications such as VoIP. When a mobile node (MN) changes its point of attachment, the node moves from one network to another; this process is called a handoff or handover. Handoff latency is one factor that makes most mobility protocols lose sessions during movement. Many TCP [6] applications can request retransmission in case of loss of packets, but real-time applications cannot do so because they rely mostly on UDP [7] transmission. Real-time applications are time sensitive and rely on timely controlled packets that cannot be recovered once lost during the handoff process. 
Researchers have proposed many methods to solve handoff latency problems, including reducing movement detection time [8] and packet loss [9], creating a new IPv6 address, exchanging fast signals between old networks and performing fast signaling [10] in the new network. Two extensions to the original MIPv6 protocol, Fast Handover in MIPv6 (FMIPv6) [11] and Hierarchical Mobile IPv6 (HMIPv6) [12], have also been standardized to reduce the handoff latency. In this paper, we propose an efficient handover process to overcome latencies in Mobile IPv6-based Wireless Local Area Networks (WLANs) and present simulation results. The proposed scheme aims to reduce the handover latency, signaling overhead and packet loss by implementing the process at the access router rather than the mobile node. The simulation results show that the proposed scheme substantially reduces the handover delay. The rest of the paper is organized as follows: Section 2 describes the Mobile IPv6 handover process and its components. The proposed handover scheme is presented in Section 3. Section 4 describes the performance evaluation, and Section 5 concludes the paper and suggests future work.
2 Mobile IPv6 Handover Process
The Mobile IPv6 handoff process provides a mechanism for users to roam; however, in order to roam freely, a user must not disconnect from the network. To achieve that, network entities should communicate with each other and be able to transfer information while the mobile node (MN) is moving from one network to another. An MN is allocated an IPv6 address called the home address
Fig. 1. MIPv6 Handover Message Flow
(HoA) from the home network. The MN is always addressable at this address by the nodes communicating with it, called correspondent nodes (CNs). An access router in the home network, called the home agent (HA), maintains communication between the CN and the mobile node; however, when the MN moves to a foreign network, it must form another IPv6 address called a care-of address (CoA). Packets can still be routed to the MN by a mechanism in which network entities such as access routers communicate with each other and forward traffic [13]. To perform the handover process, the access router must be configured with additional features that allow packets to move continuously to the foreign network while keeping the MN in contact. A complete MIPv6 handover consists of a layer 2 and a layer 3 handover process [13]. However, the layer 3 process cannot start until the MN has finished the layer 2 process. The layer 2 process includes scanning channels, sending probes, and exchanging authentication and association signals with a wireless access point. The layer 3 process includes discovering new routers, movement detection, address configuration and then IP address registration. The MIPv6 handover message flow diagram is shown in Figure 1 and the handover time line in Figure 2. The MIPv6 handover process mainly consists of the following components:
– Movement detection time (Tmvd): the time the MN takes to detect, through IPv6 router advertisements and neighbor discovery, that it has moved to a new network.
– Duplicate address detection time (TDAD): the time taken by the MN to form a CoA and perform duplicate address detection to confirm the uniqueness of the IPv6 address.
Fig. 2. Handover Time line of Mobile IPv6
– Binding update time (TBU): the time taken by the MN to get acknowledgment from the CN and HA after sending binding update (BU) signals.
The mathematical representation of the L2 and L3 handovers is shown below.
$$T_{hand} = T_{L2} + T_{L3}$$
$$T_{L2} = T_{scan} + T_{auth} + T_{assoc}$$
$$T_{L3} = T_{mvd} + T_{DAD} + T_{BU}$$
$$\mathit{Total}_{hand} = T_{scan} + T_{auth} + T_{assoc} + T_{mvd} + T_{DAD} + T_{BU}$$
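As a minimal sketch of this decomposition, the following Python snippet sums the six delay components into the layer 2, layer 3 and total handover latency. The function name and the sample component values are illustrative assumptions, not measurements from this paper.

```python
def total_handover_latency(t_scan, t_auth, t_assoc, t_mvd, t_dad, t_bu):
    """Return (T_L2, T_L3, Total_hand) in seconds for the standard MIPv6 handover."""
    t_l2 = t_scan + t_auth + t_assoc   # layer 2: scanning, authentication, association
    t_l3 = t_mvd + t_dad + t_bu        # layer 3: movement detection, DAD, binding update
    return t_l2, t_l3, t_l2 + t_l3


# Hypothetical component delays in seconds (placeholders, not results from the paper).
t_l2, t_l3, total = total_handover_latency(
    t_scan=0.5, t_auth=0.05, t_assoc=0.05, t_mvd=1.0, t_dad=1.0, t_bu=0.25
)
print(f"T_L2={t_l2:.2f}s  T_L3={t_l3:.2f}s  Total={total:.2f}s")
```

Because the layer 3 terms are simply added after the layer 2 terms, any component that can be removed from the sum or overlapped with another shortens the total directly.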
3 Proposed Handover Scheme
Much of the existing research reduces handover latency at either layer 2 or layer 3 independently. Handover latency at layer 3 has received the most attention because MIPv6 is considered a layer 3 protocol. In this paper, we propose a handover scheme in which the access router performs some of the layer 3 signaling in parallel while layer 2 signaling is executing, that is, the L2 and L3 processes cooperate, as shown in Figure 3. The proposed scheme offers an efficient method to reduce handover latency by parallel signal execution at the access router. We have modified the new access point (NAP) to send an additional message containing the MAC address of the MN to the new access router (NAR). In addition, the NAR is modified to form the CoA and perform optimistic duplicate address detection (ODAD) [14]. The flowchart of the proposed scheme is presented in Figure 4. The NAP is restricted from sending the association response signal to the MN until optimistic DAD is finished. Once the CoA address is
Fig. 3. The Proposed Handover Scheme Message Flow
Fig. 4. The Proposed Scheme Flowchart
sent to the MN and configured, the MN sends binding update signals to the correspondent nodes and the HA to inform them of the new address. This process eliminates the time the MN would otherwise need to perform CoA registration and DAD. However, because the access router performs CoA formation and ODAD during the L2 association signaling, the proposed scheme slightly increases the normal layer 2 handover delay. This method eliminates the delays Tmvd and TDAD and allows the L3 process to be completed faster, thereby reducing the overall latency.
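The text does not specify how the NAR derives the CoA from the MAC address it receives from the NAP. One common option, assumed here purely for illustration, is the modified EUI-64 interface identifier used in IPv6 stateless address autoconfiguration. The sketch below uses Python's standard `ipaddress` module; the function name, the prefix `2001:db8:1::/64` (a documentation prefix) and the sample MAC address are hypothetical.

```python
import ipaddress


def form_coa(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface ID built from a MAC address."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    interface_id = int.from_bytes(eui64, "big")
    return ipaddress.IPv6Address(int(network.network_address) + interface_id)


# Hypothetical values: in the proposed scheme the MAC would arrive in the
# additional NAP-to-NAR message, and the prefix would be the NAR's own subnet.
coa = form_coa("2001:db8:1::/64", "00:11:22:33:44:55")
print(coa)
```

Under this assumption, the address produced here is what the NAR would submit to ODAD before the NAP releases the association response to the MN.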
4 Performance Evaluation
To evaluate and compare the delay performance of MIPv6 with and without the proposed scheme, we ran simulations using the OMNET++ 4.0 network simulator with the xMIPv6 module [15]. The simulation environment is shown in Table 1. The network topology used in the simulation is shown in Figure 5. The network is composed of two access routers (AR), two correspondent nodes (CN), two access points (AP) and one mobile node (MN). Each access point is connected to a different access router and configured with a different wireless network name or SSID. We assumed that the MN moves from the area of AP1 toward the area of AP2 and then returns to AP1. The handover delay is calculated every time the MN moves toward a new access point. We used different MN speeds in order to simulate real-life environments in which an MN is subjected to different conditions such as walking, cycling or driving.
Table 1. Simulation Environment
Simulator           OMNET++ 4.0
MAC Type            802.11
Traffic Generator   CN generates UDP traffic every 0.01 sec
Transmission rate   100 Mb/s
Packet Size         56 bytes
Movement Type       Linear
Simulation Time     300 sec
Number of runs      10 for each MN speed of 1 m/s, 2.5 m/s, 5 m/s, 6.5 m/s, 10 m/s, 12.5 m/s, 15 m/s, 16.5 m/s and 20 m/s
The total handover delay can be divided into layer 2 and layer 3 handover delays. An L2 handover delay mainly consists of scanning, authenticating and associating with the new access point. An L3 handover delay consists of movement detection, IP address configuration, duplicate address detection and binding registration as shown in section 2.
Fig. 5. Simulation Topology
A continuous stream of UDP packets is sent from CN1 and CN2 to the MN every 0.01 sec while the MN moves between the two access points. The mobile node is moved at multiple speeds in order to get more accurate delay results. The L2 delay is measured from the time the MN loses beacons from the previous access point until it receives the association acknowledgment from the new access point. Figure 6 shows the L2 handover results of MIPv6 and the proposed scheme. In the case of MIPv6, the L2 delay remains constant at 0.655 sec regardless of the speed of the MN; for the proposed scheme, the minimum is 0.66 sec at 5 m/s and the maximum is 0.71 sec at 16.5 m/s. This increase occurs because some of the L3 signal execution takes place during this time. The layer 3 movement detection delay is measured from the time the MN associates with the new access point until it sends the router solicitation (RS) signal to the NAR. Similarly, the CoA and ODAD delays are calculated within the new access router. Finally, the binding update delay is calculated from the time the BU signal is sent until the binding acknowledgment is received from CN1, CN2 and the HA. Layer 3 handover delays are shown in Figure 7; the MIPv6 maximum delay is 3.2 sec at 1 m/s and the minimum is 2.13 sec at 15 m/s. For the proposed scheme, the maximum is 1.473 sec at 16.5 m/s and the minimum is 0.775 sec at 15 m/s. The total handover latency of MIPv6 has a minimum of 2.7 sec at 15 m/s and a maximum of 3.8 sec at 1 m/s. The proposed scheme has significantly lower delays, with a minimum of 1.43 sec at 15 m/s and a maximum of 2.19 sec at 16.5 m/s, as shown in Figure 8. The packet loss comparison of standard MIPv6 and the proposed scheme is shown in Figure 9; the minimum
Fig. 6. Handover Delay on layer 2
Fig. 7. Handover Delay on layer 3
Fig. 8. Total Handover Delay
Fig. 9. Packet Loss
packet loss of MIPv6 is 1.29% at 1 m/s, which remains constant at 2.5 m/s, and the maximum packet loss is 64.29% at 16.5 m/s. For the proposed scheme, the results show that the minimum packet loss is 0.57% at 1 m/s and the maximum is 33.83% at 16.5 m/s. This shows that the proposed scheme substantially reduces packet loss; however, it also shows that the speed of the MN affects both packet loss and handover latency.
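To put the reported extremes side by side, the following Python sketch computes the relative reduction of the proposed scheme against standard MIPv6. The figures are the ones quoted in this section; the helper name is an assumption. Note that some minima and maxima occur at different MN speeds, so these are comparisons of extremes rather than per-speed comparisons.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# (MIPv6, proposed scheme) pairs as reported in this section.
reported = {
    "total handover delay, minimum (s)": (2.7, 1.43),
    "total handover delay, maximum (s)": (3.8, 2.19),
    "packet loss, minimum (%)": (1.29, 0.57),
    "packet loss, maximum (%)": (64.29, 33.83),
}

for label, (mipv6, proposed) in reported.items():
    print(f"{label}: {relative_reduction(mipv6, proposed):.1f}% lower")
```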
5 Conclusion
In this paper, we proposed an efficient handover scheme that executes IP address configuration and optimistic duplicate address detection at the access router. This process is executed while the MN is performing the layer 2 handover, mainly re-association with the new access point. The simulation results show that the proposed scheme has much lower handover latency than standard Mobile IPv6. The proposed scheme modifies the function of the access point and access router in order to reduce the total handover delay. In the future, multiple experiments will be performed using more mobile nodes, different mobility types and a real-time application. Moreover, the number of access points will be increased to analyze the handover delay in more detail. Handover delay is an important factor in mobility; it has to be reduced in order to support real-time services. However, a mobile node cannot handle it alone; the other entities of the network must participate in executing the signals on behalf of the mobile node. More intelligent access points or access routers are required to perform additional functions and participate in reducing the handover delays.
References
1. Saltzer, J.H., Reed, D.P., Clark, D.D.: End-to-end arguments in system design. ACM Trans. Comput. Syst. 2(4), 277–288 (1984)
2. Postel, J.: Internet Protocol, RFC 791 (Standard) (September 1981), Updated by RFC 1349
3. Perkins, C.: IP Mobility Support, RFC 2002 (Proposed Standard) (October 1996), Obsoleted by RFC 3220, updated by RFC 2290
4. Perkins, C.: IP Mobility Support for IPv4, RFC 3344 (Proposed Standard) (August 2002), Updated by RFC 4721
5. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. RFC 3775 (Proposed Standard) (June 2004)
6. Ramakrishnan, K., Floyd, S., Black, D.: The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168 (Proposed Standard) (September 2001)
7. Postel, J.: Transmission Control Protocol. RFC 768 (Standard) (September 1981); Updated by RFCs 1122, 3168, (793), Internet Engineering Task Force. Request for Comments. IETF, http://www.ietf.org/rfc/rfc793.txt
8. Blefari-Melazzi, N., Femminella, M., Pugini, F.: A layer 3 movement detection algorithm driving handovers in mobile ip. Wirel. Netw. 11(3), 223–233 (2005)
9. Zhao, Y., Nie, G.: A scheme for reducing handover and packet loss in heterogeneous mobile ipv6 networks, pp. 339–342 (2009)
10. Daley, B., Pentland, G., Nelson, R.: Effects of fast router advertisement on mobile ipv6 handovers. In: Proceedings of the Eighth IEEE International Symposium on Computers and Communication 2003 (ISCC 2003), pp. 557–562 (2003)
11. Koodli, R.: Mobile IPv6 Fast Handovers. RFC 5568 (Proposed Standard) (July 2009)
12. Soliman, H., Castelluccia, C., ElMalki, K., Bellier, L.: Hierarchical Mobile IPv6 (HMIPv6) Mobility Management. RFC 5380 (Proposed Standard) (October 2008)
13. Koodli, R., Perkins, C.: Mobile Internetworking with IPv6. John Wiley & Sons, Chichester (2007)
14. Moore, N.: Optimistic Duplicate Address Detection (DAD) for IPv6. RFC 4429 (Proposed Standard) (April 2006)
15. Zarrar Yousaf, F.: An accurate and extensible mobile ipv6 (xmipv6) simulation model for omnet++, p. 8. ACM, New York (2008)
Minimization of Boolean Functions Which Include Don't-Care Statements, Using Graph Data Structure Masoud Nosrati1, Ronak Karimi1, and Reza Aziztabar2 1
Islamic Azad University, Kermanshah Branch, Young Researchers Club, Kermanshah, Iran
[email protected] [email protected] 2 Department of Information Technology, Qom University, Qom, Iran
[email protected]
Abstract. In this paper, we introduce a heuristic algorithm that applies maximum minimization to Boolean functions in normal SOP form. To implement the proposed algorithm, we use the graph data structure and define the adjacencies. We also state conditions for achieving the maximum minimization. The problem of vertices shared by more than one adjacency is discussed, and a solution is presented. Don't-care statements are also considered, and the way to handle them is explained. The Karnaugh map is used to clarify the matter.
Keywords: Minimization of Boolean functions, SOP form, Don't-care statements, Graph data structure.
1 Introduction
Minimization of Boolean functions is one of the basic operations in Boolean algebra. It is also useful in digital circuit design, where it is used to decrease the price of manufactured circuits by removing extra gates [1,13,14,20]. In this paper, we present an algorithm that minimizes Boolean functions as far as possible. We use the graph data structure to implement this algorithm. The second part, entitled "Graph data structure and agreements", discusses the structure of the proposed graph and its objects and methods. It also presents some agreements that hold throughout this paper. The third part, named "SOP functions and graph", addresses the relationship between SOP functions and the graph data structure, and states the conditions for minimizing a function with the proposed graph. The fourth part, "Minimization algorithm", presents the minimization algorithm and its description. Finally, the "Conclusion" is the fifth part.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 212–220, 2011. © Springer-Verlag Berlin Heidelberg 2011
1.1 Related Works
Many methods and algorithms have previously been introduced for minimizing Boolean functions. One example is "A Heuristic Method of Two-Level Logic Synthesis", which uses the weight of cubes to minimize DSOP functions [2]. An important study closely related to ours is "Factoring Boolean functions using graph partitioning", which also uses graphs. Central to that approach is combining graph partitioning with the use of special classes of Boolean functions, such as read-once functions, to devise new combinatorial algorithms for logic minimization. The method generalizes techniques for the so-called read-once functions, a special family of monotone Boolean functions also known as non-repeatable tree (NRT) functions. Its authors report better results than algebraic factoring in most test cases, and results very competitive with Boolean factoring at less computation time [3]. In general, there are many methods for minimizing Boolean functions. Here we present a new algorithm that uses the graph data structure to perform the minimization.
2 Graph Data Structure and Agreements
Graphs were first used by Leonhard Euler in 1736 to solve the classic problem of the Königsberg bridges. After that, graphs entered the world of mathematics [4]. A graph is constructed of two sets, V (vertices) and E (edges) [18,21]. For example, look at Fig. 1.
G=(V,E)
V={1,2,3,4,5}
E={(1,2),(1,3),(2,4),(3,4),(4,5)}
A path in graph is a set of vertices we should cross to get to a special vertex. If the initial and final vertices are the same, this path is called cycle, and if all the edges in a cycle are met just one time, it is called a simple cycle [5,6,7]. According to these definitions, there is one simple cycle in Fig.1, which is {1,2,4,3}. Here, we make two agreements and describe the reason in third part of paper. Agreement1: Each vertex makes a simple cycle by itself. Agreement2: A couple of adjacent vertices make a simple cycle. (Adjacent vertices are those which are related by an edge.)
According to these agreements, the simple cycles for Fig. 1 are as follows:
{1} , {2} , {3} , {4} , {5}
{1,2} , {1,3} , {2,4} , {3,4} , {4,5}
{1,2,4,3}
Now, we can implement the class of the proposed graph data structure. This class contains some objects to store V and E, and also some methods to create and remove vertices and edges [4,8,9,16,18]. The most important method in this class is list Cycles(Vertex Vi). This method returns the list of all simple cycles that begin with Vi. It should also return, for each cycle, the number of non-don't-care vertices, which is called Wnd (weight of non-don't-care vertices). Wnd is a metric for choosing the best adjacency for the vertex. Don't-cares and the usage of Wnd are covered in the next section. In addition, another method is defined to return the number of all adjacent vertices of Vi.
class Graph {
    // Objects: data containers to store vertices, edges and Wnd
public:
    // Create an empty graph
    Graph();
    // Return TRUE (1) if the graph has no vertices, else FALSE (0)
    bool IsEmpty();
    // Insert a new vertex
    void AddVertex(Vertex V);
    // Insert a new edge between U and V
    void AddEdge(Vertex U, Vertex V);
    // Delete V and all edges incident to it
    void RemoveVertex(Vertex V);
    // Delete edge (U,V)
    void RemoveEdge(Vertex U, Vertex V);
    // Return the list of cycles that begin with Vi, with their weights (Wnd)
    list Cycles(Vertex Vi);
    // Return the number of adjacent vertices of Vi
    int AdjacentVertices(Vertex Vi);
};
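The interface above can be illustrated with a small Python sketch. It is not the authors' implementation: the method names are Python-style renderings of Cycles and AdjacentVertices, each cycle is represented as a set of vertices (so two different cycles over the same vertices would be merged), and the search is a plain depth-first enumeration that is only practical for small graphs.

```python
class Graph:
    """Undirected graph sketching the Cycles / AdjacentVertices methods described above."""

    def __init__(self):
        self.adj = {}           # vertex -> set of adjacent vertices
        self.dont_care = set()  # vertices flagged as don't-care

    def add_vertex(self, v, dont_care=False):
        self.adj.setdefault(v, set())
        if dont_care:
            self.dont_care.add(v)

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def adjacent_vertices(self, v):
        return len(self.adj[v])

    def cycles(self, start):
        """Map each simple cycle through `start` (as a vertex set) to its Wnd."""
        found = {frozenset([start])}                               # Agreement 1
        found |= {frozenset([start, u]) for u in self.adj[start]}  # Agreement 2

        def extend(path):
            for nxt in self.adj[path[-1]]:
                if nxt == start and len(path) >= 3:
                    found.add(frozenset(path))                     # ordinary simple cycle
                elif nxt not in path:
                    extend(path + [nxt])

        extend([start])
        # Wnd = number of vertices in the cycle that are not don't-cares.
        return {cycle: len(cycle - self.dont_care) for cycle in found}


g = Graph()
for u, v in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]:   # the graph of Fig. 1
    g.add_edge(u, v)

for cycle, wnd in sorted(g.cycles(1).items(), key=lambda item: sorted(item[0])):
    print(sorted(cycle), "Wnd =", wnd)
```

For vertex 1 this lists the single-vertex cycle, the two pairs containing vertex 1, and the four-vertex cycle, in line with the enumeration given for Fig. 1.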
3 SOP Functions and Graph
Boolean functions are used to describe the behavior of combinational two-level circuits with AND-OR gates. These functions can be expressed in normal SOP (Sum Of Products) or POS (Product Of Sums) forms [10,11,17]. We do not discuss the algebraic concepts or how to generate SOP or POS forms. We simply assume that we have an SOP function which should be minimized.
There are different ways to minimize SOP functions. One is to use algebraic rules, which is hard and confusing for large functions with many variables. Another is to use the Karnaugh map. It can be used for functions with 2 to 6 variables, by drawing the map of adjacency. In fact, the Karnaugh map is an illustrative form of the truth table. It puts adjacent statements near each other and makes it possible to select an appropriate adjacency. Fig. 2 shows the Karnaugh map for 4-variable Boolean functions. In this map, the different states of the variables are shown by 0 and 1 [12].
0000  0001  0011  0010
0100  0101  0111  0110
1100  1101  1111  1110
1000  1001  1011  1010
Fig. 2. Karnaugh map for 4-variable Boolean functions
It can be seen that any two adjacent cells differ in exactly one bit. In other words, the XOR of two adjacent cells equals $2^r$, r=0,1,2,…. Suppose that function (I) should be minimized with this map. By replacing the variables with 0 and 1 (function (II)), its Karnaugh map will be as in Fig. 3.
(I) f(w,x,y,z) = w'x'yz' + w'xy'z + w'xyz' + w'xyz + wx'yz' + wx'yz + wxyz'
(II) f(w,x,y,z) = 0010 + 0101 + 0110 + 0111 + 1010 + 1011 + 1110
 .     .     .    0010
 .    0101  0111  0110
 .     .     .    1110
 .     .    1011  1010
Selected adjacencies: yz' = {0010, 0110, 1110, 1010}, w'xz = {0101, 0111}, wx'y = {1011, 1010}
Fig. 3. Karnaugh map for function (I) with appropriate adjacencies
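The adjacency rule (two cells are adjacent when their XOR is a power of two) is easy to check in code. The following Python sketch, with hypothetical helper names, tests the rule and lists every adjacent pair among the minterms of function (II).

```python
def is_adjacent(a: int, b: int) -> bool:
    """True when cells a and b differ in exactly one bit (a XOR b is a power of two)."""
    diff = a ^ b
    return diff != 0 and (diff & (diff - 1)) == 0


def build_edges(cells):
    """Return every unordered pair of adjacent cells, in ascending order."""
    ordered = sorted(cells)
    return [(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:] if is_adjacent(a, b)]


# Minterms of function (II), written as binary literals.
minterms = [0b0010, 0b0101, 0b0110, 0b0111, 0b1010, 0b1011, 0b1110]
for a, b in build_edges(minterms):
    print(f"{a:04b} - {b:04b}")
```

The expression `diff & (diff - 1)` clears the lowest set bit, so it is zero exactly when `diff` has a single bit set.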
In Fig. 3, appropriate adjacencies are selected, and the minimization operation, which keeps the similar bits and removes the others [15], is carried out. The essential condition for choosing an appropriate adjacency is defined as (*).
(*) The number of cells in an adjacency should be equal to $2^k$, k=0,1,2,…, and the number of non-similar bits should be equal to k. Another point to note is that the biggest adjacencies, which contain more cells, should always be selected in order to minimize the function further by eliminating more differing bits [1,12]. Note also that for functions with complete statements, where all statements are present, the minimized function equals 1. It can be seen that the minimized form of function (I) is function (III).
(III) f(w,x,y,z) = yz' + w'xz + wx'y
Suppose that fv is a given Boolean function. It can be mapped onto the graph data structure by creating a vertex for each statement in it and showing adjacencies by edges. For example, Fig. 4 shows the graph of function (I).
[Graph vertices: 0010, 0101, 0111, 0110, 1110, 1011, 1010]
Fig. 4. Graph of function (I)
To minimize the function of this graph, first the biggest appropriate adjacency should be found for each vertex. This has to be done with regard to agreements 1 and 2, and the reason for making these agreements now becomes clear. Minimization operates on adjacencies (not vertices), so an adjacency must be considered for each isolated vertex, and likewise for each pair of adjacent vertices. In the mathematical definition of simple cycles in undirected graphs, simple cycles with fewer than 3 vertices are not defined [3,6]. With regard to condition (*), Table 1 shows the biggest simple cycles in the graph of Fig. 4.
Table 1. List of biggest cycles of the graph in Fig. 4 regarding condition (*) for each vertex
Vertex  Simple cycle (*)              Wnd  Minimized
0010    0010-0110-1110-1010-(0010)    4    yz'
0101    0101-0111-(0101)              2    w'xz
0111    0111-0101-(0111)              2    w'xz
0111    0111-0110-(0111)              2    w'xy
0110    0110-1110-1010-0010-(0110)    4    yz'
1110    1110-1010-0010-0110-(1110)    4    yz'
1011    1011-1010-(1011)              2    wx'y
1010    1010-0010-0110-1110-(1010)    4    yz'
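The "Minimized" column can be reproduced mechanically: keep the bit positions on which all cells of an adjacency agree and drop the rest. The Python sketch below does this and also checks condition (*). The function names are assumptions, and the variable order w, x, y, z (leftmost bit first) follows functions (I) to (III).

```python
def differing_bits(cells):
    """Bit mask of the positions in which the cells of an adjacency are not all equal."""
    common_ones = any_ones = cells[0]
    for c in cells[1:]:
        common_ones &= c
        any_ones |= c
    return common_ones ^ any_ones


def satisfies_star(cells):
    """Condition (*): the adjacency has 2**k distinct cells that differ in exactly k bits."""
    distinct = list(set(cells))
    k = bin(differing_bits(distinct)).count("1")
    return len(distinct) == 2 ** k


def minimized_term(cells, variables="wxyz"):
    """Keep the bits shared by all cells and drop the differing ones."""
    cells = list(cells)
    if not satisfies_star(cells):
        raise ValueError("adjacency violates condition (*)")
    mask = differing_bits(cells)
    n = len(variables)
    term = ""
    for i, name in enumerate(variables):
        bit = 1 << (n - 1 - i)            # leftmost variable is the most significant bit
        if not mask & bit:
            term += name if cells[0] & bit else name + "'"
    return term or "1"                    # no literal left means the constant 1


print(minimized_term([0b0010, 0b0110, 0b1110, 0b1010]))
print(minimized_term([0b0101, 0b0111]))
print(minimized_term([0b1011, 0b1010]))
```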
For vertices 0010, 0110, 1110 and 1010 the cycles are the same. Consequently, the minimized forms are the same, too. For vertex 0111, two appropriate adjacencies are available. In other words, the biggest cycle is not unique. If both of them were included in the final minimized function, one extra statement would be imposed on it. So, one of them must be chosen as follows: if vertex V has more than one biggest cycle regarding (*), choose the adjacency whose first vertex (next to V) has fewer adjacent vertices. In our example, vertex 0111 has two adjacent vertices, 0101 and 0110. The first has 1 adjacent vertex, and the second has 3. So, the first one has to be chosen, and w'xz should appear in the final minimized function rather than w'xy. The reason is that a vertex with fewer adjacent vertices is less likely to be included in other adjacencies, and if it has no other adjacent vertices, it cannot take part in the minimization at all unless this path is chosen. So, to make sure that no vertex is left out, choose the path whose first vertex has fewer adjacent vertices. If the number of adjacent vertices is equal for both, you can choose one randomly, because they have similar circumstances and it makes no difference which one is selected.
Don't-care vertices are those which can take 0 or 1, and whose value does not matter in the final output of the function or circuit [19]. When there are don't-cares in a graph, strategies (**) and (***) must be followed to achieve the maximum minimization.
(**) The biggest appropriate cycle should not be searched for don't-care vertices.
We intend to eliminate don't-care vertices, because their presence may impose extra gates on the circuit. Look at Fig. 5 (a). If you take an adjacency for the single d, you add an extra statement to the final minimized function. So, you have to avoid finding the biggest appropriate cycle for don't-care vertices in order to achieve maximum minimization.
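As a result, adjacencies which consist purely of don't-care vertices (Wnd=0) are omitted from the final minimized function. For example, look at Fig. 5 (b).
[Figure: two Karnaugh-map panels, (a) and (b), in which don't-care cells are marked d; adjacencies consisting only of don't-care cells are labeled "(Omitted)". Labels appearing in the figure: cells 0010, 0110, 1110, 1010; terms wxy'z, yz', w'yz'.]
Fig. 5. Omitting the cycles with pure don't-care vertices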
(***) If the biggest appropriate cycle for a vertex is not unique, then choose the cycle with the greater Wnd. The goal is to include more non-don't-care vertices in the minimization, so the cycle that contains fewer don't-cares should be chosen. Fig. 6 is an example of this point. It shows the appropriate cycle chosen for vertex 0110.
[Figure: vertex 0110 with cells 0011, 0010 and don't-care cells (d); the two candidate cycles are labeled Wnd=3 and Wnd=1.]
Fig. 6. Choosing the cycle with greater Wnd
4 Minimization Algorithm
To implement the maximum minimization on the introduced graph, the following algorithm is proposed.
Create Graph of Boolean function;
If all the vertices and edges are present then
    return 1;
Else {
    For each non-don't-care vertex V {
        Find biggest cycles with the condition (*);
        If biggest cycle is not unique then {
            If Wnd of them isn't equal then
                Choose one with greater Wnd;
            Else {
                Find the number of adjacent vertices of the first vertex next to V in path;
                If the numbers of adjacent vertices aren't equal then
                    Select the path with least adjacent vertices for its first vertex;
                Else
                    Choose one randomly;
            }
        }
        Minimize (take the similar bits and reduce others);
        Store the minimized forms;
    }
    Reduce the repeated minimized statements;
    Return the minimized function;
}
First, the algorithm creates a graph from the Boolean function. Then, it checks whether the function is complete; if so, it returns 1. Otherwise, it finds the biggest cycles for each non-don't-care vertex, and if the biggest cycle is not unique, it chooses the appropriate one according to Wnd and the first vertex next to V in the cycles. After that, it minimizes the adjacency by keeping the similar bits and removing the others, and stores the minimized form. When these steps have been done for all the vertices, one statement has been created per vertex (as you see in Table 1). After removing the repeated ones, the final minimized function is obtained and returned.
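The following Python sketch is one possible rendering of these steps, under several stated assumptions rather than the authors' implementation. Instead of enumerating graph cycles, it enumerates the subcubes that contain each vertex, which yields exactly the adjacencies that satisfy condition (*). The tie-break on "the first vertex next to V" is interpreted as the smallest number of adjacent vertices among the cells of the adjacency that neighbor V. The final random choice is replaced by taking the first candidate found, so that results are reproducible. All function and parameter names are hypothetical.

```python
def submasks(mask):
    """Yield every sub-mask of `mask`, including `mask` itself and 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask


def minimize(on_set, dont_cares=(), variables="wxyz"):
    """Heuristic SOP minimization; cells are integers, leftmost variable = highest bit."""
    n = len(variables)
    on, dc = set(on_set), set(dont_cares)
    allowed = on | dc

    def degree(cell):
        # Number of adjacent vertices: allowed cells that differ from `cell` in one bit.
        return sum(1 for i in range(n) if cell ^ (1 << i) in allowed)

    def cube(v, free):
        # All cells obtained from v by varying the bit positions set in `free`.
        return {(v & ~free) | s for s in submasks(free)}

    def term(v, free):
        literals = []
        for i, name in enumerate(variables):
            bit = 1 << (n - 1 - i)
            if not free & bit:
                literals.append(name if v & bit else name + "'")
        return "".join(literals) or "1"

    terms = []
    for v in sorted(on):                          # strategy (**): skip don't-care vertices
        best_key, best_free = None, 0
        for free in range(1 << n):
            cells = cube(v, free)
            if not cells <= allowed:
                continue                          # adjacency leaves the function
            wnd = len(cells & on)                 # strategy (***): prefer greater Wnd
            fewest = min((degree(v ^ (1 << i)) for i in range(n) if (free >> i) & 1),
                         default=0)
            key = (len(cells), wnd, -fewest)      # biggest, then Wnd, then fewest neighbors
            if best_key is None or key > best_key:
                best_key, best_free = key, free
        t = term(v, best_free)
        if t not in terms:                        # reduce the repeated statements
            terms.append(t)
    return " + ".join(terms)


function_i = [0b0010, 0b0101, 0b0110, 0b0111, 0b1010, 0b1011, 0b1110]
print(minimize(function_i))
```

Applied to the minterms of function (I), the sketch selects the same three adjacencies as Table 1, including the choice of w'xz over w'xy for vertex 0111. A complete function needs no special case here, because the largest adjacency then covers every cell and its term is the constant 1.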
5 Conclusion
In this paper, we introduced a heuristic algorithm that applies maximum minimization to SOP Boolean functions that include don't-care statements. To this end, the graph data structure was defined as the essential base of the algorithm, and two agreements were made, which stem from the difference between the mathematical definition of simple cycles and what our purpose requires. Then, the method of minimization, which uses the concepts of the Karnaugh map, was discussed. A solution to the problem of a vertex lying in more than one appropriate adjacency was presented, and don't-care statements were included in the solution. Finally, the minimization algorithm was presented.
References 1. Harris, D.M., Harris, S.L.: Digital Design and Computer Architecture, pp. 51–62. Morgan Kaufmann, San Francisco (2007) 2. Hlaviča, J., Fišer, P.: A Heuristic Method of Two-Level Logic Synthesis. Karlovo nám 13, 121 35 Prague 2 3. Mintz, A., Golumbic, M.C.: Factoring Boolean functions using graph partitioning. Discrete Applied Mathematics 149 (2005) 4. Horowitz, E., Sahni, S., Mehta, D.: Fundamentals of Data Structures in C++, 2nd edn. Silicon Press (2006) 5. Bondy, J.A., Murty, U.S.R.: Graph theory with applications, 9th edn., pp. 1–24. Elsevier Science Ltd, Amsterdam (1976) 6. Lipschutz, S.: Schaum’s outline of theory and problems of discrete mathematics, 3rd edn., pp. 154–200. McGraw-Hill, New York (2009) 7. Golumbic, M.C.: Algorithmic Graph Theory and Perfect Graphs, 2nd edn. Annals of Discrete Mathematics, vol. 57. Elsevier, Amsterdam (2004) 8. Puntambekar, A.A.: Design & Analysis of Algorithms, 1st edn., pp. (6-1)–(6-5). Technical Publications, Pune (2010) 9. Koffman, E.B., Wolfgang, P.A.T.: Data Structures: Abstraction and Design Using Java, pp. 547–550. John Wiley & Sons, Chichester (2010) 10. Balch, M.: Complete Digital Design, pp. 3–32. McGraw-Hill, New York (2003) 11. Popel, D.V.: Information theoretic approach to logic function minimization (ebook). Technical University of Szczecin (2000) 12. Nelson, V.P.: Digital logic circuit analysis and design, 2nd edn., pp. 90–120. Prentice Hall, Englewood Cliffs (1995)
13. Sasao, T.: EXMIN2: A Simplification Algorithm for Exclusive-OR-Sum-of -Products Expression for Multiple-Valued-Input Two-Valued-Output functions. IEEE Trans. on Computer Aided Design 12, 621–632 (1993) 14. Thornton, M.A., Drechsler, R., Miller, D.M.: Spectral Techniques in VLSI CAD. Kluwer Academic Publ., Dordrecht (2001) 15. Karp, R.M.: Reducibility Among Combinatorial Problems. In: Miller, R.E., Thatcher, J.W. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York (1972) 16. Parberry, I.: Lecture notes on algorithm analysis and computational complexity (ebook). Department of Computer Science, University of North Texas, pp. 66–71 17. Wang, Y.: Data structures: Minimization and complexity of Boolean functions (ebook). A thesis of Ph.D degree, University of Saskatchewan, Canada, pp. 8–20 (1995) 18. Bondy, J.A., Murty, U.S.R.: Graduated Texts in Mathematics: Graph theory, pp. 1–8. Springer, Heidelberg (2010) 19. Vahid, F.: Digital Design with RTL Design, Verilog and VHDL, 2nd edn., pp. 336–337. John Wiley and Sons, Chichester (2010) 20. Mano, M.M.: Digital Design, 4th edn., pp. 36–110. Prentice Hall, Englewood Cliffs (2006) 21. He, M., Petoukhov, S.: Mathematics of Bioinformatics: Theory, Methods and Applications, pp. 138–139. Wiley, Chichester (2011)
Clustering Moving Object Trajectories Using Coresets Omnia Ossama, Hoda M.O. Mokhtar, and Mohamed E. El-Sharkawi Faculty of Computers and Information, Cairo University Cairo, Egypt
[email protected],
[email protected] [email protected] Abstract. Given a set of moving objects, we show the applicability of using coresets to perform fast approximate clustering. A trajectory coreset is simply a small weighted set of the trajectory segments that approximates the original trajectory. In this paper, we present an efficient algorithm that integrates coreset approximation with k-means clustering. Our methodology to build the trajectory coreset depends on using the trajectory segments movement direction. Using the movement direction feature of the segments we basically select the most influential segments to contribute in our coreset. The main strength of the algorithm is that it can quickly determine a clustering of a dataset for any number of clusters. In addition, to measure the quality of the resulting clustering, we use the silhouette coefficient. Finally, we present experimental results that show the efficiency of our proposed algorithm. Keywords: Moving Object Database(MOD), clustering moving objects, and approximate clustering via coresets.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 221–233, 2011. © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
Today, portable devices such as mobile phones, laptops, and personal digital assistants (PDAs) have become ubiquitous. Along with the rapid advances in RFID, satellite, sensor, wireless, and video technologies, moving object position information has become easier to acquire. On the other hand, the adoption of Global Positioning Systems (GPSs) promotes many new applications. Today, mobility managers embed GPS in cars to better monitor and guide vehicles, and meteorologists use weather satellites and radars to observe hurricanes. Therefore, data mining on such huge data volumes has become an essential task. Data mining techniques, and especially clustering techniques, play an important role in extracting useful knowledge from moving objects’ location information and in discovering hidden patterns in their motion behaviors. Consequently, a variety of disciplines including database research, market research, transport analysis, and animal behavior research show an increasing interest in the movement patterns of moving objects [9][1]. For example, in the transportation context, movement patterns could be used for traffic jam prediction, and in the case of moving animals, movement patterns can be utilized to view spatio-temporal behaviors, e.g. seasonal animal migration. Clustering is the computational task of grouping a given input into subsets of similar characteristics (clusters) [8]. The goal is that objects within a group are similar (close) to
each other and different from objects in other groups. Clustering has many applications in different areas; it is a widely used technique in computer science with applications to unsupervised learning, classification, data mining, text mining, information retrieval, and pattern recognition. No general clustering algorithm can be applied to all applications, because clustering is usually problem dependent; hence there are many different clustering algorithms. The k-means clustering algorithm is considered one of the basic clustering algorithms [20,19]. It has been applied to a wide spectrum of research problems and has been used in several applications. Nevertheless, it has a major weakness, namely its dependence on the initial number of centroids (k). In fact, there might be no definite or unique answer as to what value k should take for an optimal clustering. Thus, both the chosen number of clusters (k) and the initial choice of the centroids affect the performance and accuracy of the algorithm. In this paper, we present a novel “approximate” clustering for moving objects. Our approach integrates the idea of approximation using coresets with clustering using the k-means technique. The main characteristic of our algorithm is its fast clustering time compared to traditional k-means clustering. The reason is that, being an approximate approach, it does not need to examine all the trajectory segments; only the segments in the generated coreset need to be examined. This reduction in the number of candidate segments reduces the overall clustering time. Nevertheless, clustering quality remains good. Using the silhouette coefficient [16] as a clustering quality measure, we test the quality of the obtained clusters to ensure the efficiency of the proposed algorithm. In the paper we use the concept of coresets to perform efficient approximate clustering for moving object trajectories.
We define a trajectory coreset as a small weighted set of segments that approximates the original trajectory. This weighted set is the set of the most influential segments that characterize the motion direction of the trajectory over time. Our main contributions are:
– We present an efficient algorithm that builds a coreset for each trajectory using the repeated inter-segments’ directions. The result is a small weighted set of trajectory segments that efficiently approximates the whole original trajectory.
– We integrate the coreset approximation approach with the k-means clustering technique to achieve a fast, high quality clustering for moving objects.
– In our proposed clustering algorithm, we present a solution to overcome the weaknesses of the k-means algorithm, namely, the dependency on the chosen number of clusters and the dependency on the initial choice of centroids.
– We test the efficiency of our resulting clustering using the silhouette coefficient [16]. The silhouette coefficient is one of the proposed techniques for measuring the quality of clustering.
– Finally, we present experimental results that test the performance of our technique and measure the accuracy (quality) of our resulting clustering.
The remainder of the paper is organized as follows: Section 2 provides an overview of the related literature. Section 3 describes the basic preliminaries, notations, and
definitions that we will use throughout the rest of the paper. Section 4 presents our proposed algorithms for approximate clustering of moving object trajectories. Section 5 presents our experimental results. Finally, Section 6 concludes and proposes directions for future work.
2 Related Work
Applying data mining techniques to moving object databases has become a crucial direction for many applications. Traffic jam prediction, animal migration, air flight traffic monitoring and control, and location based services are all new applications that require efficient analysis and mining of location data. Today, data mining is an essential ingredient for efficient data analysis and knowledge discovery. Clustering is considered one of the main data mining techniques. In general, clustering is an unsupervised learning technique that plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, computational biology, spatiotemporal data analysis and many others. Many clustering algorithms were developed to satisfy the needs of different applications [14]. These algorithms can be generally characterized as hierarchical algorithms (i.e. agglomerative and divisive algorithms), partitioning algorithms (e.g. k-means, k-medoids, and density based algorithms), grid based, and constraint based algorithms [29]. Hierarchical algorithms build a hierarchy of clusters, i.e. every cluster is subdivided into child clusters, which form a partition of their parent cluster. Depending on how the hierarchy is built we distinguish between agglomerative (bottom-up) and divisive (top-down) clustering algorithms [14]. On the other hand, partitioning algorithms compute a clustering directly. For example, they compute a clustering by iteratively swapping objects or groups of objects between the clusters. The most prominent clustering algorithm is the k-means algorithm or Lloyd’s algorithm [20]. However, the generic definition of clustering is usually redefined depending on the type of data to be clustered and the clustering objective.
In other words, different clustering techniques use different definitions and evaluation criteria, such as partitioning (e.g. k-means), hierarchical clustering, and density-based methods [13,27,18]. The original versions of k-means clustering are Lloyd’s algorithm [19] and that of MacQueen [23]. Lloyd’s algorithm is very popular in many fields because of its simplicity and flexibility. Nevertheless, it does not specify the initial placement of centers; the authors in [6] provided an approach for solving this issue. Many clustering problems are closely related to k-means, such as the Euclidean k-medians problem [5,17], in which the objective is to minimize the sum of distances to the nearest center, and the geometric k-center problem [4], in which the objective is to minimize the maximum distance from every point to its closest center. Recently, a number of very efficient implementations of the k-means algorithm have been developed [15,25,12]. There are many variants of this algorithm; the authors in [7] have shown how to scale k-means clustering to very large data sets through sampling and pruning. Clustering moving objects has also been investigated in a number of research works. In [21], the authors cluster moving object trajectories based on the time interval of each trajectory segment. Another approach was proposed in [24], where the authors employ the direction feature of trajectory segments to group similar segments in the same cluster. On the other hand, authors in [3]
provided an overview of coreset constructions. There are several coreset constructions that have been developed for the k-median and k-means clustering problem [10,11]. Besides, there are many applications that support usage of coresets clustering of moving data [26], approximation algorithms [3], and data streaming [10,11]. Inspired by the importance of clustering moving objects, in this paper we propose a novel algorithm that employs the coreset approximation idea together with the k-means algorithm to provide an efficient clustering methodology. The key difference between this work and previous moving object clustering work is that we propose an approximate clustering technique that only examines a subset of the trajectory segments rather than examining all trajectory segments. We employ the trajectory segments angle as proposed in [22] to determine similar segments. However, only candidate segments selected in the coreset are clustered. In the rest of the paper we present the details of our approach.
3 Preliminaries
In this paper we propose a fast approximate clustering algorithm for clustering moving object trajectories. The main idea of our algorithm is to compute a weighted set of segments which we refer to as the trajectory-coreset. A trajectory-coreset is thus a subset of the original trajectory, such that the selected subset provides a fair approximation of the original trajectory. In general, moving objects are defined as objects that change their location and/or shape over time [28]. Moving objects are usually divided into moving points and moving regions. In this paper we focus on moving points. Moving points are objects whose location is continuously changing over time, like cars, buses, trains, planes, mobile phone users, etc. The motion of a moving point is usually expressed in terms of its trajectory, which represents the path that the object follows during its motion over time. A trajectory of a moving object is typically modeled using either a discrete or a continuous model. In the discrete model, the trajectory is modeled as a sequence of consecutive locations in a multidimensional (generally 2D or 3D) Euclidean space. On the other hand, in the continuous model, the trajectory is modeled as a piece-wise linear function of time. In our model, we consider the continuous representation of a trajectory. In the model, we focus on an important characteristic of a trajectory, namely, the trajectory segment’s slope. A segment’s slope is simply expressed by the segment angular value θ. The angular value θ lies in [0, 360], so each segment belongs to one of the following four quadrants: Q1: [0, 90[; Q2: [90, 180[; Q3: [180, 270[; Q4: [270, 360]. For simplicity we assume that each segment passes through only one quadrant, so we calculate its θ value only once.
Definition 1. A segment angular measure is an angle that expresses the segment’s motion direction. The segment angular measure θ is measured as:
$$\theta = \arctan\left(\frac{s.end_y - s.start_y}{s.end_x - s.start_x}\right) \cdot \frac{180}{\pi} \qquad (1)$$
Definition 2. A moving object trajectory is a finite sequence of time-parameterized line segments. A trajectory segment is a quadruple (A, B, θ, T ), where A, B are the 2D motion parameter vectors, θ is the angular measure of the segment, and T ∈ R is the time interval over which the segment is defined.
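Definitions 1 and 2 can be sketched as a small data type. This is only an illustration: the class name `Segment` and its fields are hypothetical, and the sketch uses `math.atan2` instead of a plain arctangent, because `atan2` keeps the sign information needed to place the angle in one of the four quadrants; that substitution is an assumption and not something stated in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One linear piece of a trajectory (hypothetical layout)."""
    start: tuple  # (x, y) at the beginning of the time interval
    end: tuple    # (x, y) at the end of the time interval
    t0: float     # start of the time interval T
    t1: float     # end of the time interval T

    @property
    def angle(self):
        """Direction of motion in degrees, normalized to [0, 360)."""
        dx = self.end[0] - self.start[0]
        dy = self.end[1] - self.start[1]
        return math.degrees(math.atan2(dy, dx)) % 360.0

    @property
    def quadrant(self):
        """Quadrant index 1..4; the upper bound is clamped into Q4."""
        return min(int(self.angle // 90) + 1, 4)


if __name__ == "__main__":
    s = Segment(start=(0.0, 0.0), end=(-1.0, 1.0), t0=0.0, t1=10.0)
    print(s.angle, s.quadrant)
```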
Having computed the angular value for all trajectory segments, our approach for clustering the trajectories then proceeds as presented in the remainder of the paper.
4 Coreset Clustering Algorithm
We consider a cluster to be a set of similar trajectory segments. To create clusters, each trajectory is first partitioned into its component line segments; then we create the trajectory coreset for each trajectory in the moving object database (MOD). Each coreset consists of a set of weighted line segments selected using the algorithm shown in Fig. 1. Once the coresets are computed, clustering is performed over those coresets using the algorithm in Fig. 3. This procedure can result in a single trajectory belonging to multiple clusters, based on the clustering result for its individual segments. The clustering algorithm is based on the idea that segments within a cluster are close to each other according to a distance measure and are distant from segments in other clusters. To define similar trajectories, we apply two different criteria to find similar segments: a distance-based measure and a shape-based measure. For the distance measure we simply employ the Euclidean distance to retrieve segments that are close to each other. For the shape-based measure we employ the angular measure computed in Equation 1, which uses the segments’ slopes to define segments’ similarity based on their orientation. Thus, the similarity process proceeds in 2 steps:
1. The Euclidean distance is applied to retrieve spatially close trajectory segments.
2. The angular measure defined in Equation 1 is applied to select, from the segments resulting from step (1), those segments that have the same direction (orientation).
This two-step procedure thus guarantees that clustered segments are ”close” to each other. In the following discussion we elaborate in more detail on how the clustering algorithm works. The main point is that, since we are clustering trajectories, the centroid of a cluster is a line segment rather than a single point.
Definition 3. Given a cluster C with centroid (i.e. segment) c = (A, B, θc, T), and a trajectory segment s = (C, D, θs, T), the spatial distance between c and s is defined as:
$$D(c, s) = \int_{t} \frac{d(At + B,\; Ct + D)}{t}\, dt \qquad (2)$$
where d(At + B, Ct + D) is the Euclidean distance between the spatial locations of c, s at each time instant t ∈ T. The key input to the clustering algorithm is the coreset for the MOD trajectories. In the rest of this section we present our proposed algorithm for picking a small representative subset of the moving object trajectory (trajectory-coreset) that best approximates the trajectory. In addition, we present our technique for using the silhouette coefficient to ensure high clustering quality.
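As a rough illustration of the spatial distance in Definition 3, the sketch below approximates a time-averaged Euclidean distance between two linear motions with the midpoint rule. Reading Equation 2 as an average over the common time interval, normalizing by the interval length, and the names `position` and `average_distance` are all assumptions made here for illustration.

```python
import math


def position(A, B, t):
    """Location A*t + B of a linear motion at time t (A, B are 2D vectors)."""
    return (A[0] * t + B[0], A[1] * t + B[1])


def average_distance(c, s, t0, t1, steps=1000):
    """Approximate the mean Euclidean distance between two motions on [t0, t1].

    c and s are (velocity, offset) pairs such as ((1, 0), (0, 3)).
    The integral is evaluated with the midpoint rule and then divided
    by the interval length (an assumed normalization).
    """
    if t1 <= t0:
        raise ValueError("the time interval must have positive length")
    (A, B), (C, D) = c, s
    dt = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * dt  # midpoint of the i-th sub-interval
        total += math.dist(position(A, B, t), position(C, D, t)) * dt
    return total / (t1 - t0)


if __name__ == "__main__":
    centroid = ((1.0, 0.0), (0.0, 0.0))
    segment = ((1.0, 0.0), (0.0, 3.0))  # same velocity, shifted by 3 in y
    print(average_distance(centroid, segment, 0.0, 10.0))
```

For two segments moving in parallel at the same speed the value is simply their constant separation, which makes the function easy to sanity-check.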
Algorithm: CreateCoreset(T, Delta)
Input: T: list of segments in the given trajectory τ; δ: the accepted deviation between segments’ angular values
Output: TrajCoreset: list of segments in the coreset
SegPrevious = T[0];
foreach (Seg ∈ T) do
    SegCurrent = Seg;
    if (|SegCurrent.angle − SegPrevious.angle| ≥ δ) then
        TrajCoreset.Add(SegCurrent);
    SegPrevious = SegCurrent;
end
return TrajCoreset;
Fig. 1. Create Coreset
4.1 Building the Trajectory Coreset
A MOD trajectory is composed of a set of segments. Each segment is assigned a weight, a real number expressing the influence of this segment on the trajectory motion pattern. A segment of high weight is added to the coreset because it has a high impact on the overall trajectory motion pattern. A segment gets a high weight if the angular difference between it and the previous segment is high. In other words, segments with little effect on the trajectory shape are ignored. A threshold δ is defined to control the angular difference that results in a segment being added to the coreset. Thus, given 2 consecutive segments s1, s2 with angular values θ1, θ2 respectively, if |θ2 − θ1| ≥ δ, then s2 is added to the coreset. We can therefore consider the coreset a representative subset of the original trajectory.
Example 1. Consider the trajectory shown in Fig. 2 (a). The trajectory segments (s2 and s3) and (s4 and s5) have the same motion direction and slope. Thus, segments s3 and s5 will be ignored and only s1, s2, s4 and s6 will be added to the trajectory coreset, as shown in Fig. 2 (b).
Definition 4. Given a trajectory τ ∈ MOD, a trajectory coreset of τ is an approximate representation of τ such that:
– The coreset of τ = ((s1, θ1), (s2, θ2), · · · , (sm, θm)) is a weighted finite subset of the segments of τ.
– The coreset Cτ is represented by an ordered set of pairs ((s1, θ1), · · · , (si, θi), · · · , (sn, θn)), where θi is the angular value of segment si.
– If |τ| = m, then |Cτ| = n, such that n ≤ m.
– Let δ be the allowed angular deviation between segments. Then, given a trajectory τ = ((s1, θ1), · · · , (si, θi), · · · , (sn, θn)), if |θi−1 − θi| ≥ δ, then si ∈ Cτ.
The algorithm in Fig. 1 shows how to create the coreset from the original trajectory. The input to the algorithm is the trajectory τ and the deviation allowance δ between segments’ angles.
The algorithm adds a segment to the trajectory coreset Cτ if the absolute difference between its angular value and the previous segment’s angular value is greater than or equal to the deviation allowance.
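The selection rule can be sketched in a few lines. The function name `create_coreset` and the choice to work on a plain list of angles are illustrative assumptions; the sketch also always keeps the first segment, which matches Example 1 (s1 is in the coreset) but is an assumption about how the start of the trajectory is handled.

```python
def create_coreset(angles, delta):
    """Return the indices of the segments kept in the trajectory coreset.

    angles: angular values (degrees) of consecutive trajectory segments.
    delta:  deviation allowance; a segment is kept when its angle differs
            from the previous segment's angle by at least delta.
    """
    if not angles:
        return []
    kept = [0]  # assumption: the first segment is always kept
    previous = angles[0]
    for index in range(1, len(angles)):
        if abs(angles[index] - previous) >= delta:
            kept.append(index)
        # Compare against the immediately preceding segment, kept or not.
        previous = angles[index]
    return kept


if __name__ == "__main__":
    # Six segments shaped like Example 1: s2/s3 and s4/s5 share a direction.
    print(create_coreset([10.0, 50.0, 50.0, 120.0, 120.0, 200.0], delta=5.0))
```

The sketch compares raw angle differences exactly as the rule is stated, so it does not treat 359 and 1 degrees as neighbouring directions.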
Fig. 2. Trajectory vs. Trajectory Coreset. (a) Trajectory (segments s1 to s6). (b) Trajectory Coreset.
4.2 The Silhouette Coefficient
The silhouette coefficient is a measure of clustering quality. To remain largely independent of the features used for clustering and of the number of clusters produced as a clustering result, our main evaluation is based on the silhouette coefficient [16]. To evaluate the quality of a clustering we compute the average silhouette coefficient of all segments.
Definition 5. (Silhouette Coefficient): Given a coreset based clustering result CS = (c1, c2, · · · , ci, · · · , cn), such that each ci ∈ CS is a cluster of line segments, the distance between a segment sj in cluster ci and that cluster is given by the Euclidean distance in Equation 2 between sj and the centroid of ci:
Dist(sj, ci) = D(sj, ci.centroid)
The distance of segment sj to the other clusters, which do not contain sj, is the minimum of the Euclidean distances to all other clusters’ centroids:
Dist((sj ∈ ci), ck) = Min(D(sj, ck.centroid)) ∀k ≠ i
Then,
$$\mathrm{Silhouette}(s_j) = \frac{\mathrm{Dist}(s_j, c_k) - \mathrm{Dist}(s_j, c_i)}{\mathrm{Max}(\mathrm{Dist}(s_j, c_i),\ \mathrm{Dist}(s_j, c_k))}$$
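A minimal sketch of this centroid-based silhouette is given below. It follows the standard form from [16], (b − a) / max(a, b), with a the distance to the segment's own centroid and b the smallest distance to any other centroid; the function names and the convention of returning 0 when both distances are zero are assumptions made for illustration.

```python
def silhouette(own_dist, other_dists):
    """Silhouette of one segment computed from centroid distances.

    own_dist:    distance from the segment to its own cluster centroid.
    other_dists: distances from the segment to every other centroid.
    """
    nearest_other = min(other_dists)
    denominator = max(own_dist, nearest_other)
    if denominator == 0:
        return 0.0  # assumed convention for the degenerate case
    return (nearest_other - own_dist) / denominator


def average_silhouette(entries):
    """Mean silhouette over (own_dist, other_dists) pairs, one per segment."""
    values = [silhouette(own, others) for own, others in entries]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(silhouette(1.0, [4.0, 9.0]))   # close to own centroid
    print(silhouette(4.0, [1.0]))        # closer to another centroid
```

Because only centroid distances are needed, each segment costs one distance evaluation per cluster rather than one per segment pair.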
The resulting value of the silhouette coefficient of a segment varies between −1 and 1. A value near −1 implies that the segment is clustered badly, while a value near 1 implies that the segment is well clustered. Using this approach to compute the silhouette coefficient, a remarkable reduction in the computation time for each value of the number of clusters is achieved. The algorithm shown in Fig. 3 presents our proposed framework to enhance trajectory clustering using the k-means algorithm by optimally choosing the initial number of clusters (k) using the silhouette coefficient and by speeding up the overall clustering time using the coreset. The idea of building the trajectory coreset relies on ignoring consecutive similar segments. In this stage we consider shape similarity where, as mentioned above, 2 consecutive segments are considered similar if the difference between their slopes is below
Algorithm: k-means Using Coresets(S, C)
Input: S: list of segments in MOD; C: initialized k cluster centroids; δ: accepted deviation between segments’ angles
Output: CList: list of clusters
Coreset = CreateCoreset(S, δ);
foreach (Segment s ∈ Coreset) do
    foreach (Cluster centroid c ∈ C) do
        TempDist = Similarity(s, c);
    end
    ClosestCentroid = Min[TempDist].centroid;
    Cluster = C[ClosestCentroid];
    CList = Update_Centroid(Cluster, s);
end
return CList;
Fig. 3. Clustering Phase

Procedure: Update_Centroid(C, NewSeg)
Input: C: the selected cluster; NewSeg: new segment to be inserted in C
Output: C: updated cluster
C.Insert(NewSeg);
MinDist = Similarity(C.centroid, NewSeg);
foreach (seg ∈ C) do
    TempDist = Similarity(seg, NewSeg);
    if (TempDist ≤ MinDist) then
        C.centroid = NewSeg;
        Break;
    else
        Continue;
end
return C;
Fig. 4. Updating Centroid Procedure

Algorithm: Angle_Similarity(s1.Angle, s2.Angle, tolerance)
Input: s1.Angle, s2.Angle: the angles of two segments s1 and s2; δ: deviation allowance
Output: Diff: angular difference between s1.Angle and s2.Angle
Diff = ∞;
if (s1.Quadrant = s2.Quadrant) then
    if (s1.Angle = s2.Angle ± δ) then
        Diff = |s1.Angle − s2.Angle|;
    end
end
return Diff;
Fig. 5. Angle Similarity Measure
deviation allowance. This in turn means that the new segment has minimal or negligible effect on the trajectory shape (motion pattern). For each trajectory we compute the angles of its segments. The angle of each motion is evaluated using Equation 1. To cluster all trajectories in MOD, we apply the algorithm in Fig. 3. The algorithm tries to find the best match between a new motion and all clusters’ centroids. We denote our proposed approach core-means because we combine the k-means clustering algorithm with the coreset approach. Thus the clustering algorithm is applied to the coreset trajectory segments, not to the original trajectory data. The complete clustering technique proceeds as follows:
– First, we build the coreset from the input trajectory set in MOD.
– Then, we assign segments to clusters by calculating the distance similarity between the new segment and the centroids of all clusters using the Euclidean distance in Equation 2.
– Having computed the distance similarity, we then apply our second measure, the angular measure.
– The centroid of the cluster is then updated based on the new segment assigned to the cluster by calling the Update_Centroid procedure in Fig. 4. This step sets the new segment to be the cluster centroid if it is closer to the cluster segments than the current centroid and consequently has the most similar shape to the other cluster segments.
– Finally, through experiments we run this algorithm many times and calculate the silhouette coefficient for each run to find the best approximation to the original dataset and the optimal number of clusters k.
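The assignment part of these steps (distance first, then direction) can be sketched as follows. This is a simplified illustration rather than the authors' implementation: segments are plain dictionaries with hypothetical keys `"angle"` and `"mid"`, the distance is passed in as a function, and the `max_dist` cutoff used to decide what counts as spatially close is an assumed parameter that the paper does not specify.

```python
import math


def quadrant(angle):
    """Quadrant index 1..4 for an angle in [0, 360]."""
    return min(int(angle // 90) + 1, 4)


def angle_similarity(a1, a2, delta):
    """Angular difference if both angles share a quadrant and differ by
    at most delta; infinity otherwise (mirrors the measure in Fig. 5)."""
    if quadrant(a1) != quadrant(a2):
        return math.inf
    diff = abs(a1 - a2)
    return diff if diff <= delta else math.inf


def midpoint_distance(a, b):
    """Stand-in distance between two segments (assumption: midpoints)."""
    return math.dist(a["mid"], b["mid"])


def assign(segment, centroids, distance, max_dist, delta):
    """Index of the best-matching centroid, or None if nothing matches.

    Step 1 keeps centroids within max_dist of the segment; step 2 keeps
    those with a compatible direction; ties are broken by distance and
    then by angular difference.
    """
    best_index, best_key = None, None
    for index, centroid in enumerate(centroids):
        d = distance(segment, centroid)
        if d > max_dist:
            continue
        a = angle_similarity(segment["angle"], centroid["angle"], delta)
        if math.isinf(a):
            continue
        if best_key is None or (d, a) < best_key:
            best_index, best_key = index, (d, a)
    return best_index


if __name__ == "__main__":
    seg = {"angle": 40.0, "mid": (0.0, 0.0)}
    cents = [
        {"angle": 200.0, "mid": (0.0, 1.0)},  # close, wrong direction
        {"angle": 45.0, "mid": (3.0, 0.0)},   # close, similar direction
        {"angle": 42.0, "mid": (50.0, 0.0)},  # similar direction, too far
    ]
    print(assign(seg, cents, midpoint_distance, max_dist=10.0, delta=10.0))
```

Running such an assignment for several values of k and keeping the run with the highest average silhouette corresponds to the last step of the list.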
5 Experimental Evaluation
In this section we present an experimental evaluation of our proposed clustering technique. The proposed method is tested using a real traffic trajectory set. This dataset represents the movement of 10000 moving objects in London during a one-month period (July 2007) with a 10-second sampling rate [2]. The first experiment compares whole trajectory clustering against our approximated coreset trajectory clustering technique.
Fig. 6. Running time of clustering coreset vs. whole set
Fig. 7. Running time vs. coreset size
Fig. 8. Running time vs. Dataset size. (a) No. of Segments. (b) No. of Trajectories.
Fig. 9. Coreset Size vs. Silhouette Coefficient
The results shown in Fig. 6 are the average performance over 5 runs. The key conclusion is that clustering the whole set takes almost triple the time that the coreset clustering requires. The second experiment works on changing the percentage of the whole set that would be represented by the coreset; we found that when the coreset size gets smaller it takes less time as shown in Fig. 7. Moreover clustering the coreset and whole set takes
more time when the dataset size increases, whether through the number of trajectories or the trajectory length (number of segments per trajectory), because more data must be processed. Nevertheless, whole set clustering still takes about triple the time of coreset clustering, as shown in Fig. 8. On the other hand, we study the efficiency (quality) of the resulting clustering using the silhouette coefficient as our quality measure [16]. As presented in Fig. 9, the silhouette coefficient takes negative values when the coreset size becomes a very small percentage of the whole set. This result is due to the fact that the coreset is only an approximation of the original trajectories and not an exact representation of the whole trajectory. Yet, for a reasonable approximation (larger coreset size per trajectory), positive values are achieved, indicating good clustering.
6 Conclusions and Future Work In this paper we propose a new efficient and fast clustering algorithm for moving object trajectories. The proposed algorithm applies the famous k-means clustering algorithm on a representative subset of the moving object trajectory. This subset is obtained through the use of coresets as a technique to reduce the number of trajectory segments that are considered for clustering. Using coresets, influential segments that cause remarkable changes in the trajectory motion behavior are only added to the coreset for further processing. The proposed algorithm is characterized by its fast running time and its quality of clustering for appropriate coreset sizes. For future work we think that other trajectory clustering approximation techniques can be considered. Also, other data mining techniques can be investigated for better clustering results. We think that further improvements in trajectory approximation and consequently in clustering results can be investigated.
References 1. http://www.environmental-studies.de/projects/projects.html (December 2009) 2. http://www.ecourier.co.uk (November 2009) 3. Agarwal, P.K., Har-Peled, S., Varadarajan, K.R.: Geometric approximation via coresets. In: Combinatorial and Computational Geometry, MSRI, pp. 1–30. University Press, New Haven (2005) 4. Agarwal, P.K., Procopiuc, C.M.: Exact and approximation algorithms for clustering. In: SODA 1998: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 658–667. Society for Industrial and Applied Mathematics, Philadelphia (1998) 5. Arora, S., Raghavan, P., Rao, S.: Approximation schemes for euclidean k-medians and related problems. In: STOC 1998: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 106–113. ACM, New York (1998) 6. Bradley, P.S., Fayyad, U.M.: Refining initial points for k-means clustering. In: ICML 1998: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 91–99. Morgan Kaufmann Publishers Inc., San Francisco (1998) 7. Bradley, P.S., Fayyad, U.M., Reina, C.A.: Scaling clustering algorithms to large databases (1998)
8. Frahling, G., Sohler, C.: A fast k-means implementation using coresets. In: SCG 2006: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, pp. 135–143. ACM, New York (2006) 9. Guting, R.H., Schneider, M.: Moving objects databases (September 2005) 10. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: STOC 2004: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 291–300 (2004) 11. Indyk, P.: Algorithms for dynamic geometric problems over data streams. In: STOC 2004: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, pp. 373–380 (2004) 12. Ishioka, T.: Extended K-means with an efficient estimation of the number of clusters. In: Leung, K.-S., Chan, L., Meng, H. (eds.) IDEAL 2000. LNCS, vol. 1983, pp. 17–22. Springer, Heidelberg (2000) 13. Iwerks, G.S., Samet, H., Smith, K.: Continuous k-nearest neighbor queries for continuously moving points with updates. In: VLDB 2003: Proceedings of the 29th International Conference on Very Large Data Bases, pp. 512–523. VLDB Endowment (2003) 14. Han, M.K.J.: Data mining: Concepts and techniques. In: Proc. ACM Symp. on Principles of Database Systems, p. 770 (2006) 15. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002) 16. Kaufman, L., Rousseeuw, P. (eds.): Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990) 17. Kolliopoulos, S.G., Rao, S.: A nearly linear-time approximation scheme for the euclidean k-median problem. In: Neˇsetˇril, J. (ed.) ESA 1999. LNCS, vol. 1643, pp. 378–389. Springer, Heidelberg (1999) 18. Liu, W., Wang, Z., Feng, J.: Continuous clustering of moving objects in spatial networks. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS (LNAI), vol. 5178, pp. 543–550. 
Springer, Heidelberg (2008) 19. Lloyd, S.P.: Least squares quantization in pcm. Bell Laboratories Internal Technical Report 1(2), 281 (1957) 20. Macqueen, J.B.: Some methods of classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967) 21. Mokhtar, H.M.O., Ossama, O., El-Sharkawi, M.: A time parameterized technique for clustering moving object trajectories. International Journal of Data Mining & Knowledge Management Process (IJDKP) 1(1), 14–30 (2011) 22. Mokhtar, H.M.O., Ossama, O., El-Sharkawi, M.E.: Clustering moving objects using segments slopes. International Journal of Database Management Systems (IJDMS) (accepted for publication - 2011) 23. Moody, J., Darken, C.J.: Some methods for classification and analysis of multi-variate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1(2), p. 281 (1967) 24. Ossama, O., Mokhtar, H.M.O., El-Sharkawi, M.: An extended k-means technique for clustering moving objects. Egyptian Informatics Journal (2011) 25. Pelleg, D., Moore, A.: Accelerating exact k-means algorithms with geometric reasoning. In: KDD 1999: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, pp. 277–281 (1999) 26. Goodman, J.E., Pach, J., Pollack, R. (eds.): Discrete and Computational Geometry. Springer, New York (1995)
Clustering Moving Object Trajectories Using Coresets
233
ˇ 27. Saltenis, S., Jensen, C.S., Leutenegger, S.T., Lopez, M.A.: Indexing the positions of continuously moving objects. In: SIGMOD: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, vol. 29(2), pp. 331–342 (2000) 28. Wolfson, O., Xu, B., Chamberlain, S., Jiang, L.: Moving objects databases: issues and solutions. In: SSDBM: Proceedings of the International Conference on Scientific and Statistical Database Management, pp. 111–122 (1998) 29. Yiu, M.L., Mamoulis, N.: Clustering objects on a spatial network. In: SIGMOD 2004: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 443–454. ACM, New York (2004)
ASIC Architecture to Determine Object Centroids from Gray-Scale Images Using Marching Pixels

Andreas Loos1, Marc Reichenbach2, and Dietmar Fey2

1 Friedrich Schiller University of Jena, Department of Computer Architecture
[email protected]
2 Friedrich Alexander University Erlangen-Nürnberg, Department of Computer Science 3
{Marc.Reichenbach,Dietmar.Fey}@informatik.uni-erlangen.de
Abstract. The paper presents a SIMD architecture that determines the centroids of objects in binary and gray-scale images by applying the Marching Pixels paradigm. The introduced algorithm has emergent and self-organizing properties. A short mathematical derivation of the system behavior is given. We show that the behavior describing the computation of object centroids in gray-scale images is easily derived from that of the binary case. After describing the architecture of a single calculation unit, we address a hierarchical three-step design strategy to generate the full ASIC layout, which is able to analyze binary images with a resolution of 64×64 pixels. Finally, the latencies to determine the object centroids are compared with those of a software solution running on a common medium-performance DSP platform.

Keywords: Marching Pixels, emergence and self-organization, image analysis by moments, image processing in ASIC hardware.
1 Introduction
Marching Pixels (MPs) are an alternative design paradigm, e.g. for analyzing global image attributes such as the zeroth and first moments of objects of arbitrary form and dimension. From these moments the centroid of each object can be determined. The method is inspired by an ant colony, in which each artificial ant has strongly limited complexity but the local interaction of all individuals results in a more complex system behavior. Using the biologically inspired MPs in a two-dimensional SIMD structure, an artificial computer architecture exhibits emergent and self-organizing behavior. The resulting ASIC presented here can be part of an embedded stand-alone machine vision system.

1.1 Emergence vs. Self-organization
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 234–249, 2011. © Springer-Verlag Berlin Heidelberg 2011

An overview of the system characteristics of emergence and self-organization is given in [13] and [3]. A summary of the characteristics reported in [13] is presented here:
– Characteristics of emergence
  • Micro-Macro-Effect: the global system behavior is a result of the interaction of components at the micro-level.
  • Radical novelty: the global system behavior is not deducible from the interactions at the micro-level.
  • Coherence: the elements of the micro-level constitute a coherent whole.
  • Local interacting components: the interaction of micro-level components forms emergents1.
  • Dynamic (in the meaning of latency): the emergents are established after a certain time period or at a certain point in time.
  • Decentralized Control: only local mechanisms at the micro-level control the system; only actions at the micro-level are controlled.
  • Bidirectional Link (between micro- and macro-level): the opposite direction of the micro-macro effect; the established emergent feeds back to the global system behavior.
  • Robustness and flexibility: relative insensitivity of the system to disturbances and failures at the micro-level; components of the micro-level can be replaced by others without destroying the emergence phenomena.
– Characteristics of self-organization
  • Increasing in order: the system is able to transform its structure from an unordered (arbitrary) state to a state of higher order.
  • Autonomy: the absence of any central control instance.
  • Robustness (in the meaning of flexibility): a change of the environmental conditions influences the global system behavior without changing the system characteristics of the components at the micro-level; this robustness implies a variety of behavior patterns of the global system.
  • Dynamic (in the meaning of “to be far from the state of equilibrium”): systems with this characteristic are more strongly affected by outside influences than systems in the state of equilibrium.

At the end of this paper we point out how far our MP architecture conforms to the characteristics listed above.

1.2 Related Work
Image operators that determine global image information form a hard-to-solve class. An example is the computation of centroids and orientations of large image objects with an arbitrary form factor (convex as well as concave). To introduce the problems of extracting these object attributes, some related work is reviewed here.

1 Emergent: the phenomenon that is established at the macro-level due to emergent behavior.
Integrated image processing circuits that obtain the zeroth, first and second moments of image objects were first presented in the early 1990s. An early photosensitive analog circuit with these capabilities is presented in [11]. The individual photocurrents generated by the photodiodes are summed up by a two-dimensional resistor network. The centroid and the orientation can be determined by measuring the voltages (which correspond to the moments) at the network boundaries. A further method, introduced in [12], describes how moments of zeroth to second order of numerous small objects (particles) can be determined in a massively parallel fashion within an FPGA. The computation is based on image projections, while a binary tree of full adders performs a fast accumulation of the moments. A fully embedded, heterogeneous approach is addressed by [10]. The architecture presented in that paper is based on a reduced behavioral description of a Xilinx PicoBlaze processor. Only the operators required to process binary images are realized. In total, 34×26 processing elements are connected to form an SIMD processor that computes a tenth of the pixel resolution of a QVGA image (320×240). The calculation of the bounding boxes of the image objects as well as of the zeroth, first and second moments of each of these partitions is performed in parallel. A Xilinx MicroBlaze processor calculates the object centroids and orientations. Subsequently, a Xilinx Spartan-3 1000 FPGA implementation was realized to apply the architecture to binary images with VGA resolution (640×480). Another approach to extracting global image information is the use of CNNs (Cellular Neural Networks). In [1] a CNN calculates the symmetry axes and centroids of exclusively axially symmetric objects. To apply this method, an object has to be extracted from a given image by a preprocessor. Afterwards the coordinates of each object pixel have to be transformed into polar form.
In [2] Dudek and Geese showed that images can be rotated and mirrored quickly by swapping neighboring pixels, with Marching Pixels using only local communication mechanisms. The present paper concerns the analysis of distributed algorithms that extract global image attributes using a two-dimensional field of calculation units (CUs). Each CU has Von Neumann connectivity to its immediate neighbors. The main difference from CNNs is the capability of these CUs to store states and to change them depending on the states of the CUs in the local neighborhood. The interactions between the CUs cause an information flow beyond the local neighborhoods, which has led to the concept of Marching Pixels [6]. In [8] and [5] object centroids are found by evolving a horizontal line: MPs start to run from the upper and lower edges of each object. While passing through the object vertically, the MPs increment the column sums within the object. When opposing Marching Pixels meet, the upper column sum is added to its lower counterpart and a so-called reduction line emerges. The two points at which the reduction line touches the object edge are defined as the left side and the right side. The evaluation process then proceeds as follows:
– At the left side one MP starts and adds up the left sums located on the reduction line.
– When the MP reaches the opposite object edge, it returns and adds up the right sums in the same way.
– While the MP returns, it calculates the difference between the current right sum and the stored left sum. When an underflow is detected, the object center is found.

This algorithm is useful for convex objects whose form factor is not too small. Otherwise the algorithm may not converge to the object centroid. Some theoretical groundwork for the present paper was done in [9] and [4], especially for the Flooding and the Opposite Flooding algorithms, which, in contrast to the method mentioned before, converge correctly in all cases. Key parts of an implementation of these algorithms in FPGA hardware are presented in [7]. The rest of the paper is organized as follows. Section 2 briefly describes the entire system architecture. A mathematical basis of the MPs is given in Section 3. Chip architecture and implementation details are the focus of Sections 4 and 5.2. Afterwards, the frame execution times of the MP architecture are compared with benchmarks obtained on a mid-priced TMS320 DSP. The paper closes with a short summary and an outlook.
2 System Architecture
Before introducing the chip architecture in more detail, we take a brief look at the characteristic data processing flow of the machine vision system in which the MP processor chip could work. Figure 1 illustrates a generic procedure of the embedded machine vision environment. First, a real scene is captured by an image sensor and converted to digital values (1 → 2). Afterwards the gray-scale image is segmented (2 → 3), yielding a binary image representation.
Fig. 1. Data processing flow
For step 1 → 2 we can employ a commercial CMOS sensor with an integrated ADC unit. Step 2 → 3 is only required if the MP architecture supports binary image computation only. This step can be performed by hardwired algorithms implemented in a low-priced, medium-class FPGA (field-programmable gate array). The steps 2 → 4 and 3 → 4, respectively, are performed by our MP-ASIC, which is the main subject of this paper.
3 Mathematical Basis

3.1 Image Segmentation
Let I be an image consisting of a set of pixels $(P_{i,j})$ arranged in the following way:

\[
I = \begin{pmatrix}
P_{0,0} & \cdots & \cdots & P_{i,0} & \cdots & \cdots & P_{m,0} \\
\vdots & & & \vdots & & & \vdots \\
P_{0,j-1} & & & P_{i,j-1} & & & P_{m,j-1} \\
P_{0,j} & \cdots & P_{i-1,j} & P_{i,j} & P_{i+1,j} & \cdots & P_{m,j} \\
P_{0,j+1} & & & P_{i,j+1} & & & P_{m,j+1} \\
\vdots & & & \vdots & & & \vdots \\
P_{0,n} & \cdots & \cdots & P_{i,n} & \cdots & \cdots & P_{m,n}
\end{pmatrix}
\]

where a pixel is defined as $P_{i,j} = (i, j, x_{i,j})$ and $x_{i,j}$ is the gray value of that pixel. To map a pixel at the coordinates i, j to its gray value we define the function $v(P_{i,j}) = x_{i,j}$. The segmentation of gray-scale images can be carried out in different ways depending on the image processing application. The easiest way to create a binary image from a given gray-scale image is to compare the gray value $v(P_{i,j})$ of each pixel P with an adjustable threshold S:

\[
g(P_{i,j}) =
\begin{cases}
\text{'0'}, & \text{if } v(P_{i,j}) < S; \\
\text{'1'}, & \text{else.}
\end{cases}
\tag{1}
\]

To get feasible results when executing MP algorithms on gray-scale images, the following method is recommended:

\[
g^*(P_{i,j}) =
\begin{cases}
\text{'0'}, & \text{if } v(P_{i,j}) < S; \\
v(P_{i,j}), & \text{else.}
\end{cases}
\tag{2}
\]

All pixels below the threshold are assigned '0' (image background), whereas all object pixels keep their gray value.
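Equations (1) and (2) translate directly into code. The following Python sketch is only an illustration: it assumes the image is stored as a list of rows, so that `image[j][i]` holds the gray value of pixel (i, j), and the function names `segment_binary` and `segment_gray` are hypothetical.

```python
def segment_binary(image, threshold):
    """Eq. (1): pixels below the threshold become 0, all others 1."""
    return [[0 if value < threshold else 1 for value in row] for row in image]


def segment_gray(image, threshold):
    """Eq. (2): pixels below the threshold become 0, all others keep their gray value."""
    return [[0 if value < threshold else value for value in row] for row in image]


# Small 3x3 test image with a bright object in the upper right corner.
image = [
    [10, 12, 200],
    [11, 180, 220],
    [9, 8, 7],
]
S = 100  # adjustable threshold

binary = segment_binary(image, S)
gray = segment_gray(image, S)
```

With the threshold chosen here, the three bright pixels form the object; the second variant keeps their gray values so that they can later act as weights in the moment computation.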
3.2 Bounding Box Computation
The Flooding algorithm is characterized by the generation of a bounding box for each image object. For a pixel $P_{i,j}$ the pixels of the Von Neumann neighborhood are denoted as $\{P_{i,j-1}, P_{i-1,j}, P_{i+1,j}, P_{i,j+1}\}$. In the case of a binary image, all pixels $P_O \in \{0,1\}$ have the attribute of being located within the bounding box of an object O. All pixels $P_w$, denoted as the wave, are a starting point of any MP. Using binary operations between $P_{i,j}$ and the pixels $P_{i,j-1}$, $P_{i-1,j}$, $P_{i+1,j}$ and $P_{i,j+1}$, the binary value of $w_O$ can be determined as follows:

\[
w_O(P_{i,j}) =
\begin{cases}
\text{'1'}, & \text{if } g(P_{i,j}) = \text{'1'}; \\
\bigl(g(P_{i,j-1}) \vee g(P_{i,j+1})\bigr) \wedge \bigl(g(P_{i-1,j}) \vee g(P_{i+1,j})\bigr), & \text{else.}
\end{cases}
\tag{3}
\]

In the case of gray-scale images the segmentation should be carried out by (2). Afterwards one possibility to compute the bounding box is the following:

\[
w^*_O(P_{i,j}) =
\begin{cases}
\text{'1'}, & \text{if } g^*(P_{i,j}) > 0; \\
\bigl(g^*(P_{i,j-1}) > 0 \vee g^*(P_{i,j+1}) > 0\bigr) \wedge \bigl(g^*(P_{i-1,j}) > 0 \vee g^*(P_{i+1,j}) > 0\bigr), & \text{else.}
\end{cases}
\tag{4}
\]
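The following Python sketch applies rule (3) to a small binary image. It assumes, beyond what the equations state explicitly, that the rule is iterated synchronously on the evolving wave state w (initialized with g) until nothing changes, and that pixels outside the image count as '0'. The function name `flood_bounding_box` is hypothetical.

```python
def flood_bounding_box(g):
    """Iterate the neighborhood rule until the wave state w is stable.

    g is a list of rows with g[j][i] in {0, 1}.
    Assumption: the rule is re-applied to w itself (not only to g),
    and pixels outside the image are treated as 0.
    """
    rows, cols = len(g), len(g[0])
    w = [row[:] for row in g]

    def at(state, i, j):
        return state[j][i] if 0 <= i < cols and 0 <= j < rows else 0

    changed = True
    while changed:
        changed = False
        nxt = [row[:] for row in w]  # synchronous update of all cells
        for j in range(rows):
            for i in range(cols):
                if w[j][i]:
                    continue
                vertical = at(w, i, j - 1) or at(w, i, j + 1)
                horizontal = at(w, i - 1, j) or at(w, i + 1, j)
                if vertical and horizontal:
                    nxt[j][i] = 1
                    changed = True
        w = nxt
    return w


# L-shaped object inside a 5x5 image.
g = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
w = flood_bounding_box(g)
```

For this L-shaped object the wave fills the concave corner step by step and stops once the 3×3 bounding rectangle is complete; pixels outside the rectangle never gain both a vertical and a horizontal neighbor.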
The edges of the bounding box span a local Cartesian coordinate system separately for each object, which is also called the local calculation area of the object (see figure 2). The rightmost x-coordinate (ri) and the bottommost y-coordinate (bj) of the calculation area of an object O are then defined by:

\[
ri_O = \max_i \{\, i \mid w_O(P_{i,j}) = 1 \,\}, \qquad
bj_O = \max_j \{\, j \mid w_O(P_{i,j}) = 1 \,\}.
\tag{5}
\]

$Er_O$ and $Eb_O$ are the sets of all right edge pixels $P_{ri_O,j}$ and of all bottom edge pixels $P_{i,bj_O}$, respectively, which have to be computed by a local 3×3 edge filter applied to the object's bounding box:

\[
Er_O = \{\, P_{ri_O,j} \mid \forall j\; w_O(P_{ri_O,j}) = 1 \,\}, \qquad
Eb_O = \{\, P_{i,bj_O} \mid \forall i\; w_O(P_{i,bj_O}) = 1 \,\}.
\tag{6}
\]

For gray-scale images $ri^*_O$, $bj^*_O$, $Er^*_O$ and $Eb^*_O$ are defined in the same way using the $g^*$ function (2). The subsequent integer arithmetic computation steps are subdivided into forward and backward calculation. The forward calculation provides the zeroth and the first moments in x and y direction. During the backward calculation the centroid pixel emerges by evaluating the zeroth and the first moments.
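For reference, the quantities delivered by the forward calculation can be written down as plain sums. The Python sketch below is a sequential software model, not the distributed scheme of the chip. It assumes that the moments are taken relative to the upper left corner of the calculation area and follows the paper's naming, in which m10 is the first moment in the vertical and m01 the first moment in the horizontal direction. The function name `moments` is hypothetical.

```python
def moments(area):
    """Return (m00, m10, m01) for a calculation area given as rows area[j][i]."""
    m00 = m10 = m01 = 0
    for j, row in enumerate(area):
        for i, value in enumerate(row):
            m00 += value       # zeroth moment: object mass
            m10 += j * value   # first moment, vertical direction
            m01 += i * value   # first moment, horizontal direction
    return m00, m10, m01


# Gray-scale object after segmentation with g*; background pixels are 0.
area = [
    [0, 0, 0],
    [0, 2, 4],
    [0, 0, 2],
]
m00, m10, m01 = moments(area)
centroid_y = m10 / m00
centroid_x = m01 / m00
```

For a binary image the same function applies unchanged, since every object pixel then contributes a weight of 1.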
Fig. 2. Gray-scaled image object (O) enclosed by an object-related coordinate system denoted as local calculation area
3.3 Forward Calculation

The mathematical derivation of the forward calculation in the case of binary images, including the distributed computation of the zeroth ($m_{00}$) and the first moments in horizontal ($m_{01}$) and vertical direction ($m_{10}$), is already described in [7] and [9]. For gray-scale images the binary pixel value $g(P)$ has to be replaced with $g^*(P)$. Figure 3 shows the scheme to cumulate the zeroth moment sum (object mass) beginning from the upper left towards the bottom right edge.

Fig. 3. Distributed computation of the zeroth moment

3.4 Backward Calculation
To avoid area-consuming division operators, the backward computation process is distributed in a similar way as shown in figure 3. The two-dimensional topology can easily be used to substitute the division by successive subtractions. For this purpose the backward calculation process (here only shown for the y-coordinate) starts with the initialization of two CU registers in the bottom right pixel $P_{O_{n,m}}$ of the local calculation area. The registers are denoted as n (numerator) and d (denominator):

\[
\left.
\begin{aligned}
n_y(n,m) &= m_{10}(n,m) \\
d(n,m) &= m_{00}(n,m)
\end{aligned}
\right\}
\quad \text{if } P(i,j) \in Er_O \wedge P(i,j) \in Eb_O.
\tag{7}
\]

Furthermore, all edge state registers $e_y(i,j)$ receive a logical '1' if j = n.

Note 1. The content of register d is equal for both the x- and the y-coordinate. Therefore no index is required.

Finally, the pixel $P_{O_{n,m}}$ remains and has to be successively shifted into the object's center. In fact it virtually “marches” into the centroid while carrying the registers $n_y$ (resp. $n_x$) and d. The values of these registers are changed in the following way, depending on their current position (i, j):

\[
n_y(i,j) =
\begin{cases}
n_y(i+1,j), & \text{if } e_y(i,j) = \text{'1'} \\
n_y(i,j+1) - d(i,j+1), & \text{if } 2d(i,j+1) < n_y(i,j+1) \\
0, & \text{else;}
\end{cases}
\tag{8}
\]

\[
d(i,j) =
\begin{cases}
d(i+1,j), & \text{if } e_y(i,j) = \text{'1'} \\
d(i,j+1), & \text{else.}
\end{cases}
\tag{9}
\]

3.5 Centroid Calculation
The CU register $mid_y(i,j)$ is set to a logical '1' or '0' according to the following conditions:

\[
mid_y(i,j) =
\begin{cases}
\text{'0'}, & \text{if } d(i,j) = 0 \\
\text{'0'}, & \text{if } n_y(i,j+1) = \text{'0'} \\
\text{'1'}, & \text{if } 2n_y(i,j) < d(i,j) \\
\text{'0'}, & \text{else.}
\end{cases}
\tag{10}
\]

If $mid_y(i,j)$ is a logical '1', the corresponding pixel P(i, j) is located at the y-coordinate of the center-of-mass axis.

Note 2. The backward calculation in x-direction is carried out in exactly the same way; only the coordinate label has to be changed.

The centroid is determined by a logical AND of $mid_y(i,j)$ and $mid_x(i,j)$:

\[
mid(i,j) = mid_y(i,j) \wedge mid_x(i,j).
\tag{11}
\]

In fact the centroids mid(i, j) are binary result pixels located in a Cartesian coordinate system (denoted as “centroid image”), just like the original data. A step-by-step computation example for an object with a calculation area of 4×4 pixels can be found in [4].
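The core idea of the backward calculation, replacing the division of a first moment by the zeroth moment with successive subtractions while moving one position per step, can be illustrated with a scalar Python sketch. It is a simplified model under stated assumptions: it performs plain floor division and ignores the edge registers as well as the exact comparison and rounding conditions of (8) and (10). The function name `steps_by_subtraction` is hypothetical.

```python
def steps_by_subtraction(numerator, denominator):
    """Divide by repeated subtraction, one subtraction per marching step.

    Returns (steps, remainder) such that
    numerator == steps * denominator + remainder.
    A zero denominator (empty object) yields no steps.
    """
    if denominator == 0:
        return 0, numerator
    steps = 0
    remainder = numerator
    while remainder >= denominator:
        remainder -= denominator  # one hop from CU to neighboring CU
        steps += 1
    return steps, remainder


# First moment 10 and object mass 4: the quotient is reached after 2 hops.
steps, remainder = steps_by_subtraction(10, 4)
```

Only a subtracter and a comparator are needed per step, which is the reason this scheme avoids a dedicated divider.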
4 Chip Architecture

4.1 Global Data Path
Binary Image Processing. The low pixel resolution (64×64) of the initial test chip design allows the image data din to be read in in a line-parallel fashion (see figure 4). The local communication logic of a CU cell enabled for binary pixel computation is shown at the right side of figure 4. The chip operates in two global modes: either data input/output or data computation. An active shift signal switches the chip to the input mode, in which external image data are transferred and stored in the internal CU pixel registers.

Fig. 4. Global datapath within the CU-array, gray: data IO modules
Each CU-ALU cell contains one 1-bit register2 (denoted as DFF), in which either the value qpix/qmiddle_state from the upper neighbor or the internal middle_state_i signal value can be stored, depending on the global mode state. During the read-in phase the signal shift is active, and each rising edge of the clock signal (clk) leads to a synchronous capture of image data through the module Data_In_Control. The pixel registers are concatenated to a register chain, leading to a vertical data transport from the upper to the lower edge of the CU array. When the image capturing process is done, the signal run enables all CUs to read out the internally stored pix value and to compute the centroids (computation phase). After the computation latency has expired, the centroid image data are available at the AU outputs. When the global data communication path is released to its active state, the DFFs are chained again and the synchronous data output can occur via the Data_Out_Control unit by enabling the dout signal. At the same time new image data can be read into the CU array.

2 If the input image and the resulting centroid image are identical in size, the read-in and write-out procedures can be carried out simultaneously. Therefore one data register can be saved, which leads to only one shared input/output DFF.
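The read-in phase can be modeled in a few lines of Python. The sketch assumes a behavior that the text only implies: on every clock edge with shift active, one image line enters at the top edge and every stored line moves down by one CU row, so the bottom image line must be supplied first. The class name `CUArrayModel` and its methods are hypothetical.

```python
class CUArrayModel:
    """Behavioral model of the vertical pixel-register chain."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.pix = [[0] * cols for _ in range(rows)]

    def clock(self, din):
        """One rising clock edge with shift active.

        din is the line applied to the input pads; the line leaving
        the bottom edge is returned, as it would appear at dout.
        """
        if len(din) != self.cols:
            raise ValueError("din must provide one bit per column")
        dout = self.pix[-1]
        self.pix = [list(din)] + self.pix[:-1]
        return dout

    def load(self, image):
        """Shift a complete image in, bottom line first."""
        for line in reversed(image):
            self.clock(line)


image = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
array = CUArrayModel(rows=3, cols=3)
array.load(image)
```

Because input and output share one chain in this model, clocking in a new image returns the previous contents line by line, which corresponds to the simultaneous read-in and write-out described in the footnote.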
Gray-Scale Image Processing. To simplify the global data communication, the gray-scale computation CU has separate data paths for the image and centroid data transport (see figure 5). The single-bit CU has to be replaced with the module shown in figure 5, which is able to process n-bit pixel values. The read-in phase is characterized by the storage of the n-bit image data in the pix register. During the computation phase the stored bits rotate in counter-clockwise direction within the pix register. The internal ALU operands have larger bit widths than the pixel operand. Therefore a constant number of serial zero bits has to be inserted before the pixel's LSB can be computed. The number of inserted bits depends on the pixel gray-scale depth and the image resolution. Once the computation of the centroids is done, the centroid image is output as explained above.

Fig. 5. Local datapath within a gray-scale pixel computing CU

4.2 Local Datapath
The local data path of the MP architecture is characterized by orthogonal serial data connections between the CUs. Figure 6 shows a black-box representation of a CU. The figure depicts all input data from neighboring CUs that are needed to compute the y-coordinate of the object's centroid.

Fig. 6. Data inputs of the Flooding-CU (inputs shown: P0; P2, m00|2; P4, w4; P6, w6, ny|6; P8, m00|8, m10|8, sumx|8)
Note 3. For better readability the indexes of the Von Neumann neighborhood are substituted as follows (compare subsection 3.1): (i, j) → 0; (i, j − 1) → 2; (i + 1, j) → 4; (i, j + 1) → 6; (i − 1, j) → 8.

The data flow diagram shown in figure 7 exemplifies the forward calculation process that computes all required values in y direction. The inputs are the pixels of the Von Neumann neighborhood (denoted by the last index of the signal names) and the CU's own pixel value (P0). The results are the bounding box state w0 as well as the zeroth (m00) and the first moment (m10). All gray-boxed signal names are output registers; signal names without a box are the CU inputs already shown in figure 6.
Fig. 7. Data flow diagram (forward calculation)
To carry out the backward calculation, the moments produced by the forward calculation and the bounding box state of the current pixel position are required. In addition, the edge information ex|0 and ey|0 has to be derived from the bounding box states w4 and w6 (right and bottom pixel neighbors). Figure 8 shows the data dependencies in the same way as figure 7.
Fig. 8. Data flow diagram including the edge detection (backward calculation)
5 Prototype Chip

5.1 Design Considerations
Before the global chip architecture could be established, the capability to implement the algorithms mentioned before had to be investigated. The results of the design analysis are the following:
– All CU arithmetic units (adders, subtracters, comparators) are implemented as bit-serial modules. The main reason is that the required area is significantly smaller than that of a bit-parallel CU implementation. Once the CU chip layout is clean for a given gray-scale pixel resolution, it is reusable for any image resolution. The CU design no longer depends on the array resolution.
– A hierarchical design strategy in three phases is recommended. The CU itself, a line of CUs, and the CU-array composed of CU-lines including the chip IO resources are designed separately as hardware blocks.
– A flat design strategy is not recommended due to congestion problems during the physical chip routing process.

5.2 Physical Implementation
The physical implementation of the MP chip was done for the binary image processing flow described in subsection 4.1 by applying the design considerations listed above. In this case the bit-serial CU has to be designed only once. The behavioral description of the entire chip is designed in a consistently generic way for both the vertical and the horizontal MP array resolutions n and m. Table 1 summarizes the layout results for one CU, one CU-line and the entire 64×64 MP chip design driven by a 50 MHz clock.

Table 1. Layout parameters of the chip modules using a 90 nm CMOS technology

Parameter                      CU       CU-line   Chip
Ports, inputs: data            18       640       64
Ports, inputs: control         6        6         4
Ports, outputs: data           12       768       64
Ports, outputs: control        -        -         1
Used metal layers              3        6         7
Standard cells/macro blocks    172      25/64     767/64
Gates                          536      34413     2207516
Physical height/µm             41.22    45.14     3636
Physical width/µm              41.60    2755.00   3424
Critical path latency/ns       1.48     2.59      9.67
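The design considerations in subsection 5.1 name bit-serial adders, subtracters and comparators as the arithmetic units of a CU. As an illustration of that general technique (a generic textbook structure, not the chip's actual netlist), the Python sketch below models an LSB-first bit-serial adder: one full adder plus a single carry flip-flop processes one bit pair per clock cycle. The function names, the LSB-first ordering and the 12-bit operand width are assumptions made for this example.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two equally long LSB-first bit sequences, one bit pair per cycle."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    carry = 0  # state of the carry flip-flop
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def to_bits(value, width):
    """LSB-first bit list of a non-negative integer."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Integer value of an LSB-first bit list."""
    return sum(bit << k for k, bit in enumerate(bits))


# An 8-bit pixel value (200) is zero-extended to an arbitrarily chosen
# 12-bit operand width before it is added to a running sum (3000).
width = 12
total_bits, carry_out = bit_serial_add(to_bits(200, width), to_bits(3000, width))
total = from_bits(total_bits)
```

The adder logic itself is independent of the operand width; wider operands only take more clock cycles, which fits the area argument given in the design considerations.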
Figure 9 shows the resulting prototype chip as a GDSII database representation. The square chip core is formed by 64×64 MP calculation units. The pad ring consists of the data input and output pads (64 pads each) located at the top and bottom chip edges. The left and right pad ring segments contain eight pairs of core power supply pads (four each).
Fig. 9. Chip prototype layout (without bonding pads)
6 Benchmark Comparison
To demonstrate the performance of chip architectures derived from our MP design strategy, a comparison with simulation results for two different TMS320 DSP platforms was carried out. The older C6416 as well as the current DaVinci platform (DM6446) were simulated at a virtual CPU clock of 500 MHz. For this purpose a software benchmark running on the DSP core models was created in manually optimized C code. The computation of centroids is based on projections of binary and gray-scale image objects (with a bit width of eight), where the algorithmic approach is similar to [10]. In addition to the absolute worst-case latencies shown in figure 10, the achieved speedups (figure 11) are plotted as a function of the squared worst-case object3 resolutions $N^2$.

Fig. 10. MP-latencies vs. DSP-benchmarks for squared worst case objects (panel title: Latencies; series: bin, gray)

3 The MP algorithm's worst case is the largest possible L-shaped image object with a width of exactly one pixel.
Fig. 11. Speedups for squared worst case objects (panel title: Speedup)

Table 2. MP architectures versus emergence and self-organization

Attribute                        True?        Comment
Emergence
  Micro-Macro-Effect             yes
  Radical novelty                yes
  Coherence                      yes
  Local interacting components   yes
  Dynamic (latency)              yes
  Decentralized control          yes
  Bidirectional link             no           no feedback of the emergent
  Robustness and flexibility     limited      robustness against disturbances in image objects
Self-organization
  Increasing in order            irrelevant   no design target
  Autonomy                       yes
  Robustness (flexibility)       yes          no change of the local CU behavior
  Dynamic4                       irrelevant   self-contained during an MP processing cycle

7 Conclusions and Outlook
This paper gives an architecture overview as well as the design strategy used to realize a Marching Pixels prototype chip. The Marching Pixels concept is an alternative design paradigm for determining global image information in an elegant fashion using a dedicated hardware platform. We showed that an array of simply structured hardware modules is able to extract centroids of image objects of any shape and size faster than commonly used DSP platforms.

4 In the meaning of “to be far from the state of equilibrium”.
We note that the benchmark comparison results are based on serial data input and the largest possible worst-case object, which depends on the MP array resolution. The computation latencies decrease dramatically when the line-parallel data input scheme supported by our chip architecture is used and when the image contains several objects with smaller form factors than the worst-case object. Finally, table 2 compares the attributes of emergence and self-organization defined in [13] with the properties of our MP architectures. At the moment the emergent sends no feedback to the locally interacting MPs. Therefore no termination condition exists that stops the algorithm while it computes a given image. The algorithm always has to be active for the time needed to compute the worst-case object. A remedy for this drawback has already been found: projection data in x and y have to be constantly stored in register banks at the CU-array boundaries. If the centroids remain at their positions, these data registers no longer change. This is the signal to terminate the algorithm and to output the centroid image. Applying this mechanism to the present MP architecture supplements the system characteristics with the still missing bidirectional link.
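The proposed termination criterion amounts to detecting a fixed point of the boundary projections. The Python sketch below illustrates this under stated assumptions: the projections are taken as row and column sums of the current centroid image, and `step` stands for one hypothetical update cycle of the array. All names, including the toy update `shift_right_once`, are illustrative and not part of the described architecture.

```python
def projections(image):
    """Row and column sums, as register banks at the array boundary might hold them."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols


def run_until_stable(image, step, max_cycles):
    """Apply step() until the projections stop changing.

    Returns the final image and the number of cycles executed.
    """
    previous = projections(image)
    for cycle in range(1, max_cycles + 1):
        image = step(image)
        current = projections(image)
        if current == previous:
            return image, cycle
        previous = current
    return image, max_cycles


def shift_right_once(image):
    """Toy update: move a marker one column to the right until it reaches column 2."""
    out = []
    for row in image:
        if 1 in row and row.index(1) < 2:
            k = row.index(1)
            row = row[:k] + [0, 1] + row[k + 2:]
        out.append(list(row))
    return out


start = [[1, 0, 0, 0]]
final, cycles = run_until_stable(start, shift_right_once, max_cycles=10)
```

In the toy run the marker moves twice and then rests, so the third cycle leaves the projections unchanged and the loop stops, well before the fixed upper bound that would otherwise have to be waited for.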
Acknowledgement

The work was partially supported by funding from the Application Centre of the Embedded Systems Initiative, a joint institution of the Fraunhofer Institute for Integrated Circuits and the Interdisciplinary Centre for Embedded Systems (ESI) at Friedrich-Alexander-University Erlangen-Nuremberg, supported by the Bayerisches Staatsministerium für Wirtschaft, Infrastruktur, Verkehr und Technologie.
References

1. Costantini, G., Casali, D., Perfetti, R.: A new cnn-based method for detection of symmetry axis. In: 10th International Workshop on Cellular Neural Networks and Their Applications, CNNA 2006, pp. 1–4 (August 2006)
2. Dudek, P., Geese, M.: Autonomous long distance transfer on simd cellular processor arrays. In: Proceedings of the 2010 12th IEEE International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA 2010) (January 2010)
3. Eisenberg, W., Fey, D., Meiß, K. (eds.): Naturwissenschaftliche und technische Systeme im Fokus von Fremd- und Selbstorganisation: Symposium 2005. Leipziger Universitätsverlag (December 2007)
4. Fey, D., Gaede, C., Loos, A., Komann, M.: A new marching pixels algorithm for application-specific vision chips for fast detection of objects’ centroids. In: PDCS 2008: Proceedings of the Parallel and Distributed Computing and Systems. ACTA Press (November 2008)
5. Fey, D., Komann, M., Schurz, F., Loos, A.: An organic computing architecture for visual microprocessors based on marching pixels. In: IEEE International Symposium on Circuits and Systems, ISCAS 2007, pp. 2686–2689 (May 2007)
6. Fey, D., Schmidt, D.: Marching pixels: a new organic computing paradigm for smart sensor processor arrays. In: CF 2005: Proceedings of the 2nd Conference on Computing Frontiers, pp. 1–9. ACM, New York (2005)
7. Gaede, C.: Vergleich verteilter Algorithmen für die industrielle Bildvorverarbeitung in FPGAs. Master’s thesis, Friedrich-Schiller-Universität Jena (December 2007)
8. Komann, M., Fey, D.: Marching pixels - using organic computing principles in embedded parallel hardware. In: PARELEC 2006: Proceedings of the International Symposium on Parallel Computing in Electrical Engineering, pp. 369–373. IEEE Computer Society, Washington, DC, USA (2006)
9. Komann, M., Kröller, A., Schmidt, C., Fey, D., Fekete, S.P.: Emergent algorithms for centroid and orientation detection in high-performance embedded cameras. In: CF 2008: Proceedings of the 2008 Conference on Computing Frontiers, pp. 221–230. ACM, New York (2008)
10. Schurz, F., Fey, D.: A programmable parallel processor architecture in fpgas for image processing sensors. In: Integrated Design & Process Technology, Society for Design and Process Science, pp. 30–35 (June 2007)
11. Standley, D.L.: An object position and orientation ic with embedded imager. IEEE Journal of Solid-State Circuits 26, 1853–1859 (1991)
12. Watanabe, Y., Komuro, T., Kagami, S., Ishikawa, M.: Parallel extraction architecture for information of numerous particles in real-time image measurement. Journal of Robotics and Mechatronics 17(4) (2005)
13. De Wolf, T., Holvoet, T.: Emergence versus self-organisation: Different concepts but promising when combined. In: Brueckner, S., Serugendo, G.D.M., Karageorgos, A., Nagpal, R. (eds.) ESOA 2005. LNCS (LNAI), vol. 3464, pp. 1–15. Springer, Heidelberg (2005)
Learning Digital Cashless Applications with the Consolidation of Authenticity, Confidentiality and Integrity Using Sequence Diagrams
Ajantha Herath1, Suvineetha Herath1, Hasan Yousif Kamal1, Khalid Ahmed Al-Mutawah1, Mohammed A.R. Siddiqui1, and Rohitha Goonatilake2
1 Information Systems Department, College of Information Technology, P.O. Box 32038 University of Bahrain, Kingdom of Bahrain
2 Department of Engineering, Mathematics, Texas A&M International University, USA
Abstract. Recent surges in e-cashless transactions, attacks and intrusions reflect vulnerabilities in computer networks. Innovative methods and tools can help reinforce defenses, prevent attack propagations, detect and respond to such attacks. We have been experimenting with methods for introducing important concepts related to e-cashless transactions and improving undergraduate curricula and research experiences for Computer Science and Information Systems students. To achieve this goal, sequence diagrams which represent the progression of events over time are introduced to our students. This paper describes a learning module developed to help students understand integration of confidentiality, integrity and authentication into modeling web applications to prevent unscrupulous attacks using the secure electronic transaction protocol [1]. Keywords: sequence diagrams, e-cashless transactions, transaction protocol.
1 Introduction An evolution of the e-World has taken place during the last two decades. During this period, postal mail became e-mail, cash transactions became cashless, commerce transformed into e-commerce, libraries became digital libraries, education is becoming online education, banking became online banking, and news, TV and games became online entertainment. E-commerce transactions, electronic fund transfers, digital cash and debit cards are transforming society into a cashless society. E-commerce can be defined as developing an electronic business service or the buying and selling of products or services over the internet. An electronic transaction is an agreement between a buyer and a seller using the internet. Electronic payments provide audit trails that can be used for many other purposes besides the intended purpose of recording money transfers, destinations and locations. As soon as a computer starts to share the resources available on the web or local network, it immediately becomes vulnerable to attacks or infiltration. Confidentiality ensures that information is protected from unauthorized listening, reading or exposure.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 250–259, 2011. © Springer-Verlag Berlin Heidelberg 2011
It guarantees privacy and no loss of information from the client or the server. Integrity assures that the data or messages received are the same as those sent by an authorized person, with no modification of data or messages and no impersonation. The modification of data, memory or messages is related to writing and affects integrity. Authentication helps identify the user. The consequence of the misrepresentation of a user can be impersonation and forgery. Authentication is the validation that the communicating entity’s identity is the one that it claims to be. The process of converting a plain text message to cipher text so that unintended parties cannot read the message is called encryption. Confidentiality, integrity and authentication are achieved through the encryption of the message. To provide integrity, a message is transformed into a fixed-size message digest using an encryption function or a specialized function. Authentication is implemented through encryption, signatures and certificates. Single-key, session-key or symmetric-key cryptography uses one secret key for both encryption and decryption. Faster symmetric key encryption algorithms such as the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are popular for encrypting larger amounts of data. To conduct transactions with consumers, the merchant needs to generate a unique key for each consumer and send it over a separate secure channel. Public-key or asymmetric-key cryptography uses a pair of public and private keys. The private key is kept secret whereas the public key is distributed for use by multiple parties. The RSA algorithm is the most popular among asymmetric cryptosystems. A message encrypted with the public key can be decrypted only with the corresponding private key, and it is computationally infeasible to derive the private key from the available public key. This provides confidentiality, as only the party with the private key can convert the message to plaintext.
RSA is not often used to encrypt large amounts of data because the time taken to encrypt a message grows roughly as the cube of the number of bits of the modulus. The merchant generates a pair of RSA private/public keys, and the public key distributed to the consumers is used during transactions. Digital signatures are used to provide authenticity. A message signed with the merchant’s private key can be verified by any consumer who has access to the merchant’s public key. This provides confirmatory evidence that the signed message has not been tampered with by any unauthorized party. A public key certificate contains the identity of the certificate holder, such as the name, and the digital signature of the certificate issuing authority. A public key certificate is used to validate the sender’s identity. The certification authority attests that the public key indeed belongs to the sender [1, 2, 3]. The remainder of this paper is organized as follows. Section 2 provides a brief description of an e-commerce transaction. Section 3 gives a simple definition of a sequence diagram. Section 4 presents the derivation of a sequence diagram from the transaction. Section 5 describes major threats that might be seen in an e-commerce transaction. It also discusses five major security concepts that can be used to avoid those threats. Section 6 presents some details of the integration of confidentiality into the transaction using the SET protocol. Section 7 presents the integration of confidentiality, integrity and authentication into the transaction. Section 9 presents other related work. Section 10 concludes the paper.
2 E-Cashless Transactions The major players of electronic cashless transactions are clients, internet service providers, the merchant’s servers, the client’s and merchant’s banks, warehouses and delivery services. In a transaction diagram, the major players are represented by nodes, and the messages transferred are represented by labeled directed arcs. The purchase of goods from the internet can be represented as a transaction diagram as shown in Figure 1. In this diagram each link is numbered to represent the order of the progression of events and communications. Label 1 represents the client sending the message to an internet service provider. Label 2 represents the ISP sending the message to the merchant’s web server, located on the internet. Label 3 denotes the merchant’s web server communicating with the e-commerce server. Label 4 shows the merchant’s e-commerce server communicating with the payment gateway and the database server. Label 5 depicts the payment gateway communicating with the client’s bank. Label 6 shows the payment gateway communicating with the merchant’s bank. Label 7 denotes the merchant’s e-commerce server sending a message to the warehouse. Label 8 represents the warehouse sending goods to the delivery service. Label 9 shows the delivery service sending goods to the client.
Fig. 1. E-Cashless Transaction Diagram
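The node-and-arc description of Figure 1 can be sketched as a small data structure. This is only an illustration: the node names are hypothetical labels chosen for this sketch, and label 4 is modeled as two arcs because the text names two targets for it.

```python
# Labeled directed arcs of the transaction diagram, as described for Figure 1.
# Node names are hypothetical identifiers, not taken from the figure itself.
ARCS = [
    (1, "client", "isp"),
    (2, "isp", "web_server"),
    (3, "web_server", "ecommerce_server"),
    (4, "ecommerce_server", "payment_gateway"),
    (4, "ecommerce_server", "database_server"),
    (5, "payment_gateway", "client_bank"),
    (6, "payment_gateway", "merchant_bank"),
    (7, "ecommerce_server", "warehouse"),
    (8, "warehouse", "delivery_service"),
    (9, "delivery_service", "client"),
]


def successors(node):
    """Return the (label, target) pairs of arcs leaving `node`, sorted by label."""
    return sorted((label, dst) for label, src, dst in ARCS if src == node)


def reachable(start):
    """Return every node that can be reached from `start` by following arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, dst in successors(node):
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen
```

Calling `reachable("client")` returns a set that contains `"client"` again, which reflects the closed loop from the order (label 1) to the delivery of goods (label 9).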
3 Sequence Diagrams In a sequence diagram the agents involved in the transaction are listed from left to right, messages are represented by directed arcs and time is presented from top to bottom. The transaction diagram can easily be transformed into a sequence diagram that shows the agents along the horizontal axis and the message transmissions, each in its own time slot, along the vertical axis, progressing from top to bottom [4,5].
4 Sequence Diagrams from Transactions Figure 2 illustrates the sequence diagram for the transaction described in section 2 above. The client first sends payment and order information to the merchant’s server via his or her internet service provider. Then the merchant’s server sends payment information to the client’s bank. The client’s bank then sends payment to the merchant’s bank. Payment confirmation will be issued by the merchant’s bank to the merchant’s server. Thereafter the payment and order confirmation will be sent to the client by the merchant’s server via the ISP. The merchant’s server sends the order issue request to the warehouse. The warehouse issues goods for delivery. The delivery service delivers the goods to the client.
Fig. 2. Sequence Diagram for the Transaction
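As a rough illustration of the walk-through above, the sequence diagram can be held as a time-ordered list of messages. The participant names and the text rendering are assumptions made for this sketch; they are not part of the original figure.

```python
# Messages in time order, following the description of Figure 2.
# Participant names are hypothetical labels for this sketch.
MESSAGES = [
    ("client", "merchant_server", "payment and order information (via ISP)"),
    ("merchant_server", "client_bank", "payment information"),
    ("client_bank", "merchant_bank", "payment"),
    ("merchant_bank", "merchant_server", "payment confirmation"),
    ("merchant_server", "client", "payment and order confirmation (via ISP)"),
    ("merchant_server", "warehouse", "order issue request"),
    ("warehouse", "delivery_service", "goods for delivery"),
    ("delivery_service", "client", "goods"),
]


def participants(messages):
    """List the agents left to right, in order of first appearance."""
    order = []
    for sender, receiver, _ in messages:
        for agent in (sender, receiver):
            if agent not in order:
                order.append(agent)
    return order


def render(messages):
    """Return one text line per message; the line order stands for the time axis."""
    return [
        f"t{step}: {sender} -> {receiver}: {text}"
        for step, (sender, receiver, text) in enumerate(messages, start=1)
    ]
```

Printing `"\n".join(render(MESSAGES))` gives a top-to-bottom listing that students can compare against the drawn diagram.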
5 Sequence Diagrams with Attacks E-cashless transactions in particular involve the client’s and merchant’s sensitive information such as credit/debit card numbers and private information. Most of the communications among the client, merchant and banks are done via the internet. Much of the communication, billing and payment is done by electronic message transfers. There is therefore a higher possibility of information being stolen, lost, modified, fabricated or repudiated. Such systems and the messages transmitted need extra protection from eavesdroppers. Many threats such as Denial of Service (DoS), Distributed Denial of Service (DDoS), Trojans, phishing, bot networks, data theft, identity theft, credit card fraud and spyware can be seen in these systems. These attacks might cause the loss of private information or the revelation of sensitive information such as credit card numbers and social security numbers, misrepresentation of users, unauthorized access to sensitive data, and alteration or replacement of the data. Sniffing can take place at vulnerable points such as the ISP, the merchant’s server, the client’s bank, the merchant’s bank or the internet backbone [6]. Figure 3 shows vulnerable points and possible attacks. Figure 2 also depicts an insecure e-commerce transaction. In this transaction anyone can read or modify the payment and order information. An intruder can interrupt, modify or initiate the transaction. The client’s bank information can be stolen by a third party.
Fig. 3. Vulnerable Points and Attacks
6 Confidentiality in E-Cashless Transaction Providing confidentiality is vital for this system. The transaction can be made secure by converting the plain text message to cipher text so that only the holders of the keys can decrypt and read the messages. Common algorithms used for this encryption and decryption are AES and DES with single symmetric keys, and RSA with public/private asymmetric key pairs. Encryption prevents unknown third parties from obtaining the client’s credit/debit card numbers, passwords, PINs or personal details. In e-transactions, however, there are many opportunities for an unauthorized third party to obtain this sensitive and private information and violate the privacy of the people involved, in particular the privacy of the consumer and the merchant in an e-commerce service. Thus, to provide a genuine and reliable service, the e-commerce system needs to assure that the information is not spread to unauthorized people. Symmetric encryption plays a key role in assuring the confidentiality of the data because, even if an unauthorized third party intercepts the message, the use of a unique session key, which can be accessed only by the two parties involved, prevents that person from viewing the message. Hence, encrypting the information with a shared session key not only supports authentication but also assures the confidentiality of the information. Figure 4 shows a transaction with confidentiality.
Fig. 4. E-Cashless Transaction with Confidentiality
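The role of the shared session key can be demonstrated with a toy cipher built only from the Python standard library. This is explicitly not AES or DES and must not be used to protect real data: the keystream construction (SHA-256 over key, nonce and counter) and all function names are assumptions made for classroom illustration. A real system would use a vetted AES implementation.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key + nonce + counter, repeated until long enough."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; a fresh random nonce is prepended."""
    nonce = secrets.token_bytes(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, message: bytes) -> bytes:
    """Split off the nonce, rebuild the same keystream and XOR again."""
    nonce, body = message[:16], message[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))
```

Only a party holding the same session key can rebuild the keystream, so an eavesdropper at the ISP sees the nonce and the scrambled bytes but not the payment details.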
7 Consolidation of Integrity, Confidentiality and Authenticity To make the e-cashless transaction secure, the data need to be received free from modification, destruction and repetition. Data integrity is another significant feature of electronic transaction security, because the address, order information or payment information may be changed in transit. Therefore, to keep the message free from modifications, the e-commerce system should protect the message during transmission. This can be achieved by encryption and message digesting. A unique message digest can be used to verify the integrity of the message. To do this, hash functions take variable-length input data and produce fixed-length outputs that are considered the fingerprint of the input data or message. Thus, if two hashes are equal, it is very likely that the messages are the same. Specialized hash functions are often used to verify the integrity of a message. To begin, the sender computes the hash of the message, concatenates the hash and the message, and sends the result to the receiver. The receiver separates the hash from the message and then generates the hash of the message using the same hash function used by the sender. The integrity of the message is said to be preserved if the hash generated by the sender is equal to the hash generated by the receiver. This implies that the message has not been altered or fabricated during the transmission from sender to receiver. Encryption algorithms such as AES and DES could be used to generate message digests. In addition, there are special-purpose hash functions such as those of the SHA and MD families. SHA is a message-digest algorithm developed by the National Institute of Standards and Technology and the National Security Agency. SHA is secure, but slower than MD5. MD5 produces a 128-bit digest whereas SHA-1 produces a 160-bit message digest and is resistant to brute-force attacks.
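The sender and receiver steps described above can be sketched with Python's standard `hashlib` module. SHA-256 is swapped in here as the hash function, which is an assumption of this sketch rather than a choice made in the paper; the function names are hypothetical.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def attach_digest(message: bytes) -> bytes:
    """Sender side: compute the hash and concatenate hash and message."""
    return hashlib.sha256(message).digest() + message


def verify_digest(packet: bytes):
    """Receiver side: split off the hash, recompute it and compare the two."""
    received, message = packet[:DIGEST_SIZE], packet[DIGEST_SIZE:]
    intact = received == hashlib.sha256(message).digest()
    return intact, message
```

A plain digest only detects accidental or naive changes, since an attacker who can rewrite the whole packet can also recompute the hash. That is one reason the transaction combines digests with encryption and signatures.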
SHA-1 is widely used for digital signature generation. Figure 5 shows how authenticity, confidentiality and integrity can be used in our example. It uses encryption, a message digest, a digital signature and a digital certificate to ensure the authenticity, confidentiality and integrity of the order and payment information. One of the most important aspects of transaction security is authenticating that the suppliers and consumers are who they say they are, and assuring the trustworthiness of the sources with which they exchange information. This is especially important in cashless e-commerce transactions because the supplier and consumer never meet face to face. Authentication can be provided in different ways. Exchanging digital certificates helps the seller and buyer verify each other’s identity so that each party knows who is on the other end of the transaction. The digital signature is another method to be certain that the data is indeed from a trusted party. In addition, symmetric encryption can also be used to certify authenticity: the receiver can be sure that the information received was sent by a trusted party, because the key used to encrypt and decrypt the information is shared only by the sender and the receiver.
Fig. 5. Secure E-cashless Transaction
Fig. 6. Secure Transaction with Symbolic Message Representation
In Figure 6, $K_s$ represents the temporary symmetric key, PI the payment information, DS the dual signature, OIMD the order information message digest, $K_{pubB}$ the bank’s public key-exchange key, PIMD the PI message digest, OI the order information, and Certificate the cardholder certificate [1,2]. The equation in Figure 6, $E_{K_s}\{PI + DS + OIMD\} + E_{K_{pubB}}\{K_s\} + PIMD + OI + DS + \text{Certificate}$, summarizes the message generation in the Secure Electronic Transaction protocol, an application of hashing and encryption algorithms in providing integrity, confidentiality and authentication for messages. This message consists of two parts: one for the client’s bank and the other for the merchant. The request message part $\{PI + DS + OIMD\}$ is encrypted using the session key $K_s$. The digital envelope consists of the session key encrypted using the bank’s public key $K_{pubB}$. Secure transactions use both public-key and private-key (symmetric) encryption methods for message exchange between the merchant and consumers. Light-weight crypto algorithms such as Simplified DES take an 8-bit block of plaintext and a 10-bit key as input to produce an 8-bit block of ciphertext. The goal of dual signature generation and use is to send a message that is intended for two different recipients.
Each recipient has access to the message; however, each can read only a part of it. In the SET protocol, the customer sends the order information (OI) and payment information (PI) using a dual signature. The merchant can see only the OI and the bank can access only the PI. The dual signature is generated from the order information and the payment information. This information is securely delivered to the two recipients, the merchant and the bank. The digital envelope combines the speed of DES with the efficient key management of RSA. The envelope and the encrypted message are sent to the recipient, who decrypts the digital envelope using his private key to recover the symmetric key and then uses this symmetric key to recover the original message.
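The dual signature can be sketched as follows, assuming the usual SET construction in which the customer signs the hash of the two concatenated digests, H(PIMD || OIMD). The RSA key below is a textbook toy (p = 61, q = 53) and SHA-256 stands in for the protocol's hash; both are assumptions for illustration only, and the function names are hypothetical.

```python
import hashlib

# Textbook toy RSA key: n = 61 * 53, e = 17, d = 2753. Far too small for real use.
N, E, D = 3233, 17, 2753


def h(data: bytes) -> bytes:
    """Message digest (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def to_int(digest: bytes) -> int:
    """Reduce a digest to an integer below the toy modulus."""
    return int.from_bytes(digest, "big") % N


def dual_signature(pi: bytes, oi: bytes):
    """Customer side: return DS together with PIMD and OIMD."""
    pimd, oimd = h(pi), h(oi)
    pomd = h(pimd + oimd)              # digest linking payment and order
    ds = pow(to_int(pomd), D, N)       # "sign" with the customer's private exponent
    return ds, pimd, oimd


def merchant_verifies(oi: bytes, pimd: bytes, ds: int) -> bool:
    """Merchant sees OI and PIMD only, never the payment information."""
    return pow(ds, E, N) == to_int(h(pimd + h(oi)))


def bank_verifies(pi: bytes, oimd: bytes, ds: int) -> bool:
    """Bank sees PI and OIMD only, never the order information."""
    return pow(ds, E, N) == to_int(h(h(pi) + oimd))
```

Each recipient rebuilds the linking digest from the part it can read plus the digest of the part it cannot, which matches the split of PIMD + OI + DS for the merchant and PI + DS + OIMD for the bank. With such a tiny modulus, different messages can collide by chance, so realistic key sizes matter.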
9 Related Other Work Credit cards, e-cash, e-checks, smartcards and micropayments are used to purchase items over the Internet. Credit cards contain the card owner’s financial information. E-cash is a digital form of money provided by financial institutions. Consumers need to install e-wallet software on their machines to use e-cash. PayPal is a successful e-wallet application used in many countries. An e-check, similar to e-cash, contains the consumer’s bank information. This information is used by the merchant to authenticate the consumer, and the consumer’s bank uses this information to authorize the payment. Banks allow the e-check feature for purchasing items over the Internet. A smartcard contains a chip that holds a prepaid amount of money and the consumer’s financial information. Service prepayment is similar to prepaid phone cards. The subscriber or merchant charges an amount in advance for the service or product that is provided at a later time. Usually a third party is responsible for distributing these cards, collecting money from the consumer, and charging a commission for its services. Offline authorization can reduce online computational load. Micropayment systems such as CyberCoin, NetBill, PayWord and MicroMint provide low-cost transactions [7, 8, 9, 10].
10 Conclusion Internet users demand fast, reliable and secure transactions. They also want their information to be private and protected [11]. Thus electronic cashless services rely upon the security that crypto systems provide for the transaction. To survive in a highly competitive world, service providers should be able to provide fast, reliable and secure service to their customers. Providing a safe and trustworthy environment among the merchant, the consumer and their financial institutions is always essential. It is hard to determine the degree of safety and trustworthiness in electronic transactions. Introducing crypto system concepts to future designers, developers and students is not an easy task. To make the task easier, this paper first presented how to introduce the integration of confidentiality, integrity and authentication, using sequence diagrams, to students in computer science, information systems and network security courses. Thereafter it presented the derivation of sequence diagrams from equations. Reading, interpreting and developing crypto equations and algorithms are important in Computer Security classes. This paper summarized mathematical representations used in security as well as sequence diagrams that represent cryptographic algorithms, providing examples related to confidentiality and integrity and their combinations. The active learning module developed is easily adapted and can be used effectively in classrooms with senior undergraduate and graduate students in Computer Science, Engineering and Information Systems to teach other symmetric key algorithms and help students understand crypto systems quickly.
References
1. IBM International Technical Support. Secure Electronic Transaction: Credit Card Payment on the Web in Theory and Practice (1997), http://www.redbooks.ibm.com/redbooks/pdfs/sg244978.pdf
2. Stallings, W.: Cryptography and Network Security Principles and Practice, 2nd edn. Prentice Hall, Englewood Cliffs (2007)
3. Laudon, K., Traver, C.G.: E-Commerce. Pearson Publishing, London (2009)
4. Bruegge, B., Dutoit, A.H.: Object-Oriented Software Engineering Using UML, Patterns, and Java, 3rd edn. Pearson Publishing, London (2010)
5. Herath, A., Herath, S.: Case Studies for Learning Software Security Engineering. In: 9th International Conference on Humans and Computers. Aizu University, Japan (2006)
6. Herath, A., Gunathilake, R., et al.: Mathematical Modeling of Cyber Attacks: A Learning Module to Enhance Undergraduate Security Curricula. Journal for Computing Sciences in Colleges, Consortium for Computing Sciences in Colleges, South Central Region 18th Conference (2007)
7. Cox, B., Tygar, J.D., Sirbu, M.: NetBill security and transaction protocol. In: Proceedings of 1st USENIX Workshop on Electronic Commerce, New York (1995)
8. iPAID (2008), http://www.ipaid-insurance.com/
9. Medvinsky, G., Neuman, B.C.: NetCash: a design for practical electronic currency on the Internet. In: Proceedings of 1st ACM Conference on Computer and Communications Security (1993)
10. Troncoso, C., Danezis, G., Kosta, E., Preneel, B.: PriPAYD: Privacy Friendly Pay-As-You-Drive Insurance. In: Workshop on Privacy in the Electronic Society (2007)
11. Herath, A.: Secure Digital Cashless Transactions with Sequence Diagrams and Spatial Circuits to Enhance the Information Assurance and Security Education. Journal of Information Systems Education (working paper)
Encoding in VLSI Interconnects
Brajesh Kumar Kaushik1, S.K. Verma2, and Balraj Singh2
1 Department of Electronics and Computer Engineering, Indian Institute of Technology-Roorkee, Roorkee, Uttarakhand, India
2 G.B. Pant Engineering College, Pauri-Garhwal, Uttarakhand, India
[email protected],
[email protected],
[email protected] Abstract. This paper reviews different encoding schemes for reducing power dissipation, crosstalk noise and delay. The encoding schemes are categorized by the type of encoding method and by their power and delay reduction capability with crosstalk avoidance. Crosstalk is aggravated by increased switching activity, which is frequently the main reason for the malfunctioning of a VLSI chip. Consequently, delay and power dissipation also increase due to increased crosstalk. Reducing switching activity on coupled transmission lines results in a substantial reduction of power dissipation, crosstalk and delay. Researchers therefore often concentrate on encoding schemes that reduce the transitions of the signals. This paper reviews such encoding schemes. Keywords: VLSI, CMOS, Interconnects, Encoding.
1 Introduction As VLSI technology advances, more and more devices or modules are added to a single chip. This has been possible because of the continuous reduction in feature size. Semiconductor technologies with feature sizes of several tens of nanometers are currently in development. According to the International Technology Roadmap for Semiconductors (ITRS), future nanometer-scale circuits will contain more than a billion transistors and will operate at clock speeds well over 10 GHz. Distributing robust and reliable power and ground lines, clock, data and address, and other control signals through interconnects in such a high-speed, high-complexity environment is a challenging task. The major design concerns such as crosstalk, propagation delay and power dissipation can be reduced by several methods [1-5], such as repeater insertion, shield insertion between adjacent wires, optimal spacing between signal lines and, lastly, the much superior bus encoding method. Many researchers have used a variety of bus encoding techniques to handle the problems of crosstalk, delay and power dissipation. A typical encoding scheme converts or encodes the data bit stream in such a manner that the transitions of the bit stream are minimized. Power dissipation in a VLSI circuit can be categorized as dynamic, short-circuit and static. The dynamic power dissipation [3] is expressed as
$P_{Dynamic} = \alpha \cdot V_{DD}^{2} \cdot C_L \cdot f$ (1)
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 260–269, 2011. © Springer-Verlag Berlin Heidelberg 2011
where $\alpha$, $V_{DD}$, $C_L$ and $f$ are the transition activity factor, supply voltage, load capacitance and frequency of operation, respectively. $\alpha$ can be further expressed as
$\alpha = \alpha_s C_s + \alpha_c C_c$ (2)
where $\alpha_s$, $C_s$, $\alpha_c$ and $C_c$ are the self transition activity factor, self capacitance of the interconnect, coupling transition activity factor and coupling capacitance, respectively. A self transition is a transition on a line relative to that line’s previous value; a coupling transition is a transition relative to the adjacent line. Dynamic power dissipation depends on the frequency or transitions of the signal and on the load capacitance [2]. The fault-free operation of a large circuit requires not only correct logic functioning but also propagation of accurate logic signals through the interconnects within a specified time limit. Delay faults can lead to false switching, logic failures and malfunctioning of the designed chip. As feature size approaches the nanometer regime, the portion of delay contributed by gates decreases while the delay due to interconnects dominates, because interconnect parameters do not scale in proportion to the shrinking transistor area. Crosstalk refers to the interaction between signals that are propagating on various lines in the system. Crosstalk is mainly due to the dense wiring required by compact and high-performance systems. High-density and closely laid interconnects result in electromagnetic coupling between signal lines. The active signal energy is coupled to the quiet line through both mutual capacitance and mutual inductance, resulting in noise voltages and currents. This may lead to inadvertent switching and system malfunctioning. Crosstalk is a major constraint while routing in high-speed designs. By its very nature, crosstalk analysis involves systems of two or more conductors. It can affect timing, causing a delay failure; it can increase the power consumption due to glitches; and it can cause functional failure because of signal deviation. Interconnect cross-capacitance noise refers to the charge injected into quiet nets (victims) by switching on neighboring nets (aggressors) through the capacitance between them (cross capacitance).
This is perceived to be one of the significant sources of noise in current technologies. Cross capacitance can also affect delay and slew, depending on whether the aggressor signals are switching in the same or in the opposite direction to the victim net. The amplitude of the signal generated on the passive line is directly related to the edge rate of the signal on the active line, the proximity of the two interconnects and the distance over which the two interconnects run adjacent to one another. The transitions on three neighboring interconnects and their crosstalk class and delay are shown in Table 1. Here λ is the ratio of the interwire capacitance to the line/ground capacitance [3-6].
Table 1. Delay and Crosstalk Classes for Three Bits
Transition (Δk-1, Δk, Δk+1) | Delay of line ‘k’ | Crosstalk Class/Type
↑-↑, ↓-↓, ↑-↓, ↓-↑, ↑--, --↓, --↑ | 0 | 1
↑↑↑, ↓↓↓ | 1 | 2
↑↑-, ↓↓-, -↑↑, -↓↓ | 1+λ | 3
-↑-, -↓-, ↓↓↑, ↑↑↓, ↓↑↑, ↑↓↓ | 1+2λ | 4
-↓↑, -↑↓, ↑↓-, ↓↑- | 1+3λ | 5
↓↑↓, ↑↓↑ | 1+4λ | 6
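The delay column of Table 1 appears to follow a simple pattern: when the middle line switches, each neighbor adds 0, λ or 2λ depending on whether it switches in the same direction, stays quiet, or switches in the opposite direction. The sketch below encodes that reading; the numeric encoding of transitions and the function names are assumptions made for illustration.

```python
# Transitions are encoded as +1 (rising), -1 (falling) and 0 (no change).

def delay_coefficient(left: int, victim: int, right: int):
    """Return m such that the delay of the middle line is 1 + m*lambda.

    Returns None when the victim line does not switch (delay 0, class 1).
    """
    if victim == 0:
        return None
    return abs(victim - left) + abs(victim - right)


def crosstalk_class(left: int, victim: int, right: int) -> int:
    """Map a three-line transition pattern to the crosstalk class of Table 1."""
    m = delay_coefficient(left, victim, right)
    return 1 if m is None else m + 2
```

For example, the pattern ↓↑↓ is `crosstalk_class(-1, 1, -1)`, which gives class 6 with delay 1+4λ, the worst case that several of the encoding schemes reviewed here try to avoid.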
2 Power and Crosstalk Reduction Using Bus Inversion Encoding

Bus encoding [5-11] is a widely used technique to reduce dynamic switching power and crosstalk (signal noise, delay) during data transmission on buses. Low-power and crosstalk-aware encoding techniques transform the data being transmitted on buses in such a manner that self-switching activity and the crosstalk coupling effect are reduced. Bus encoding schemes are classified according to the type of code used (algebraic, permutation, or probability based), the degree of encoding adaptability (static or dynamically adaptable encoding), the targeted capacitance for switching reduction (self, coupling, or both), the amount of extra information needed for coding (redundant or irredundant coding), and the method of encoding implementation (hardware, software, or a combination of the two). Certain optimizations in crosstalk reduction can have multiple benefits associated with them, such as power reduction, signal delay reduction and noise reduction [1].

Bus Invert (BI) coding, proposed by Stan and Burleson [8-9], is based on the inversion of data on the data lines. This encoding scheme compares the present state of the n-bit data lines with their previous state. If the number of transitions with respect to the previous state is greater than half the bus width, i.e. n/2, then the entire n-bit data word is inverted. The inversion is indicated by an extra line. Shin et al. [10] proposed a similar technique known as partial bus invert (PBI) coding, where the bit lines are considered for inversion in smaller groups instead of as the entire bus. The grouping of bit lines depends upon their probability of transition. Lines are kept in the same group if they have the same or the highest probability of transition, and bus invert coding is then applied separately to each group. Thus, as the number of groups increases, the redundancy of the encoding also increases. Yoo et al. [7] further extended the PBI encoding scheme by proposing an interleaved PBI technique. This encoding scheme changes the groupings dynamically, thus enhancing performance by reducing the overall transition activity with respect to the previous state.

Zhang et al. [11] extended the BI scheme to propose an improved technique, the odd/even bus-invert code (OE-BI), which reduces power dissipation by decreasing coupling transitions on the bus. This technique is based on the fact that coupling capacitors are charged and discharged by the activity on neighboring interconnects, where a bus may have an odd or an even number of interconnects. The coupling activity is reduced by independently controlling the odd and even bus lines with two separate lines, the Odd Invert and Even Invert lines, respectively. Four possible cases are considered: no bus lines inverted (00), only odd lines inverted (10), only even lines inverted (01) and all lines inverted (11). Zhang et al. chose the case with the smallest coupling activity to transmit on the bus. They observed that, even after encoding, the coupling activity for a pair of bus lines is still strongly dependent on the data. In particular, the toggling sequences 01→10 and 10→01 resulted in 4 times more coupling energy dissipation than other coupling events. Thus they proposed a targeted Two-Phase transfer in order to reduce total power only on the pairs of lines that carried such toggling events.

Jayaprakash et al. [12] proposed the Partitioned Hybrid Encoding (PHE) technique, where the bus is partitioned optimally and the most energy-efficient scheme is applied independently to each partition. The BI and OE-BI schemes are invariably applied in the PHE technique. The partitioning and the choice of scheme for each partition are solved using dynamic programming.
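The bus-invert rule described above can be illustrated with a minimal Python sketch. It assumes that the bus state is held as an integer, that a tie (exactly n/2 transitions) is sent uninverted, and that the invert line itself is not counted in the comparison; the function names are illustrative and are not taken from [8-9].

```python
def bus_invert_encode(data, prev_bus, n):
    """Return (bus_word, invert_bit) for an n-bit data word.

    prev_bus is the word that was last driven on the n data lines.
    """
    mask = (1 << n) - 1
    # Number of data lines that would toggle if data were sent as-is.
    transitions = bin((data ^ prev_bus) & mask).count("1")
    if transitions > n / 2:
        # Invert all n bits and flag the inversion on the extra line.
        return (~data) & mask, 1
    return data & mask, 0


def bus_invert_decode(bus_word, invert_bit, n):
    """Recover the original data word at the receiver."""
    mask = (1 << n) - 1
    return (~bus_word) & mask if invert_bit else bus_word & mask


# Example: 8-bit bus, previous word 0x00, new data 0xFE (7 lines would toggle).
word, inv = bus_invert_encode(0xFE, 0x00, 8)
```

Under these assumptions no more than n/2 data lines toggle per transfer, at the cost of the one extra invert line.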
Baek et al. [13] proposed a Low Energy Set Scheme (LESS) where XOR-XNOR (XON type) or XNOR-XOR (XNO type) operations are used to transmit data. In the XON technique, the bus is divided into groups of four bits; an XOR operation is performed on the two most significant bits of the current data with the previous data, and an XNOR operation is performed on the two least significant bits with the previous data. The choice between XON and XNO depends on an encoding rule, defined by the behavior of the bit sequence, that minimizes the energy-delay and self-switching on the bus.

Lyuh et al. [14] proposed the EN_shield-Ip encoding scheme, which simultaneously reduces power and crosstalk delay. This encoding scheme takes the probability graph as an input. Assuming a code length of m bits (m > n), it creates a codeword graph ensuring that there is no Type-4 crosstalk. For each input data word a suitable code is then evaluated, which is an NP-complete problem.

Subrahmanya et al. [15] proposed the No Adjacent Transition (NAT) coding scheme, which claims to reduce power consumption and eliminate worst-case crosstalk. The NAT coding scheme combines transition signaling [14] with LWC schemes [8]. It limits the number of 1’s in every transmitted code word to a constant ‘m’. Combined with transition signaling, this ensures that the number of transitions per transmission is bounded by a constant. Figure 1 shows a block diagram of the proposed (n, b, t) NAT code, where ‘b’ refers to the number of bits in the input to be encoded, ‘n’ is the number of bits in the encoded output and ‘t’ determines the maximum number of 1’s allowed in the output.

[Figure 1 blocks: b-bit input; (n, b, t) NAT encoder; n-bit transition signaling encoder; n-bit transition signaling decoder; (n, b, t) NAT decoder; b-bit output]

Fig. 1. (n, b, t) NAT encoder decoder schemes [15]
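The following sketch shows why limiting codeword weight bounds bus activity when transition signaling is used. It assumes the common formulation in which a 1 in the codeword toggles the corresponding wire and a 0 leaves it unchanged; the helper names are hypothetical, and the NAT codebook of [15] itself is not reproduced.

```python
def ts_encode(codeword, prev_bus):
    """Transition signaling: each 1 in the codeword toggles that wire."""
    return prev_bus ^ codeword


def ts_decode(bus, prev_bus):
    """The receiver recovers the codeword from which wires changed."""
    return bus ^ prev_bus


def toggles(bus, prev_bus):
    """Count the wires whose value changed between two bus states."""
    return bin(bus ^ prev_bus).count("1")


prev = 0b101100
code = 0b010001              # weight 2, so exactly 2 wires toggle
bus = ts_encode(code, prev)
```

Because the number of toggled wires equals the number of 1's in the codeword, a weight limit on the code is also a limit on transitions per transfer.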
Sridhara et al. proposed an overlapping coding scheme [16], which divides the bus into sub-channels, as in partial coding techniques. Two adjacent sub-channels overlap at their boundary. Let ‘m’ and ‘n’ be the number of code bits and data bits in a sub-channel, respectively. The ‘n’ data bits are mapped to the central ‘m-2’ bits of the code-words, and the boundary bits of the data-words form the boundary bits of the code-words. This technique eliminates crosstalk delay by using forbidden pattern overlapping codes, which avoid overlap-induced crosstalk delay in the boundary bits without using shielding wires.

Khan et al. [17] proposed an encoding scheme where the incoming data is encoded such that Type-3 and Type-4 crosstalk are eliminated. The authors considered a lumped model of three lines and derived the energy functions as

\[ E_1 = C_L \{ (1+\lambda)(V_{1f} - V_{1i}) - \lambda (V_{2f} - V_{2i}) \} V_{1f} \tag{3a} \]
\[ E_2 = C_L \{ -\lambda (V_{1f} - V_{1i}) + (1+2\lambda)(V_{2f} - V_{2i}) - \lambda (V_{3f} - V_{3i}) \} V_{2f} \tag{3b} \]
\[ E_3 = C_L \{ -\lambda (V_{2f} - V_{2i}) + (1+\lambda)(V_{3f} - V_{3i}) \} V_{3f} \tag{3c} \]
\[ E = E_1 + E_2 + E_3 \tag{3d} \]
where V_{1i}, V_{2i}, V_{3i} are the initial states, V_{1f}, V_{2f}, V_{3f} are the final states, and E_1, E_2, E_3 are the energies of the three wires, respectively. The encoding scheme is based on an intrinsic property of 4-bit sequences. A 4-bit bus can carry sixteen 4-bit sequences (4-tuples). If any one 4-tuple is modulo-2 summed with the functions Z1 (0101) and Z2 (1010) and compared with the remaining 4-tuples, the authors observed that one of the two XOR’d words has no Type-4 switching with respect to the remaining fifteen 4-tuples.

Verma et al. [18] proposed a similar encoding scheme with a modification. The authors identify the worst-case crosstalk in the actual five data stream and inverted data streams. If the actual data stream has worst-case crosstalk, then the inverted data is sent. An invert line is used to indicate the inversion of data.

Avinash et al. [19] proposed a novel spatio-temporal bus encoding scheme to minimize crosstalk in interconnects. The proposed scheme eliminates crosstalk classes 4, 5 and 6 among the interconnect wires, thereby reducing delay and energy consumption. It also features built-in error detection without any performance overhead. For an n-bit bus, the bits to be sent are divided into two equal groups of n/2 bits, and the data of the two groups are further divided into blocks of four. A 4-bit bus encoding technique is applied to avoid crosstalk of types 4, 5 and 6 (Table-1). Encoded data Tx from group 1 is sent in the first temporal cycle, while the encoded data Tx+1 from the second group is sent in the second temporal cycle.

Lampropoulos et al. [20] proposed the modified bus invert (MBI) scheme to reduce inductive crosstalk. This scheme inverts the data pattern to minimize transitions in the same direction. The bus lines are partitioned into pairs, and each pair of adjacent interconnects, together with their previous values, forms the input of a logic cell. The logic cell encodes the values occurring on the pair of bus interconnects to reduce the inductive coupling. Tu et al. [21] proposed a similar encoding technique that utilizes the BI scheme to remove inductive effects. The input data is inverted when the number of transitions in the same direction is more than half the number of bus interconnects.

Raghunandan et al. [22] proposed an encoding scheme that uses selective bit inversion and shield bit insertion to reduce inductive crosstalk delay and simultaneous switching noise (SSN). In this scheme, input data is broken into chunks of 4 bits and each chunk is coded as two 3-bit data tuples, which are sent in two successive cycles. An SSN check is used to determine whether two or more simultaneous transitions are occurring for a set of 3 bits with respect to the previous data. If the SSN checker detects two or more simultaneous transitions, then the 3 bits of a 4-bit chunk are inverted before sending in the first cycle and the coding bit is put in a high-impedance state. Using the high-impedance state in place of ‘1’ for the coding bit does not allow any current flow, which effectively reduces both power and inductive coupling effects.

Chen et al. [23] developed a mathematical model for a memoryless encoding scheme where the encoding and decoding circuitry is implemented using simple combinational logic. Furthermore, they propose a novel partitioning method for significant reduction of the transition energy dissipation due to coupling capacitance between adjacent wires. Low-swing signaling schemes inherently consume less power, since parasitic capacitances need not be fully charged or discharged at each signal transition. However, these techniques suffer from a reduced signal-to-noise ratio as the supply voltage scales in DSM technologies.

Lin et al. [24] proposed an encoding scheme assuming a coplanar bus structure driven by uniformly sized and symmetrical drivers. With the given parameters of wires
(length, width, height and pitch), delay constraint, working frequency and number of data bits n, a valid code set is generated that has the minimal total transition power to map the data patterns. The valid code set is obtained at the cost of extra bus wires, with the condition that any transition between codes must meet the delay constraint. It is a versatile encoding scheme for on-chip buses that uses the given parameters to minimize bus power consumption subject to the delay constraint by effectively reducing the LC coupling effect.

Huang et al. [25] extended the work of Lin et al. [24] to obtain a flexible bus encoding method for reduction of LC coupling delay, given a user-defined bus structure, working frequency and delay constraint. They build the bus structure based on the given parameters, constraints and number of data bits, and extract the RLC parameters. After the RLC parameters are extracted, a transition graph is prepared and its maximum clique is obtained. If the code set covers all data patterns, then the valid code set is accepted; otherwise one more bit is added to the code pattern and the procedure is repeated from the construction of the bus structure.

Akl et al. [26, 27] proposed the Transition Skew Coding (TSC) scheme for reduction of power dissipation and area. This scheme targets many of the challenges of global interconnects, such as crosstalk, peak energy and current, switching and leakage power, repeater area, wiring area, signal integrity and noise. TSC reduces the number of active signal wires used for communication by encoding two signals in a single wire transition, and transmitting the information of the two encoded signals via a single wire. The information is recovered based on the time at which the transition occurs. This technique achieves good noise immunity and eliminates crosstalk by using shields between every two wires without increasing the total number of wires. The total number of repeaters is reduced considerably, leading to a reduction in device area and leakage power. Peak energy and peak current are reduced due to the reduction in simultaneous transitions on the bus. The average power reduction increases as the switching activity of the inputs increases.
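The three-wire energy functions (3a)-(3d) given earlier in this section can be evaluated numerically to compare transition patterns. The sketch below rests on several assumptions that are not stated in the text: voltages are normalized to the supply, energies are in units of C_L, λ is taken as the ratio of coupling to load capacitance, and the same λ is used for both outer wires. The function name is illustrative.

```python
def three_wire_energy(v_init, v_final, lam, c_load=1.0):
    """Per-wire energies (E1, E2, E3) for one transition of a 3-wire bus.

    v_init, v_final: (V1, V2, V3) before and after the transition.
    lam: assumed coupling-to-load capacitance ratio, equal for both gaps.
    """
    d1, d2, d3 = (f - i for f, i in zip(v_final, v_init))
    e1 = c_load * ((1 + lam) * d1 - lam * d2) * v_final[0]
    e2 = c_load * (-lam * d1 + (1 + 2 * lam) * d2 - lam * d3) * v_final[1]
    e3 = c_load * (-lam * d2 + (1 + lam) * d3) * v_final[2]
    return e1, e2, e3


# All three wires rise together: the coupling terms cancel.
same_dir = sum(three_wire_energy((0, 0, 0), (1, 1, 1), lam=2.0))
# Middle wire rises while both neighbours fall: worst case for wire 2.
opposite = sum(three_wire_energy((1, 0, 1), (0, 1, 0), lam=2.0))
```

With λ = 2 under these assumptions, the simultaneous rise costs 3 units while the opposing transition costs 9, which illustrates why the encodings above try to avoid neighbouring wires switching in opposite directions.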
3 Encoding Techniques Based on Dictionary or Codebook

This section presents encoding techniques that use a memory, cache or dictionary to reduce transitions. The dictionary used for encoding and decoding is made available at both the encoder and the decoder side, with both copies kept synchronized and updated.

Sundararajan and Parhi [28] proposed the code word slimming technique, which pertains to the reduction of code size. The words are transmitted in a bit-serial and word-parallel manner, whereas traditionally data is transmitted in word-serial and bit-parallel order. The code word weight has to be within a limited value. Applications for the proposed scheme include VLSI systems implemented in a bit-serial manner and bit-parallel systems where data-format converters can be used without much power penalty.

Ramprasad et al. [29] proposed a source-coding framework for the design of coding schemes to reduce transition activity. These schemes are claimed to be suitable for high-capacitance buses where the extra power dissipation due to the encoder and decoder circuitry is offset by the power savings at the bus. In this framework, a data source (characterized in a probabilistic manner) is first passed through a decorrelating function f1. Thereafter, a variant of an entropy coding function f2 is employed, which reduces the transition activity. The framework is then employed to derive novel encoding schemes whereby practical forms for f1 and f2 are proposed. The framework is claimed to achieve low power dissipation and to be capable of characterizing encoding schemes.

Komatsu et al. [30] proposed an adaptive code-book based encoding technique, which is an extension of the static codebook technique. In this method, data transition activity on bus signals is lowered by data encoding similar to vector quantization (VQ). The data transferred on the bus are the quantized vector numbers along with the Hamming difference between the original data and the quantized vector. The code-book is regularly updated with new source code. This scheme has the overhead of additional bus lines for sending the code word, as well as the code-book tables at the transmitter and the receiver.

The probabilistic encoding scheme proposed by Benini et al. [31] is based on detailed statistical characterization of the target data stream so as to minimize the number of transitions on the bus. However, this exact approach is feasible for narrow bus widths only, since for wider buses the complexity of the synthesized encoding/decoding logic may become unmanageable. For wider buses, two classes of encoding algorithms are proposed. The first reduces the size of the codec by separately considering clusters of bus lines instead of the entire bus width; the second re-encodes only a subset of the patterns being transmitted. However, as the cluster size increases, the encoder and decoder become more complex to design and the construction time for the code table also increases.

The Transition Pattern Coding (TPC) scheme proposed by Sotiriadis and Chandrakasan [32] reduces coupling power in the data bus through encoding. This scheme creates a transition matrix for selecting codeword patterns such that neighboring bus lines change values in the same direction. Thus coupling capacitance and inter-wire energy are reduced. The scheme is suited to smaller buses only; for wider buses, the entire bus is split into smaller groups and TPC is applied separately to each group. Xie et al. [33] studied the bus grouping method in TPC and found that applying a genetic algorithm to the grouping yields greater energy savings.

Raun et al. [34] proposed a low-power dynamic bus encoding scheme known as the Dictionary-based Segmented Inversion Scheme (DS2IS). This RLC-driven signal encoding scheme characterizes capacitive and inductive effects using a real RLC bus model and minimizes them dynamically while transmitting signal transitions onto the bus. The Bus Encoded Dictionary (BED) is used to determine the coupling effect, based on the power consumption for the original data and for its inversion. The BED stores the power consumption for the original and inverted input patterns, where the higher value indicates the worse coupling effect. For better performance, the encoder sends the pattern with the lower BED value through the bus.

Komatsu and Fujita [35] proposed irredundant address bus encoding methods which reduce signal transitions on instruction address buses by using adaptive codebook methods. These methods are based on the temporal locality and spatial locality of instruction addresses. The proposed encoding methods reduce the signal transitions on the instruction address buses by an average of 88%.
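The code-book idea of [30] described above, in which a code-book index is sent together with the difference from the original data, can be sketched as follows. This is an illustration under stated assumptions and not the method of [30]: the code-book contents, the nearest-entry search by Hamming distance and the function names are all hypothetical, and the adaptive updating of the code-book is omitted.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def codebook_encode(data, codebook):
    """Return (index, diff): the nearest code-book entry and the XOR difference."""
    index = min(range(len(codebook)), key=lambda i: hamming(data, codebook[i]))
    return index, data ^ codebook[index]


def codebook_decode(index, diff, codebook):
    """Rebuild the data word from the shared code-book and the difference."""
    return codebook[index] ^ diff


# Hypothetical 8-bit code-book shared by transmitter and receiver.
book = [0x00, 0x0F, 0xF0, 0xFF]
idx, diff = codebook_encode(0xF1, book)
```

When the data word is close to a code-book entry, the difference word contains few 1's; how far that reduces bus transitions depends on how the index and difference are driven onto the wires, which this sketch does not model.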
Mamidipaka et al. [36] proposed an adaptive self-organizing list code for reduction of switching activity on instruction and data address buses. It uses a list to create a one-to-one mapping between addresses and codes. To reduce switching activity on the bus, the list is reorganized in every clock cycle so that the most frequently used address is mapped to the code having fewer 1’s. The proposed technique does not require any extra bit lines and has minimal delay overhead.

Sridhara et al. [37] proposed a code construction scheme where the data is first passed through a nonlinear source coder that reduces self and coupling transition activity and imposes a constraint on the peak coupling transitions on the bus. Thereafter, a linear error control coder adds redundancy to enable error detection and correction. The framework efficiently combines existing codes to derive novel codes that span a wide range of tradeoffs between bus delay, codec latency, power, area, and reliability. The authors derived fundamental limits on the number of wires required for crosstalk avoidance codes and error correction, as shown in Figure 2.

[Figure 2 blocks: Crosstalk Avoidance Code (CAC); Error Correction Code (ECC); Linear Crosstalk Avoidance Code (LXC); signal widths labelled k, l, m, c]

Fig. 2. Code construction for joint crosstalk avoidance and error correction [37]
Zhang et al. [38] presented a bus encoding method based on codeword selection for tolerating crosstalk-induced effects, which avoids crosstalk and provides error correction as well. The method finds a subset of a crosstalk avoidance code (CAC) that provides single error correction. It also avoids crosstalk induced by late signal transitions on the checking bits, with no extra wires.
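The self-organizing list code of [36] discussed earlier in this section can be illustrated with a small sketch. The reorganization policy used here is move-to-front, which is an assumption made for illustration and may differ from the policy in [36]; the class and function names are hypothetical. The receiver would need to apply the same list update to stay synchronized.

```python
def weight_ordered_codes(n_bits):
    """All n-bit codes, sorted so that codes with fewer 1's come first."""
    return sorted(range(1 << n_bits), key=lambda c: (bin(c).count("1"), c))


class SelfOrganizingListEncoder:
    def __init__(self, addresses, n_bits):
        self.entries = list(addresses)       # position 0 maps to the lowest-weight code
        self.codes = weight_ordered_codes(n_bits)

    def encode(self, address):
        pos = self.entries.index(address)
        code = self.codes[pos]
        # Move-to-front: the address just used gets the all-zero code next time.
        self.entries.insert(0, self.entries.pop(pos))
        return code


enc = SelfOrganizingListEncoder([0x100, 0x104, 0x108, 0x10C], n_bits=2)
first = enc.encode(0x108)     # third in the list, so it gets the third code
second = enc.encode(0x108)    # now at the front, so it gets code 0
```

Since the codes have the same width as a plain index into the address list, this sketch needs no extra bus lines, in line with the irredundant property claimed for the technique.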
4 Conclusion

This paper reviewed various encoding schemes for reduction of power dissipation, delay, crosstalk and area. It is observed that, with time and the advancement of VLSI designs, the complexity of encoding schemes has also increased dramatically. Over time, researchers have mainly concentrated on reducing signal transitions on the bus. Crosstalk effects and power dissipation generally arise from signal transitions, which further cause uncertainty in delay. Thus transition reduction has been the prime objective of various encoding schemes. Researchers have also suggested various dictionary-based schemes to curb the effects of crosstalk as technology scales from micrometer to nanometer dimensions.
References

1. Kang, S.-M., Leblebici, Y.: CMOS Digital Integrated Circuits: Analysis and Design. Tata McGraw Hill Edition (2003)
2. Arunachalam, R., Acar, E., Nassif, S.R.: Optimal Shielding/spacing metrics for low power design. In: IEEE Computer Society Annual Symposium on VLSI, pp. 167–172 (2003)
3. Basu, K., Choudhary, A., Pisharath, J., Kandemir, M.: Power protocol: Reducing power dissipation on off-chip data buses. In: 35th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 345–355 (2002)
4. Hirose, K., Yasuura, H.: A bus delay reduction technique considering crosstalk. In: Design, Automation and Test in Europe Conf. and Exhibition (DATE), pp. 441–445 (2000)
5. Victor, B., Keutzer, B.: Bus encoding to prevent crosstalk delay. In: IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 57–63 (2001)
6. Verma, S.K., Kaushik, B.K.: Encoding Schemes for the Reduction of Power Dissipation, Crosstalk and Delay: A Review. International Journal of Recent Trends in Engineering, 74–79 (2010) [ISSN 1797-9617]
7. Yoo, S., Choi, K.: Interleaving partial bus-invert coding for low power reconfiguration of FPGAs. In: 6th International Conference on VLSI and CAD (ICVC), pp. 549–552 (1999)
8. Stan, M.R., Burleson, W.P.: Low-power encodings for global communication in CMOS VLSI. IEEE Trans. on VLSI (TVLSI) 5(4), 444–455 (1997)
9. Stan, M.R., Burleson, W.P.: Bus-invert coding for low-power I/O. IEEE Trans. on Very Large Scale Integration Systems (TVLSI), 49–58 (1995)
10. Shin, Y., Chae, S., Choi, K.: Partial bus-invert coding for power optimization of application-specific systems. IEEE Trans. on VLSI 9(2), 377–383 (2001)
11. Zhang, Y., Lach, J., Skadron, K., Stan, M.R.: Odd/even bus invert with two-phase transfer for buses with coupling. In: International Symposium on Low Power Electronics and Design (ISLPED), pp. 80–83 (2002)
12. Jayaprakash, S., Mahapatra, N.R.: Partitioned hybrid encoding to minimize on-chip energy dissipation of wide microprocessor buses. In: 20th Intl. Conf. on VLSI Design, pp. 127–134 (2007)
13. Baek, K.H., Kim, K.W., Kang, S.M.: A low energy encoding technique for reduction of coupling effects in SOC interconnects. In: 43rd IEEE Midwest Symposium Circuits and Systems, pp. 80–83 (2000)
14. Lyuh, C.-G., Kim, T.: Low power bus encoding with crosstalk delay elimination. In: 15th Annual IEEE International ASIC/SOC Conference, pp. 389–393 (2002)
15. Subrahmanya, P., Manimegalai, R., Kamakoti, V., Mutyam, M.: A bus encoding technique for power and cross-talk minimization. In: 17th Intl. Conf. on VLSI Design, pp. 443–448 (2004)
16. Sridhara, S.R., Shanbhag, N.R.: Coding for Reliable On-Chip Buses: A Class of Fundamental bounds and Practical Codes. IEEE Trans. on CAD of Integrated Circuits and Systems 26, 977–982 (2007)
17. Khan, Z., Arslan, T., Erdogan, A.T.: A novel bus encoding scheme from energy and crosstalk efficiency perspective for AMBA based generic SoC systems. In: 18th International Conference on VLSI Design, pp. 751–756 (2005)
18. Verma, S.K., Kaushik, B.K.: Crosstalk and Power Reduction Using Bus Encoding in RC Coupled VLSI Interconnects. In: 3rd International Conference on Emerging Trends in Engineering and Technology (ICETET) (2010) (in-press)
19. Avinash, L., Krishna, M.K., Srinivas, M.B.: A novel encoding scheme for delay and energy minimization in VLSI Interconnects with built-in error detection. In: IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 128–133 (2008)
20. Lampropoulos, M., Al-Hashimi, B.M., Rosinger, P.: Minimization of crosstalk noise, delay and power using a modified bus invert technique. In: Design, Automation and Test in Europe Conference and Exhibition (DATE), vol. 2, pp. 1372–1373 (2004)
21. Tu, S.-W., Chang, Y.-W., Jou, J.-Y.: RLC coupling-aware simulation and on-chip bus encoding for delay reduction. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 25(10), 2258–2264 (2006)
22. Raghunandan, C., Sainarayanan, K.S., Srinivas, M.B.: Bus-encoding technique to reduce delay, power and simultaneous switching noise (SSN) in RLC interconnects. In: 17th ACM Great Lakes Symposium on VLSI (GLS-VLSI), pp. 371–376 (2007)
23. Chen, G., Duvall, S., Nooshabadi, S.: Analysis and design of memoryless interconnect Encoding Scheme. In: IEEE International Symp. on Circuits and Systems (ISCAS), pp. 2990–2993 (2009)
24. Lin, T.-W., Tu, S.-W., Jou, J.-Y.: On-Chip Bus Encoding for Power Minimization under Delay Constraint. In: International Symposium on VLSI Design Automation and Test (VLSI-DAT), pp. 1–4 (2007)
25. Huang, J.-S., Tu, S.-W., Jou, J.-Y.: On-Chip Bus Encoding for LC Crosstalk Reduction. In: IEEE VLSI-TSA International Symposium on VLSI Design, Automation and Test, pp. 233–236 (2005)
26. Akl, C.J., Bayoumi, M.A.: Transition Skew Coding: A Power and Area Efficient Encoding Technique for Global On-Chip Interconnects. In: Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 696–701 (2007)
27. Akl, C.J., Bayoumi, M.A.: Transition Skew Coding for Global On-Chip Interconnects. IEEE Trans. on VLSI Systems 16(8), 1091–1096 (2008)
28. Sundararajan, V., Parhi, K.K.: Reducing bus transition activity by limited weight coding with codeword slimming. In: 10th Great Lakes Symp. on VLSI, pp. 13–16 (2000)
29. Ramprasad, S., Shanbhag, N.R., Hajj, I.N.: A coding framework for low-power address and data buses. IEEE Trans. on VLSI Systems (TVLSI) 7(2), 212–221 (1999)
30. Komatsu, S., Ikeda, M., Asada, K.: Low power chip interface based on bus data encoding with adaptive code-book method. In: 9th Great Lakes Symposium on VLSI (GLS-VLSI), pp. 368–371 (1999)
31. Benini, L., Macii, A., Poncino, M., Scarsi, R.: Architectures and synthesis algorithms for power-efficient bus interfaces. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 19(9), 969–980 (2000)
32. Sotiriadis, P.P., Chandrakasan, A.: Reducing bus delay in submicron technology using coding. In: Asia South Pacific Design Automation (ASP-DAC), pp. 109–114 (2001)
33. Xie, L., Qiu, P., Qiu, Q.: Partitioned bus coding for energy reduction. In: Asia and South Pacific Design Automation Conference (ASP-DAC), vol. 2, pp. 1280–1283 (2005)
34. Raun, S.J., Tsai, S.F.: DS2IS-Dictionary based segmented inversion scheme for low power dynamic bus design. Journal of System Architecture 54(1-2), 324–334 (2008)
35. Komatsu, S., Fujita, M.: Irredundant address bus encoding techniques based on adaptive codebooks for low power. In: Asia South Pacific Design Automation Conf. (ASP-DAC), pp. 9–14 (2003)
36. Mamidipaka, M.N., Hirschberg, D.S., Dutt, N.D.: Adaptive low-power address encoding techniques using self-organizing lists. IEEE Trans. on Very Large Scale Integration Systems (TVLSI) 11(5), 827–834 (2003)
37. Sridhara, S.R., Shanbhag, N.R.: Coding for Reliable On-chip Buses: A Class of Fundamental Bounds and Practical Codes. IEEE Trans. on CAD of Integrated Circuits and Systems 26(5), 977–982 (2007)
38. Zhang, Y., Huawei, L., Xiaowei, L., Hu, Y.: Codeword Selection for Crosstalk Avoidance and Error Correction on Interconnects. In: 26th IEEE VLSI Test Symp (IEEE-VTS), pp. 377–382 (2008)
Vertical Handover Efficient Transport for Mobile IPTV

Salah S. Al-Majeed and Martin Fleury

University of Essex, Colchester, United Kingdom
{ssaleha,fleum}@essex.ac.uk
Abstract. The success of IPTV suggests that an expansion to mobile devices is likely. A key difference between IPTV delivery to mobile devices and broadband access is the possibility of vertical handovers, which can cause disruption to real-time video streaming. This paper proposes a lightweight form of IPTV transport based on negative acknowledgments. The performance of the scheme is analyzed in comparison to an industry-standard congestion controller and baseline UDP transport. A selective acknowledgment variation of the scheme is also examined. The paper shows both proposed schemes result in better mean video quality (by as much as 4 dB) but that the non-selective scheme is better in the presence of vertical handovers. The paper presents a case study which emulates an IPTV streaming architecture with handovers between IEEE 802.16e (WiMAX) broadband wireless and an 802.11 network.

Keywords: IEEE 802.16e, IEEE 802.21, media transport, mobile IPTV, vertical handover, WiMAX.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 270–282, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Next Generation Mobile Networks (NGMN) [1] will consist of a number of overlapped heterogeneous networks, allowing the user to pass seamlessly between them through the process of vertical handover (VHO). VHO can be accomplished with the Media Independent Handover (MIH) part of the IEEE 802.21 standard [2] or through all-IP framing as integrated within the IP Multimedia Subsystem (IMS) initiative [3]. MIH is a looser way of approaching the goal of NGMN, through optimized signaling between access points (APs) and/or base stations (BSs). This paper considers the MIH approach, which now appears to be gaining ground in the marketplace relative to IMS. The scenario considered is a mobile device crossing from an IEEE 802.16e (mobile WiMAX) [4] network to an IEEE 802.11 WLAN and vice versa, representing movement between indoor and outdoor reception. However, the general principles of efficient transport extend to transitions between other types of network. In this scenario, even though IEEE 802.21 attempts to harmonize signaling, the extent of signaling differs between the two network types because of the relative complexity of WiMAX, which adds quality-of-service management to the underlying transmission system.

The contribution of the paper is a transport method tuned to the needs of IPTV (Internet Protocol TV) [5], which paves the way for new services such as time-shifted TV and TV-on-demand. The proposed transport method improves upon standardized TCP-friendly Rate Control (TFRC) [6]. TFRC forms the basis of the
video phone extension to mobile Google Talk [7], and as such improved performance with a different, more VHO-sympathetic approach to video transport will have practical significance.

There appear to be two ways to improve handover management for real-time services. The first is to reduce the latency of the network selection process [8] and/or the mobility management [9][10]. The second is to improve performance at higher layers of the protocol stack. The two methods are not incompatible, though the focus herein is on adapting the transport scheme to the needs of handover and video streaming. It is also possible to act at the application layer through increased protection against packet loss and delay [11][12][13], the consequences of service interruption during handover. However, it appears that little attention has been given to transport-layer improvements for VHO, which is the novelty of this paper’s contribution.

The proposed scheme is a modification of UDP transport which is applied at the application layer of the protocol stack. Though UDP streaming has been used for broadband wireless access [14], UDP packet losses can seriously harm a compressed video stream. This is due to the predictive nature of video coding, which operates through motion compensation and entropy coding. Bell Labs introduced a reliable form of UDP, R-UDP (see [15]), and there is also an R-UDP protocol employed by Microsoft in their MediaRoom product for IPTV service delivery over multicast networks. In the present scheme, UDP is supplemented with negative acknowledgments (NACKs) whenever a packet is lost for the first time. To avoid additional latency, the receiver only requests retransmission once. In the paper, this non-selective scheme is contrasted with selective NACKs, which are reserved for lost anchor frame packets from the video stream. Both are assessed for their performance during VHOs and compared to TFRC and raw UDP transport.
For ease of reference in the following, the NACK enhancement to UDP is called broadband video streaming (BVS). The remainder of this paper is organized as follows. Section 2 presents essential background knowledge to understand the results of the paper, before Section 3 outlines the proposed scheme. Details of the simulation settings used to evaluate the proposal are given in Section 4. Section 5 presents those results, while Section 6 presents conclusions and future work, based on the findings of Section 5.
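The once-only NACK behaviour described in this section can be sketched as receiver-side logic. This is a minimal illustration, assuming packets carry consecutive integer sequence numbers and that loss is inferred from a gap in those numbers; the class and callback names are hypothetical, and the actual BVS implementation is not shown here.

```python
class NackReceiver:
    """Requests each missing packet at most once (no repeated NACKs)."""

    def __init__(self, send_nack):
        self.send_nack = send_nack    # callback taking a sequence number
        self.expected = 0             # next in-order sequence number
        self.nacked = set()           # packets already requested once
        self.received = set()

    def on_packet(self, seq):
        self.received.add(seq)
        # A gap before seq means the skipped packets are presumed lost.
        for missing in range(self.expected, seq):
            if missing not in self.received and missing not in self.nacked:
                self.nacked.add(missing)
                self.send_nack(missing)
        self.expected = max(self.expected, seq + 1)


sent = []
rx = NackReceiver(sent.append)
for seq in [0, 1, 4, 2, 5]:    # 2 arrives as a retransmission; 3 stays lost
    rx.on_packet(seq)
```

In this sketch packet 3 is never requested a second time, so a packet whose retransmission is also lost is simply left to error concealment at the decoder rather than adding further delay.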
2 Background This Section introduces essential background to an understanding of the findings. 2.1 Handover Procedures While horizontal handover is concerned with migration between homogeneous networks, vertical handover is more intricate as it involves signaling between heterogeneous networks. Handovers are either soft in which the previous connection is kept alive until the new connection is made or hard in which the previous connection is broken before the current one is made. Handovers can be: entirely controlled by a mobile device; be assisted by the mobile device though executed by
272
S.S. Al-Majeed and M. Fleury
the network based on connection information at the mobile; or initiated by the network without any action by the mobile device. Handover consists of: detection of a new network and selection of that network based on channel conditions; resource allocation as a new connection is established; and the update of routes and forwarding of data over the new connection. Mobile WiMAX supports three handoff mechanisms, but only the mandatory Hard Handover (HHO) at layer 2 can be accomplished with a single channel at any one time, thus reducing equipment cost and improving base station (BS) capacity. HHO employs a break-before-make procedure which reduces signalling. As is normal, a mobile subscriber station (SS) monitors signal strength from adjacent BSs, employing a hysteresis mechanism to avoid thrashing between BS. The SS must then: obtain uplink and downlink parameters; negotiate capabilities; gain security authorisation and exchange keys; register with the BS; and establish connections. It is expected that these mechanisms will be subsumed in the emerging IEEE 802.11.21 [2]. 2.2 IPTV Video Streaming A typical delivery chain [5], refer to Fig. 1, is from a Video Hub Office (VHO) (not to be confused with VHO for vertical handover), perhaps receiving Digital Video Broadcasting (DVB) from a satellite, then over an IP-framing based network. A Video Serving Office lies at the edge of the IP network to distribute video to different types of access network. In the Figure, a WiMAX BS and an IEEE 802.11 Access Point (AP) act as potential delivery routes for a multi-homed mobile SS device. Notice that to facilitate fast channel swapping [5] it is likely that through intelligent management, content will be placed relatively close to the receiver. This implies that a feedback channel for acknowledgments will not experience long delays, which is one reason why reinforcing UDP with NACKs seems appropriate for IPTV. Metro network
[Figure: a VHO (live TV channels, time-shifted TV, video on demand) feeds the metro IP network; a VSO at its edge serves the access network, in which an 802.11 AP and a WiMAX base station provide downlink streaming to the mobile subscriber station]
Fig. 1. IPTV video content delivery architecture
As a test, the Paris sequence was Variable Bit-Rate (VBR)-encoded with the H.264/AVC (Advanced Video Coding) codec [16] at 30 frame/s in Common Intermediate Format (CIF) (352 × 288 pixel/frame), with the quantization parameter (QP) set to 26 (from a range of 0 to 51). The Peak Signal-to-Noise Ratio (PSNR) for this sequence without packet loss is 38 dB. The slice size was fixed at the encoder at 900 B. In this way the
risk of network segmentation of the packet was avoided. Consequently, the risk of loss of synchronization between encoder and decoder was reduced. Paris consists of two figures seated round a table in a TV studio setting, with moderate to high spatial-coding complexity and moderate motion. Quality-of-experience tests show [17] that this type of content is favored by users of mobile devices, as it does not stretch the capabilities of the screen display (as, for instance, sport sequences would do). The Intra-refresh rate was every 15 pictures with an IPBB…I coding structure. In total, 1065 frames were transmitted, resulting in a video duration of 35.5 s. Simple previous frame replacement was set for error concealment at the decoder as a point of comparison with others' work. Other forms of error concealment increase decoder complexity. The presence of I-pictures rather than gradual decoder refresh enables channel swapping for IPTV, and in [5] a means of accelerating critical channel zapping time from network caches was demonstrated. This further encourages the use of the proposed NACK scheme.
2.3 IPTV Transport
As mentioned in Section 1, various methods of improving upon UDP offer the possibility of improved media transport without the overhead of application-layer congestion control superimposed upon UDP transport. This is particularly the case if the latency between the streaming server and the mobile device is relatively small. Because content management can bring the server closer to the access network, reduced latencies are likely to occur. In fact, Agilent recommends [18] a maximum cumulative delay factor of between 9 and 50 ms for IPTV delivery over a network. Raw UDP has been used for IPTV transport over IEEE 802.16e systems [14], but the principal interest in this paper, as the proposal improves upon UDP, is to use UDP as a baseline. The alternative selected at the industry level is a standards-based form of transport.
TFRC [6] or its variant, the Datagram Congestion Control Protocol (DCCP) [19], which adds connection set-up, has been used directly [20] or in cross-layer form [21] for video streaming in research work on wireless networks. In the simulations, the inter-packet sending time gap was varied according to the TFRC equation [6]. As described in [6], TFRC is a receiver-based system in which the packet loss rate is found at the receiver and fed back to the sender in acknowledgment messages (through TCP in the simulations). The sender calculates the round-trip time from the acknowledgment messages and updates the packet sending rate. A throughput equation models TCP New Reno to find the sending rate:
$$\mathrm{TFRC}(t_{rtt}, t_{rto}, s, p) = \frac{s}{t_{rtt}\sqrt{\dfrac{2bp}{3}} + t_{rto}\,\min\left(1,\, 3\sqrt{\dfrac{3bp}{8}}\right) p\,(1 + 32p^{2})} \qquad (1)$$
where $t_{rtt}$ is the round-trip time, $t_{rto}$ is TCP's retransmission timeout, $s$ is the segment size (TCP's unit of output, herein set to the packet size), $p$ is the normalized packet loss rate, and $b$ is the number of packets acknowledged by each ACK. $b$ is normally set to one and $t_{rto} = 4t_{rtt}$. It is important to notice that $t_{rto}$ comes to dominate TFRC's behavior in high packet loss regimes. Clearly, packet loss and round-trip time cause the throughput to decrease in (1), as the remaining terms in the denominator depend on these two variables.
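As a minimal sketch of how Eq. (1) turns a loss rate and round-trip time into a sending rate, the function below evaluates the equation directly. The function name `tfrc_rate` and its argument names are illustrative assumptions, not part of the simulation code; the defaults b = 1 and t_rto = 4 t_rtt follow the settings stated in the text.

```python
import math

def tfrc_rate(t_rtt, s, p, b=1, t_rto=None):
    """Evaluate the TFRC throughput equation (1).

    t_rtt : round-trip time in seconds
    s     : segment (packet) size in bytes
    p     : normalized packet loss rate, 0 < p <= 1
    b     : packets acknowledged per ACK (normally 1)
    t_rto : retransmission timeout; defaults to 4 * t_rtt as in the text
    Returns the sending rate in bytes per second.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if t_rto is None:
        t_rto = 4 * t_rtt
    denominator = (t_rtt * math.sqrt(2 * b * p / 3)
                   + t_rto * min(1.0, 3 * math.sqrt(3 * b * p / 8))
                   * p * (1 + 32 * p ** 2))
    return s / denominator

# Hypothetical operating point: 900 B packets, 50 ms round-trip time.
for loss in (0.01, 0.05, 0.15):
    rate_kbps = 8 * tfrc_rate(0.05, 900, loss) / 1000
    print(f"p = {loss:.2f}: {rate_kbps:.0f} kbps")
```

The inter-packet gap used by a sender would then be s divided by the returned rate, which is why rising loss or round-trip time lengthens the sending period.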
3 NACK-Based Scheme
Fig. 2 is a general representation of the processing involved in the scheme, which for convenience of reference, as previously mentioned, we name Broadband Video Streaming (BVS), showing the NACK response of the receiver. The following describes the operation assuming downlink streaming to a mobile SS. At the mobile SS a record is kept of packet sequence numbers, available through the Real-time Transport Protocol (RTP) header and, if an out-of-sequence packet arrives, a NACK may be transmitted to a BS or Access Point (AP) for forwarding to the video server. The SS only transmits a NACK if this is the first time that particular packet has been lost. If it is the first time and the non-selective NACK version of BVS is in operation, then a NACK is sent. However, if prioritized operation is in use, then a decision is made according to the picture type of the video packet that has been lost, reflecting the importance of that packet to the reconstruction of the video.
[Figure: sender side (video application, H.264/AVC video encoder, video coding layer producing I-, P-, and B-pictures, network abstraction layer, RTP packet buffer, NACK buffer, packet queue, scheduler) and receiver side (RTP packet reception, re-ordering, H.264 video decoder, video application) connected through a wireless IP network, with the decisions "Initial packet loss?", "Retransmit all?", and "Priority packet?" leading either to a NACK or to a skip]
Fig. 2. Operation of BVS NACK enhancement to UDP
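The receiver-side decision described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class name `NackReceiver`, and in particular the callback `picture_type_of`, which stands in for however the receiver learns the picture type of a missing packet, are assumptions made for the example.

```python
class NackReceiver:
    """Tracks RTP sequence numbers and decides which losses to NACK."""

    def __init__(self, prioritized=False, picture_type_of=lambda seq: "I"):
        self.prioritized = prioritized          # False: BVS, True: BVS-I
        self.picture_type_of = picture_type_of  # hypothetical lookup of picture type
        self.expected_seq = 0                   # next in-order sequence number

    def on_packet(self, seq):
        """Return the sequence numbers to NACK when packet `seq` arrives."""
        if seq < self.expected_seq:
            # A retransmitted or late packet fills an old gap: no new NACK.
            return []
        nacks = []
        # Every number skipped between the expected and the arriving packet
        # is an initial loss. Because expected_seq only moves forward, each
        # gap is examined once, so a packet is NACKed at most once.
        for missing in range(self.expected_seq, seq):
            if not self.prioritized or self.picture_type_of(missing) == "I":
                nacks.append(missing)
        self.expected_seq = seq + 1
        return nacks


receiver = NackReceiver()
print(receiver.on_packet(0))  # in order, nothing to request
print(receiver.on_packet(3))  # packets 1 and 2 were skipped
print(receiver.on_packet(1))  # retransmission arrives, no further NACK
```

Limiting each packet to one NACK keeps the feedback traffic bounded, at the cost of accepting a loss if the single retransmission also fails.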
4 Simulation Scenario
Simulation studies have been conducted for IEEE 802.16e (mobile WiMAX) with vertical handover to an IEEE 802.11 WLAN. To establish the behavior of VHO under WiMAX, the well-known ns-2 simulator was used, augmented with a module from Chang Gung University, Taiwan [22], which has proved an effective way of modeling IEEE 802.16e's behavior. IEEE 802.21 was modeled with the NIST handover module for ns-2 [23], which is tied to the IEEE 802.11b model built into ns-2 operating at 11 Mbps (hence the use of this version of IEEE 802.11 in simulations). Simulation settings for 802.11b are given in Table 1. Results from 25 runs per data point were averaged (arithmetic mean), and the simulator was first allowed to reach steady state before commencing testing. A Gilbert-Elliott channel model [24] modeled fast fading. The probability of remaining in the good state was set to 0.95 and that of remaining in the bad state to 0.94, with both states modeled by a Uniform distribution. The packet loss probability in the good state was fixed at 0.01 and the bad state probability (PB) was made variable. We are aware that the channel could also be modeled more closely for propagation effects, but it is the presence of burst errors [25] that mostly affects compressed video quality. In fact, after testing for the impact of burst errors during VHO, future work will consist of assessment under the channel conditions specified in the Standard [4], i.e. propagation loss modeled by the COST 231 Walfisch-Ikegami model with Jakes' model for fast fading. Slow fading as a result of changes in the wireless environment, such as increased reflection from buildings or interior walls as a result of motion, can be provided for by a shadowing model, such as log-normal.
Table 1. IEEE 802.11b parameter settings
Parameter                   Value
PHY                         DSSS
Frequency band              2.4 GHz
Bandwidth capacity          20 MHz
Max. packet length used     1024 B
Raw data rate (downlink)    11 Mbps
AP transmit power           0.0025 W
Approx. range               100 m
Receiving threshold         6.12e-9 W
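To make the burst-error behavior of the two-state channel concrete, the following sketch generates a per-packet loss pattern from the stated parameters (0.95 and 0.94 for remaining in the good and bad states, loss probability 0.01 in the good state, PB in the bad state). It is an illustration only: the function names are hypothetical, uniform random draws are used for both the loss and the state-change decisions, and it is not the ns-2 channel code used in the paper.

```python
import random

def gilbert_elliott_losses(n_packets, p_bad_loss, p_stay_good=0.95,
                           p_stay_bad=0.94, p_good_loss=0.01, seed=None):
    """Return a list of booleans, True where a packet is lost."""
    rng = random.Random(seed)
    in_good = True                      # assumption: the channel starts good
    lost = []
    for _ in range(n_packets):
        loss_prob = p_good_loss if in_good else p_bad_loss
        lost.append(rng.random() < loss_prob)
        stay_prob = p_stay_good if in_good else p_stay_bad
        if rng.random() >= stay_prob:   # leave the current state
            in_good = not in_good
    return lost

def mean_loss_rate(p_bad_loss, p_stay_good=0.95, p_stay_bad=0.94,
                   p_good_loss=0.01):
    """Long-run loss rate from the stationary state probabilities."""
    leave_good = 1 - p_stay_good
    leave_bad = 1 - p_stay_bad
    pi_bad = leave_good / (leave_good + leave_bad)
    return (1 - pi_bad) * p_good_loss + pi_bad * p_bad_loss

pattern = gilbert_elliott_losses(100_000, p_bad_loss=0.15, seed=1)
print(sum(pattern) / len(pattern), mean_loss_rate(0.15))
```

With these transition probabilities the channel spends a little under half of the time in the bad state, so PB has a strong influence on the overall loss rate.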
To evaluate the proposal, transmission over WiMAX was carefully modeled. The PHYsical layer settings selected for WiMAX simulation are given in Table 2. The antenna heights and transmit power levels are typical ones taken from the Standard [4]. The antenna is modeled for comparison purposes as a half-wavelength dipole, whereas a sectored set of antennas on a mast might be used in practice to achieve directivity and, hence, better performance. The IEEE 802.16 Time Division Duplex (TDD) frame length was set to 5 ms, as only this value is supported in the WiMAX Forum simplification of the Standard. The data rate results from the use of one of the mandatory coding modes [4] for a TDD downlink/uplink sub-frame ratio of 3:1. The downlink was assigned more bandwidth capacity than the uplink to allow the WiMAX BS to respond to multiple mobile devices. Thus, the parameter settings in Table 2, such as the modulation type and physical-layer coding rate, are required to achieve a data rate of 10.67 Mbps over the downlink. Buffer sizes were set to 50 packets (a single MAC Service Data Unit within a MAC Protocol Data Unit [26]). This buffer size was selected as it seems appropriate to mobile, real-time applications, for which larger buffer sizes might lead both to increased delay and to larger memory energy consumption in mobile devices.
Fig. 3 shows a vertical handover scenario, in which streaming is from the VSO across the metropolitan IP network, showing intermediate routers (=R) along with link capacities and latencies on the simulated streaming path. Nodes marked A and B inject traffic into the bottleneck link between them, as sources of congestion. Node A sources to node B a CBR stream at 1.5 Mbps with packet size 1 kB and sinks a continuous TCP FTP flow sourced at node B. Node B also sources an FTP flow to the BS and a CBR stream at 1.5 Mbps with packet size 1 kB. The effect of traffic sourced from B is to cause some congestion to the returning TFRC acknowledgments.
Table 2. IEEE 802.16e parameter settings
Parameter                   Value
PHY                         OFDMA
Frequency band              5 GHz
Bandwidth capacity          10 MHz
Duplexing mode              TDD
Frame length                5 ms
Max. packet length          1024 B
Raw data rate (downlink)    10.67 Mbps
IFFT size                   1024
Modulation                  16-QAM 1/2
Guard band ratio            1/16
MS transmit power           245 mW
BS transmit power           20 W
Approx. range to SS         1 km
Antenna type                Omni-directional
Antenna gains               0 dBD
MS antenna height           1.2 m
BS antenna height           30 m
Receiving threshold         7.91e-15 W
OFDMA = Orthogonal Frequency Division Multiple Access, QAM = Quadrature Amplitude Modulation, TDD = Time Division Duplex
5 Evaluation
Frame-by-frame results in Fig. 4 isolate the effect on video quality (PSNR) of vertical handovers from IEEE 802.16e to 802.11b and vice versa. When the mobile SS loses connection to the IEEE 802.16e BS it is at 0.93 km distance from the BS, whereas when it connects to the IEEE 802.11b AP it is at 70 m distance from the AP. For ease of analysis, no other station is present in either the IEEE 802.16e or the 802.11b network. By default, the speed of movement of the mobile SS was 1 m/s (2.2 mph), i.e. as appropriate for somebody strolling between the BS and AP while using a mobile device. Notice that in a short study of the impact of different speeds [27] when going between these networks, it was found that IEEE 802.16e's performance actually improved at higher speeds (up to 10 m/s) because of IEEE 802.16e's superior mobility management.
Fig. 3. Video streaming during the vertical handover scenario, MS = mobile subscriber station
Plots shown are for raw UDP, TFRC, and the two varieties of the lightweight NACK scheme: BVS, in which all initially lost packets are NACKed, and BVS-I, in which only I-picture-bearing packets are NACKed. Because of the longer exposure of packets to channel conditions, video quality is generally worse during an IEEE 802.16e connection than during a connection to IEEE 802.11b. BVS is clearly better able to cope with the transition in Fig. 4a, while BVS-I only results in a limited gain over UDP. TFRC begins to recover quality after the handover but it does not compete with BVS. Notice that during the stable period between the handovers, re-sending only I-picture packets (BVS-I) is sufficient to maintain quality, with reduced throughput and reconstruction delay at the decoder. However, during worse channel conditions, Fig. 4b, BVS-I also results in lower quality. In the extended test, the mobile SS journeys from the BS to the AP and back again. An approximately equal time was spent under IEEE 802.16e and IEEE 802.11b streaming. As Table 3 shows for the two-handover scenario, the TFRC response to both congestion and channel errors is to increase the inter-packet gap such that the total sending period of the 35.5 s clip grows to an unacceptable level (about 8 s longer than the display time), though all transport methods suffer from interruptions (freeze-frame effects). TFRC also suffers from poor wireless channel utilization, as its throughput declines. Both the proposed schemes result in reduced packet loss, i.e. fewer packets lost even after a single attempt at retransmission. A consequence of retransmission is greater end-to-end packet delay, when delay is the aggregate of an initial transmission and any delay from resending a packet after a NACK. However, the mean delay in both cases is less than the 50 ms noted in Section 2.3. The main deficiency of BVS appears to be the delay that occurs during handovers, as the maximum end-to-end delay is high.
However, it is actually the maximum period of interruption caused to frame display rather than the impact of delay from individual packets making up the frame that is significant. In this respect all systems appear to behave similarly.
[Figure: PSNR (dB) plotted against frame index (× 20) in separate panels for UDP, TFRC, BVS, and BVS-I, for each of cases (a) and (b)]
Fig. 4. Frame-by-frame video quality (a) during vertical handover from IEEE 802.11b to 802.16e (b) from IEEE 802.16e to 802.11b
Table 3. Summary of the different transport schemes after two vertical handovers, the first from IEEE 802.16e to 802.11b and the second from IEEE 802.11b to IEEE 802.16e; data are the mean of 25 simulation runs, PB = 0.15, speed 2 m/s
                                      UDP      TFRC     BVS      BVS-I
Throughput (kbps)                     838      687      851      855
Sending period (s)                    35.48    43.08    35.79    35.58
Packet loss (%)                       3.84     3.29     0.38     2.49
Packet jitter (s)                     0.0077   0.0093   0.0075   0.0076
Mean packet end-to-end delay (s)      0.0141   0.0106   0.0150   0.0149
Max. packet end-to-end delay (s)      0.0680   0.0520   0.2980   0.0790
PSNR (dB)                             31.29    32.31    37.43    33.01
Standard deviation (dB)               5.85     6.43     2.52     6.09
Max. interruption (s)                 0.301    0.309    0.300    0.301
An important point to note is that packet loss during handover is heavily dependent on speed, even at slow walking speeds. The packet loss is reflected in the overall video quality, which is shown for different speeds in Fig. 5. Notice that 2 mps (4.5 mph) is already faster than a walking speed of about 3.3 mph. Likewise (see Fig. 6), variation in channel condition can notably affect video quality, especially when communicating over IEEE 802.16e, because of the longer transmission times involved (as previously mentioned). However, though BVS suffers from poor channel conditions, its video quality overall remains superior.
[Figure: PSNR (dB) against speed (mps), from 0 to 5 mps, for UDP, TFRC, BVS, and BVS-I]
Fig. 5. Variation of video quality for the different schemes with differing speed
[Figure: PSNR (dB) against wireless channel error rate (PB), from 0 to 0.3, for the compared schemes]
Fig. 6. Variation of video quality for the schemes with differing channel quality
6 Conclusion
Next Generation Mobile Networks will support seamless motion across heterogeneous networks, thus raising user expectations that mobile IPTV will be able to follow the mobile device. For low-latency conditions, this paper has proposed a lightweight transport method that minimizes the impact of congestion control delays. In fact, the method seems to be sufficient in the presence of network congestion affecting the path from the video server to the mobile device. An interesting point is that TFRC, which requires an acknowledgment after every packet transmission, can be more affected by congestion in the feedback path than the BVS scheme, which only sends a NACK after the first loss of a packet. TFRC is also affected by its inability to distinguish between packet losses due to congestion (on the streaming path) and those due to packet drops on the wireless channel. An interesting observation from the simulations is that the selective NACK version of BVS tested, i.e. BVS-I, did not perform as well as BVS itself when there were vertical handovers. However, during streaming to the IEEE 802.11 AP, good video quality resulted, with less mean packet delay and reduced throughput. This suggests that handover detection at the mobile SS would make a hybrid BVS/BVS-I scheme effective. Future work will investigate the value of such a combined scheme.
References
1. Munasinghe, K.S., Jamalipour, A.: Interworked WiMAX-3G Cellular Data Networks: An Architecture for Mobility Management and Performance Evaluation. IEEE Trans. Wireless Commun. 8(4), 1847–1853 (2009)
2. Das, S., et al.: IEEE 802.21: Media Independent Handover: Features, Applicability, and Realization. IEEE Commun. Mag. 47(1), 112–120 (2009)
3. Elmanagosh, A.A., Ashibani, M.A., Shatwan, F.B.: Quality of Service Provisioning Issue of Accessing IP Multimedia Subsystem via Wireless LANs. J. of New Technologies, Mobility and Security 3(1), 133–143 (2007)
4. IEEE, 802.16e-2005: IEEE Standard for Local and Metropolitan Area Networks. Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems (2005)
5. Degrande, N., Laevens, K., De Vleeschauwer, D.: Increasing the User Perceived Quality for IPTV Services. IEEE Commun. Mag. 46(2), 94–100 (2008)
6. Handley, M., Floyd, S., Padhye, J., Widmer, J.: TCP-Friendly Rate Control (TFRC): Protocol Specification. Internet Engineering Task Force, RFC 3448 (2003)
7. Google Talk for developers. Google Talk Call Signaling (2010), online document at http://code.google.com/apis/talk/call_signaling.html#Video
8. Lee, S.K., Sriram, K., Kim, K., Kim, Y.H., Golmie, N.: Vertical Handoff Decision for Providing Optimized Performance in Heterogeneous Networks. IEEE Trans. Veh. Technol. 58(2), 865–881 (2009)
9. Bargh, M.S., et al.: Reducing Handover Latency in Future IP-based Wireless Networks: Proxy Mobile IPv6 with Simultaneous Bindings. In: Int. Symp. on a World of Wireless, Mobile and Multimedia Networks, pp. 1–6 (2008)
10. Kong, K.-S., Lee, W., Han, Y.-H., Shin, M.-K.: Handover Latency Analysis of a Network-Based Localized Mobility Management Protocol. In: IEEE Int. Conf. on Commun., pp. 5838–5843 (2008)
11. Chen, L.-J., Yang, G., Sun, T., Sanadidi, Y., Gerla, M.: Adaptive Video Streaming in Vertical Handoff: A Case Study. In: First Ann. Int. Conf. Mobile and Ubiquitous Syst., pp. 111–112 (2004)
12. Tsagkaropoulos, M., Politis, I., Dagiuklas, T.: Enhanced Vertical Handover Based on 802.21 Framework for Real-Time Video Streaming. In: ACM Mobimedia (2009)
13. Lee, D., Kim, J.W., Sinha, P.: Handoff-aware Adaptive Media Streaming in Mobile IP Networks. In: Int. Conf. on Info. Networking (2006)
14. Issa, O., Li, W., Liu, H.: Performance Evaluation of TV over Broadband Wireless Access Networks. IEEE Trans. Broadcast. 56(2), 201–210 (2010)
15. Partridge, C., Hinden, R.: Version 2 of the Reliable Data Protocol. RFC 1151 (1990)
16. Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A.: Overview of the H.264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
17. Agboma, F., Liotta, A.: Addressing User Expectations in Mobile Content Delivery. Mobile Information Systems 3(3/4), 153–164 (2007)
18. Agilent Technologies: Validating IPTV Service Quality under Multiplay Network Conditions. White Paper (2008)
19. Kohler, E., Handley, M., Floyd, S.: Datagram Congestion Control Protocol. Internet Engineering Task Force, RFC 4340 (2006)
20. Tappayuthpijarn, K., Liebl, G., Stockhammer, T., Steinbach, E.: Adaptive Video Streaming over a Mobile Network with TCP-Friendly Rate Control. In: Int. Conf. on Wireless Commun. and Mobile Computing, pp. 1325–1329 (2009)
21. Görkemli, B., Sunay, M.O., Tekalp, A.M.: Video Streaming over Wireless DCCP. In: IEEE Int. Conf. on Image Processing, pp. 2028–2031 (2008)
22. Tsai, F.C.D., et al.: The Design and Implementation of WiMAX Module for ns-2 Simulator. In: Workshop on ns2: the IP Network Simulator, article no. 5 (2006)
23. National Institute of Standards and Technology (NIST): Seamless and Secure Mobility, ns-2 IEEE 802.21 module, http://w3.antd.nist.gov/seamlessandsecure/ (accessed July 2010)
24. Haßlinger, G., Hohlfeld, O.: The Gilbert-Elliott Model for Packet Loss in Real Time Services on the Internet. In: 14th GI/ITG Conf. on Measurement, Modelling, and Evaluation of Computer and Commun. Systems, pp. 269–283 (2008)
25. Liang, Y.J., Apostolopoulos, J.G., Girod, B.: Analysis of Packet Loss for Compressed Video: Effect of Burst Losses and Correlation between Error Frames. IEEE Trans. Circuits Syst. Video Technol. 18(7), 861–874 (2008)
26. Fleury, M., Razavi, R., Saleh, S., Al-Jobouri, L., Ghanbari, M.: Enabling WiMAX Video Streaming. In: Dalal, U.D., Kosta, Y.P. (eds.) In-Teh, Croatia, pp. 213–239 (2009)
27. Thaalbi, M., Tabbane, N.: Vertical Handover between Wifi Network and WiMAX Network According to IEEE 802.21 Standard. In: Elleithy, K., et al. (eds.) Technological Developments in Networking, Education and Automation, pp. 533–537. Springer, Dordrecht (2010)
Job Scheduling for Dynamic Data Replication Strategy Based on Federation Data Grid Systems M. Zarina1, M. Mat Deris2, ANM. M. Rose3, and A.M. Isa4 1,3,4 Faculty of Informatics, University Sultan Zainal Abidin Gong Badak, Kuala Terengganu, Malaysia {zarina,anm,isa}@udm.edu.my 2 Faculty of Information Technology and Multimedia, University Tun Hussein Onn Malaysia Batu Pahat, Johor, Malaysia
[email protected]
Abstract. Data access latency is one of the important factors in the system performance of a federated data grid. Effective job scheduling can reduce data transfer time by considering bandwidth and by dispatching a job to where the needed data are present. However, different job scheduling strategies result in different access latencies. This paper proposes a model of job scheduling for a dynamic replication strategy in a federated data grid, known as dynamic replication for federation (DRF). DRF uses the concept of a defined 'network core area' (NCA) as the designated area on which a schedule is focused. This paper also highlights how the NCA is reallocated if a schedule fails. Results from simulation using OptorSim show that job scheduling for DRF is superior to the Optimal Downloading Replication Strategy (ODRS) in terms of data access time and inter-region communication.
Keywords: Data grid, replication, scheduling, access latency.
S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 283–292, 2011. © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
A data grid links hundreds of geographically distributed sites in different parts of the world to facilitate sharing of data and resources [1]. Data grids enable scientists from universities and research laboratories to collaborate with one another to solve large-scale computational problems. The size of the data that needs to be accessed on a data grid is in terabytes or petabytes. Data-intensive jobs are one major kind of job in a data grid, and their scheduling strategies are regarded as one of the most important research fields [2]. One of the aspects that must be considered is how efficiently the scheduling works with the amount of data needed by each job. Because a data grid is a distributed data storage solution, its performance is dictated by the access latency and bandwidth of the underlying communication network. In large-scale data-intensive applications, data transfer time is the primary cause of delay in job execution. To mitigate the effect of such problems and facilitate efficient system performance, replication strategies and scheduling algorithms have been employed [4-8]. Data grid systems are classified into four categories: hierarchical, federation, monadic and hybrid [9]. Most current research work focuses on the hierarchical model. In this paper, dynamic replication for a federated (DRF) data grid is proposed. The objective of this research is to minimize access latency by accessing data from the nearest node in the federation model within the cluster of the defined perimeter. Any job to be scheduled will always tend to focus on the area known as the 'Network Core Area' (NCA). Initially, the innermost core of the sub grid is defined as the NCA, and initial jobs will only be scheduled within the defined NCA. If the schedule of a job fails to materialize, the scope of the schedule is extended to the chosen cluster in the outer core area of the same sub grid. The outer core area then becomes the core area, since it contains the nodes that will be servicing the requested data, and is defined as the new NCA. In the simulation, we have scheduled the jobs for DRF using OptorSim in order to evaluate the performance of the DRF strategy. The simulation results show that job scheduling for DRF successfully reduces the time taken for data transfer and the job execution time. The reduction in time is an indicator that the scheduling strategy for DRF has narrowed down its search scope to its nearest neighbours.
2 Model and Problem Formulation
2.1 Federation Data Grids
In the following diagram (Fig. 1), Cluster 1 is defined as the NCA, on which the initial search is focused. If the search fails to yield the desired result, the search is expanded into the area outside Cluster 1, i.e. Sub Grid 1. Sub Grid 1 then becomes the next area of search and is therefore known as the new NCA. The header in Cluster 1 then relinquishes its role as the main arbitrator for data requests, as Cluster 1 is no longer the NCA. Fig. 1 gives an instance of grouping in a federated data grid. There are two sub data grids and four clusters in this data grid, which is a logically independent system in which any two clusters or sub grids are peers of each other. Each sub grid consists of two or more clusters, each of which consists of several nodes. In each cluster there is one header node and the others are normal nodes. A normal node has finite local storage space in which to store data replicas. A header node is in charge of storing index information for the nodes that belong to its cluster. A header node answers request messages from normal nodes that belong to its own cluster, as well as request messages from other clusters or other sub grids. For our proposal, it can safely be assumed that the sub grids may be located in different network regions but that the clusters of a sub grid are in the same network region. Such a network region/sub grid can be seen as the Internet topology in a country. All the LANs that exist in the country region (which can be called a sub grid) can be seen as clusters. All that is needed, then, is that each participating cluster in the proposed federated data grid has a header that will respond to requests. This research extends the original model defined in [10], in that sub grids grouped by region are proposed to give access from a local node of the same cluster and sub grid.
[Figure: sub grids, each containing clusters made up of a super (header) node and several normal nodes]
Fig. 1. An instance of grouping in federated data grid
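The widening search described in this section (local node, then the local cluster, then the other clusters of the same sub grid, then other sub grids) can be sketched as below. The sketch assumes the consecutive numbering that Section 2.2 formalizes, with m nodes per cluster and s clusters per sub grid; the function names are hypothetical and header-node messaging is not modeled.

```python
def cluster_of(node, m):
    """Cluster index of a node, numbering nodes and clusters from 1."""
    return (node + m - 1) // m          # integer ceiling of node / m

def sub_grid_of(cluster, s):
    """Sub grid index of a cluster, numbering from 1."""
    return (cluster + s - 1) // s       # integer ceiling of cluster / s

def hit_level(requester, holders, m, s):
    """Classify where a request is served, widening the search step by step.

    requester : index of the requesting node
    holders   : non-empty set of node indices that store a replica
    """
    if requester in holders:
        return "local"
    cluster = cluster_of(requester, m)
    if any(cluster_of(h, m) == cluster for h in holders):
        return "cluster"
    grid = sub_grid_of(cluster, s)
    if any(sub_grid_of(cluster_of(h, m), s) == grid for h in holders):
        return "intra-grid"
    return "inter-grid"

# Eight nodes, two per cluster, two clusters per sub grid.
for holder in (1, 2, 3, 7):
    print(holder, hit_level(1, {holder}, m=2, s=2))
```

Each later level corresponds to a more expensive network path, which is what the access costs in Section 2.3 capture.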
2.2 Terms and Definitions
DRF adopts the same assumptions and approach as ODRS, but whereas ODRS deals with homogeneous cases involving clusters only, DRF involves both clusters and sub grids. Assume that there are M nodes in the system, and that a node $n_k$ belongs to one cluster $o_l$ only, with k = 1,2,…,M and l = 1,2,…,S, where S is the number of clusters in the system. A cluster $o_l$ belongs to one sub grid only. We assume that all clusters have the same size m and all sub grids have the same size s, with m, s ≥ 2. In this paper, we deal with homogeneous cases in which clusters and sub grids can take the same size. We assume that the nodes belonging to cluster $o_l$ are $n_{(l-1)m+1}, n_{(l-1)m+2}, \ldots, n_{lm}$, so that a node $n_k$ belongs to cluster $o_{\lceil k/m \rceil}$. Likewise, the clusters belonging to sub grid $g_i$ are $o_{(i-1)s+1}, o_{(i-1)s+2}, \ldots, o_{is}$, so that a cluster $o_l$ belongs to sub grid $g_{\lceil l/s \rceil}$. There are N unique files $f_j$ in the data grid, where j = 1,2,…,N. To avoid adding 0/1-knapsack-type complexity to a problem that is already combinatorial [18], we assume that different files are of equal unit size, where $c_j$ is the size of file $f_j$. This is similar to previous work on replication strategy in P2P networks [13]. For every node, the storage space is of the same size and can store K file replicas, so the system can store MK file replicas. The data set stored in node $n_k$ is $D_k$. Each file $f_j$ is associated with a normalized request rate $\lambda_j$ per node, which is the fraction of all requests that are issued for the jth file. The normalized cumulative request rate of a node for all files in the system is
$$\lambda = \sum_{j=1}^{N} \lambda_j = 1$$
For file $f_j$, there are $r_j$ replicas uniformly distributed in the system, and we assume $r_j \ge 1$. For a node $n_k$, there is at most one replica of $f_j$ in its storage space. For file $f_j$, the probability of having a replica in node $n_k$ is $p_j$. We enhance the ODRS model by adding sub grids, so that the hit ratios to be taken into account are: P(local-hit), P(Cluster-hit), P(Intra-Grid-hit) and P(Inter-Grid-hit). As a consequence, when there is a request for a file, the request may be served in the following sequence: local node, local cluster, local sub grid or other sub grids. The cumulative (average) hit ratio of the local node is P(local-hit), indicating the probability that a file request from any node in the system is served by the local node.
Similarly, the cumulative (average) hit ratios of a local cluster, a local sub grid and other sub grids are defined as P(Cluster-hit), P(Intra-Grid-hit) and P(Inter-Grid-hit) respectively. The nodes in a cluster are connected by a local area network (LAN). The clusters and sub grids are connected by a wide area network (WAN), but the clusters in a specific sub grid are defined to be in the same region, so the cumulative average hit ratios can be interpreted as the ratios of different network bandwidth requirements for a file request.
2.3 Expected Access Latency of DRF
For a node $n_k$ requesting file $f_j$, four events are considered and their probabilities computed.
Case 1: Event $E^{l}_{kj}$, which means there is a replica of file $f_j$ in $n_k$, and we have
$$P(E^{l}_{kj}) = p_j \qquad (1)$$
Case 2: Event $E^{o}_{kj}$, which means there is no replica of file $f_j$ in $n_k$, but file $f_j$ hits in the other nodes of cluster $o_{\lceil k/m \rceil}$. Then we have
$$P(E^{o}_{kj}) = (1 - p_j)\left(1 - (1 - p_j)^{m-1}\right) \qquad (2)$$
Case 3: Event $E^{g}_{lj}$, which means there is no replica of file $f_j$ in the local cluster $o_{\lceil k/m \rceil}$ but file $f_j$ hits in other clusters of the same sub grid $g_{\lceil l/s \rceil}$. From [10], a replica of file $f_j$ can be obtained from other clusters or sub grids, so that
$$P(E^{g}_{lj}) = (1 - p_j)^{m}\left(1 - (1 - p_j)^{(s-1)m}\right) \qquad (3)$$
Case 4: Event $E^{g-}_{lj}$, which means there is no replica of file $f_j$ in sub grid $g_{\lceil l/s \rceil}$; therefore, file $f_j$ must be hit in other sub grids. Since there is at least one replica of $f_j$ in the data grid, we have
$$P(E^{g-}_{lj}) = 1 - P(E^{l}_{kj}) - P(E^{o}_{kj}) - P(E^{g}_{lj}) = (1 - p_j)^{sm} \qquad (4)$$
Having the probabilities above, we can now compute the cumulative expected hit ratios.
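$$P(\text{local-hit}) = \sum_{j=1}^{N} \lambda_j P(E^{l}_{kj}) = \sum_{j=1}^{N} \lambda_j p_j \qquad (5)$$
$$P(\text{Cluster-hit}) = \sum_{j=1}^{N} \lambda_j P(E^{o}_{kj}) = \sum_{j=1}^{N} \lambda_j (1 - p_j)\left(1 - (1 - p_j)^{m-1}\right) \qquad (6)$$
$$P(\text{Intra-Grid-hit}) = \sum_{j=1}^{N} \lambda_j P(E^{g}_{lj}) = \sum_{j=1}^{N} \lambda_j (1 - p_j)^{m}\left(1 - (1 - p_j)^{(s-1)m}\right) \qquad (7)$$
$$P(\text{Inter-Grid-hit}) = \sum_{j=1}^{N} \lambda_j P(E^{g-}_{lj}) = \sum_{j=1}^{N} \lambda_j (1 - p_j)^{sm} \qquad (8)$$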
In our model, accessing a file replica from a node's local storage space costs $t_l$, from a remote node of the same cluster that this node belongs to costs $t_o$, from a remote node of another cluster in the same sub grid that this node belongs to costs $t_g$, and from a node of another sub grid costs $t_G$, with $t_l \le t_o \le t_g \le t_G$. Fig. 2 shows the relationship between $t_o$, $t_g$ and $t_G$ ($t_l$ is not depicted), in which $\lambda$ is the normalized request rate for file $f_j$ at this node.
[Figure: Clusters 1 to 4 arranged in Sub Grid 1 and Sub Grid 2, with the request rate λ and the access costs t_o and t_G marked]
Fig. 2. Data access for DRF
t(nk, fj) denotes the access latency of node nk requesting file fj. Applying Eqs. (1) to (4), we can obtain the expected access latency of node nk requesting file fj, E(t(nk, fj)).
For simplicity of notation, we let tlo = to − tl, tgo = tg − to and tGg = tG − tg, and rewrite E(t(nk, fj)) in terms of tl, tlo, tgo and tGg; the last term of that expression contains the factor (1 − ρj)^sm.
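As a rough illustration of how the four access costs combine, the sketch below computes an expected latency as a probability-weighted sum of tier costs and checks that the form written with the increments tlo, tgo and tGg gives the same value. The hit probabilities are supplied directly as hypothetical inputs; the function names and the example numbers are assumptions made for illustration and are not taken from the paper's equations.

```python
TIERS = ("local", "cluster", "intra_grid", "inter_grid")


def expected_latency(p_hit, cost):
    """Expected access latency: sum over tiers of P(hit in tier) * tier cost."""
    total_p = sum(p_hit[t] for t in TIERS)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("hit probabilities must sum to 1")
    return sum(p_hit[t] * cost[t] for t in TIERS)


def expected_latency_incremental(p_hit, cost):
    """Same quantity, written with the cost increments tlo, tgo and tGg."""
    t_lo = cost["cluster"] - cost["local"]
    t_go = cost["intra_grid"] - cost["cluster"]
    t_Gg = cost["inter_grid"] - cost["intra_grid"]
    miss_local = 1.0 - p_hit["local"]              # not found on the node
    miss_cluster = miss_local - p_hit["cluster"]   # not found in the cluster
    miss_grid = miss_cluster - p_hit["intra_grid"] # not found in the sub grid
    return (cost["local"] + t_lo * miss_local
            + t_go * miss_cluster + t_Gg * miss_grid)


# Hypothetical example values with tl <= to <= tg <= tG.
example_p = {"local": 0.4, "cluster": 0.3, "intra_grid": 0.2, "inter_grid": 0.1}
example_cost = {"local": 1.0, "cluster": 2.0, "intra_grid": 5.0, "inter_grid": 10.0}
print(expected_latency(example_p, example_cost))
print(expected_latency_incremental(example_p, example_cost))
```

Both functions agree because each increment is paid exactly by the requests that miss the previous tier, which is why the incremental notation is convenient.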
By considering the request rate of every file, we can compute the expected access latency for node nk requesting any file in the data grid. Because the system is symmetric, this access latency is uniform across the system, i.e., it takes the same value t for every node nk, k = 1, 2, 3, …
The objective of this paper is to minimize t, subject to the following constraints:
1) The number of all replicas in the system is less than the total storage space of the data grid.
2) The number of replicas of a file is at least one and at most the system size, i.e. 1 ≤ rj ≤ M for every file fj.
The constrained optimization problem of this paper, Eq. (9), is then to minimize t subject to these two constraints.
3 Simulations

OptorSim is used to evaluate the performance of different replication strategies and job scheduling. OptorSim was developed to mimic the structure of a real Data Grid. It includes all the general components of a data grid, with an emphasis on file access optimization and dynamic replication strategies. OptorSim is written in Java and was developed by the European Data Grid project [11]. The interaction of the individual Grid components is shown in Fig. 3. These include Computing Elements (CEs) and Storage Elements (SEs), which are organised in Grid Sites. A Resource Broker (RB) controls the scheduling of jobs to Grid Sites. Each site handles its file content with a Replica Manager (RM), within which a Replica Optimizer (RO) contains the replication algorithm that drives the automatic creation and deletion of replicas. We embedded the DRF scheduling algorithm and the ODRS scheduling algorithm in OptorSim to evaluate and compare the performance of each algorithm.
Fig. 3. OptorSim simulates data grid architecture
3.1 Simulation Parameters

The DRF algorithm has been evaluated by comparing it with ODRS and several other replication strategies using theoretical analysis [4]. That analysis shows that DRF is superior to ODRS in terms of inter-cluster communication. Fig. 4 shows the runtimes of the two algorithms for a varying number of jobs. We used the LRU algorithm (which always replicates and then deletes the files that have been used least recently) to evaluate the proposed scheduling algorithm.

Table 1. Simulation Parameters

Topology parameters                         Value
Number of sub grids (regions)               2
Number of clusters in each region           2
Number of sites in each cluster             3
Storage space in each site                  10 GB
Intra-cluster connectivity bandwidth        1000 Mbps
Inter-cluster connectivity bandwidth        310 Mbps
Inter-grid connectivity bandwidth           155 Mbps

Grid job parameters                         Value
Number of jobs                              500
Number of job types                         6
Number of files accessed                    10
Size of a single file                       500 MB

Fig. 4. Average job time based on varying number of jobs (y-axis: average job time in seconds; x-axis: number of jobs, 100 to 500; series: DRF and ODRS)

The difference between DRF and ODRS can be observed in two aspects. First, in DRF the nearest files are accessed first by clusters grouped in the same region: a file is looked up starting from the local node and then within the local cluster. If it is not available there, it is accessed from within the local sub grid and then from other sub grids. This is done in order to reduce bandwidth consumption and access latency. ODRS, in contrast, searches initially within the LAN. If that search fails, it extends the search to all the nodes within the WAN, with no distinction between intra-region and inter-region defined. Because the search is not curtailed, there is a probability that files are accessed from other regions even when a replica of the file exists within the same region. Secondly, DRF takes the popularity of replicas into consideration at cluster level, while ODRS does so at node level. This is because the nodes within a cluster are connected in a local area network, and therefore the access latency within a cluster is very low. If a requested file is not in the local node, the node can forward its search to other nodes in the same cluster. Therefore our method has better performance compared to ODRS: the average total execution time is about 9% faster using DRF than ODRS.

The average number of inter-communications for a job execution is illustrated in Fig. 5. By selecting the best site based on the location of the data required by the job, the proposed scheduling algorithm with LRU replication can decrease the number of inter-communications effectively. Overall, the simulation results with OptorSim show better performance for DRF: when a file is not in the local node or the local cluster, it is accessed within the same region, whereas in ODRS an average of 20% of files are accessed from another region even though replicas exist in the same region.

Fig. 5. Average number of Inter-Communication (y-axis: average number of inter-communications; x-axis: number of jobs; groups: DRF and ODRS; series: intra-region and inter-region)

Fig. 6 shows the average job time for 500 jobs. In this experiment, the bandwidth of nodes within inter-cluster was held constant at 155 Mbps, while the bandwidth between nodes within the inter-cluster was varied from 155 Mbps to 775 Mbps. When DRF is compared with ODRS as the inter-communication bandwidth is increased, DRF shows faster execution time. We can therefore conclude that the DRF strategy can be effectively utilized when the intra-region bandwidth is larger than the inter-region bandwidth, because ODRS makes no distinction between intra-region and inter-region. With the faster times achieved in Fig. 6, it can be inferred that DRF has avoided inter-region communications.

Fig. 6. Average job time with varying inter-communication bandwidth for 500 jobs (y-axis: average job time in seconds; x-axis: inter-communication bandwidth, 155 to 775 Mbps; series: DRF and ODRS)
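The nearest-first search order that distinguishes DRF from ODRS (local node, then local cluster, then local sub grid, then other sub grids) can be sketched as follows. This is a minimal illustration, not the OptorSim implementation: the data structures `holdings`, `cluster_of` and `grid_of` and the function name are hypothetical.

```python
TIER_NAMES = ("local", "cluster", "intra-grid", "inter-grid")


def locate_replica(file_id, node, holdings, cluster_of, grid_of):
    """Return (tier name, holder) for the nearest replica, or None if absent.

    holdings:   dict node -> set of file ids stored on that node
    cluster_of: dict node -> cluster id
    grid_of:    dict cluster id -> sub grid (region) id
    """
    my_cluster = cluster_of[node]
    my_grid = grid_of[my_cluster]

    def tier(other):
        # 0 = same node, 1 = same cluster, 2 = same sub grid, 3 = other sub grid
        if other == node:
            return 0
        other_cluster = cluster_of[other]
        if other_cluster == my_cluster:
            return 1
        if grid_of[other_cluster] == my_grid:
            return 2
        return 3

    holders = [n for n, files in holdings.items() if file_id in files]
    if not holders:
        return None
    # Prefer the lowest tier; break ties by node name for determinism.
    best = min(holders, key=lambda n: (tier(n), str(n)))
    return TIER_NAMES[tier(best)], best
```

With this ordering an inter-region transfer happens only when no replica exists anywhere in the requester's own region, which is the behaviour the comparison with ODRS relies on.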
4 Conclusion and Future Works

In this paper, a model for a dynamic data replication strategy for a federation data grid is proposed. The aim of this model is to access data from the nearest possible node by using the concept of the 'Network Core Area' (NCA). The NCA is mainly used to define, control and restrict the area of search. Data access starts with the local node as the designated NCA; if the file is not found there, it is searched for within the local cluster. If there is no file within the local cluster, the search is expanded to the local sub grid and then to other sub grids. If the requested data is not found at the nearest node, then after a certain threshold the data is replicated by the local node. Results from the simulation using OptorSim show that the job scheduling for DRF proposed in this paper is superior to the Optimal Downloading Replication Strategy (ODRS) in terms of reduced data access time and amount of inter-region communication. By focusing the search on the nearest node, DRF has proven successful in reducing the need for greater bandwidth while at the same time minimizing the expected data access latency of a file. In future work, we will try to further reduce the job execution time by combining job scheduling for DRF with replication for DRF. In addition, the algorithm for calculating the file hit threshold will be investigated in order to optimize the efficiency of the proposed model.
References

1. Chang, R.-S., Chen, P.-H.: Complete and fragmented replica selection and retrieval in Data Grids. Fut. Generation Computer System 23(4), 536–546 (2007)
2. Jiang, J., Ji, H.: Scheduling algorithm with potential behaviors. Journal of Computers 3(12), 51–59 (2008)
3. Zarina, M., Deris, M.M., Rose, A.N.M.M., Isa, A.M.: Dynamic data replication strategy based on federation data grid systems. In: Zhu, R., Zhang, Y., Liu, B., Liu, C. (eds.) ICICA 2010. LNCS, vol. 6377, pp. 25–32. Springer, Heidelberg (2010)
4. Park, S.-M., Kim, J.-H., Ko, Y.-B.: Dynamic Data Grid Replication Strategy Based on Internet Hierarchy. In: Li, M., Sun, X.-H., Deng, Q.-n., Ni, J. (eds.) GCC 2003. LNCS, vol. 3033, pp. 838–846. Springer, Heidelberg (2004)
5. Noraziah, A., Mat Deris, M., Saman, M.Y.M., Norhayati, R., Rabiei, M., Shuhadah, W.N.W.: Managing transactions on grid-neighbour replication in distributed systems. International Journal of Computer Mathematics 86(9), 1624–1633 (2009)
6. Venugopal, S., Buyya, R., Ramamohanarao, K.: A taxonomy of data grids for distributed data sharing, management, and processing. ACM Computing Surveys 38(1), 1–53 (2006)
7. Sashi, K., Thanamini, A.S.: Dynamic replication in data grid using Modified BHR Region Based Algorithm. Fut. Gen. Computer System 27, 202–210 (2010)
8. Park, S.-M., Kim, J.-H., Go, Y.-B., Yoon, W.-S.: Dynamic data grid replication strategy based on internet hierarchy. In: Li, M., Sun, X.-H., Deng, Q.-n., Ni, J. (eds.) GCC 2003. LNCS, vol. 3033, pp. 838–846. Springer, Heidelberg (2004)
9. Ruay-Shiung, C., Jih-Sheng, C., Shin-Yi, L.: Job scheduling and data replication on data grids. Future Generation Computer System 23, 846–860 (2007)
10. Jianjin, J., Guangwen, Y.: An Optimal Replication Strategy for data grid system. Front. Comput. Sci. China 1(3), 338–348 (2007)
11. Bell, W.H., Cameron, D.G., Capozza, L., Millar, P., Stockinger, K., Zini, F.: OptorSim - a grid simulator for studying dynamic data replication strategies. Int. J. High Perform. Comput. Appl. 17(4), 403–416 (2003)
Matlab Implementation and Results of Region Growing Segmentation Using Haralick Texture Features on Mammogram Mass Segmentation

Valliappan Raman1, Putra Sumari2, and Patrick Then3

1,3 Swinburne University of Technology Sarawak, 93350 Kuching, Sarawak, Malaysia
{vraman,pthen}@swinburne.edu.my
2 University Sains Malaysia, 11800 Penang, Malaysia
[email protected]

Abstract. In digital mammograms, accurate segmentation of tumors is a very important stage; we have therefore chosen region-based segmentation using Haralick texture features for our research. The main objective of this paper is to identify regions of interest and segment the tumors in digital mammograms. The segmented image is then analyzed to estimate the tumor, and the results are compared against the previously known diagnosis of the radiologist. This paper shows the Matlab implementation and experimental results of the various stages in detecting and segmenting the tumor.

Keywords: Mammography, Region Growing, Haralick Textures and Classification.
1 Introduction

In many countries, breast cancer represents one of the main causes of death among women [18]. X-ray mammography is an effective diagnostic tool for detecting early-stage breast cancers. However, a heavy workload makes it difficult for radiologists to screen cancer cases efficiently because of observational oversights. A computer-aided diagnosis (CAD) system screens a large number of cases, thereby minimizing these observational oversights [1]. An effective CAD system, one that clearly identifies the position, size and staging of lesions in X-ray mammograms, must be evaluated using a large number of reference images with approved diagnostics [2]. The detection of anomalies in mammographic images is made difficult by a great number of structures similar to the pathological ones, related also to tissue density. One of the abnormalities that is often a marker of a tumor is the presence of massive lesions, which are rather large objects with a diameter on the order of a centimeter and variable shapes. In this paper we focus on segmenting the mass tumor by the region growing method using Haralick texture features. Section 2 explains the background and research objectives. Section 3 explains the existing works on image enhancement, segmentation and classification. Section 4 describes the Haralick texture features. Section 5 explains the segmentation of the tumor using Haralick texture features. Section 6 explains the feature extraction and selection. Section 7 explains the case-based classification of the tumor. Section 8 provides the experimental results of the Matlab implementation of tumor segmentation and classification. Finally, the conclusion is given in Section 9.

S.S. Al-Majeed et al. (Eds.): WiMoA 2011/ICCSEA 2011, CCIS 154, pp. 293–303, 2011. © Springer-Verlag Berlin Heidelberg 2011
2 Background

Mammography is the technique of choice to detect breast cancer. It is based on the difference in absorption of X-rays between the various tissue components of the breast, such as fat, tumor tissue and calcifications. Mammography has high sensitivity and specificity; even small tumors and microcalcifications can be detected on mammograms. The projection of the breast can be made from different angles. The two most common projections are medio-lateral oblique (side view taken at an angle) and cranio-caudal (top to bottom view), as shown in Figure 1.

Fig. 1. Medio-lateral oblique and cranio-caudal views [15]

The two most important signs of breast cancer that can be seen on a mammogram are focal masses and microcalcifications. In this paper we are mainly interested in focal masses. When a mass is present in a breast, a radiologist will estimate its malignancy by looking at the appearance of the lesion and the surrounding tissue. The most important sign of malignancy is the presence of spiculation, i.e. spiky lines radiating in all directions from a central region and extending into the surrounding tissue. Benign masses have sharp, circumscribed borders, whereas malignant masses have slightly jagged or spiculated borders. The objective of the research is to reduce the error of false negatives and false positives. The presented work detects mass lesions by analyzing a single view of the breast. The first step detects suspicious locations inside the breast area. In the second step the image at these locations is segmented into regions and several features are calculated for each region. These features are used to determine whether a lesion is benign or malignant. They are also used to eliminate false positive detections.
3 Literature Survey

Several existing approaches have been proposed to detect abnormal tissues in mammogram images. Petrick et al. [6] developed a two-stage algorithm for the enhancement of suspicious objects. They proposed an adaptive density weighted contrast enhancement (DWCE) filter to enhance objects and suppress background structures. The central idea of this filtering technique is that it uses the density value of each pixel to weight its local contrast. In the first stage the DWCE filter and a simple edge detector (Laplacian of Gaussian) were used to extract ROIs containing potential masses. In the second stage the DWCE was re-applied to the ROI. Finally, to reduce the number of false positives, they used a set of texture features for classifying detected objects as masses or normal tissue. Tarassenko et al. [7] proposed an image segmentation technique based on region clustering. The mammogram is partitioned into clusters on the basis of data density. In each region the probability density is calculated using a Parzen estimator, and the result of the image segmentation procedure is an image containing all possible regions of interest. The regions of interest are then presented to the human expert for further analysis. Bottigli et al. [8] presented a comparison of some classification systems for massive lesion classification. An algorithm based on morphological lesion differences was used to extract the features. The two classes (pathological or healthy ROIs) were differentiated by utilizing the features. A supervised neural network was employed to check the discriminating performance of the algorithm against other classifiers, and the ROC curve was used to present the results [3] [4]. Vibha L. [9][13] proposes a method for the detection of tumors using the Watershed Algorithm, and further classifies the tumor as benign or malignant using a Watershed Decision Classifier (WDC). Experimental results show that this method performs well, with the classification accuracy reaching nearly 88.38%. Serhat Ozekes et al. [10] proposed a new method for automated mass detection in digital mammographic images using templates. Masses were detected using a two-step process.

First, the pixels in the mammogram images were scanned in 8 directions, and regions of interest (ROI) were identified using various thresholds. Then, a mass template was used to categorize the ROI as true masses or non-masses based on their morphologies. Each pixel of a ROI was scanned with a mass template to determine whether there was a shape (part of a ROI) similar to the mass in the template. The similarity was controlled using two thresholds. If a shape was detected, then the coordinates of the shape were recorded as part of a true mass. To test the system's efficiency, the authors applied this process to 52 mammogram images from the Mammographic Image Analysis Society (MIAS) database [15]. The results of this experiment showed that using the templates with these diameters achieved sensitivities of 93%, 90% and 81% with 1.3, 0.7 and 0.33 false positives per image, respectively. Rabi Narayan Panda et al. [11] proposed a technique based on a three-step procedure: regions of interest (ROI) specification, two-dimensional wavelet transformation, and feature extraction based on OTSU thresholding of the region of interest for the identification of microcalcifications and mass lesions. ROIs are preprocessed using a wavelet-based transformation method, and a thresholding technique is applied to exclude microcalcifications and mass lesions.
4 Haralick Texture Features

Texture, an intrinsic property of an object surface, is used by our visual perception system to understand a scene; therefore texture analysis is an important component of image processing. Texture analysis methods can be divided into two classes: structural methods and statistical methods. In structural methods the texture of the image can be considered as the regular repetition of a micro-texture with some rule. The image can also be considered as a fractal surface, in which case the fractal dimension may be appropriate for the characterization of the image; the fractal dimension has been used as a descriptor for image segmentation by texture operators. Haralick [12][17] developed techniques using first-order statistics and introduced second-order statistics, more precisely texture measures based on gray level co-occurrence matrices. Each element of the gray level co-occurrence matrix represents the relative frequency with which two neighboring pixels separated by a distance of Δx columns and Δy lines occur, one with gray tone g and the other with gray tone g′. Such matrices of gray level spatial dependence frequencies are a function of the angular relationship between the neighboring resolution cells as well as a function of the distance between them. Adjacency can be defined to occur in four directions in a 2D, square pixel image (horizontal, vertical, left and right diagonals), so four such matrices can be calculated. Rotation invariance is a primary criterion for any features used with these images; a kind of invariance was achieved for each of these statistics by averaging them over the four directional co-occurrence matrices.

Fig. 2. Gray level co-occurrence matrix function [12]
In this paper only the following texture features extracted from the gray level co-occurrence matrix are presented for feature selection and extraction: entropy, contrast, energy, local homogeneity and correlation.
5 Methodology: Region Growing Segmentation Using Haralick Features for Mass Segmentation

Segmentation is extremely important because the diagnosis of a tumor can strongly depend upon image features [5]. First, the X-ray mammograms are digitized by a laser film digitizer with an image resolution of 100 × 100 μm² and 12 bits per pixel. The X-ray film is digitized at this high resolution so that microcalcifications can be detected on the mammogram. Because small masses are usually larger than 3 mm in diameter, the digitized mammograms are decimated to a resolution of 400 × 400 μm² by averaging 4 × 4 pixels into one pixel in order to save computation time.

After reducing the size of the image, enhancement is done using histogram equalization. After the digitization and enhancement of the X-ray mammograms, the breast region Br is extracted from the decimated image [14] [17]. In general, the non-breast region of the digitized mammograms has a very low intensity and a maximum peak in the histogram. The threshold level Bt for the breast region is determined as follows:

$$B_t = I_{\max} + 2.5\,\sigma_{bg} \qquad (1)$$

where Imax is the pixel intensity at the maximum peak count in the histogram of the decimated image and σbg is the standard deviation of all pixel values less than Imax, under the assumption that the histogram of the background has a Gaussian distribution centered at Imax.

After breast region extraction, the extracted breast region Br is divided into three partitioned regions: the fat region, the glandular region and the dense region. Because a mass is slightly brighter than its surrounding areas, it produces a sharp peak of unusual gray level intensity pixels; peak analysis of the histogram is therefore applied to extract the significant peak regions, which are called Regions of Interest (ROI). Once the ROI is identified, median filtering is applied to remove the noise from the 2D signal without blurring edges. After the removal of noise, the identified ROI is divided into R × R blocks. If the block is too small, the difference between the mass textures and normal textures cannot be well characterized; if it is too large, the result may be too coarse. The Haralick texture features are calculated from the Spatial Gray Level Dependence (SGLD) matrix of each block. From these, the significant features that can easily discriminate mass and non-mass regions in the image are selected. Next, the block that contains the mass is selected based on the features.

Pixel aggregation is then applied, in which the region of interest begins as a single pixel and grows based on surrounding pixels with similar properties, e.g., grayscale level or texture. It is a commonly used method due to its simplicity and accuracy. The system uses the maximum intensity as the "seed point", a pixel that is similar to the suspected lesion and is located somewhere inside the suspected lesion. The next 4- or 8-neighboring pixel is checked for similarity so that the region can grow. If pixels in the 4- or 8-neighboring region are similar, they are added to the region. The region continues to grow until there are no remaining similar pixels that are 4- or 8-neighbors of those in the grown region.
Our method checks the 4-neighbors of the seed pixel and uses a gray level threshold as the similarity criterion. If a 4-neighbor of a pixel has an intensity value greater than or equal to a set threshold, it is included in the region of interest. The 4-neighbors are checked instead of the 8-neighbors so that surrounding tissue is not included. The intensity threshold is used as the similarity criterion due to its simplicity and effectiveness. Based on region growing and pixel aggregation, the segmented mass is approximated by a circle, and the radius of the circle is estimated and compared with the ground-truth data. This comparison shows how close the segmented mass is to the ground-truth mass determined by the expert radiologists. Finally, the mass region is extracted from the original image and used as the input for case-based classification.
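The thresholding and region-growing steps described in this section can be sketched together as follows. This is an illustrative Python/NumPy version, not the paper's Matlab implementation. Two points are assumptions: σbg is computed here as the root-mean-square deviation from Imax of the pixels below Imax (one reading of the Gaussian-background assumption), and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def breast_threshold(image, k=2.5):
    """Bt = Imax + k * sigma_bg, with Imax the histogram peak (most frequent value)."""
    img = np.asarray(image)
    values, counts = np.unique(img, return_counts=True)
    i_max = values[np.argmax(counts)]
    below = img[img < i_max].astype(np.float64)
    # Assumption: spread of the background measured around the peak Imax.
    sigma_bg = np.sqrt(np.mean((below - i_max) ** 2)) if below.size else 0.0
    return float(i_max + k * sigma_bg)


def seed_at_maximum(image):
    """Use the brightest pixel as the seed point (first one if there are ties)."""
    img = np.asarray(image)
    return tuple(int(v) for v in np.unravel_index(np.argmax(img), img.shape))


def region_grow(image, seed, threshold):
    """Grow a region from seed over 4-neighbors with intensity >= threshold."""
    img = np.asarray(image)
    rows, cols = img.shape
    mask = np.zeros((rows, cols), dtype=bool)
    if img[seed] < threshold:
        return mask
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr, nc] and img[nr, nc] >= threshold):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

Because only 4-neighbors are visited, a bright pixel that touches the region only diagonally is left out, which is the property the text relies on to keep surrounding tissue out of the segmented mass.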
6 Feature Extraction

In this paper nine features are extracted from the segmented masses and fed as the input for classification. The table below provides the 4 texture features.

1. Skewness:
$$\text{Skewness} = \frac{1}{N}\,\frac{\sum_{i,j=0}^{N-1}\left[g(i,j)-\bar{g}(i,j)\right]^{3}}{\left(\sqrt{\sum_{i,j=0}^{N-1}\left[g(i,j)-\bar{g}(i,j)\right]^{2}}\right)^{3}}$$
where g(i, j) is the intensity value and ḡ(i, j) is the average intensity value.

2. Kurtosis:
$$\text{Kurtosis} = \frac{1}{N}\,\frac{\sum_{i,j=0}^{N-1}\left[g(i,j)-\bar{g}(i,j)\right]^{4}}{\left(\sqrt{\sum_{i,j=0}^{N-1}\left[g(i,j)-\bar{g}(i,j)\right]^{2}}\right)^{4}}$$

3. Mean

4. SD (standard deviation)
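The four intensity statistics listed above can be computed for a segmented region along the following lines. This is a sketch with NumPy using the standard moment definitions (population standard deviation, standardized third and fourth moments); the normalization may differ by a constant factor from the table's formulas, and the function name is hypothetical.

```python
import numpy as np


def intensity_features(region):
    """Mean, standard deviation, skewness and kurtosis of the pixel intensities."""
    g = np.asarray(region, dtype=np.float64).ravel()
    mean = g.mean()
    sd = g.std()  # population standard deviation (divides by N)
    if sd == 0:
        skewness = kurtosis = 0.0  # constant region: higher moments are undefined
    else:
        z = (g - mean) / sd
        skewness = float((z ** 3).mean())
        kurtosis = float((z ** 4).mean())
    return {"mean": float(mean), "sd": float(sd),
            "skewness": skewness, "kurtosis": kurtosis}
```

In a pipeline like the one described here, `region` would hold only the pixels inside the grown mask, for example `image[mask]`.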
7 Case-Based Classification

Classifiers play an important role in the implementation of computer-aided diagnosis of mammography. The features, or a subset of these features, are employed by classifiers to classify a mass as benign or malignant. A case-based classifier is a classifier in which the reuse phase can be simplified. The kernel of a Case-Based Reasoning system is the retrieval phase (phase 1), which retrieves the case or cases most similar to the new case. Obviously, the meaning of "most similar" is a key concept in the whole system. Similarity between two cases is computed using different similarity functions. For our purpose in this paper [14] [17], we use similarity functions based on the distance concept. The most used similarity function is the Nearest Neighbor algorithm, which computes the similarity between two cases using a global similarity measure, here Minkowski's metric, in which Case_x and Case_y are the two cases whose similarity is computed; F is the number of features that describe a case; xi and yi represent the value of the ith feature of Case_x and Case_y, respectively; and wi is the weight of the ith feature. In this study we test Minkowski's metric for three different values of r: Hamming distance (r = 1), Euclidean distance (r = 2) and Cubic distance (r = 3). This similarity function needs the feature relevance (wi) to be computed for each problem to be solved. Assuming an accurate weight setting, a case-based reasoning system can increase its prediction accuracy rate.
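A minimal sketch of nearest-neighbor retrieval with a weighted Minkowski metric is given below. It assumes the common form (Σ wi |xi − yi|^r)^(1/r) over the F features; the function names and the structure of the case base (a list of feature-vector and label pairs) are hypothetical.

```python
def minkowski_distance(case_x, case_y, weights, r):
    """Weighted Minkowski distance between two feature vectors of length F."""
    if not (len(case_x) == len(case_y) == len(weights)):
        raise ValueError("cases and weights must have the same number of features")
    total = sum(w * abs(x - y) ** r for x, y, w in zip(case_x, case_y, weights))
    return total ** (1.0 / r)


def retrieve_most_similar(query, case_base, weights, r=2):
    """Return the (features, label) pair in case_base closest to the query."""
    return min(case_base,
               key=lambda case: minkowski_distance(query, case[0], weights, r))
```

Setting r to 1, 2 or 3 gives the three variants tested in this study; with all weights equal to 1 and r = 2 the measure reduces to the ordinary Euclidean distance.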
8 Experimental Results: MATLAB Implementation Results and Discussion

A total of 30 mammograms were considered for this study, and the experiment is at its final stage. The mammogram data have been taken from the MIAS database of mammograms containing cancerous masses. The mammograms are scanned from X-rays with a maximum resolution of 512 × 512 pixels. Each mammographic image is reduced to an m × n matrix. This matrix contains as many rows, m, as the number of masses present in the image, and as many columns (n = 4) as the number of features that describe one mass. Next, this m × 4 matrix is transformed into a vector. This transformation computes the average value of each column (feature) across all the rows (masses in the image). Finally, the computed vector is labeled using the class (benign or malignant) obtained from the diagnosis done by surgical biopsy. We performed two kinds of experiments in order to compare the performance of the different algorithms [14] [17]. First, we maintained the proportion of original images (now, a set of features for each image) as training and test sets proposed by human experts. Thus, we compared the results obtained by other classifiers with those achieved by human experts and by statistical modeling in terms of classification accuracy. We also included in this comparison the true positive (malignant cases) rate of classified examples (sensitivity) and the true negative rate of classified examples (specificity) in Table 2.

MATLAB Code

    function J=regiongrowing(I,x,y,reg_maxdist)
    % ROI identifier and region growing segmentation
    Isizes = size(I); % Dimensions of input image
    reg_mean = I(x,y); % The mean of the segmented region
    reg_size = 1; % Number of pixels in region
    pixdist=0; % Distance of the region newest pixel to the region mean
    % Neighbor locations (footprint)
    neigb=[-1 0; 1 0; 0 -1;0 1];
    % Start region growing until distance between region and possible new pixels become
    while(pixdist … =1)&&(xn …

… Cd). Now the problem that arises is to know the value of tj at the instant tn-1 or before. In this paper we propose to predict the value of tj through prediction of the link expiration.
Available Bandwidth Estimation with Mobility Management in Ad Hoc Networks
4.2 Prediction Link Expiration

Several studies have been conducted in the field of predicting node movement, including [8], where the authors present a fairly comprehensive approach that could be used in our approach, or in any other, to calculate the value of tj. To predict the link expiration time, we rely on the location of nodes as in [9], but for simplicity we use a very simple technique that mainly uses the physical layer. We propose a scheme that depends on the mobility speed of nodes and uses the received signal strength (RSS). In our prediction method, we assume a free-space propagation model [10], where the RSS depends solely on the distance to the transmitter and on the SSS. We assume that all nodes transmit their signals with the same strength SSS. The intended receiving node measures the received signal strength, which satisfies the following relationship for the free-space propagation model:

$$RSS = SSS \left(\frac{\lambda}{4\pi d}\right)^{2} G_T\, G_R \qquad (5)$$

where λ is the wavelength of the carrier, and GT and GR are the unity gains of the transmitting and receiving antennas, respectively. The effects of noise and fading are not considered. The receiver node can calculate d from the RSS, as illustrated in Fig. 5, and pass the value of d calculated at the physical layer up to the MAC layer.
Fig. 5. Crossing the distance value from physical layer to MAC layer
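Under the free-space assumptions stated for Eq. (5), the distance can be recovered from a measured RSS by inverting the equation, d = (λ / 4π) · sqrt(SSS · GT · GR / RSS). The sketch below shows this inversion in Python; the function names are hypothetical and units only need to be consistent (for example watts for both strengths and meters for the wavelength).

```python
import math


def rss_at_distance(d, sss, wavelength, g_t=1.0, g_r=1.0):
    """Free-space received signal strength at distance d, as in Eq. (5)."""
    return sss * (wavelength / (4.0 * math.pi * d)) ** 2 * g_t * g_r


def distance_from_rss(rss, sss, wavelength, g_t=1.0, g_r=1.0):
    """Invert Eq. (5) to estimate the distance to the transmitter."""
    if rss <= 0 or sss <= 0:
        raise ValueError("signal strengths must be positive")
    return (wavelength / (4.0 * math.pi)) * math.sqrt(sss * g_t * g_r / rss)
```

Because noise and fading are ignored, this estimate is only as good as the free-space assumption; in practice the RSS samples would normally be smoothed before inversion.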
Receiving two signals with two different powers in free space implies two different distances (RSS1 ≠ RSS2 → d1 ≠ d2). If a node detects a distance d1 from its neighbor at time t1 and a distance d2 from the same neighbor at time t2, with d2 larger than d1, then (in linear motion) the node concludes that it is moving away from its neighbor with a speed of SP = (d2 − d1)/(t2 − t1). We assume that each node can thus know the speed at which it moves away from its neighbor. Each node can therefore know when the distance from its neighbor will reach the communication distance Cd, that is, when the link with its neighbor expires. So, to calculate the mobility criterion "M" of formula (4), the value of tj is no longer unknown and is calculated by:

$$t_j = \frac{C_d - d}{SP} \qquad (6)$$

where d is the last measured distance to the moving neighbor node (d2 in the SP formula).

4.3 Protocol Adaptation with the Mobility Criterion

The protocol used for ABE-MM is the same version as that used in the ABE technique [2], so that the mobility criterion in formula (3) can be examined. Some changes were nevertheless found necessary: some steps were kept, some deleted and some new ones added, which gives the following:
R. Belbachir, Z. Mekkakia Maaza, and A. Kies
In the admission control, the RREQ packet is extended with a new field in which the source indicates the requested throughput (described in previous sections). Each mobile node that receives a RREQ now performs admission control by comparing the throughput requirement carried in the RREQ packet with the available bandwidth estimated by equation (3) on the link over which it received the RREQ. If the request is admitted, the node adds its own address to the route and forwards the RREQ; otherwise it discards it. When the destination receives a first RREQ, it follows the AODV protocol: after admission control, it sends a unicast route reply (RREP) to the initiator of the request along the reverse path. The resources are then reserved and the new QoS flow can be sent. We have added a new step in which route error (RERR) packets are sent by the nodes that discover that the available bandwidth is no longer sufficient on a reserved link, as in Fig. 4 (b) at tn-1 (these RERR packets are sent because of an available bandwidth failure). This step is caused in particular by mobility and the change in the value of "M"; the senders of these RERR packets are the nodes that have predicted the breaking of the link. Another RERR is sent after a link failure by the nodes that detect it while the available bandwidth was still sufficient.
Note: In ABE-MM, predicting the link expiration does not reject the QoS flow automatically; the flow can still be admitted if it will be satisfied within the lifetime of the link.
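The mobility-related quantities used by this admission step can be sketched as follows: the separation speed SP from two distance samples, the link expiration time of Eq. (6), and a simple admission test. The admission rule shown here is an assumption made for illustration: the available bandwidth is passed in as an already estimated value, since the estimation formula itself is not reproduced in this section, and the optional lifetime check is one possible reading of the note above. All function and parameter names are hypothetical.

```python
def separation_speed(d1, t1, d2, t2):
    """SP = (d2 - d1) / (t2 - t1); positive when the neighbor is moving away."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (d2 - d1) / (t2 - t1)


def time_to_link_expiry(d, speed, comm_distance):
    """tj = (Cd - d) / SP, or None when the nodes are not moving apart."""
    if speed <= 0:
        return None
    return max(comm_distance - d, 0.0) / speed


def admit_request(requested_bw, available_bw, flow_duration=None, link_lifetime=None):
    """Admit when bandwidth suffices and (assumption) the flow fits the link lifetime."""
    if requested_bw > available_bw:
        return False
    if flow_duration is not None and link_lifetime is not None:
        return flow_duration <= link_lifetime
    return True
```

For example, two samples of 100 m and 124 m taken 2 s apart give SP = 12 m/s, and with a communication distance of 250 m the link would be expected to expire 10.5 s after the second sample.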
5 Performance Evaluation of ABE-MM

5.1 Evaluation Method

To evaluate the effectiveness of the proposed mobility management approach in bandwidth measurement, we conducted a comparative experiment using Network Simulator 2 (NS2.34) and the IEEE 802.11 implementation provided with the simulator. The access scheme was CSMA/CA without the RTS/CTS mechanism. Four constant bit rate (CBR) flows (flow1, flow2, flow3, flow4) are generated with data packets of 1000 bytes. A free-space propagation model is adopted in our experiment, with a radio propagation range of 250 meters (d) for each node and a channel capacity of 2 Mbit/s. Our simulation models a network of 4 sources and 4 destinations (one pair per flow) among 12 mobile nodes that move at a common, constant velocity of 12 m/s. Node movement is random and follows the Manhattan model [11] without buildings (that is, without obstacles, to ensure a free-space environment) on an area of 1000 m × 1000 m, where the streets along which the nodes move are 100 m long. On each arrival at a corner, a node can stay where it is, continue in the same direction, or change direction. For this simulation we used the prediction model explained in the previous section, in which nodes predict accurately when they move away from each other along a straight line (more efficient prediction models, such as the one in [8], can be used).

The efficiency of our approach is evaluated through a comparison between ABE and ABE-MM according to the following criteria:
• The throughput obtained during the simulation by each flow sent by the source nodes.
• Average loss ratio: the average of the ratio between the number of data packets sent by the sources but not received and the total number of data packets.
Available Bandwidth Estimation with Mobility Management in Ad Hoc Networks
• Data packet delivery: the total number of data packets delivered to each destination during the simulation.

5.2 Simulation Results

5.2.1 Throughput

From the simulation logs (movement of nodes, routing flows and control packets) we note the following. Figure 6 (a) shows the throughput of the four flows over the simulation time when ABE is used for path reservation. In the absence of subsequent monitoring of how the bandwidth evolves with mobility, flow 1 continues to consume bandwidth while its path has failed, penalizing flows 2 and 3, given the slow process of detecting the link failure. We also note that the absence of a mobility criterion in the ABE formula allowed flow 4 to reserve a path containing a link that was about to disappear, which further delayed the admission of flow 2.

Figure 6 (b) shows the throughput of the four flows when path reservation is performed with ABE-MM. We observe that bandwidth consumption in the network is considerably better than with ABE, thanks to a fairer use of this resource (flow 1 is stopped because mobility has reduced the available bandwidth with which it was admitted, allowing flows 2 and 3 to be admitted without delay). ABE-MM also eliminates the idle periods in the network caused by flow 4 with the ABE technique (the zone where the paths of flows 2 and 4 interconnect, during [26 ... 45] seconds).

5.2.2 Average Loss Ratio

The average loss ratio during the simulation is shown in the diagram of figure 7. We observe that the absence of the mobility criterion in ABE caused a high loss rate, reaching up to 27% for flow 1, particularly in the [14...18] second interval. The admission of flow 4 also generated a significant loss ratio.
With ABE-MM, we see very interesting results: the losses caused by mobility were avoided by stopping flow 1 at the right time and by not admitting flow 4, because the link that was going to disappear was taken into account in the available bandwidth measurement through the mobility criterion “M”.
Fig. 6. Throughput of each flow using (a) ABE and (b) ABE-MM
5.2.3 Data Packet Delivery

The total number of data packets delivered to each destination is shown in the diagram of figure 8. We observe that almost the same number of data packets of flow 1 was delivered with ABE-MM and with ABE, which confirms the loss rate due to mobility seen in the previous diagram. The late admission of flows 2 and 3 with ABE resulted in far fewer deliveries than with ABE-MM. We also note that the delivery of flow 4 with the ABE technique is very small but not zero, whereas with ABE-MM no data packets of flow 4 are delivered to the destination.
Fig. 7. Average loss ratio diagram using ABE, ABE-MM
Fig. 8. Data packets delivery diagram using ABE, ABE-MM
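The loss-ratio and delivery criteria of Section 5.1 can be illustrated with a short sketch. It assumes that per-flow counts of sent and received data packets have already been extracted from the simulation trace; the function names are hypothetical, and how the paper averages the ratio (over time windows or over the whole run) is not specified, so a single ratio per flow over the whole run is used here as an assumption.

```python
from typing import Dict, Tuple


def loss_ratio(sent: int, received: int) -> float:
    """Fraction of sent data packets that never reached the destination."""
    if sent == 0:
        return 0.0  # nothing sent (e.g. flow never admitted): no loss
    return (sent - received) / sent


def summarize(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Per-flow delivery count and loss ratio.

    counts maps a flow name to (packets sent, packets received).
    """
    return {
        flow: {"delivered": received, "loss_ratio": loss_ratio(sent, received)}
        for flow, (sent, received) in counts.items()
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not the paper's results.
    example = {"flow1": (200, 150), "flow4": (0, 0)}
    for flow, stats in summarize(example).items():
        print(flow, stats)
```

A flow that is never admitted sends no data packets, so its delivery count and loss ratio are both zero; this matches the qualitative trade-off noted for flow 4, where non-admission removes both deliveries and losses.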
6 Conclusion

In this paper, we have shown the importance of taking the mobility phenomenon into consideration in available bandwidth measurement, especially during path reservation. Our solution is based on the changes in distance between linked neighboring nodes. ABE-MM results from extending the ABE technique with our approach. The results obtained from a comparison between ABE and ABE-MM are satisfactory in terms of how well bandwidth is used in the network. We observed an improvement in flow circulation: the density of traffic over the network increased while loss rates decreased. Despite these good results, ABE-MM presents some problems in certain kinds of dynamic topologies, where each node must be able to analyze its mobile environment for flow admission; this will be the subject of our future work.
References

1. IEEE Computer Society LAN MAN Standards Committee: Wireless LAN Medium Access Protocol (MAC) and Physical Layer (PHY) Specification, IEEE Std 802.11-1997. The Institute of Electrical and Electronics Engineers, New York (1997)
2. Sarr, C., Chaudet, C., Chelius, G., Lassous, I.G.: Bandwidth Estimation for IEEE 802.11-based Ad Hoc Networks. IEEE Transactions on Mobile Computing 7(10) (2008)
3. Chen, L., Heinzelman, W.: QoS-aware Routing Based on Bandwidth Estimation for Mobile Ad Hoc Networks. IEEE Journal on Selected Areas of Communication 3 (2005)
4. Renesse, R., Ghassemian, M., Friderikos, V., Hamid Aghvami, A.: QoS Enabled Routing in Mobile Ad Hoc Networks. In: IEE3G (2004)
5. Yang, Y., Kravets, R.: Contention Aware Admission Control for Ad Hoc Networks. IEEE Transactions on Mobile Computing 4, 363–377 (2005)
6. Badis, H., Al Agha, K.: QOLSR, QoS routing for Ad Hoc Wireless Networks Using OLSR. European Transactions on Telecommunications 15(4) (2005)
7. Perkins, C.E., Royer, E.M.: The Ad hoc On-demand Distance Vector Protocol. In: Perkins, C.E. (ed.) Ad hoc Networking, pp. 173–219. Addison-Wesley, Reading (2000)
8. Lee, S., Su, W., Gerla, M.: Ad hoc wireless multicast with mobility prediction. In: Proceedings of Computer Communications and Networks, pp. 4–9 (1999)
9. Ko, Y.-B., Vaidya, N.H.: Location-Aided Routing (LAR) in Mobile Ad-Hoc Networks. IEEE/ACM Wireless Networks 6(4), 307–321 (2000)
10. Rappaport, T.S.: Wireless Communications: Principles and Practice. Prentice Hall, Upper Saddle River (October 1995)
11. Meghanathan, N., Gorla, S.: On the Probability of K-Connectivity in wireless Ad Hoc networks under Different mobility models. International Journal on Applications of Graph Theory in Wireless Ad Hoc Networks and Sensor Networks (GRAPH-HOC) 2(3) (September 2010)
Author Index
Abdullah, Abdul Hanan 201 Abujassar, Radwan S. 116 Al-Majeed, Salah S. 270 Al-Mutawah, Khalid Ahmed 250 Amin, Muhammad Arif 201 Ammar, Brika 146 Ananthanarayana, V.S. 91 Aziztabar, Reza 212 Bakar, Kamalrulnizam Bin Abu Batten, Lynn Margaret 11 Belbachir, Redouane 304 Bhatia, J.S. 69 Bindu, Agarwalla 158 Chaari, Lamia 54 Chaki, Nabendu 21, 127 Chakraborty, Supriya 21 Chen, Yen-Wen 42 Deka Kr, Jatindra 79, 158 Deris, Mustafa Mat 191, 283 Doss, Robin 11 El-Sharkawi, Mohamed E. Feily, Maryam 181 Fey, Dietmar 234 Fleury, Martin 270 Ghanbari, Mohammed 116 Goonatilake, Rohitha 250 Gu, Hong-Jang 42 Hannan, Ayman 181 Hayajneh, Thaier 31 Herath, Ajantha 250 Herath, Suvineetha 250 Herawan, Tutut 191 Hunter, David K. 101 Isa, A.M.
201
Li, Bai 11 Lin, Meng-Hsien 42 Loos, Andreas 234 Meghanathan, Natarajan 1 Mokhtar, Hoda M.O. 221 Murakami, Masaki 171 Nosrati, Masoud
212
Ossama, Omnia
221
Ramadass, Sureswaran 181 Raman, Valliappan 293 Reichenbach, Marc 234 Rose, ANM.M. 283 Sadi, Muhammad Shaikh 137 Saleh, Salah Noori 181 Sehgal, R.K. 69 Shaikh, Soharab Hossain 127 Siddiqui, Mohammed A.R. 250 Singh, Ajeet Kumar 79 Singh, Balraj 260 Sumari, Putra 293 Then, Patrick
293
Uddin, Md. Nazim Verma, S.K.
283
137
260
Yanto, Iwan Tri Riyadi 191 Yousef, Mohammed 101
Jamel, Sapiee 191 Jürjens, Jan 137 Kamal, Hasan Yousif Kamoun, Lotfi 54
221
Karimi, Ronak 212 Kaushik, Brajesh Kumar 260 Khan, Md. Mizanur Rahman 137 Khasawneh, Samer 31 Khokhar, Rashid Hafeez 201 Kies, Ali 304 Kumar, Sanjeev 69
250
Zarina, M. 283 Zoulikha, Mekkakia Maaza
304