Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.) Semantics in Adaptive and Personalized Services
Studies in Computational Intelligence, Volume 279
Editor-in-Chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland
Vol. 279. Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.) Semantics in Adaptive and Personalized Services, 2010 ISBN 978-3-642-11683-4
Manolis Wallace, Ioannis E. Anagnostopoulos, Phivos Mylonas, and Maria Bielikova (Eds.)
Semantics in Adaptive and Personalized Services Methods, Tools and Applications
Dr. Manolis Wallace, Department of Computer Science and Technology, University of Peloponnese, End of Karaiskaki St., 22100, Tripolis, Greece
Dr. Phivos Mylonas, National Technical University of Athens, School of Electrical & Computer Engineering, Division of Computer Science, Zographoy Campus, Iroon Polytechneioy 9, 15780, Athens, Greece
Dr. Ioannis E. Anagnostopoulos, University of the Aegean, Department of Information and Communication Systems Engineering, Karlovassi, Samos, GR-83 200, Greece
Prof. Maria Bielikova, Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovicova 3, 842 16 Bratislava 4, Slovakia
ISBN 978-3-642-11683-4
e-ISBN 978-3-642-11684-1
DOI 10.1007/978-3-642-11684-1 Studies in Computational Intelligence
ISSN 1860-949X
Library of Congress Control Number: 2010920317
© 2010 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed on acid-free paper.
Contents
Semantics in Adaptive and Personalized Services: Methods, Tools and Applications
Manolis Wallace, Ioannis Anagnostopoulos, Phivos Mylonas, Maria Bielikova

Semantic-Enabled Information Access: An Application in the Electricity Market Domain
Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis, Christoforos Zoumas, Dimitris Askounis

Ontology-Based Profiling and Recommendations for Mobile TV
Yannick Naudet, Armen Aghasaryan, Sabrina Mignon, Yann Toms, Christophe Senot

The USHER System to Generate Semantic Personalised Maps for Travellers
Zekeng Liang, Kraisak Kesorn, Stefan Poslad

Semantic Based Error Avoidance and Correction for Video Streaming
Christian Spielvogel, Sabina Serbu, Pascal Felber, Peter Kropf

Semantics in the Field of Widgets: A Case Study in Public Transportation Departure Notifications
Alena Kovárová, Lucia Szalayová

An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment
Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, Eleftherios Kayafas

Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research
Christos-Nikolaos Anagnostopoulos, Theodoros Iliou

Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes
Stefanos Nikolidakis, Dimitrios D. Vergados, Ioannis Anagnostopoulos

Introducing Context-Awareness and Adaptation in Telemedicine Systems
Charalampos Doukas, Ilias Maglogiannis, Kostas Karpouzis

Blog Rating as an Iterative Collaborative Process
Malamati Louta, Iraklis Varlamis

Simulation-Based UMTS e-Learning Software
Florin Sandu, Szilárd Cserey

Author Index
Semantics in Adaptive and Personalized Services: Methods, Tools and Applications Manolis Wallace, Ioannis Anagnostopoulos, Phivos Mylonas, and Maria Bielikova
Manolis Wallace: Department of Computer Science and Technology, University of Peloponnese, End of Karaiskaki St., 22100, Tripolis, Greece
Ioannis Anagnostopoulos: Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, Samos, GR-83 200, Greece
Phivos Mylonas: Image, Video and Multimedia Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Zografou Campus, Iroon Polytechneioy 9, Zografou, Greece
Maria Bielikova: Institute of Informatics and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovicova 3, 842 16 Bratislava 4, Slovakia
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 1–7. © Springer-Verlag Berlin Heidelberg 2010
1 Introduction
Semantics in Adaptive and Personalized Services initially strikes one as a specific and perhaps narrow domain. Yet a closer examination of the term reveals much more. On one hand there is the issue of semantics. Nowadays, this most often refers to the use of OWL, RDF or some other XML based ontology description language in order to represent the entities of a problem. Still, semantics may also very well refer to the consideration of meanings and concepts, rather than arithmetic measures, regardless of the representation used. On the other hand, there is the issue of adaptation, i.e. automated re-configuration based on some context. This could be the network and device context, the application context or the user context; we refer to the latter case as personalization.
From a different perspective, there is the issue of the point of view from which to examine the topic. There is the point of view of tools, referring to the algorithms and software tools one can use, the point of view of methods, referring to the abstract methodologies and best practices one can follow, as well as the point of view of applications, referring to successful and pioneering case studies that lead the way in research and innovation.
Or at least so we thought. Based on the above reasoning, we identified key researchers and practitioners in each of the aforementioned categories and invited them to contribute a corresponding work to this book. However, as the authors’ contributions started to arrive, we also started to realize that although these categories participate in each chapter to different degrees, none of them is ever entirely absent from any chapter. Moreover, it seems that theory and methods are inherent in the development of tools and applications and, inversely, the application is also inherent in the motivation and presentation of tools and methods. As a result, and contrary to what one might expect based on the title, the book is not partitioned into distinct parts and every chapter simultaneously addresses all three issues: methods, tools and applications.
Of course the editors’ work is only worth as much as the manuscripts that authors have entrusted to them. We are grateful to all contributors who trusted us with their works, regardless of whether those works made the final cut for inclusion in the book, as well as to the reviewers who have greatly assisted in safeguarding the quality of this volume. Our thanks also go out to all our friends from the SMAP Initiative events, as well as to Janusz Kacprzyk, the series editor, and Thomas Ditzinger, senior editor with Springer, for their support.
2 Book Contents
The book consists of 11 more chapters, each one focusing on a different aspect of the theory and practice of semantics in adaptive and personalized services, as follows.
In “Semantic-Enabled Information Access: An Application in the Electricity Market Domain”, prepared by Alexopoulos et al., a novel framework is presented for the generation of information retrieval systems. Borrowing the best from a variety of scientific fields related to the processing of knowledge and information, such as ontologies, case based reasoning and fuzzy systems, this framework is able not only to represent the inherent uncertainty of real life information, but also to adapt to the application context while modeling and considering the end users’ interpretation of what is similar and what is not. The framework is described via the presentation of the architecture, design and development of a system that uses it: the electronic library of the Hellenic Transmission System Operator S.A. (HTSO), a deployed semantic information access system that provides the public with effective and efficient access to knowledge regarding the Greek electricity market. Experience gained from the implementation of the system indicates that the framework is particularly effective, while knowledge elicitation might be the next barrier to target.
In “Ontology-based Profiling and Recommendations for Mobile TV”, prepared by Naudet et al., we focus on the issue of automated recommenders for mobile television content. In particular, we go beyond conventional systems for personalized television content selection that merely allow users to specify general preferences and see an approach that allows matching between content and users along three distinct dimensions: categories or themes of interest, content description, and precise interest descriptions defined in an ontology. User interests can be formalized using one or more of those dimensions and can moreover be associated with contextual data. The computation of user profiles relies on both explicit and implicit profiling, based on incremental learning of interest degrees from content usage. This ontological formalization, used in conjunction with rule sets and a global matchmaking algorithm, has been successfully demonstrated in a mobile recommender system for broadcast TV and Video on Demand, as part of the MOVIES project. The profiling engine prototype has been implemented in the larger scope of multiple content delivery platforms (IPTV/VoD, Web portals, mobile video) where customers can use a diversity of terminals (TV/Set-Top-Box, mobile phone, and laptop), thus indicating its versatility and broad applicability.
In the chapter “The USHER System to Generate Semantic Personalised Maps for Travellers”, by Liang et al., still focusing on mobile users, we turn our attention to Geospatial Information Systems, which have recently emerged as a leading technology for the development of systems storing and delivering spatial information.
Under the general umbrella of ontology-based personalized Spatial-Aware Map Services we see an ontology-based representation of dynamic user preferences interlinked to a domain model that is able to detect shifts in user interests, the creation of sharable user markup data governed by an access control matrix and the generation of personalized annotated GIS maps. The presented approach can enable users to set preferences based on their context and user profiles; to customise searching and selecting content; to markup maps in-situ forming a personalized spatial memory. The framework has been used in the context of the USHER project in order to develop a system that provides such services for the Queen Mary University of London (QMUL) Mile End campus and surrounding areas. The prototype application has been used to demonstrate that semantics make it possible for users’ annotations to be shared when they are relevant; as the authors note, clearly two new fronts open before us: the automated and context-aware definition of this relevance and the consideration of privacy issues. With “Semantic based Error Avoidance and Correction for Video Streaming”, by Spielvogel et al., we see how semantics can play a role in media streaming decisions such as whether/how to alter video quality in order to avoid errors or perform error correction, based not only on information regarding the network but also semantic information regarding the content itself. In this we assume that the content is available in multiple description coding format, which makes it possible to shift between different qualities and levels of compression, according to the current network and stream context.
The presented implementation contains a search subsystem that is able to quickly locate alternative sources for a desired video. Experimental results indicate that this subsystem is particularly efficient, even in the case of very rare videos, which in turn provides more options regarding where to stream from and how. Additionally, simulation results indicate that the proposed methodology is capable of producing optimal decisions regarding whether to perform error avoidance, error correction or a combination of the two.
In “Semantics in the Field of Widgets: a Case Study in Public Transportation Departure Notifications”, prepared by Kovarova and Szalayova, we discuss the utilization of semantics in widgets. In this context, semantics can be used to describe a widget’s required input in a more generic form, thus making it possible for a given widget to be reused by different users and in quite different application and environment contexts. The theory is presented through a helpful running example: John, who lives in Bratislava, uses a widget to quickly and easily acquire bus route information. If John moves, then the widget will need to draw input from a different source and display information for different bus routes. The presented approach has actually been implemented as a personalizable widget that is linked to a site providing bus route information for the city of Bratislava. Once users have provided the semantic feedback regarding their typical routes, the widget is able to automatically provide them with both conventional route information and relevant alerts (e.g. cancellations). It seems that the presented approach is quite efficient and could also be ported to other application domains, such as logistics or catering.
In “An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment”, authored by Giannoukos et al., a novel peer matching mechanism in the context of adaptation is proposed.
This mechanism provides adaptive and personalized services for performing automatic optimal matching between authors and reviewers, taking into account feedback from the authors on the perceived usefulness of the comments they received from the reviewers. The methodological background is based on feed forward neural networks, and the main aim is to estimate the optimal reviewer set for a specific author. The proposed method uses past data to construct author and reviewer user profiles, which are semantically represented.
In “Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research”, authored by Anagnostopoulos and Iliou, we provide some experiments regarding the problem of emotion recognition from speech, which is of high importance for many applications. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics are discussed, in parallel to classical artificial intelligence techniques that try to solve the problem addressed. The authors emphasize that such approaches could be applied especially to the emotions of people from different cultures, where one can easily identify the significant role of semantics in linguistic emotion recognition.
In “Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes”, Nikolidakis et al.
present a web application that can be used in nursing homes in order to manage the health care services provided to elderly people. The support provided in different types of health services is semantically represented. The proposed architecture can be used by doctors through PDAs or tablet PCs in order to collect both personal and clinical information, creating in parallel a personalized file record for the hospitalized persons. This application can also generate an overall report with respect to the needs as well as the demographics and population status in nursing homes. Finally, there is the capability of exchanging semantically annotated information among different nursing homes.
In the chapter entitled “Introducing Context-Awareness and Adaptation in Telemedicine Systems”, Doukas et al. present a context aware medical content adaptation platform that utilizes semantic content and context representation. Moreover, by using appropriate reasoning techniques, content adaptation as well as medical image and video transmission is performed only when determined necessary. The mechanism encodes the transmitted data according to the network availability and quality, with respect to the user preferences and the patient status. The architecture of the framework is open and does not depend on the monitoring applications used, the underlying networks or any other issues regarding the employed telemedicine system.
In “Blog Rating as an Iterative Collaborative Process”, authored by Louta and Varlamis, we present an iterative collaborative process to provide a global rating for a set of blogs using local rating information expressed via blogroll and post hyperlinks. The rating model is mathematically and semantically formulated, comprising local accumulative blog site rating formation, collaborative local blog site formation, as well as global rating formation.
The semantic information attached to each hyperlink allows bloggers to better describe their intentions behind creating the link, to prioritize affiliated blogs in the blogroll or even to provide topic information for the pointed posts. The rating mechanism is also adopted to update the local scores, and to employ them in providing collaborative and global scores. An initial experimental evaluation shows that the model performs well by “punishing” spam blogs that receive many links from a single source and favouring blogs that receive inlinks on a regular basis.
Finally, in “Simulation-based UMTS e-Learning Software”, Sandu and Cserey describe adaptable educational software that allows university students or company workers (mainly in the mobile communication field) to learn, understand and study the processes, events and flows that appear in typical telecommunication technologies. The software consists of an editor capable of graphically representing semantic relations in information nodes and diagrams, based on personalization of educational services.
3 Related Work and Relevant Sources
A few years back, we found ourselves working on topics that simultaneously borrowed from and linked to semantics, media, personalization, adaptation and other fields. Of course we were not the only ones; we just did not know who the others were and where to look for the sum of the related work performed by them. Identifying this need, we organized the first meeting of our small informal society in Athens in 2006 as the 1st International Workshop on Semantic Media Adaptation and Personalization (SMAP). Several other such meetings have followed since. Certainly, in the proceedings of these meetings one can find very interesting works, completed, in progress or position statements, that are closely related to the scope of this book.
• P. Mylonas, M. Wallace, I. Anagnostopoulos (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 4th International Workshop on Semantic Media Adaptation and Personalization, San Sebastian, Spain), IEEE Computer Society, 2009,
• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 3rd International Workshop on Semantic Media Adaptation and Personalization, Prague, Czech Republic), IEEE Computer Society, 2008,
• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 2nd International Workshop on Semantic Media Adaptation and Personalization, London, UK), IEEE Computer Society, 2007,
• P. Mylonas, M. Wallace, M. Angelides (Eds.), Semantic Media Adaptation and Personalization (proceedings of the 1st International Workshop on Semantic Media Adaptation and Personalization, Athens, Greece), IEEE Computer Society, 2006.
Similarly, relevant works are included in the edited volumes
• M. Angelides, P. Mylonas, M. Wallace (Eds.), Advances in Semantic Media Adaptation and Personalization, Volume 2, CRC Press, 2009,
• M. Wallace, M. Angelides, P. Mylonas (Eds.), Advances in Semantic Media Adaptation and Personalization, Springer Verlag Studies in Computational Intelligence, Vol. 93, ISBN 978-3-540-76359-8, February 2008
and journal special issues
• P. Mylonas, M. Bielikova, Y. Kompatsiaris, R. Troncy (Eds.), Semantic Media Adaptation & Personalization, International Journal on Semantic Web and Information Systems, 2010,
• M. Angelides, P. Mylonas, M. Wallace (Eds.), Semantic Media Adaptation and Personalization, Multimedia Tools and Applications, Volume 43, Number 3, 2009,
• P. Mylonas, Hermann Hellwagner, Pablo Castells, M. Wallace (Eds.), Multimedia Semantics, Adaptation & Personalization, Signal, Image and Video Processing, Volume 2, Number 4, 2008,
• M. Angelides, P. Mylonas, M. Wallace (Eds.), Semantic Media Adaptation and Personalization, ACM/Springer Multimedia Systems Magazine, Volume 13, Number 2, August, 2007
that our society has been regularly producing. The book you are holding is actually a part of this effort.
Manolis Wallace
Ioannis Anagnostopoulos
Phivos Mylonas
Maria Bielikova
Semantic-Enabled Information Access: An Application in the Electricity Market Domain Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis, Christoforos Zoumas, and Dimitris Askounis
Abstract. In this chapter we combine theory from ontologies, case based reasoning and fuzzy algebra to construct a novel framework for semantic-enabled information access. This framework provides a comprehensive and effective way of developing semantic information retrieval systems that serve specific domains and operate under specific contexts. In order to assist readers and also demonstrate the effectiveness of the proposed framework, the theory is presented through a real life application in the electricity market domain.
Panos Alexopoulos and Konstantinos Kafentzis: IMC Technologies S.A., Fokidos 47, 11527, Athens, Greece, e-mail: {palexopoulos,kkafentzis}@imc.com.gr
Manolis Wallace: Department of Computer Science and Technology, University of Peloponnese, End of Karaiskaki St., 22100, Tripolis, Greece
Christoforos Zoumas: Hellenic Transmission System Operator S.A., 22 Asklipiou Str., 14568, Krioneri, Greece
Dimitris Askounis: School of Electrical and Computer Engineering, National Technical University of Athens, 9, Iroon Polytechniou str., Zografou 15773, Athens, Greece
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 9–22. © Springer-Verlag Berlin Heidelberg 2010
1 Introduction
In this chapter we describe the development process of the electronic library of the Hellenic Transmission System Operator S.A. (HTSO), a deployed semantic information access system that provides the public with effective and efficient access to knowledge regarding the Greek electricity market. HTSO is a governmental organization responsible for the management and operation of the Greek electricity network and market, and in this context it has the main responsibility for providing market-related information to the public. The electronic library, which takes the form of a knowledge portal providing access to semantic-enabled services such as search and navigation, serves this purpose.
More specifically, the available knowledge to be accessed comprises a number of legal and technical documents which, due to their size and the lack of proper cross referencing, are difficult for an individual to understand and use. The system tackles these two problems by enabling the storage and retrieval of decomposed parts of the documents (usually paragraphs) as well as navigation across these parts.
The above services are semantic-enabled in that the system implements them by utilizing domain ontological knowledge and relevant reasoning techniques for capturing and interrelating the parts’ semantic content. This allows for significantly more effective information retrieval, in terms of the relevance of results, as well as for more intuitive navigation across the content through various semantic structures such as concept taxonomies.
All the semantic characteristics of the system are implemented by means of a novel semantic information retrieval framework that has been developed within our organization and which provides a generic but comprehensible and structured way of building semantic information retrieval systems in any domain. The framework draws upon ideas and techniques from the areas of Case Based Reasoning, Ontologies and Fuzzy Algebra, and its basic characteristic is that it enables the knowledge engineer to adjust the knowledge representation and reasoning procedure to the users’ subjective perception of information relevance.
In the rest of the chapter we describe the aforementioned framework and we illustrate its applicability in developing semantic information retrieval systems by describing the exact development process that we followed in the case of HTSO’s electronic library.
2 Semantic Information Retrieval Framework
2.1 Introduction
As suggested in the previous section, the core functionality of the deployed system, namely information retrieval (IR), was based on a hybrid semantic IR framework that combines three distinct artificial intelligence reasoning techniques: Structural Case Based Reasoning, Ontology-Based Reasoning and Fuzzy Algebra. This combination was made possible through a fuzzy ontology framework described in ([1]) that allows for customized assessment of semantic similarity between ontological concepts. In the following paragraphs we describe this hybrid approach by discussing how Structural Case Based Reasoning (SCBR) is used for information retrieval, how the use of ontologies within SCBR makes this retrieval semantic, and how Fuzzy Set Theory helps in dealing with the inherent fuzziness of the concept of relevance.
2.2 Structural Case Based Reasoning for IR
The CBR technique originates from Schank’s concept of remindings ([9]), which states that when people are thinking they are merely recalling past experiences that are somehow similar to their current situation. When applied in problem solving, this is translated into trying to solve new problems by comparing them to problems already solved ([2], [8], [4]). The underlying assumption is that if two problems are sufficiently similar, then their solutions are probably also similar.
Apart from problem solving, the CBR approach can be successfully applied for building information retrieval systems. In such systems information items are regarded as cases and they are retrieved according to the similarity between them and the query. Thus, a key requirement is to define, for each application, a proper similarity measure that will produce the best results. Such a definition is heavily dependent on the application domain and on the intended users’ information needs.
In commercial CBR systems there are three main approaches that differ in the sources, materials, and knowledge they use ([4]). These are the textual approach, the conversational approach and the structural approach. In the latter, namely structural CBR (SCBR), the basic idea is that cases are represented according to a common structure called the domain model. In different SCBR systems, this model can be as simple as a flat table or as complex as an object-oriented model.
Applying SCBR to information retrieval means creating metadata-based descriptions of documents (or information objects in general) which are then stored as cases in the case base (see Figure 1). Each description (or characterization) contains a link to the information object itself. What’s more, the vocabulary used to represent the cases is developed a priori for the domain at hand and contains the relevant concepts of the domain that occur in the information objects.
Fig. 1 CBR Representation (Adapted from [6])
P. Alexopoulos et al.
When searching for information objects, the query of the user is first transformed into a characterization of a fictional (or ideal) information object, i.e. the information object that best matches the user’s query. This object, also referred to as the “query-case”, is then compared to the cases stored in the system. The comparison relies on the cases’ characterizations and on a similarity measure, and its result is a relevance score assigned to each of the stored cases. Thus, the system is able to retrieve those cases that are most similar (and therefore most relevant) to the query-case.
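The retrieval loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Case` structure, the pairwise similarity table, the averaging measure and the example concept names are all hypothetical and are not taken from the deployed system or from any particular CBR engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A stored case: a characterization plus a link to the information object."""
    link: str             # hypothetical locator of the information object
    concepts: frozenset   # vocabulary terms characterizing the object


def concept_similarity(a, b, table):
    """Look up a pairwise term similarity; identical terms score 1.0."""
    if a == b:
        return 1.0
    return table.get((a, b), 0.0)


def case_similarity(query, case, table):
    """Illustrative measure: average, over query concepts, of the best match in the case."""
    if not query.concepts or not case.concepts:
        return 0.0
    best = [max(concept_similarity(q, c, table) for c in case.concepts)
            for q in query.concepts]
    return sum(best) / len(best)


def retrieve(query, case_base, table, top_k=3):
    """Score every stored case against the query-case and return the top_k."""
    scored = [(case_similarity(query, case, table), case) for case in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical vocabulary similarities and case base.
table = {("imbalance settlement", "settlement"): 0.8}
case_base = [
    Case("doc1#p3", frozenset({"settlement"})),
    Case("doc2#p1", frozenset({"metering"})),
]
query_case = Case("", frozenset({"imbalance settlement"}))
results = retrieve(query_case, case_base, table)
```

Here the query-case is just another `Case` without a link, which mirrors the idea of a fictional information object; any other similarity measure could be substituted for `case_similarity` without changing the retrieval loop.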
2.3 SCBR and Ontologies for Semantic IR

The key characteristic of the SCBR approach is that the definition of similarity measures is tightly integrated with object-oriented vocabulary representations ([5]). Such representations, however, cannot represent explicit semantics nor can they perform any kind of reasoning. That makes their use for semantic assessment of relevance between cases extremely limited and inefficient. On the other hand, representation mechanisms with formal semantics afford applications the luxury of automated reasoning. The latter is an important capability when it comes to comparing the meanings of different cases and determining their similarity and relevance to a user’s query. That is why the incorporation of formal semantics, by means of ontologies, into CBR systems seems to be the next step in the evolution of these systems.

Ontologies have been developed and investigated for some time in Artificial Intelligence as the main way of facilitating knowledge sharing and reuse. However, only recently has the notion of ontology attracted attention from fields such as intelligent information integration and retrieval, electronic commerce and knowledge management. This is because ontologies make it possible to annotate information sources with machine-processable semantics, thus facilitating effective and efficient access to them by various software artifacts and agents. Technically speaking, ontologies are formal descriptions of the entities, relationships, and constraints that make up a conceptual model. Depending on the expressiveness and the degree of formality of the underlying representation language, ontologies can range from a simple taxonomic hierarchy of concepts to a logic program utilizing first-order predicate logic, modal logic, or even higher-order logics with probabilities.
Given the above, incorporating formal semantics into SCBR means primarily replacing the object-oriented vocabulary with an ontology, as shown in figure 2. The resulting paradigm, namely Ontology-Based CBR, can then be used for more intelligent and efficient information retrieval. More specifically, most ontology-based systems utilize logic-based deductive inference, while SCBR systems provide a search functionality that uses similarity measures to rank results according to their utility with respect to a given query. In the Ontology-Based CBR paradigm these two types of reasoning are combined by defining similarity measures that are tightly integrated with the ontological model instead of the object-oriented vocabulary. Such a measure has been defined and used in our case.
Fig. 2 Ontology-based CBR
2.4 The Role of Fuzziness

In figure 2, one can see that the object-oriented vocabulary has actually been replaced by a fuzzy ontology ([7], [10]). The reason is that fuzzy logic and fuzzy algebra may be exploited to enhance the power and expressiveness of ontologies, especially when it comes to assessing semantic similarity and relevance [11]. Besides, according to Zadeh ([12]), relevance as a concept is fuzzy rather than bivalent, as it denotes the degree to which a piece of information is relevant to another piece or to a query. To define fuzzy concepts, what is needed is the conceptual structure of fuzzy set theory, where everything is, or is allowed to be, a matter of degree.
2.5 Assessment of Semantic Similarity

As can be deduced from the previous paragraphs, the most important aspect of our IR framework is the assessment of semantic similarity between ontological concepts. In our case, this assessment is carried out through the framework described in ([1]). The basic idea there is that the assessment of semantic relevance should be application-oriented rather than domain-oriented. In other words, in different IR scenarios the same ontological information should be interpreted differently in order to yield the most appropriate results, and this “different” interpretation is heavily dependent on the actual IR scenario and on the users’ intended information needs. More specifically, the framework of ([1]) is based on a Fuzzy Ontology Framework according to which domain knowledge is modelled as a fuzzy ontology.
This ontology captures both concrete and vague knowledge about the application domain by defining relevant concepts and fuzzy semantic relations between them. More formally, a Fuzzy Ontology is a tuple $O_F = \{E, \mathcal{R}\}$ where $E$ is a set of semantic entities (or concepts) and $\mathcal{R}$ is a set of fuzzy binary semantic relations. Each element of $\mathcal{R}$ is a function $R : E^2 \to [0, 1]$. In particular, $\mathcal{R} = \{T, NT\}$ where $T$ is the set of taxonomic relations and $NT$ is the set of non-taxonomic relations.

Fuzziness in a taxonomic relation $R \in T$ has the following meaning: high values of $R(a, b)$, where $a, b \in E$, imply that b’s meaning approaches that of a, while low values suggest that b’s meaning becomes “narrower” than that of a. On the other hand, a non-taxonomic relation has an ad-hoc meaning defined by the ontology engineer. Fuzziness in this case is needed when such a relation represents a concept for which there is no exact definition. In that case fuzziness reflects the degree to which the relation can be considered true.

In any case, the above semantic relations are the primary means for computing the similarity between concepts, as they (usually) denote some kind of “semantic relatedness” between them. However, which of these relations should participate in the assessment of semantic similarity, in what way and to what degree, is application-dependent information which is captured separately from the ontological knowledge. This information is modelled by means of the Ontology Application Context (OAC). The OAC is in essence a set of parameters that characterize the expected role of the fuzzy ontology in the similarity assessment process and that take different values according to the application scenario. More formally, given a fuzzy ontology $O_F = \{E, T, NT\}$, $OAC_{O_F}$ defines:

• how each taxonomic relation $R \in T$ should be used for computing similarity between concepts.
• how each non-taxonomic relation $R \in NT$ should be used for computing similarity between concepts.
• how each pair of a relation $R_1 \in T$ and a relation $R_2 \in NT$ should be used for computing similarity between concepts.

To do that, the OAC comprises three different contexts that correspond to each of the above cases:

• The Taxonomic Relation Application Context, which is defined as a set of functions $F = \{f_i\}$, $i \in \{1, 2\}$, where $f_i : T \to [-1, 1]$.
• The Non Taxonomic Relation Application Context, which is defined as a set of functions $G = \{g_i\}$, $i \in \{1, 2\}$, where $g_i : NT \to [-1, 1]$.
• The Taxonomic - Non Taxonomic Relation Pair Application Context, which is defined as a set of functions $H = \{h_i\}$, $i \in \{1, \dots, 4\}$, where $h_i : NT \times T \to [-1, 1]$.

The exact meaning of each context is the following:

• If $R \in T$ and $a \in E$ then $f_1(R)$ is the degree to which all concepts $b \in E$ for which $[Tr^t(R)](a, b) \ne 0$ should be considered similar to a.
• If $R \in T$ and $a \in E$ then $f_2(R)$ is the degree to which all concepts $b \in E$ for which $[Tr^t(R)]^{-1}(a, b) \ne 0$ should be considered similar to a.
• If $R \in NT$ and $a \in E$ then $g_1(R)$ is the degree to which all concepts $b \in E$ for which $R(a, b) \ne 0$ should be considered similar to a.
• If $R \in NT$ and $a \in E$ then $g_2(R)$ is the degree to which all concepts $b \in E$ for which $R^{-1}(a, b) \ne 0$ should be considered similar to a.
• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$ then $h_1(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT} \circ^t Tr^t(R_T)](a, b) \ne 0$ or $[Tr^t(R_T) \circ^t R_{NT}](a, b) \ne 0$ should be considered similar to a.
• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$ then $h_2(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT} \circ^t Tr^t(R_T)^{-1}](a, b) \ne 0$ or $[Tr^t(R_T)^{-1} \circ^t R_{NT}](a, b) \ne 0$ should be considered similar to a.
• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$ then $h_3(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT}^{-1} \circ^t Tr^t(R_T)](a, b) \ne 0$ or $[Tr^t(R_T) \circ^t R_{NT}^{-1}](a, b) \ne 0$ should be considered similar to a.
• If $R_{NT} \in NT$, $R_T \in T$ and $a \in E$ then $h_4(R_{NT}, R_T)$ is the degree to which all concepts $b \in E$ for which $[R_{NT}^{-1} \circ^t Tr^t(R_T)^{-1}](a, b) \ne 0$ or $[Tr^t(R_T)^{-1} \circ^t R_{NT}^{-1}](a, b) \ne 0$ should be considered similar to a.

The values of all the above degrees range from −1 to 1. A degree of −1 denotes that the relation or the pair of relations should not be considered at all in measuring similarity. A degree of 1 denotes the exact opposite, namely that two concepts connected with this relation should be considered identical. Any degree between −1 and 1 denotes an intermediate situation.

The utilization of the OAC for the application-specific interpretation of the fuzzy ontology in the semantic similarity assessment is done through a process called “contextualization”. The formal description of this process is as follows: given a fuzzy ontology $O_F = \{E, T, NT\}$ and a corresponding application context $OAC_{O_F} = \{F, G, H\}$, we define the application context operator as follows:

$$
aco(R(a, b), f) =
\begin{cases}
R(a, b)^{1 - f(R)}, & 0 \le f(R) \le 1 \\
R(a, b) \times (1 + f(R)), & -1 \le f(R) < 0
\end{cases}
\qquad (1)
$$

where $R \in \mathcal{R}$ and $f \in OAC_{O_F}$. Then we apply this operator to the fuzzy ontology through the following steps:

1. $\forall R_T \in T$ we take $R'_T = aco(Tr^t(R_T), f_1)$ and $R''_T = aco(Tr^t(R_T)^{-1}, f_2)$.
2. $\forall R_{NT} \in NT$ we take $R'_{NT} = aco(R_{NT}, g_1)$ and $R''_{NT} = aco(R_{NT}^{-1}, g_2)$.
3. $\forall R_T \in T, R_{NT} \in NT$ such that $[R_T \circ^t R_{NT}] \ne \emptyset$ we take $R'_{T,NT} = aco([R'_T \circ^t R'_{NT}], h_1) \cup aco([R''_T \circ^t R'_{NT}], h_2) \cup aco([R'_T \circ^t R''_{NT}], h_3) \cup aco([R''_T \circ^t R''_{NT}], h_4)$.
4. $\forall R_T \in T, R_{NT} \in NT$ such that $[R_{NT} \circ^t R_T] \ne \emptyset$ we take $R''_{T,NT} = aco([R'_{NT} \circ^t R'_T], h_1) \cup aco([R'_{NT} \circ^t R''_T], h_2) \cup aco([R''_{NT} \circ^t R'_T], h_3) \cup aco([R''_{NT} \circ^t R''_T], h_4)$.
5. $\forall R_{T_1}, R_{T_2} \in T, R_{NT} \in NT$ such that $[(R_{T_1} \circ^t R_{NT}) \circ^t R_{T_2}] \ne \emptyset$ we take $R'_{T_1,NT,T_2} = [R'_{T_1,NT} \circ^t (R'_{T_2} \cup R''_{T_2})]$.

At the end of the above procedure we take the fuzzy union of all the resulting relations and we end up with a fuzzy ontology that comprises a single contextualized fuzzy relation $R_C$. Then the CBR engine of our system is able to determine the
semantic similarity between any two concepts $a, b \in E$ simply by getting the degree of the relation $R_C(a, b)$.
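To make the contextualization procedure more concrete, the following Python sketch applies the application context operator of Eq. (1) to a toy fuzzy ontology. It is a simplified illustration, not the authors' implementation: it assumes that the closure Tr^t is the sup-min transitive closure (that is, the t-norm is taken to be min), it stores relations sparsely as dictionaries of non-zero degrees, it covers only the single-relation steps 1 and 2 (the relation-pair steps 3 to 5 are omitted), and the concept names and degrees are hypothetical.

```python
def aco(degree, f_value):
    """Application context operator of Eq. (1) for one non-zero degree R(a, b)."""
    if not -1.0 <= f_value <= 1.0:
        raise ValueError("context value must lie in [-1, 1]")
    if f_value >= 0:
        return degree ** (1.0 - f_value)
    return degree * (1.0 + f_value)


def apply_context(relation, f_value):
    """Apply aco to every stored (non-zero) pair; drop pairs that become 0."""
    out = {}
    for pair, degree in relation.items():
        new_degree = aco(degree, f_value)
        if new_degree > 0:
            out[pair] = new_degree
    return out


def inverse(relation):
    """R^-1(b, a) = R(a, b)."""
    return {(b, a): d for (a, b), d in relation.items()}


def compose(r1, r2):
    """Sup-min composition: (r1 o r2)(a, c) = max over b of min(r1(a, b), r2(b, c))."""
    out = {}
    for (a, b), d1 in r1.items():
        for (b2, c), d2 in r2.items():
            if b == b2:
                d = min(d1, d2)
                if d > out.get((a, c), 0.0):
                    out[(a, c)] = d
    return out


def union(*relations):
    """Fuzzy union: pointwise maximum of the degrees."""
    out = {}
    for relation in relations:
        for pair, d in relation.items():
            out[pair] = max(d, out.get(pair, 0.0))
    return out


def transitive_closure(relation):
    """Sup-min transitive closure, iterated until a fixed point is reached."""
    closure = dict(relation)
    while True:
        extended = union(closure, compose(closure, closure))
        if extended == closure:
            return closure
        closure = extended


# Hypothetical toy ontology: one taxonomy and one non-taxonomic relation.
taxonomy = {("Process", "Settlement"): 0.9,
            ("Settlement", "Imbalance Settlement"): 0.8}
participates = {("Producer", "Settlement"): 0.5}

closed = transitive_closure(taxonomy)
R_C = union(
    apply_context(closed, 1.0),                  # step 1 with f1 = 1
    apply_context(inverse(closed), -1.0),        # step 1 with f2 = -1
    apply_context(participates, -0.2),           # step 2 with g1 = -0.2
    apply_context(inverse(participates), -1.0),  # step 2 with g2 = -1
)
```

With these context values every concept reachable downwards in the taxonomy becomes fully similar to its ancestor, the inverse directions are discarded, and the non-taxonomic link is attenuated; the CBR engine would then read the similarity of two concepts directly from `R_C`.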
3 Application of the IR Framework in HTSO Case

3.1 Enabling Architecture and Development Methodology

The application of the aforementioned semantic IR framework in the case of HTSO was facilitated through the architecture of figure 3, which reflects in a natural way the key aspects of the framework. More specifically, the system consists of a commercial CBR engine, which provides the SCBR reasoning functionality that the framework requires, and of a “Fuzzy Ontology Contextualizer” subsystem, which implements the process described in paragraph 2.5 for calculating the semantic similarity between concepts of the domain ontology.
Fig. 3 System Architecture
The CBR engine facilitates the definition of metadata for describing and storing documents as cases, the definition of vocabularies for assigning values to these metadata and the usage of all these for calculating the similarity between cases. For the latter, in particular, the engine relies on similarity values assigned to pairs of vocabulary terms by some domain expert or knowledge engineer. Thus, the engine is able to perform the kind of reasoning that our framework supports by merely using the contextualized fuzzy domain ontology as its vocabulary. This is made possible by the contextualizer subsystem, which takes as input the fuzzy domain ontology and its application context (also represented as an ontology), performs the reasoning algorithm of paragraph 2.5 and transforms the resulting fuzzy relation into a format compatible with the engine’s vocabulary. This process is repeated each time the initial ontology changes.

Thus, given the above architecture, the actual implementation of the framework for HTSO comprised the following steps:

1. We modelled the available documents as cases through a proper metadata schema and we stored them in the system’s case base.
2. We developed a fuzzy ontology covering the domain of the Electricity Market.
3. We used the concepts of the ontology for the semantic annotation of the cases.
4. We defined the Ontology Application Context for the specific ontology and the specific IR scenario and we applied it to the ontology in order to produce its contextualized version.
3.2 Case Representation

For the first step we kept in mind that a basic requirement was that the system’s answers to the users’ queries should be as detailed as possible, i.e. having the system return whole documents was not an acceptable option. For that reason, we decided to decompose the documents at paragraph level and to consider these paragraphs as the system’s cases. All the cases were represented by means of a common schema that included, among others, classical metadata such as title, author and language. Furthermore, we considered the attribute Thematic Content as the one to be used for the semantic characterization of the cases and the corresponding assessment of their semantic similarity. The values this attribute could take were semantic concepts derived from the HTSO domain ontology.
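A paragraph-level case schema of this kind could be sketched as follows. The field names, the blank-line paragraph splitting rule and the example document are assumptions made for illustration only; the actual schema of the HTSO system is not given in the text beyond the attributes named above.

```python
from dataclasses import dataclass, field


@dataclass
class ParagraphCase:
    """One case: a paragraph plus classical metadata and its thematic content."""
    document_title: str
    author: str
    language: str
    paragraph_index: int
    text: str
    # Concepts from the domain ontology, filled in later during annotation.
    thematic_content: set = field(default_factory=set)


def split_into_cases(title, author, language, body):
    """Decompose a document into paragraph-level cases (blank lines separate paragraphs)."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return [ParagraphCase(title, author, language, index, paragraph)
            for index, paragraph in enumerate(paragraphs)]


# Hypothetical document used only to exercise the sketch.
cases = split_into_cases("Sample Market Document", "HTSO", "en",
                         "First paragraph.\n\nSecond paragraph.\n\n")
cases[1].thematic_content.add("Market Processes")
```

Keeping `thematic_content` separate from the classical metadata reflects the fact that only this attribute takes part in the semantic similarity assessment.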
3.3 Ontology Modelling

In the second step the HTSO domain ontology was developed and, according to our framework, it was structured as a fuzzy ontology comprising, in the end, nine categories of concepts, nine taxonomical relations and six non-taxonomical relations,
all relevant to the Electricity Market Domain. The concept categories were to be used for grouping the domain’s concepts according to their abstract meaning and for identifying in a more intuitive way the various relationships between them.
Fig. 4 HTSO Sample Taxonomies
More specifically, after the knowledge acquisition phase, which included content analysis and interviews with domain experts, the identified categories were:

• Market Processes
• Market Rules
• Market Rights & Obligations
• Market Information Sources
• Market Participants
• Market Units & Systems
• Market Services
• Market Actions
• Market Extents

Each of these categories contained a significant number of concepts, with their overall number across all categories being about 1800. Furthermore, corresponding fuzzy taxonomical relations per category were defined, all having the semantics of fuzzy specialization as described in paragraph 2.5. Figure 4 depicts snapshots from two of these taxonomies, namely processes and rights/obligations. Finally, a number of non-taxonomical relations, each relating concepts from different categories, were defined. These were:

• participatesInProcess(Participant, Process)
• performsAction(Participant, Action)
• hasRightOrObligation(Participant, Right & Obligation)
• regardsProcess(Rule, Process)
• isRelatedToProcess(Extent, Process)
• foundInInformationSource(Extent, Information Source)
3.4 Case Semantic Annotation

The semantic annotation of the system’s cases involved the assignment of values to the “Thematic Content” attribute of each case. These values had to be derived from the domain ontology’s semantic concepts and to reflect as accurately as possible the semantic content of the cases. For that reason, the assignment was mainly performed by experts who knew the domain and the content well. This option, though quite demanding and time-consuming because of the large number of cases and concepts, was deemed the most appropriate because, out of the many semantic terms contained in most of the system’s cases, only a few were actually indicative of the cases’ thematic content. Furthermore, the number of annotation concepts per case did not exceed three, because in most cases the ontology made it redundant to annotate a case with all the relevant concepts: the reasoning mechanism was able to infer this relevance.
3.5 Ontology Contextualization

The final step of the process involved defining the Ontology Application Context. This was initially performed based on the results of the knowledge acquisition process, while the final values of the context’s parameters were determined after a two-month period of testing the system and receiving feedback from the users. The values of the Taxonomic and the Non Taxonomic Relation Application Contexts are shown in Table 1. As far as the Taxonomic - Non Taxonomic Pair Application Context is concerned, the values for all pairs were h1 = 1, h2 = 0, h3 = 1 and h4 = 0.
Table 1 HTSO Taxonomic and Non Taxonomic Relation Application Contexts

Parameter                             Value   Parameter                             Value
f1 (Processes Taxonomy)               1       f2 (Processes Taxonomy)               -1
f1 (Extents Taxonomy)                 1       f2 (Extents Taxonomy)                 -1
f1 (Rules Taxonomy)                   1       f2 (Rules Taxonomy)                   -1
f1 (Rights & Obligations Taxonomy)    1       f2 (Rights & Obligations Taxonomy)    -1
f1 (Information Sources Taxonomy)     1       f2 (Information Sources Taxonomy)     -1
f1 (Participants Taxonomy)            1       f2 (Participants Taxonomy)            -1
f1 (Services Taxonomy)                1       f2 (Services Taxonomy)                -1
f1 (Units & Systems Taxonomy)         1       f2 (Units & Systems Taxonomy)         -1
f1 (Actions Taxonomy)                 1       f2 (Actions Taxonomy)                 -1
g1 (participatesInProcess)            -0.2    g2 (participatesInProcess)            -1
g1 (performsAction)                   -0.2    g2 (performsAction)                   -1
g1 (hasRightOrObligation)             0       g2 (hasRightOrObligation)             -1
g1 (isRelatedToProcess)               -0.6    g2 (isRelatedToProcess)               -0.95
g1 (regardsProcess)                   -0.5    g2 (regardsProcess)                   -0.7
g1 (foundInInformationSource)         -0.2    g2 (foundInInformationSource)         -0.8
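As a worked illustration of how these parameter values act through the application context operator of Eq. (1), consider a relation degree of 0.5 between two concepts. The degree 0.5 is a hypothetical value chosen only for the arithmetic; the context values are those of Table 1.

```latex
\begin{align*}
% f_1 = 1 (any taxonomy): narrower concepts become fully similar
aco(0.5, f_1) &= 0.5^{\,1 - 1} = 1 \\
% f_2 = -1 (any taxonomy): the inverse direction is discarded
aco(0.5, f_2) &= 0.5 \times (1 - 1) = 0 \\
% g_1(hasRightOrObligation) = 0: the degree is left unchanged
aco(0.5, g_1) &= 0.5^{\,1 - 0} = 0.5 \\
% g_1(isRelatedToProcess) = -0.6: the degree is attenuated
aco(0.5, g_1) &= 0.5 \times (1 - 0.6) = 0.2 \\
% g_2(isRelatedToProcess) = -0.95: the inverse is almost discarded
aco(0.5, g_2) &= 0.5 \times (1 - 0.95) = 0.025
\end{align*}
```

Read this way, the chosen context treats the taxonomies asymmetrically (full similarity towards narrower concepts, none in the inverse direction) and lets non-taxonomic relations contribute only in weakened form.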
4 System Deployment and Evaluation

The Electronic Library of HTSO was deployed and made available to the public on October 15th, 2008 through the URL http://emarketinfo.desmie.gr/htso/user (figure 5). The overall system’s implemented features and characteristics can be summarized as follows:

• Content and interface available in two languages, English and Greek.
• Content retrievable and viewable in HTML and PDF format.
• Structural navigation within the documents through their tables of contents.
• Semantic navigation by means of the ontology’s taxonomies.
• Semantic search through free text queries and filtering criteria.
The deployment of the system was preceded by a two-month period of thorough testing and fine-tuning of the framework’s parameters in order to increase the effectiveness of the retrieval. During the same period, people from HTSO were trained in using the system’s administrative tools for content management and semantic annotation as well as for management of the domain ontology. This type of administration is necessary because content is expected to be added to the library frequently, and the need for new ontological concepts describing this content may arise at any time. In the final evaluation of the system, carried out by the people of HTSO right before its release, they reported being satisfied with the quality of the retrieval and the navigation capabilities. At the time of writing, we are designing, and HTSO plans to implement, an end-user feedback mechanism for collecting the end-users’ comments on the system’s effectiveness. These comments are expected to be (continually) analyzed by the administrators and used
Fig. 5 HTSO Electronic Library
to further fine-tune the system’s model; we expect to report on these results in a future publication.
5 Summary and Conclusions

In this chapter we described the development process of the electronic library of the Hellenic Transmission System Operator, a knowledge portal that utilizes explicit semantics in order to provide effective access to documents related to the electricity market. Through this description, we have presented a novel semantic information retrieval framework that provides a generic yet comprehensible and structured way to build semantic information retrieval systems in any domain. The framework, among other things, enables us to take into consideration and model the users’ subjective perception of semantic similarity in the context of the specific application and domain. This leads to much higher user satisfaction in terms of the system’s search effectiveness. Other semantic similarity measures in the literature fail to address the issue of subjectivity, so in this direction we have broken new ground. Similarly, it is novel and useful that we are now able to define which components of the ontology should be used for the information retrieval process, and in what way.

During the actual realization of the proposed approach and methodology in the development of the HTSO system, perhaps the greatest challenge was the knowledge acquisition process and in particular the engagement of the domain experts in it. The reason is that explaining the system’s underlying retrieval mechanism to the domain experts without going into too much technical detail proved harder than one might expect. It seems that the knowledge elicitation barrier is still to be overcome.
Clearly, our future work will have to include the definition of a formal and detailed methodology through which knowledge engineers will be able to better “exploit” the domain experts in the process of implementing the framework.
References

1. Alexopoulos, P., Wallace, M., Kafentzis, K.: A Fuzzy Ontology Framework for Customized Assessment of Semantic Similarity. In: 3rd International Workshop on Semantic Media and Adaptation (SMAP 2008), Prague, Czech Republic, December 15-16 (2008)
2. Aamodt, A., Plaza, E.: Case-based Reasoning: Foundational Issues, methodological variations and systems approaches. AI-Communications 7(1), 39–59 (1994)
3. Abecker, A., Hinkelmann, K., Maus, H., Müller, H.J. (eds.): Geschäftsprozessorientiertes Wissensmanagement. Springer, Heidelberg (2002)
4. Bergmann, R., Breen, S., Göker, M., Manago, M., Wess, S.: Developing industrial case-based reasoning applications. LNCS (LNAI), vol. 1612. Springer, Heidelberg (1999)
5. Bergmann, R.: Experience Management: Foundations, Development Methodology, and Internet-Based Applications. LNCS (LNAI), vol. 2432. Springer, Heidelberg (2002)
6. Bergmann, R., Schaaf, M.: Structural Case-Based Reasoning and Ontology-Based Knowledge Management: A Perfect Match? LNCS (LNAI), vol. 2432. Springer, Heidelberg (2002); Journal of Universal Computer Science 9(7), 608–626 (2003)
7. Calegari, S., Sanchez, E.: A Fuzzy Ontology-Approach to improve Semantic Information Retrieval. In: Bobillo, F., da Costa, P.C.G., D’Amato, C., Fanizzi, N., Fung, F., Lukasiewicz, T., Martin, T., Nickles, M., Peng, Y., Pool, M., Smrz, P., Vojtas, P. (eds.) Proceedings of the Third ISWC Workshop on Uncertainty Reasoning for the Semantic Web - URSW 2007. CEUR Workshop Proceedings, vol. 327, CEUR-WS.org (2007)
8. Leake, D., Wilson, D.: Case Based Reasoning: Experiences, Lessons & Future Directions. AAAI-Press, Menlo Park (1996)
9. Schank, R.C.: Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press, Cambridge (1982)
10. Straccia, U.: Towards a Fuzzy Description Logic for the Semantic Web (Preliminary Report). In: Gómez-Pérez, A., Euzenat, J. (eds.) ESWC 2005. LNCS, vol. 3532, pp. 167–181. Springer, Heidelberg (2005)
11. Wallace, M.: Ontologies and Soft Computing in Flexible Querying. Control and Cybernetics 2(38), 481–507 (2009)
12. Zadeh, L.A.: From search engines to question-answering systems the need for new tools. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, Springer, Heidelberg (2003)
Ontology-Based Profiling and Recommendations for Mobile TV

Yannick Naudet, Armen Aghasaryan, Sabrina Mignon, Yann Toms, and Christophe Senot
Abstract. In this chapter, we present a recommending system that has been developed for filtering TV content provided to users on their mobile devices. This recommender is fully based on ontologies which are used to formalize both the user and her/his interests, and the audiovisual content. The developed ontologies allow matchmaking between user and content at different levels, based on three means to define user interests: categories, content description, or any combination of concepts defined in an ontology. The computation of user profiles relies on both explicit and implicit profiling, based on incremental learning of interest degrees from content usage.
Yannick Naudet and Sabrina Mignon: Centre de Recherche Public Henri Tudor, 29, Av. John F. Kennedy, L-1855 Luxembourg-Kirchberg, Luxembourg, e-mail: {yannick.naudet,sabrina.mignon}@tudor.lu
Armen Aghasaryan, Yann Toms, and Christophe Senot: Alcatel Lucent Bell Labs, Centre de Villarceaux, route de Villejust, 91620 Nozay, France, e-mail: {armen.aghasaryan,yann.toms}@alcatel-lucent.com {christophe.senot}@alcatel-lucent.com

M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 23–48. © Springer-Verlag Berlin Heidelberg 2010 springerlink.com

1 Introduction

In converged mobile broadcast and cellular services, the ability to choose among hundreds of digital broadcast streams and to browse a multitude of IP-based content will necessitate the integration of advanced user interfaces capable of personalized service discovery. Next generation mobile terminals should be capable not only of displaying available services in the ESG (Electronic Service Guide) but also of dynamically filtering the content according to user preferences. This is one of the main objectives of the project “Mobile Video and Interactive Services” (MOVIES, a project partially funded by the EUREKA Celtic initiative), in the scope of which the work presented in this chapter has been carried out [18]. The cooperation between DVB-H broadcast [6] and 3G mobile networks (Fig. 1) allows offering new services, such as interactive TV, personalized services, on-demand services, and media access rights protection and security. The specificity of such a converged streaming & broadcast platform lies in the fact that the 3G return channel enables access to a centralized personalization logic (profiling intelligence and content recommendation) while the broadcast content selection and usage tracking is done on the mobile terminal.
Fig. 1 Converged DVB-H and 3G architecture. (Components shown: content provider; cooperative service platform with personalization, interactive broadcast module, and security and conditional access; broadcast network operator (DVB-H) for content broadcasting; mobile network operator (3G) for point-to-point delivery and information requests; Internet; terminal with interaction software.)
For a few years now, the Semantic Web [3] and its associated technologies have been gaining interest, and have been used in particular for pushing personalization to a semantic level using ontologies. Whereas classical recommending systems are often based on categories or on matchmaking over a limited set of properties of the delivered content, ontology-based recommenders open new possibilities [2][4]. Using ontologies for content description or indexing and, at the same time, for user modeling allows a better matching of contents with users in the information filtering process. Concepts used in content and user profiles are formally defined within a common representation framework provided by ontologies. During matchmaking there is thus no ambiguity between compared terms, which leads to more accurate results and a richer personalization thanks to inference possibilities.

Personalization systems using ontologies for concept disambiguation or reasoning based on semantic relations have appeared recently. The main interest of those ontologies is to provide a basis for reasoning and to introduce inferred information in the matchmaking process [10][11], be it between user and content or between user and user, for content-based or collaborative filtering approaches respectively. In the multimedia domain, we can cite [16], in which semantic descriptions for user preferences have been added to MPEG7 and MPEG21. In [2], semantic query refinement is addressed, based on a set of ontologies for personalized information retrieval in TV multimedia content collections and cultural archives. In [5], multimedia content filtering based on ontologies is addressed; weighted concept vectors for user profiles and content descriptions are used.
The existing works, however, do not seem to have explored all the possibilities offered by the use of ontologies. In particular, the expression of user interests in precise things, formalized as a combination of concepts, is not discussed. We have conceived a set of ontologies allowing matchmaking between users and Audio/Video content along three dimensions: categories or themes of interest, content description, and precise interest description. User interests can be formalized using one or several of those dimensions and can moreover be associated with contextual data. This ontological formalization, used in conjunction with rule sets and a global matchmaking algorithm, has been successfully demonstrated in the mobile recommending system for broadcast TV and Video on Demand which we present here. This chapter is an extension of a preliminary version presented in [13].

In the remainder, section 2 first presents the mobile TV recommender architecture and the implementation of its main components: the profiling and recommending services. Section 3 presents the set of ontologies we have conceived. In section 4, we present the profiling service and discuss the approach used for explicit and implicit profile updating. Section 5 presents the recommending service and its associated algorithms. In section 6 we illustrate the recommender with two examples that have been tested in a fully integrated environment. Finally, section 7 concludes and gives some perspectives.
2 Mobile TV Recommender Architecture

The architecture illustrated in Fig. 2 shows a profiling and personalization solution for mobile TV and video delivery in a converged broadcast and streaming service. A key component of such a service is the ESG, delivered to the terminal in the form of an XML file encapsulated in the broadcast channel [1]. The ESG lists the Services and Content available to the user. Besides TV channels, it can describe VoD services, Radio, or Data services (news, weather, stocks, etc.). Because consumers cannot afford to waste time finding their way through the large amount of content and services now available from mobile devices, personalized ESGs targeting user needs and interests are the next step in mobile applications. Our personalized ESG application proposes both non-filtered and filtered content, and the filtering can be presented in different ways: ordering, faceted classification according to each interest and/or category, etc.

In our architecture, personalization is done in the following way. The ESG application makes a request to the recommendation service, transmitting the user and ESG identifiers. The recommender first retrieves the user profile from the profiling module, together with ontological descriptions of contents from a dedicated knowledge base. It then processes the data and returns a list of contents matching the user profile, with the associated coefficients and corresponding interest categories. Additionally, the user is constantly monitored by the profiling service, which keeps his profile up to date. The profiling service uses two mechanisms: explicit and implicit profiling. In the explicit profiling phase, the user declares some of his interests and non-interests via a web portal. The implicit profiling consists
in learning and updating the profile from usage traces (log files) which are collected in the ESG application, and then, packaged and sent to the profiling service by its corresponding proxy (on the terminal).
Fig. 2 Mobile TV recommender architecture.
Generally, such a profiling service can enable both well-known types of personalization techniques: the content-based approach (e.g. [8]), and the collaborative filtering (e.g. [14]). The content-based algorithms look at the ‘similarity’ between the user (profile) and the item (metadata) to recommend, while the collaborative filtering algorithms recommend the item if it has been appreciated by ‘similar’ users (based on consumption history or ratings). However, in the scope of the current work and to fit the needs of the recommending service, only the content-based approach is used by the profiling service. Therefore, the profiling service also relies on content metadata describing the semantics of the consumed contents and services. Finally, as can be seen in Fig. 2, a more centralized solution for a pure VoD streaming service is also possible by using the same profiling and recommendation components. In this case, the personalization requests as well as usage traces are received from the VoD portal.
2.1 Implementation

The profiling engine prototype has been implemented in a larger scope of multiple content delivery platforms (IPTV/VoD, Web portals, mobile video) where the customers can use a diversity of terminals: TV/Set-Top-Box, mobile phone, and
laptop [15]. The profiling engine allows the building and the querying of profiles through a Web Service/SOAP² interface. A specificity of the implementation in the context of converged streaming and broadcast mobile TV is the use of two profiling proxies that interact with the central profiling service: 1/ the profiling proxy residing in the streaming platform, and 2/ the profiling proxy embedded in the mobile terminal (for the case of broadcast content). The role of both proxies is to capture the content usage data for each user and to transfer them to the profiling service in the right format. The northbound interface of the profiling service provides an OWL user profile upon request from the recommendation service for a given user identifier. Such a profile can then be directly processed by the recommender. Although profiles are provided to the recommender in OWL format, they are stored in a database (DB) in the form of (concept, value) pairs. Storing instances in a relational DB allows faster access and thus faster processing for the profiling engine. However, the ontologies that define the structure of the profile are not stored in the database but kept as OWL files, which brings flexibility by allowing easy remodeling and ensures good synchronization with the recommender’s model. The recommending system relies on ontology and rule processing on the one hand, and on matching algorithms on the other. Although it has been designed in a generic way with reusability in mind, it is specialized for processing the ontologies we present in the next section. As the whole system is written in Java, we have chosen the Jena³ semantic web library to manipulate ontologies and profiles, express rules and make inferences. During an initialization phase, the ontologies are loaded from their OWL representation.
The engine infers a new knowledge base, adding links that account for proximity between some terms in the ontology set, in particular based on category similarity and on statically defined rules linking related concepts. This first inference phase is required only when ontologies have been modified. Then, once the recommender is requested to filter a set of contents for a given user, a second inference phase based on the matchmaking approach begins. The recommender is built as a service component, accessible from the mobile phone as a SOAP web service taking as a parameter a reference to an ESG. When this parameter is not specified, a VoD mode is selected. When the recommender is queried, it requests the user profile from the profiling service, as well as the descriptions of the contents referred to in the ESG, and contextual information. The inference engine of the recommender, fed with a set of rules, is then run and matchmaking between the user profile and content descriptions is performed. The recommendation is returned to the calling application. The whole processing is done by the recommending service on the server side. On the mobile side, only user and ESG identifiers are used as inputs. This kind of architecture, where all the workload is put on the server side, is less constraining in our mobile environment, since we do not have to deal with the low processing capabilities of mobile devices.

² Simple Object Access Protocol, http://www.w3.org/TR/soap12-part0/
³ http://jena.sourceforge.net/
The inference engine used is built on the RETE algorithm [7]. This is an efficient pattern matching algorithm for rule systems, but it requires some tuning in its usage in order to obtain good performance. Indeed, the inference engine considers the atomic conditions constituting a rule antecedent one after the other, from top to bottom. In order to ensure good performance, we reordered our rules’ antecedents following this principle: for each rule, atomic conditions should be ordered from top to bottom by increasing probability of being matched. Basically, the time needed to examine which facts, i.e. RDF triples here, match a condition depends on the number of facts that have to be scanned. It is thus better to put the most discriminating condition near the top.
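The reordering principle can be illustrated outside of Jena with a small Python sketch. The fact base, the triple patterns and the helper names below are hypothetical; the sketch only sorts the conditions of a rule by the number of facts each one matches on its own, which is a crude stand-in for the matching probability discussed above and ignores joins between conditions.

```python
# Hypothetical fact base of RDF-like triples (subject, predicate, object).
FACTS = [
    ("u1", "rdf:type", "uo:User"),
    ("u1", "uo:age", 12),
    ("c1", "rdf:type", "cto:Content"),
    ("c2", "rdf:type", "cto:Content"),
    ("c3", "rdf:type", "cto:Content"),
    ("c1", "cto:isDescribedBy", "d1"),
    ("c2", "cto:isDescribedBy", "d2"),
    ("c3", "cto:isDescribedBy", "d3"),
]


def matches(pattern, fact):
    """A pattern term starting with '?' is a variable and matches anything."""
    return all(
        (isinstance(p, str) and p.startswith("?")) or p == f
        for p, f in zip(pattern, fact)
    )


def count_matches(pattern, facts):
    """Number of facts that satisfy a single atomic condition."""
    return sum(1 for fact in facts if matches(pattern, fact))


def order_conditions(conditions, facts):
    """Put the most discriminating condition (fewest matching facts) first."""
    return sorted(conditions, key=lambda c: count_matches(c, facts))


rule_body = [
    ("?content", "rdf:type", "cto:Content"),     # matches 3 facts
    ("?content", "cto:isDescribedBy", "?desc"),  # matches 3 facts
    ("?user", "uo:age", "?age"),                 # matches 1 fact
]
ordered = order_conditions(rule_body, FACTS)
```

With this toy fact base, the age condition moves to the top because it matches a single fact, while the two content conditions keep their relative order.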
3 The Set of Ontologies

The ontology set we have conceived comprises ontologies for user, content, context, and also categories. Fig. 3 illustrates the relationships between the different ontologies. The user ontology (UO) allows expressing interests which are valid in a given context, hence the link to the context ontology (CXO).
Fig. 3 Ontology set for the mobile recommender.
Those interests can concern a content, as defined in the content ontology (CTO), or a category of things, which is the core concept of the category ontology (CATO). Contents are described by a specific ontology, which in our case is TV-Anytime⁴ (TV-A), and are assigned to categories. Finally, there is a specific link to model the fact that a user is situated in a given context at a given time: it is especially important to consider the context in which the user is to receive some content. The user ontology is inspired by different user models, among which GUMO (General User Model Ontology) [9]. We have only kept concepts that are relevant for our application case. In the proposed ontology, illustrated in Fig. 4, a User is represented by the Person he is, and his interests, preferences and usage history.

⁴ http://www.tv-anytime.org/
Fig. 4 The User Ontology.
The user-as-a-person part might in the future be formalized as a separate ontology. It comprises in particular:
• A personalia, which is represented by the Person concept, User being a subclass of it. It is defined with concepts and properties linked to the user himself: who he is, his demographic information, his working and leisure activities, etc. In particular, the Role concept is also used with the TV-A ontology to specify e.g. the actors or the director of a movie. This personalia part can be used e.g. for age-restricted content, to provide the user with content related to his birthplace or to send him a gift on his birthday, to propose content related to his work or leisure, language, etc.
• A list of Abilities, characterizing the physical capabilities of the user. This will typically be used for content alteration regarding handicaps, filtering content the user will not be able to access or understand, etc.
Then, the remainder of the user ontology comprises:
• The Interests of the user regarding categories of things, content or more specific interests that can be expressed using instances of rdf:Resource.
• The UsageHistory, containing references to instances of contents that were consumed by a user.
Fig. 5 The Content Ontology.
The Content concept is defined in the ontology related to content (CTO), illustrated in Fig. 5. Any consumed content is associated with a context, defining the situation in which the content was consumed. The Context concept is defined in the context ontology (CXO). The model proposes a UserGroup class for which interests and usage history can be specified. The groups will be used for applications using collaborative filtering. The Interest concept is used to specify user interests as well as non-interests. Interests are associated with a validity period and a level of interest. The latter will be used to weight the interest in the global matching process. An interest can be expressed classically using categories. We have designed a specific ontology for categories (CATO), illustrated in Fig. 6. It is a simple taxonomy of categories, instances of a main class cato:ThingCategory, linked by a property isSubCategoryOf. Currently, we use categories and sub-categories related to movies, sport and music. For example, in the case of the Movie category, classical concepts such as Action, Adventure or Comedy can be found as sub-categories.
Extensive lists of categories exist, e.g. in GUMO or TV-A, which could be reused if needed, provided that their structure is adapted. Indeed, at the time of writing, we have formalized a TV-A category taxonomy structured according to CATO. The main foreseen advantage is a direct correspondence with the classification usually used in ESGs. Another possibility offered by the model is to express interest in specific content characteristics. This is typically what will be exploited when the user consumption history is used. Instances of cto:Content are created from the consumed content profiles, summarizing the user's preferences in terms of content. Last, the user might be interested in specifying more precisely some interest in things or facts that cannot be expressed using simple categories or through the description of a content: e.g. expressing an interest in someone in a particular role, or in a given place, an event, etc. Such interests can be formalized by directly specifying an instance of rdf:Resource as the subject of interest. This way of expressing user interest provides enough flexibility to express almost anything that can be said using facts and ontologies.
Fig. 6 An extendible taxonomy of categories.
The content ontology CTO (Fig. 5) is very small and generic, so that any existing rich multimedia content description model can be used to detail the properties of specific kinds of content. In order to deal with TV programs, we have implemented a part of the TV-Anytime standard as an ontology. CTO defines a few common properties for multimedia content. In particular, a Content is linked to categories, instances of the category ontology CATO, each associated with a given proportion. A specific property allows linking content instances with a technical description. In the case of the TV-A ontology, this is given by the tvao:BasicContentDescription concept, which is the main concept for content description according to the TV-A standard. Finally, the context ontology CXO proposes concepts for defining both user mood and feelings, and environmental conditions (e.g. time, location, temperature, noise, etc.). We do not go into more detail here, as it is not yet used in the current version of the mobile TV recommender. The reader can refer to [12] for more details.
4 Profiling Service

This service is provided by a multi-platform and application-agnostic profiling engine [1], which automatically learns each user’s profile and provides an estimation of his domains of interest. For this purpose, usage traces from different sources (with different semantics and formats) are collected and analyzed, e.g. the consumption and purchase logs from diverse service delivery platforms (IPTV, Web portals, or mobile content) of a large telecommunication operator. The profiling engine is application-agnostic in the sense that it is not tailored for a specific personalized application, but provides a generic intelligent interface with a number of reusable primitives to be used by different personalized applications such as targeted advertising, content recommenders, or social networking applications. In the scope of this chapter, however, we focus on a single content recommendation application, personalized EPG on mobile TV, although usage traces from different sources are considered: VoD consumption traces (download/purchase logs) and DVB-H broadcast content viewing (duration, channel zapping). The profiling engine is designed so as to decouple its internal logic from a particular user profile model and content metadata structure. It can thus easily be applied to a new profile/metadata structure and semantics. In the next section, we first describe the basic concepts used in the internal data model of the profiling service, as well as their mapping to the ontology concepts used by the recommending service. Then, the incremental profiling process is highlighted.
4.1 Concepts and Measurable Quantities

Semantic concepts constitute core elements of the profiling engine’s data model. They represent the glue between user profiles and content characteristics. All the entities in which a user interest can be expressed are special cases of the SemanticConcept class; see the inheritance relation in the diagram of Fig. 7 (right side). In addition to cato:ThingCategory, the profiling engine supports rdf:Resource and cto:Content, which can be specified as interest objects in the UO ontology. The user profile is basically represented by a set of (concept, value) pairs, where each value is taken from the interval [0,1] and reflects the level of interest in the given (semantic) concept. More generally, the profiling engine manipulates three important classes of objects that allow associating a numerical value with a semantic concept, a mechanism that they inherit from their common ancestor, the SemanticQuantity class; see Fig. 7 (left side). These entities are defined as follows:
• Quantity of Affiliation (QoA) characterizes the degree of affiliation of a content item to a given semantic concept. Each content item can be characterized by a set of QoA. For example, the film “Shrek” can be described by {Animation = 0.9, Comedy = 0.8}.
• Quantity of Consumption (QoC) characterizes the degree of intensity of a consumption act with respect to a given semantic concept. For example, if two users watch “Shrek” for 10 minutes and 1 hour, respectively, it could be inferred that the second user is more interested in that content (and its semantic concepts Animation and Comedy) than the first one. Thus, each consumption act can be characterized by a set of QoC.
• Quantity of Interest (QoI) characterizes the degree of interest of the user in a given semantic concept. The user profile is composed of a set of QoI.
Note that this model allows each class of semantic quantities QoA, QoC, and QoI to introduce its specific attributes, in addition to a single inherited attribute Value (Fig. 7). For example, an explicitly declared non-interest in a semantic concept can be expressed by an additional attribute Non-interest (Boolean) defined in the QoI class. In the next section, however, we focus on the implicit learning and update of its single inherited attribute expressing the interest value⁵.
Fig. 7 Profile modelling elements and their mapping to ontology concepts.
In the sequel, by abuse of notation, we will use QoA, QoC, and QoI to refer both to the class names and to their respective attribute Value.
4.2 Incremental Profiling Process

First of all, the user can declare some of his preferences (interests and non-interests) via a web portal; we call this phase explicit profiling. The data provided by the user at this step are taken as a starting point for the implicit profiling process, which updates the profile data by further analyzing the usage traces. The next stage consists in characterizing each consumption act in terms of values on semantic concepts.

⁵ The non-interest attribute is not updated by the incremental profiling process and therefore it may contradict the implicitly learned and updated interest value.
4.2.1 QoC Computation

The user’s consumption act is described in terms of Quantities of Consumption (QoC). Each QoC value gives a normalized measure, QoC ∈ [0,1], of the observed user interest in the given semantic concept. This measure is based on the assumption that the longer the user consumes the content, the more interested he is in the subject of the content or, similarly, the more the user pays to watch some content, the more interested he should be in that type of content. Such a measure combining both viewing duration and price can be obtained as follows:
$$QoC^i_n = \frac{\tau \cdot (1 + c)}{2} \cdot QoA^i_n, \qquad \tau = \frac{\tau_{act}}{\tau_{max}}, \quad c = \frac{c_{act}}{c_{max}}, \tag{4.1}$$
where $i$ represents the relevant semantic concept, $n$ refers to the consumption event ordering, $\tau_{act}$ is the actual consumption duration, $\tau_{max}$ is the total video duration, $c_{act}$ corresponds to the price paid for the consumed item, and $c_{max}$ is the maximum price of an item in a given domain.

4.2.2 QoI Update

The Quantity of Interest (QoI) represents the estimated value of user interest in a given semantic concept. In the profiling engine, two complementary QoI update functions are used: 1/ consumption event-based QoI learning, and 2/ time-based QoI decay. The consumption event-based QoI learning function makes the QoI data on user interests evolve by combining their previously known values with their newly observed interest manifestation (QoC). We have considered a particular family of functions where the new QoI is obtained cumulatively by a weighted addition of the newly observed consumption, QoC, with the previous QoI:
$$QoI^i_{n+1} = QoI^i_n + W\left(QoI^i_n\right) \cdot QoC^i_n. \tag{4.2}$$
The weight, given by the function $W(QoI^i_n) < 1$, represents how much the new observation influences the profile evolution. Note that this function should be selected carefully so that formula (4.2) always produces QoI values below 1. Such a variable weight yields a “learning curve” behavior, where small interests grow relatively slowly (because of a small weight) and high interests saturate at the upper limit of one (again, because of a diminishing variable weight). Here, the term “learning curve” refers to the relationship between the duration of a student’s learning period and the knowledge or experience gained. For a stable consumption pattern, the QoI evolution follows a sigmoid curve, as shown in Fig. 8.
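As a rough illustration, (4.1) and (4.2) can be sketched in Python as follows. The chapter does not give the weight function W, so the `weight` function below is only one possible choice with the properties described above (small for low interests, vanishing near 1, update kept strictly below 1); the durations and prices in the example are invented.

```python
def qoc(t_act, t_max, c_act, c_max, qoa):
    """Quantity of Consumption for one concept, following (4.1)."""
    tau = t_act / t_max  # fraction of the video actually watched
    c = c_act / c_max    # price paid, relative to the maximum price
    return tau * (1 + c) / 2 * qoa


def weight(qoi, eps=0.05):
    """Illustrative W(QoI); an assumption, not the chapter's function.

    It is small for low interests, vanishes near 1 and is always below
    1 - qoi, so the update in update_qoi stays strictly under 1.
    """
    return (eps + (1 - eps) * qoi) * (1 - qoi)


def update_qoi(qoi, qoc_value):
    """Consumption event-based learning step, following (4.2)."""
    return qoi + weight(qoi) * qoc_value


# "Shrek" with an Animation affiliation of 0.9, 45 of 90 minutes watched,
# and a price of 2 for a maximum price of 4 (invented figures).
shrek_qoc = qoc(45, 90, 2, 4, 0.9)

# Repeated identical consumption events, starting from a zero interest.
trajectory = [0.0]
for _ in range(60):
    trajectory.append(update_qoi(trajectory[-1], shrek_qoc))
```

With this choice of weight, the trajectory starts slowly, accelerates, and then flattens out below 1, which is the sigmoid-like behavior discussed in the text.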
The time-based decay function is used to account for the aging of profile data:
$$QoI^i_{k+1} = \mathrm{QoI\_Decay}\left(QoI^i_k, P^i_k\right). \tag{4.3}$$
This function is called with a given periodicity, indexed by $k$, which should be significantly larger than the average time interval between consumption events. For example, it can be called on a monthly or quarterly basis. In order to decide how to decrease a given QoI, this function can take into account parameters, $P^i_k$, such as the frequency or the recency of consumption events on that semantic concept. So, depending on the consumption frequency, the QoI can be diminished linearly, exponentially, or without decay for a fixed period of non-consumption followed by a decay curve. As an example, Fig. 8 illustrates a QoI evolution driven both by (4.2) and (4.3). The consumption event-based QoI update (4.2) is driven by a relatively stable consumption sample on a given semantic concept: a QoC sequence with on average 2 weekly occurrences during 160 days and QoC values randomly selected in the interval [0, 0.5]. Due to the cumulative nature of the update function (4.2), the QoI approaches its upper limit of 1 after about 30 days of such a stable but moderate consumption. It follows a non-linear sigmoid-like evolution path that reflects the nature of the variable weight in (4.2). The time-based decay (4.3) is calculated on the basis of the consumption frequency (the number of days per month when consumption on a given semantic concept is registered). The decay function is applied every 3 weeks. The higher the consumption frequency, the smaller the decay applied to the QoI. Note that the decay function causes small step-wise perturbations on the QoI evolution curve without having a significant impact, except when the consumption frequency is constantly diminishing (i.e. beyond the first 160 days).
[Plot: QoI, Freq and QoC series (vertical axis from 0 to 1) against time in days (horizontal axis from 0 to 200).]
Fig. 8 A sigmoid QoI evolution curve given a temporal sequence of QoC.
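The frequency-driven decay of (4.3) can be sketched in the same spirit. The chapter does not specify the decay function, so the multiplicative form and the `max_decay` value below are assumptions chosen only to reproduce the stated behavior: the higher the consumption frequency, the smaller the decay.

```python
def qoi_decay(qoi, active_days, days_in_period=30, max_decay=0.2):
    """Illustrative QoI_Decay for (4.3); the form is an assumption.

    active_days plays the role of the parameter P: the number of days in
    the last period on which the concept was consumed.
    """
    frequency = min(active_days, days_in_period) / days_in_period
    # No decay at full frequency, a reduction by max_decay at zero frequency.
    return qoi * (1 - max_decay * (1 - frequency))


# The same interest value after a period of daily, occasional and no consumption.
daily = qoi_decay(0.8, active_days=30)
occasional = qoi_decay(0.8, active_days=10)
idle = qoi_decay(0.8, active_days=0)
```

Calling such a function periodically (every 3 weeks in the example of Fig. 8) leaves a frequently consumed concept almost untouched and lets an abandoned one fade step by step.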
4.2.3 Profile Query

The profile query interface enables different personalization systems (e.g. targeted advertisement, content recommender, community-based applications) by giving them access to user profiles through Web Services and by providing some reusable profile exploitation tools (distance computation, access to different views, etc.). In the framework of the project in which the work reported here was carried out, the profile query interface is in charge of mapping the user profile data into the User Ontology. Upon a request from the recommendation service, the query interface provides the user’s profile in the form of ontology instances expressed in OWL, i.e. UO instances in the integrated system. The QoI values are mapped into the interestLevel attribute of the uo:Interest class.
4.3 Privacy Enhancement and Explicit Profiling

The privacy of the end-user is a fundamental element to take into account when designing a personalization system that heavily relies on knowledge of the user’s profile. First, the service provider must ensure compliance with the legal privacy rules in each country where the solution is deployed. User acceptance is also a major issue, as user profiling can easily be perceived by end-users as a threat and an intrusion into their private life. Furthermore, the user should be provided with a comprehensive interface for setting his privacy options. Our approach encompasses two aspects of user privacy protection:
1. Configuration of the intrusiveness level in the profiling process, i.e. which usage sources can be used and what kind of processing is allowed.
2. Access control of user profile data, i.e. which personalized applications can access (part of) the profile data and under which circumstances.
Privacy is handled by using high-level privacy policy rules. A part of these rules is introduced by the service provider in order to define its global profiling and personalization strategy in conformance with the legislation in force. For example, among the requirements put forward by the European Union [17] there are three core principles:
• Transparency: The user has the right to be informed about the purpose of the processing of his personal data, the recipients of the data, and all other information required to ensure the processing is fair.
• Legitimate purpose: Personal data can only be processed for specified, explicit and legitimate purposes and may not be processed further in a way incompatible with those purposes.
• Proportionality: Personal data may be processed only insofar as it is adequate, relevant and not excessive in relation to the purposes for which it is collected and/or further processed.
The remaining part of the privacy policy rules is introduced by each user in order to tune his personal privacy preferences. For example, the user can specify the
types of services (mobile video streaming or broadcast, web browsing, IPTV, etc.) and the types of usage traces (viewing history, payments, interactivity, ESG navigation, etc.) that can be used for his profiling. The user can also identify the temporal and geographical contexts in which the profiling is activated (period of the day, location). In that sense, filtering traces directly at the level of the user’s terminal brings additional protection to the user by avoiding the transmission of personal data over the network. On the other hand, with respect to the access to his profile data, a user can define variable restrictions depending on the type of personalized application. For example, if a user does not want to receive targeted advertisements based on some of his video interests, this part of the profile must be hidden from the targeted advertisement applications, but can remain available to a recommender system. The restrictions can also concern the granularity of the information made available in a given domain of user interests (e.g. the maximal visible depth in the taxonomy tree). The graphical user interface for privacy management includes a multi-level opt-in mechanism, where at his first connection the user is asked to opt in for each category of personalized applications (personalized EPG, VoD recommender, targeted ads, etc.). Then, the user is offered the possibility to configure more detailed privacy options if he wishes to do so. As an important part of the user interface for privacy management, we also include the explicit profiling feature: read and write access for the user to his profile data. It allows not only initializing the system, but also rectifying the learned profile. In the latter case, the profiling process continues with the new current profile modified explicitly by the user, in the same way as in the initialization phase.
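The per-application access restrictions described in this section can be pictured with a short Python sketch. The `AccessPolicy` structure, its field names and the sample profile are all hypothetical; the chapter expresses such restrictions as high-level privacy policy rules whose format is not detailed here.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical per-application access rule set by the user."""
    hidden_roots: set = field(default_factory=set)  # interest domains to hide
    max_depth: int = 99                             # maximal visible taxonomy depth


def visible_profile(profile, policy):
    """profile maps a category path, e.g. ('Movie', 'Comedy'), to a QoI."""
    return {
        path: qoi
        for path, qoi in profile.items()
        if path[0] not in policy.hidden_roots and len(path) <= policy.max_depth
    }


profile = {
    ("Movie",): 0.7,
    ("Movie", "Comedy"): 0.9,
    ("Sport",): 0.5,
    ("Sport", "Fighting sport"): 0.4,
}
policies = {
    "recommender": AccessPolicy(),
    "targeted_ad": AccessPolicy(hidden_roots={"Movie"}, max_depth=1),
}
ad_view = visible_profile(profile, policies["targeted_ad"])
recommender_view = visible_profile(profile, policies["recommender"])
```

Here the targeted advertisement application only sees the top-level Sport interest, while the recommender keeps access to the whole profile.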
5 Recommending Service

The recommending service is called, from the client terminal, by a mobile ESG application. Given a client identifier, the corresponding user profile instantiated from UO is retrieved from the profiling service through a web service call. The matchmaker module then computes a matching value for each content referred to in the ESG program. This is done by comparing each content description, instantiated from CTO, with the user profile. Profiles containing little information are sufficient to get a recommendation, but obviously the more complete a profile, the more accurate the proposed contents will be. A content can match a user in two ways: when the content matches some properties of the user profile (i.e. of the user himself), or when it matches one of the user’s interests or non-interests. The processing of content descriptions is performed in two steps: first, an inference phase using a set of rules, resulting in the filtering of unwanted content and in an inferred fact base that will be used in the second phase; second, a matchmaking phase, during which matching levels corresponding to the user’s interest level for the contents are computed. These two steps are detailed in the next two sections.
5.1 Inference Rules

Associated with the set of ontologies is a set of inference rules that allow links to be deduced between concepts used in profiles. Rules are classified in three types: a) inference rules, which define the behavior linked to concepts defined in ontologies; b) filtering rules, which exclude some contents from the processing; and c) interest creation rules, which build new interests from the user profile. The first ones are necessary complements to ontologies and are an integral part of them. The others are specific to the applicative context and are exploited during the personalization process. The following rule, expressed using the syntax of the Jena rule language, is an illustration of the first kind of rule. It formalizes the fact that a user who is able to speak and has a specific mother tongue is able to speak the corresponding language. Note that the “built-in” mechanism of the Jena API is used to call functions in rules.

    [speakAbility:
        (?user uo:nativeLanguage ?language),
        (?user uo:isAbleTo ?talkAbility),
        (?talkAbility rdf:type muo:Talk),
        (?talkAbility uo:level ?talkLevel),
        greaterThan(?talkLevel, 0.5),
        uriConcat(?user, '_speak_', ?language, ?speakAbility)
        ->
        (?user uo:isAbleTo ?speakAbility),
        (?speakAbility rdf:type uo:Speak),
        (?speakAbility uo:language ?language),
        (?speakAbility uo:level ?talkLevel)]
The next rule example is a filtering rule, which pre-filters contents according to the user’s age and the parental guidance associated with these contents:

    [parentalGuidance:
        (?user uo:age ?userAge),
        (?contentDesc rdf:type tvao:BasicContentDescription),
        (?content cto:isDescribedBy ?contentDesc),
        (?contentDesc tvao:parentalGuidance ?parentalGuidance),
        (?parentalGuidance tvao:minimumAge ?minimumAge),
        lessThan(?userAge, ?minimumAge)
        ->
        (?user, uo:wontBeInterestedIn, ?content)]
In this rule, a new fact is created using the uo:wontBeInterestedIn property, specifying that the content has to be removed from the list to process before the matchmaking phase. In this approach, the contents to be filtered are annotated before actually being removed all together in one step. Another possible method would be to use the reactive rule principle by calling a built-in function (see [11]). This option has been set aside since it is more time-consuming. In order to illustrate the last type, the following rule creates an interest in the “music” category when the user has indicated in his profile that the “musician” activity is part of his leisure:
    [musician:
        (?musician rdf:type uo:Musician),
        (?musician uo:isLeisureActivityOf ?user),
        uriConcat(?user, '_music_interest_generated', ?musicInterest)
        ->
        (?user uo:isInterestedIn ?musicInterest),
        (?musicInterest rdf:type uo:Interest),
        (?musicInterest uo:hasSubject muo:music),
        (?musicInterest uo:interestLevel 0.5)]
In this case, an initial interest level is attributed to the new interest. It is then updated depending on the user consumption by the profiling service. All these rules are exploited during several phases of the recommendation process, as shown in Fig. 9.
Fig. 9 Recommendation Process
First, inference rules are applied to the user profile to deduce a set of facts depending only on this profile. This phase is performed only when the user has modified his profile or when ontologies have changed. When the user asks for a recommendation, a new inference is launched on the set of contents to process, based on filtering and interest creation rules. This phase generates a set of pre-filtered contents, as well as a set of interests extending the user profile. In a last phase, matching levels are computed in order to finally provide the resulting recommendations.
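The last two phases can be mimicked in plain Python, without any rule engine, to make the data flow explicit. The dictionaries, the `naive_match` placeholder and the choice of keeping the best matching interest per content are assumptions made for this sketch; the actual system runs Jena rules over OWL instances, and its matching functions are described in Section 5.2.

```python
def prefilter(user, contents):
    """Annotate, then drop in one step, contents the user is too young for
    (mirrors the parentalGuidance rule)."""
    wont_be_interested_in = {
        c["id"] for c in contents if user["age"] < c.get("minimum_age", 0)
    }
    return [c for c in contents if c["id"] not in wont_be_interested_in]


def create_interests(user):
    """Extend the declared interests with generated ones
    (mirrors the musician rule, initial level 0.5)."""
    interests = list(user["interests"])
    if "musician" in user["leisure"]:
        interests.append({"subject": "music", "level": 0.5})
    return interests


def naive_match(interest, content):
    """Placeholder for the matchmaking phase: the interest level if the
    subject is one of the content's categories, else 0."""
    return interest["level"] if interest["subject"] in content["categories"] else 0.0


def recommend(user, contents, match=naive_match):
    interests = create_interests(user)
    scored = [
        # Assumption: keep the best matching interest for each content.
        (c["id"], max((match(i, c) for i in interests), default=0.0))
        for c in prefilter(user, contents)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


user = {"age": 12, "leisure": ["musician"], "interests": []}
contents = [
    {"id": "cartoon", "categories": ["movie"]},
    {"id": "horror", "categories": ["movie"], "minimum_age": 16},
    {"id": "concert", "categories": ["music"]},
]
ranking = recommend(user, contents)
```

For this 12-year-old user, the content with a minimum age of 16 is removed before matching, and the generated music interest puts the concert at the top of the ranking.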
5.2 Matchmaking Approach The second step in the recommendation process consists in performing matchmaking between the user’s (non-)interests and content descriptions. For an interest I of [0, 1] between I and a content C as: a user U, we express the matching MI
∈
M ctx nb ( S ( I )) MI ( I , C ) = α ( I ) ∑ M SI (S ( I ) i , C ) , nb( S ( I )) i =1
∈
(5.1)
∈
where the interest level of I for U is written α(I) [0, 1]. Mctx [0, 1] is a matching function for contexts comparing the validity context of the interest I,
40
Y. Naudet et al.
$ctx(I)$, and $ctx(t)$, the context in which $U$ is located at a given time $t$. $\alpha(I)$ is specified in an instance of the uo:Interest class with the interestLevel property. $M_{ctx}$ can be a binary function or consider context’s proximity depending on the application domain. It is not used for the moment in the mobile TV application. $S(I)$ denotes a subject of $I$, as defined in the user ontology UO. As we consider all subjects with an equal importance, $M_I$ is calculated based on the average of matchings $M_{SI}(S(I)_i, C) \in [0, 1]$. According to UO, an interest subject can be either a category $S(I) = cat_I$, a content $S(I) = cont_I$, or any resource $S(I) = res_I$ in the RDF sense.
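The averaging in Eq. (5.1) can be illustrated with a minimal Python sketch. The function name `interest_matching`, its parameters and the toy scores are assumptions made for illustration only; the subject-level matching $M_{SI}$ is passed in as a callable, and returning 0 for an interest without subjects is likewise an assumption, since Eq. (5.1) is undefined in that case.

```python
from typing import Callable, Hashable, Sequence


def interest_matching(
    alpha: float,
    subjects: Sequence[Hashable],
    content: object,
    m_si: Callable[[Hashable, object], float],
    m_ctx: float = 1.0,
) -> float:
    """Eq. (5.1): alpha(I) * M_ctx * average of M_SI(S(I)_i, C) over all subjects."""
    if not subjects:
        # Assumption: an interest without subjects cannot match anything.
        return 0.0
    mean = sum(m_si(s, content) for s in subjects) / len(subjects)
    return alpha * m_ctx * mean


# Hypothetical subject-level scores for one content item.
toy_scores = {"jazz": 0.8, "wrestling": 0.4}
value = interest_matching(
    alpha=0.5,
    subjects=["jazz", "wrestling"],
    content=None,
    m_si=lambda subject, _content: toy_scores[subject],
)
print(round(value, 3))
```

With $M_{ctx}$ left at 1 (the text notes that context matching is not yet used in the mobile TV application), the result is simply the interest level times the mean subject matching.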
5.2.1 Categories Matching

For categories, $M_{cat} = M_{SI}(cat_I, C)$ is a function of $cat_I$ and of all the categories $cat(C)$ of $C$. Let $m_{cat} = [m_{cat_1}, ..., m_{cat_n}]$ be a vectorial function whose elements are the individual matchings $m_{cat_i} = m_{cat}(cat_I, cat(C)_i)$ between $cat_I$ and a category $cat(C)_i$ of $C$, $n$ being their total number. We can write $M_{cat} = f(m_{cat})$. Different parameters influence the computation of $M_{cat}$: the maximum individual matching $m_{cat_i}$, the mean and variance of $m_{cat}$, and the number of categories $n$. We have chosen to consider the maximum, increased by a delta proportional to the mean, which gives coherent results according to our tests. We call this function maxM, which we will reuse several times; it is defined for a vector $x$ of dimension $n$ as:

$$maxM(x) = x_{max} + \frac{1 - x_{max}}{n - 1} \sum_{i=1,\; x_i \neq x_{max}}^{n} x_i, \tag{5.2}$$

where $x_{max} = \max_{i=1}^{n}(x_i)$. For category matching, we have:

$$M_{cat} = \mu_{cat} \times maxM(m_{cat}), \tag{5.3}$$
with $m_{cat_i} = \varphi_i\, Sim(cat_I, cat(C)_i)$. $\mu_{cat}$ is a tuning coefficient that weights the importance of categories matching in the global matching computation. According to the content ontology, a content can be linked to a category in two ways. For its main category, $\varphi = 1$, while for partial categories $\varphi$ is the value specified by the proportion property. Sim is a generic similarity function for categories which is detailed below.

The matching between two categories is calculated based on the similarity of their direct super-categories and on the levels of these two categories in a category taxonomy. The matching function $sc(c_1, c_2, P^i_{c_1}, P^j_{c_2})$ computes the similarity between categories $c_1$ and $c_2$ considering one of their super-categories $P^i_{c_1}$ and $P^j_{c_2}$. It is a recursive function defined as:

$$sc(c_1, c_2, P^i_{c_1}, P^j_{c_2}) = \lambda \times Sim(P^i_{c_1}, P^j_{c_2}) \times \min(lev_{c_1}, lev_{c_2}), \tag{5.4}$$

with $lev_{c_i}$ the level of $c_i$ (for instance, category “Sport” is of level 2, and category “Fighting sport” is of level 3), and $\lambda = \frac{maxlev - 1}{2\, maxlev}$, with $maxlev$ the maximum level a category can have, i.e. the depth of the category taxonomy. Let

$$M_{sc}(c_1, c_2) = (msc_{i,j}) = \left(sc(c_1, c_2, P^i_{c_1}, P^j_{c_2})\right), \quad 1 \le i \le m,\ 1 \le j \le n$$

be the matrix of functions sc for $c_1$ and $c_2$, $m$ and $n$ being the number of parents of $c_1$ and $c_2$ respectively. Let $scmax(c_1, c_2)$ be the vector:

$$scmax = \begin{cases} \left(\max_{i=1}^{m}(msc_{i,j})\right)_{1 \le j \le n} & \text{if } n \le m \\ \left(\max_{j=1}^{n}(msc_{i,j})\right)_{1 \le i \le m} & \text{otherwise.} \end{cases} \tag{5.5}$$

The similarity level between two categories $c_1$ and $c_2$ is given by $Sim(c_1, c_2) = maxM(scmax(c_1, c_2))$, which finally gives:

$$m_{cat_i} = \varphi_i \times maxM(scmax(cat_I, cat(C)_i)), \tag{5.6}$$
the final matching value between $cat_I$ and $cat(C)_i$ being obtained by calculating $M_{cat}$ as defined previously.

5.2.2 Resource and Content Matching

For resources, $M_{res} = M_{SI}(res_I, C)$ is a function of $res_I$ and of all resources $res(C)$ describing $C$. This similarity function represents proximity in the domain the resources belong to: how similar are two entities of the same kind? As content descriptions are all stored in the same knowledge base, RDF triples concerning each content must first be retrieved in order to be compared with $res_I$. We define $res^C = (res(C)_i)_{1 \le i \le n}$, the vector of resources describing $C$ and relevant for the matchmaking, as the set of all resources $r$ verifying:

$$\forall r \in res^C,\quad (r = c) \lor \exists t = (s, p, o) \in KB : (s \in res^C) \land (o = r) \land (p \in DOM), \tag{5.7}$$

where $t$ is an RDF triple with subject $s$, predicate $p$ and object $o$, belonging to the knowledge base KB containing the content descriptions; $c$ denotes the instance of content $C$ in KB and DOM is the set of properties defined in the application domain (here, in our set of ontologies). The matching function for resources $M_{res} \in [0, 1]$ is then defined as:

$$M_{res} = \mu_{res} \times \max_{i=1}^{n}\left(m_{res}(res_I, res^C_i)\right), \tag{5.8}$$

where $\mu_{res}$ is a tuning coefficient defining the importance attached to resource matching and $m_{res} \in [0, 1]$ is a recursive function computing the similarity between two RDF resources, defined as:
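$$m_{res}(res_1, res_2) = \begin{cases} 1 & \text{if } res_1 = res_2 \\ 0 & \text{if } res_1 \neq res_2 \land res_1 \in L \\ \dfrac{1}{\dim P(res_1)} \sum\limits_{\substack{p \in P(res_1),\ o_1, o_2 :\\ (res_1, p, o_1) \in KB \,\land\, (res_2, p, o_2) \in KB}} m_{res}(o_1, o_2) & \text{otherwise,} \end{cases} \tag{5.9}$$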
where $P(res)$ is the set of properties $p \in DOM$ such that $\exists (res, p, o) \in KB$, and $L$ is the set of literals. The “=” operator on RDF resources is defined as follows. If both resources are literals, the comparison is performed on values, including conversion if types are different. For instance, "5.0"^^ will be equal to "5"^^http://www.w3.org/2001/XMLSchema#string. Otherwise, URIs are directly compared.

When the subject of interest is a content, the matchmaking comes down to comparing two instances of the class cto:Content, which is achieved using the same approach as for resources. The function $M_{cont}$ is written as a specialization of $M_{res}$:
$$M_{cont} = M_{SI}(cont_I, C) = M_{res}(cont_I, C), \tag{5.10}$$
$cont_I$ being the content that is a subject of $I$. By default, the weighting coefficient is the same, $\mu_{cont} = \mu_{res}$, but it nevertheless remains independent.

5.2.3 Global Matching

For all contents having gone through the filtering step, the global matching value $M(U, C) \in [0, 1]$ of a content $C$ for a user $U$ is computed over the set of all interests and non-interests expressed in the user profile, as:

$$M(U, C) = \Gamma(M_I(U, C), M_{NI}(U, C)), \tag{5.11}$$
where MI = [MI1, ...,MIl] and MNI = [MNI1, ...,MNIm] are vectors containing matching values for respectively interests and non-interests calculated according to the formula for MI(I,C) defined in the preceding sections. The function Γ is chosen according to the desired behavior of the recommender. A default choice will be proposed to the user, but he will always be given the possibility to modify this choice in order to better reflect his expectations. We have identified two borderline cases that we take as a basis to determine the function Γ: a) the number of non-interests (resp. interests) exceeds the number of interests (resp. non-interests), whereas the average matching level of interests (resp. non-interests) is greater; b) the average matching level of interests and non-interests are equivalent, leading to an average matching level near zero.
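Before looking at concrete choices for Γ, the aggregation that produces the category matching values of Section 5.2.1 can be made concrete. The following Python sketch covers Eq. (5.2) and Eq. (5.5) only; the function names and the toy matrices are illustrative assumptions. The recursion of Eq. (5.4) is not implemented, because its base case is not given in the text: the matrix of sc values is simply taken as input. Returning the single value unchanged when n = 1 is also an assumption, since Eq. (5.2) divides by n − 1.

```python
from typing import List, Sequence


def max_m(x: Sequence[float]) -> float:
    """Eq. (5.2): the maximum, increased by a share of the non-maximal elements."""
    n = len(x)
    if n == 0:
        raise ValueError("max_m needs at least one value")
    x_max = max(x)
    if n == 1:
        # Assumption: Eq. (5.2) is undefined for n = 1, so return the value as is.
        return x_max
    rest = sum(v for v in x if v != x_max)
    return x_max + (1.0 - x_max) / (n - 1) * rest


def sc_max(msc: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (5.5) for a matrix with m rows and n columns.

    Column maxima when n <= m, row maxima otherwise, so the result
    always has min(m, n) elements.
    """
    m, n = len(msc), len(msc[0])
    if n <= m:
        return [max(msc[i][j] for i in range(m)) for j in range(n)]
    return [max(row) for row in msc]


def category_similarity(msc: Sequence[Sequence[float]]) -> float:
    """Sim(c1, c2) = maxM(scmax(c1, c2)), given precomputed sc values."""
    return max_m(sc_max(msc))


# Toy sc matrix: c1 has three parents (rows), c2 has two parents (columns).
toy_msc = [[0.2, 0.6], [0.4, 0.1], [0.3, 0.5]]
print(sc_max(toy_msc))
print(round(category_similarity(toy_msc), 2))
```

For the toy matrix, the column maxima are 0.4 and 0.6, so maxM gives 0.6 + (1 − 0.6) × 0.4 = 0.76: the best parent pairing dominates, and the second pairing only adds a bounded bonus that can never push the result above 1.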
If we consider that the average matching level must prevail over the number of interests (resp. non-interests), we may use the following function:
$$\Gamma_1 = \frac{1}{l}\sum_{i=1}^{l} MI_i(U, C) - \frac{1}{m}\sum_{j=1}^{m} MNI_j(U, C). \tag{5.12}$$
On the other hand, if we consider that the number of interests is important, the following function will be more appropriate:
$$\Gamma_2 = \frac{1}{l + m}\left(\sum_{i=1}^{l} MI_i(U, C) - \sum_{j=1}^{m} MNI_j(U, C)\right). \tag{5.13}$$
Choosing between Γ1 and Γ2 determines the behavior of the recommender in the first case (case a). If matching levels of zero are to be avoided when interests and non-interests have been defined (case b), Γ2 may be used. In order to obtain a more representative result, we may also give priority to the interest or non-interest having the maximum matching level, which leads to the following function:
$$\Gamma_3 = \frac{1}{\beta + \gamma}\left(\beta \times dmax(U, C) + \gamma \times \Gamma_2(U, C)\right), \tag{5.14}$$

where $dmax(U, C) = \max M_I(U, C) - \max M_{NI}(U, C)$, and $\beta$, $\gamma$ are two empirically chosen coefficients.
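The three candidate functions of Eqs. (5.12) to (5.14) can be compared on a small example. The Python sketch below is illustrative only: the function names, the toy matching vectors and the values β = γ = 1 are assumptions (the text only states that β and γ are chosen empirically), as is treating an empty vector as contributing a mean and maximum of 0.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    # Assumption: an empty vector of matchings contributes 0.
    return sum(values) / len(values) if values else 0.0


def _max(values: Sequence[float]) -> float:
    # Assumption: an empty vector of matchings contributes 0.
    return max(values) if values else 0.0


def gamma1(mi: Sequence[float], mni: Sequence[float]) -> float:
    """Eq. (5.12): difference of the average matching levels."""
    return _mean(mi) - _mean(mni)


def gamma2(mi: Sequence[float], mni: Sequence[float]) -> float:
    """Eq. (5.13): difference of the sums, normalised by l + m."""
    total = len(mi) + len(mni)
    return (sum(mi) - sum(mni)) / total if total else 0.0


def gamma3(mi: Sequence[float], mni: Sequence[float],
           beta: float = 1.0, gamma: float = 1.0) -> float:
    """Eq. (5.14): weighted mix of dmax and Gamma_2."""
    dmax = _max(mi) - _max(mni)
    return (beta * dmax + gamma * gamma2(mi, mni)) / (beta + gamma)


# Toy case b): three interests and one non-interest with equal averages.
mi_toy = [0.6, 0.2, 0.1]
mni_toy = [0.3]
print(round(gamma1(mi_toy, mni_toy), 3))
print(round(gamma2(mi_toy, mni_toy), 3))
print(round(gamma3(mi_toy, mni_toy), 3))
```

In this toy case the averages cancel, so Γ1 is approximately 0, whereas Γ2 stays positive (about 0.15) because interests outnumber non-interests, and Γ3 is higher still (about 0.225) because the best-matching interest outweighs the best-matching non-interest.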
6 Experimentation

The integrated system has been successfully demonstrated during the final review of the MOVIES project. We present in this section two scenarios that have been tested, and finally discuss the global efficiency of the approach.
6.1 Illustrative Examples

Different use-cases have been tested and have helped to enhance the ontologies as well as the whole profiling and recommending process. One of them is the case of Billy (Fig. 10), a 20-year-old mainly interested in adventure and action films; in addition he has some other less important interests (e.g. rock). Each time he watches a new movie, his profile evolves accordingly. Fig. 10 shows the profile evolution from the beginning of the observation (t=n). In our scenario, Billy first watches (t=n+1) the James Bond film “Die Another Day”, characterized by the action, adventure and automobile racing categories; then (t=n+2), “The Fast and the Furious”, characterized by the crime, action and automobile racing categories. These two consumptions reinforce the action category that was already present in the profile. Additionally, the repeated consumption of the automobile racing category brings this new interest into Billy’s profile.
Fig. 10 Use Case 1. Profile evolution (selected categories): interest level (0 to 1) against consumption history (n, n+1, n+2).

Category      n      n+1    n+2
Action        0.75   0.81   0.83
Adventure     0.75   0.81   0.81
Rock          0.08   0.08   0.08
AutoRacing    0      0.15   0.3
Crime         0      0      0.1
When Billy opens the ESG on his mobile phone and asks for a recommendation, the system suggests a motor sport related documentary: a report on stock cars that is currently playing on one of the channels.

Another use-case shows a more elaborate situation for the recommendation service. According to his profile data, John is a fan of wrestling and jazz, and dislikes westerns. As a jazz fan, he particularly appreciates Charlie Parker. Strangely, he also likes films directed by Clint Eastwood, except westerns. The modelling of these interests is partly illustrated in Fig. 11.

Fig. 11 Use Case 2.

To better illustrate the behavior of the recommender, we consider a video-on-demand case, where lots of videos are available. However, the behavior and results would have been similar for the same contents proposed in an ESG. In the content database, two westerns are present among others: “Unforgiven” and “Il buono, il brutto, il cattivo”. Except for the language, which would have been a discriminating factor, these are given a low matching value because of the user’s non-interest. For the second one, Clint Eastwood is an actor in the film, not its director. Thus the movie is still considered not relevant. The interesting case is that of “Million Dollar Baby”, which is directed by Clint Eastwood and is about boxing. The first property directly corresponds to one of the user’s interests. The second one corresponds to a category which is close to wrestling in our category taxonomy: they have the same parent category (fightingSport). Hence, the movie’s matching score is increased. Another interesting case is that of “Bird”, which is related to jazz (with Charlie Parker) and is also directed by Clint Eastwood. Because it corresponds to multiple interests of the user, this movie is logically rated better than the others. In this scenario, we have demonstrated the use of interests and non-interests, the matching for categories (Jazz, Wrestling, and Western) and the matching for resources (Charly Parker_Jazzman and Clint Eastwood_Director). Table 1 shows the results obtained with the different functions Γ presented in the previous section. The results are coherent with the user’s interests.

Table 1 Sample matching values obtained with the different Γ functions.

Content                            Γ1      Γ2      Γ3
Million Dollar Baby                0.49    0.32    0.61
Letters from Iwo Jima              0.35    0.23    0.58
Bird                               0.35    0.23    0.58
Mystic River                       0.35    0.23    0.58
Unforgiven                         -0.05   0.1     0.25
Il buono, il brutto, il cattivo    -0.45   -0.22   -0.39

The case of “Unforgiven” illustrates the difference between the three functions. Indeed, it is a western directed by Clint Eastwood, which corresponds to both an interest and a non-interest in John’s profile. In this case, Γ1 gives a value near 0 as the total weight of the considered interests and that of the non-interests balance each other; Γ2 gives a positive value because the number of interests is greater (the movie is also categorized in “drama”); and Γ3 gives a better value as it considers both the number and the maximal matching values. Finally, if the user chooses to follow the system recommendation and consumes the first movie, “Million Dollar Baby”, it will reinforce the concept Clint Eastwood Director and create new concepts such as boxing, Clint Eastwood Actor, etc.
6.2 Efficiency

The efficiency of the recommendations has so far been measured, for interests targeting categories, on a test set made up of contents referenced in IMDb, completed by fake content descriptions to obtain a good balance between movies and other kinds of contents. In this case, the obtained precision/recall values are almost perfect. Some false negatives appear when the same subcategory appears in different independent category hierarchies. At the level of categories, the relevance of filtering only depends on the structure of the category taxonomy, which must therefore be carefully designed. The influence of multiple and potentially conflicting interests is hard to quantify and has not been measured yet. Finally, since we do not currently consider approximate matchmaking, the matching with content or resource related interests, as well as with the contextual data, solely relies on the presence of corresponding statements in the content description and is thus only linked to the efficiency of the inference. To validate the efficiency of the profiling service (i.e. the relevance of learned user interests), a more sophisticated usage database is needed, comprising a measurement of the user consumption intensity. Such a study is underway with consumption data provided by BARB (Broadcasters Audience Research Board Ltd.) for TV consumers.

7 Conclusion and Perspectives

We have been able to empirically demonstrate the potential of a complete profiling and recommending system based on incremental profile learning and semantic web technologies, for mobile TV. This system has been tested both in the case of VoD and broadcast audiovisual content, on devices able to use 3G, WiFi and DVB-H. The implementation is modular and web-based, thus avoiding some limitations inherent to mobile devices. The recommendation system we have experimented with can still be enhanced. Results are very promising when considering user profiles concerning categories and also any ontology concept describing content or any entity. The system is currently being extended in a research project to other application cases, exploiting e.g. additional linked content, targeted advertisements, or communities of users. In further research work, we intend to exploit the context-awareness capabilities of our ontologies, which we did not discuss here. This will indeed be particularly important in mobile environments. From the profiling service perspective, there are several dimensions for extension of the current solution: elaboration of adaptive decay functions taking into account consumption patterns for each content category, learning of non-interests and their integration within the incremental profiling process, multi-scale profile evolution approaches differentiating long-term and short-term interests, extension of the current approach taking into account the statistical correlations between different semantic concepts, etc. Finally, a major next step currently being studied is the learning of user community profiles for enabling peer-to-peer content sharing or community-based applications on mobiles.
The USHER System to Generate Semantic Personalised Maps for Travellers

Zekeng Liang, Kraisak Kesorn, and Stefan Poslad
Abstract. Map applications based upon Geospatial Information Systems (GIS) are seen as a key application area for mobile users, e.g., to enable travellers and mobile assets to be located and tracked, with respect to spatial views, or maps, of destinations and routes. However, current GIS map services tend to lack support for personalisation: to enable users to set preferences based on their context and user profiles; to customise searching and selecting content; to mark up maps in-situ, forming a personalised spatial memory. For example, current services cannot store spatial short-cuts, good parking spaces, etc., which have been discovered in-situ, in the physical world. These GIS map services also tend to lack a provision to enable such tagged personal spaces to be used within shared social spaces, i.e., to share spatial memories. An ongoing spatial-aware framework called USHER (Ucommerce Services HEre for Roamers) has been extended to semantically adapt and personalise maps, and has been tested. The contributions of this framework are: an ontology-based representation of dynamic user preferences interlinked to a domain model that is able to detect shifts in user interests; the creation of sharable user markup data governed by an access control matrix; the generation of personalised annotated GIS maps.
Zekeng Liang, Kraisak Kesorn, and Stefan Poslad: School of Electronic Engineering and Computer Science, Queen Mary, University of London, UK, e-mail: {zekeng.liang,kraisak.kesorn}@elec.qmul.ac.uk, [email protected]
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 49–71. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

Spatial-Aware Map Services (SAMS) enable business, everyday and leisure travellers to relate a location to a spatial context and to a spatial view of that context. Spatial contexts include specific services, buildings, persons at a location, and a location in relation to other locations, e.g., destinations, or routes and regions. Spatial contexts are defined with respect to a specific spatial view of a physical environment space (the map), which is normally a direct ‘overhead’ view of a region. Typical components of SAMS are wireless smart mobile devices which enable travellers to seamlessly access spatial information services, anytime and anywhere; maps which can be pre-cached or accessed on demand; location sensors such as a satellite-based Global Positioning System (GPS); and a Geographic Information System (GIS). A GIS defines and organises spatial objects at varying
layers of spatial abstraction, enabling GIS applications to query and select spatial objects, then to build customised spatial views that relate to particular applications and user tasks. SAMS applications such as vehicle SatNav systems offer maps that are location-aware, i.e., are centred around the current location and show locations with reference to routes and to a destination. Although these SAMS are location-aware, they offer only very limited user-awareness, e.g., preferences for route constraints such as fastest route, avoiding main roads etc. SAMS that are not user-aware must either provide lowest-common-denominator spatial contexts and views, e.g., positions along road routes, or combine many spatial views, e.g., positions along roads in relation to main tourist sights, business-driven building annotation, etc. These approaches either crowd in too much information, much of which is unneeded, which is a particular problem for low-resource devices, or they may omit useful content because they adopt a lowest-common-denominator approach to select content. In contrast, user-context-aware SAMS can adapt maps to the traveller, e.g., content about footbridges for crossing over main roads can be included for pedestrians whereas it can be excluded for motorists. Context-aware systems tend in practice to orientate system outputs to the current context, e.g., mode of travel, what the user's travel activities are etc. More advanced user-context-aware systems may refer to the current context in relation to a goal-context [8]. In addition, these leverage the history of past user contexts in order to (partially) predict future (including goal) contexts [8]. User (context) awareness is often taken to be synonymous with personalisation. Personalisation focuses on adapting system outputs to particular interests of individual users [11], e.g., the preferences in visiting different types and instances of buildings and travelling specific routes.
A user model or profile generally refers to the use of either or both user context and user preferences. User contexts and preferences and spatial contexts can be complex, multi-valued, heterogeneous, dynamic and contradictory [8]. Increasingly, semantic representations such as ontologies are used to represent these because of their higher expressivity and precision [11] [14] [22]. SAMS can filter and adapt spatial views to user contexts and preferences, e.g., specific travellers may be interested in specific types of building by architecture or by function. Travellers may also prefer to customise the presentation of content, e.g., to include both local names of services and any translations of names relative to their home language in order to make content more understandable to them. Other preferences may be used to qualify selections or service recommendations derived from the set of all possible services. Travellers often wish to create and store their own customised spatial contexts as map annotations, e.g., good or bad routes to a particular destination, good or bad vehicle parking areas, which they directly experienced in the field. Users may wish to reuse these spatial experiences when they revisit an area. Users may also wish to share the information that they create with others and to share relevant information created by others. However, as the amount of such shared information increases, searching becomes harder. Furthermore, context awareness, and in particular personalisation and location-awareness, generates a raft of privacy concerns [20]. If
privacy issues within a context-aware service environment such as Location-Aware Services (LAS) are not properly addressed, users risk revealing their context, publicising their personal details and even compromising their safety at the current location, at the intended destination and en route. There is often a legal requirement in many countries to protect the privacy of the mobile users’ information. If users perceive that the risks of using a technology outweigh its potential benefits, they may stop using that technology. The issue of keeping personalised, mobile LAS private has to be addressed in order to exploit their full market potential. The objectives of this research are to model and develop a system based on semantic geospatial services that adapts spatial content for mobile users to users’ tasks and to users’ preferences, and to allow users to create, manage and share their own markup. The rest of the paper is organised as follows. Section 2 gives a survey of the related work including personalised map services and semantic user profiling techniques. Section 3 describes how the personalised and user-aware SAMS applications, as part of the USHER [7] system, are designed and implemented. Some results of the semantic-based personalised SAMS application demonstrator are presented in section 4. Finally, a discussion and conclusions of this research are given in sections 5 and 6 respectively.
2 Related Work

Personalisation has been proposed as a means to reduce information overload for over two decades [12]. The main motivation for personalisation for travellers is that it can act as an additional filter, alongside location, for retrieving information, reducing the information overload for travellers as it filters information according to a specific user profile rather than for all users. Personalisation is an added advantage when used in lower-resource service access devices used by travellers, as information overload can also overload a person’s mobile device. In this section, existing personalised SAMS applications, user context and personal profile acquisition techniques, and user profiling techniques are surveyed and analysed in order to identify their best practices and limitations. OpenStreetMap [5] allows registered users to create user diaries with location information, in a simple form which includes a diary topic, the content, author and creation date. This system allows users to set their profile description and home location. Freebase [2] provides a strong semantic structure for registered users to create their own types of data that can be shared on the Freebase web. However, it does not provide specific support for mobile users or for sharing and overlaying the marked-up data on a map. The GUIDE project [1] supports non-semantic direct input of user preferences. The system mainly uses the user location as user context to retrieve location-related information for users. Individual users cannot filter the map data based on their own preferences or their individual tasks. In the CRUMPET project [7], personal profiles are specified by combining a mix of persona models with direct and indirect input by the user, such as observations of where and what users choose to visit, but these models are not semantic. The AmbieSense project [3] [4] situates each user task within a use-case using case-based
reasoning and combines this with location-awareness in order to make user recommendations. RECO [6] is similar to AmbieSense but, instead of using case-based reasoning, it situates each user task within a sequence, by learning a user’s preferences over time, in order to make user recommendations. Before information services such as SAMS can be personalised, travellers’ context and profile need to be acquired. An important distinction for personal context acquisition is whether the user context or profile is directly input, indirectly gathered through user interaction, or obtained with a hybrid of the two. Most personalisation systems that can be applied to SAMS probably include at least a basic element of directly gathering input from the user. Any information entered by the user into SAMS, such as destinations and route preferences, can be used to profile users. Information entered into other external applications such as a calendar could also be input into a person modeller. Gauch et al. [14] proposed a method that automatically creates user profiles from Web-based information retrieval searches. First, a reference ontology is created automatically by spidering any of a number of online subject hierarchies. Second, Web pages that are linked within each subject are spidered and used as training data for a text classifier. Liu [5] proposed two steps to improve retrieval effectiveness. First, the system automatically deduces, for each user, a small set of categories for each query submitted by the user, based on his/her search history. Second, the system uses the set of categories to augment the query to conduct the web search. The framework relies on the user’s usage history. It lacks the capability to adapt flexibly to changes in a user’s interests and ignores short-term interests. Xu and colleagues [16] [17] proposed a novel framework for semantic annotation and personalised information retrieval.
However, users’ preferences acquired from users’ queries may be ambiguous, returning irrelevant results to the user. Alternatively, multiple sensors can be used to acquire a user’s context indirectly, but this has its own challenges in determining what can be sensed unobtrusively and in dealing with false positives where the indirectly derived user context is incorrect. There are two main ways to sense the users’ context: either smart devices that travellers carry around with them acquire the user context information, or smart environments are instrumented with sensors to sense and acquire the user context, or both. There are several main disadvantages with instrumented environments: they can be expensive to create and maintain, may be far from pervasive, and may invade travellers’ privacy by collecting information about them and passing it to third parties. Location sensors in mobile devices can be used to generate user contexts. If, for instance, a user visits a number of old churches, then he is probably interested in churches and perhaps also in other historic buildings in this town, like an old city hall [7]. The rate of movement of users could be used to indicate the mode of transport. Smart phones increasingly incorporate micro sensors such as accelerometers and gyroscopes to support 3D user gestures, and these can also be exploited to provide information about user contexts. By incorporating temporal information, Widyantoro and colleagues [13] presented a novel scheme to represent a user’s interest categories, and an adaptive algorithm to learn the dynamics of the user’s interests through positive and negative feedback, which is the main
novelty of this framework. However, user preferences are stored in matrices, so no semantic relationships between concepts are stored. To summarise, the surveyed work tends not to construct rich user models based on user and spatial contexts, user preferences, user annotations, and user goals and tasks in order to adapt map content to individual users. There are several challenges with directly using user inputs. Inputs may only be gathered very intermittently, often only before travelling or when the user is stationary en route. If travellers’ contexts are dynamic and rich, the system needs to ask or monitor the user more frequently for detailed input about their context, and this in itself can be obtrusive and can overload the user [8]. The majority of the research and applications tend to focus on sensing specific isolated actions rather than on building persistent user models and applying these to travellers. Hence, a hybrid user modelling system is needed in practice. In addition, the surveyed systems do not provide a capability for travellers to create and share knowledge by creating and sharing their spatial markup information based on semantic modelling. User profiles are not represented at an appropriate level of detail of user interests, and the methods lack support for the terminology heterogeneity problem, e.g. terms in a user’s interests may not appear in existing ontologies. Addressing these limitations is the main objective of this research.
3 Semantic Based Personalised SAMS
The architecture of the semantic-based personalised application uses an extension of the CRUMPET system called USHER [7], based upon a three-tier client-server architecture which consists of the client access device, a combined client proxy and mediator, and the service provider. The implementation of the map server is based upon a spatial extension of MySQL to store and retrieve spatial data. The client calls middleware based on the GeoTools¹ map API that supports advanced interactive map services via a client proxy, which masks some of the complexity of map retrieval and adaptation from the client device. The map demonstrator uses GIS content based on the Queen Mary, University of London (QMUL) Mile End campus and surrounding areas. These spatial services are described in subsequent sections.
3.1 System Architecture Overview
The system framework in Fig. 1 shows the main components and the data flows between them. The User Model (of Traveller) component receives a number of inputs in order to construct a user model for individual mobile users: user goals and tasks, e.g., attending a meeting at Queen Mary, University of London; user preferences, such as interests in specific types of building by architecture, either directly from user input or indirectly generated from the user queries by a User Preferences Acquisition component; user annotations, e.g., markup data which is created by an individual user in the field and can be shared with others; and current user contexts. Current User Contexts are generated by the User Context Acquisition component from user events such as user information queries, and through sensing users’ spatial-temporal context and a history of context changes. The Spatial Temporal Context Acquisition component acquires environment context from environment events, such as the user’s location, by gathering GPS data, and the user’s movement, by measuring three-axis acceleration data from the user’s mobile device. The Context Processing component supports mediation of multiple heterogeneous contexts and generates the context used to adapt specific applications. The Context Management component handles context storage, retrieval and access control. The Personalised SAMS Application component takes the processed contexts as input to generate and deliver the personalised location-aware map to the mobile user.
Fig. 1 A personalised SAMS application framework [figure: User Context Acquisition, User Preferences Acquisition, User Model (of Traveller), Context Processing (Mediation & Adaptation), Context Management (Storage & Access Control), Spatial Temporal Context Acquisition and the Personalised SAMS Application, linked by the data flows described above]
¹ GeoTools: The Open Source Java Toolkit. See http://geotools.codehaus.org/
3.2 Traveller (User) Context Acquisition and Modelling
The core part of the traveller personal context ontology model is shown in Fig. 2. Three traveller instances, the user stereotypes, are defined as follows:
1. Tourist: someone who is new to a place and wants to know more about the area.
2. Business Man: someone who is new to the area and just wants the essential map information needed to finish their work.
3. Regular user: someone who is familiar with the area and knows most of the basic map information about it. What they really want to know is what is new in the area that they do not yet know.
Fig. 2 Traveller personal context ontology model [figure: the Traveller concept with the instances Business man, Regular traveller and Tourist, linked to Travel Mode (Walking, Driving, Cycling, Jogging), Travel Activity (instances Transit and Stop; properties Posture, e.g. Walking, Sitting, Standing, and Repeat), Travel Goal (See something, Meeting someone, Gather resource, ...), Destination (instances Single-shot and Multi-shot; Type, e.g. Sight: Museum, Park; Eating Place: Pub, Cafe, Restaurant), Annotation/Markup and Preferences/Interests through the properties has Travel Mode, hasActivity, hasGoal, hasDestination, hasMarkup and hasInterests]
The Traveller ontology consists of the following main properties: Travel Mode, Travel Activity, Travel Goal, Destination, User Markup and User Preferences/Interests. Travel Mode defines how the user is travelling when they are using this map application, and has the instances walking, driving, cycling and jogging. The user’s travel mode can be classified from their movements, detected by measuring three-axis acceleration data received from the user’s mobile device. Travel Activity has the properties Posture and Repeat, and the instances Transit and Stop. Posture defines what kind of posture the user might be in, such as walking, sitting and standing. Repeat defines whether the user activity is periodic or not. Experiments to indirectly determine travel activity and posture have been carried out in other research and will be reported elsewhere; see also other projects such as [27]. Travel Goal defines the purpose of travelling, such as seeing something, meeting someone, delivering something or someone, or collecting something or someone. Destination defines a traveller’s destination, which contains the property Type, among others. User Markup/Annotation defines how individual knowledge can be stored and presented in the map; markup is sharable among users depending on the access restrictions set by the markup owner. User Markup/Annotation has the property Type, which classifies markup by kind of place. User Preferences/Interests define individual user interests.
Each of these concepts can be expanded further at a finer level of granularity. For example, the Destination concept can be expanded as follows. The Type property contains further properties such as Eating Place and Sight. Eating Place indicates places for food, with types such as pub, café and restaurant. Sight defines sightseeing places such as museums, parks and others. Destination has the instances Single-shot and Multi-shot, which represent a destination involving a single place and multiple places respectively. By defining the traveller personal context ontology model and gathering such user contexts, the system is better able to understand the user’s goal, travel mode, destination and preferences in order to generate a more personalised location-aware map for the mobile user.
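As a rough illustration of the concepts in Fig. 2, the traveller context can be mirrored in plain data structures. The sketch below is only an assumption about how such a model might be held in memory; the class and field names are hypothetical and do not come from the USHER implementation, which uses an ontology rather than Python classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stereotype(Enum):
    """The three user stereotype instances named in the text."""
    TOURIST = "Tourist"
    BUSINESS_MAN = "Business Man"
    REGULAR = "Regular"


class TravelMode(Enum):
    """Travel Mode instances named in the text."""
    WALKING = "walking"
    DRIVING = "driving"
    CYCLING = "cycling"
    JOGGING = "jogging"


@dataclass
class Destination:
    name: str
    place_type: str           # e.g. "Sight" or "Eating Place"
    multi_shot: bool = False  # True when several places are involved


@dataclass
class Traveller:
    # Hypothetical in-memory view of one traveller model instance.
    stereotype: Stereotype
    travel_mode: TravelMode
    travel_goal: str
    destination: Destination
    interests: List[str] = field(default_factory=list)
    markup_ids: List[str] = field(default_factory=list)


traveller_a = Traveller(
    stereotype=Stereotype.TOURIST,
    travel_mode=TravelMode.WALKING,
    travel_goal="See something",
    destination=Destination(name="QMUL", place_type="Sight"),
)
```

A flat structure like this loses the inference that the ontology supports; it is meant only to make the listed properties concrete.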
3.3 Ontological-Based Traveller (User) Preference Representation
A hybrid method is used for acquiring knowledge about user preferences, combining a statistical calculation with an ontological Knowledge Base (KB) model. Ontological-based user preferences are constructed from user queries and from the unstructured textual data of markup information. An external lexical reference system, WordNet [18], and a domain ontology are exploited in the process to achieve a higher degree of automation. Because some travellers’ interests (preferences) may change over time, a system which relies only on usage history might perform worse when a traveller changes his/her interests. For example, traveller A would usually choose the route that takes the shortest time to reach the destination as a regular traveller, but might prefer a scenic route as a tourist in a new area. Thus, the system should be dynamic, continuously and incrementally refining, extending and updating traveller preferences during system operation in order to cope with new facts and evidence about users’ preferences. This requirement led to the development of a learning model that distinguishes dynamic from static traveller preferences. To solve this problem, we create two types of profiles for each user to represent their preferences.
3.3.1 Dynamic User Preferences
To create dynamic user preferences, usage information is collected during a user query session. Some initial preferences for a new user are created on first use, based on the selected user stereotype instance of the user model. Ontologies enable initial user preferences to be matched with existing concepts in the domain ontology and with the relationships between these concepts. The method in this paper is based on that of Gauch et al. [14].
Whereas Gauch links a visited page onto five categories in the Open Directory Project, we link user interests to our traveller context domain ontology and to any markup descriptions associated with the markup points selected from the results returned for user queries. Building an ontological model of a user’s interests may cause inconsistencies if the domain ontology does not contain any of the words that form a given user’s preferences (the terminological problem). To solve this problem, after Natural Language Processing (NLP)² has been applied, keywords from user queries and markup descriptions are augmented by adding semantically similar or related terms. WordNet is exploited as a lexical reference system in order to find these additional related terms. The similarity between terms and concepts in the domain ontology is then computed to determine the best-matching category for the user’s preferences. The concept which has the highest similarity value is selected in order to construct the user preferences. The user preferences then consist of all concepts resulting from the previous step and are constructed based upon the domain ontology. The main advantage of this technique over existing learning algorithms is that it does not require a large training set to identify a strong pattern. This makes it suitable for modelling dynamic user preferences. Figure 3 shows the algorithm for generating user preferences. The result of this step is the initial user preferences profile. All concepts in this profile are called the user interest concepts.
1. Extract words from the user query or markup description using an NLP algorithm;
2. Get the set of keywords {K} by removing stop words and stemming;
3. Get keywords {C} from class labels in the domain ontology;
   a. For each keyword pair {Ki, Ci}
   b. Look up all word senses in WordNet;
   c. Compute the similarity between {Ki, Ci};
   d. Select the concept which has the highest similarity value of word sense.
4. Construct the user’s preferences profile;
Fig. 3 User preferences acquisition algorithm that exploits WordNet
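The steps of Fig. 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a small hand-written table (`RELATED_TERMS`, hypothetical) stands in for WordNet and the domain ontology, the similarity is a toy 0/1 score, and stemming is omitted for brevity.

```python
import re

# A short stop-word list, assumed for illustration.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "with", "and"}

# Hypothetical stand-in for WordNet plus the domain ontology:
# each ontology class label maps to a few related words.
RELATED_TERMS = {
    "Restaurant": {"restaurant", "buffet", "dinner"},
    "Cafe": {"cafe", "coffee", "tea"},
    "Museum": {"museum", "gallery", "exhibition"},
}


def extract_keywords(text):
    """Steps 1-2: split the text into lower-case words and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def similarity(keyword, concept):
    """Toy similarity: 1.0 if the keyword is related to the concept, else 0.0."""
    return 1.0 if keyword in RELATED_TERMS[concept] else 0.0


def acquire_preferences(text):
    """Steps 3-4: keep, for each keyword, the best-matching concept."""
    profile = set()
    for keyword in extract_keywords(text):
        best = max(RELATED_TERMS, key=lambda c: similarity(keyword, c))
        if similarity(keyword, best) > 0.0:  # ignore keywords with no match
            profile.add(best)
    return profile


profile = acquire_preferences(
    "Yongfa Chinese restaurant with nice buffet in Mile End"
)
```

In a real system the 0/1 score would be replaced by a graded WordNet-based similarity over word senses, as steps b to d of the figure describe.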
After creating the initial user preferences, the presented system recommends other relevant concepts to users. We hypothesise that the lower a concept is in the hierarchical user preferences, the more relevant it is to the user’s interests, and the higher a concept is, the more general it is. Therefore, the proposed system implicitly recommends instances based on the leaf nodes (LNs) of the user preferences. For example, a user accesses a shared markup point with the description ‘Yongfa Chinese restaurant with nice buffet in Mile End’. The user preferences are constructed using the domain ontology and WordNet. Hence, we can acquire the following information:
1. Eating Place ⊇ Restaurant ⊇ Chinese
2. Yongfa - Chinese Restaurant
where the connecting symbol denotes the ‘hypernym’ relationship. The hierarchical model of user preferences is depicted in Fig. 4. In this example, the LN is ‘Chinese’. The system recommends only two types of relevant concepts to users, adding them to the user preferences based on their similarity to the LN: Sibling Similarity (SS), i.e. Happy Chops and Lotus, and, for its parents, the so-called Parent Similarity (PS), e.g., Thai (restaurant concept). The degrees of similarity between a candidate concept and the LN are measured based upon a distance vector, the number of nodes and their properties. For instance, ‘Happy Chops’ and ‘Lotus’ (distance 1 from the LN) have a higher semantic relevance than ‘Thai Smile’ (distance 3 from the LN). For instances which have the same parent as the LN, the similarity is calculated from the properties of the siblings. For instance, ‘Happy Chops’ has a higher degree of similarity than ‘Lotus’ in terms of the style property, as Happy Chops is also a buffet restaurant whereas Lotus is a different style of restaurant called “DimSum”. By this technique, the presented system is able to model the taxonomy of users’ preferences profiles at a more appropriate granularity than the surveyed frameworks. The advantage of this technique is that users’ preferences can be assigned a well-defined meaning using the global domain ontology model. Ontologies consist of term descriptions and their interrelationships and support logical inferences, so that content retrieval can extend beyond the capability of keyword-based searches, e.g., semantic searches can find the relevant markup information even if the search keyword in the query does not appear in the description of the markup points.
² In this framework, we employ the ESpotter framework. See ESpotter: Adaptive Named Entity Recognition for Web Browsing; http://people.kmi.open.ac.uk/jianhan/ESpotter
Fig. 4 The Ontology-model of part of traveller model
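The distances quoted for this example can be reproduced with a simple path count over the hierarchy. The sketch below assumes a particular shape for the relevant part of Fig. 4 (the `PARENT` table is a guess made from the text, not copied from the figure) and counts edges through the lowest common ancestor; the full similarity measure described in the text also uses node counts and properties, which are not modelled here.

```python
# Assumed fragment of the hierarchy: child -> parent.
PARENT = {
    "Restaurant": "Eating Place",
    "Chinese": "Restaurant",
    "Thai": "Restaurant",
    "Happy Chops": "Chinese",
    "Lotus": "Chinese",
    "Thai Smile": "Thai",
}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Count the edges between a and b via their lowest common ancestor."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("nodes are not in the same hierarchy")


leaf_node = "Chinese"
d_happy = distance(leaf_node, "Happy Chops")
d_lotus = distance(leaf_node, "Lotus")
d_thai_smile = distance(leaf_node, "Thai Smile")
```

Under this assumed hierarchy the two sibling instances are one edge from the leaf node and ‘Thai Smile’ is three edges away, which agrees with the distances stated in the text.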
3.3.2 Static User Preferences
User preferences can be learnt from a user’s usage history; these are referred to as multi-session user preferences and are recorded by the user-model component using the statistical model. The markup data from a user is accumulated from previous markup information to form two matrices: the Markup-Term (MT) and Concept-Term (CT) matrices (Fig. 5). The MT matrix holds the relationships between the markup points and the key-terms in the markup descriptions. Stop words are removed and Porter stemming is performed before constructing the MT matrix. The CT matrix derives information from the MT matrix and stores the relationships between concepts, from WordNet, and the key-terms. The value in each cell CT(i,j) is a weighted value which measures the degree of importance between the key-term and the concept. We apply TF-IDF to calculate the weight of each term. We select this weighting scheme because it is simple and effective, and it can be scaled to a large dataset [25].
(a) Markup-Term matrix (MT)
Mark-up/Term   Weatherspoon   Steak   Fishbone   Fish    Chip
M1             1              1       0          0       0
M2             0              0       1          1       1
(b) Concept-Term matrix (CT)
Concept/Term   Weatherspoon   Steak   Fishbone   Fish    Chip
Restaurant     0.855          0       0.855      0       0
food           0.577          0.855   0.577      0.855   0.855
animal         0              0       0          0.855   0
meat           0              0.855   0          0       0
Fig. 5 Matrix representation of markup information
Some studies argue that keywords and their frequencies (weights) alone are insufficient data for an accurate semantic model of the user. Hence, we try to solve this problem by inferring high-level knowledge about the user preferences, transforming the CT matrix into the ontological-based model. However, these static user preferences rely on previous usage data. This can result in a failure to filter irrelevant markup points because users’ interests are dynamic and are likely to change over time. Therefore, multi-session interests are not always reliable and do not always accurately reflect the user’s interests, and a dynamic model is needed to cope with this problem. Conversely, the static model is needed when the dynamic model is not able to identify the user’s interests.
3.3.3 Learning and Updating User Preferences
The learning component is needed in order to further improve retrieval results by detecting shifts in the user’s interests and updating the user preferences, updating the weights of terms, and removing existing knowledge about users. User preferences can be updated implicitly during and after the retrieval process. However, we do not focus here on improving the learning algorithm. Thus, we adopt the adaptive learning algorithm proposed in [15], as follows:
$$M(i,j)^{t} = \frac{N_i^{t-1}}{N_i^{t}}\, M(i,j)^{t-1} + \frac{1}{N_i^{t}} \sum_{k} MT(k,j) \cdot CT(k,i) \qquad (1)$$
where $M(i,j)^{t}$ is the modified user preferences at time t; $N_i^{t}$ is the number of markup points related to the i-th concept that have been accumulated from time zero to time t; and the second term on the right-hand side of (1) is the sum of the weights of the j-th term in the markup descriptions that are related to the i-th concept and obtained between time t-1 and time t, divided by $N_i^{t}$. This approach allows the system to learn and update users’ interests rapidly and makes user preferences more dynamic than in the surveyed frameworks.
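To make the update rule in (1) concrete, the following sketch applies it to small lists of numbers. It rests on two assumptions that the text does not spell out: the index k is read as running over the markup points obtained between time t-1 and time t, with the second factor giving how strongly markup point k relates to concept i, and the count for concept i is taken to grow by the number of new markup points related to it. The function name and argument layout are hypothetical.

```python
def update_preferences(m_prev, n_prev, mt_new, rel_new):
    """Apply update rule (1) once.

    m_prev[i][j]  weight of term j for concept i at time t-1
    n_prev[i]     markup points related to concept i up to time t-1
    mt_new[k][j]  markup-term value of new markup point k for term j
    rel_new[k][i] relation of new markup point k to concept i (assumed reading)
    """
    n_terms = len(m_prev[0])
    m_next, n_next = [], []
    for i in range(len(m_prev)):
        # Assumed: the count grows by the new markup points related to concept i.
        added = sum(1 for row in rel_new if row[i] > 0)
        n_i = n_prev[i] + added
        n_next.append(n_i)
        if n_i == 0:
            m_next.append([0.0] * n_terms)
            continue
        row = []
        for j in range(n_terms):
            new_mass = sum(
                mt_new[k][j] * rel_new[k][i] for k in range(len(mt_new))
            )
            row.append((n_prev[i] / n_i) * m_prev[i][j] + new_mass / n_i)
        m_next.append(row)
    return m_next, n_next


# One concept, two terms, one new markup point containing both terms.
m_next, n_next = update_preferences(
    m_prev=[[0.5, 0.0]], n_prev=[1], mt_new=[[1, 1]], rel_new=[[1.0]]
)
```

With these inputs the old weights are scaled by 1/2 and the new evidence contributes 1/2 per term, so older history is progressively discounted as new markup points arrive.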
3.4 Personalised Map Content (Markup) Information Retrieval
Once the knowledge base and user preferences are obtained, semantic retrieval is performed. The retrieval component applies the ontology model in order to support semantic queries on text-based markup descriptions. Again, several sub-processes are involved: eliminating stop words within descriptions, processing the query, formulating queries, etc.
3.4.1 WordNet
WordNet [18] is a semantic network database developed at Princeton University under the direction of George A. Miller. The basic building block in WordNet is the synset. A synset is a set of synonyms denoting the same concept, paired with a description of the synset. The synsets are interconnected by different relational links, such as hypernymy (is-a-kind-of), meronymy (is-a-part-of), antonymy (is-an-opposite-of) and others. We exploit WordNet to disambiguate word senses in user preferences.
3.4.2 Query Processing and Word Sense Disambiguation
Keywords in the user’s query can be ambiguous because they may have more than one word sense. Hence, word sense disambiguation is necessary. The system implicitly expands those keywords to other relevant concepts, e.g., by finding hypernym (is-a-kind-of) concepts and other synonyms in WordNet. The algorithm to disambiguate the word sense of a user keyword is shown in Figure 6.
1. Get the set of keywords {Q} by removing stop words and stemming;
2. Get keywords {U} from the user’s preferences;
3. For each keyword pair {Qi, Ui}
   a. Look up all word senses in WordNet;
   b. Compute the similarity between {Qi, Ui};
   c. Select the highest similarity value of word sense.
4. Perform semantic search;
Fig. 6 Word-sense disambiguation algorithm based on the user’s preferences
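Step 3 of Fig. 6 can be illustrated with a toy sense inventory. The sketch below is an assumption-laden stand-in: the `SENSES` table is hypothetical and replaces the WordNet lookup, and the similarity is simply the number of words a sense shares with the user preference keywords {U}, not the measure used by the authors.

```python
# Hypothetical sense inventory: keyword -> sense label -> related words.
SENSES = {
    "chip": {
        "chip (food)": {"potato", "fried", "fish", "restaurant"},
        "chip (electronics)": {"silicon", "circuit", "computer"},
    },
}


def disambiguate(keyword, preference_keywords):
    """Pick the sense sharing the most words with the user's preferences.

    Returns None when the keyword is unknown or no sense overlaps at all.
    """
    preferences = set(preference_keywords)
    best_sense, best_score = None, 0
    for sense, related in SENSES.get(keyword, {}).items():
        score = len(related & preferences)  # toy overlap-based similarity
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


chosen = disambiguate("chip", ["fish", "restaurant"])
```

The point of the sketch is only that the user's stored preferences, rather than the query alone, decide which sense is carried forward to the semantic search.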
In summary, the user query is processed in order to extract keywords by removing stop words and stemming. Stop words include: a, an, the, in, of, on, are, be, if, into, which, etc. These words do not contribute significant meaning to the documents or images in this research. Therefore, they should be removed to reduce ‘noise’ and to reduce the computation time. Stemming attempts to reduce a word to its stem or root form. Thus, the key terms of a query are represented by stems rather than by the original words. In our framework, the Porter Stemming³ algorithm is applied. The remaining keywords from the user’s query are called the set of query keywords {Q}. Likewise, a set of keywords {U} is created from the user preferences. All pairs of U and Q are used to look up all word senses in WordNet and then to compute the similarity between them. The word sense with the highest similarity value is selected and used later to perform the semantic search.
3.4.3 Semantic Search
After disambiguating word senses, the system automatically formulates queries represented as SPARQL queries⁴. The SPARQL query performs a semantic search on the RDF file and returns results to the user. The SPARQL query language is a W3C recommendation for querying data from RDF documents, which form part of the KB. The SPARQL query returns a list of instance tuples that satisfy the query. In order to perform semantic search, the similarity between the user’s query, concepts in the domain ontology, and concepts in the user’s preferences needs to be measured. Two types of measurement, cosine similarity and personal relevance, are deployed in this framework.
3.4.3.1 Similarity Measures
To ensure that the results are relevant to the query, a statistical computation, in the form of a cosine similarity measurement, is performed. Equation (2) defines the cosine similarity formula. The similarity between the query (q) and concepts (p) in the map content KB is measured using the following normalised inner product:
$$\mathrm{sim}(p,q) = \frac{p \cdot q}{|p|\,|q|} \qquad (2)$$
The results obtained from the cosine similarity measure are further filtered according to the user profile. A personal relevance measurement has been proposed in [19]. We adopt this formula to calculate the similarity between the user preference (u) and concepts (p) in the map content KB. The personal relevance measure is defined as shown in Equation (3):
$$\mathrm{prm}(u,p) = \frac{u \cdot p}{|u|\,|p|} \qquad (3)$$
³ Porter Stemming. See http://www.ling.gu.se/~lager/mogul/porter-stemmer/index.html
⁴ SPARQL query. See http://www.w3.org/TR/rdf-sparql-query
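Equations (2) and (3) share the same form, so a single helper covers both. The sketch below is a plain-Python illustration that assumes concepts, queries and user preferences are already encoded as equal-length weight vectors; how those vectors are built is not specified here, and returning 0.0 for a zero vector is a choice made for this sketch.

```python
import math


def cosine_similarity(p, q):
    """Normalised inner product of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # sketch convention: no similarity with an empty vector
    return dot / (norm_p * norm_q)


# sim(p, q) for a concept vector and a query vector (illustrative numbers).
sim_pq = cosine_similarity([1.0, 1.0, 0.0], [1.0, 0.0, 0.0])

# prm(u, p) uses the same formula with the user preference vector.
prm_up = cosine_similarity([0.0, 2.0, 0.0], [1.0, 1.0, 0.0])
```

Because the measure is normalised, only the direction of the weight vectors matters, not their overall magnitude.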
3.4.3.2 Similarity Aggregation
To calculate the similarity between user preference, query and visual content, the cosine similarity and the personal relevance measure need to be integrated, using the so-called combSum model [19]. The combSum model merges the two rankings by a linear combination of the relevance scores:
$$\mathrm{score}(p,q,u) = \lambda \cdot \mathrm{prm}(u,p) + (1-\lambda) \cdot \mathrm{sim}(p,q) \qquad (4)$$
where λ ∈ [0,1]. The choice of the λ coefficient in the linear combination above is critical and provides a way to gauge the degree of personalization, from λ = 0, producing no personalization at all, to λ = 1, where the query (current user interests) is ignored and results are ranked only on the basis of global user interests. The search results are presented to the user in descending order of score. More detail about the combSum model can be found in [19].
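The effect of λ in (4) is easiest to see on a small ranking. The sketch below is illustrative only: the candidate names are borrowed from the earlier restaurant example, and their prm and sim values are invented for the demonstration rather than computed from any data.

```python
def comb_sum(prm, sim, lam):
    """Linear combination of personal relevance and query similarity, as in (4)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * prm + (1.0 - lam) * sim


def rank(candidates, lam):
    """Sort (name, prm, sim) triples by combined score, highest first."""
    scored = [(name, comb_sum(prm, sim, lam)) for name, prm, sim in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented scores: Lotus matches the query well, Happy Chops matches the profile.
candidates = [("Lotus", 0.2, 0.9), ("Happy Chops", 0.8, 0.4)]

query_only = rank(candidates, lam=0.0)    # no personalisation
profile_only = rank(candidates, lam=1.0)  # query ignored
```

With λ = 0 the candidate that best matches the query is ranked first, and with λ = 1 the candidate that best matches the stored preferences is ranked first; intermediate values trade the two off.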
3.5 Traveller Map Markup
To model users’ points of interest, a semantic ontology model has the advantage of building up a potentially detailed set of relations between the different types of user markup, allowing better management and more precise searching. The construction of a user markup point is shown in Fig. 7. It has the following properties: hasContent stores the content of the point; hasCreatedDate stores the date the point was created; hasLocationX and hasLocationY keep the longitude and latitude of the point location; hasModifiedDate stores the date the point was last modified; hasName represents the name of the point; hasOwner stores the owner of the point; hasType stores the group of the point; and isLocked indicates whether the point is locked by the owner. More detail about markup point grouping and access control is given in section 3.6. The following example illustrates how user markup data can be shared among users. User A can search the markup information that User B has created and shared in specific groups, provided they have both joined these two groups. Searches are based on RDF instances and are expressed using SPARQL [10]. User A can limit the search to only User B’s shared markup, because markup can be filtered by owner. The results can also be filtered based on additional constraints, e.g., to filter out the instances that are not within the two groups.
Fig. 7 The ontology model of the markup point
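The markup point properties and the owner and group filtering described above can be mimicked with a small record type. This is a hedged sketch only: the system itself stores markup as RDF instances queried with SPARQL, whereas the class, field names and sample values below are hypothetical Python stand-ins for the listed properties.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class MarkupPoint:
    # Field comments give the ontology property each field stands in for.
    name: str          # hasName
    content: str       # hasContent
    owner: str         # hasOwner
    group: str         # hasType (the group of the point)
    location_x: float  # hasLocationX (longitude)
    location_y: float  # hasLocationY (latitude)
    created: date      # hasCreatedDate
    modified: date     # hasModifiedDate
    locked: bool = False  # isLocked


def shared_markup(points: List[MarkupPoint], owner: str, groups: List[str]):
    """Keep the points created by `owner` that belong to one of `groups`."""
    return [p for p in points if p.owner == owner and p.group in groups]


today = date(2010, 1, 1)  # arbitrary illustrative date
points = [
    MarkupPoint("Yongfa", "Chinese buffet", "UserB", "Food", -0.04, 51.52, today, today),
    MarkupPoint("Library", "Quiet study area", "UserB", "Study", -0.04, 51.52, today, today),
    MarkupPoint("Cafe", "Good coffee", "UserC", "Food", -0.04, 51.52, today, today),
]
visible_to_a = shared_markup(points, owner="UserB", groups=["Food"])
```

Filtering first by owner and then by group membership corresponds to the two constraints mentioned in the example of User A searching User B's shared markup.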
Fig. 8 illustrates the structure of the RDF file of a markup point, called a PointOfInterests instance. Using a semantic mediation model, the ontology can be converted to different formats to better support specific applications. There are two approaches to storing the user points of interest data. One way is to store them in RDF file format, which can be used directly by the system. A second way is to convert the RDF instance data so that it can be stored in a relational database such as MySQL. Storing all the data in a relational database can provide extra data storage management and access control, but it does require extra processing to convert RDF instances to the database data format and vice versa. A hybrid approach can also be used, in which the database stores user-sensitive data while an RDF file stores point of interest data.
Fig. 8 RDF format of an ontology instance of markup point (PointOfInterests)
3.6 Map Markup Sharing
3.6.1 Restricting Access to User Markup Information
How requesters access an owner’s context is defined by the owner’s access control matrix, shown in Table 1. The rows in this table specify the access levels for different groups. Access level R is defined as read-only permission, while access level R+W is defined as both read and write permission. The columns in this table describe how the requesters are grouped. There are three main types of grouping: Anonymous Groups, Public Groups and Private Groups. Anonymous Groups represent markup data that are shared anonymously; owners do not restrict read-only access to them, but they can specify the access constraints for the second access level. The Private Groups represent the groups that are created by the owner. The access controls of the private groups are fully determined by the owner. Each owner has their own private group access control matrix. It is stored online, in the network, rather than in the mobile client, to facilitate robustness and efficiency. Owners only need to set up a private group’s access constraints once and upload them to the system server side. The authentication of the access control does not need to involve the owner every time a requester makes a request for an individual owner’s markup data. The Public Groups are the groups an owner has joined or created for other people to join. The read access controls of public groups are not determined directly by the owners of the groups; every group member has at least the read access level for a group. However, the read and write access level to the public groups can be decided by owners. This proposed solution is based on access control mechanisms to specify and interpret preferences about who can access what information, and at which level. Requesters are separated into groups based on the public groups (organisations) they join, or based on the information owner’s private group settings such as family, friend, colleague and others (see Table 1). The user assigns each requester an access level depending on its group membership. The combination of group and access level, along with the requester IDs and group IDs, forms a grid-based access control table of requester group versus access level. When making a request, a requester specifies their own ID together with the pseudonym or ID of the owner (or holder). A credential-based mechanism is used to bind the service identifier to a specific type of credential. The token plays a role similar to the traditional X.509 certificate but is more general: it can bind any credential type to a service identity.
Table 1 Access control matrix
Access level | Anonymous Groups  | Public Groups         | Private Groups
             | AG1     | AG2     | PG1     | PG2     | … | Family           | Friend           | …
R            | All     | All     | GID010R | GID013R |   | UID0010, UID0012 | UID0032, UID0044 |
R+W          | UID0002 | UID0003 | GID010W | GID013W |   | UID0050, UID0053 | UID0061, UID0068 |
3.6.2 Access Control Evaluation
Access control evaluation is done by evaluating requests and credential tokens against the information owner’s preference policies [21], using the algorithm given below. The access controls for requesters are defined by the information owner.
When a request is made for access to a user’s shared markup data via the system middleware (the broker component), the broker requests the access control matrix for the requested information owner ID. The broker then evaluates the request based upon the access control matrix, the requester’s credentials and the description of the user markup groups. There are three possible outcomes of the evaluation: reject the request if the requester does not have a valid token; reveal the requested data when the requester exists in the owner’s matrix for accessing the data, in terms of a valid token, access level and entity group (entity ID); or notify the requester that no shared information from the specific owner is available to them. The algorithm for access control evaluation consists of four steps; an example of accessing an individual owner’s markup information requested by a requester is given in pseudo-code below.
1. Validate the requester’s identity by checking the provided ID against the provided token.
2. Collect the list of privileged group IDs available to the requester.
3. Collect markup data based on the list of privileged group IDs:
   For each group in the owner’s Private Groups sector {
      Check if the requester ID exists in the group access constraints list
      If it exists, collect the shared markup data in that group;
   }
   For each group in the Public Groups sector that the owner joined {
      Check if the requester ID exists in the public group access constraints list
      If it exists, collect the markup data shared by the owner in that group;
   }
   For each group in the Anonymous sector {
      Collect all markup data in this sector;
   }
4. Return the collected markup data if there is any, or return an empty list if there is no markup data available to the requester from the owner.
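The four steps above translate fairly directly into code. The following sketch is one possible reading and not the broker's actual implementation: the dictionary layout of the matrix, the token table and the function name are all assumptions, only read access is modelled, and the sample IDs are taken from Table 1 where available.

```python
def evaluate_request(requester_id, token, valid_tokens, matrix, markup):
    """Return None if the token is invalid, else the markup the requester may read.

    valid_tokens  requester ID -> expected token (hypothetical credential store)
    matrix        sector name -> groups; private/public groups map to member IDs
    markup        group name -> markup items shared by the owner in that group
    """
    # Step 1: validate the requester's identity against the provided token.
    if valid_tokens.get(requester_id) != token:
        return None
    collected = []
    # Steps 2-3: private and public groups are readable only by listed members.
    for sector in ("private", "public"):
        for group, members in matrix[sector].items():
            if requester_id in members:
                collected.extend(markup.get(group, []))
    # Anonymous groups are readable by every validated requester.
    for group in matrix["anonymous"]:
        collected.extend(markup.get(group, []))
    # Step 4: the list is empty when nothing is available to this requester.
    return collected


matrix = {
    "private": {"Family": {"UID0010", "UID0012"}, "Friend": {"UID0032", "UID0044"}},
    "public": {"PG1": {"UID0010"}},
    "anonymous": ["AG1"],
}
markup = {"Family": ["home"], "PG1": ["club"], "AG1": ["cafe"]}
tokens = {"UID0010": "t1", "UID0099": "t9"}

family_view = evaluate_request("UID0010", "t1", tokens, matrix, markup)
stranger_view = evaluate_request("UID0099", "t9", tokens, matrix, markup)
rejected = evaluate_request("UID0010", "wrong", tokens, matrix, markup)
```

Returning None for an invalid token and an empty list for "nothing shared" keeps the rejection outcome distinct from the "no shared information" outcome described in the text.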
4 Travellers Personalised Spatial Map Service Travellers personalised spatial map service can be constructed based upon the semantic user modelling of travellers and then filtering and generating individualised maps to meet their needs. User markup information as part of the traveller model can be created and shared amongst groups of users controlled by a Control Access Matrix type mechanism (see section 3.6.1). The SAMS system of USHER used in this demonstrator can capture part of the traveller’s context indirectly through user events, user annotation / markup, user queries and environment event and directly from user input to construct the traveller model. Indirect user context input includes detecting user location by gathering GPS data of the user, obtaining user movement by measuring three axes acceleration data and retrieving user preferences/ interests by analysing user queries and markup information. User direct input includes setting a destination, travel goal and preferences/ interests. Traveller’s semantic model has three predefined user stereotypes instances which are
Business Man, Tourist and Regular. Different user stereotype instances are initialised with different default settings, which can be changed based on the direct and indirect user input during their use of the system. To construct a map service that meets the user’s task/goal, the system needs to load the ontology instance of the traveller model. The system then extracts the raw map data and converts it into a map based on the filters generated from the traveller model instances. Other associated map data, such as available markup points shared by different users that match the user’s interests, can be displayed as additional layers of the map content.
Fig. 9 Traveller A’s Tourist map in walking mode
An example of Traveller A’s Tourist map in pedestrian mode is shown in Fig. 9. It is based on the destination, e.g., Queen Mary University of London (QMUL), the user goal and task, e.g., visiting the campus and seeing something, set by Traveller A, and the recorded history of the places the traveller has been to. The system decides the traveller’s status and generates the map accordingly. Most of the map content about the campus areas is displayed to the user. The map’s presentation is based on the display preferences, e.g., using different colours for different GIS objects and different symbols for different types of user markup information. Another example, Traveller B’s Business Man map in pedestrian mode, is shown in Fig. 10. Traveller B has also set the destination to QMUL, but with the different purpose of having a meeting, so the system sets his stereotype to Business Man in walking mode, as it detects that he is walking. The system generates the map based on Traveller B’s user model; the map contains the basic content of the area and some useful / important information related to the meeting, such as the meeting place.
The USHER System to Generate Semantic Personalised Maps for Travellers
Fig. 10 Traveller B’s Business Man map in walking mode
After the meeting, Traveller B needs to see someone. The system changes his map mode and the displayed map content accordingly, as it detects the change of the user goal and that the user is driving. The route, drawn in purple (a user preference for map presentation), between the meeting place (EE) and the place where he will see someone (W J Meade) is shown in Fig. 11. Some of the markup information shared by others that is relevant to the user goal is displayed, such as the New Global (Pub) near his meeting place. In driving mode, the map focuses on the road information and associated spatial objects, and filters out other irrelevant map content.
Fig. 11 Traveller B’s Business Man map in driving mode
Z. Liang, K. Kesorn, and S. Poslad
5 Discussion Key issues for the presented ontology-based personalised SAMS include the construction of a semantic traveller model that can be used to personalise the map. Constructing the traveller model involves modelling the traveller and acquiring the traveller context, including the travel mode, the travel activity, travel goal, destination, user markup and user preferences (interests). Indirect acquisition of user context, including methods to acquire user preferences and the travel mode that the traveller is in, is one of the main challenges for traveller model creation. Integrating statistical computation into a personalisation model enables the use of more user-centred terminology in user models. However, the fact that the statistical technique relies solely on numeric data can result in a failure to understand the meaning of users’ interests [19]. A purely statistical model also fails to capture the context in the shared markup information, which reflects the user’s interests. This feature is not supported by usage mining techniques, but it is supported by a semantic model (ontology). In this framework, the context of traveller preferences is captured from shared markup points by an NLP technique, which then restructures that information into a hierarchy of traveller preferences in order to keep the relationships between concepts found in the markup description. Consequently, the ontology can share the concept-based representation proposed for retrieval, and the expressiveness of ontologies can be used to define user interests on the basis of the same concept space used to describe the map data. The rich concept descriptions of traveller interests and their relations make markup information easy to retrieve using a semantic query, because a structured query (SPARQL) can express more precise information, leading to more accurate answers.
In this framework, the personalised semantic search is achieved by exploiting an external knowledge base (WordNet), a domain-specific ontology, and the traveller model. This can be seen as a form of query expansion leading to a more effective search mechanism. Semantic searches are able to find the relevant markup points when querying class instances even if the keyword(s) in the query are not present in the descriptions of markup points or as concepts in the traveller model. For example, Traveller A might want to find information about Chinese restaurants in a certain area, e.g., Mile End. The ontology contains semantic relationships with subconcepts of restaurant and place (see Fig. 4). The proposed system is therefore able to recognize restaurant information annotated with a restaurant style that belongs to the ‘Chinese’ concept even if the word ‘Chinese’ does not appear in the markup description, whereas a traditional type of user model cannot. This means that the personalised search obtains better precision and recall than previous user models. Learning dynamic user preferences (interests) from only the most recent observation leads to a traveller model that can adjust more rapidly to a traveller’s changing interests. This makes the traveller model more dynamic than previous frameworks, e.g., [23] [24] [26]. The method used to detect the travel mode of the traveller is based on the previous experiment, and the results are not always accurate as there is some similarity
in the movement pattern between some travel modes. For example, walking and jogging might have similar movement patterns: when the speeds of the two are not very distinct, the output of the three-axis accelerometer can be similar and it will be difficult to separate them. Another feature of this system is that it allows travellers to create their own markup for the places they have visited, so that these places are shown on the map for their own convenience. More importantly, this markup information can be shared amongst users. These requirements create challenges for the design of the markup point structure in terms of scalability and usability. The system needs to be able to cope with a certain amount of information and store it in an organised way to facilitate precise searching. Another key issue in managing these markup points is restricted access control, as travellers may want to share their markup information only within certain groups or with certain users, or they may want to share it with any other traveller. To address this issue, an access control matrix is created for each traveller so that they can decide how their markup information is shared. In this way, the traveller markup information can be safely shared amongst identified travellers.
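The concept-level matching behind the Chinese restaurant example discussed earlier in this section can be illustrated with a small Python sketch. It is a toy stand-in, not the SPARQL/WordNet machinery of the system: the `SUBCONCEPT_OF` table, the concept names and the markup records are all hypothetical and chosen only to show how a match can succeed without the keyword appearing in the description.

```python
# Hypothetical toy ontology: child concept -> parent concept.
SUBCONCEPT_OF = {
    "ChineseRestaurant": "Restaurant",
    "ItalianRestaurant": "Restaurant",
    "Restaurant": "Place",
    "Pub": "Place",
}


def is_a(concept, target):
    """True if concept equals target or is a (transitive) subconcept of it."""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBCONCEPT_OF.get(concept)
    return False


def semantic_search(markup_points, target_concept, area):
    """Return names of markup points in `area` annotated with `target_concept`
    or any of its subconcepts; the free-text description is never inspected."""
    return [
        m["name"]
        for m in markup_points
        if m["area"] == area and is_a(m["concept"], target_concept)
    ]


# Illustrative (invented) markup points.
MARKUP = [
    {"name": "Golden Wok", "concept": "ChineseRestaurant",
     "area": "Mile End", "description": "great dumplings"},
    {"name": "Luigi's", "concept": "ItalianRestaurant",
     "area": "Mile End", "description": "wood-fired pizza"},
    {"name": "Corner Pub", "concept": "Pub",
     "area": "Mile End", "description": "live music"},
]
```

Here a search for `ChineseRestaurant` finds the first point even though its description never contains the word "Chinese", and a search for the broader `Restaurant` concept returns both restaurants, which a keyword match on the descriptions alone would miss.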
6 Conclusion The more advanced existing spatially aware map services can automatically adapt spatial content to users’ preferences and to the terminal display characteristics. A semantic extension to such a personalisation model has been proposed, to enable the model to adapt to users’ tasks, to support sharing of information and to support more finely grained searches using the relations among the instances of the ontology models. Users can create their own personalised markup in the field and can share this information with others. The semantic markup can also be stored in a relational database to support added access control and to improve data storage management.
References
[1] Cheverst, K., Davies, N., Mitchell, K., et al.: Developing a context-aware electronic tourist guide: some issues and experiences. In: Proc. SIGCHI conference on Human factors in computing systems, pp. 17–24 (2000)
[2] Freebase, Open, Shared Database of the World’s Knowledge developed by Metaweb, http://www.freebase.com/view/guid/9202a8c04000641f80000000010c2d43 (accessed in May 2008)
[3] Göker, A., Myrhaug, H.I.: User Context and Personalisation. In: European Conference on Case-Based Reasoning (ECCBR), pp. 1–7 (2002)
[4] Kofod-Petersen, A., Aamodt, A.: Case-based situation assessment in a mobile context-aware system. In: Proc. Artificial intelligence in Mobile Systems 2003 (AIMS), pp. 41–49 (2003)
[5] OpenStreetMap, Map Features (2008), http://wiki.openstreetmap.org/index.php/Map_Features (accessed in April 2008)
[6] Pignotti, E., Edwards, P., Grimnes, G.A.: Context-Aware Personalised Service Delivery. In: European Conference on Artificial Intelligence, ECAI 2004, pp. 1077–1078 (2004)
[7] Poslad, S., Laamanen, H.R., Malaka, A., et al.: CRUMPET: Creation of User-friendly Mobile services PErsonalised for Tourism. In: Proc. 3G 2001 Mobile Communication Technologies, London, pp. 28–32 (2001)
[8] Poslad, S.: Ubiquitous Computing: Smart Devices, Environments and Interaction. Wiley, London (2009)
[9] Titkov, L., Poslad, S., Tan, J.J.: An Integrated Approach to User-Centered Privacy for Mobile Information Services. Applied Artificial Intelligence 20, 159–178 (2006)
[10] W3C, SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/ (accessed in May 2008)
[11] Vallet, D., Castells, P., Fernandez, M., et al.: Personalized Content Retrieval in Context Using Ontological Knowledge. IEEE Transactions on Circuits and Systems for Video Technology 17, 336–346 (2007)
[12] Maes, P.: Agents that reduce work and information overload. Communications of the ACM 37, 30–40 (1994)
[13] Widyantoro, D.H., Ioerger, T.R., Yen, J.: Learning User Interest Dynamics with a Three-Descriptor Representation. Journal of the American Society for Information Science and Technology 52, 212–225 (2001)
[14] Gauch, S., Chaffee, J., Pretschner, A.: Ontology-based personalized search and browsing. Web Intelligence and Agent Systems 1, 219–234 (2003)
[15] Liu, F., Yu, C., Meng, W.: Personalized Web Search For Improving Retrieval Effectiveness. IEEE Transactions on Knowledge and Data Engineering 16, 28–40 (2004)
[16] Zhang, Y., Zhang, X., Xu, C., et al.: Personalized retrieval of sports video. In: Proc. of the International Workshop on Multimedia Information Retrieval, pp. 313–322 (2007)
[17] Xu, C., Wang, J., Lu, H., et al.: A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video. IEEE Transactions on Multimedia 10, 421–436 (2008)
[18] Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38, 39–41 (1995)
[19] Castells, P., Fernández, M., Vallet, D., et al.: Self-tuning Personalized Information Retrieval in an Ontology-Based Framework. In: OTM Workshops on the Move to Meaningful Internet Systems, pp. 977–986 (2005)
[20] Kobsa, L.: Personalised Hypermedia and International Privacy. Communications of the ACM 45(5), 64–67
[21] Titkov, L., Poslad, S., Tan, J.J.: Enforcing Privacy via Brokering within Nomadic Environment. In: Proc. of the 4th International Symposium from Agent Theory to Agent Implementation (2004)
[22] Castells, P., Fernandez, M., Vallet, D.: An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering 19, 261–272 (2007)
[23] Mylonas, P., Vallet, D., Castells, P., et al.: Personalized Information Retrieval Based on Context and Ontological Knowledge. The Knowledge Engineering Review 23, 73–100 (2008)
[24] Daoud, M., Tamine, L., Boughanem, M., et al.: Learning Implicit User Interests Using Ontology and Search History for Personalization. In: Proc. of Web Information Systems Engineering – WISE 2007, pp. 325–336 (2007)
[25] Chen, S., Williams, M.: Learning Personalized Ontologies from Text: A Review on an Inherently Transdisciplinary Area. In: Chen, N. (ed.) Personalized Information Retrieval and Access: Concepts, Methods and Practices, New York (2008)
[26] Gondra, I.: Personalized Content-Based Image Retrieval. In: Chen, N. (ed.) Personalized Information Retrieval and Access: Concepts, Methods and Practices, New York (2008)
[27] Kobayashi, A., Iwamoto, T., Nishiyama, S.: UME: Method for Estimating User Movement Using an Acceleration Sensor. In: Proc. of International Symposium on Applications and the Internet - SAINT 2008, pp. 169–172 (2008)
Semantic Based Error Avoidance and Correction for Video Streaming Christian Spielvogel, Sabina Serbu, Pascal Felber, and Peter Kropf
Abstract. Video streaming over best effort networks remains a challenging task. Video quality decreases with an increasing number of frames that are corrupted, lost or only received after playback time. We use semantic information about the video and the network to decide between alternative or cooperative streaming sources to avoid or to correct data loss. We propose a distributed architecture that combines a peer-to-peer indexing archive for videos with error avoidance and error correction mechanisms to select the best delivery method from the corresponding sources. Our indexing-cache peer-to-peer overlay has two interesting properties for our selection model: it efficiently locates several sources for a video (if they exist) and even rare videos. Based on the coding characteristics of the available videos and the state of the network we apply a model for selecting between error avoidance, error correction and a combination of both approaches. This model is evaluated by using the network simulator NS-2 and a modified version of EvalVid.
Christian Spielvogel, Sabina Serbu, Pascal Felber, and Peter Kropf, University of Neuchâtel, Switzerland, e-mail: {firstname.lastname}@unine.ch
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 73–92. © Springer-Verlag Berlin Heidelberg 2010, springerlink.com
1 Introduction Delivering videos in the desired quality over best effort networks remains an important challenge. If a video is streamed from an arbitrary server to an arbitrary client, the perceived quality typically varies in an unpredictable way. The commonly used methods for handling packet loss are Forward Error Correction (FEC) and Automatic Repeat Request (ARQ). The problem with these two methods is that, under certain circumstances, FEC produces additional packet loss and ARQ causes too large a delay for real-time data. Forward Error Correction leads to additional loss when the redundant packets are transmitted over the same crowded path as the original ones.
The reason for the excessive delay of ARQ is that, in case of continuous loss, the same data needs to be retransmitted multiple times before it arrives successfully. This chapter presents a model for error avoidance in combination with error correction in peer-to-peer video source networks. Using the model, a tradeoff between error avoidance and error correction can be found, so that the probability of additional packet loss or late data arrival can at least be minimized and, at best, fully avoided. Error avoidance is achieved by using semantic information about the content and the network conditions in order to adapt the media quality by using Multiple Description Coding. The semantic information about the stream is composed of the number, size and type of frames within a GOP. The semantic information about the network is composed of the estimated loss rate of the network path between the sender and the receiver. Multiple Description Coding enables a scalable solution by allowing adaptation without transcoding. A detailed overview of Multiple Description Coding can be found in Section 3. In case error avoidance is not sufficient to deliver the data without packet loss, error correction is used additionally. The error avoidance and correction model has been evaluated in an overlay peer-to-peer network. Video sources are located based on indexing-caches that contain information about the videos in the network. We have evaluated our model using the NS-2 [10] network simulator and an extended version of the EvalVid plug-in [5]. We have extended EvalVid to support the evaluation of multiple sub-streams (descriptions) that are delivered within NS-2. The rest of the chapter is structured as follows: Section 2 presents related work, Section 3 gives an overview of Layered Coding and Multiple Description Coding, and Section 4 describes the peer-to-peer overlay and the efficient way we use to locate videos. Section 5 introduces the error model, followed by Sections 6 and 7 presenting the evaluation of the Multiple Description Coding approach, the stream location mechanism and the distributed streaming between the peers. Finally, Section 8 summarizes the chapter.
2 Related Work Approaches relying on error avoidance and error correction are not new. Forward error correction is based on the principle of reconstructing data that has been lost during network transmission. A good overview of forward error correction can be found in [1], [2] and [7]. The problem with all these Forward Error Correction mechanisms is that they do not take into consideration the path of the redundant network packets. Error avoidance is based on the principle of reducing the packet loss probability by sending parts of the data either from different sources or over parallel paths [9]. In [3] it is shown that Multiple Description Coding in combination with Multiple Source Streaming is able to deliver video streams in much better quality than the classical client-server approach. The main problem with these approaches is that they consider either network or stream characteristics, but none of them considers both.
3 Introduction to Layered Coding Layered coding is an approach for producing a compressed media stream that consists of multiple dependent or independent layers. The technique for producing dependent layers is called Scalable Coding; the one for producing independent layers is called Multiple Description Coding. The advantage of Scalable Coding is high compression efficiency, while the advantage of Multiple Description Coding lies in high robustness against data loss. Since our model for error avoidance and error correction in peer-to-peer networks is based on Multiple Description Coding, we give an overview of this technique in Section 3.1.
3.1 Overview of Multiple Description Coding Multiple Description Coding (MDC) is used to produce multiple independent media streams of the same content. The streams are called descriptions and have roughly the same storage size and influence on the resolution, frame rate or quality. Each of the descriptions can be used independently or in combination with other descriptions. A single description produces the base quality; by combining multiple descriptions it is possible to improve the resolution, frame rate or quality of the overall bit stream. The highest resolution, frame rate or quality is achieved when all descriptions are used in combination. The advantage of Multiple Description Coding is the possibility of adapting the media characteristics without transcoding. The adaptation decisions can be influenced by the server capacity, the state of the network or the resources of the playback device. Application scenarios for Multiple Description Coding are manifold. Multiple Description Coding in the temporal domain can be applied to support heterogeneous devices with different frame rates. Devices with sufficient resources get the full frame rate (e.g., 30 frames per second), while devices with limited resources, like mobile devices, receive a limited number of layers, resulting in a lower frame rate (e.g., 15 frames per second). An application scenario for Multiple Description Coding in the spatial domain is the support of devices with different resolutions. For example, an HDTV set with a resolution of 1650x1080 pixels would need all layers to render the video in high quality without using interpolation; for a smartphone it would be sufficient to receive only the base layer with a resolution of 320x480 pixels, which can be displayed without discarding pixels. A scenario for Multiple Description Coding in the quality domain is graceful degradation. 
Graceful degradation is the process of selecting a couple of enhancement layers that are not transmitted in case of insufficient network bandwidth. By dropping descriptions it is possible to adapt the required bandwidth of the stream to the available bandwidth of the network and avoid random loss. In the following sections we are going to explain Multiple Description Coding in the temporal, spatial and quality domain in more detail.
3.2 Temporal Scalability Temporal scaling is used to encode a video sequence into multiple descriptions, each having a subset of frames with the same spatial resolution. The lowest frame rate is achieved by decoding any one of the descriptions; by adding the remaining descriptions, the frame rate is increased until the full rate is achieved. A block diagram that shows a simple example of producing two independent descriptions for one stream can be found in Figure 1. The two descriptions can be created very simply by splitting the frames between the descriptions, transforming them using the discrete cosine transform, quantizing them and applying variable length coding.
Fig. 1 Block diagram for MDC in the temporal domain
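The frame-splitting step of temporal MDC can be sketched in Python as below. This is a minimal illustration that assumes a round-robin assignment of frames to descriptions; the transform, quantization and variable length coding stages are left out, and the function names are hypothetical.

```python
def split_temporal(frames, n_descriptions=2):
    """Round-robin split: description k receives frames k, k+n, k+2n, ..."""
    return [frames[k::n_descriptions] for k in range(n_descriptions)]


def merge_temporal(descriptions, total_frames):
    """Re-interleave the descriptions that arrived.

    `descriptions` has one slot per description; a slot is None when that
    description was lost. Frames of lost descriptions are skipped, so the
    output simply has a lower frame rate.
    """
    n = len(descriptions)
    merged = []
    for i in range(total_frames):
        description = descriptions[i % n]
        if description is not None:
            merged.append(description[i // n])
    return merged
```

With two descriptions, receiving both restores the full sequence, while receiving only one yields every second frame, i.e. half the frame rate.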
3.3 Spatial Scalability Spatial scaling is used to encode a video sequence into multiple descriptions that have the same frame rate, each of them contributing a part to the full spatial resolution. When only one description is decoded, the spatial resolution of the resulting video is minimal. Decoding additional descriptions increases the spatial resolution towards the full size of the raw video. A block diagram for the encoder can be found in Figure 2. As an example, two descriptions can be created in six steps as follows:
1. The raw video is spatially down-sampled, transformed using the DCT and quantized to obtain the first description, which also serves as the input for the second description.
2. To produce the second description, each frame is reconstructed using inverse quantization and the inverse discrete cosine transform.
3. Each reconstructed frame is spatially up-sampled to the original size using interpolation.
4. For the second description, each up-sampled frame is subtracted from the original image. This difference is known as the residual.
5. The residual is transformed using the discrete cosine transform and quantized.
6. The coefficients from both descriptions are encoded using variable length coding.
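The down-sample, reconstruct, up-sample and residual steps listed above can be illustrated on a single row of pixels. The Python sketch below is a deliberately simplified assumption-laden version: it works in one dimension, skips the DCT and variable length coding, uses pixel repetition as the interpolation, and leaves the residual unquantized so that decoding both parts is exact. All function names are hypothetical.

```python
def downsample(row):
    """Halve the resolution by averaging neighbouring pixel pairs (even length assumed)."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]


def upsample(row):
    """Restore the original size by repeating each pixel (nearest-neighbour interpolation)."""
    return [pixel for pixel in row for _ in range(2)]


def quantize(values, step):
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def encode(row, step=4):
    base = quantize(downsample(row), step)            # step 1 (DCT omitted)
    recon = upsample(dequantize(base, step))          # steps 2 and 3
    residual = [o - r for o, r in zip(row, recon)]    # step 4
    return base, residual                             # steps 5 and 6 omitted


def decode(base, residual=None, step=4):
    """Base alone gives a coarse full-size row; adding the residual restores the original."""
    recon = upsample(dequantize(base, step))
    if residual is None:
        return recon
    return [r + e for r, e in zip(recon, residual)]
```

Decoding only `base` yields a full-size but coarse row, while decoding `base` together with `residual` returns the original pixels, mirroring how additional data raises the spatial quality.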
Fig. 2 Block diagram spatial scalable encoder
An evaluation of our Multiple Description implementation and the argument for why MDC in the temporal domain is preferred over MDC in the spatial domain can be found in Section 7.
4 Peer-to-Peer Overlay 4.1 The Indexing-Cache Overlay We introduce the distributed indexing architecture that is used by our selection model for network-error treatment. We consider a peer-to-peer (P2P) system composed of peers (computers) sharing video files. Each peer has a partial view of the file system: it can communicate directly with only a small set of peers, called neighbours. The whole set of peers forms an overlay network. In our scenario, the user application provides each peer with a set of videos, which can be delivered on request to the other peers. This means that, in order to find a certain video, a peer has to issue a search request in the peer-to-peer system, which is then responsible for efficiently finding the peer(s) that store and provide that video. A single location is enough when the network allows the transmission of the video in the desired quality. However, when multiple peers that have the video are detected, they can participate in the process of selecting the network-error treatment as alternative or cooperative streaming sources. Intuitively, there are some videos that will be requested much more often than others. Typically, the less popular videos will be available from only a few peers, while the more popular ones will be provided by many peers. However, the success
rate of finding a video should not be influenced by its popularity. Thus, mechanisms have to be provided in order to also efficiently locate unpopular videos. In order to ensure a high success rate when searching for both unpopular and popular videos while keeping the overlay maintenance and network costs low, we use at each peer a simple and dynamic structure called the indexing-cache. This structure contains information about videos that the overlay can deliver. This way, a peer is not only aware of the videos that it can deliver itself, but also of other videos and of the peers that can deliver them. The searching time for a video can thus be considerably reduced. In the overlay, each peer has a neighborhood (i.e., a set of peers that are known to it), from which it periodically collects up-to-date information about videos. This information is then placed or refreshed in its indexing-cache as a pair containing the video and the peer that can deliver it. In order to be able to contact other peers, the neighborhood is also periodically updated. The number of peers in a neighborhood is limited, so a peer has to replace an existing neighbor with a new one. The information from the replaced neighbor is still kept in the indexing-cache; however, since it is no longer refreshed (this neighbor is no longer part of the neighborhood), it now has an increasing age associated with it. The information from the new neighbor is added as up-to-date information. Whenever the limit of the indexing-cache is reached, the information with the highest age is removed. An example with 7 peers and their videos can be found in Figure 3. For each peer, we show the videos that it owns: peer A owns video v1, peer B owns videos v2 and v3, and so on. For peer A, we highlight its indexing-cache, which contains a list of the videos located on its neighbors: v4 on peer D, v2 on peer F. Before this configuration, peer C and then peer G used to be neighbors of A. 
This is the reason why peer A also has in its indexing-cache the list of videos that peers C and G own, with ages associated with them. When A finds a new neighbor to replace one of its current neighbors, at least the entry of video v4 will be discarded (since it has the highest age). The vertical arrow on the left of the indexing-cache of peer A shows the direction of insertion of new index entries.
Fig. 3 Indexing-cache overlay architecture
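The age-based behaviour of the indexing-cache can be sketched in Python as follows. The class name, the `refresh` and `locate` methods and the representation of an entry as a `(video, peer)` pair with an integer age are assumptions made for illustration; the chapter does not prescribe a concrete data structure.

```python
class IndexingCache:
    """Bounded set of (video, peer) entries; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # (video, peer) -> age in refresh rounds

    def refresh(self, neighbor_videos):
        """Run one periodic update.

        `neighbor_videos` maps each current neighbor to the videos it offers.
        """
        # Every stored entry grows one round older ...
        for key in self.entries:
            self.entries[key] += 1
        # ... unless a current neighbor confirms it, which resets its age.
        for peer, videos in neighbor_videos.items():
            for video in videos:
                self.entries[(video, peer)] = 0
        # When the cache is over its limit, drop the entries with the highest age.
        while len(self.entries) > self.capacity:
            oldest = max(self.entries, key=self.entries.get)
            del self.entries[oldest]

    def locate(self, video):
        """Return the peers currently recorded as sources of `video`."""
        return sorted(peer for (v, peer) in self.entries if v == video)
```

Replaying a history like the one in Figure 3 (neighbors C, then G, then F, alongside D) leaves the entry learned from C as the oldest one, so it is the first to be evicted when a further neighbor brings new entries into a full cache.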
4.2 Indexing-Cache Maintenance Algorithm 1 presents the pseudo-code for finding a new neighbor. In order to avoid network partitions, each node keeps track of its number of incoming links (computed from the requests with different sources that it receives) and the new neighbor is chosen as the peer from the random walk that has the smallest number of incoming links. This strategy provides strong connectivity between the peers from the system, with a non-biased in-degree (which also provides load balancing).
Algorithm 1. Pseudo-code for the neighborhood update algorithm at peer pi
1: Find new neighbor:
2: if incLinks(pi) < RW.incLinksMin then
3:     RW.incLinksMin ← incLinks(pi)
4:     RW.newNeighbor ← pi
5: end if
6: Add pi to RW.nodes
7: RW.length++
8: if RW.length < RW.maxLength then
9:     Choose rn ∈ neigh(pi) and rn ∉ RW.nodes
10:    Forward request to rn
11: else
12:    Reply with RW.newNeighbor
13: end if
The neighborhood update process works as follows. A peer issues a random walk to find a new neighbor. Peer pi is any peer the random walk goes through. The random walk message keeps track of the node (RW.newNeighbor) with the smallest number of incoming links (RW.incLinksMin). If pi has a smaller number of incoming links, these values are updated (lines 1-5). Node pi is added to the list of peers that the random walk went through (line 6) and the length of the random walk is increased (line 7). Then, the request is forwarded to a randomly chosen neighbor rn, excluding the peers that the random walk has already gone through (lines 8-12). The last node in the random walk is in charge of sending a reply containing the new neighbor (line 13). The new neighbor is thus the peer with the smallest number of incoming links in the whole random walk. In order to accommodate the new neighbor in the local view, an existing neighbor has to be removed, which, for efficiency, is the node that was used to send the random walk.
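The random walk described above can be simulated centrally with a short Python function. This is a sketch under assumptions: the overlay is given as plain dictionaries (`neighbors`, `inc_links`), the walk is run in one process rather than by message passing, and a walk that reaches a peer with no unvisited neighbor ends early, a case the text does not cover.

```python
import random


def find_new_neighbor(first_peer, neighbors, inc_links, max_length, rng=random):
    """Walk from `first_peer` and return the visited peer with the fewest incoming links."""
    visited = []
    best, best_links = None, float("inf")
    current = first_peer
    while True:
        # Track the least-linked peer seen so far.
        if inc_links[current] < best_links:
            best, best_links = current, inc_links[current]
        # Record the peer; the walk length is the number of visited peers.
        visited.append(current)
        if len(visited) >= max_length:
            return best  # the last peer of the walk replies
        # Forward to a random neighbor that the walk has not visited yet.
        candidates = [n for n in neighbors[current] if n not in visited]
        if not candidates:
            return best  # assumption: a dead end terminates the walk early
        current = rng.choice(candidates)
```

Choosing the least-linked peer of the walk steers new links towards poorly connected peers, which is what keeps the in-degree balanced.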
4.3 Searching Given that each peer has accurate knowledge of the videos stored in its neighborhood and (possibly outdated) information about videos from other peers, we use random walks of finite length for the search procedure. This method is expected to perform in practice as well as TTL-limited flooding, but with much less traffic generated. To search for a video, a peer sends a video request to a random neighbor, which checks its indexing-cache for the video; if the video is not found, the neighbor repeats the process by sending the video request on to a random neighbor of its own. The periodic neighborhood updates give the peers a high diversity in their indexing-caches, which makes the random walk a simple strategy that is expected to find the requested video in a small number of hops. In order for the search result to contain multiple peers that have the requested video, the indexing-cache can contain several locations for a video; also, the random walk can finish only when a certain number of locations have been found, without exceeding the maximum random walk length. If needed, several random walks can be issued. More than two peers having the same video are necessary to enable multiple source streaming (in case of error avoidance), as well as delivering correction streams from alternative sources (in case of error correction).
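A bounded random-walk search over indexing-caches can be sketched as follows. The sketch assumes that each peer's cache is a dictionary from video to a list of source peers and that the walk stops once `wanted` distinct sources are known; both the data layout and the parameter names are hypothetical.

```python
import random


def search(video, first_peer, neighbors, caches, max_length, wanted=2, rng=random):
    """Return up to `wanted` peers known to hold `video`, found by one random walk."""
    found = []
    current = first_peer
    for _ in range(max_length):
        # Check the indexing-cache of the peer the request is currently at.
        for peer in caches[current].get(video, []):
            if peer not in found:
                found.append(peer)
        # Stop when enough sources are known or the walk cannot continue.
        if len(found) >= wanted or not neighbors[current]:
            break
        current = rng.choice(neighbors[current])
    return found[:wanted]
```

If a single walk returns fewer sources than needed, the caller can issue further walks and merge the results, as the text suggests.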
4.4 Churn and Video Updates The overlay deals easily with peer failure. When a peer from the indexing-cache fails, its corresponding entries are discarded. When a neighbor fails to respond, it is simply replaced with another peer from the overlay. To join the overlay, a new peer simply issues a random walk of a fixed size and then picks from the path the peers with the smallest number of incoming links, with the purpose of reducing the risk of network partitioning. After joining, to ensure an initial degree of reachability, the peer can advertise its videos through a number of random walks. Then, its videos will gradually become more reachable through the periodic indexing-cache updates of the other peers in the system. The video information of any peer can change over time, if the peer obtains more videos from the application level or if some of the videos are deleted. In such a case, for a quick propagation of the information, the peer should notify the peers that have it as a neighbor so that they can update their indexing-caches.
5 The Model for Error Avoidance and Error Correction in Peer-to-Peer Networks When using the indexing approach described in Section 4, the network bandwidth between the sender and the receiver is not taken into consideration. In order to deliver the content in the desired quality, it might be necessary to apply (1) error correction, (2) error avoidance or (3) a combination of both approaches. In this section
we present a model to select between these three alternatives based on current network and content characteristics. The model combines two measures called Quality Probability and Network Probability. The combination of Quality Probability and Network Probability is called Success Probability.

$$\mathrm{SuccessProbability} = \mathrm{QualityProbability} \cdot \prod_{i=1}^{M} \mathrm{NetworkProbability}_i \tag{1}$$
where M is the number of streaming peers and $\mathrm{NetworkProbability}_i$ represents the probability of successfully sending packets between peer i and the receiver. QualityProbability represents the probability that network errors are not propagated within the video stream. The combination of both (i.e., the success probability) can take values between 0 and 1.
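As a minimal sketch of how Equation 1 can drive the choice between alternatives, the following Python fragment multiplies the quality probability by the network probabilities of all streaming peers and picks the alternative with the highest product. The function names and the shape of the `alternatives` dictionary are hypothetical and chosen only for illustration.

```python
from math import prod


def success_probability(quality_probability, network_probabilities):
    """Equation 1: quality probability times the product over all M peers."""
    return quality_probability * prod(network_probabilities)


def best_alternative(alternatives):
    """Return the name of the alternative with the highest success probability.

    `alternatives` maps a name to (quality_probability, [network_probability_i]).
    """
    return max(alternatives,
               key=lambda name: success_probability(*alternatives[name]))
```

Because every factor lies between 0 and 1, adding a further streaming peer can never raise the product; it only pays off when it raises the quality probability or the per-peer network probabilities.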
5.1 Network Probability

Network probability is used to select between alternative peers based on the available bandwidth to the receiver and the bit rate of the video stream. We calculate the available bandwidth between the sender and the receiver based on the "TCP-friendliness" formula obtained from [6]:

$$\mathrm{AvailableBandwidth} = \frac{s}{t_{RTT}\sqrt{\frac{2p}{3}} + t_{RTO}\left(3\sqrt{\frac{3p}{8}}\right) p \left(1 + 32p^{2}\right)} \tag{2}$$
where s is the packet size, $t_{RTT}$ is the round-trip time, p is the packet loss probability and $t_{RTO}$ is the TCP retransmit timeout value. Network Probability is calculated as the ratio between the available bandwidth and the required bit rate, capped at 1:

$$\mathrm{NetworkProbability} = \min\left(1, \frac{\mathrm{AvailableBandwidth}}{\mathrm{RequiredBandwidth}}\right)$$
where AvailableBandwidth is the TCP-friendly available bandwidth (Equation 2) between the sender and the receiver and RequiredBandwidth is the bit rate of the (partial) video stream. Network Probability can take values between 0 and 1. In case the available bandwidth from the sender to the receiver is sufficient to deliver the content without loss, Network Probability has the value of 1.
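The two formulas of this subsection translate directly into code. The sketch below is illustrative only: the function names are ours, and returning infinity for a loss probability of zero is an assumption we add because Equation 2 is undefined at p = 0.

```python
from math import sqrt


def available_bandwidth(s, t_rtt, t_rto, p):
    """Equation 2: TCP-friendly bandwidth in units of `s` per second.

    s: packet size, t_rtt: round-trip time (s), t_rto: retransmit timeout (s),
    p: packet loss probability.
    """
    if p <= 0:
        # Assumption: with no observed loss the formula imposes no limit.
        return float("inf")
    denominator = (t_rtt * sqrt(2 * p / 3)
                   + t_rto * (3 * sqrt(3 * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denominator


def network_probability(available, required):
    """Ratio of available to required bandwidth, capped at 1."""
    return min(1.0, available / required)
```

For example, an available bandwidth of 1257 Kbit/s against a required bit rate of 1462 Kbit/s (alternative A in Table 5) gives `network_probability(1257, 1462)`, which rounds to 0.86.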
5.2 Quality Probability

Quality probability expresses the probability that all video frames that arrive at the receiver can be decoded successfully. This probability depends on (1) the number
of lost packets and (2) the type of frames affected by the packet loss. MPEG coded video streams [8] consist of three main frame types, I, P and B [4]. I-frames (intra-coded frames) have the advantage of being self-contained and allowing random access. Their disadvantage is that the compression ratio is usually much lower than that of P- or B-frames. P-frames (predictive-coded frames) have a better compression ratio than I-frames, but encoding and decoding require information from the previous I- or P-frame. The third type of frames are bidirectionally predictive-coded frames (B-frames). B-frames have the highest compression ratio of the three types, but they additionally depend on one preceding and one succeeding frame in the Group of Pictures (GOP): B-frames are always preceded by an I- or P-frame and succeeded by a P-frame. Quality probability is therefore used to consider the structure of the stream in addition to the loss rate of the network (NetworkProbability). For example, a stream that is encoded using only I-frames and loses many packets usually still results in better quality than the same content streamed at a lower bit rate but encoded using I-, P- and B-frames. Different packet losses have different effects on the media quality, and thus error handling has to be adapted to the relative importance of the frames.

The model requires knowing the number of network packets belonging to each video frame as well as the loss probability of the network path. The loss probability is expressed as the ratio of received to transmitted packets:

$$\mathrm{LossProbability} = \frac{\mathrm{PacketsReceived}}{\mathrm{PacketsTransmitted}} \tag{3}$$
The number of received packets (PacketsReceived) is determined by sending test packets from the sender to the receiver. The number of packets to be transmitted can be calculated by parsing the structure of the stream. The LossProbability takes values between 0 and 1: 0 means that all packets are received, while 1 means that all packets are lost. Knowing the LossProbability of the path and the number of packets belonging to a video frame, the arrival probability, which is the probability of successfully receiving one single frame, can be calculated using the binomial distribution as follows:

$$ap(T, F, p) = \sum_{i=T}^{T+F} \binom{T+F}{i}\, p^{i}\, (1-p)^{T+F-i} \tag{4}$$
where T is the number of network packets, F is the number of forward error correction packets and p is the loss probability of the path (defined in Equation 3). Computing the arrival probability (ap) for one single frame is not sufficient for selecting between streams from alternative peers. Videos have playback times ranging from several seconds to hours, and thus analyzing the complete structure would take too long. However, the fact that video streams are organized in subsequent groups of pictures (GOPs) can be used to simplify calculations. In the test streams used in our experiments each GOP follows the same frame pattern, "IBBP...", providing sufficient information to make predictions about the complete video.
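Equation 4 is a binomial tail sum and can be evaluated directly. The sketch below makes one interpretive assumption that should be kept in mind: the sum as written is the probability that at least T of the T + F packets succeed when each succeeds independently with probability p, so the code treats p as the per-packet ratio of received to transmitted packets from Equation 3. The function name is ours.

```python
from math import comb


def arrival_probability(T, F, p):
    """Equation 4: probability that at least T of T + F packets succeed.

    T: number of network packets of the frame
    F: number of forward error correction packets
    p: per-packet ratio from Equation 3 (assumed here to be the fraction of
       transmitted packets that is received)
    """
    n = T + F
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(T, n + 1))
```

With this reading, a frame of one packet and no correction packets arrives with probability p, and every additional correction packet can only increase the result, for instance from 0.5 to 0.75 for T = 1 and p = 0.5 when F goes from 0 to 1.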
In order to model packet loss for a group of pictures, I-, P- and B-frames must be analyzed separately as they have different sizes and dependencies:

$$\begin{aligned} ap_I &= ap(N_I, F_I, p) \\ ap_P &= ap(N_P, F_P, p) \\ ap_B &= ap(N_B, F_B, p) \end{aligned} \tag{5}$$
where $ap_I$, $ap_P$, $ap_B$ are the probabilities that I-, P- and B-frames are not lost. $N_I$, $N_P$, $N_B$ are the numbers of packets for each type of frame, $F_I$, $F_P$, $F_B$ are the numbers of forward error correction packets used and p is the LossProbability (defined in Equation 3). The probability (QualityProbability) of being able to successfully decode all frames belonging to the GOP is defined as:

$$\mathrm{QualityProbability} = ap_I \cdot ap_P^{\,C_P} \cdot ap_B^{\,C_B}$$

where $C_P$ is the total number of P-frames and $C_B$ is the total number of B-frames. In order to determine the required amount of forward error correction packets ($F_I$, $F_P$, $F_B$) for the I-, P- and B-frames, we compute the arrival probability for each frame separately. The necessity for doing so is explained by an example with two frames. Consider that an I-frame has an arrival probability of 50% and the dependent P-frame an arrival probability of 100%. Taking into account the dependency between the I-frame and the P-frame, the P-frame can also only be used with a probability of 50%. Sending correction packets for the P-frame would be useless, but by using the equations explained in the rest of this section, it can be seen that it is the I-frame that needs to be protected. The computation of the arrival probability ($R_I$) for the I-frame is simple because no dependencies need to be considered:

$$R_I = ap_I \tag{6}$$
The dependencies of P- and B-frames are considered in the rest of this section. When computing the arrival probability for P-frame i, the dependencies on the I-frame and all previous P-frames have to be considered:

$$R_{P(i)} = \begin{cases} R_I \cdot ap_P & \text{if } i = 1, \\ R_{P(i-1)} \cdot ap_P & \text{if } i > 1 \end{cases} \tag{7}$$

where P(i) is the ith P-frame in the GOP. In case of the first P-frame in the GOP (i = 1), only the probability of successfully decoding the I-frame and the P-frame itself is considered. In case that i > 1, the dependencies on all previous P-frames are also included. As B-frames depend on I- and P-frames, the probability of arrival of a B-frame at position j is calculated as:

$$R_{B(j)} = R_{P(k)} \cdot ap_B \tag{8}$$
where B(j) is the jth B-frame in the GOP and P(k) is the immediate successor frame that is referenced.
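The dependency chain of Equations 6 to 8 and the GOP-level quality probability can be sketched as below. The function names and the `b_refs` argument (for each B-frame, the 1-based index k of the P-frame it references) are hypothetical conventions introduced for this illustration.

```python
def decode_probabilities(ap_i, ap_p, ap_b, num_p, b_refs):
    """Per-frame probabilities R_I, R_P(i) and R_B(j) following Eqs. 6-8.

    ap_i, ap_p, ap_b: arrival probabilities of I-, P- and B-frames
    num_p:            number of P-frames in the GOP
    b_refs:           for each B-frame j, the 1-based index k of the
                      referenced successor P-frame
    """
    r_i = ap_i                      # Equation 6: no dependencies
    r_p = []
    previous = r_i
    for _ in range(num_p):          # Equation 7: chain through earlier frames
        previous = previous * ap_p
        r_p.append(previous)
    r_b = [r_p[k - 1] * ap_b for k in b_refs]   # Equation 8
    return r_i, r_p, r_b


def quality_probability(ap_i, ap_p, ap_b, c_p, c_b):
    """Probability that all frames of the GOP can be decoded."""
    return ap_i * ap_p ** c_p * ap_b ** c_b
```

Applied to the two-frame example from the text (I-frame 50%, P-frame 100%), `decode_probabilities(0.5, 1.0, 1.0, 1, [])` yields 0.5 for both frames, which shows that protecting the I-frame is what improves the result.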
6 Evaluation

We first evaluate our Multiple Description Coding implementation (Section 7) and argue why we prefer MDC in the temporal domain over MDC in the spatial domain, then we show an evaluation of the efficiency of our video location mechanisms (i.e., the indexing-cache overlay) and finally in Section 7.1 and Section 7.2 we present two scenarios for streaming the content to the end-client.
7 Evaluation of Multiple Description Coding in the Temporal and Spatial Domain

In this section we evaluate two characteristics of Multiple Description Coding, namely the additional storage/bandwidth requirement, resulting from lower redundancy within each of the descriptions, as well as the graceful quality degradation capability. In the first experiment we have measured the storage/bandwidth overhead of Multiple Description Coding in the temporal domain (see Table 1), where the MDC streams are composed of two descriptions. The table shows the storage size of the conventional stream, the sizes of the two descriptions and the overhead of the two descriptions compared to the original stream.

Table 1 Storage overhead for MDC in the temporal domain

| FileName | Size original stream | Size Descr. 1 | Size Descr. 2 | Overhead Descr. 1 + Descr. 2 |
|---|---|---|---|---|
| bridge | 1.9MB | 998KB | 998KB | 5.05 % |
| carphone | 382KB | 213KB | 210KB | 10.70 % |
| clair | 226KB | 125KB | 123KB | 0.97 % |
| coastguard | 424KB | 265KB | 255KB | 22.60 % |
| container | 208KB | 117KB | 109KB | 8.65 % |
| foreman | 477KB | 283KB | 267KB | 15.30 % |
| grandma | 511KB | 276KB | 276KB | 8.02 % |
| highway | 1.5MB | 802KB | 783KB | 5.60 % |
| lotrings | 2.4MB | 1.3MB | 1.3MB | 8.30 % |
| mother | 168KB | 93KB | 87KB | 7.14 % |
| news | 274KB | 157KB | 147KB | 10.94 % |
| salesman | 352KB | 197KB | 198KB | 12.21 % |
| silent | 284KB | 156KB | 148KB | 7.04 % |
Analyzing the results from Table 1 it can be seen that for the 13 test streams the average storage/bandwidth overhead is 8.88%. In the best case the overhead is only 0.97 %, in the worst case 22.6 %.
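The overhead column is not defined explicitly in the text. One reading that reproduces several rows of Table 1 (for example container and foreman) is the combined size of both descriptions relative to the original stream; the helper below is a sketch under that assumption, with a hypothetical function name.

```python
def mdc_overhead_percent(original_kb, descr1_kb, descr2_kb):
    """Extra storage of two descriptions relative to the original, in percent.

    Assumed definition: (Descr.1 + Descr.2 - original) / original * 100.
    """
    return 100.0 * (descr1_kb + descr2_kb - original_kb) / original_kb
```

For the container stream this gives `mdc_overhead_percent(208, 117, 109)`, about 8.65 %. Rows whose sizes are reported in rounded megabytes cannot be reproduced exactly from the table alone.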
Table 2 Mean Opinion Score values using MDC in the temporal domain

| FileName | Original Stream (MOS) | Description 2 (MOS) |
|---|---|---|
| bridge | 1.08 | 3.0 |
| carphone | 1.47 | 3.0 |
| clair | 2.57 | 4.0 |
| coastguard | 1.34 | 2.85 |
| container | 1.82 | 3.0 |
| foreman | 1.28 | 3.07 |
| grandma | 2.13 | 4.0 |
| highway | 1.13 | 3.86 |
| lotrings | 1.19 | 4.32 |
| mother | 1.83 | 4.0 |
| news | 1.98 | 4.34 |
| salesman | 1.98 | 4.79 |
| silent | 1.66 | 4.26 |
In the next experiment we skip Description 1 and compare the resulting quality against the effect of randomly losing the same amount of data from the single stream. Due to graceful degradation, MDC-based streaming achieves a much higher Mean Opinion Score (MOS). MOS is an objective measure for representing the satisfaction of an end-user receiving a video stream. With this metric the value for the best quality is 5 and for the worst quality is 1. In Table 2 it can be seen that receiving only Description 2 always yields a better result than losing the same percentage of data randomly from the single stream. Summarizing the experiment, the quality of the 13 test streams that were encoded using MDC in the temporal domain was at least 23.6 %, on average 41.58 % and in the best case 62.6 % better than the quality of the corresponding original video streams that were transmitted under the same conditions.

Evaluation of Multiple Description Coding in the spatial domain

As for multiple description coding in the temporal domain, we have evaluated the additional storage space/bandwidth requirement for multiple description coding in the spatial domain. The evaluation results can be found in Table 3. When these results are compared to the results from MDC in the temporal domain (Table 1), it can be seen that the storage/bandwidth overhead resulting from using the temporal multiple description encoder is at least 15.7 %, on average 37.94 % and in the best case 72.53 % lower than using the spatial multiple description encoder. In the last experiment the loss of Description 1 is compared against randomly losing the same amount of data from the original stream. Analyzing the results from Table 4, it can be seen that losing one description still yields a better result than sending the original stream in full quality and randomly losing the same amount of data. Summarizing the evaluation, it can be said that the quality of the 13 test streams
Table 3 Spatial MDC downsampling results

| FileName | Size original stream | Size Descr. 1 | Size Descr. 2 | Overhead Descr. 1 + Descr. 2 |
|---|---|---|---|---|
| bridge | 1.9MB | 1.2MB | 1.3MB | 31.58 % |
| carphone | 382KB | 292KB | 311KB | 57.58 % |
| clair | 226KB | 216KB | 230KB | 95.13 % |
| coastguard | 424KB | 291KB | 298KB | 41.51 % |
| container | 208KB | 131KB | 137KB | 29.81 % |
| foreman | 477KB | 402KB | 437KB | 75.68 % |
| grandma | 511KB | 339KB | 376KB | 39.33 % |
| highway | 1.5MB | 989KB | 1100KB | 39.27 % |
| lotrings | 2.4MB | 697MB | 713MB | 16.67 % |
| mother | 168KB | 143KB | 151KB | 76.79 % |
| news | 274KB | 193KB | 201KB | 43.8 % |
| salesman | 352KB | 223KB | 235KB | 30.11 % |
| silent | 284KB | 210KB | 216KB | 50.7 % |
| superman | 14MB | 9.3MB | 11MB | 21.43 % |
| f1-canada | 7.8MB | 5.9MB | 6.5MB | 61.54 % |
| davinci | 5.8MB | 3.7MB | 4.3MB | 37.93 % |
Table 4 Spatial MDC downsampling results

| FileName | Throughput (Kbit/s) | Original Stream (MOS) | Description 2 (MOS) |
|---|---|---|---|
| bridge | 431 | 1.08 | 3.0 |
| carphone | 372 | 1.47 | 2.26 |
| clair | 111 | 2.57 | 3.29 |
| coastguard | 476 | 1.34 | 1.98 |
| container | 182 | 1.82 | 2.76 |
| foreman | 449 | 1.28 | 1.76 |
| grandma | 310 | 2.13 | 3.28 |
| highway | 375 | 1.13 | 2.35 |
| lotrings | 1003 | 1.19 | 4.71 |
| mother | 161 | 1.83 | 3.28 |
| news | 210 | 1.98 | 2.32 |
| salesman | 210 | 1.98 | 3.01 |
| silent | 210 | 1.66 | 2.63 |
that were encoded using MDC in the temporal domain was at least 7.8 %, on average 16.8 % and in the best case 18.4 % better than the quality resulting from applying MDC to the same streams in the spatial domain under the same conditions.

Conclusion from our Multiple Description Coding evaluation

Due to the much lower storage space and bandwidth overhead as well as the better loss probabilities of the streams, we are using multiple description coding in the temporal domain.
7.1 Searching the Overlay

In order to evaluate the success rate of both unpopular and popular videos, we have executed an experiment with 1,000 peers and 643 videos, where each peer has 5 neighbors and an indexing-cache of up to 50 entries. In the indexing-cache there can be up to 2 entries per movie. Each entry of the indexing-cache specifies a location, i.e., a peer that provides the movie. The association of videos to peers follows a Zipf distribution with α = 1. Each peer issues a search request for all videos in the form of two random walks, each one with a maximum length of 20 hops. The search procedure stops when at least one location for the requested movie has been found.
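The text does not state how the Zipf law is turned into concrete replica counts. Purely as an illustration, the sketch below assumes that the video at popularity rank r is stored on roughly N / r^α of the N peers (at least one), and places the copies on randomly chosen peers. Both this mapping and the function names are our assumptions, not the authors' setup.

```python
import random


def zipf_replica_counts(num_videos, num_peers, alpha=1.0):
    """Assumed mapping: the video at rank r is held by about N / r**alpha peers."""
    return [max(1, round(num_peers / rank ** alpha))
            for rank in range(1, num_videos + 1)]


def assign_videos(num_videos, num_peers, alpha=1.0, seed=0):
    """Place each video on randomly chosen distinct peers."""
    rng = random.Random(seed)
    owned = {peer: set() for peer in range(num_peers)}
    counts = zipf_replica_counts(num_videos, num_peers, alpha)
    for video, copies in enumerate(counts):
        for peer in rng.sample(range(num_peers), copies):
            owned[peer].add(video)
    return owned
```

Under this assumption, with 1,000 peers and α = 1 the most popular video would be present on every peer and the video at rank 10 on 100 peers.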
Fig. 4 Request success per video. Horizontal axis: videos, in order of popularity; vertical axis: number of peers. Curves: (A) video popularity, (B) random network, (C) random network with cache, (D) indexing-cache overlay.
The results are presented in Figure 4, where we show the request success for each video. The horizontal axis represents the videos, in order of popularity (most popular movies on the left side). The vertical axis shows the number of peers that successfully find the specified movie. For comparison purposes, we have included in the figure the success rate of the search procedure of the following cases (the legend, from top to bottom): (A) local-search only, which is actually the Zipf distribution of the videos (i.e., number of peers that have a certain video); (B) random walks in a random network, no notion of cache; (C) random walks in a random network, where each peer stores locally, in a cache, the results (i.e., locations) of the video requests that it had issued; (D) random walks in the indexing-cache overlay; the information from the neighbors (not the search results as before) is cached. The caches of (C) and (D) have the same size and they use the same aging process as replacement policy. The particularity of these two cases is that whenever a
Fig. 5 Average number of found locations for each movie. Horizontal axis: videos, in order of popularity; vertical axis: number of found locations. Curves: random walk of length 10, random walk of length 20, actual number of locations.
random walk containing a video request arrives at a peer, if the peer does not own the video, it searches for the video in the cache. The results for the indexing-cache overlay (D) show that the popular videos are always found and, moreover, most of the unpopular videos are found by at least half of the peers.

Under the same overlay configuration, we have done an experiment that shows the number of locations found for each movie during the search procedure. This time, the search stops only when the maximum random walk length has been reached (otherwise, there would be at most 2 hits). Again, each movie is requested from all peers, and we have computed the average number of locations found in the request path using a random walk of length 10 and 20, respectively. The results are shown in Figure 5, where we have also added, for comparison purposes, the number of real locations of each movie in the overlay. As expected, popular movies are found in more locations, while less popular movies are found in a smaller number of locations. The advantage is that the search procedure already returns multiple locations for the movies that are in at least 2 or 3 locations.

Figure 6 shows an analysis of the request success rate in a random network and in the indexing-cache overlay, while varying the number of issued random walks and their length. The z-axis shows the success rate as the percentage of requests for which at least one location of the requested video was found. The experiments were done for 500 peers and 380 videos with a popularity according to a Zipf distribution with α = 1. Each peer requests each video, making in total 190,000 requests. For both the indexing-cache overlay and the random network, a larger random walk length for the same number of random walks gives a higher request success rate than a larger number of random walks for the same random walk length, since in the latter case the same nodes might be visited repeatedly, which adds no new information.
As can be seen from the figures, the indexing-cache overlay returns a much higher request success rate than the random overlay, even for low values of the random walk length.
Fig. 6 Request success in a random network. Axes: number of random walks, random walk length, success rate. Surfaces: indexing-cache overlay, random network.
7.2 Streaming Scenarios

In this part we show that our model is able to find the best alternative among error correction, error avoidance and a combination of both approaches. To keep the examples comprehensible we pick a small subset of peers. The stream used consists of two descriptions encoded using MDC in the temporal domain. The full stream has a rate of 1462 Kbit/s when it is encoded using I-frames only and 1081 Kbit/s when it is encoded using I-, P- and B-frames. The experiments have been performed using the network simulator NS-2 [10] and a plug-in called EvalVid [5]. Data streams from multiple servers are merged and forwarded as one single stream to the player.

7.2.1 Scenario 1
The following example illustrates the necessity of combining Network Probability and Quality Probability. The content is provided by two alternative peers in different qualities (Alternatives A and B, see Figure 7). The question is which peer to select as streaming source. Alternative A is encoded using only I-frames; alternative B is encoded using I-, P- and B-frames. Calculating only Network Probability (Section 5.1) yields a better result for alternative B (Table 5). When Success Probability (Equation 1) is calculated (Table 6), it can be seen that alternative A yields a better result (because of the higher QualityProbability). In order to verify the success
Table 5 Stream and Network Characteristics

| Alternative | Bitrate (Kbit/s) | Avail.-BW (Kbit/s) | Netw.-Probability |
|---|---|---|---|
| A | 1462 | 1257 | 0.86 |
| B | 1081 | 989 | 0.90 |
Fig. 7 Logical view - Scenario 1
Table 6 Success Probability - Scenario 1

| Alternative | Success Probability | MOS |
|---|---|---|
| A | 0.66 | 4.02 |
| B | 0.5 | 3.50 |
probability calculation of our model, both decisions are simulated. To compare the quality of the alternatives, the Mean Opinion Score (MOS) metric [5] is used again. By sending both streams, it can be seen that considering Network Probability alone is not sufficient and that selecting alternative B would have been the wrong decision. The MOS values of alternatives A and B are 4.02 and 3.50, respectively (see Table 6). Alternative B (the one with the lower bit rate) scores worse because of the temporal dependencies on the frames that were lost. It can be seen that our model is able to select the better alternative. This small example shows that computing the ratio between the available bandwidth and the bit rate of the stream is not sufficient to decide between alternative streaming sources, because the structure of the stream has a strong influence on the resulting quality.

7.2.2 Scenario 2
In the second scenario it is assumed that two peers provide the requested content in the same quality. The problem is that the network bandwidth is not sufficient to send any description without loss. Both network paths to the receiver have an average bandwidth of 300 Kbit/s, while the required bandwidths for sending description 1 and description 2 are 539 Kbit/s and 542 Kbit/s, respectively. The question is whether to send one description stream from each of the peers (and accept some loss) or one description stream and one forward error correction stream. When success probability is calculated, it can be seen that sending one description plus one forward error correction stream is better than sending two descriptions (see the higher value of 0.88
Fig. 8 Logical view - Scenario 2

Table 7 Success Probability - Scenario 2

| Alternative | Success Probability | MOS |
|---|---|---|
| 2 Descriptions | 0.62 | 2.25 |
| 1 Description + FEC | 0.88 | 2.45 |
compared to 0.62 in Table 7). In order to verify the success probability (Equation 1), both decisions are simulated. The MOS values from the simulations are also listed in Table 7. It can be seen that the result from sending one description and one forward error correction stream is 8.2 % better than sending two descriptions (see the higher MOS value of 2.45 compared to 2.25). The reason that sending two descriptions leads to a worse result than sending one description and one forward error correction stream is that neither of the two descriptions can be fully received.
8 Conclusions

We have presented a semantics-based model for selecting between network-error avoidance, network-error correction and a combination of both approaches to deliver multimedia streams over best-effort networks in the desired quality. This model was presented in the context of a low-cost indexing-cache overlay that has been shown to deal well with requests for both popular and rare videos and, moreover, to locate multiple peers having the same video. The error handling model is based on considering network characteristics in combination with stream characteristics. The evaluation has been performed by doing simulations using varying stream and network characteristics within NS-2. The simulation results show that the model can be used to take the decision (error avoidance, error correction or the combination of both) that allows the system to deliver the stream in the best quality.
References

1. Lamparter, B., Boehrer, O., Effelsberg, W., Turau, V.: Adaptable forward error correction for multimedia data streams. Technical Report TR-93-009, University of Mannheim (1993)
2. Liu, H., Ma, H., Zarki, M.E., Gupta, S.: Error control schemes for networks: An Overview. Mobile Networks and Applications 2(2), 167–182 (1997)
3. Lee, I., Guan, L.: Reliable video communication with multi-path streaming using MDC. In: IEEE International Conference on Multimedia and Expo, ICME 2005 (2005)
4. Boyce, J.M., Gaglianello, R.D.: Packet loss effects on MPEG video sent over the public internet. In: Multimedia 1998: Proceedings of the sixth ACM international conference on Multimedia, pp. 181–190. ACM Press, New York (1998)
5. Klaue, J., Rathke, B., Wolisz, A.: EvalVid - a framework for video transmission and quality evaluation. In: Computer Performance Evaluation/Tools, pp. 255–272 (2003)
6. Padhye, J., Firoiu, V., Towsley, D., Kurose, J.: Modeling TCP throughput: A simple Model and its Empirical Validation. In: SIGCOMM 1998: Proceedings of the ACM SIGCOMM 1998 conference Applications, Technologies, Architectures and Protocols for Computer Communication, pp. 303–314. ACM Press, New York (1998)
7. Park, K., Wang, W.: QoS-sensitive transport of real-time MPEG video using adaptive forward error correction. In: IEEE International Conference on Multimedia Computing and Systems, vol. 2, pp. 426–432 (1999)
8. Claypool, M., Zhu, Y.: Using interleaving to ameliorate the effects of packet loss in a video stream. In: ICDCSW 2003: Proceedings of the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, p. 508. IEEE Computer Society, Los Alamitos (2003)
9. Maxemchuk, N.F.: Dispersity Routing in Store-and-Forward Networks. PhD thesis, University of Pennsylvania (1975)
10. The Network Simulator NS-2 (v2.1b8a) (October 2001), http://www.ns-2.com
Semantics in the Field of Widgets: A Case Study in Public Transportation Departure Notifications

Alena Kovárová and Lucia Szalayová
Abstract. Widgets are becoming increasingly present in our everyday routines, which makes their portability and reusability desirable properties. As a particular example, we consider public transportation passengers who use the internet extensively to make their lives simpler. In order to minimize the time spent at the bus stop, they usually check their bus line departures on the web before they leave their homes or offices. For this purpose, there exist several internet portals providing information on local transportation time schedules. This chapter starts by presenting a better (quicker and easier) way of obtaining the same information: an adaptive desktop widget with a comfortable user interface. The second step is the utilization of the semantics of the considered data in order to make the widget portable across different data sources of the same domain.
Alena Kovárová
Faculty of informatics and information technologies, Slovak University of Technology, Bratislava, Slovakia

Lucia Szalayová
Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic

M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 93–107. springerlink.com © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

Due to the continually growing volume of information that is made freely available online, people often find themselves in the inconvenient situation where they have to invest disproportional effort and time in order to interact with the information sources they use. Everyone subconsciously or consciously estimates how long it will take to obtain the desired information and, more importantly, whether this information is worth this time and effort. This process includes, for example, decisions such as which electronic newspaper to read, which sports section to monitor, which broadcast to watch, which web pages contain relevant information and so on. This is of course a daily struggle; most of us would appreciate the time-saving and effort-saving option of having
this “personalized” information wait for us somewhere nicely aligned. To come as close as possible to this vision, we end up choosing a favorite newspaper, favorite channels and programs, favorite web pages; simply said: favorite information sources. But this is still not enough; even within these favorite sources it is still necessary to search and to filter. This simply reflects the fact that the majority of the available information sources are built for the masses and therefore do not implement any personalization or personal adaptation features to serve the needs of each individual person.

The abovementioned quest for information can be divided into three distinct types, depending on whether someone is searching for:

1. general knowledge (whether in an unknown or a known field)
2. specific information in an unknown field
3. specific information in a known field

In this work we focus on information quests of the third type. This means that a user is interested in specific information from some known area and that he knows where to search for it and how to filter the information that is available at that location; the user already has a favorite source for this information. In other words, in our case the user is able to formulate his requirements in greater detail and to be explicit. Examples of such requirements could be: “I want to monitor this specific list of stocks on the stock-market and I have no interest in the fluctuation of other stocks or of the general index.” or “I need to have the current weather forecast for the city where I live and I prefer to have it in textual and image form.” While this is a known area and the experienced user knows where and how to find (manually) the information he is interested in, the problem that remains is how to transform such a requirement into a computer language so that a computer can look for the information (automatically) instead of the user.
To understand this problem practically, let’s have a look at a specific case, the average morning of a “John”. John looks for information on bus departures from home to work. Opening the relevant web page in his web browser always takes John at least a few seconds. The time depends on the degree to which John is capable of customizing the system he is working with and also on how much the available settings (if they exist as an option) allow him to speed up obtaining his desired information. Our goal is twofold: to minimize this time and to relieve the user from the manual customization of the information source. Clearly John's (and also our) requirement can be formulated like this: "I want to know when my bus is leaving from where I am now and in the usual direction." It is important to notice the words "my", "where" and "usual", because these assume that an application is able to estimate the number of his bus, where he is and in which direction he wants to travel. Once an application fulfilling this requirement is developed, a second question arises: "If John moved, could he still use the application?" If the widget worked at a semantic, rather than flat information, level, then
that kind of portability would be possible too, allowing John to continue using the tool he is accustomed to, even when his own circumstances and context change. The design and development of such a tool is the objective of this chapter. The remainder of this chapter is structured as follows: Section 2 contains a brief survey of different solutions for the retrieval of desired information via browsers and widgets. We point out their pros and cons in view of our purpose. Section 3 describes our widget, starting with possible data sources and continuing with a system overview and the basic functionality of the widget. We also explain the widget architecture and take a closer look at its data model. Section 3 closes with an evaluation of the widget. Section 4 deals with semantics and the corresponding ontology model, which could make the widget independent of the data source. We compare our model with other ontology models that belong to the same area but are based on different requirements. Section 6 lists our concluding remarks.
2 Related Background Let's have a closer view of the user's possibilities of • • • • •
searching, filtering, retrieving the web data within specific site, customizing web-application for his own benefit how to obtain some web page the quickest way
It is the same for any kind of problem of the third type, when the user knows where to search and how to filter. So our first question is: What is the usual way to obtain information from the Internet? Leaving aside highly specialized web applications, it is the well-known browsing.
2.1 Traditional Access to Resources on the Web Using Web Browsers
The most widely used tools are internet browsers, e.g. Microsoft Internet Explorer, Mozilla Firefox, Safari or Opera. How can the average internet browser save time? The user can adjust some settings, e.g. save his favorite web page via "Add to Favorites", set some pages as his "home page", or choose "Show the windows and tabs from the last time" when the browser starts. Such settings let the user control which web pages are opened, but there is no way to specify or ask for specific information within a web page (if the user wants just a part of the page). Microsoft Internet Explorer 8 brought a small improvement with Web Slices, which use simple HTML markup to represent a clipping of a web page, enabling users to subscribe to content directly within a web page¹.
¹ Internet Explorer 8: Features – Web Slices, http://www.microsoft.com/windows/internet-explorer/features/easier.aspx
A. Kovárová and L. Szalayová
To move closer to user needs, next to generic internet browsers we find site-specific browsers. This type of browser is designed to create a more comfortable environment for the user, especially when browsing "favorite" sites, e.g. for e-mail or on different social networks. Examples of site-specific browsers are Fluid (for Mac OS X), Mozilla Prism, Google Chrome and Bubbles. They are web applications that have the same core as web browsers, but from the outside they look like desktop applications. They offer drag & drop and many other convenient features, and some have settings that can be set up manually so that the user obtains his information even more quickly than in a web browser. However, they still do not guess the user's focus, do not allow filtering (specifying which part of which web page is wanted) and do not offer a choice of presentation. Apart from the bookmarking systems built into web browsers, users can take advantage of bookmarking web services, such as the social bookmarking system Delicious² (formerly del.icio.us). Such services let users organize their bookmarks with tags and make the bookmarks available independently of the user's location and browser. Another option that can significantly speed up access to relevant information is personalized and adaptive web-based systems [2], especially when combined with site-specific browsers. An appropriately trained personalized web-based system can often display the information the user is looking for directly on the first page.
2.2 Widgets
Without implementing our own engine or a robust system, a solution to our problem could be found among widgets (sometimes also called gadgets). In our context, these are not interface elements that help the user to navigate, orient himself or make a choice. They are single-purpose mini (web) applications, which typically have a minimal size and deliver one simple result while the user is working with the computer. Their functionality is oriented to one specific goal: to display very specific information. They can be of two types, either for the web (web widgets) or for the desktop (widgets) [3]. The latter can run on computers as well as mobile devices [1]. In this work we focus on desktop widgets for computers, which can be freely placed and easily combined on the desktop. The most often used engines for widgets or gadgets are:
• Konfabulator³ from Yahoo! for Windows XP+ and MacOS
  o known as Yahoo! widgets⁴
• Windows Sidebar from Microsoft for Windows Vista
  o sidebar with gadgets on the Windows Vista desktop
² Delicious – social bookmarking, http://delicious.com/
³ Konfabulator, Reference manual, Version 4.5, http://manual.widgets.yahoo.com/
⁴ Yahoo! widgets, http://widgets.yahoo.com/win
• Google Desktop Gadgets⁵ from Google for Windows XP+
  o in the form of Google Desktop
• Opera Widgets⁶ from Opera for Beta MacOS 10.5 and Windows XP+
• Dashboard⁷ from Apple for MacOS 10.5
  o as a second desktop with widgets
• Joost Widgets from Joost 1.0 Beta for Mac OS 10.5, Windows XP and Windows Vista
Most of them use a kind of API that processes mainly HTML, JavaScript, XML and CSS files. There are some differences between the desktop widget and gadget offerings of different vendors. From the user's perspective, some widgets are represented by views or icons located in a standard sidebar of the desktop; the widget becomes active only after a click, when the icon expands onto the desktop. After this, the widget can be relocated as desired. Some gadgets, on the other hand, occupy a sidebar almost twice as wide and provide their service the whole time they are active. After a click, such a gadget expands and offers a richer or more detailed service, but it can be relocated only within the sidebar. From an implementation point of view, the user has three possibilities for obtaining his own personal widget. The user should first decide which API he wants to use and, if it is not already a part of his system or application, install it. The three choices are then:
1. To find a complete widget on a web page offering many of them, download it, set it up manually and use it.
2. To read a tutorial for extending a generic widget and follow simple instructions to create a specific one.
3. To read a tutorial for developers and program his own widget.
Which of these three is chosen depends strongly on the type of information to be displayed (the way of displaying it is not taken into account here). Just like site-specific browsers, complete widgets cover the demands of the majority. Non-standard requirements are therefore not covered by the first choice.
If a service such as an RSS feed or a web service already exists that can be queried for the information, the second choice is sometimes enough. If neither a complete widget nor a service exists, the only choice is the third. The third choice also gives the developer room to implement features that offer the user some kind of personalization. In general, however, there is no effort to implement single-purpose widgets with broad usage (i.e. independent of the information source); widgets that obtain information from the Internet are all oriented to exactly one site or exactly one web service. The reason is that there is no standardization of these sites or services that such a widget could rely on.
⁵ Google Desktop, http://desktop.google.com/index.html
⁶ Dev.Opera, http://dev.opera.com/articles/view/creating-your-first-opera-widget/
⁷ Dashboard widgets for Mac OS X Dashboard, http://www.apple.com/downloads/dashboard
3 Public Transportation Departure Widget
Based on the survey presented in the previous section, we can implement a widget that satisfies John's request: "I want to know, when my bus is going from where I am now and in the usual direction." Widget technology is suitable for this purpose, because the request needs very little space on the user's desktop to display the relevant information: the nearest departures of the chosen (guessed) line from the chosen stop in the chosen direction of the city's public transport. This tiny desktop application is suitable mostly for laptop owners (whose mobility can increase their need for transportation) as well as for any computer user who is interested in the schedule of his or her favorite line. Here and in the following section we explain what the needs of our user are, which features of the widget fulfill these needs and how they work together. Finally, we tackle the two main related theoretical questions: "Would it be possible to use metadata describing data semantics in order to make the widget independent of an information source?" and "What should the ontology model look like?" Our first key step was to look for suitable information sources (to show that they are not good enough, and to choose one of them as our data source); the second was to gather the user's requirements for the application.
3.1 Sources of Public City Transport Departures in Bratislava
There are three well-known web sources of public transport information for Bratislava. All of them are briefly described below, with emphasis on what they offer the user. The first source is the web site http://www.dpb.sk [4], which is administered by the public transportation provider for the capital city of Bratislava in Slovakia. Reaching the information (there are only timetables with departures) is relatively complicated and requires manual action: six steps are needed within the browser. This source is therefore not very popular among users. Moreover, it is not possible to personalize these pages. The second source is the web site http://www.imhd.sk [7] (imhd), which is probably the most used. It offers, for example, a useful feature for searching stop-to-stop connections. An attractive service of imhd is e-mail notification of current changes, service exclusions, news and other useful information. The possibilities for personalization are very limited, so searching for relevant information is not fast. The last and most recent source is the web site http://www.cp.sk [5], the National information system of timetables for Slovakia. This web site offers all kinds of timetables for trains, buses, flights and the various public city transports within Slovakia. Taking into account only public city transport, the user can find his route by setting the starting stop, the last stop and the time of departure or desired
arrival. The connection found may include interchanges, but the user can also ask for direct connections only. The other options are to get the entire timetable for one line at some stop for a set date, or to get the schedule for one bus and its route. The only possible personalization is to save the displayed page as a favorite. From the above it is clear that there is no service that would give us the required information on demand; there are only different web sites. Since our requirement is so specific, we had to take the third choice: to program our own widget. After evaluating the pros and cons of the different widget APIs, we decided to implement the widget using Konfabulator, and we chose imhd as the data source for our widget, because its HTML code was structured well enough to be parsed into our database.
3.2 System Overview
The widget with line departures is not meant to substitute any of the above-mentioned information sources. To explain the difference, imagine the following scenario. John is at work. He knows which buses stop next to the building and which one is suitable for him. But he does not remember its departures and just wants to know when his bus comes next, because he does not want to stand at the bus stop for ages. Of course, he does not want to browse the Internet, where he either has to click many times or has to fill in input boxes with the same strings every time. He used to print out the entire timetable for his bus, but he always needed to check the time and search for the relevant value on paper. John is not interested in transfers between lines, and he is not searching for the quickest or cheapest route. He does not need to know when he will arrive at his destination. As already mentioned, our two goals are to minimize time and effort and to minimize manual customization. In other words, we want to fulfill John's requirement in a way that minimizes the number of his actions and accelerates access to the information. A widget that achieves this first needs some input and output. An example of input is the user choosing a line number. This input is continuously monitored, which enables our widget to adjust to the user. The output is the view displayed to the user. Our output is the desired departures, which are either loaded from a local database or downloaded from the web. When a download is triggered, the new data are stored in the local database. The last use case for the user is the possibility to set up predefined locations (Fig. 1, upper part), which enables the user to adjust the widget from the first use. As departure schedules change from time to time, these changes need to be propagated to the local database so that the user gets the most up-to-date information.
This updating process can run automatically every week, but the user can switch it off at any time. There is also a use case for automatic clean-up, which erases data that are old and unused. Most important of all is keeping the displayed data (the current departures) fresh, which is the last use case of the time actor (Fig. 1, lower part).
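The "local database first, download only when needed" behaviour described above can be sketched as follows. This is a minimal illustration in Python rather than the widget's own JavaScript; the table layout, the function name `get_departures` and the `fetch` callable (standing in for the Downloader and Parser) are assumptions made for the example, not part of the original implementation.

```python
import sqlite3

def get_departures(conn, line, fetch):
    """Return departure times for a line, downloading only when not stored locally.

    `fetch` is a hypothetical callable that stands in for download + parsing.
    """
    rows = conn.execute(
        "SELECT time FROM departure WHERE line = ? ORDER BY time", (line,)
    ).fetchall()
    if rows:
        return [r[0] for r in rows]      # served from the local database
    times = fetch(line)                   # network access only on a miss
    conn.executemany(
        "INSERT INTO departure (line, time) VALUES (?, ?)",
        [(line, t) for t in times],
    )
    conn.commit()
    return sorted(times)

# Usage with an in-memory database and a fake downloader (sample data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departure (line TEXT, time TEXT)")
calls = []

def fake_fetch(line):
    calls.append(line)
    return ["07:45", "07:15"]

first = get_departures(conn, "39", fake_fetch)
second = get_departures(conn, "39", fake_fetch)
```

The second call is answered from the local table, so the fake downloader is invoked only once.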
Fig. 1 Use case diagram of widget system
3.3 Widget Basic Functionality
The basic functionality of the widget is to display the upcoming five departures of the selected line from the chosen stop in the set direction (Fig. 1, case Input choices). To get this, the user goes through three steps, which should be done in a proper and intuitive order:
1. Select a line number from a list within the dropdown menu (Fig. 2, point 1). Selection is needed only if the user does not want the automatically chosen line.
2. Change the direction with a simple click (Fig. 2, point 3). This is needed only if the widget wrongly proposed the opposite direction.
3. Choose a stop from a list within the dropdown menu (Fig. 2, point 2). Only those stops are shown that belong to the previously selected line. This selection has to be made only if the automatically chosen stop is not the wanted one. When a line is selected for the first time, the first stop of that line is preselected.
After these three steps, whether they were done automatically or by the user, the upcoming five departures (from the current time) are displayed. For each departure the widget displays the line number, direction, departure time and time left in minutes (Fig. 2, point 4). To keep the data current at all times, the display is refreshed every minute.
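Selecting the upcoming five departures and the minutes left can be sketched as below. The function name `upcoming` and the sample timetable are hypothetical, the sketch is in Python for illustration only, and it deliberately ignores departures after midnight.

```python
from datetime import datetime

def upcoming(departures, now, count=5):
    """Return (HH:MM, minutes_left) for the next `count` departures at or after `now`.

    Simplification: only departures on the same calendar day are considered.
    """
    result = []
    for hhmm in sorted(departures):
        hour, minute = map(int, hhmm.split(":"))
        when = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if when >= now:
            minutes_left = int((when - now).total_seconds() // 60)
            result.append((hhmm, minutes_left))
    return result[:count]

# Sample data only; re-running this once a minute keeps the display current.
timetable = ["07:30", "08:00", "08:10", "08:20", "08:30", "08:40", "08:50"]
now = datetime(2009, 9, 17, 7, 58)
next_five = upcoming(timetable, now)
```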
Fig. 2 Widget description
To relieve the user of constantly checking how many minutes remain until a departure, we also implemented one extra feature: sound. The widget can announce the time of the next departure, e.g., "Next bus arrives at 12:00. That is in 3 minutes." Of course, this function can be turned off (Fig. 2, point 5). Finally, every application should have a Help (Fig. 2, point 6). Our Help contains a user manual. To make the widget more user friendly, we gave the user the possibility to set up his favorite locations manually (Fig. 1, case Pre-defined location settings). For every location the user can choose several lines (with their respective stops and directions) that he usually travels with, for example from school or from the office. The user can name the route, e.g., "school->home". The output is the same as within the basic functionality, except that the upcoming five departures may differ in line number and name of stop. Departures are ordered in the usual way, according to time of departure (Fig. 3).
Fig. 3 Widget setup for multiple lines within one route (in Slovak language, translation of route: Home -> Work)
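For a route with several lines, the per-line departure lists have to be merged into one list ordered by time. A minimal Python sketch follows, assuming each list is already sorted and uses zero-padded HH:MM strings; the line numbers, stop names and helper names are made up for illustration.

```python
import heapq

def route_departures(per_line, count=5):
    """Merge sorted per-line departure lists into one list ordered by time."""
    streams = [
        [(time, line, stop) for time in times]
        for (line, stop), times in per_line.items()
    ]
    # heapq.merge relies on every input stream being sorted already.
    return list(heapq.merge(*streams))[:count]

def announcement(time, minutes_left):
    """Build the spoken message for the next departure."""
    return f"Next bus arrives at {time}. That is in {minutes_left} minutes."

# Hypothetical route "Home -> Work" served by two lines from two stops.
home_to_work = {
    ("39", "Stop A"): ["07:05", "07:25", "07:45"],
    ("31", "Stop B"): ["07:10", "07:20", "07:50"],
}
merged = route_departures(home_to_work)
message = announcement("12:00", 3)
```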
The last of the basic widget functionalities is the widget's ability to adjust to the user's needs. As we do not use any other information sources (e.g. browsing history) to find out the user's usual bus stops and bus lines, the widget starts with an empty database (except for default data). While the user uses the widget, it monitors his choices and stores the number of selections of each choice in the local database (together with downloaded data). As a result, the most often chosen option can be preselected automatically, which accelerates access to the service.
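The counting scheme described above can be illustrated with a few lines of Python. The class and method names are hypothetical, and the counts are kept in memory here, whereas the widget stores them in its local database.

```python
from collections import Counter

class UserProfiler:
    """Counts how often each (line, stop, direction) choice was selected."""

    def __init__(self):
        self.counts = Counter()

    def record(self, line, stop, direction):
        self.counts[(line, stop, direction)] += 1

    def preselect(self, default):
        """Return the most frequently selected choice, or `default` if none yet."""
        if not self.counts:
            return default
        return self.counts.most_common(1)[0][0]

# Sample usage with made-up choices.
profiler = UserProfiler()
default_choice = ("39", "Stop A", "outbound")
before = profiler.preselect(default_choice)
profiler.record("31", "Stop B", "inbound")
profiler.record("39", "Stop A", "outbound")
profiler.record("31", "Stop B", "inbound")
after = profiler.preselect(default_choice)
```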
3.4 System Architecture
We chose Konfabulator as the engine for our widget. This means that we programmed mainly in XML and JavaScript and used the supported SQLite for our local database. Our system can be divided into the following parts (Fig. 4). The GUI (Graphical User Interface) sends data (user choices) to the Task manager and, according to them, can ask the Task manager for new data from the local database. The GUI can also send information about the user's choices to the User profiler. The User profiler updates the number of the user's selections in the database and remembers the user's settings, including his favorite locations and routes. Any time the user chooses a line number, stop or direction, its relevance rises. The Task manager
• updates the GUI (departures), either because time has passed or because the user made a different choice,
• updates the local database (with data downloaded from the public transport information provider, if an Internet connection is available) and
• cleans up the local database: to optimize performance, the Task manager periodically erases the least selected lines from the database.
Fig. 4 Conceptual architecture of the public transportation departures widget
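The clean-up task of the Task manager can be sketched against SQLite as follows. The table name `line`, its columns and the `keep` threshold are assumptions made for this example; the chapter does not state how many lines are retained or how often the clean-up runs.

```python
import sqlite3

def clean_up(conn, keep=2):
    """Delete all lines except the `keep` most frequently selected ones."""
    conn.execute(
        "DELETE FROM line WHERE number NOT IN ("
        " SELECT number FROM line ORDER BY selection_count DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()

# Sample data only: three stored lines with different selection counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line (number TEXT PRIMARY KEY, selection_count INTEGER)")
conn.executemany(
    "INSERT INTO line VALUES (?, ?)", [("39", 5), ("31", 1), ("83", 3)]
)
clean_up(conn, keep=2)
remaining = sorted(row[0] for row in conn.execute("SELECT number FROM line"))
```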
The Downloader downloads the given web page; an Internet connection is therefore needed when the user wants to download new timetables or a new calendar. The input of the Parser is raw data (the HTML code of a web page), which is parsed and stored in the respective columns of the local database, from where it is loaded for the user on request.
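To give an idea of what such a Parser does, the sketch below extracts departure times from table cells using Python's standard `html.parser` module. The markup in the example is invented: the actual structure of the imhd pages is not given in this chapter, so the tag names and the time format are assumptions.

```python
from html.parser import HTMLParser
import re

class DepartureParser(HTMLParser):
    """Collects H:MM or HH:MM strings found inside table cells.

    The parser is event based, so unclosed tags do not stop it.
    """
    TIME = re.compile(r"^\d{1,2}:\d{2}$")

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.times = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_cell and self.TIME.match(text):
            self.times.append(text)

# Hypothetical page fragment with deliberately broken tag pairing.
html = "<table><tr><td>Line 39<td>7:05</td><td>7:25<tr><td>x</td><td>7:45</td></table>"
parser = DepartureParser()
parser.feed(html)
parser.close()
```

The collected `parser.times` list would then be written to the departure table of the local database.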
3.5 Data Model
Parsing one web page takes several seconds, which is contrary to our goal. We therefore needed to store the data in our local database. The most important data to store are the lines, their stops and the departures from terminal stops. Because timetables differ depending on the type of day, we extended our database with two small separate tables for public and school holidays (Fig. 5). The line table contains data about the lines previously loaded by the system. Learning is applied to lines, so one of the attributes holds the incrementally updated selection count of the line. The line stops table is filled by parsing the left part of the schedule list. It contains information about the stops of the respective line and the time lag between each two consecutive stops on the route. Here the learning capability of the system is realized by incrementing the stop selection count, i.e. the number of selections of the stop for a specific line and direction. The departure table represents departure times from the terminal stop, so the departure time at a specific stop is calculated from the initial departure time plus the sum of the time lags up to the desired stop. As departures differ according to the type of the current day (working day, weekend, public holiday or school holiday), this is also taken into consideration.
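The calculation of departure times at an intermediate stop can be written as $t_{\text{stop}} = t_{\text{terminal}} + \sum_{i<k} \Delta_i$, where $\Delta_i$ is the time lag between stops $i$ and $i+1$ and $k$ is the index of the desired stop. A minimal Python sketch, with hypothetical names and sample lags:

```python
from datetime import datetime, timedelta

def departures_at_stop(terminal_departures, lags, stop_index):
    """Shift each terminal departure by the summed time lags up to `stop_index`.

    `lags[i]` is the travel time in minutes from stop i to stop i + 1,
    so stop_index 0 is the terminal stop itself.
    """
    offset = timedelta(minutes=sum(lags[:stop_index]))
    shifted = []
    for hhmm in terminal_departures:
        base = datetime.strptime(hhmm, "%H:%M")
        shifted.append((base + offset).strftime("%H:%M"))
    return shifted

# Sample data only: lags of 2, 3 and 1 minutes between four stops.
at_third_stop = departures_at_stop(["07:00", "23:58"], [2, 3, 1], 2)
at_terminal = departures_at_stop(["07:00"], [2, 3, 1], 0)
```

Storing only terminal departures and lags keeps the database small, at the cost of one addition per displayed departure.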
Fig. 5 Logical data model of the widget database
The previously mentioned day differentiation is done by recognizing the day of the week (working day or not), whereas public and school holidays are recognized using separate tables listing these special days. The list of school holidays is updated yearly; it can be obtained from the site
of the Ministry of Education in Slovakia⁸. The attribute region is necessary, because school holidays in our country differ by region.
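The day-type recognition can be sketched as a small function. The order in which the checks are applied (public holiday, then weekend, then school holiday) is an assumption, as is the function name; the dates are sample data, and the school holiday set is assumed to be already filtered by the user's region.

```python
from datetime import date

def day_type(day, public_holidays, school_holidays):
    """Classify a date into one of the four timetable day types."""
    if day in public_holidays:
        return "public holiday"
    if day.weekday() >= 5:           # 5 = Saturday, 6 = Sunday
        return "weekend"
    if day in school_holidays:       # already filtered by region
        return "school holiday"
    return "working day"

# Sample holiday tables (illustrative dates only).
public = {date(2009, 9, 15)}
school = {date(2009, 10, 29)}
types = [
    day_type(date(2009, 9, 17), public, school),
    day_type(date(2009, 9, 19), public, school),
    day_type(date(2009, 9, 15), public, school),
    day_type(date(2009, 10, 29), public, school),
]
```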
3.6 Evaluation
The evaluation was done among students of the Faculty of Informatics and Information Technologies of the Slovak University of Technology in Bratislava. Tests were performed by 10 volunteers who use a computer on a daily basis. Their task was to download a new line (of public transport) into the application and to display departures for one of its stops. When the widget started, instruction guidelines were displayed, but the testers usually skipped them. Once the testers realized during their first attempts that the widget displays only one default line, they used the guidelines to find out how to extend the widget's functionality. Overall, it generally took users less than three minutes to find the desired link information. Testers noticed one specific behavior of the application: because of data parsing, the widget could not be operated for a moment after the URL was set. This behavior was documented in the guidelines beforehand. One special feature of the widget is sound: the widget can announce the time of the next departure. This feature was also tested (the speech was produced using the Windows functionality for automatic reading of a given text). The users rated this voice functionality as very popular, and the widget as a whole was rated very positively. No negative features were found. The testers made one recommendation: to display the departures centered within the frame. The system has been implemented according to its design. Several pitfalls occurred during the implementation. One of the most complicated was the poorly structured HTML code of the imhd pages. Rules for paired tags were broken many times, which forced us to study the source code in depth. It was necessary to identify key points within the HTML code that mark the sections to be loaded. This approach is complicated both to implement and to execute.
For this reason automatic data updating has not been implemented: updating the database (departures of a few lines) would take several minutes, and during this time the widget would be out of order. Updating can therefore be done only when initiated by the user, in the same way as adding a new line.
4 Extending the Widget with Semantics
Let us come back to the question posed earlier in this chapter: "If John would move, could he still use the application?" It would be desirable for our widget to keep working even with a different information source that offers the same type of information, i.e. line departures. This idea assumes that the provider supplies the data together with their semantics. Such providers are very rare, as are widgets working with such data; web widgets are more common, e.g., in the project of Eetu Mäkelä and colleagues [6].
⁸ The web site of the Ministry of Education in Slovakia, http://www.minedu.sk
But the principle is the same, so we created our own ontology model (Fig. 6) to represent the semantics and relations within the data we are working with, i.e. the data we parse, store and display in our widget. This ontology model includes all three main tables and their attributes from our data model.
[Figure 6: ontology diagram. Concepts: Vehicle type, Line, Stop of the line, Direction, Stop, Terminal stop, Departure, Type of day. Relation labels legible in the diagram: has number, is vehicle type, has label, has stop*, has time-shift in minutes, has direction, has order, is at stop, from, to, from stop, has departure*, is scheduled at, is valid in, rdf:type. Datatypes: xsd:int, xsd:string, xsd:time.]
Fig. 6 Ontology model of data from public transportation departures widget
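One way to picture data described by such a model is as a set of subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The property names are approximations of the labels in Fig. 6, and the way they are attached to individuals (for example, which concept owns "has departure") is an assumption made for illustration; all individuals and values are invented.

```python
# Each triple is (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("line39", "rdf:type", "Line"),
    ("line39", "hasNumber", 39),
    ("line39", "isVehicleType", "bus"),
    ("line39", "hasStop", "line39_stop2"),
    ("line39_stop2", "rdf:type", "StopOfTheLine"),
    ("line39_stop2", "isAtStop", "stopB"),
    ("line39_stop2", "hasOrder", 2),
    ("line39_stop2", "hasTimeShiftInMinutes", 5),
    ("terminalA", "rdf:type", "TerminalStop"),
    ("terminalA", "hasDeparture", "dep1"),
    ("dep1", "rdf:type", "Departure"),
    ("dep1", "isScheduledAt", "07:00"),
    ("dep1", "isValidIn", "workingDay"),
]

def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

# A widget reading such data needs only property names, not page layout.
stops_of_line = objects("line39", "hasStop")
shift = objects("line39_stop2", "hasTimeShiftInMinutes")[0]
scheduled = objects("dep1", "isScheduledAt")[0]
```

A client written against property names like these would not depend on the HTML layout of any particular provider, which is the portability argument made in this section.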
To check the compatibility with a provider, let us assume that the provider uses the same model as presented by Junli Wang and his colleagues [8]. Their model is not meant for widgets, but it also deals with public transportation. It is oriented towards public transport queries such as transfer trip schemes, route queries and station queries. That is a wider part of the public transport domain than ours, so their ontology model is wider too (Fig. 7). If we omit the concepts that we do not use in our model and set aside those that are the same, only one serious difference is noticeable: the concept of a route with its timetable. We have nothing like this in our model, because we can calculate it from the departure from the terminal stop plus the time-shift to the selected stop. Our model expects the timetable of departures (from the terminal) without knowing the last stop, whereas theirs always needs the first and the final stop to be set. This leads us to two conclusions. The first is that our widget would not work with their ontology model unless we reimplemented it. The second is that our ontology model is better, since we use departures, which are a semantically lower-level concept than a route: a route can easily be calculated from departures.
Fig. 7 Urban public transport ontology [8]
5 Conclusions
It was already well known that it is possible to implement a widget (as a client) that downloads and parses data from some web source (server side). Moreover, such a widget can be personalized, because it can adjust itself to best serve the user, thus making the retrieval of information more comfortable and quick. This adaptation is achieved by monitoring the user's choices and storing the number of selections of each choice in the local database. The only disadvantage is that such a widget is totally dependent on its data source. In this chapter, in order to make such widgets portable across different web sources in the same domain, we proposed the creation of an ontology model that reflects data semantics. We created such a model and compared it with another one from the same domain but with a different purpose. The comparison showed that the two ontology models differ in their main concept. This conclusion implies that, although it is useful to use semantics in the widget (as in any other client application), it will work only if the server provides data with the same semantics. Regarding further applications of the work presented here, the widget could benefit from such a semantic model, which could also be applied in other kinds of systems with regular departures, e.g., logistics or catering. At the same time, our ontology model can be extended to serve other purposes as well, e.g., route planning.
Acknowledgement. This work was partially supported by the Scientific Grant Agency of Slovak Republic under the contract No. VG 1/0848/08.
References
[1] Boström, F., Nurmi, P., Floréen, P., Liu, T., Oikarinen, T., Vetek, A., Boda, P.: Capricorn - an intelligent user interface for mobile widgets. In: Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services, MobileHCI 2008, pp. 327–330. ACM, New York (2008)
[2] Brusilovsky, P., Millán, E.: User Models for Adaptive Hypermedia and Adaptive Educational Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) Adaptive Web 2007. LNCS, vol. 4321, pp. 3–53. Springer, Heidelberg (2007)
[3] Caceres, M.: Widgets 1.0: The Widget Landscape. W3C (2008), http://www.w3.org/TR/widgets-land/ (accessed 17 September 2009)
[4] Dopravný podnik Bratislava, a.s. (company, provider), Public transportation for the area of the capital city Bratislava (web site), http://www.dpb.sk (accessed 17 September 2009)
[5] INPROP, s. r. o. (company, provider), National information system of timetables for Slovakia (web site), http://www.cp.sk/ (accessed 17 September 2009)
[6] Mäkelä, E.: Enabling the Semantic Web with Ready-to-Use Web Widgets Export. In: Nixon, L.J.B., Cuel, R., Bergamini, C. (eds.) Proc. of the First Industrial Results of Semantic Technologies Workshop (FIRST 2007), pp. 56–69 (2007)
[7] mhd.sk (citizen union, provider), imhd.sk (web site of public transportation for the area of the capital city Bratislava), http://www.imhd.sk (accessed 17 September 2009)
[8] Wang, J., Ding, Z., Jiang, C.: An Ontology-based Public Transport Query System. In: Proceedings of the First International Conference on Semantics, Knowledge and Grid, pp. 62–64. IEEE Computer Society, Los Alamitos (2005)
An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, and Eleftherios Kayafas
Abstract. Peer assessment techniques are an effective means to take advantage of the knowledge that exists in web-based peer environments. Through these techniques, participants act both as authors and reviewers over each other’s work. However, as web-based cooperating environments continuously grow in popularity, there is a need to develop intelligent mechanisms that will retrieve the optimal group of reviewers to comment on the work of each author, with a view to increasing the usefulness that these comments will have on the author’s final result. This paper introduces a novel technique that incorporates feed forward neural networks to determine the optimal reviewers for a specific author during a peer assessment procedure. The proposed method seeks to match author to reviewer profiles based on feedback regarding the usefulness of reviewer comments as it was perceived by the author. The proposed mechanism is expected to improve the peer assessment procedure, by making it adaptive to individual user characteristics, increasing the quality of the projects of a group overall and speeding up the peer assessment procedure. The method was tested on educational data derived from an e-learning course and the preliminary results that it yielded are promising. Keywords: peer assessment, user matching, machine learning.
Ioannis Giannoukos, Ioanna Lykourentzou, Giorgos Mpardis, Vassilis Nikolopoulos, Vassili Loumos, and Eleftherios Kayafas
Multimedia Technology Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Zographou Campus, 15773 Athens, Greece
e-mail: {igiann,ioanna,gmpardis,vnikolop}@medialab.ntua.gr, {loumos,kayafas}@cs.ntua.gr
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 109–126. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
1 Introduction
During the last few years, web-based social networks have developed rapidly. Such networks consist of individuals from different expertise backgrounds and with various profiles who cooperate with one another to boost their knowledge and performance. The presence of a large number of peer users, the common goals shared by the members of the community, as well as the diversity of knowledge and expertise of the participants, make these networks an ideal environment for the incorporation of peer assessment techniques. These techniques enable users
I. Giannoukos et al.
to evaluate and comment on each other’s work and thus benefit from the knowledge of some of their peers to improve the quality of their assignments. Peer assessment techniques may be used in various real-world situations, such as the evaluation of novel research contributions, the assessment of medical diagnoses and all fields of corporate and academic education. Especially as far as the academic and educational domain is concerned, the beneficial aspects of peer assessment have been widely discussed throughout the literature. Researchers agree that peer assessment stimulates student motivation and encourages deeper learning and understanding [33, 29]. Although peer assessment techniques are broadly recognized as an important quality assurance mechanism, they also present a major drawback. More specifically, they need to be manually coordinated by a supervising expert that selects peers, seeking to ensure that the group members will receive the best possible reviews. If this matching procedure is successful, it will help enhance the quality of the users’ work. The decision of the expert supervisor is typically based on his perceived knowledge of the user profiles and expertise. Relying, however, on human experts to perform this task is inevitably a time-consuming process, since the supervising authority needs to carefully examine each case and decide accordingly. This considerable time loss delays the final outcome and in some cases results in lowering the quality of the information which is finally released. On the other hand, if the matching procedure is forced to be made quickly, then mistakes are very likely to occur and a less than appropriate peer matching may be arranged, due to the fact that the time spent to examine possible peer pairs is limited. 
All the above shortcomings may not have a substantial impact on small-scale groups, which involve a limited number of users, but in cases where large user populations are involved, the quality of the peer community’s output might be significantly lowered. Therefore, the traditional method of peer matching performed by a few expert individuals seems less suitable and viable for modern web-based environments which involve a large number of participants. Instead, these environments require an intelligent mechanism that will automatically and efficiently match peer reviewers to peer authors with a view to providing the author with the best possible quality of reviews, which is expected to lead to improved performance and better final results. In this paper, a novel peer matching mechanism is proposed. This mechanism provides adaptive and personalized services for performing automatic optimal matching between authors and reviewers, taking into account the feedback that the authors provide on the perceived usefulness of the comments received from the reviewers. More specifically, this mechanism is based on a popular machine learning technique, namely feed forward neural networks, to estimate the optimal reviewers for a specific author. The proposed method uses past data to construct author and reviewer user profiles. In addition, it uses the usefulness that the author perceives in a specific review, which is obtained through a quality feedback attribute. Then, the method uses the aforementioned data to estimate the usefulness that the author will probably find in the comments of each reviewer. Next, based on these estimations, it automatically assigns each author to the most
An Adaptive Mechanism for Author-Reviewer Matching in Online Peer Assessment
fitting reviewer. Therefore, the proposed method adapts itself to the personal characteristics of each individual, a fact which is expected to make the peer assessment procedure more efficient. The rest of this paper is structured as follows: Section 2 presents the strengths and limitations of the related research literature and section 3 introduces the reader to the theoretical background of the feed forward neural network technique. In section 4 the proposed method is described in detail. The results of the method on e-learning peer assessment data are presented and discussed in section 5. Section 6 includes a discussion regarding the potential of the proposed method as well as future extensions. Finally, section 7 concludes the study.
2 Related Literature
Various research studies in the literature refer to the use of peer assessment in different domains. Although it has been applied to various domains, the most prominent use of peer assessment is in the education sector. The peer assessment process has been found to be effective in promoting peer learning [33] and in improving students’ interpersonal relationships inside a classroom [29]. Supporting the above findings, the study of Berg et al. [3], conducted with university-level students, reports that a significant improvement can be observed when students process the feedback they receive from peer assessment and incorporate it in their work. This study also reports that the timing of peer assessment is a very important factor in the educational procedure: to be most effective, it should not coincide with the teacher’s assessment of the students’ work. The value that peer assessment can bring to a class is not limited to students of university or high school level, but it can also be beneficial for more advanced student groups, such as the ones involved in teacher education, as reported by Sluijsmans et al. [28]. The results of this work are based on three empirical studies and suggest that the peer assessment procedure leads to a general improvement in students’ peer evaluation skills, as well as in their task performance in the course field. Taking into account the quality of the students has also been found to be beneficial for the peer assessment procedure. To this end, the study of Ljungman et al. [20], which is performed at university level, examines the effect that peer assessment has on student performance when “older” students are involved as peer examiners for “younger” students.
The study concludes that involving students in this type of peer assessment procedure increases their motivation to learn, helps them acquire tacit knowledge and makes them understand the meta-cognitive competences that are necessary in order to become responsible and autonomous in learning. Apart from improving the performance of the students, peer assessment evaluations have also been found in [32] to be as reliable and valid as the assessments produced by the teacher. This is mainly due to the fact that a peer assessor has more time to spend on the peer assessment procedure than the instructor, a fact that compensates for the lesser knowledge that the peer assessor may have.
Apart from classical education, peer assessment has also been widely used in online courses. In this type of course, where instructors have fewer means to assess the students’ knowledge and the number of students may be large, peer assessment is found to present a variety of advantages, while at the same time it overcomes the time and place restrictions posed by traditional peer assessment processes. More specifically, students that actively participate in on-line peer assessment activities receive higher grades on final exams compared to those that do not [2], especially at initial course stages [5]. In addition, the study of Prins et al. [30], which refers to peer assessment applied in a computer supported collaborative learning environment, showed that the students’ attitude towards peer assessment was positive and that the assessment results added value to their performance. A different implementation of peer assessment, performed by Chang et al. [4], showed that this procedure can be used for further purposes apart from performance enhancement. More specifically, in this study a fuzzy system to be used in online peer assessment is developed. Using this system, students are divided into smaller groups that are assigned a specific task to complete, and then students within the groups assess the level of each other’s contributions to the cooperative activities of the group. This use of peer assessment allows all students to be rewarded based on their true participation in the final outcome of the group. Peer assessment has also been used in the vocational sector. Keely et al. [19] report a peer assessment procedure performed on the written correspondence between health care providers, and more specifically on the consultation letters that these providers exchange. This study concludes that a high degree of satisfaction with the peer assessment procedure is observed.
In addition, the participants report that peer assessment results in positive changes to the quality of their consultation letters. After a follow-up period of six months with the same participants, the aforementioned study also reports that peer assessment produces long-standing positive changes in the way that the participants complete their letters. The method of peer assessment is also used as a means to examine professional competence among peer medical students. In the study of Dannefer et al. [10], fifteen users evaluate the work habits, preparedness, initiative, respect and trustworthiness of their peers. The findings of this study suggest that peer assessment can be used to foster reflection about professional qualities and as a means of assessing professional skills. Another study focusing on the issue of peer assessment among professionals is the one by Tsai et al. [34]. In this study, twenty-four teachers were involved in a three-round peer assessment in order to develop their science activities. Results of this work show that, as a result of this procedure, teachers develop more creative science activities, both at a theoretical and at a practical level. Another field where peer assessment is broadly used is the academic publication sector. Scientific journals internationally recognize this method as a quality assurance mechanism. To examine the effects of peer assessment on the journal sector, Yue et al. [36] perform this procedure on forty-one clinical neurology journals with peer opinions obtained from 254 members of the World Federation of Neurology. Results imply that peer assessment is a viable technique that can be used to assess journal quality in the health sciences and provides a valuable tool
for collection development decision-making by health care librarians. However, the study of Grainger [12] suggests that peer assessment is only useful to the academic publishing sector if the peer participants, especially those serving as reviewers, are characterized by professional conduct and responsibility. In addition, this study stresses that the review process should also be timely and of high quality, a condition which, if not met, might endanger the credibility and responsiveness of the journal. The problem of the time required to receive, file and organize article submissions, track article versions, match authors to reviewers and maintain the correspondence with them exists even in journal publications with relatively few article submissions. The advent of technology used in the peer review process has helped to minimize the amount of time needed and reduce the costs related to the peer review process [24]. Another interesting result comes from the study of Schroter et al. [27]. This study examines the case where the authors of an article are given the opportunity to suggest the reviewers they consider most suitable to review their article, and compares author-suggested and editor-suggested reviewers in order to examine the differences in review quality. Using data from ten biomedical journals, this study reports that the quality of the reviews by author-suggested and editor-suggested reviewers did not differ significantly, although the author-suggested reviewers tended to make more favorable recommendations for publication. Therefore, the study concludes that editors can rely on the reviewers suggested by the authors to produce reviews of adequate quality, but should be cautious when considering their recommendations for the article publication. However, considering author-suggested reviewers as candidates to undertake the peer assessment procedure is not always favored, since, as reported by the study of Clark et al.
[6], these reviewers might be well pre-disposed towards the authors. Instead of this practice, this study suggests that the editors can ask the authors of each article to shortly describe their contribution in relation to their prior work. This procedure enables the editors to identify those peers that have knowledge relevant to the subject of each submitted article and select the most appropriate reviewers accordingly. From the above, one may observe that peer assessment has been found to be an especially beneficial quality assurance mechanism for various sectors. However, since typical peer assessment is performed manually, through a few expert individuals, little attention has been given in automatically retrieving the optimal reviewer for a specific author. To this end, prototypes of author-reviewer pairs according to their level of proficiency are defined in [8] and [7]. In this study, students are categorized into “proficient” or “having difficulties”. Then, fuzzy logic is used to evaluate the possible level of satisfaction that the author would have towards the comments of the reviewer. This is performed by assigning positive weights to the pairs {proficient, having difficulties} and {proficient, proficient} and negative weights to the pair {having difficulties, having difficulties}. Next, genetic algorithms are used to find the optimal match among alternative possible mappings. However, this method does not adapt its estimations according to the specific characteristics and preferences of the author as they are determined by the authors’ feedback, but instead it uses a predetermined static fuzzy logic model to determine the optimal pairs.
Therefore, a mechanism that is adaptive to the special characteristics of each user should be sought in order to increase the peer assessment procedure effectiveness. This mechanism should automatically match authors to reviewers with a target to increase the usefulness that each author finds in the reviewers’ comments and thus improve quality of the authors’ works in a timely manner.
3 Feed Forward Neural Networks
Artificial neural networks have been successfully applied to various research and industry fields to perform tasks including forecasting, data classification and regression analysis. The feed forward neural network (FFNN) architecture is one of the most popular forms of artificial neural networks. These networks have been developed as a computational model of the functions and the learning processes of the human brain. Therefore, by mimicking biological neural networks, they attempt to learn from examples and generalize their findings to an unseen population. Typically, a FFNN, as described in [14], consists of layers that are composed of several processing elements, called neurons. There are three types of layers: the input, the hidden and the output layers. In this type of network, neuron connections, called synapses, do not form a directed cycle. These synapses exist only between neurons of subsequent layers. Additionally, the information moves only forward, from the input to the output nodes. A FFNN can be considered as an acyclic graph, as shown in figure 1.
Fig. 1 Feed Forward Neural Network Architecture (input layer, hidden layer, output layer)
The output y of the kth neuron is calculated by taking the inner product of its input vector x with a weight vector w, adding the bias b of the neuron and applying the activation function f to the result, as follows:
$$y_k = f(w \cdot x + b_k)$$

The activation function can be either linear or non-linear, but its most common form is the logistic sigmoidal function:

$$f(x) = \frac{1}{1 + \exp(-\beta x)},$$

where $\beta$ is a slope parameter.
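The neuron output and the logistic activation described above can be illustrated with a minimal Python sketch. The function names `logistic` and `neuron_output` and the numeric values are illustrative assumptions and do not come from the original study.

```python
import math


def logistic(x, beta=1.0):
    """Logistic sigmoid f(x) = 1 / (1 + exp(-beta * x)); beta is the slope parameter."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def neuron_output(weights, inputs, bias, beta=1.0):
    """Return f(w . x + b) for a single neuron (hypothetical helper)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Inner product of the weight and input vectors, plus the neuron bias.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(activation, beta)


# Made-up numbers: w . x + b = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
y = neuron_output([0.5, -0.25], [1.0, 2.0], bias=0.0)
```

A larger slope parameter makes the sigmoid steeper around zero, so the same weighted sum is pushed closer to 0 or 1.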
During its learning phase, the network is presented with a set of examples which form the network training set. Each example consists of an input vector and the corresponding output vector. The goal of the FFNN training is to minimize a cost function, which is typically defined as the mean square error between its actual and target outputs, by adjusting the network synaptic weights and neuron biases. A very popular training algorithm is the back-propagation algorithm, proposed in [25] and [26]. According to this algorithm, information is passed forward from the input nodes, through the hidden layers, to the output nodes, and the error between the desired and the actual response of the network is calculated. Then, this error signal is propagated backwards to the input neurons, and the signal is used to adjust the weights and biases of the network. This process is repeated for each example in the training set. As soon as the whole training set has been presented to the network, an epoch has elapsed. The training set may be presented to the network several times, therefore many epochs may be needed for the network training to finish. A popular variation of the back-propagation algorithm is the Levenberg-Marquardt algorithm [13]. This algorithm increases the convergence speed and effectiveness of the network training, since it has been found to be effective in solving non-linear least squares problems, as in the case of minimizing the cost function of a FFNN. However, a FFNN may end up being overtrained. In this case, the weights and biases of the network are over-adjusted and reflect only the specific characteristics of the training set, so the FFNN loses its generalization abilities. This phenomenon, called over-fitting, can be avoided by using a separate set, called the validation set, in the training process. At the end of each epoch, the network error is calculated on both the training and validation sets.
While the error of the training set is used to adjust the network parameters, the validation set error is only used to determine when to stop the learning process in order to prevent overtraining of the FFNN. More specifically, as soon as the network performance deteriorates on the validation set, meaning that overtraining has probably occurred, training stops and the state of parameters of the previous network epoch is stored. Therefore, the training phase can be terminated by reaching a minimum in the cost function, meeting the performance goal or by detecting that the validation set has produced an increasing mean square error. To examine the network efficiency over a specific problem, this study uses two typical strategies. The first one is called k-fold repeated random subsampling. According to this method, the network is trained k times, using different validation sets that are randomly extracted from the dataset, at each training session. Thus, the accuracy result of the network can be calculated by estimating the mean performance of the k networks. The second method uses a data set which is disjoint to
the validation and training sets, called the test set. The test set is used to estimate the generalization ability of a specific network on data different from those used during the training phase.
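The repeated random subsampling strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `train_mean_predictor` is a trivial stand-in for FFNN training, and early stopping on the validation set is not reproduced.

```python
import random


def mean_absolute_error(targets, predictions):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def random_split(examples, validation_fraction, rng):
    """Shuffle a copy of the examples and split off a validation set."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def repeated_random_subsampling(examples, train_fn, k, validation_fraction=0.15, seed=0):
    """Train k times on random splits and return the mean validation error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(k):
        train, validation = random_split(examples, validation_fraction, rng)
        predict = train_fn(train)  # train_fn returns a prediction function
        targets = [y for _, y in validation]
        predictions = [predict(x) for x, _ in validation]
        errors.append(mean_absolute_error(targets, predictions))
    return sum(errors) / len(errors)


def train_mean_predictor(train):
    """Stand-in for FFNN training (assumption): predicts the mean training target."""
    mean_target = sum(y for _, y in train) / len(train)
    return lambda x: mean_target
```

Each example is assumed to be an `(input_vector, target)` pair. Any training routine that returns a prediction function can be passed as `train_fn` in place of the stand-in.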
3.1 Strengths and Limitations of Using FFNNs
Neural networks present various strengths which make them suitable for classification and prediction tasks. One of their main advantages is that FFNNs are universal function approximators. They can estimate any continuous function to any degree of accuracy [9, 11, 15-17]. As a result, neural networks have the ability to efficiently map nonlinear relationships between their input and output. Additionally, FFNNs have the ability to generalize to an unseen population. A neural network can learn from examples and correctly predict the output of data that are not included in its training set, even if the training examples contain noisy information. The robustness of neural networks in the presence of noise in the input data is one of their most significant advantages [31]. FFNNs have the advantage of being data-driven instead of model-driven, that is, they do not assume a priori an explicit relationship model among the data, as model-based linear or nonlinear methods do. Instead, the model structure and the model parameters that they use are derived from the actual dataset of the problem. Moreover, real-world problems are often nonlinear and the relationship among their data is difficult to describe analytically. Usually, the only available information regarding these problems is prior experience, in the form of past data. Therefore, taking into consideration the characteristics of FFNNs, which include arbitrary function approximation, nonlinearity and generalization capability, it is to be expected that they can be used to predict future events in an efficient manner. Trained neural networks can quickly make predictions on an input set. This characteristic, along with their high degree of accuracy, makes them suitable for applications where training needs to be performed sporadically but predictions should be made in real time. Nevertheless, besides their strengths, neural networks also present certain limitations.
Firstly, neural networks usually require some time for training due to the number of iterations needed to achieve their optimal performance. More specifically, while minimizing the cost function during training, they may be trapped in local minima, therefore not achieving the optimal solution. To overcome this, multiple training iterations usually take place and the most efficiently trained network is selected [18]. Another limitation that neural networks present is their dependency on the size and quality of the data used for their training [14]. The more indicative the examples of the problem they are presented with, the more accurate the predictions they are expected to make. In addition, although they can infer a correct solution based on noisy data, they have difficulty in making correct predictions on data which are contradictory to the ones used for their training. Finally, neural networks are black-box methods. As such, they cannot be analyzed in great detail like linear models and the data relationship that they approach cannot be easily described [1].
4 The Proposed Method
In this section the proposed method is described in detail. The method uses a trained feed forward neural network to estimate the optimal set of reviewers for each author, in order to facilitate the peer assessment process and increase its effectiveness. In order for the proposed method to provide results that adapt to the characteristics of each user, peer profiles are first constructed. Since an individual may serve either as a reviewer or as an author, two distinct profiles are created, one for each type of peer user. Reviewer profiles consist of information about the proficiency of the reviewer, the average strictness that this reviewer has demonstrated in grading past author projects, the average usefulness that the reviewer’s comments have received and the reviewer’s willingness to participate in the peer assessment procedure. The reviewer proficiency can be calculated taking into consideration the available data about his achievements in the field. For instance, in the case of students, past academic performance and average grades can be used, while in the case of peer assessment in the academic sector, the method can use the number of journal articles that he has published in the past. The element of average strictness refers to the average ratings that the reviewer has given to authors in the past reviews that he has submitted. The average usefulness attribute is calculated by using the prior quality feedback values that the reviewer has received from authors in the past. The usefulness attribute is quantified through a 5-item Likert scale instrument which is placed at the end of each comment that the authors receive. More specifically, after reading a comment made by a specific reviewer, the author is asked to determine how useful he found this comment to be. The usefulness attribute may thus receive five possible values which range from 1 (not useful at all) to 5 (very useful).
Finally, the reviewer’s willingness to participate in the procedure derives from the number of reviews he has completed versus the total number of reviews that he has been assigned with. Author profiles are constructed based on their proficiency level and the average reviewer grading that they have received. The aforementioned attributes are summarized in Table 1. The first time the algorithm is applied, it randomly creates reviewer-author pairs. As soon as this first reviewing phase is over, authors rate the usefulness of the reviewer comments that they have received. Then, according to the rating that the previously assigned reviewer-author pairs demonstrated during the peer assessment procedure, the algorithm adapts itself to calculate the optimal pairs. At each stage of the peer assessment procedure, the algorithm uses a trained FFNN to estimate the optimal order of reviewers for each author, that is, to provide a list of possible reviewers for each author. This list is calculated based on usefulness rates that the FFNN has estimated that the author would probably assign to the comments of each reviewer. Figure 2 depicts the way the FFNN technique uses author and reviewer profiles as an input to estimate the usefulness level that the author of each pair would suggest. Specifically, each input vector is the concatenation of the reviewer profile and the author profile. The FFNN output is the usefulness rate that the input author would probably present.
Table 1 Peer profile attributes

Peer type   Level of proficiency   Average      Average           Willingness to participate   Average reviewer
            in the field           strictness   usefulness rate   in the procedure             grading
Author      X                      -            -                 -                            X
Reviewer    X                      X            X                 X                            -
Fig. 2 The use of peer profiles by the FFNN technique (inputs: reviewer profile and author profile; FFNN output: usefulness)
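As a rough illustration of how the profile attributes of Table 1 could be assembled into the FFNN input vector, consider the sketch below. All names are hypothetical. In particular, computing willingness as the ratio of completed to assigned reviews and using plain arithmetic means are assumptions; the text only states which quantities each attribute derives from.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def build_reviewer_profile(proficiency, grades_given, usefulness_received,
                           completed, assigned):
    """Reviewer profile with the four reviewer attributes of Table 1."""
    return {
        "proficiency": proficiency,
        "avg_strictness": mean(grades_given),          # average grade given to authors
        "avg_usefulness": mean(usefulness_received),   # average 1-5 usefulness feedback
        "willingness": completed / assigned,           # assumed: completed / assigned
    }


def build_author_profile(proficiency, grades_received):
    """Author profile with the two author attributes of Table 1."""
    return {
        "proficiency": proficiency,
        "avg_reviewer_grading": mean(grades_received),
    }


REVIEWER_FIELDS = ("proficiency", "avg_strictness", "avg_usefulness", "willingness")
AUTHOR_FIELDS = ("proficiency", "avg_reviewer_grading")


def input_vector(reviewer, author):
    """Concatenate the reviewer profile and the author profile, in that order."""
    return [reviewer[f] for f in REVIEWER_FIELDS] + [author[f] for f in AUTHOR_FIELDS]


# Made-up example values.
reviewer = build_reviewer_profile(0.8, [3, 4, 5], [4, 5], completed=3, assigned=4)
author = build_author_profile(0.6, [3, 4])
vector = input_vector(reviewer, author)
```

The sketch does not handle a peer with no history (empty lists or zero assigned reviews); in the described method the first round uses random pairs, so such a case would need a default value chosen by the implementer.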
Each time the algorithm is used to generate optimal reviewer-author pairs, it updates the peer profiles to incorporate recent user behavior. Therefore, as time progresses the method adapts itself to the more detailed user data that have been gathered, and in this way it increases its effectiveness. Figure 3 describes the algorithmic steps of the proposed method. First, for each author, an ordered list named RevList of the estimated preferred reviewers is calculated by matching the author profile to the profile of each reviewer. This list comprises the reviewer profiles and the estimated usefulness rate calculated by the FFNN, as described earlier. Then, the first k reviewers, ranked by the estimated usefulness quality feedback attribute, are copied from the preferred reviewer list (RevList) to a set containing the possible reviewers (PosRev) for this author. However, since a reviewer may only comment on a specific predetermined number of peer projects, some reviewers in the PosRev set for this author might be occupied with other assignments and thus be unavailable for the review process. Thus, before selecting any reviewer, the algorithm examines the availability of the reviewers in the PosRev set and removes those that cannot review further assignments. Next, if any possible reviewers remain, it randomly selects one and assigns this reviewer the task of commenting on the author’s project. As soon as a reviewer has been matched to an author, his availability in RevAvail is updated; then, he is removed from the set of possible reviewers for this author and inserted into the set of selected reviewers (called Rev). However, after examining the availability of the currently selected possible reviewers, the algorithm may detect that there are no available reviewers in the possible reviewer list, PosRev.
In that case, it examines the availability status of the next reviewer in the ordered list returned by the FFNN, until an available reviewer is found. The aforementioned procedure is repeated until all users have been matched to a pre-defined number of reviewers, n.
Fig. 3 Algorithm Description
In Fig. 3, RevList is an ordered list of the predicted reviewers for a specific author, PosRev is the set of candidate reviewers, RevAvail is a table indicating reviewer availability, Rev is the set of selected reviewers, k is the initial size of PosRev, b is the index of the last examined reviewer in RevList and n is the predefined number of reviewers that should be assigned to each author. The procedure of randomly selecting one reviewer among the estimated optimal k ones was chosen to ensure that the algorithm does not always assign the same reviewers to the same authors, thereby boosting the fairness of the algorithm.
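The matching steps described in this section can be sketched in Python as follows. This is an interpretation of the textual description and not a transcription of Fig. 3. The function `estimate_usefulness` is a stand-in for the trained FFNN, `capacity` (the maximum number of reviews per reviewer) is a hypothetical parameter name, and excluding an author from reviewing his own project is an assumption.

```python
import random


def match_reviewers(authors, reviewers, estimate_usefulness, k, n, capacity, seed=0):
    """Assign up to n reviewers to each author; returns {author: [reviewers]}."""
    rng = random.Random(seed)
    rev_avail = {r: capacity for r in reviewers}  # RevAvail: remaining reviews per reviewer
    assignments = {}
    for author in authors:
        # RevList: candidate reviewers ordered by estimated usefulness, best first.
        # Assumption: an author never reviews his own project.
        rev_list = sorted(
            (r for r in reviewers if r != author),
            key=lambda r: estimate_usefulness(r, author),
            reverse=True,
        )
        pos_rev = rev_list[:k]  # PosRev: the k best estimated reviewers
        b = len(pos_rev)        # index of the next unexamined reviewer in RevList
        rev = []                # Rev: reviewers selected for this author
        while len(rev) < n:
            # Drop candidates that cannot take further assignments.
            pos_rev = [r for r in pos_rev if rev_avail[r] > 0]
            # If no candidate is left, walk down RevList until one is available.
            while not pos_rev and b < len(rev_list):
                candidate = rev_list[b]
                b += 1
                if rev_avail[candidate] > 0:
                    pos_rev.append(candidate)
            if not pos_rev:
                break  # no available reviewer remains for this author
            chosen = rng.choice(pos_rev)  # random pick among the candidates
            rev_avail[chosen] -= 1
            pos_rev.remove(chosen)
            rev.append(chosen)
        assignments[author] = rev
    return assignments
```

Because authors are processed one after another, an author handled late may receive fewer than n reviewers when the total reviewer capacity is tight; the sketch simply stops in that case rather than rebalancing earlier assignments.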
5 Experimental Results
5.1 Method Implementation on e-Learning Data
To examine the effectiveness of the proposed method, this study uses educational peer assessment data, derived from an introductory level e-learning course on “Web Design”. The course is provided by the e-learning team of the Multimedia Technology Laboratory of the National Technical University of Athens [22], through the Moodle open-source LMS platform [23]. The Web Design course consists of seven educational sections and is offered twice a year, in the Spring and Fall semesters. During the seven sections, the educational material of each program is delivered to the students and their knowledge is assessed through testing material which consists of five multiple choice tests, which examine the theoretical knowledge that the students acquired, and seven projects, which test the application of this knowledge in practical terms. The projects require the students to create a web site which is assessed in terms of functionality, design and technical soundness. Functionality refers to how easy it is for a user to find the desired information in a web page, while design refers to the choice of colors, images and character fonts in the page. The technical soundness of a web page refers to how well written its source code is. For a page to achieve the maximum possible grade, it should excel in all of the aforementioned criteria. The course level is introductory and it is targeted towards adults of various educational backgrounds, ranging from high-school graduates to master’s degree holders. Nevertheless, students are advised to have basic computer and English language skills, since an important part of the material is delivered in English. Since the Spring 2008 course, students have been participating in the peer assessment procedure at the end of each educational module. More specifically, each student is asked to review two randomly selected projects of fellow classmates and receives the comments of two reviewers.
Students grade each other’s projects by filling in a review form with four questions regarding the assessed design of the project, its technical soundness, functionality and overall impression. Therefore, the reviewers are asked to give their opinions about the three web page criteria mentioned above and provide an overall grade. The grading scale for each question ranges from 1 (negative impression) to 5 (positive impression). As soon as a review form has been filled in, it is made visible to the author who is then asked to evaluate the usefulness of the received
comments. To ensure that students provide feedback that is as objective as possible, peer assessment ratings do not contribute to student grading.
5.2 Method Results
In this section the preliminary experimental results of the method are presented. The dataset consists of 152 reviews, conducted by 16 students during the Spring 2008 semester. The method was implemented using the Matlab R2008a platform environment [21]. To examine network efficiency, we first used 1000-fold repeated random subsampling. According to this method, the network was trained 1000 times, using at each training session a validation set which was randomly extracted from the dataset. The validation set was chosen to be 15% (23 examples) of the dataset, leaving the remaining 85% (129 examples) for the training set. The accuracy result of the network was calculated by estimating the average performance of the 1000 networks. The Mean Absolute Error calculated was 0.7682. This result indicates that the network estimations of the usefulness that an author perceives in a reviewer’s comment are acceptably accurate, since the mean absolute error is below one usefulness level on a scale of five. The second strategy used to estimate network efficiency uses three disjoint sets, namely the training, validation and test sets, as described earlier. In this case, the data regarding a single student as an author were used as the test set and the rest were used as the training and validation sets. This strategy was applied to each one of the 16 students. Since student rating criteria may vary, with some students being stricter than others, the network outputs were used to determine the optimal reviewer order for each student. To this end, these outputs were sorted in descending order from the best matching reviewer to the least preferred one. Next, the estimated network order was compared against the actual preference order of each author. Table 2 presents the accuracy results in predicting the first one, two, three, four and five best reviewers that eight indicative students prefer.
As one may observe, the proposed method achieved good results for Student 1. It found the best reviewer for Student 1 in 87% of the cases, the best 2 reviewers in 83%, the best 3 in 80%, and the best 4 and 5 reviewers in 78%, according to the author’s profile. The best reviewers of Students 2, 3 and 8 were also predicted accurately, although at lower rates. The proposed method was less accurate for Students 5 and 6 on the first two criteria, but more effective on the remaining ones. The method did not succeed in predicting the best 1, 2 and 3 reviewers for Student 4, but gave satisfactory results on criteria 4 and 5. The method failed to predict the correct reviewers for Student 7: it was correct in only 5% and 3% of the cases on criteria 1 and 2 respectively, and did not exceed 50% on the rest. The overall results of the method were 49%, 54%, 72%, 75% and 76% for the five criteria respectively. These preliminary results indicate that the selection of the best reviewers should be based on the estimated first 3, 4 and 5 reviewers.
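Table 2 Indicative examples of the method accuracy

| Student no. | Criterion 1 | Criterion 2 | Criterion 3 | Criterion 4 | Criterion 5 |
|---|---|---|---|---|---|
| #1 | 87% | 83% | 80% | 78% | 78% |
| #2 | 64% | 70% | 75% | 76% | 78% |
| #3 | 31% | 83% | 76% | 72% | 69% |
| #4 | 51% | 45% | 37% | 60% | 60% |
| #5 | 29% | 26% | 93% | 88% | 84% |
| #6 | 21% | 25% | 68% | 64% | 64% |
| #7 | 5% | 3% | 49% | 48% | 49% |
| #8 | 72% | 75% | 75% | 76% | 74% |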
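The chapter does not spell out how agreement between the estimated and the actual reviewer order is scored for each criterion. One plausible reading, shown here purely as an assumption, is the share of the k best estimated reviewers that also appear among the k reviewers the author actually rated highest. The function name `top_k_overlap` and the example scores are hypothetical.

```python
def top_k_overlap(predicted_scores, actual_scores, k):
    """Share of the k best predicted reviewers that are also among the k best actual ones.

    ASSUMPTION: this overlap measure is an illustrative choice, not necessarily
    the accuracy definition used in the study.
    """
    def best(scores):
        # Reviewer ids sorted by score, highest first (descending order as in the text).
        return sorted(scores, key=scores.get, reverse=True)[:k]
    return len(set(best(predicted_scores)) & set(best(actual_scores))) / k

# Hypothetical network outputs and actual usefulness ratings for one author.
predicted = {"r1": 4.2, "r2": 3.1, "r3": 3.9, "r4": 2.0, "r5": 4.8}
actual = {"r1": 5, "r2": 1, "r3": 3, "r4": 2, "r5": 4}

for k in (1, 2, 3, 4):
    print(k, top_k_overlap(predicted, actual, k))
```

In this toy case the single best reviewer is missed (overlap 0.0 for k = 1), while the best two and best three are recovered completely. This mirrors the pattern in Table 2, where accuracy often improves from criterion 1 to criteria 3 to 5.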
6 Discussion This study proposes a novel method that uses a popular machine learning technique to improve the efficiency of peer assessment procedures. The method attempts to match a reviewer to an author so as to maximize the estimated quality feedback that the author gives to the assigned reviewer, expressed as the usefulness that the author finds in the reviewer’s comments. The preliminary results presented in the study were obtained by applying the proposed method to an e-learning course on “Web Design” and seem promising. By improving the peer assessment procedure, the group that evaluates itself is expected to increase the quality of its work. In the case of the e-learning course, a student’s work or project is evaluated on three criteria: functionality, technical soundness and design. A student might have great taste, so that the projects he submits are well designed, yet lack the knowledge to write efficient source code. A reviewer who is familiar with the development of web page code would then be the best choice for this author, helping him become more proficient in web development. The opposite case can also be observed, where a student is able to develop a bug-free web site but does not know how to improve the page aesthetically. Therefore, by matching each author to the best estimated reviewer, each author’s deficiencies should be addressed through the personalized peer assessment procedure. Additionally, the proposed method is expected to help peer assessment supervisors match author-reviewer pairs automatically and efficiently. Especially in large populations, matching a reviewer to an author requires a large amount of the supervisor’s time. Moreover, it is sometimes difficult to identify a good reviewer for an author.
The proposed method could help reduce the shortcomings of peer assessment, speed up the process and help the population obtain more high-quality reviews. The proposed method depends heavily on the user profiles that are gathered, and thus the presence of highly detailed data could enable the system to provide better reviewer-author matches. In the e-learning course case study, the profiles that were used include the user proficiency, the average strictness, the average usefulness that the reviewer had received in the past (from the authors whose work he has commented on) and the average grade an author has received in the peer assessment procedure. In order to increase the effectiveness of the proposed method, more data could be examined. These data could derive from student demographic characteristics, student engagement with the course and student progress. Nevertheless, the preliminary results presented in this study seem promising, and can be considered a first step in facilitating the peer assessment procedure. Additionally, the size of the group might be an important factor for the production of accurate results. Firstly, with a large group a large training set becomes available within a short period, so the method can learn from very different student cases. Secondly, applying the peer assessment procedure to a relatively small group might compromise the anonymity that is essential for peer assessment to succeed. In this case, students get acquainted with each other and so the ratings they provide tend to increase. Finally, a large group, where the competition among its members is high, can provide more comprehensive reviews, and in this way the authors may receive higher-quality feedback. Furthermore, the proposed method can be applied to other fields besides e-learning courses. To this end, the only factor that should change is the gathering of the user profile data, which must be relevant to the specific process to which the method is applied. For instance, in the professional sector, the use of peer assessment has already helped towards providing better collaborative projects. Another area to which the proposed method could be applied is the peer review of research publications. In both cases, user profile characteristics might follow the general attribute descriptions of Table 1.
As far as machine learning is concerned, FFNNs are universal function approximators, that is, given enough hidden units they can approximate any continuous function on a bounded domain to arbitrary accuracy. However, their ability to produce accurate results decreases when the training examples they are fed contradict each other. In the dataset used in this study, there are examples where students rate their reviewers in an inconsistent manner. They sometimes assign a higher grade to a bad review than to a good one, especially in the early course sections. This is related to the fact that at the initial stages of the course students are not yet familiar with the peer assessment procedure. Later on, they start to provide both better reviews and more reliable quality feedback. Therefore, training the students to participate in peer assessment seems to be a very important factor of the procedure. Trained neural networks can quickly make predictions on unseen data. This characteristic, along with their high degree of accuracy, makes them suitable for applications where training is needed only sporadically but predictions must be made in real time. However, their training requires a certain amount of time to complete. Moreover, FFNN training might fail, so multiple training sessions might be needed for the network to be efficient, which can further increase the time required for the FFNN training to finish. Future work includes testing the method on a larger dataset. A large dataset can provide more indicative examples to the network training procedure. Additionally, the dataset should first be preprocessed or transformed before its use, in order to alleviate the social factors that may have influenced the review process. More machine learning techniques can also be tested on the task of matching the optimal reviewers to authors based on user profiling. Another issue that should be investigated in the future is whether the proposed method actually helps authors improve their performance and the quality of their final submitted work. Finally, the quality feedback, in the form of the usefulness attribute, should be reexamined in the future, and the use of a fully objective and unbiased metric may also be considered.
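The remark in the discussion that FFNN training might fail, so that several training sessions may be needed, can be illustrated with a small sketch. It is not the network or training algorithm used in the study: a tiny one-hidden-layer network is trained by plain gradient descent from several random initialisations on synthetic data, and the session with the lowest validation error is kept. All names (`predict`, `train_once`, `train_with_restarts`) and hyperparameters are hypothetical.

```python
import numpy as np

def predict(params, X):
    """Forward pass of a one-hidden-layer network with tanh hidden units."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_once(X, y, hidden=5, epochs=300, lr=0.02, seed=0):
    """One training session from a random initialisation (plain gradient descent)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y                    # prediction error, shape (n,)
        dH = np.outer(err, W2) * (1.0 - H ** 2)  # error propagated back to the hidden layer
        gW2 = 2.0 / n * (H.T @ err)              # gradients of the mean squared error
        gb2 = 2.0 * err.mean()
        gW1 = 2.0 / n * (X.T @ dH)
        gb1 = 2.0 * dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def train_with_restarts(X_tr, y_tr, X_val, y_val, n_restarts=5):
    """Run several sessions and keep the one with the lowest validation MAE."""
    results = []
    for seed in range(n_restarts):
        params = train_once(X_tr, y_tr, seed=seed)
        val_mae = float(np.mean(np.abs(predict(params, X_val) - y_val)))
        results.append((val_mae, seed, params))
    best = min(results, key=lambda r: r[0])
    return best, [r[0] for r in results]

# Synthetic data (ASSUMPTION): 152 examples, 4 features, targets scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(size=(152, 4))
y = X.mean(axis=1)
best, val_maes = train_with_restarts(X[:129], y[:129], X[129:], y[129:])
print("validation MAE per session:", [round(v, 4) for v in val_maes])
print("kept session with seed", best[1])
```

Each extra session multiplies the training cost, which is the trade-off the paragraph above points out. Prediction with the kept network remains a single forward pass.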
7 Conclusion This study proposes a method that uses a popular form of machine learning, feed forward neural networks, to determine the optimal reviewers for a specific author during a peer assessment procedure. The proposed method matches reviewer profiles to author profiles and aims at assigning the work of each author to the reviewer who will make the most useful comments. Preliminary experimental results on educational e-learning data indicate that the method yields promising results, as the use of neural networks in estimating the optimal 3 to 5 peer reviewers achieved an accuracy of 72% or higher. The method may be applied to various types of peer assessment procedures in web-based environments besides e-learning, where past data regarding the users involved in the process are available.
References
[1] Andrews, R., Diederich, J., Tickle, A.B.: Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems 8, 373–389 (1995)
[2] Barak, M., Rafaeli, S.: On-line question-posing and peer-assessment as means for web-based knowledge sharing in learning. International J. Human Computer Studies 61, 84–103 (2004)
[3] Berg van den, I., Admiraal, W., Pilot, A.: Design Principles and Outcomes of Peer Assessment. Stud. in High Education 31, 341–356 (2006)
[4] Chang, T., Chen, Y.: Cooperative learning in E-learning: A peer assessment of student-centered using consistent fuzzy preference. Expert Systems Appl. 36, 8342–8349 (2009)
[5] Chen, Y.C., Tsai, C.C.: An educational research course facilitated by online peer assessment. Innovations Education Teach International 46, 105–117 (2009)
[6] Clark, T., Wright, M.: Reviewing Journal Rankings and Revisiting Peer Reviews: Editorial Perspectives. J. Management Studies 44, 612–621 (2007)
[7] Crespo, R.M., Pardo, A., Pérez, J.P.S., Kloos, C.D.: An Algorithm for Peer Review Matching Using Student Profiles Based on Fuzzy Classification and Genetic Algorithms. In: Ali, M., Esposito, F. (eds.) IEA/AIE 2005. LNCS (LNAI), vol. 3533, pp. 685–694. Springer, Heidelberg (2005)
[8] Crespo, R.M., Pardo, A., Kloos, C.D.: An adaptive strategy for peer review. In: Frontiers in Education, Savannah, ASEE/IEEE (2004)
[9] Cybenko, G.: Approximation by superpositions of a sigmoidal function. Mathematics Control Signals Syst. 2, 303–314 (1989)
[10] Dannefer, E.F., Henson, L.C., Bierer, S.B., et al.: Peer assessment of professional competence. Med. Educ. 39, 713–722 (2005)
[11] Funahashi, K.I.: On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192 (1989)
[12] Grainger, D.W.: Peer review as professional responsibility: A quality control system only as good as the participants. Biomaterials 28, 5199–5203 (2007)
[13] Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 5, 989–993 (1994)
[14] Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs (1999)
[15] Hornik, K.: Some new results on neural network approximation. Neural Networks 6, 1069–1072 (1993)
[16] Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251–257 (1991)
[17] Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[18] Iyer, M.S., Rhinehart, R.R.: A method to determine the required number of neural-network training repetitions. IEEE T Neural Networ. 10, 427–432 (1999)
[19] Keely, E., Myers, K., Dojieiji, S., et al.: Peer assessment of outpatient consultation letters – feasibility and satisfaction. BMC Med. Education 22, 7–13 (2007)
[20] Ljungman, A.G., Silen, C.: Examination Involving Students as Peer Examiners. Assessment & Evaluation in Higher Education 33, 289–300 (2008)
[21] Matlab, Matlab Environment (2008), http://www.mathworks.com/products/matlab/
[22] Medialab, E-Learning Services, Multimedia Technology Laboratory, National Technological University of Athens (2008), http://elearn.medialab.ntua.gr
[23] Moodle, Moodle LMS (2008), http://moodle.org
[24] Rowland, F.: The Peer Review Process. Learned Publishing 15, 247–258 (2002)
[25] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. Parallel distributed processing: Explorations in the microstructure of cognition 1, 318–362 (1986a)
[26] Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986b)
[27] Schroter, S., Tite, L., Hutchings, A., Black, N.: Differences in Review Quality and Recommendations for Publication Between Peer Reviewers Suggested by Authors or by Editors. The Journal of the American Medical Association 295, 314–317 (2006)
[28] Sluijsmans, D.M.A., Prins, F.: A conceptual framework for integrating peer assessment in teacher education. Studies in Educational Evaluation 32, 6–22 (2006)
[29] Sluijsmans, D.M.A., Brand-Gruwel, S., Merrienboer, J.J.G.: Peer assessment training in teacher education: effects on performance and perceptions. Assessment and Evaluation in Higher Education 27, 443–454 (2002)
[30] Prins, F.J., Sluijsmans, D.M.A., Kirschner, P.A., Strijbos, J.W.: Formative peer assessment in a CSCL environment: A case study. Assessment and Evaluation in Higher Education 30, 417–444 (2002)
[31] Thrun, S.B.: Extracting provably correct rules from artificial neural networks, Technical Report, UMI Order Number: IAI-TR-93-5, University of Bonn, Germany (1994)
[32] Topping, K.J.: Peer Assessment. Theory into Practice 48, 20–27 (2009)
[33] Topping, K.: Peer assessment between students in colleges and universities. Review of Educational Research 68, 249–276 (1998)
[34] Tsai, C.C., Lin, S.S.J., Yuan, S.M.: Developing science activities through a networked peer assessment system. Computers & Education 38, 241–252 (2002)
[35] Wen, M.L., Tsai, C.-C.: University students’ perceptions of and attitudes toward (online) peer assessment. Higher Education 51, 27–44 (2006)
[36] Yue, W., Wilson, C.S., Boller, F.: Peer assessment of journal quality in clinical neurology. Journal of the Medical Library Association 95, 70–76 (2007)
Towards Emotion Recognition from Speech: Definition, Problems and the Materials of Research Christos-Nikolaos Anagnostopoulos and Theodoros Iliou
Abstract. One hundred thirty-three (133) sound/speech features extracted from Pitch, Mel Frequency Cepstral Coefficients, Energy and Formants were evaluated in order to create a feature set sufficient to discriminate between seven emotions in acted speech. After appropriate feature selection, Multilayered Perceptrons were trained for emotion recognition on the basis of a 23-input vector, which provides information about the prosody of the speaker over the entire sentence. Several experiments were performed and the results are presented analytically. Extra emphasis was given to assessing the proposed 23-input vector in a speaker-independent framework, where speakers are not “known” to the classifier. The proposed feature vector achieved promising results (51%) for speaker-independent recognition in seven emotion classes. Moreover, considering the problem of classifying high and low arousal emotions, our classifier reaches 86.8% successful recognition. The second classification model incorporated a Support Vector Machine with 35 predictive variables. The latter feature vector achieved promising results (78%) for speaker-independent recognition in seven emotion classes. Moreover, considering the problem of classifying high and low arousal emotions, our classifier reaches 100% successful recognition for high arousal and 87% for low arousal emotions. Besides the combination of speech processing and artificial intelligence techniques, new approaches incorporating linguistic semantics could play a critical role in helping computers understand human emotions better. Keywords: Emotion recognition, speech processing, neural networks.
Christos-Nikolaos Anagnostopoulos and Theodoros Iliou, Cultural Technology and Communication Department, University of the Aegean, Mytilene, Lesvos Island, GR-81100, e-mail: {canag,th.iliou}@ct.aegean.gr
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 127–143. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
1 Introduction Communication is an important capability, based not only on its linguistic part but also on its emotional part. In the field of human-computer interaction (HCI), emotion recognition by the computer is still a challenging issue, especially when the recognition is based solely on voice, which is the basic means of human communication. In human-computer interaction systems, emotion recognition could provide users with improved personalization services by being adaptive to their emotions. Therefore, emotion detection from speech could have many potential applications in making the computer more adaptive to the user’s needs. The most expressive ways in which humans display emotions are facial expressions and speech characteristics. Recently, the information provided by cameras and microphones has enabled the computer to “see” and “hear” the user through advanced image and sound processing techniques, in systems similar to the one presented in Figure 1. Therefore, one of the skills that a computer can potentially develop is the ability to understand the emotional state of a person. Feedback from the user has traditionally been given through the keyboard, the mouse or specialized interfaces such as data gloves, touch screens and biosensors. A possible automated human affect analyzer should include all human interactive modalities (sight, sound and even haptics) and, moreover, should be able to analyze nonverbal interactive signals as well (facial expressions, body gestures and physiological reactions). Another possibility is to also include modules for linguistic processing of the speech. Generally, speech carries linguistic information (i.e. words) that can be somehow associated with emotions (e.g. the word “happy” is correlated with a happy person), along with paralinguistic information, which is extracted by speech processing methods. Linguistic information identifies qualitative patterns that the speaker has articulated, while paralinguistic information is usually measured by quantitative features describing variations in the way that the linguistic patterns (i.e. words or phrases) are pronounced. The latter include variations in pitch, intensity and voice quality that carry no linguistic information, and are related to spectral properties that cannot be correlated with word identity.
A disadvantage of linguistic information relates to cross-cultural diversity. A particularly interesting study is reported in [1]. According to Wierzbicka, bilingual people know well that when they try to describe the same experience in their two languages they are often forced to present it differently in each, because the emotion words of the two languages may not match. For this reason, she proposes focusing on how a methodology developed in linguistic semantics, known as NSM (Natural Semantic Metalanguage), can help us understand human emotions better. This could apply especially to the emotions of people from different cultures, but also to those of people from our own Western cultural sphere. Therefore, one can easily identify the significant role of semantics in linguistic emotion recognition. According to Wierzbicka, emotion terms are always language- and culture-specific and therefore carry with them a particular linguistic and cultural slant. By contrast, cognitive scenarios formulated in simple and universal human concepts can be free of any such slant and can therefore be closer to the reality of emotional experience. Therefore, the use of NSM would allow one to compare emotion concepts across languages and cultures, and thus to elucidate both cultural differences and transcultural similarities. The use of NSM makes it possible to study human emotions from a genuinely cross-linguistic and cross-cultural, as well as a psychological, perspective and thus opens up new possibilities for the scientific understanding of subjectivity and psychological experience. However, multi-modality architectures (i.e. including speech, image processing and body sensors) would significantly affect the user-friendliness of an emotion
recognition system. As a result, research in the literature is directed towards the visual interpretation of facial gestures and towards voice processing. The former carries information concerning facial expressions, while the latter provides useful data related to vocal intonations and characteristics. These two channels (i.e. visual and auditory) are considered the most important in the human recognition of affective feedback [33].
Fig. 1 Increased user personalization through an emotion recognition software system with two channels. (Diagram labels: Microphone, Sound processing, Speech content, Voice hue; Camera, Image processing, Head positions, Gestures; Dialogue/feedback.)
Relatively few of the existing works combine different modalities into a single system for human affective state analysis. Examples are the works of Chen et al. [4], [5], De Silva and Ng [6], and Yoshitomi et al. [7], who investigated the effects of a combined detection of facial and vocal expressions of affective states. Almost all other existing studies investigate human affective states separately, in a single-modal analysis framework. Moving in the same direction, in this research we deal with a single-modality system based on a non-linguistic speech processing module.
2 Related Work 2.1 Basic Emotions Proponents of discrete emotion theories, inspired by Darwin, have suggested different numbers of so-called basic emotions [8], [9], [10], [11], [12], [13]. Most of these are emotions that play an important role in adapting to frequently occurring and prototypically patterned types of significant events in our life, such as anger,
fear, joy, and sadness, which are relatively frequently experienced. However, the list of emotions does not end here, as other emotions are also evident in our life, such as anxiety, boredom and neutral, to name just a few. Scherer [14] proposed the following “working definition of emotion”, for which there is increasing consensus in the literature. Emotions are episodes of coordinated changes in several components (including at least neurophysiological activation, motor expression, and subjective feeling but possibly also action tendencies and cognitive processes) in response to external or internal events of major significance to the organism. Following this definition, social science scholars have proposed various representations of the basic human emotions. Adopting a theoretically based approach, Fontaine et al. [14] have shown that four dimensions are needed to satisfactorily represent similarities and differences in the meaning of emotions. In order of importance, these four dimensions (or axes in the emotion space) are evaluation-pleasantness, potency-valence, activation-arousal, and unpredictability. From this 4-dimensional space, the research community focuses mainly on the 2-D space of valence and arousal, as shown in Figure 2. According to this two-dimensional view of emotions, large amounts of variation in emotions can be located in a two-dimensional space with coordinates of valence and arousal [14]. The valence dimension refers to the hedonic quality of an affective experience and ranges from unpleasant to pleasant. The arousal dimension refers to the perception of arousal associated with the experience and ranges from very calm to very excited. For the identification of emotional expressions using a computer, the basic set of emotions includes joy, anger, disgust, fear, sadness, boredom and neutral.
Figure 2 shows this set of seven emotion classes, which can also be well separated into two hyper-classes, namely high arousal, containing anger, happiness and anxiety/fear, and low arousal, containing neutral, boredom, disgust and sadness. The classification of disgust into low arousal can be challenged, but according to the literature disgust belongs to the low arousal emotions [32]. Table 1 highlights the effect of 5 emotions on well-known speech parameters, as reported in [35].
Fig. 2 Emotions of Berlin Database according to valence and arousal. (Diagram: arousal axis from low to high, valence axis from negative to positive; plotted emotions: joy/excitement, anger, anxiety/fear, neutral, disgust, boredom, sadness.)
Table 1 Emotions and Speech Parameters as appear in [35].

| | Anger | Happiness | Sadness | Fear | Disgust |
|---|---|---|---|---|---|
| Rate | Slightly faster | Faster or slower | Slightly slower | Much faster | Very much faster |
| Pitch Average | Very much higher | Much higher | Slightly lower | Very much higher | Very much lower |
| Pitch Range | Much higher | Much wider | Slightly narrower | Much wider | Slightly wider |
| Intensity | Higher | Higher | Lower | Normal | Lower |
| Voice Quality | Breathy, chest | Breathy, blaring tonic | Resonant | Irregular voicing | Grumble chest tone |
| Pitch Changes | Abrupt on stressed | Smooth, upward inflections | Downward inflections | Normal | Wide, downward terminal inflects |
| Articulation | Tense | Normal | Slurring | Precise | Normal |
2.2 Databases in Emotion Research A comprehensive survey of the available emotional speech databases is given in [15]. From this survey, it is concluded that automated emotion recognition on these databases cannot achieve a correct classification rate that exceeds 50% for the four basic emotions. Moreover, the authors in [15] underline that natural (spontaneous) emotions cannot be classified as easily as simulated (acted) ones. Another important finding of their survey is that the most commonly investigated emotions are anger, sadness, happiness, fear, disgust, joy, surprise, and boredom (see Table 2).
Table 2 Emotions recorded in the databases surveyed in [15].

| Emotions | Occurrences in databases |
|---|---|
| Anger | 26 |
| Sadness | 22 |
| Happiness | 13 |
| Fear | 13 |
| Disgust | 10 |
| Joy | 9 |
| Surprise | 6 |
| Boredom | 5 |
| Stress | 3 |
| Contempt | 2 |
| Dissatisfaction | 2 |
| Shame, pride, worry, startle, elation, despair, humour | 1 |
Since even a human cannot easily classify natural emotions, it is difficult to expect machines to offer a higher rate of correct classification. Therefore, for the sake of simplicity, the majority of the databases include acted emotional speech, which is sometimes exaggerated. Professional actors, drama students or ordinary people are used as actors for the creation of these emotional utterances. Table 3 indicates the types of speech emotion, grouped in two classes (acted and spontaneous), and their frequency of occurrence as reported in [15]. Our research was conducted using the Berlin Emotional Database (EMO-DB) [34]. In the Berlin Emotional Database, ten German sentences have been acted in the above seven emotions by ten professional actors, five of them female. The database contains 535 phrases representing all the possible emotional instances. In our experiments, whole utterances were always analysed. Table 4 depicts the speaker codes, the utterance codes and the emotions that were acted by the actors. The Berlin Emotional Database was selected since it is the most complete and rich database of speech recordings that is freely available to the scientific community.
Table 3 Acted and spontaneous speech occurrences in the databases surveyed in [15].

| Type of emotion | Occurrences |
|---|---|
| Acted | 21 |
| Spontaneous | 8 |
| 50% spontaneous speech / 50% acted speech | 2 |
| Semi-spontaneous | 1 |
Table 4 Speaker codes, Utterances and Emotions in Berlin Database.

| Speaker code (gender) | Utterance code/context | Emotion |
|---|---|---|
| 03 (male) | a01: Der Lappen liegt auf dem Eisschrank. | W (anger) |
| 08 (female) | a02: Das will sie am Mittwoch abgeben. | L (boredom) |
| 09 (female) | a04: Heute abend könnte ich es ihm sagen. | E (disgust) |
| 10 (male) | a05: Das schwarze Stück Papier befindet sich da oben neben dem Holzstück. | A (anxiety/fear) |
| 11 (male) | a07: In sieben Stunden wird es soweit sein. | F (happiness) |
| 12 (male) | b01: Was sind denn das für Tüten, die da unter dem Tisch stehen? | T (sadness) |
| 13 (female) | b02: Sie haben es gerade hochgetragen und jetzt gehen sie wieder runter. | N (neutral) |
| 14 (female) | b03: An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. | |
| 15 (male) | b09: Ich will das eben wegbringen und dann mit Karl was trinken gehen. | |
| 16 (female) | b10: Die wird auf dem Platz sein, wo wir sie immer hinlegen. | |
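The codes of Table 4 and the two arousal hyper-classes described for Figure 2 can be combined into a small lookup. The sketch below is only illustrative. In particular, it assumes that a recording name concatenates the speaker code, the utterance code, the emotion letter and a one-letter version suffix (for example `03a01Fa`); the chapter does not state this layout, and `parse_recording_name` is a hypothetical helper.

```python
# Emotion letters as listed in Table 4.
EMOTION_CODES = {
    "W": "anger", "L": "boredom", "E": "disgust", "A": "anxiety/fear",
    "F": "happiness", "T": "sadness", "N": "neutral",
}
# High-arousal hyper-class as described for Figure 2; all other emotions are low arousal.
HIGH_AROUSAL = {"anger", "happiness", "anxiety/fear"}

def parse_recording_name(name):
    """Split a name such as '03a01Fa' into speaker, utterance, emotion and arousal class.

    ASSUMPTION: the layout is <2-digit speaker><3-character utterance>
    <1-letter emotion><1-letter version>; this is not stated in the chapter.
    """
    stem = name.rsplit(".", 1)[0]          # drop a file extension if present
    speaker, utterance, code = stem[:2], stem[2:5], stem[5]
    emotion = EMOTION_CODES[code]
    return {
        "speaker": speaker,
        "utterance": utterance,
        "emotion": emotion,
        "arousal": "high" if emotion in HIGH_AROUSAL else "low",
    }

print(parse_recording_name("03a01Fa.wav"))
```

Grouping labels this way turns the seven-class problem into the two-class high/low arousal problem mentioned in the abstract without touching the acoustic features.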
3 Sound/Speech Features in Our Experiments Many diverse low-level and high-level acoustic features have been tested in the literature and assessed for their performance. The fundamental frequency (F0), often referred to as the pitch, is one of the most important features for determining emotion in speech [16], [17], [18], [19]. Bänziger et al. argued that statistics related to pitch convey considerable information about emotional status [20]. However, pitch was also shown to be the most gender-dependent feature [21]. If the recognition system ignores this issue, utterances might be misclassified. It should be noted that most of the features described below are gender-dependent to varying degrees. Besides pitch, other commonly employed features are related to energy, speaking rate and formants, as well as spectral features such as mel-frequency cepstral coefficients (MFCCs). Wang & Guan [22], [23] used prosodic, Mel-Frequency Cepstral Coefficient (MFCC) and formant frequency features to represent the characteristics of emotional speech, while facial expressions were represented by Gabor wavelet features. According to Kostoulas et al. [24], an individual’s emotional state is strongly related to pitch and energy, and the pitch and energy of a speech signal expressing happiness or anger are usually higher than those associated with sadness. Mel Frequency Cepstral Coefficients have been widely used for speech spectral representation in numerous applications, including speech, speaker, gender and emotion recognition. They are also increasingly finding uses in music information retrieval applications such as genre classification and audio similarity measures [25]. In this paper, pitch, energy, MFCCs and formants were extracted from the speech waveform using Praat [26]. Using a frame length of 100 ms, the pitch for each frame was calculated and placed in a vector to correspond to that frame.
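The frame-by-frame construction of a pitch vector can be sketched as follows. Praat's own pitch algorithm is not reproduced here. As a stand-in, the sketch uses a basic autocorrelation peak search, and frames without a clear periodicity are marked with zero, the convention this section uses for unvoiced speech. The function names, the 75–500 Hz search range and the voicing threshold are assumptions made for illustration.

```python
import numpy as np

def frame_pitch(frame, fs, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Estimate F0 of one frame by autocorrelation; return 0.0 if it looks unvoiced."""
    frame = frame - np.mean(frame)
    energy = float(np.dot(frame, frame))
    if energy == 0.0:
        return 0.0                                   # silent frame
    # Autocorrelation for non-negative lags 0 .. len(frame) - 1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)          # lag range for the F0 search
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    if ac[lag] / energy < voicing_threshold:
        return 0.0                                   # weak periodicity: treat as unvoiced
    return fs / lag

def pitch_vector(signal, fs, frame_ms=100):
    """One pitch value per non-overlapping frame of frame_ms milliseconds."""
    n = int(fs * frame_ms / 1000)
    n_frames = len(signal) // n
    return np.array([frame_pitch(signal[i * n:(i + 1) * n], fs) for i in range(n_frames)])

# Synthetic check: 0.5 s of a 200 Hz tone followed by 0.5 s of silence at 16 kHz.
fs = 16000
t = np.arange(fs // 2) / fs
signal = np.concatenate([np.sin(2 * np.pi * 200 * t), np.zeros(fs // 2)])
pitch = pitch_vector(signal, fs)
print(pitch)
```

On this synthetic signal the first five 100 ms frames yield about 200 Hz and the last five yield zero. Real speech would need a more robust estimator, which is one reason a dedicated tool such as Praat is used in practice.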
If the speech is unvoiced, the corresponding marker in the pitch vector was set to zero. In addition, for each 5 ms frame of speech, the first four standard MFCC parameters were calculated by taking the absolute value of the STFT, warping it to a Mel-frequency scale, taking the DCT of the log-Mel spectrum and returning the first 4 components. Energy, often referred to as the volume or intensity of the speech, is also known to contain valuable information. Energy provides information that can be used to differentiate sets of emotions, but this measurement alone is not sufficient to differentiate basic emotions. In the work presented in [27], Scherer concludes that fear, joy, and anger have an increased energy level, whereas sadness has a low energy level. The choice of the window in short-time speech processing determines the nature of the measurement representation. A long window w would result in very little change of the measurement over time, whereas the measurement with a short window would not be sufficiently smooth. The energy frame size should be long enough to smooth the contour appropriately but short enough to retain the fast energy changes that are common in speech signals; a frame size of 10–20 ms is suggested as adequate. Two representative windows are widely used, Rectangular and Hamming. The latter has almost twice the bandwidth of the former for the same length. Furthermore, the attenuation for the Hamming window outside the passband is much
greater. Short-Time energy is a simple short-time speech measurement. It is defined as:
$$E_n = \sum_{m} \left[ x(m)\, w(n-m) \right]^2$$
where x(m) is the speech signal, w is the analysis window (here a Hamming window) and n is the sample index at which the window is positioned; the sum runs over the samples m covered by the window. For a sampling frequency of 16 kHz, a practical choice for the window length is 160–320 samples (10–20 ms). For our experiments the Hamming window was used, taking samples every 20 ms. The resonant frequencies produced in the vocal tract are referred to as formant frequencies or formants [28]. Although some studies in automatic recognition have looked at the first two formant frequencies (F1 and F2) [29], [30], the formants have not been extensively researched. Scherer [27] reports some observations concerning the formant frequencies for several emotion classes. For happiness, the mean value of the first formant (F1) is decreased while the F1 range is increased. For anger, fear and sadness, the F1 mean is increased while the F1 bandwidth is decreased. The F2 mean is decreased for sadness, anger, fear and disgust. In our experiments, the first five formant frequencies will be evaluated. Based on the acoustic features described above and the literature on automatic emotion detection from speech, 133 features are calculated from four prosodic groups, which are represented as contours: the pitch, the 12 MFCCs, the energy and the first 5 formant frequencies. From these 19 contours, we extracted seven statistics: the mean, the standard deviation, the minimum value, the maximum value and the range (max − min) of the original contour, and the mean and standard deviation of the contour gradient. All 133 measurements are shown in Table 5.

Table 5 The 133 sound features. Shaded cells indicate the selected features

| Prosodic group | Prosodic feature | Mean | Std | Mean of derivative | Std of derivative | Max | Min | Range |
|---|---|---|---|---|---|---|---|---|
| 1 | Pitch | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | MFCC1 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 2 | MFCC2 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 2 | MFCC3 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
| 2 | MFCC4 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |
| 2 | MFCC5 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
| 2 | MFCC6 | 43 | 44 | 45 | 46 | 47 | 48 | 49 |
| 2 | MFCC7 | 50 | 51 | 52 | 53 | 54 | 55 | 56 |
| 2 | MFCC8 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
| 2 | MFCC9 | 64 | 65 | 66 | 67 | 68 | 69 | 70 |
| 2 | MFCC10 | 71 | 72 | 73 | 74 | 75 | 76 | 77 |
| 2 | MFCC11 | 78 | 79 | 80 | 81 | 82 | 83 | 84 |
| 2 | MFCC12 | 85 | 86 | 87 | 88 | 89 | 90 | 91 |
| 3 | Energy | 92 | 93 | 94 | 95 | 96 | 97 | 98 |
| 4 | F1 | 99 | 100 | 101 | 102 | 103 | 104 | 105 |
| 4 | F2 | 106 | 107 | 108 | 109 | 110 | 111 | 112 |
| 4 | F3 | 113 | 114 | 115 | 116 | 117 | 118 | 119 |
| 4 | F4 | 120 | 121 | 122 | 123 | 124 | 125 | 126 |
| 4 | F5 | 127 | 128 | 129 | 130 | 131 | 132 | 133 |
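The short-time energy and the seven per-contour statistics described in this section can be illustrated with a minimal Python sketch. It is not the Praat procedure used in the experiments: the function names, the non-overlapping framing in the usage note, the use of the first difference as the contour gradient and the population form of the standard deviation are all assumptions made for illustration.

```python
import math

def hamming(length):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*k/(length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]

def short_time_energy(signal, frame_len, hop):
    """Energy contour: sum of squared, Hamming-windowed samples per frame."""
    window = hamming(frame_len)
    contour = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        contour.append(sum((x * w) ** 2 for x, w in zip(frame, window)))
    return contour

def mean(values):
    return sum(values) / len(values)

def std(values):
    # Population standard deviation (an assumption; the text does not say which form was used).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def contour_statistics(contour):
    """Seven statistics of one contour, in the column order of Table 5.

    The gradient is approximated by the first difference (an assumption),
    so the contour needs at least two values.
    """
    gradient = [b - a for a, b in zip(contour, contour[1:])]
    return {
        "mean": mean(contour),
        "std": std(contour),
        "mean_of_derivative": mean(gradient),
        "std_of_derivative": std(gradient),
        "max": max(contour),
        "min": min(contour),
        "range": max(contour) - min(contour),
    }
```

With a 16 kHz signal, `short_time_energy(signal, 320, 320)` would give one value per 20 ms frame; applying `contour_statistics` to each of the 19 contours yields 19 × 7 = 133 values.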
3.1 Sound Feature Selection
In order to select the most important prosodic features and optimise the classification time, a subset evaluator was used. A subset evaluator takes a subset of features and returns a number that measures the quality of the subset and guides the further search. For the selection of the method, the WEKA data mining tool was used [30]. WEKA is a data mining workbench that allows comparison between many different machine learning algorithms. Moreover, WEKA offers many feature selection and feature ranking methods, where each method is a combination of a feature search method and an evaluator of the currently selected features. Several combinations were tested in order to find the one that gives the best performance for our problem. The feature evaluator and search method (offered in WEKA) that performed best on the data set were CfsSubsetEval and BestFirst. The Correlation-based Feature Selection subset evaluator (CfsSubsetEval) assesses the predictive ability of each feature individually and the degree of redundancy among them. It prefers sets of features that are highly correlated with the class but not correlated with each other. An option iteratively adds attributes that have the highest correlation with the class, provided that the set does not already contain an attribute whose correlation with the attribute in question is even higher. The BestFirst search method searches the space of attribute subsets using greedy hill-climbing augmented with backtracking. The number of consecutive non-improving nodes allowed controls the level of backtracking. BestFirst may start with the empty set of attributes and search forward, start with the full set of attributes and search backward, or start at any point and search in both directions (by considering all possible single-attribute additions and deletions at a given point).
The combination of the above-mentioned methods proposed 23 of the 133 features that were originally extracted. The shaded cells in Table 5 indicate the selected features. From the first prosodic group (pitch), two features were selected, namely the mean and the minimum pitch. In addition, 16 features related to the MFCCs were found important, while for the third prosodic group (energy) four features were proposed. Finally, only one formant feature (the mean value of F1) was selected.
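The idea behind correlation-based subset evaluation can be sketched as follows. This is an illustration only and not WEKA's implementation: it assumes Pearson correlation as the correlation measure and a plain greedy forward search without the backtracking that BestFirst provides. The merit of a subset of k features is taken as k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation; all function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def cfs_merit(columns, target, subset):
    """Merit of a feature subset: high class correlation, low redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for idx, i in enumerate(subset) for j in subset[idx + 1:]]
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def forward_select(columns, target):
    """Greedy forward search: add the feature that raises the merit most,
    and stop as soon as no single addition improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [i for i in range(len(columns)) if i not in selected]
        scored = [(cfs_merit(columns, target, selected + [i]), i) for i in candidates]
        if not scored:
            return selected
        merit, feature = max(scored)
        if merit <= best:
            return selected
        selected.append(feature)
        best = merit
```

In this sketch a feature that is nearly a copy of an already selected one lowers the merit, because it raises r_ff more than r_cf, which is the behaviour the text describes for redundant features.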
4 Classification
The first classification was performed using WEKA. The first classifier was an Artificial Neural Network following the multi-layer perceptron architecture. After experimentation with various network topologies, the highest accuracy was found using one hidden layer with as many neurons as the sum of the inputs (23 features) and outputs (7 emotions). Therefore, the topology was always 23-30-7. Early stopping was used, based on a validation set consisting of 10% of the training set, and the number of training epochs was set to 200. This ensures that the training process stops when the mean-squared error (MSE)
begins to increase on the validation set, which avoids over-fitting. The learning rate and momentum were left at the WEKA default settings (0.3 and 0.2 respectively). Error backpropagation was used as the training algorithm. Moreover, all neurons use the sigmoid activation function, and all attributes were normalized to improve the performance of the network. The second classification was performed using DTREG [2]. The classifier was a Support Vector Machine (SVM). An SVM performs classification by constructing an N-dimensional hyperplane that optimally separates the data into two categories. SVM models are closely related to classical neural networks; in fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer, feed-forward neural network. Using a kernel function, SVMs are an alternative training method for polynomial, radial basis function and multi-layer perceptron classifiers, in which the weights of the network are found by solving a quadratic programming problem with linear constraints rather than by solving a non-convex, unconstrained minimization problem. In the parlance of the SVM literature, a predictor variable is called an attribute, and a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one case (i.e., a row of predictor values) is called a vector. The goal of SVM modeling is therefore to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side. The vectors near the hyperplane are the support vectors.
After several experiments, the highest accuracy was found with 35 predictor variables, using a Radial Basis Function (RBF) as the SVM kernel function and C-SVC as the type of SVM model.
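For readers unfamiliar with the RBF kernel, the following sketch shows how the kernel value between two feature vectors is computed, K(x, y) = exp(−γ‖x − y‖²). It is a generic illustration and not DTREG code; the function names are hypothetical, and the value of `gamma` is a free parameter that the text does not report.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(vectors, gamma):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]
```

The kernel equals 1 for identical vectors and decays towards 0 as they move apart; a larger `gamma` makes the decay faster, so each support vector influences a smaller neighbourhood.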
4.1 Speaker Independent Recognition in Berlin Database
Speaker-independent emotion recognition in the Berlin database with the Artificial Neural Network was evaluated by averaging the results of five separate experiments. In each experiment, the measurements of a pair of speakers (e.g. speaker 03 and speaker 08) were removed from the training set and formed the testing set for the classifier. The pairs were selected so as to include one male and one female speaker each time. The training and testing sets for the five experiments are shown in Table 6. Tables 6a-6e present the confusion matrices for the 5 experiments. Judging from the main diagonal of the confusion matrix of Table 7, the MLP does not reach high accuracy in the 7-class recognition problem. Overall, approximately 51% of the utterances are classified correctly across the seven emotions. The 23-feature vector does not seem sufficient to distinguish the 7 emotions accurately. On the other hand, for the two hyper-classes (low and high arousal), the recognition rate reaches 88.8% for high arousal and 84.8% for low arousal emotions (see Table 8).
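The five speaker-pair experiments can be expressed as a small splitting routine. The sketch below is an illustration only: the speaker identifiers and the male/female pairing are taken from Table 6, while the function names and the `(speaker_id, payload)` record layout are assumptions.

```python
# Speaker identifiers as listed in Table 6; pairs are formed position by position.
MALE = ["03", "10", "11", "12", "15"]
FEMALE = ["08", "09", "13", "14", "16"]

def speaker_pair_splits(male, female):
    """Yield (train_speakers, test_speakers); each test set is one male/female pair."""
    all_speakers = male + female
    for m, f in zip(male, female):
        test = [m, f]
        train = [s for s in all_speakers if s not in test]
        yield train, test

def split_by_speaker(records, test_speakers):
    """Partition (speaker_id, payload) records so that no test speaker appears in training."""
    train = [r for r in records if r[0] not in test_speakers]
    test = [r for r in records if r[0] in test_speakers]
    return train, test
```

Because the two test speakers never contribute training utterances, the accuracy measured on them reflects performance on unseen voices.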
Table 6 Training and testing sets for our experiments

| Experiment | Training set (speakers) | Testing set (speakers) |
|---|---|---|
| 1 | 10, 11, 12, 15 (male), 09, 13, 14, 16 (female) | 03 (male), 08 (female) |
| 2 | 03, 11, 12, 15 (male), 08, 13, 14, 16 (female) | 10 (male), 09 (female) |
| 3 | 03, 10, 12, 15 (male), 08, 09, 14, 16 (female) | 11 (male), 13 (female) |
| 4 | 03, 10, 11, 15 (male), 08, 09, 13, 16 (female) | 12 (male), 14 (female) |
| 5 | 03, 10, 11, 12 (male), 08, 09, 13, 14 (female) | 15 (male), 16 (female) |
Table 6a Experiment 1: evaluation in speakers 03 and 08.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 5 (19.2%) | 20 (76.9%) | 1 (3.8%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 0 (0.0%) | 15 (83.3%) | 0 (0.0%) | 0 (0.0%) | 3 (16.7%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 0 (0.0%) | 5 (50.0%) | 1 (10.0%) | 4 (40.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 12 (80.0%) | 3 (20.0%) | 0 (0.0%) | 0 (0.0%) |
| Disgust | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 2 (12.5%) | 0 (0.0%) | 7 (43.8%) | 2 (12.5%) | 5 (31.3%) | 0 (0.0%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 11 (52.4%) | 0 (0.0%) | 0 (0.0%) | 10 (47.6%) |
Table 6b Experiment 2: evaluation in speakers 10 and 09.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 18 (78.3%) | 1 (4.3%) | 2 (8.7%) | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 2 (25.0%) | 2 (25.0%) | 3 (37.5%) | 1 (12.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 0 (0.0%) | 0 (0.0%) | 8 (88.9%) | 1 (11.1%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 2 (16.7%) | 0 (0.0%) | 10 (83.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Disgust | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 7 (77.8%) | 2 (22.2%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 2 (28.6%) | 0 (0.0%) | 3 (42.9%) | 2 (28.6%) |
| Neutral | 1 (7.7%) | 0 (0.0%) | 0 (0.0%) | 6 (46.2%) | 0 (0.0%) | 2 (15.4%) | 4 (30.8%) |
Table 6c Experiment 3: evaluation in speakers 11 and 13.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 14 (63.6%) | 0 (0.0%) | 1 (4.5%) | 0 (0.0%) | 6 (27.3%) | 0 (0.0%) | 1 (4.5%) |
| Happiness | 6 (33.3%) | 2 (11.1%) | 2 (11.1%) | 0 (0.0%) | 6 (33.3%) | 0 (0.0%) | 2 (11.1%) |
| Anxiety/fear | 13 (76.5%) | 0 (0.0%) | 1 (5.9%) | 1 (5.9%) | 1 (5.9%) | 1 (5.9%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 7 (38.9%) | 0 (0.0%) | 1 (5.6%) | 10 (55.6%) |
| Disgust | 2 (20.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) | 7 (70.0%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 9 (75.0%) | 3 (25.0%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (5.6%) | 0 (0.0%) | 1 (5.6%) | 16 (88.9%) |
Table 6d Experiment 4: evaluation in speakers 12 and 14.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 14 (50.0%) | 2 (7.1%) | 12 (42.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 5 (50.0%) | 4 (40.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) |
| Anxiety/fear | 8 (44.4%) | 1 (5.6%) | 9 (50.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 1 (7.7%) | 0 (0.0%) | 0 (0.0%) | 5 (38.5%) | 0 (0.0%) | 6 (46.2%) | 1 (7.7%) |
| Disgust | 3 (30.0%) | 0 (0.0%) | 0 (0.0%) | 1 (10.0%) | 4 (40.0%) | 0 (0.0%) | 2 (20.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 2 (14.3%) | 0 (0.0%) | 10 (71.4%) | 2 (14.3%) |
| Neutral | 0 (0.0%) | 2 (18.2%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 6 (54.5%) | 3 (27.3%) |
Table 6e Experiment 5: evaluation in speakers 15 and 16.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 25 (92.6%) | 0 (0.0%) | 2 (7.4%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Happiness | 9 (52.9%) | 7 (41.2%) | 1 (5.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Anxiety/fear | 7 (46.7%) | 0 (0.0%) | 8 (53.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| Boredom | 0 (0.0%) | 0 (0.0%) | 12 (52.2%) | 9 (39.1%) | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) |
| Disgust | 1 (6.3%) | 8 (50.0%) | 2 (12.5%) | 2 (12.5%) | 3 (18.8%) | 0 (0.0%) | 0 (0.0%) |
| Sadness | 0 (0.0%) | 0 (0.0%) | 1 (7.7%) | 0 (0.0%) | 1 (7.7%) | 10 (76.9%) | 1 (7.7%) |
| Neutral | 0 (0.0%) | 0 (0.0%) | 3 (17.6%) | 2 (11.8%) | 0 (0.0%) | 1 (5.9%) | 11 (64.7%) |
Table 7 Overall performance after the execution of the 5 experiments in the 7 emotion classification framework.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 76 (60.3%) | 23 (18.3%) | 18 (14.3%) | 0 (0.0%) | 8 (6.3%) | 0 (0.0%) | 1 (0.8%) |
| Happiness | 22 (31.0%) | 30 (42.3%) | 6 (8.5%) | 1 (1.4%) | 9 (12.7%) | 0 (0.0%) | 3 (4.2%) |
| Anxiety/fear | 28 (40.6%) | 6 (8.7%) | 27 (39.1%) | 6 (8.7%) | 1 (1.4%) | 1 (1.4%) | 0 (0.0%) |
| Boredom | 1 (1.2%) | 2 (2.5%) | 12 (14.8%) | 43 (53.1%) | 3 (3.7%) | 9 (11.1%) | 11 (13.6%) |
| Disgust | 6 (13.0%) | 9 (19.6%) | 2 (4.3%) | 11 (23.9%) | 16 (34.8%) | 0 (0.0%) | 2 (4.3%) |
| Sadness | 0 (0.0%) | 2 (3.2%) | 1 (1.6%) | 11 (17.7%) | 3 (4.8%) | 37 (59.7%) | 8 (12.9%) |
| Neutral | 1 (1.3%) | 2 (2.5%) | 3 (3.8%) | 20 (25.0%) | 0 (0.0%) | 10 (12.5%) | 44 (55.0%) |
Table 8 Overall performance after the execution of the 5 experiments in the 2 hyper-class classification framework.

| Actual hyper-class | High arousal emotions | Low arousal emotions |
|---|---|---|
| High arousal emotions | 236 (88.8%) | 30 (11.3%) |
| Low arousal emotions | 41 (15.2%) | 228 (84.8%) |
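The relation between the seven-class results and the two hyper-class results can be checked with a short calculation. The sketch below is illustrative: the counts are transcribed from Table 7 (rows are the actual emotion, columns the assigned class), and the function names are hypothetical.

```python
EMOTIONS = ["anger", "happiness", "anxiety/fear", "boredom", "disgust", "sadness", "neutral"]
HIGH_AROUSAL = {"anger", "happiness", "anxiety/fear"}

# Counts transcribed from Table 7 (row: actual emotion, column: assigned class).
CONFUSION = [
    [76, 23, 18, 0, 8, 0, 1],
    [22, 30, 6, 1, 9, 0, 3],
    [28, 6, 27, 6, 1, 1, 0],
    [1, 2, 12, 43, 3, 9, 11],
    [6, 9, 2, 11, 16, 0, 2],
    [0, 2, 1, 11, 3, 37, 8],
    [1, 2, 3, 20, 0, 10, 44],
]

def overall_accuracy(matrix):
    """Fraction of utterances on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def collapse_to_arousal(matrix, labels, high):
    """Merge the 7x7 matrix into [[high->high, high->low], [low->high, low->low]]."""
    result = [[0, 0], [0, 0]]
    for i, actual in enumerate(labels):
        for j, assigned in enumerate(labels):
            row = 0 if actual in high else 1
            col = 0 if assigned in high else 1
            result[row][col] += matrix[i][j]
    return result
```

Here `overall_accuracy(CONFUSION)` is 273/535, about 51%, and `collapse_to_arousal(CONFUSION, EMOTIONS, HIGH_AROUSAL)` gives the counts 236, 30, 41 and 228 reported in Table 8.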
Table 9 Overall performance after the execution of the 25% Random Sampling Validation Method by SVM.
High arousal emotions: anger, happiness, anxiety/fear. Low arousal emotions: boredom, disgust, sadness, neutral.

| Actual emotion | anger | happiness | anxiety/fear | boredom | disgust | sadness | neutral |
|---|---|---|---|---|---|---|---|
| Anger | 84.38% | 6.25% | 9.37% | 0.0% | 0.0% | 0.0% | 0.0% |
| Happiness | 5.55% | 88.9% | 5.55% | 0.0% | 0.0% | 0.0% | 0.0% |
| Anxiety/fear | 5.6% | 0.0% | 94.4% | 0.0% | 0.0% | 0.0% | 0.0% |
| Boredom | 5% | 5% | 10% | 55% | 15% | 0.0% | 10% |
| Disgust | 0.0% | 27.27% | 0.0% | 18.18% | 54.55% | 0.0% | 0.0% |
| Sadness | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 80% | 20% |
| Neutral | 5% | 0.0% | 0.0% | 10% | 0.0% | 5% | 80% |
Table 10 Overall performance after the execution of the 25% Random Sampling Validation Method by SVM in the 2 hyper-class classification framework.

| Actual hyper-class | High arousal emotions | Low arousal emotions |
|---|---|---|
| High arousal emotions | 100% | 0.0% |
| Low arousal emotions | 13% | 87% |
For speaker-independent emotion recognition with SVM in the Berlin database, the Random Sampling 25% Validation Method was used. DTREG selected a random set of data rows (134 rows out of 535) and held them out of the model building process. These rows were then run through the generated model and the misclassification error rate was reported. Overall, approximately 78% of the utterances are classified correctly across the seven emotions, as presented in Table 9. The 35-feature vector seems sufficient to distinguish the 7 emotions accurately. Moreover, for the two hyper-classes (low and high arousal), the recognition rate reaches 100% for high arousal and 87% for low arousal emotions, as shown in Table 10.
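A random hold-out validation of this kind can be sketched as follows. This is not DTREG's implementation: the function names, the rounding rule and the fixed random seed are assumptions made so that the example is reproducible.

```python
import random

def holdout_split(rows, fraction=0.25, seed=0):
    """Hold out round(fraction * len(rows)) randomly chosen rows for testing."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split repeatable
    n_test = round(fraction * len(rows))
    test_idx = set(rng.sample(range(len(rows)), n_test))
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test

def error_rate(true_labels, predicted_labels):
    """Share of held-out rows whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)
```

With 535 rows and a fraction of 0.25 this sketch holds out 134 rows and keeps 401 for model building, which matches the row counts quoted in the text.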
5 Conclusions – Discussion
In the field of human-computer interaction (HCI), emotion recognition by the computer is still a challenging issue, especially when the recognition is based solely on voice, which is the basic means of human communication. The difficulty of the speech emotion recognition problem should be emphasized. In this interdisciplinary field of research, aspects of psychology and physiology are not always considered, and the literature still offers ideas rather than solutions. The literature on emotion detection in speech is not very rich, and researchers are still debating which features influence the recognition of emotion in speech. There is also considerable uncertainty as to the best algorithm for classifying emotion, and which emotions to class together. In Table 10, important issues such as the number of features, the number of classes and the overall performance of similar research are briefly presented. In conclusion, the 23-input vector for the ANN and the 35-feature vector for the SVM seem quite promising for speaker-independent recognition in terms of high and low arousal emotions when tested on the Berlin database. More sound descriptors, such as periodicity, speaking rate and voiced/unvoiced time ratio, should be evaluated in future research. Although it is impossible to accurately compare the recognition accuracies of this study with those of others, owing to the different data sets used, the feature set implemented in this work seems promising for further research. The proposed feature set contains 23 features for the ANN and 35 for the SVM, which provide information about the prosody of the speaker over the entire sentence. Future work should encompass more features for further evaluation. Ultimately, samples from various speech databases could be assessed by the classifier in order to tackle also the problem
of multilingual context. The latter was interestingly addressed in [32]. In addition, researchers usually deal with elicited and acted emotions recorded in a lab setting from a few actors, just as in our case. It is also the case that assembling databases has not traditionally been considered a high-profile or intellectually challenging area. Good quality recording and large balanced samples tend to be thought of as the basic requirements, with the human side assumed to be relatively straightforward. A little thought shows that in the domain of emotion that cannot be the case. The human race expends a huge proportion of its resources trying (with mixed success) to direct people out of some emotional states and into others. If it were easy to achieve the shifts, there would be no need for whole industries and cities to exist. As a result, capturing a faithful, detailed record of human emotion as it appears in real action and interaction is an incredibly challenging task. Nevertheless, the payoff could also be tremendous. At root, it is enlisting computers to co-operate in the old task of directing people away from some emotional states and into others. The lure of technologies capable of doing that is enough to keep the enterprise going in spite of the difficulties [3]. However, in the real problem, different individuals reveal their emotions to different degrees and in different manners. There are also many differences between acted and spontaneous speech. Speaker-independent detection of negative emotional states from acted and real-world speech was investigated in [31]. The experiments demonstrated some important differences between recognizing acted and non-acted speech, which cause a significant drop in performance for the real-world data.
References
[1] Wierzbicka, A.: Emotions across languages and cultures: Diversity and universals. Cambridge University Press, Cambridge (1999)
[2] Software for Predictive Modelling and Forecasting (2009), http://www.dtreg.com/
[3] Cowie, R., Cowie, E.D., Cox, C.: Beyond emotion archetypes: databases for emotion modelling using neural networks. Neural Networks 18(4), 371–388 (2005)
[4] Chen, L.S., Huang, T.S.: Emotional expressions in audiovisual human computer interaction. In: Proc. of International Conference of Multimedia and Expo (ICME), pp. 423–426 (2000)
[5] Chen, L.S., Huang, T.S., Miyasato, T., Nakatsu, R.: Multimodal human emotion/expression recognition. In: Proc. of 3rd IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 396–401 (1998)
[6] De Silva, L.C., Ng, P.C.: Bimodal emotion recognition. In: Proc. of 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 332–335 (2000)
[7] Yoshitomi, Y., Kim, S., Kawano, T., Kitazoe, T.: Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face. In: Proc. of 9th IEEE International Workshop on Robot and Human Interactive Communication, pp. 178–183 (2000)
[8] Ekman, P.: Universals and Cultural Differences in Facial Expression of Emotion. In: Cole, J.R. (ed.) Motivation. University of Nebraska Press (1972)
[9] Ekman, P.: An Argument for Basic Emotions. Cognition and Emotion 6(3), 169–200 (1972)
[10] Izard, C.E.: The Face of Emotion. Appleton-Century-Crofts, New York (1971)
[11] Izard, C.E.: Basic Emotions, Relations among Emotions and Emotion – Cognition Relations. Psychological Review 99, 561–565 (1992)
[12] Tomkins, S.S.: Affect, Imagery, Consciousness: The Positive Affects. Springer, New York (1962)
[13] Tomkins, S.S.: Affect Theory. In: Scherer, K.R., et al. (eds.) Approaches to Emotion. Erlbaum, Hillsdale (1984)
[14] Fontaine, J.R.J., Scherer, K.R., Roesch, E.B., Ellsworth, P.C.: The world of emotions is not two dimensional. Psychological Sciences 18(12), 1050–1057 (2007)
[15] Ververidis, D., Kotropoulos, C.: A State of the Art Review on Emotional Speech Databases. In: Proc. of the 1st Richmedia Conference, pp. 109–119 (2003)
[16] Kim, S., Georgiou, P., Lee, S., Narayanan, S.: Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In: Proc. of IEEE Multimedia Signal Processing Workshop, pp. 48–51 (2007)
[17] Morrison, D., Wang, R., De Silva, L.C.: Ensemble methods for spoken emotion recognition in call-centres. Speech Communication 49, 98–112 (2007)
[18] Ang, J., Dhillon, R., Krupski, A., Shriberg, E., Stolcke, A.: Prosody-based automatic detection of annoyance and frustration in human–computer dialog. In: Proc. of the International Conference on Spoken Language Processing (ICSLP), pp. 2037–2040 (2002)
[19] Petrushin, V.: Emotion recognition in speech signal: experimental study, development, and application. In: Proc. of the 6th International Conference on Spoken Language Processing (ICSLP), pp. 222–225 (2000)
[20] Bänziger, T., Scherer, K.R.: The role of intonation in emotional expression. Speech Communication 46, 252–267 (2005)
[21] Abdulla, W.H., Kasabov, N.K.: Improving speech recognition performance through gender separation. In: Proc. of the 5th Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES), pp. 218–222 (2001)
[22] Wang, Y., Guan, L.: Recognizing human emotion from audiovisual information. In: Proc. of International Conference on Acoustic and Signal Processing (ICASP), pp. 1125–1128 (2005)
[23] Vogt, T., Andre, E.: Improving Automatic Emotion Recognition from Speech via Gender Differentiation. In: Proc. of Language Resources and Evaluation Conference (LREC), pp. 1123–1126 (2006)
[24] Kostoulas, T.P., Fakotakis, N.: A Speaker Dependent Emotion Recognition Framework. In: Proc. of Fifth International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP), pp. 305–309 (2006)
[25] Fingerhut, M.: Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits. In: International Association of Music Libraries, Archives and Documentation Centers (IAML) and the International Association of Sound and Audiovisual Archives (IASA), IAML-IASA Congress (2004)
[26] Boersma, P., Weenik, D.: Praat, a system for doing phonetics by computer, Technical Report 132, Inst Phonetic Sciences, Univ. Amsterdam (2003), http://www.praat.org
[27] Scherer, K.R.: Vocal communication of emotion: a review of research paradigms. Speech Communication 40, 227–256 (2003)
[28] Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals. Prentice-Hall, Englewood Cliffs (1978)
[29] Lee, C.M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., Narayanan, S.: Emotion recognition based on phoneme classes. In: Proc. of the International Conference on Spoken Language Processing, ICSLP (2004)
[30] Waikato Environment for Knowledge Analysis, WEKA (2006), http://www.cs.waikato.ac.nz/ml/weka/
[31] Kostoulas, T., Ganchev, T., Fakotakis, N.: Study on speaker-independent emotion recognition from speech on real-world data. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds.) HH and HM Interaction. LNCS (LNAI), vol. 5042, pp. 235–242. Springer, Heidelberg (2008)
[32] Hozjan, V., Kacic, Z.: Context-independent multilingual emotion recognition from speech signals. International Journal of Speech Technology 6, 311–320 (2006)
[33] Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine 18, 32–80 (2001)
[34] Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B.: A database of German emotional speech. In: Proc. of Interspeech, pp. 1515–1520 (2005)
[35] Murray, I.R., Arnott, J.L.: Towards a simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. Journal of Acoustic Society America 93(2), 1097–1108 (1993)
Health Care Web Information Systems and Personalized Services for Assisting Living of Elderly People at Nursing Homes
Stefanos Nikolidakis, Dimitrios D. Vergados, and Ioannis Anagnostopoulos

Stefanos Nikolidakis and Dimitrios D. Vergados
University of Piraeus, Department of Informatics, 80 Karaoli & Dimitriou St., GR-185 34, Piraeus, Greece
e-mail: [email protected], [email protected]

Ioannis Anagnostopoulos
University of the Aegean, Department of Information and Communication Systems Engineering, Karlovassi, Samos Island, GR-832 00, Greece
e-mail: [email protected]

M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 145–162. © Springer-Verlag Berlin Heidelberg 2010 springerlink.com

Abstract. Nursing homes, where home adaptations (environmental improvements) and assistive technology (AT) are provided, represent an increasingly attractive means of helping senior citizens maintain their independence and enhance their quality of life. Doctors and specialists are also involved in order to provide elderly people with personalized health care services, thereby improving their treatment and living conditions. The main difference between a nursing home and a typical health home is that the former uses new technologies and applications to collect data from the elderly and create an electronic file for each individual. The basic idea of this paper is to use a web application along with tablet PCs or PDAs to collect the personal information and the clinical characteristics of each patient. Moreover, this application helps doctors manage the nursing home and gain a better view of the health status of each patient, while it also provides doctors with a report on the medical supplies needed at the nursing home and on the overall health condition of the population.

1 Introduction
A number of elderly people in the community receive a great deal of informal care from one or more sources, but others, some of whom are very frail, receive little or no informal care, often because they have few or no relatives [1]. Friends and neighbors rarely provide much care, and even when they do, it is usually not efficient. Even though many elderly people have a lot of support from their relatives and friends, many of them need official health services, since most of them live on their own, while fewer live with another elderly person who also has health problems. When they live with younger people, most of these people feel the need for formal services and specialized personnel to take care of them. Most of the elderly are known to social services, so it could be presumed that they would receive some formal services [4]. What is not known is the extent to which such a group of people, who are all thought to be in some way at the margin of community and residential care, would be supported by the statutory sector.
2 Health Care Services and Public Health Information Systems
Health care stands for any service [5], supply, equipment or prescription that people get in order to help them stay healthy. It includes preventive care (such as the yearly check-up), care for illness or injury, a hospital stay, surgery, visits to a doctor’s office, lab tests and X-rays, and even drug prescriptions. An individual may use other types of services to improve their health, such as buying over-the-counter medicine or keeping track of their own blood pressure. In this paper, though, “health care” stands for those treatments that people receive from a trained and licensed health care practitioner, such as their doctor or nurse practitioner. Health care services include services provided by different kinds of trained and licensed providers. These services usually operate in places like a hospital, a doctor’s office or a health clinic. The health plan may include a large number of these providers, but people often need to pay the costs on their own if the provider is not in the health plan’s network. The health care providers [3] are entities that provide services approved as medical and other health services in the Medicare law. The medical and other health services presented in detail in the Medicare law are listed below, according to [3]:
1. Physicians’ services
2. Nursing services
3. Services and supplies furnished as an incident to a physician’s professional services, or services or supplies which are commonly furnished in a physician’s office and commonly rendered without charge or included in a physician’s bill
4. Diagnostic services furnished to an individual as an outpatient by a hospital or by others under arrangements with them made by a hospital, and ordinarily furnished by a hospital to its outpatients for the purposes of diagnostic study
5. Outpatient physical therapy services
6. Outpatient health care services
7. Rural health clinic services
8. Federally-qualified health care services
9. Home dialysis supplies and equipment, self-care home dialysis support services and institutional dialysis services and supplies
10. Antigens prepared by physicians for a particular patient
11. Services furnished by contract to a member of an eligible organization by a physician assistant or by a nurse practitioner
12. Blood clotting factors for haemophilia patients
13. Prescription drugs used in immunosuppressive therapy furnished to an individual who receives an organ transplant, but only in case of certain drugs
14. Services furnished by a nurse that would be a physician’s services
15. Certified nurse-midwife services
16. Qualified psychologist services
17. Clinical social workers services
18. Erythropoietin for dialysis patients
19. Diabetes outpatient self-management training screening
20. Surgical dressings, splints, casts, and other devices used for reduction of fractures and dislocations
21. Durable medical equipment
22. Prosthetic devices (other than dental) which replace all or part of an internal body organ, including one pair of conventional eyeglasses or contact lenses furnished subsequent to cataract surgery
23. Services of a certified registered nurse anesthetist
24. Screening mammography
25. Screening pap smear and screening pelvic exam
2.1 Health Care Providers
Generally, the definition of health care providers is based on the activities performed and not on the titles or labels of the professionals or institutions. The health care providers [3] listed in the Medicare law include:
1. Nursing homes
2. Hospitals
3. Critical access hospitals
4. Comprehensive outpatient rehabilitation facilities
5. Home health agencies
6. Hospice programs
2.2 Nursing Home Services
A nursing home is an entity that provides skilled nursing care and rehabilitation services to people with illnesses, injuries or functional disabilities. Most facilities serve elderly people and take care of their needs. However, some facilities provide services to younger individuals with special needs, such as the developmentally disabled, the mentally ill, and those requiring drug and alcohol rehabilitation. Nursing homes are usually independent facilities, although some of them are operated within a hospital or retirement community. The level of care provided by nursing homes has increased significantly over the past decade. Many homes now provide a large part of the nursing care that was previously provided in a hospital. As a result, most nursing homes now focus their attention on rehabilitation, so that their clients
S. Nikolidakis, D.D. Vergados, and I. Anagnostopoulos
can return to their own homes as soon as possible. Some of the services [6] a nursing home may provide include:
1. Therapies: Physical therapy, occupational therapy, speech therapy, respiratory therapy.
2. Specialty Care: Alzheimer's treatment, head trauma, hematological conditions, mental disease, neurological diseases, neuromuscular diseases, orthopedic rehabilitation, pain therapy, pulmonary disease, para/quadriplegic impairments, stroke recovery.
3. Independent Living: Independent living is for people who can take care of themselves and includes residing in one's own home or apartment.
4. Assisted Living: Assisted living provides apartment-style accommodations where services focus on providing assistance with daily living activities.
5. Congregate Care: Congregate care is similar to independent living, but features a community environment, with one or more meals per day prepared and served in a community dining room. Many other services and amenities may be provided, such as transportation, pools, a convenience store, bank, barber/beauty shop, resident laundry, housekeeping, and security.
6. Intermediate Care: Intermediate care is nursing home care for residents needing assistance with activities of daily living, but without significant nursing requirements.
7. Skilled Nursing: Skilled nursing facilities are traditional nursing facilities that provide 24-hour medical nursing care for people with serious illnesses or disabilities.
8. Continuing Care Retirement Communities or Life Care Communities: These communities are planned and operated to provide a continuum of care from independent living through skilled nursing.
9. Sub-acute Care: Sub-acute care is intensive nursing care for patients recovering from surgery or illness; patients receive this care in a nursing home setting.
10. Hospice Care: Hospice care is a combination of facility-based and home care provided to benefit terminally ill patients and support their families.
11. Hospitals: In addition to traditional services, many hospitals offer skilled or sub-acute nursing services either in their facility or on their campus.
12. Respite Care: Respite care is provided on a temporary basis to allow a primary care provider or family member relief for a few hours or days.
13. Adult Day Care: Adult day care programs provide meals and care services in a community setting during the day while a caregiver needs time off or must work.
14. Out-patient Therapy: Many facilities offer the same therapies provided in a nursing home on an out-patient basis. For those choosing a home-based option, out-patient therapy may be a necessary professional service.
15. Home Health Care: Home health care is provided in an individual's home by outside providers and aims to keep the individual functioning at the highest possible level.
2.3 Nursing Home for Elderly People
Most elderly people do not want to go to a nursing home [2], and most of their relatives do not want to institutionalize a loved one. But even though they may be committed to home care and have no intention of using the services of a nursing home, circumstances can make institutionalization a necessity rather than a choice. Often, after a long stay in hospital, elderly people need a period of specialized care that they cannot receive in their own home. Like so many issues in care provision, the decisions surrounding this process involve practical considerations overlaid with emotional components [2]. Feelings of sadness, relief, guilt, and a sense of failure may all be experienced when there is a need to institutionalize a loved one in a nursing home. As time passes, and the raw emotions of the moment subside, one of the most important sources of comfort is the knowledge that one has found the right home for the care recipient.

It is impossible to choose a nursing facility without first determining the type of care the patient needs. This information not only assists elderly people in finding a home that provides the proper level of care, but is also a major factor in determining the public aid for which the care recipient will be eligible. The three most common types of care for elderly people are personal care (often referred to as custodial care), intermediate care, and skilled nursing care. Custodial care means that residents need help with personal activities such as dressing, bathing, and eating. This type of care is essentially non-medical and is administered by aides rather than trained medical personnel. Residents who need rehabilitative therapy and medications in addition to personal custodial care are candidates for intermediate care. Intermediate care is delivered by licensed therapists and registered or licensed practical nurses. When the level of disability is such that the resident is not able to take care of himself or herself and may even be bedridden, skilled nursing care is needed. This is administered on the orders of an attending physician by licensed medical personnel.
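The three levels of care described above can be read as a simple decision rule. The following is a minimal sketch of that reading, not a clinical instrument: the `CareNeeds` fields and the `care_level` function are hypothetical names introduced here for illustration, and the ordering of the checks is an assumption based on the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass
class CareNeeds:
    # All three fields are illustrative assumptions, not part of the chapter.
    needs_personal_help: bool          # dressing, bathing, eating
    needs_therapy_or_medication: bool  # rehabilitative therapy and medications
    cannot_self_care: bool             # unable to care for self, possibly bedridden


def care_level(needs: CareNeeds) -> str:
    """Map a resident's needs to one of the three care types in the text."""
    if needs.cannot_self_care:
        # Administered by licensed medical personnel on a physician's orders.
        return "skilled nursing"
    if needs.needs_therapy_or_medication:
        # Delivered by licensed therapists and registered or practical nurses.
        return "intermediate"
    if needs.needs_personal_help:
        # Essentially non-medical, administered by aides.
        return "custodial"
    return "no institutional care indicated"
```

A real assessment would rest on a clinical evaluation; the sketch only makes the ordering of the three levels explicit.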
3 Health Care and Information Technology
In recent years, grid computing has evolved as a standards-based approach for the coordinated sharing of distributed and heterogeneous resources to solve large-scale problems in dynamic virtual organizations [7]. Most existing developments in grid computing have focused on compute grids and data grids. A compute grid provides distributed computational resources to meet the computational requirements of applications, while a data grid provides seamless access to large amounts of distributed data and storage resources. Although both healthcare and the use of IT to support the development of effective treatment, delivery and management of healthcare are top priorities in the health field in many countries, there are many competing areas of investment [8]. The benefits of using even basic IT to provide high quality information and decision support to clinicians and patients are intuitively very significant. However,
progress even in basic IT has been patchy and slow in the healthcare industry. In other words, few high quality, well documented business cases with results and very few large-scale IT implementations can be found. There are even fewer cases that demonstrate the benefits of dramatically new IT technologies (like the Grid) or of IT in innovative areas of healthcare such as genetics, imaging, or bioinformatics. Therefore, in applying for funding and prioritisation of resources to continue developing HealthGrid applications, it is vital to create a clear and highly compelling business case that covers all types of healthcare.

In the future, nursing homes will be an arena for medical treatment and care [9]. Health care will play an important role in achieving this. Due to factors such as a change in demography and a shortage of people working in the health sector, there will be a need to change the way hospitals are organised as well as the way that care is provided. Nursing homes can benefit from a variety of services, which include rehabilitation after operations, policlinic check-ups and patient training. Medical treatment at nursing homes could result in better utilization of hospital resources, improved care quality, and improved quality of life for the patients. Thus, emerging technologies such as the grid have the potential to facilitate nursing-home-based health services, and to enable a closer integration of the nursing home environment as a part of a hospital or other cooperating health institutions. In order to fulfil the vision of a more extensive use of nursing-home-based health services, there is a need for a new computer infrastructure. The grid will play an essential role in realizing the nursing home of the future. For nursing home treatment and care, there is a need for a technological infrastructure that integrates the nursing home with the medical/health institutions in question.

The concept of virtual organisations is well suited to nursing-home-based health care. In a nursing home, a possible scenario is that during the recovery period of a patient, different virtual organisations will be formed. These organisations will depend on the medical condition of the patient as well as on the different phases of treatment and recovery. Initially a nursing home will be a part of the virtual organisation. As soon as the patient has recovered to some extent, the virtual organisation could include both the nursing home and a primary care institution, with or without the hospital participating. When the patient has fully recovered, and there is no longer a need for a nursing home, the virtual organisation will cease to exist. It is not unrealistic to anticipate that in the future, and for certain diseases, healthcare services will be provided directly to the nursing home by nurses and doctors abroad.

But which grid functionality is needed in order to support medical care and treatment at a nursing home in an optimal way? Running a nursing home requires collaboration and information exchange among the nursing homes that constitute the virtual organisation. Speaking in terms of computational, information and collaboration grids as a way to categorise functionality, it is questionable whether there will be a need for vast amounts of computational power. But there will be a need to store rather large amounts of data, and a need for collaboration functionality: performing consultations, exchanging data and information such as monitoring data, remote control of medical equipment, and virtual visits to the patients at the nursing home. The information that is related to a particular patient is today scattered across different health institutions. Common ontologies are
important in order to combine this information. An infrastructure for virtual hospitals and nursing home care has to meet security requirements for different levels of quality of service and facilitate interoperability in dynamically formed virtual organisations. Today one may observe different non-grid initiatives addressing the need for a computing infrastructure for nursing-home-based care. The benefit of having a common standard grid for nursing-home-based healthcare in the future is easier deployment and operability.
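The virtual organisation scenario in this section, where membership changes with the patient's recovery phase, can be sketched as a small lookup. This is only an illustration under stated assumptions: `RecoveryPhase` and `virtual_organisation_members` are hypothetical names, and treating the hospital as the other initial member is an assumption, since the text names only the nursing home for the initial phase.

```python
from enum import Enum


class RecoveryPhase(Enum):
    INITIAL = 1
    PARTLY_RECOVERED = 2
    FULLY_RECOVERED = 3


def virtual_organisation_members(phase, hospital_participates=True):
    """Return the institutions forming the virtual organisation for a phase."""
    if phase is RecoveryPhase.INITIAL:
        # Assumption: the hospital is the other initial member.
        return frozenset({"nursing home", "hospital"})
    if phase is RecoveryPhase.PARTLY_RECOVERED:
        members = {"nursing home", "primary care institution"}
        if hospital_participates:
            # The text allows this phase with or without the hospital.
            members.add("hospital")
        return frozenset(members)
    # Fully recovered: the virtual organisation ceases to exist.
    return frozenset()
```

An empty set stands for a dissolved virtual organisation, which keeps the return type uniform across phases.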
4 Improve Health Care at Nursing Home – Our Approach
The Web has become an integral part of numerous applications in which a user interacts with many entities, each of which may be a service provider, a product seller, a friend or a colleague. Contents and services are available at different sources and places. Web applications therefore have to combine all available knowledge in order to offer personalized and user-friendly services. Thus, one of the main goals of the Semantic Web is to enable applications that offer end users high quality health care services that take advantage of electronically stored information. For this purpose, it is vital to use specific techniques to mine this data for actionable knowledge. The discovered knowledge can also be used effectively to enhance the users' Web experience. However, it is important that the people who use these techniques also take into consideration the large size and the heterogeneous nature of the stored data, as well as the dynamic nature of user interactions with the Web.

Personalization is used more and more often in several areas of interactive multimedia, mainly in web applications. This is caused by the need to adapt the content and presentation style to the preferences of a given user or set of users in order to offer them better services. The health and personal care that a resident receives will be based on the individual’s needs. In our work we assume that, based on the health conditions of the patients that enter the nursing home, the relevant health care services are adopted and applied to them. For example, if the health condition of a patient is critical, then the relevant icon above the room (Fig. 2) turns black. This helps the doctors to categorise the patients and offer personalized services that best fit their needs.

Moreover, we assume that there are a number of cooperating nursing homes, and that through our software the doctors can assign a patient to the most suitable home in order to provide better treatment to that patient. The use of a common database also helps the doctors to view the patients’ medical history and provide them with better health care treatment and services. As already stated, it is important for elderly people to have trained personnel taking care of them and thus helping them have better living conditions. The nursing home for the elderly can provide these services and help them with their daily activities. However, there are several ways to improve the services provided at nursing homes so that elderly people receive better health care that focuses on their individual problems. Consequently, it would be useful to know the problems that elderly people face. In fact, a good idea is to keep a file for each one, which will contain the personal information and the clinical
characteristics that the personnel at the nursing home should be aware of, in order to provide elderly people with the proper services. It is also important that different nursing homes are able to cooperate [7] in order to enable the transfer of senior citizens from one place to another. For example, an elderly person who faces orthopaedic problems but currently stays at a nursing home which does not specialize in this kind of disorder should be transferred to a specialized home in order to receive better treatment [8].

Our approach is to use a web application along with a tablet PC or PDA in order to collect the personal information and the clinical characteristics of the people in the nursing home and to monitor their health status. It is important to create such a file for each elderly person in order to have a better view of their clinical history and in this way provide them with better treatment. In order to reach this target [9], the nursing home should have some special technical equipment. Firstly, a wireless connection at the nursing home and tablet PCs or PDAs are necessary. The personnel of the institution should also be trained to use the tablet PCs or PDAs as well as the web application. This web application will help doctors collect the information of the patients in a timely manner, and monitor their health status in order to have a better view of their needs. The most important functionality of this application is that it can generate a report with the needs of the residents in medical supplies, including the health status of the residents at the nursing home.
Fig. 1 The main page where the doctors can select a nursing home among the cooperating ones
We assume that we have a list of cooperating nursing homes which are specialized in specific types of health care services, and a web application that can manage these nursing homes and is used to transfer patients from one home to another according to their needs. This means that the doctors can select a nursing home among the cooperating nursing homes and can then start a number of actions for the management of the elderly people (Fig. 1). The doctors can add an elderly person to a specific room and bed by clicking on the circles above the doors as depicted in Fig. 2. The circles above the doors represent the beds, while the doors represent the rooms. They can also view the status of an elderly person in a room by clicking on the door of that room.
Fig. 2 The page where the doctors can add an elderly person to a specific room and bed or view the status of the nursing home
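The selection of a cooperating nursing home and the assignment of a room and bed, as described above, could be modelled along the following lines. This is a hedged sketch rather than the authors' implementation: `NursingHome`, `free_beds` and `place_resident` are hypothetical names, and choosing the first specialised home with a free bed is an assumed policy.

```python
from dataclasses import dataclass, field


@dataclass
class NursingHome:
    name: str
    specialties: set
    # beds[(room, bed)] holds the resident's name, or None when the bed is free.
    beds: dict = field(default_factory=dict)

    def free_beds(self):
        """List the (room, bed) slots that are currently unoccupied."""
        return [slot for slot, resident in self.beds.items() if resident is None]


def place_resident(homes, resident, specialty):
    """Assign the resident to the first specialised home that has a free bed.

    Returns (home name, (room, bed)) or None when no suitable bed exists.
    """
    for home in homes:
        if specialty not in home.specialties:
            continue
        free = home.free_beds()
        if free:
            slot = free[0]
            home.beds[slot] = resident
            return home.name, slot
    return None
```

For the orthopaedic example in the text, a call such as `place_resident(homes, "resident name", "orthopaedics")` would skip homes without that specialty and return the first free slot found.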
Fig. 3a The form in which the doctors can add the general information of the elderly people and information about their health status
Fig. 3b The form in which the doctors can add the general information of the elderly people and information about their health status
When the doctors want to add an elderly person to the home, they press the relevant button, i.e., one of the circles above the doors (Fig. 2), and are taken to another page with a form that contains the personal information of the person and their clinical characteristics. If they have already assigned a person to a bed, they can add or change the clinical characteristics of that elderly person and prescribe medical supplies. As soon as the doctors have entered all the necessary information for that person, they press the submit button and the person is added to the home in the bed they have selected. As a result, if the elderly person is in good health, the circle above the door is replaced by a white icon. If the elderly person has a problem, it is replaced by a grey icon, and if the elderly person needs special care, it is replaced by a black icon. This helps the doctors acquire a better view of the condition of the elderly people in the nursing home and monitor the overall health status of the nursing home. Prior to their hospitalization, the doctor creates the personal record file of each patient. Doctors use a tablet PC or a PDA to fill in a form designed to store this information in a database. The form is illustrated in Fig. 3a and 3b.
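The icon colouring described above amounts to a small mapping from health status to colour. The sketch below illustrates it under assumptions: the status labels `"good"`, `"problem"` and `"special_care"` and the function names are hypothetical, while the three colours come from the text.

```python
from collections import Counter

# Status labels are illustrative; the colours follow the description in the text.
ICON_BY_STATUS = {
    "good": "white",
    "problem": "grey",
    "special_care": "black",
}


def bed_icon(status):
    """Return the icon shown above a door for one bed.

    A free bed (status None) keeps the plain circle used for adding a person.
    """
    if status is None:
        return "circle"
    return ICON_BY_STATUS[status]


def home_overview(bed_statuses):
    """Count icons per colour to summarise the status of the whole home."""
    return Counter(bed_icon(status) for status in bed_statuses)
```

The overview function shows how the same mapping could feed the monitoring view of the nursing home as a whole.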
In the relevant option of the form, the user can choose among several predefined diseases from a list. According to the category that has been chosen, the related diseases appear. Similarly, according to the disease chosen, relevant information regarding the necessary medication treatment appears. For example, if the doctors choose diseases of blood, the diseases iron deficiency anemia and coagulation factor deficiency will appear. Then, if the doctors choose iron deficiency anemia, the medication legofer oral sol 800 mg will appear. If the doctors need a more specific view of the elderly people located in a room, they can click on the door and get the names and photos that the doctors or a nurse have captured for them (Fig. 4). The doctors can see the patients of this room, select the elderly person they want, and view that person’s details and photo. Moreover, they can change the general information of the elderly person, for example the name or the residence, and they can also view and change the clinical status of this person, adding a disease, a medical treatment and a description of the health status of that elderly person.
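The dependent selection described above (category, then disease, then medication) can be sketched with two lookup tables. The entries below are only the example given in the text; the table and function names are hypothetical, and a deployed form would presumably load these lists from the common database.

```python
# Only the example from the text is encoded; other entries are unknown here.
DISEASES_BY_CATEGORY = {
    "diseases of blood": [
        "iron deficiency anemia",
        "coagulation factor deficiency",
    ],
}

MEDICATION_BY_DISEASE = {
    "iron deficiency anemia": ["legofer oral sol 800 mg"],
}


def diseases_for(category):
    """Diseases offered once a category has been chosen."""
    return DISEASES_BY_CATEGORY.get(category, [])


def medication_for(disease):
    """Medication suggestions offered once a disease has been chosen."""
    return MEDICATION_BY_DISEASE.get(disease, [])
```

Returning an empty list for an unknown key lets the form show an empty choice instead of failing when no entry has been defined.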
Fig. 4 Getting the clinical view of a specific patient
Doctors can also search for personalized information with respect to the health status of each elderly person, as shown in Fig. 5. The results provided are the age, the room, the bed and the special needs of the elderly people. Moreover, the doctors can see the list of the diseases from which the elderly people suffer. Finally, the proposed application provides a report with the supplies and the equipment needed for the nursing home (Fig. 6). Doctors and physicians can see how many people are institutionalized in the nursing home, as well as what kinds of diseases have been recorded and need to be supervised by the doctors. Moreover, the medical staff can see the total needs in supplies for the nursing home, which helps with managerial issues.
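The search and the report described above are both aggregations over the resident records. The following sketch shows one way they might be computed; the record layout (keys such as `"diseases"` and `"supplies"`) and the function names are assumptions made for illustration, not the schema of the actual application.

```python
from collections import Counter


def search_residents(residents, disease):
    """Return age, room, bed and special needs of residents with a disease."""
    return [
        {key: r[key] for key in ("name", "age", "room", "bed", "special_needs")}
        for r in residents
        if disease in r["diseases"]
    ]


def nursing_home_report(residents):
    """Summarise population, recorded diseases and total supply needs."""
    disease_counts = Counter()
    supply_totals = Counter()
    for r in residents:
        disease_counts.update(r["diseases"])  # one count per listed disease
        supply_totals.update(r["supplies"])   # adds the quantity per supply
    return {
        "resident_count": len(residents),
        "diseases": dict(disease_counts),
        "supplies": dict(supply_totals),
    }


# Hypothetical sample records, used only to exercise the two functions.
sample = [
    {"name": "A", "age": 81, "room": 1, "bed": 1, "special_needs": "none",
     "diseases": ["iron deficiency anemia"],
     "supplies": {"legofer oral sol 800 mg": 2}},
    {"name": "B", "age": 77, "room": 1, "bed": 2, "special_needs": "wheelchair",
     "diseases": ["iron deficiency anemia", "coagulation factor deficiency"],
     "supplies": {"legofer oral sol 800 mg": 1, "surgical dressings": 4}},
]
report = nursing_home_report(sample)
```

Summing quantities per supply gives the total needs of the home, which is the managerial figure the report page is meant to expose.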
Fig. 5 The page where the doctors can search for elderly people
Fig. 6 The report page of the nursing home needs
5 Ethical Issues on Health Care Services
Electronic health, or e-health, enhances the communication between patients and doctors. It also provides education through online resources, as well as information sharing irrespective of location [10]. This implies a need for strict confidentiality and enforced protection of privacy [11]. Privacy is in fact recognized as a fundamental human right, at least in Europe. Public authorities are sharply aware of these repercussions, and they are putting considerable effort into privacy protection legislation. Health information includes [12] information for staying well, preventing and managing disease, as well as making other decisions related to health and health
care. It includes information [13] for making decisions about health products and health services. It may be in the form of data, text, audio, and/or video. It may involve enhancements through programming and interactivity. Health products [12] include drugs, medical devices, and other goods used to diagnose and treat illnesses or injuries or to maintain health. Health products include both drugs and medical devices subject to regulatory approval by agencies such as the U.S. Food and Drug Administration or the U.K. Medicines Control Agency, and vitamin, herbal, or other nutritional supplements and other products not subject to such regulatory oversight. Health services [13] include specific, personal medical care or advice; management of medical records; communication between health care providers and/or patients and health plans or insurers, or health care facilities regarding treatment decisions, claims, billing for services, etc.; and other services provided to support health care.

There is an appropriate concern about the proper treatment of sensitive data. To better illustrate the problem of the privacy of personal data, we use the following scenario. A patient with acute abdominal pain is admitted to the Emergency Department of a hospital. The patient is assigned to a doctor who will perform the Acute Abdominal Pain Diagnosis procedure. The diagnosis procedure requires the doctor to access the patient history, then to carry out a physical exam, and finally to ask for some lab and imaging exams. Optionally, the doctor can ask the opinion of one or more colleagues, depending on the nature of the patient’s symptoms. The basic assumption is that the medical records of the patients are stored in a database and are accessible from any computer in the hospital. However, since the records contain sensitive information, the medical staff of the hospital should have specific restrictions on accessing the records. For instance, a doctor can only access records of patients assigned to his or her ward. Such a restriction requires that a patient is admitted to the ward, possibly requiring that the patient is physically present in the ward and that access can only be made from terminals in the ward. However, when a doctor from another ward needs to access a patient record, the above mechanism will not allow the access unless authorization is provided, for example, by the first doctor for a limited time. This is a typical procedure, so there is a need for an access control mechanism that protects the privacy of the patient records. It is natural to use roles to reflect the various responsibilities in organizations.

Privacy Enhancing Technologies (PETs) are fairly new (the concept has only been around since the 1990s), and have been extensively researched in both the USA and Europe. In healthcare, PETs are mainly used to protect the privacy of persons involved in medical data collection. The goal of these PETs is to guarantee the anonymity of data subjects while making information available for clinical practice and research. The use of such techniques in healthcare has been demonstrated in several research projects and in solutions that are already commercially deployed, in clinical trials, disease studies, the exchange of research data, the daily handling of sensitive data, etc. PETs such as anonymisation have even taken the first steps towards standardization.
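The ward-based restriction with time-limited authorization from the scenario above can be sketched as follows. This is a minimal illustration under assumptions: the class and method names are hypothetical, the checks on physical presence and ward terminals mentioned in the text are not modelled, and a production system would rely on an established role-based access control framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RecordAccessPolicy:
    doctor_ward: dict    # doctor id -> ward
    patient_ward: dict   # patient id -> ward the patient is admitted to
    # (doctor id, patient id) -> moment at which the delegated access expires
    delegations: dict = field(default_factory=dict)

    def _same_ward(self, doctor, patient):
        ward = self.doctor_ward.get(doctor)
        return ward is not None and ward == self.patient_ward.get(patient)

    def delegate(self, from_doctor, to_doctor, patient, now, duration):
        """Let a ward doctor authorise a colleague for a limited time."""
        if not self._same_ward(from_doctor, patient):
            raise PermissionError("only a doctor of the patient's ward may delegate")
        self.delegations[(to_doctor, patient)] = now + duration

    def can_access(self, doctor, patient, now):
        """Allow access for ward doctors or for unexpired delegations."""
        if self._same_ward(doctor, patient):
            return True
        expiry = self.delegations.get((doctor, patient))
        return expiry is not None and now < expiry
```

Passing the current time explicitly, instead of reading the system clock inside the policy, keeps the expiry rule easy to test.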
As we described earlier, trust is fundamental to healthcare. Patients rely on healthcare providers to keep their personal information confidential, to provide accurate and appropriate information about their conditions and possible treatments, and to recommend the therapy they believe to be in the patient’s best interest. In response to the ever-growing public scrutiny of the Internet health arena, several organizations have championed e-health ethics initiatives [14]. These include:
1. Health On the Net (HON), Code of Conduct (www.hon.ch/HONcode/Conduct.html)
2. American Medical Association, Guidelines for Medical and Health Information Sites on the Internet
3. Health Internet Ethics (Hi-Ethics), Ethical Principles for Offering Internet Health Services to Consumers (www.hiethics.org/Principles/index.asp)
4. Internet Healthcare Coalition, e-Health Code of Ethics (www.ihealthcoalition.org/ethics/ethics.html)
5. URAC “Health Web Site Standards” (www.urac.org/documents/HealthWebSitev1-0Standards040122.pdf)
The goal of these initiatives and organizations is to draft ethical guidelines for creating credible and trustworthy health information and services on the Internet. According to [14], the e-Health Code of Ethics can be summarised as follows:
1. Candor: Disclose information that, if known by consumers, would likely affect their understanding or use of the site, or purchase or use of a product or service.
2. Honesty: Be truthful and not deceptive. People who seek health information on the Internet need to know that products or services are described truthfully and that information they receive is presented clearly.
3. Quality: To make decisions about their health care, people need and have the right to expect that sites will provide accurate, well-supported information and products and services of high quality.
4. Informed Consent: Respect users’ right to determine whether or how their personal data may be collected, used, or shared. People who use the Internet for health-related reasons have the right to be informed that personal data may be gathered, and to choose whether they will allow their personal data to be collected and whether they will allow it to be used or shared. They have a right to be able to choose, consent, and control when and how they actively engage in a commercial relationship.
5. Privacy: Respect the obligation to protect users’ privacy. People who use the Internet for health-related reasons have the right to expect that personal data they provide will be kept confidential.
6. Professionalism in Online Healthcare: Respect fundamental ethical obligations to patients and clients. Inform and educate patients and clients about the limitations of online healthcare.
7. Responsible Partnering: Ensure that organizations and sites with which they affiliate are trustworthy. People need to be confident that organizations and individuals who operate on the Internet undertake to partner only with trustworthy individuals or organizations.
8. Accountability: Provide meaningful opportunity for users to give feedback to the site. People need to be confident that organizations and individuals that provide health information, products, or services on the Internet take users’ concerns seriously and that sites make good faith efforts to ensure that their practices are ethically sound.
6 Conclusions – Future Work
There is a growing demand for health care services provided at nursing homes, particularly for elderly and chronically ill people. The nursing homes and the trained personnel who work there should provide health care services that help elderly people to recover and improve their quality of life. The medical staff should also provide them with secure and supervised health care services. New technologies make this feasible and affordable. In this paper, we introduce a web application that can be used in nursing homes in order to manage the health care services that are provided to elderly people and to support different types of health services according to the demands of the senior citizens. The doctors can use our application through PDAs or tablet PCs in order to collect both personal and clinical information, creating in parallel a personalized file record for the hospitalized persons. An important part of the application is the tool for monitoring the health status of the people in the nursing homes, which helps doctors have a better view of the health status at the nursing home. The application can also generate an overall report with the needs of the nursing home with respect to medical supplies, as well as the demographics and population status of the nursing home. Last but not least, the application supports the exchange of information among different nursing homes, so that the clinical data of elderly people can be exchanged and processed from one home to another. As far as future considerations are concerned, the expansion of the use of applications and internet technologies for personalized health care services will help elderly people obtain better treatment and living conditions.
References
[1] Lansley, P., McCreadie, C., Tinker, A.: Can adapting the homes of older people and providing assistive technology pay its way? Age and Ageing, Oxford Journals 33(6), 571–576 (2004)
[2] Choosing a Nursing Home: A Caregiver’s Guide, National Family Caregivers Association (NFCA), 00/896-3650, http://www.thefamilycaregiver.org/pdfs/NursHomeChecklist.pdf
[3] HIPAA workgroups, Information Access Management, Health Insurance Portability and Accountability Act, Policy Memorandum, Chapter 1: Entity Status, Health Care Providers (2004), http://www.hipaa.org/
[4] Allen, I., Hogg, D., Peace, S.: Elderly People: Choice, Participation and Satisfaction. Policy Studies Institute (PSI), London (1992)
[5] Consumer Guide to Health Care Coverage, Consumer Affairs & Business Regulation (OCABR), http://www.mass.gov/
[6] What is a Nursing Home? Nelson & Wallery Ltd., http://www.nursinghomeinfo.com/nhserve.html
[7] Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications 15(3), 200–222 (2001)
[8] Dean, K., Lloyd, S.: The Healthgrid White Paper. In: Proceedings of Healthgrid 2005, From Grid to Healthgrid, vol. 112, pp. 18–21. IOS Press, Amsterdam (2005)
[9] Burkow, T.M., Bakkevoll, P.A.: The Grid as an Enabler for Home Based Healthcare Services. In: Engelbrecht, R., et al. (eds.) Proceedings of MIE 2005, Connecting Medical Informatics and Bio-Informatics, ENMI, pp. 1305–1310 (2005)
[10] Eysenbach, G.: What is e-health? J. Med. Internet Res. 3(2), e20 (2001)
[11] De Moor, G., Claerhout, B.: From grid to healthgrid: Confidentiality and ethical issues. HealthGrid White Paper, ch. 8
[12] Code of Ethics, http://www.advamed.org/MemberPortal/About/code/
[13] Ahmad, R.: eHealth Code of Ethics (1998)
[14] Mack, J.: Beyond HIPAA: Ethics in the e-Health Arena. Healthcare Executive (2004)
Introducing Context-Awareness and Adaptation in Telemedicine Systems
Charalampos Doukas, Ilias Maglogiannis, and Kostas Karpouzis
Abstract. Proper coding and transmission of medical and physiological data is a crucial issue for the effective deployment and performance of telemedicine services. This chapter presents a platform for performing proper medical content adaptation based on context awareness. Sensors are used in order to determine the status of a patient being monitored through a medical network. Additional contextual information regarding the patient’s environment (e.g., location, data transmission device and underlying network conditions, etc.) is represented through an ontological knowledge base model. Rule-based evaluation determines proper content (i.e., biosignals, medical video and audio) coding and transmission of medical data, in order to optimize the telemedicine process. The paper discusses the design of the ontological model and provides an initial assessment.
Charalampos Doukas: University of the Aegean, Department of Information & Communication Systems Engineering, Greece
Ilias Maglogiannis: University of Central Greece, Department of Biomedical Informatics, Lamia, Greece
Kostas Karpouzis: Image, Video and Multimedia Systems Lab, National Technical University of Athens, Greece
M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 163–185. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
1 Introduction A number of telemedicine applications exist nowadays, providing remote medical action systems (e.g., remote surgery systems), patient remote telemonitoring facilities (e.g., homecare of chronic disease patients), and transmission of medical content for remote assessment ([1]-[5]). Such platforms have proved to be significant tools for optimizing patient treatment, offering better possibilities for managing chronic care, controlling health delivery costs, and increasing quality of life and quality of health services in underserved populations. Collaborative applications that allow the exchange of medical content (e.g., a patient health record) between medical experts for educational purposes or for assessment assistance are also considered of great significance ([6]-[8]). Due to the remote locations of the involved actuators, a network infrastructure (wired and/or wireless) is needed to enable the transmission of the medical data. The majority of the latter data is
usually medical images and/or medical video related to the patient. Thus, telemedicine systems cannot always perform in a successful and efficient manner: issues such as large data volumes (e.g., video sequences or high quality medical images), unnecessary data transmission and limited network resources can cause inefficient usage of such systems ([9], [10]). In addition, wired and/or wireless network infrastructures often fail to deliver the required quality of service (e.g., bandwidth requirements, minimum delay and jitter requirements) due to network congestion and/or limited network resources. Appropriate content coding techniques (e.g., video and image compression) have been introduced in order to address such issues ([11]-[13]); however, they are closely tied to specific content types and cannot be applied in general. Additionally, they do not consider the underlying network status for appropriate coding and still cannot resolve the case of unnecessary data transmission. Scalable coding and context-aware medical networks can overcome the aforementioned issues by performing appropriate content adaptation. The realization and integration of Semantic Medical Devices can make it possible:
• To develop solutions for the realization of smart hospitals.
• To provide mobile access to the patient’s Electronic Health Record (EHR).
• To enable the medical devices to send parts of the EHR (e.g. measurement results), alerts, status information etc. to the PDA of the physician.
• To develop medical devices that inherently support interoperability with each other.
• To enable medical devices to send alerts (e.g. SMS) to the handheld devices (e.g. mobile, pager) of the caregivers of the patients, if something goes wrong with the patient.
• To develop solutions that provide pervasive healthcare services (everywhere, anytime) to the patients, whether staying at home or mobile.
• To develop solutions for ambient assisted living for elderly patients (e.g. medication reminders, cognitive assistance etc.).
In addition, policy and rule-based mechanisms can provide better adaptivity of medical networks. For example, there is a need to adapt the frequency of measurements on a sensor depending on the activity and clinical condition of the patient. This enables optimizing power consumption whilst ensuring that important episodes are not missed. Similarly, the use of variable thresholds for transmitting sensor readings reduces the need for communication and thus power consumption. Typically, sensor configuration may also change depending on the user’s context, e.g., location, current activity and medical history. Physiological parameters such as heart rate thresholds then need to be configured and customized accordingly. Policy-based techniques have been used for over a decade in network and systems management in order to define how the system should adapt in response to events such as failures, changes of context or changes in requirements. By specifying the policies (i.e., what actions should be performed in response to an event) declaratively and separately from the implementation of the actions, it is possible to dynamically change the adaptation directives without changing the implementation or interrupting the functioning of the device. Thus, policy-based mechanisms provide feedback control over the system and a constrained form of programmability.
This chapter presents a context-aware medical content adaptation platform that utilizes semantic representation of the content and the context. Using proper reasoning techniques, content adaptation is performed: medical images and video are transmitted only when determined necessary, and the transmitted data are encoded according to the network availability and quality, the user preferences and the patient status. The framework’s architecture is open and does not depend on the monitoring applications used, the underlying networks or any other issues regarding the telemedicine system used. The rest of the chapter is organized as follows: Section 2 presents the notion of context awareness in telemedicine platforms as found in the literature and Section 3 discusses design issues in context-aware medical networks. Section 4 provides information on how context awareness can be achieved, whereas Section 5 discusses context representation issues. Section 6 describes content adaptation techniques and Section 7 provides information regarding the reasoning scheme based on semantic rules for the content adaptation decision. Section 8 presents the proposed platform architecture and Section 9 concludes the chapter.
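The policy-based adaptation discussed in this section (measurement frequency chosen from the patient's activity and clinical condition, and variable thresholds for transmitting readings) can be illustrated with a minimal Python sketch. All class names, context labels and numeric values below are hypothetical and chosen only for illustration; they are not part of the platform described in this chapter. The point of the sketch is that the policies are plain data, kept apart from the code that applies them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical snapshot of the patient's context."""
    activity: str    # e.g. "resting" or "exercising"
    condition: str   # e.g. "stable" or "critical"


@dataclass
class SensorConfig:
    sampling_period_s: float  # time between two measurements
    report_delta: float       # minimum change that triggers a transmission


@dataclass
class Policy:
    """Declarative rule: when `condition` holds, apply `config`."""
    name: str
    condition: Callable[[Context], bool]
    config: SensorConfig


# Policies are data, ordered by priority; the first match wins.
# The values are illustrative assumptions, not clinical settings.
POLICIES: List[Policy] = [
    Policy("critical", lambda c: c.condition == "critical", SensorConfig(1.0, 0.0)),
    Policy("exercising", lambda c: c.activity == "exercising", SensorConfig(5.0, 5.0)),
    Policy("default", lambda c: True, SensorConfig(60.0, 10.0)),
]


def select_config(ctx: Context, policies: List[Policy] = POLICIES) -> SensorConfig:
    """Return the configuration of the first policy whose condition holds."""
    for policy in policies:
        if policy.condition(ctx):
            return policy.config
    raise LookupError("no policy matched")


def readings_to_transmit(readings: List[float], report_delta: float) -> List[float]:
    """Variable-threshold reporting: send a reading only if it differs from
    the last transmitted value by at least `report_delta`."""
    sent: List[float] = []
    for value in readings:
        if not sent or abs(value - sent[-1]) >= report_delta:
            sent.append(value)
    return sent
```

Replacing the `POLICIES` list at run time changes the adaptation directives without touching `select_config` or `readings_to_transmit`, which mirrors the separation between policy specification and action implementation described above.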
2 Related Work Context-awareness has been around for more than six years and a lot has been written on this concept. There are several different applications and application frameworks for modeling and evaluating context but only a few in the domain of healthcare and telemedicine. JCAF [40] is built to operate using a network approach wherein different sensory, control and output devices are connected in a peer-to-peer fashion. Entities such as locations, persons or items have their own context, in which context items can be placed. Entities can request and set context items, and/or subscribe to context changes. An example usage described is the context-aware interactive hospital bed [40]. A touch screen computer attached to a patient’s bed uses context and adjusts its display on context changes, effectively interacting with the environment. Based on proximity, entities and users are identified and authenticated such that a different interface can be shown to surgeons, to nurses, or to other personnel. The experimental setup is able to detect RFID chips on medicine trays and match the retrieved information with known entities within the infrastructure, such that it is able to distinguish between several physical objects. Using the bed’s context and patient information, the bed is able to tell whether the medicines on the medicine tray are actually prescribed to the patient or have been misplaced by a nurse, potentially becoming a health risk to the patient. As an alternative, [41] describes a service-based infrastructure. That work takes the position that “to greatly simplify the task of creating and maintaining context-aware systems, we should shift as much of the weight of context-aware computing onto network-accessible middleware infrastructures”. Although the authors do not cover privacy in the application infrastructure, they recognize, regarding sensory input, that “if it were processed in a context infrastructure, it is likely that the interactivity would be stilted due to network latency”. Biegel and Cahill [42] describe a model designed for mobile context-aware applications based on ubiquitous computing. Their approach, called the sentient object model, describes a network of sensors, actuators and services which run independently but interact in an ad hoc setup. All the aforementioned works present general frameworks for medical context modeling and utilize the latter for the provision of specific health services. To the best of our knowledge there is no other framework in the literature that exploits context awareness for proper medical content adaptation in telemedicine.
3 Design Issues in Context-Aware Medical Networks The goal of research into context-awareness in clinical work is to provide a conceptual and technical framework that can help application programmers create context-aware clinical computer systems. Such a framework should enable the programmer to design, develop, and deploy application-specific context-awareness features that are required in specific usage settings, while it automatically supports aspects of context-awareness that are common across applications. This approach is similar to other frameworks and toolkits supporting the development of context-aware applications, like the Context Toolkit [43]. Requirements for context awareness systems and/or frameworks have been widely discussed and described (see e.g. [43], [44], [45], [46], [47]). Context-aware medical applications, however, introduce additional special requirements. In a hospital there is a wide range of clinical computer systems in use, and new systems are installed and removed on a regular basis. Furthermore, many clinicians (typically research-active doctors) build their own applications, such as quality databases supporting a specific clinical experiment. In order to make such applications context-aware there is a need for a stable infrastructure that can be accessed by these applications, and there is a need for a programming interface used by the developers of such applications. The basic design principle in a context-awareness framework for medical purposes is therefore to divide it into two parts. One part supports the deployment of ’context services’, which are robust, scalable, flexible, adaptable, extensible, etc. Such services run independently of the applications supplying or using context information. The other part enables developers of context-aware applications to represent, acquire, handle, store, and use context information.
Considering the aforementioned, the main design requirements for context-aware medical networks can be summarized into the following:
Distributed and Cooperating Services: Gathering and applying context information is often tied to specific spaces or environments dedicated to a specific purpose. For example, using a context-aware computer system to aid and guide a surgeon is highly dependent on accurate and detailed context information about things going on in the operating room. Therefore, a context-awareness infrastructure should be distributed and loosely coupled, while maintaining ways of cooperating in a peer-to-peer fashion.
Security and Privacy: Clinical data about patients are important context data for clinical applications, and such data should be handled securely and their privacy respected. For example, the hospital bed uses information about the treatment of the patient as context information, enabling it to adjust itself to the patient. Hence, context data should be protected, subject to access control, and not revealed to unauthorized clients. Therefore, the context services should embed an access control mechanism. Furthermore, it is important to know the validity of clients delivering context data.
Lookup and Discovery: Context-aware clinical applications will continuously enter and leave the hospital, e.g. running on mobile equipment or being deployed as new applications. Such clients should be able to locate and connect to relevant context services in the infrastructure. Services are therefore required to register at Lookup and Discovery services and reveal what they can do.
Extensibility: Clinical applications using new context information and acquisition methods will constantly be deployed in treatment facilities. Therefore, a context-awareness infrastructure should be extensible in several ways. First, it should be possible to deploy, modify, and remove context services. Second, the infrastructure should support the evolution of supported types of context by dynamically loading context definitions, functionality, and acquisition mechanisms, like new context sensors.
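As a rough illustration of how the Lookup and Discovery and the Security and Privacy requirements interact, the following Python sketch shows context services that register their capabilities with a registry and check the client identity before revealing a context item. The class names, method names and the identifiers used in the example are assumptions made for this sketch; they do not correspond to the API of JCAF or of any other framework cited in this chapter.

```python
from typing import Dict, List, Set


class ContextService:
    """A context service that stores context items and enforces access control."""

    def __init__(self, name: str, capabilities: Set[str], allowed_clients: Set[str]):
        self.name = name
        self.capabilities = capabilities        # context types the service supplies
        self.allowed_clients = allowed_clients  # client ids permitted to read
        self._items: Dict[str, str] = {}

    def set_item(self, key: str, value: str) -> None:
        self._items[key] = value

    def get_item(self, client_id: str, key: str) -> str:
        # Context data is not revealed to unauthorized clients.
        if client_id not in self.allowed_clients:
            raise PermissionError(f"{client_id} may not read from {self.name}")
        return self._items[key]


class LookupRegistry:
    """Services register here and reveal what they can do."""

    def __init__(self) -> None:
        self._services: Dict[str, ContextService] = {}

    def register(self, service: ContextService) -> None:
        self._services[service.name] = service

    def unregister(self, name: str) -> None:
        # Services can be removed at run time (extensibility).
        self._services.pop(name, None)

    def discover(self, capability: str) -> List[ContextService]:
        return [s for s in self._services.values() if capability in s.capabilities]
```

A client such as a bedside application would call `discover("patient.treatment")` to find the relevant services and then `get_item` with its own identifier. A distributed deployment would place the registry and the services on different hosts, which this single-process sketch does not attempt to show.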
4 Enabling Context Awareness Context awareness refers to the ability of systems to react based on their environment. Devices and computer systems may have information about the circumstances under which they are able to operate and based on rules, or an intelligent stimulus, react accordingly. The term context-awareness in ubiquitous computing was introduced by Schilit [14], [15]. Context aware devices may also try to make assumptions about the user's current situation. Dey defines context as "any information that can be used to characterize the situation of entities." [16]. Three important aspects of context are: (1) where the individual is; (2) who the individual is with; and (3) what resources are nearby. Although location is a primary capability, location-aware does not necessarily capture things of interest that are mobile or changing. Context-aware in contrast is used more generally to include nearby people, devices, lighting, noise level, network availability, and even the social situation; e.g., whether you are with your family or a friend from school. In the domain of patient remote care context awareness refers to detection of patient status and appropriate adaptation of the medical services according to the latter status and environmental conditions.
4.1 Patient Status Awareness Patient status awareness can be achieved by continuously monitoring the patient state through collecting information either directly related to the individual’s health (e.g., biosignals like heart rate, temperature, blood oximetry and others summarized in Table 1) or information that can be processed and indicate emergency cases (e.g., detection of fall events, call for help, etc.).
A broad definition of a signal is a ‘measurable indication or representation of an actual phenomenon’, which in the field of biosignals refers to observable facts or stimuli of biological systems or life forms. In order to extract and document the meaning or the cause of a signal, a physician may utilize simple examination procedures, such as measuring the temperature of a human body, or may have to resort to highly specialized and sometimes intrusive equipment, such as an endoscope. Following signal acquisition, physicians go on to a second step, that of interpreting its meaning, usually after some kind of signal enhancement or ‘pre-processing’ that separates the captured information from noise and prepares it for specialized processing, classification and decision support algorithms.
Table 1 Broadly used biosignals with corresponding metric ranges, number of sensors required and information rate.
Biomedical measurement (broadly used biosignal) | Voltage range | Number of sensors | Information rate (b/s)
ECG | 0.5-4 mV | 5-9 | 15000
Heart sound | Extremely small | 2-4 | 120000
Heart rate | 0.5-4 mV | 2 | 600
EEG | 2-200 μV | 20 | 4200
EMG | 0.1-5 mV | 2+ | 600000
Respiratory rate | Small | 1 | 800
Temperature of body | 0-100 mV | 1+ | 80
Biosignals require a digitization step in order to be converted into a digital form. This process begins with acquiring the raw signal in its analog form, which is then fed into an analog-to-digital (A/D) converter. Since computers cannot handle or store continuous data, the first step of the conversion procedure is to produce a discrete-time series from the analog form of the raw signal. This step is known as ‘sampling’ and is meant to create a sequence of values sampled from the original analog signals at predefined intervals, which can faithfully reconstruct the initial signal waveform. The second step of the digitization process is quantization, which works on the temporally sampled values of the initial signal and produces a signal, which is both temporally and quantitatively discrete; this means that the initial values are converted and encoded according to properties such as bit allocation and value range. Essentially, quantization maps the sampled signal into a range of values that is both compact and efficient for algorithms to work with. The latter information is usually collected by equipment installed on the patient or on his/her surrounding environment and is transmitted to monitoring units. Proper processing and classification follows in order to detect the patient status from the data.
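The two digitization steps described above, sampling followed by quantization, can be sketched in a few lines of Python. The function names, the uniform quantizer and the mid-point reconstruction are illustrative assumptions; real acquisition hardware performs these steps in the A/D converter and may use different bit allocations and value ranges.

```python
import math
from typing import Callable, List


def sample(signal: Callable[[float], float], rate_hz: float, duration_s: float) -> List[float]:
    """Sampling: evaluate the continuous-time signal at predefined intervals."""
    n = int(round(rate_hz * duration_s))
    return [signal(k / rate_hz) for k in range(n)]


def quantize(samples: List[float], v_min: float, v_max: float, bits: int) -> List[int]:
    """Uniform quantization: map each sample to one of 2**bits integer codes."""
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    codes = []
    for v in samples:
        clipped = min(max(v, v_min), v_max)        # keep the value inside the range
        code = int((clipped - v_min) / step)
        codes.append(min(code, levels - 1))        # v_max falls into the top level
    return codes


def dequantize(codes: List[int], v_min: float, v_max: float, bits: int) -> List[float]:
    """Reconstruct each code as the mid-point of its quantization interval."""
    step = (v_max - v_min) / 2 ** bits
    return [v_min + (c + 0.5) * step for c in codes]


# Example: a 1 Hz test tone sampled at 8 Hz for one second, 3 bits per sample.
tone = sample(lambda t: math.sin(2 * math.pi * t), rate_hz=8, duration_s=1.0)
codes = quantize(tone, v_min=-1.0, v_max=1.0, bits=3)
```

With this quantizer the reconstruction error of any in-range sample is at most half a quantization step, so adding one bit per sample halves the worst-case error at the cost of a higher information rate.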
4.2 Patient Data Collection and Transmission Data acquisition is usually performed either through sensor devices placed on the user’s body or through monitoring devices in the user’s environment. The former collect biosignals, sounds, and/or movement related data, whereas the latter capture and process audiovisual content and generate estimations for events like patient falls, abnormal movement, distress situations like fire, etc. [17], [18], [19]. Previous works [25], [26] present overviews of such systems and a prototype platform for detecting fall incidents and distress situations based on user motion and sound data. Sensor devices illustrated in Figure 1 have been used for data collection and transmission to the monitoring unit. Regarding communication, there are two main enabling technologies according to their topology: on-body (wearable) and off-body networks. Recent technological advances have made possible a new generation of small, powerful, mobile computing devices. An off-body network connects to other systems that the user does not wear or carry and is based on a Wireless Local Area Network (WLAN) infrastructure, while an on-body network or Wireless Personal Area Network (WPAN) connects the devices themselves (the computers, peripherals, sensors, and other subsystems) and runs in ad hoc mode.
Table 2 Wireless connection technologies for telemedicine systems.
Technology | Data rate | Range | Frequency
IEEE 802.11a | 54 Mbps | 150 m | 5 GHz
IEEE 802.11b | 11 Mbps | 150 m | 2.4 GHz ISM
Bluetooth (IEEE 802.15.1) | 721 Kbps | 10 m - 150 m | 2.4 GHz ISM
HiperLAN2 | 54 Mbps | 150 m | 5 GHz
HomeRF (Shared Wireless Access Protocol, SWAP) | 1.6 Mbps (10 Mbps for Ver.2) | 50 m | 2.4 GHz ISM
DECT | 32 kbps | 100 m | 1880-1900 MHz
PWT | 32 kbps | 100 m | 1920-1930 MHz
IEEE 802.15.3 (high data rate wireless personal area network) | 11-55 Mbps | 1 m - 50 m | 2.4 GHz ISM
IEEE 802.16 (Local and Metropolitan Area Networks) | 120 Mbps | City limits | 2-66 GHz
IEEE 802.15.4 (low data rate wireless personal area network), Zigbee | 250 kbps, 20 kbps, 40 kbps | 100 m - 300 m | 2.4 GHz ISM, 868 MHz, 915 MHz ISM
IrDA | 4 Mbps (IrDA-1.1) | 2 m | IR (0.90 micrometer)
Telemedicine systems set highly demanding requirements regarding energy, size, cost, mobility, connectivity and coverage. Varying size and cost constraints directly result in correspondingly varying limits on the energy available, as well as on computing, storage and communication resources. Low power consumption is also necessary for safety reasons, since such systems run near or inside the body. Mobility is another major issue for pervasive e-health applications because of the nature of users and applications and the ease of connectivity to other available wireless networks. Neither off-body nor personal area networks should have line-of-sight (LoS) requirements. The various communication modalities (see Table 2) can be used in different ways to construct an actual communication network. Two common forms are infrastructure-based networks and ad hoc networks. Mobile ad hoc networks represent complex systems that consist of wireless mobile nodes, which can freely and dynamically self-organize into arbitrary and temporary “ad hoc” network topologies, allowing devices to seamlessly inter-network in areas with no pre-existing communication infrastructure or centralized administration. The effective range of the sensors attached to a sensor node defines the coverage area of that node. With sparse coverage, only parts of the area of interest are covered by the sensor nodes. With dense coverage, the area of interest is completely (or almost completely) covered by sensors. The degree of coverage also influences information processing algorithms. High coverage is a key to robust systems and may be exploited to extend the network lifetime by switching redundant nodes to power-saving sleep mode.
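The notions of sparse and dense coverage, and the idea of putting redundant nodes to sleep, can be made concrete with a small Python sketch. It assumes a rectangular area divided into square cells, circular sensing ranges and a simple greedy selection; these modeling choices and all names are assumptions of the sketch rather than a method taken from the literature cited here.

```python
from typing import List, Set, Tuple

Node = Tuple[float, float, float]  # x, y, sensing range (all in metres)


def covered_cells(nodes: List[Node], width: float, height: float,
                  cell: float = 1.0) -> Set[Tuple[int, int]]:
    """Grid cells whose centre lies within the range of at least one node."""
    cells = set()
    for i in range(int(width / cell)):
        for j in range(int(height / cell)):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, r in nodes):
                cells.add((i, j))
    return cells


def coverage_fraction(nodes: List[Node], width: float, height: float,
                      cell: float = 1.0) -> float:
    """Fraction of the area of interest that is covered (0 = none, 1 = all)."""
    total = int(width / cell) * int(height / cell)
    return len(covered_cells(nodes, width, height, cell)) / total


def redundant_nodes(nodes: List[Node], width: float, height: float,
                    cell: float = 1.0) -> List[int]:
    """Greedy pass: indices of nodes that can sleep without reducing coverage."""
    full = covered_cells(nodes, width, height, cell)
    active = list(range(len(nodes)))
    sleeping = []
    for idx in range(len(nodes)):
        trial = [nodes[k] for k in active if k != idx]
        if covered_cells(trial, width, height, cell) == full:
            active.remove(idx)
            sleeping.append(idx)
    return sleeping
```

The greedy pass depends on the order in which nodes are examined and does not guarantee the largest possible set of sleeping nodes; it only shows how a high degree of coverage leaves room for power saving.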
Fig. 1 Wearable medical sensor devices: (a) A 3-axis accelerometer on a wrist device enabling the acquisition of patient movement data [37], (b) A ring sensor for monitoring of blood oxygen saturation [38], (c) Wearable heart rate monitoring system by Numetrex [39].
4.3 Medical Devices Access, Communication and Interoperability Issues The discovery and description of medical devices must be semantic, so that a device can discover the appropriate medical devices with which it wants to communicate. Thus, we suggest that the profiles of medical devices be described using existing ontologies, e.g. FIPA [17] or CC/PP [18], or by further specializing these ontologies for medical devices. The FIPA ontology specifies a frame-based structure to describe devices, and is intended to facilitate agent communication for purposes such as content adaptation. CC/PP, on the other hand, is an RDF-based framework for describing software and hardware profiles of devices, specifically to facilitate the decision making process of a server on how to customize and transfer web content to a client device in a suitable format. Medical devices can also interoperate with existing legacy systems operating on different health standards, e.g. HL7, OpenEHR etc. As shown in Figure 2, a component named “Device Management Module” enriches the legacy systems with the capabilities of discovery and communication with external medical systems, utilizing semantic annotations for devices and the retrieved context awareness. We suggest that this module be developed for each system compliant with a particular health standard (e.g. HL7 v.2.3), in a language that can be executed on a number of platforms without recompilation. The obvious choice for this purpose is Java, because runtime environments that execute Java byte code exist for a number of platforms (software/hardware). Once this module has been developed for a particular health standard with the aforementioned capabilities, devices can easily discover this HIS/LIS, query the functionalities that it provides and communicate with it seamlessly by understanding the semantics of the functionalities that it offers.
4.4 Semantic Medical Devices and Services We propose the use of Semantic Web Services (SWS) [14] to expose the functionalities of the medical devices as well as the functionalities of HISs/LISs, and to resolve the interoperability issues on each end. By exposing the various functionalities as Web Services and advertising them via SWS, medical devices can discover the services available in a hospital, laboratory or clinic wherever they are physically present. Finally, the semantic descriptions of the Web Services provided by medical devices will automatically enable them to select, compose and execute the desired composite task. As a constituent part of Ambient Intelligence, a medical device must have context-awareness capability, so that it can adapt itself to rapidly changing situations. The various types of contextual information that can be used in the environment must be well defined so that different medical devices have a common understanding of the context. Also, there must be mechanisms for the medical device users to specify how different applications and services should behave in different contexts. A proposed architecture for medical device interoperability through semantics is illustrated in Figure 2. The Context Awareness Management (CAM) component manages the context awareness behavior of a medical device. It includes the Context Manager (CM), which retrieves the contextual information from the sub-components, i.e. Device Context, User Context, Security Context and Physical Context. The device context provides information about the device (e.g. status, battery power etc.); the user context provides information about the user of the device (e.g. patient/health professional, personal preferences); the physical context provides information about the present environment (e.g. hospital, clinic, laboratory, home etc.); and the security context provides information about the required and provided security level for a particular environment (e.g. a health professional must provide his user identity, such as a smart card or eToken, to send or receive patient’s information from/on the device etc.).
Fig. 2 Proposed Architecture for Medical Device Interoperability utilizing Context awareness and Semantic modules
These sub-components provide basic contextual information in the form of context markups (e.g. an RDF graph), which allow the CM not only to retrieve the contexts from the Context Knowledge Base (CKB) through the Knowledge Query Engine (KQE), but also to infer higher-level contexts with the help of the Knowledge Reasoner (KR). The CKB provides persistent knowledge storage, in the form of an extended context ontology for a particular environment (e.g. hospital, laboratory etc.) and the context markups that are given by the users or gathered from the basic context provider components (device context, physical context etc.). The CKB links the context ontology and markups in a single semantic model and provides interfaces for the KQE and the KR to manipulate correlated contexts. The KQE provides an abstract interface to the CM for extracting desired contexts from the CKB. To support expressive queries, any RDF Data Query Language can be used as the context query language.
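The interplay between the CKB, the KQE and the KR can be sketched with subject-predicate-object triples held in plain Python. This is only a stand-in for an RDF graph: the vocabulary (`locatedIn`, `usedBy`, `authenticatedWith`, `mayExchange`), the single inference rule and the class and function names are all hypothetical, and an actual implementation would rely on an RDF store and an RDF query language as stated above.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ContextKnowledgeBase:
    """Stand-in for the CKB: a set of context markups stored as triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Stand-in for the KQE: pattern matching where None is a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


def infer_higher_level(ckb: ContextKnowledgeBase) -> None:
    """Stand-in for the KR, with one hypothetical rule: a device located in a
    hospital and used by an authenticated person may exchange patient data."""
    for device, _, _ in ckb.query(p="locatedIn", o="hospital"):
        for _, _, user in ckb.query(s=device, p="usedBy"):
            if ckb.query(s=user, p="authenticatedWith"):
                ckb.add(device, "mayExchange", "patientData")


# Basic context markups from the physical, user and security contexts.
ckb = ContextKnowledgeBase()
ckb.add("pda1", "locatedIn", "hospital")
ckb.add("pda1", "usedBy", "dr_a")
ckb.add("dr_a", "authenticatedWith", "smartcard")
ckb.add("pda2", "locatedIn", "hospital")
ckb.add("pda2", "usedBy", "visitor")
infer_higher_level(ckb)
```

After the reasoning step, querying for the `mayExchange` predicate returns only the device whose user has presented an identity, which is an example of a higher-level context derived from basic ones.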
4.5 Patient Location Technologies Positioning of individuals provides healthcare applications with the ability to offer services like supervision of elderly patients or those with mental illnesses who are ambulatory but restricted to a certain area. In addition, assisted care facilities can use network sensors and radiofrequency ID badges to alert staff members when patients leave a designated safety zone. Network or satellite positioning technology can also be used to quickly and accurately locate wireless subscribers in an emergency and communicate information about their location. Proximity information services can direct mobile users to a nearby healthcare facility. Location-based health information services can help find people with matching blood types, organ donors, and so on. A more extensive list of location-based health services can be found in [21]. Positioning techniques can be implemented in two ways: self-positioning and remote positioning. In the first approach, equipment that the user carries (e.g., a mobile terminal or a tagging device) uses signals transmitted by the gateways/antennas (which can be either terrestrial or satellite) to calculate its own position. More specifically, the positioning receiver makes the appropriate signal measurements from geographically distributed transmitters and uses these measurements to determine its position. Technologies that can be used are satellite based (e.g., the Global Positioning System (GPS) and assisted-GPS) or terrestrial infrastructure-based (e.g., using the cell id of a subscribed mobile terminal). The second technique is called remote positioning. In this case the individual can be located by measuring the signals traveling to and from a set of receivers. More specifically, the receivers, which can be installed at one or more locations, measure a signal originating from, or reflecting off, the object to be positioned. These signal measurements are used to determine the length and/or direction of the individual radio paths, and then the mobile terminal position is computed from geometric relationships. Basically, a single Angle Of Arrival (AOA) measurement produces a straight-line locus from the remote receiver to the mobile phone. Another AOA measurement will yield a second straight line, and the intersection of the two lines gives the position fix for this system. Time delay can also be utilized: since electromagnetic waves travel at a constant speed (the speed of light) in free space, the distance between two points can be easily estimated by measuring the time delay of a radio wave transmitted between them. This method is well suited for satellite systems and is used universally by them. Popular applications that are based on the latter technique for tracking provision are the Ekahau Positioning Engine [22], MS RADAR [23] and Nibble [24]. More information regarding positioning techniques and systems can be found in [20].
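The two geometric relationships mentioned in this section, the intersection of two angle-of-arrival lines and the distance obtained from a time delay, can be written down as a short Python sketch. It assumes a flat two-dimensional plane, bearings measured counter-clockwise from the positive x axis, and a one-way delay measured with synchronized clocks; the function names are illustrative.

```python
import math
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in free space

Point = Tuple[float, float]


def distance_from_delay(delay_s: float) -> float:
    """Distance covered by a radio wave during a one-way delay of `delay_s`."""
    return SPEED_OF_LIGHT * delay_s


def aoa_fix(r1: Point, bearing1_deg: float, r2: Point, bearing2_deg: float) -> Point:
    """Intersect the two straight-line loci defined by two receivers and the
    bearings at which each of them observes the terminal."""
    a1, a2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    d1 = (math.cos(a1), math.sin(a1))  # direction of the first line
    d2 = (math.cos(a2), math.sin(a2))  # direction of the second line
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product of the directions
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; no unique position fix")
    dx, dy = r2[0] - r1[0], r2[1] - r1[1]
    # Solve r1 + t*d1 = r2 + s*d2 for t, then walk along the first line.
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (r1[0] + t * d1[0], r1[1] + t * d1[1])
```

For example, receivers at (0, 0) and (10, 0) that observe bearings of 45 and 135 degrees place the terminal at approximately (5, 5). Measurement noise is ignored here; with noisy bearings or more than two receivers a least-squares estimate would normally replace the exact intersection.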
4.6 Data Processing and Classification The collected data contain information regarding the user’s physiological status (in the case of biosignals), potential distress situations (e.g., falls in the case of movement data) and general information that can be correlated with the patient state. The data need further processing upon collection before this information can be extracted. Proper filtering might be required in order to remove irrelevant data like noise (e.g., in the case of movement or sound data). In some cases the patient state can be determined by applying simple value thresholds (e.g., in the case of body temperature or heart rate), but in other cases, such as motion detection and interpretation, advanced data classification techniques might be required. In [27] an overview is presented of classification algorithms that can be applied on movement and sound data collected by on-body sensors for patient fall event detection.
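The simplest processing chain mentioned in this section, noise filtering followed by value thresholds, can be sketched as follows. The moving-average filter, the window size and the heart rate limits are illustrative assumptions only; they are not clinical recommendations and are not the algorithms surveyed in [27].

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Causal moving average: each output is the mean of the current value
    and up to `window - 1` preceding values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def classify_heart_rate(bpm: float, low: float = 50.0, high: float = 120.0) -> str:
    """Threshold-based state; `low` and `high` are hypothetical limits."""
    if bpm < low:
        return "low"
    if bpm > high:
        return "high"
    return "normal"


# A single spurious spike (180) in an otherwise steady recording.
raw = [60.0, 60.0, 180.0, 60.0, 60.0]
states = [classify_heart_rate(v) for v in moving_average(raw, window=3)]
```

Classifying the raw value 180 directly would raise a "high" state, whereas the smoothed sequence stays within the limits. The trade-off is that filtering also delays and attenuates genuine short episodes, which is one reason why motion data call for the more advanced classifiers mentioned above.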
4.7 User Environment Context Awareness Apart from determining the patient status, context-aware medical treatment and monitoring systems must incorporate information related to the user’s environment. More specifically:
• The user’s indoor or outdoor location can be determined by external devices (e.g. GPS, mobile or WLAN phones) and can facilitate ambulance dispatching in case of emergency events. Based on location, proper proactive or reactive data transmission may also be performed.
• Information regarding the communication equipment used (e.g., laptop computer, mobile phone or PDA) can facilitate the content adaptation in case of video communication.
• Transmission capabilities of the underlying networking infrastructures (e.g., network interface type used, allocated bandwidth, real time network traffic information, etc.) can affect the communication and thus help determine the proper content adaptation, such as the application of compression schemes.
More information regarding context-aware medical networks and telemedicine services can be found in [28].
Fig. 3 Illustration of the semantic representation of the context-aware data adaptation system using an ontological structure. Major component and actuator classes are illustrated along with the most important features of each class.
5 Context Semantic Representation

In order to semantically represent the context-aware system and the content adaptation, the ontology illustrated in Figure 3 has been developed. Both the patient-related context and content have been modeled. More specifically, regarding the medical content, a representative class with three subclasses has been created. The subclasses represent image and video medical data, audio data and biosignals, respectively. The most important features for proper content adaptation are the transmission data rate, the type of encryption used (e.g., PKI [29], simple symmetric, or none), the compression ratio (in the case of scalable compression), the codec used (e.g., H.264 for video, JPEG2000 for images and ITU G.723 for audio), and the resolution (specifically for images and video, according to the network status and the presentation device).

The patient status is characterized according to physiological state, distress state (i.e., a more generic state than the former, containing status indications based on vocal and sound analysis), and movement state (e.g., detection of falls or long periods of inactivity). The basic attributes of these states are the severity of the status (e.g., a numerical representation of the emergency severity level), a description of the incident and an indication of a fall or long inactivity.

A patient environment-related class has also been developed for representing the status of the underlying network infrastructure, the user location and the device types that are used for data collection and transmission. Concerning the network status, wired or wireless interfaces can be used. For both kinds of interface, the type of the medium, the total available bandwidth and the current throughput can affect the data transmission and thus the content adaptation, whereas in the case of a wireless interface the received signal strength might also be an important factor. User location has been categorized into indoor and outdoor, with a simple description as the respective attribute. Finally, the class “Device Type” refers to the transmission device the patient/user operates for communicating with the treatment/monitoring units. In the case of static devices (e.g., PCs) the operating system and the screen resolution might determine content properties such as the video resolution and frame rate, whereas in the case of mobile devices (e.g., mobile phones, PDAs, etc.)
memory and power resources can also affect the transmission and presentation of the medical content, respectively.

The ontological model has been developed within the Protégé [34] semantic framework using the Web Ontology Language (OWL). The main advantages of the semantic representation of the context-aware adaptive system can be summarized as follows:

− Flexibility to modify and extend the contextual scheme by adding more classes. If the parameters that define the context of the patient (e.g., status, environment, location, etc.) need to be modified, the ontological model can be altered without requiring modifications to the implementation modules or the architecture of the platform.
− Better and more flexible evaluation of the context, facilitating the decisions for the medical content adaptation. Using advanced semantic rule evaluation techniques (discussed in Section 7), content adaptation decisions can be made according to a plethora of contextual parameters. The rules can be updated and extended without any need for modifications to the platform software.
Additionally, ontologies are explicit because they define the concepts, properties, relationships, functions, axioms and constraints that compose the contextual model. They are formal because they are machine readable and machine interpretable.
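To make the class hierarchy described in this section more concrete, the sketch below mirrors it with plain Python data classes. It is not the OWL model itself, which lives in Protégé; all class names, field names and default values are hypothetical and were chosen only to follow the features listed in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Encryption(Enum):
    NONE = "none"
    SYMMETRIC = "symmetric"
    PKI = "pki"


@dataclass
class MedicalContent:
    """Common features of all medical content (hypothetical field names)."""
    data_rate_kbps: float
    encryption: Encryption = Encryption.NONE
    compression_ratio: float = 1.0
    codec: Optional[str] = None


@dataclass
class ImageVideo(MedicalContent):
    resolution: tuple = (640, 480)  # only meaningful for images and video


@dataclass
class Audio(MedicalContent):
    pass


@dataclass
class Biosignal(MedicalContent):
    pass


@dataclass
class PatientState:
    kind: str          # "physiological", "distress" or "movement"
    severity: int      # numerical emergency severity level
    description: str = ""


@dataclass
class NetworkStatus:
    wireless: bool
    medium: str
    bandwidth_kbps: float
    throughput_kbps: float
    signal_strength_dbm: Optional[float] = None  # wireless interfaces only


@dataclass
class PatientContext:
    state: PatientState
    network: NetworkStatus
    location: str      # "Indoor" or "Outdoor"
    device_type: str   # e.g. "PC", "PDA", "mobile phone"
```

Unlike such hard-coded classes, the ontological model can be extended without touching the implementation, which is the flexibility argument made above.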
6 Content Adaptation

Content adaptation refers to proper medical data coding and proactive or reactive transmission for achieving better utilization of network and system resources during the monitoring and treatment process. The most demanding data in terms of network and system resources for transmission and processing are the medical image and audiovisual data. Content adaptation can also include different data encryption schemes that are applied according to data sensitivity and the severity of an emergency incident.
6.1 Image and Video/Audio Coding

The coding of medical image and audiovisual data refers to data compression. According to the patient status and the underlying network interfaces and conditions, several compression schemes can be applied; for instance, uncompressed data can be transmitted over a fast wired network connection, whereas higher compression can be applied when using wireless connections with lower available data rates. For visual assessment it might be important to maintain particular parts of the image or video at higher quality and to increase the compression on diagnostically less important regions. Examples of special region of interest (ROI) coding with scalable compression can be found in [30], [31] for both medical image and video data.
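The two ideas in this subsection can be illustrated with a small sketch: choosing a compression ratio from the available data rate, and assigning a higher quality to blocks inside a region of interest. The function names, the block-based quality map and the quality values are assumptions made for the example; the ROI coding schemes of [30], [31] are wavelet-based and considerably more elaborate.

```python
def required_compression_ratio(raw_rate_kbps: float, available_kbps: float) -> float:
    """Smallest ratio that fits the raw stream into the available data rate.

    A ratio of 1.0 means the data can be sent uncompressed.
    """
    if raw_rate_kbps <= 0 or available_kbps <= 0:
        raise ValueError("rates must be positive")
    return max(1.0, raw_rate_kbps / available_kbps)


def block_quality(cols: int, rows: int, roi: tuple,
                  roi_quality: int = 90, background_quality: int = 30) -> list:
    """Per-block quality map for an image split into cols x rows blocks.

    roi is (x0, y0, x1, y1) in block coordinates, with x1 and y1 exclusive.
    Blocks inside the ROI keep a high quality; the rest are compressed harder.
    """
    x0, y0, x1, y1 = roi
    return [
        [
            roi_quality if x0 <= c < x1 and y0 <= r < y1 else background_quality
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

For example, a 3000 kbps raw stream over a 300 kbps wireless link would need a ratio of at least 10, while the same stream over a fast wired link could be sent as is.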
6.2 Adapted Data Security Policies

The medical context as presented in previous sections contains sensitive information regarding the patient status, location and the context of the surrounding environment. Therefore, several security issues are introduced and must be considered by context-aware medical networks:

− Maintaining information privacy, i.e., preventing any disclosure of information directly related to the individual to a service or application without the user’s prior approval or knowledge.
− Maintaining context privacy, i.e., preventing any disclosure of information related to the context in which the user is using the service (for example her current device parameters) and from which indirect information about the user could be extracted.
− Maintaining location privacy of the user, i.e., denying an attacker knowledge of a device’s current and past location and preventing linkability.
− Preserving anonymity of the users’ identifiable parameters for distinct scenarios, i.e., preserving their “state of being not identifiable within a set of subjects”.
Proper solutions for resolving these issues can be:

− Mechanisms for protecting any type of sensitive information that the user considers private, at any level of granularity; the user decides how to protect her sensitive information, anonymity and location privacy. Data abstractions over all types of low-level sensitive data are part of the mechanism, and they are processed first, allowing faster filtering and default settings.
− Descriptive profiles of the user, user roles, scenarios and context, which support personalization, hide the complexity of the system from the user and delegate privacy decisions from the user to her device.
− Rule-based access to the private data, which helps to delegate decisions to the device and to take actions concerning the correct provision of privacy parameters to the services.
− Re-evaluation whenever the context attributes change: the privacy protection mechanism evaluates the overall privacy status and acts accordingly, based on the predefined rules.
In the presented platform several data encryption schemes can be applied to provide privacy, confidentiality and non-repudiation for the medical content. According to the sensitivity of the data and the severity of the case, schemes ranging from simple symmetric encryption [32] to more complex public key infrastructures [33] can be applied. The platform decides, according to specific context parameters, which data encryption methodology will be used prior to transmission. For instance, in case of an emergency incident in an area where only low-bandwidth networks are available, the platform skips the encryption process.
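The decision described in this paragraph could look roughly like the sketch below. Only the emergency-on-low-bandwidth case is taken from the text; the mapping of sensitive data to PKI and of other data to symmetric encryption, the 64 kbps threshold and all names are assumptions made for illustration.

```python
from enum import Enum


class Scheme(Enum):
    NONE = "none"
    SYMMETRIC = "symmetric"
    PKI = "pki"


def select_scheme(sensitive: bool, emergency: bool, bandwidth_kbps: float,
                  low_bandwidth_kbps: float = 64.0) -> Scheme:
    """Pick an encryption scheme from a few context parameters.

    low_bandwidth_kbps is a hypothetical cut-off below which a network is
    treated as low-bandwidth.
    """
    if emergency and bandwidth_kbps < low_bandwidth_kbps:
        # Case given in the text: encryption is skipped so that the
        # emergency data gets through a constrained link.
        return Scheme.NONE
    if sensitive:
        return Scheme.PKI        # assumed policy for highly sensitive data
    return Scheme.SYMMETRIC      # assumed default for everything else
```

In the platform itself this choice is driven by the semantic rules rather than by hard-coded conditions.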
6.3 Reactive Data Transmission

Unnecessary transmission of medical or monitoring data (e.g., video from the user’s environment) can be avoided by using reactive data transmission. While the patient state is normal, data related to the patient context and status (e.g., visual data and biosignals) can be transmitted to the monitoring units proactively at specified time intervals. When a distress situation is detected, reactive transmission can begin. More information on data transmission based on context awareness can be found in [28].
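A minimal sketch of the proactive/reactive distinction follows. The function name, the mode labels and the default interval of 300 seconds are assumptions made for the example.

```python
def should_transmit(now_s: float, last_sent_s: float, distress: bool,
                    interval_s: float = 300.0) -> tuple:
    """Decide whether to send data now and in which mode.

    Returns (send, mode), where mode is "reactive", "proactive" or "idle".
    """
    if distress:
        # A detected distress situation triggers transmission immediately.
        return True, "reactive"
    if now_s - last_sent_s >= interval_s:
        # Normal state: periodic update at the configured interval.
        return True, "proactive"
    return False, "idle"
```

Between periodic updates nothing is sent, which is where the saving in network resources comes from.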
7 Content Adaptation Based on Semantic Rules Evaluation

In order to perform the medical content adaptation discussed in the previous sections, several semantic rules have been defined. These rules concern features of the ontological classes that represent the context-aware model semantically. By evaluating these rules, decisions regarding the content adaptation can be made.
The creation of the semantic rules required their description in an abstract semantic language such as the Semantic Web Rule Language (SWRL) [35]. The syntax for SWRL abstracts from any exchange syntax for OWL [48] and thus facilitates access to and evaluation of the language. An OWL ontology in the abstract syntax contains a sequence of axioms and facts. Axioms may be of various kinds, e.g., subClass axioms and equivalentClass axioms. It is proposed to extend this with rule axioms:

    axiom ::= rule
A rule axiom consists of an antecedent (body) and a consequent (head), each of which consists of a (possibly empty) set of atoms. A rule axiom can also be assigned a URI reference, which could serve to identify the rule:

    rule ::= 'Implies(' [ URIreference ] { annotation } antecedent consequent ')'
    antecedent ::= 'Antecedent(' { atom } ')'
    consequent ::= 'Consequent(' { atom } ')'
Informally, a rule may be read as meaning that if the antecedent holds (is "true"), then the consequent must also hold. An empty antecedent is treated as trivially holding (true), and an empty consequent is treated as trivially not holding (false). Rules with an empty antecedent can thus be used to provide unconditional facts; however, such unconditional facts are better stated in OWL itself, i.e., without the use of the rule construct. Non-empty antecedents and consequents hold if all of their constituent atoms hold, i.e., they are treated as conjunctions of their atoms. Rules with conjunctive consequents could easily be transformed into multiple rules, each with an atomic consequent:

    atom ::= description '(' i-object ')'
           | dataRange '(' d-object ')'
           | individualvaluedPropertyID '(' i-object i-object ')'
           | datavaluedPropertyID '(' i-object d-object ')'
           | sameAs '(' i-object i-object ')'
           | differentFrom '(' i-object i-object ')'
           | builtIn '(' builtinID { d-object } ')'
    builtinID ::= URIreference
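The informal reading of a rule given above can be sketched in a few lines. To keep the example small, atoms are ground tuples without variables and the `holds` callback stands in for a fact base; both simplifications, and all names, are assumptions made for illustration and are not part of SWRL.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A rule axiom: a tuple of antecedent atoms and a tuple of consequent atoms."""
    antecedent: tuple
    consequent: tuple


def antecedent_holds(rule: Rule, holds: Callable) -> bool:
    # Conjunction of atoms; an empty antecedent is trivially true.
    return all(holds(atom) for atom in rule.antecedent)


def consequent_holds(rule: Rule, holds: Callable) -> bool:
    # Conjunction of atoms; an empty consequent is trivially false.
    return bool(rule.consequent) and all(holds(atom) for atom in rule.consequent)


def rule_satisfied(rule: Rule, holds: Callable) -> bool:
    """If the antecedent holds, the consequent must hold as well."""
    return (not antecedent_holds(rule, holds)) or consequent_holds(rule, holds)


def split_consequent(rule: Rule) -> list:
    """Turn a rule with a conjunctive consequent into one rule per atom."""
    return [Rule(rule.antecedent, (atom,)) for atom in rule.consequent]
```

Note the asymmetry between the two conjunctions: Python's `all` over an empty sequence is true, which matches the empty antecedent, so the empty consequent needs the explicit extra check.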
Atoms can be of the form C(x), P(x,y), sameAs(x,y), differentFrom(x,y), or builtIn(r,x,...), where C is an OWL description or data range, P is an OWL property, r is a built-in relation, and x and y are either variables, OWL individuals or OWL data values, as appropriate. Atoms may refer to individuals, data literals, individual variables or data variables. Variables are treated as universally quantified, with their scope limited to a given rule. As usual, only variables that occur in the antecedent of a rule may occur in the consequent:

    i-variable ::= 'I-variable(' URIreference ')'
    d-variable ::= 'D-variable(' URIreference ')'
While this abstract syntax is consistent with the OWL specification, and is useful for defining XML and RDF serialisations, it is rather verbose and not particularly easy to read. Often a relatively informal "human readable" form is used, similar to that used in many published works on rules. In this syntax variables are indicated using the standard convention of prefixing them with a question mark (e.g., ?x). Using this syntax, a rule asserting that the composition of the parent and brother properties implies the uncle property would be written:

    parent(?x,?y) ^ brother(?y,?z) -> uncle(?x,?z)
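Read as "the parent and brother atoms together imply the uncle atom", the rule can be applied to a set of facts with a simple join on the shared variable ?y. The sketch below assumes that each property is stored as a set of (subject, object) pairs; the function name is hypothetical.

```python
def apply_uncle_rule(parent: set, brother: set) -> set:
    """Derive uncle(x, z) for every parent(x, y) and brother(y, z).

    parent and brother are sets of (subject, object) pairs; the two atoms
    are joined on the shared variable y.
    """
    return {
        (x, z)
        for (x, y) in parent
        for (y2, z) in brother
        if y == y2
    }
```

A rule engine such as Jess performs this kind of matching for arbitrary rules and keeps the derived facts up to date as the fact base changes.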
Within this context, the SWRL Factory [34] mechanism and an integrated Jess rule engine [36] have been utilized through the Protégé tool. Jess provides both an interactive command line interface and a Java-based API to its rule engine. This engine can be embedded in Java applications and provides flexible two-way runtime communication between Jess rules and Java. The Jess system consists of a rule base, a fact base, and an execution engine. Two indicative sample SWRL rules follow that can be used within the presented framework in order to facilitate the decision on content adaptation based on the patient’s context parameters:

    Patient(?x) ^ PhysiologicalState(?y) ^ hasSeverity(?x, ?y, ?severity) ^
    hasDescription(?y, ?description) ^ Biosignal(?BS) ^ BiosignalRate(?Rate) ^
    swrlb:notEqual(?severity, "Normal")
    -> DefineTransmissionRate(?Rate, "100kbps") ^ StartTransmission("true")

    Patient(?x) ^ MovementState(?move) ^ FallDetected(?x, ?move) ^
    hasDescription(?move, ?description) ^ UserLocation(?Location) ^
    VideoRate(?Rate) ^ NetworkStatus(?Wired) ^
    swrlb:equal(?move, "Fall") ^ swrlb:equal(?Location, "Indoor") ^
    swrlb:equal(?Wired, "true")
    -> DefineVideoRate(?Rate, "300kbps") ^ StartTransmission("true")
The first rule examines the physiological state of the patient, as characterized by the status awareness modules, in terms of status severity. If the severity is anything other than “Normal”, transmission of the collected biosignals to the monitoring units begins at a specific data rate. The second rule is more advanced and takes into consideration a potential indication of a fall event, the location of the user and the network status. According to the rule, video transmission from the patient’s premises begins at a high data rate when a fall has been detected, the user is located indoors and a wired network infrastructure is used.
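The effect of the two sample rules can be mirrored in ordinary code, which may help when reading them. The sketch below is not how the platform evaluates rules (that is done by Jess on the ontology); the class and field names are hypothetical, and only the conditions and the rates of 100 kbps and 300 kbps are taken from the rules above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Context parameters referenced by the two sample rules."""
    severity: str = "Normal"
    movement: str = "None"
    location: str = "Indoor"
    wired: bool = False


@dataclass
class Decision:
    start_transmission: bool = False
    biosignal_rate_kbps: Optional[int] = None
    video_rate_kbps: Optional[int] = None


def evaluate(ctx: Context) -> Decision:
    decision = Decision()
    # Rule 1: any severity other than "Normal" starts biosignal transmission.
    if ctx.severity != "Normal":
        decision.biosignal_rate_kbps = 100
        decision.start_transmission = True
    # Rule 2: a fall detected indoors over a wired network starts video
    # transmission at the higher rate.
    if ctx.movement == "Fall" and ctx.location == "Indoor" and ctx.wired:
        decision.video_rate_kbps = 300
        decision.start_transmission = True
    return decision
```

Keeping these conditions in SWRL instead of code is what allows them to be changed without modifying the platform software.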
8 Proposed Architecture Scheme

This section presents the proposed architecture scheme, which incorporates modules that implement the discussed aspects of context awareness and medical content adaptation. The interconnection and communication of the different components can be illustrated as five different application layers (see Figure 4). Initial data acquisition from the sensor and monitoring devices is followed by proper processing for feature extraction. Context awareness is achieved by classifying the generated features and evaluating them semantically. The application of semantic rules facilitates the determination of the patient status and the detection of emergency events. According to the detected patient status and additional contextual information regarding the patient’s environment and the underlying network conditions, proper content adaptation of the medical data is performed. The content related to the incident is coded (i.e., compressed and encrypted) accordingly and transmitted to the monitoring units.
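The layered flow described above can be sketched as a chain of functions, one per layer. The split into exactly these five stages, the stage names, the threshold of 2.5 and the data passed between stages are assumptions made for the example; each stage here is a toy stand-in for the corresponding module.

```python
def acquire(samples: list) -> list:
    """Layer 1: data acquisition (here the samples are simply passed in)."""
    return list(samples)


def extract_features(samples: list) -> dict:
    """Layer 2: feature extraction from the raw samples."""
    return {"peak": max(abs(s) for s in samples)}


def classify(features: dict) -> dict:
    """Layer 3: context awareness; one hypothetical threshold stands in
    for feature classification and semantic rule evaluation."""
    return {"movement": "Fall" if features["peak"] > 2.5 else "Normal"}


def adapt(context: dict) -> dict:
    """Layer 4: content adaptation based on the detected context."""
    emergency = context["movement"] == "Fall"
    return {"context": context, "video_rate_kbps": 300 if emergency else 0}


def transmit(packet: dict) -> dict:
    """Layer 5: transmission (the packet is returned instead of being sent)."""
    return {"sent": packet["video_rate_kbps"] > 0, **packet}


LAYERS = [acquire, extract_features, classify, adapt, transmit]


def run(samples: list) -> dict:
    """Pass the data through all layers in order."""
    data = samples
    for layer in LAYERS:
        data = layer(data)
    return data
```

Because each layer only depends on the output of the previous one, a layer can be replaced (for instance a different classifier) without changing the others.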
Fig. 4 Illustration of the incorporated application layers for context awareness and content adaptation and transmission.
Figure 5 illustrates a proposed architecture scheme for interconnecting all the involved components in order to enable context awareness and proper content coding. The provision of the contextual data (i.e., the estimated patient status based on rules evaluation, medical data and other context data) can be performed either through appropriate web-based and application interfaces or through appropriate Web Services, as discussed in the following section.
8.1 Context Information Provision through Web Services

Web Services are emerging as a promising technology for building distributed applications. They are an implementation of the Service Oriented Architecture (SOA), which supports the concept of loosely coupled, open-standard, language- and platform-independent systems. Web Services are accessed through the HTTP/HTTPS protocols and utilize XML (eXtensible Markup Language) for data exchange. This in turn implies that Web Services are independent of platform, programming language, tool and network infrastructure. Services can be assembled and composed in such a way as to foster the reuse of existing back-end infrastructure. The basic SOA includes three service components: provider, requester and registry. A WSDL (Web Service Description Language) description is commonly defined by the service provider and specifies how the service is invoked. SOAP (Simple Object Access Protocol) is adopted as the message transfer protocol between requester and provider, and UDDI (Universal Description, Discovery and Integration) is used for service registration and discovery.
The typical scenario illustrated in Figure 6 is based on publishing (WSDL reference), searching for a service and binding to a service provider. The XML messaging between service consumer and provider exploits the SOAP protocol. SOAP provides automatic marshalling/unmarshalling of the arguments, like Remote Procedure Call (RPC).
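The XML messaging step can be illustrated by building and reading a SOAP envelope by hand with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace `http://example.org/telemedicine` and the `patientId` parameter are hypothetical, while the operation name is the one mentioned for the platform's WSDL. In practice a SOAP toolkit would generate this marshalling code from the WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.org/telemedicine"         # hypothetical namespace


def build_request(operation: str, params: dict) -> bytes:
    """Marshal an operation call into a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message: bytes) -> tuple:
    """Unmarshal a SOAP envelope into (operation name, parameters)."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the first child of Body is the operation element

    def local(tag: str) -> str:
        # Strip the "{namespace}" prefix that ElementTree adds to tags.
        return tag.split("}", 1)[-1]

    return local(call.tag), {local(child.tag): child.text or "" for child in call}
```

A round trip such as `parse_request(build_request("getPatientstatus", {"patientId": "42"}))` returns the operation name and its parameters unchanged, which is the marshalling/unmarshalling behaviour referred to above.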
Fig. 5 The proposed architecture that incorporates modules and components for proper medical content adaptation based on context awareness.
Fig. 6 General Web Services component framework
Web services provide several technological and business benefits, including application and data integration, versatility, code re-use and cost savings. The inherent interoperability that comes with using vendor-, platform- and language-independent XML technologies and the ubiquitous HTTP as a transport means that any application can communicate with any other application using Web services. Web services are also versatile by design. They can be accessed by humans via a Web-based client interface, or by other applications and other Web services. Code re-use is another positive side-effect of Web services' interoperability and flexibility. One service might be utilized by several clients, all of which employ the operations provided to fulfill different business objectives.

In order to provide direct and efficient access to the contextual information generated by the platform, a Web Service module has been developed. This module exposes specific functionality to developers for creating external client applications that can monitor the biosignals acquired by the platform, obtain information regarding the context of the patient and perform content adaptation based on rules evaluation. Figure 7 illustrates a sample WSDL definition for the developed Web Service. Two functions are described, which concern the status of the patient and the parameters that describe the adaptation of the medical content based on the rules evaluation of the platform.
Fig. 7 WSDL sample description for the provided Web Services. “getPatientstatus” and “ContentAdaptationParams” refer to functions that provide information regarding the patient’s state and the medical content adaptation parameters as indicated by the proposed framework.
9 Conclusions

A context-aware medical content adaptation platform has been presented. The platform utilizes sensor data for determining the patient status and takes into account additional contextual information such as the underlying network conditions and the data transmission devices. A semantic representation of the patient context has been developed, and an appropriate rule-based system is used to perform proper medical content adaptation according to the context, facilitating and improving the diagnosis and treatment process. In addition, a Web Service module provides access to information related to the context of the patient and the medical content adaptation.
Future work might include the deployment of the proposed platform in a real remote treatment and monitoring environment for assessing the actual contribution of context awareness and content adaptation to the remote medical care process.
References

1. Lin, J.C.: Applying telecommunication technology to health care delivery. IEEE Engineering in Medicine and Biology Magazine 4, 28–31 (1999)
2. Pavlopoulos, S., Kyriacou, E., Berler, A., Dembeyiotis, S., Koutsouris, D.: A novel emergency telemedicine system based on wireless communication technology - AMBULANCE. IEEE Transactions on Information Technology in Biomedicine 4, 261–267 (1998)
3. Deb, S., Ghoshal, S., Malepati, V.N., Kleinman, D.L.: Tele-diagnosis: remote monitoring of large-scale systems. In: Proc. of IEEE Aerospace Conference, pp. 31–42 (2001)
4. Choi, Y.B., Krause, J.S.H., Seo, C.K., Chung, E.K.: Telemedicine in the USA: standardization through information management and technical applications. IEEE Communications Magazine 44, 41–48 (2006)
5. Pattichis, C.S., Kyriacou, E., Voskarides, S., Pattichis, M.S., Istepanian, R., Schizas, C.N.: Wireless telemedicine systems: an overview. IEEE Antennas and Propagation Magazine 44, 143–153 (2002)
6. Akay, M., Marsic, I., Medl, A., Bu, G.: A system for medical consultation and education using multimodal human/machine communication. IEEE Transactions on Information Technology in Biomedicine 2(4), 282–291 (1998)
7. Zhou, J., Shen, X., Georganas, N.D.: Haptic tele-surgery simulation. In: Proc. of the 3rd IEEE International Workshop on Haptic, Audio and Visual Environments and their Applications, pp. 99–104 (2004)
8. Fontelo, P., DiNino, E., Johansen, K., Khan, A., Ackerman, M.: Virtual Microscopy: Potential Applications in Medical Education and Telemedicine in Countries with Developing Economies. In: Proc. of the 38th Annual Hawaii International Conference on System Sciences, p. 153 (2005)
9. Lage, A.-L., Martins, J., Oliveira, J., Cunha, W.: A quality of service approach for managing tele-medicine multimedia applications requirements. In: Proc. of IEEE Workshop on IP Operations and Management, pp. 186–190 (2004)
10. LeRouge, C., Garfield, M.J., Hevner, A.R.: Quality attributes in telemedicine video conferencing. In: Proc. of 35th Annual Hawaii International Conference on System Sciences, pp. 2050–2059 (2002)
11. Yu, H., Lin, Z., Pan, F.: Applications and improvement of H.264 in medical video compression. IEEE Transactions on Circuits and Systems 52(12), 2707–2716 (2005)
12. Bernabe, G., Gonzalez, J., Garcia, J.M., Duato, J.: A new lossy 3-D wavelet transform for high-quality compression of medical video. In: Proc. of IEEE EMBS International Conference on Information Technology Applications in Biomedicine, pp. 226–231 (2000)
13. Doukas, C.N., Maglogiannis, I., Kormentzas, G.: Medical Image Compression using Wavelet Transform on Mobile Devices with ROI coding support. In: Proc. of the 27th Annual International Conference of the IEEE EMBS, Shanghai, China,
14. Schilit, B., Adams, N., Want, R.: Context-aware computing applications. In: IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 1994), Santa Cruz, CA, US, pp. 89–101 (1994)
15. Schilit, B.N., Theimer, M.M.: Disseminating Active Map Information to Mobile Hosts. IEEE Network 8(5), 22–32 (1994)
16. Dey, A.K.: Understanding and Using Context. Personal Ubiquitous Computing 5(1), 4–7 (2001)
17. Wang, S., Yang, J., Chen, N., Chen, X., Zhang, Q.: Human activity recognition with user-free accelerometers in the sensor networks. In: Proc. of International Conference on Neural Networks and Brain, pp. 1212–1217 (2005)
18. Miaou, S.G., Sung, P.-H., Huang, C.-Y.: A Customized Human Fall Detection System Using Omni-Camera Images and Personal Information. In: Proc. of 1st Transdisciplinary conference on Distributed Diagnosis and Home Healthcare, pp. 39–42 (2006)
19. Istrate, D., Castelli, E., Vacher, M., Besacier, L., Serignat, J.F.: Information extraction from sound for medical telemonitoring. IEEE Transaction on Information Theory in Biomedicine 2(10), 264–274 (2006)
20. Zeimpekis, V., Giaglis, G.M., Lekakos, G.: A taxonomy of indoor and outdoor positioning techniques for mobile location services. ACM SIGecom Exchanges 3(4), 19–27 (2003)
21. Shih-wei, L., Shao-you, C., Yung-jen, H.J., Polly, H., Chuang-wen, Y.: Emergency Care Management with Location-Aware Services. In: Pervasive Health Conference and Workshops, pp. 1–6 (2006)
22. Ekahau LBS, http://www.ekahau.com (accessed on September 26, 2005)
23. Bahl, P., Padmanabhan, V.N.: RADAR: An In-Building RF-based User Location and Tracking System. In: INFOCOM, pp. 775–784. IEEE Press, Los Alamitos (2000)
24. Castro, P., Chiu, P., Kremenek, T., Muntz, R.: A Probabilistic Room Location Service for Wireless Networked Environments. In: Abowd, G.D., Brumitt, B., Shafer, S. (eds.) UbiComp 2001. LNCS, vol. 2201, pp. 18–34. Springer, Heidelberg (2001)
25. Doukas, C., Maglogiannis, I.: Enabling Human Status Awareness in Assistive Environments based on Advanced Sound and Motion Data Classification. Presented at The 1st ACM International Conference on PErvasive Technologies Related to Assistive Environments (PETRAE), Athens, Greece, July 16-19 (2008)
26. Doukas, C., Maglogiannis, I.: Advanced Patient or Elder Fall Detection based on Movement and Sound Data. Presented at 2nd International Conference on Pervasive Computing Technologies for Healthcare (2008)
27. Doukas, C., Maglogiannis, I.: Human Distress Sound Analysis and Characterization using Advanced Classification Techniques. Presented at 5th Hellenic Conference on Artificial Intelligence, Syros, Greece, October 2-4 (2008)
28. Doukas, C., Maglogiannis, I., Kormentzas, G.: Advanced Telemedicine Services through Context-aware Medical Networks. In: Proceedings of the IEEE EMBS cosponsored International Special Topic Conference on Information Technology in Biomedicine (ITAB 2006), Ioannina-Epirus, Greece, October 26-28 (2006)
29. Public Key Infrastructure, online information, http://www.ietf.org/html.charters/pkix-charter.html
30. Maglogiannis, I., Doukas, C., Kormentzas, G., Pliakas, T.: Optimized Mobile Access to DICOM Images using Wavelet compression with ROI coding support. To appear in IEEE Transactions on Information Technology in Biomedicine
31. Doukas, C., Maglogiannis, I.: Adaptive Transmission of Medical Image and Video using Scalable Coding and Context-aware Wireless Medical Networks. EURASIP Journal on Wireless Communications and Networking 2008, Article ID 428397, 12 pages (2008)
32. Makris, L., Argiriou, N., Strintzis, M.G.: Network and data security design for telemedicine applications. Informatics for Health and Social Care 22(2), 133–142 (1997)
33. Bao, S.-D., Shen, L.-F., Zhang, Y.-T.: A novel key distribution of body area networks for telemedicine. In: Proc. of 2004 IEEE International Workshop on Biomedical Circuits and Systems, pp. 1–17-20a (2004)
34. Protégé Ontology Editor and Knowledge Base Framework, more information, http://protege.stanford.edu/
35. The Semantic Web Rule Language definition, http://www.w3.org/Submission/SWRL/
36. The Jess Rule Engine, http://www.jessrules.com/jess/index.shtml
37. Malan, D., Fulford-Jones, T., Welsh, M., Moulton, S.: CodeBlue: An Ad Hoc Sensor Network Infrastructure for Emergency Medical Care. In: International Workshop on Wearable and Implantable Body Sensor Networks (2004)
38. Rhee, S., Yang, B.-H., Chang, K., Asada, H.H.: The Ring Sensor: a New Ambulatory Wearable Sensor for Twenty-Four Hour Patient Monitoring. In: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 20(4), pp. 1906–1909 (1998)
39. Numetrex cardio shirt, http://www.numetrex.com/about/cardio-shirt
40. Bardram, J.E.: The Java Context Awareness Framework (JCAF): A Service Infrastructure and Programming Framework for Context-Aware Applications (2005)
41. Hong, J., Landay, J.: An Infrastructure Approach to Context-Aware Computing (2001)
42. Biegel, G., Cahill, V.: A Framework for Developing Mobile, Context-aware Applications (2004)
43. Dey, A., Abowd, G.D., Salber, D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction 16, 97–166 (2001)
44. Hohl, F., Mehrmann, L., Hamdan, A.: A context system for a mobile service platform. In: Schmeck, H., Ungerer, T., Wolf, L. (eds.) ARCS 2002. LNCS, vol. 2299, pp. 21–33. Springer, Heidelberg (2002)
45. Henricksen, K., Indulska, J., Rakotonirainy, A.: Modeling context information in pervasive computing systems. In: Mattern, F., Naghshineh, M. (eds.) PERVASIVE 2002. LNCS, vol. 2414, pp. 167–180. Springer, Heidelberg (2002)
46. Hightower, J., Brumitt, B., Borriello, G.: The location stack: A layered model for location in ubiquitous computing. In: Proceedings of the Fourth IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 2002). IEEE Computer Society Press, Los Alamitos (2002)
47. Abowd, G.D.: Software engineering issues for ubiquitous computing. In: Proceedings of the 21st international conference on Software engineering, pp. 75–84. IEEE Computer Society Press, Los Alamitos (1999)
48. Patel-Schneider, P.F., Hayes, P., Horrocks, I. (eds.): OWL Web Ontology Language Semantics and Abstract Syntax. W3C Recommendation 10 February (2004), http://www.w3.org/TR/owl-semantics
Blog Rating as an Iterative Collaborative Process

Malamati Louta and Iraklis Varlamis
Abstract. The blogosphere is a part of the World Wide Web, enhanced with several characteristics that differentiate blogs from traditional websites. The number of different authors, the multitude of user-provided tags, the inherent connectivity between blogs and bloggers, the high update rate, and the time information attached to each post are some of the features that can be exploited in various information retrieval tasks in the blogosphere. Traditional search engines perform poorly on blogs since they do not cover these aspects. In an attempt to exploit these features and to help any specialized blog search engine provide a better ranking of blogs, we propose a rating mechanism that capitalizes on the hyperlinks between blogs. The model assumes that the intention of a blog owner who creates a link to another blog is to provide a recommendation to the blog readers, and quantifies this intention in a score transferred to the blog being pointed to. A set of implicit and explicit links between any two blogs, along with the links’ type and freshness, affects the exchanged score. The process is iterative, and the overall ranking score of a blog depends on its previous score and on the weighted aggregation of all scores assigned by all other blogs.

Keywords: blog, ranking, collaborative rating, local and global rating.
1 Introduction In the competitive industry of web search, the increase of web coverage and the improvement in ranking of results are the two main aims of any potential player. Due to the rapid increase of its content, blogosphere attracted the interest of popular web search engines (e.g. Google, Yahoo! and AskJeeves), companies that provide access exclusively to the blogosphere content (e.g. Blogpulse [1], and Technorati [2]) and researchers that focus to web search [3]. Every blog consists of a series of entries (namely posts), which carry apart from text or other media content, several hyperlinks to other entries or web pages and a timestamp information concerning the post creation. Using this linking mechanism, blogosphere is converted to an interconnected sub-graph of the web, with links to the surrounding web graph too. Similarly to normal links, blog links are used as suggestions or as a means to express agreement or disagreement [4] to Malamati Louta and Iraklis Varlamis Harokopio University of Athens, Department of Informatics and Telematics 176 71, Athens, Greece e-mail: {louta,varlamis}@hua.gr M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 187–203. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
a blog's content. However, due to the ease of the publishing mechanism, they have been utilized to bias search engine results (e.g. splogs, Google bombs, etc.). Since publishing in blogs comes at no cost for web users, the content and number of links provided by individual writers world-wide can easily surpass those in registered websites. This change affects the structure of the web graph and forces search engines to adapt. The ranking mechanisms of web search engines have two main options concerning blog links: a) to completely ignore them, in order to avoid spamming, and b) to take them into account. In the latter case, they have to tackle several trust related issues. This work perceives hyperlinks in blogs as recommendations to blog readers and models the network of hyperlinked blogs as a continuous process where respectful or disrespectful sources recommend other trustful or distrustful ones. The overall ranking score for a blog is computed on top of all its incoming links (inlinks). Moreover, the time information attached to blog posts is exploited in order to compute hyperlink freshness and re-calculate the overall score for a blog. In the following section we review research works on web document ranking that make use of various web page information and works that emphasize the additional information that blogs carry. In section 3 we give an overview of blog information and the fundamental concepts of our rating model. Section 4 presents the mathematical formulation of our proposed model and suggests a model for attaching rating semantics to blogs. Through the experimental evaluation of our mechanism in section 5, we demonstrate the first results from the application of our model to a collection of blogs and present some interesting findings. Finally, section 6 contains the conclusions from this work and our plans for future work.
2 Related Work
Ranking on the web is primarily based on the analysis of the web graph as it is formulated by hyperlinks. Ten years have passed since PageRank [5], the most cited ranking algorithm, was introduced. During this period, several research works have attempted to improve PageRank's performance and incorporate as much information as possible in the web graph, resulting in numerous PageRank variations and a multitude of interesting ideas (e.g., Topic-sensitive PageRank [6], TrustRank [7], SpamRank [8], Page-reRank [9], biased PageRank [10]). The primary aim in all the aforementioned works is to attach extra semantics to hyperlinks, by analyzing neighboring content or other structural information (e.g. topic, negative or positive opinion, etc.). In addition to automatically extracted semantics, several hyperlink metadata formats have been proposed, which allow web content authors to annotate hyperlinks [11], [12], [13] and search engines to distinguish between links that provide positive and negative recommendations. However, none of these metadata formats has yet been widely employed, and as a result, there is still no widely accepted method for distinguishing between positive and negative links.
Blog Rating as an Iterative Collaborative Process
In the case of blogs, several ranking algorithms have been suggested that exploit explicit (EigenRumor algorithm [14]) and/or implicit (BlogRank [15], [16]) hyperlinks between blogs. All these algorithms formulate a graph of blogs, based on hyperlinks and then apply PageRank or a variation of it in order to provide an overall ranking of blogs. However, all these algorithms provide a static measure of blog importance that does not reflect the temporal aspects accompanying the evolution of the blogosphere. Several models that capture the freshness of links have been proposed with applications in web pages and hyperlinks [17], [18], scientific papers and bibliographic citations [19], [20]. All these works are based on the fact that PageRank and its variations favor old pages. In order to balance this, a link (or citation) weighting scheme is employed, which is based on the age of the web page (or paper). In a post-processing step the authority of a pointed node decays, based on the node’s age and the incoming links age. In the current work, we consider that ranking in the blogosphere is an iterative process. As a first step, we consider that links in the blogosphere act as recommendations to readers. In a second step, we exploit two special features of the blogosphere links: a) the difference between blogroll links, which denote a more permanent trust towards the blog being pointed, and post links, which represent a more transient reference to a blog, b) the timestamp information of a post, which can be employed as a timestamp for a hyperlink.
3 Background This section illustrates the useful information that can be found in a blog and can be incorporated in the blog rating mechanism. In the following, we explain the details of each piece of information; we discuss its availability and its role in the iterative rating model.
3.1 Blog Structure Although the blog structure is not standard, most blogs share the following structure: Each blog has a host URL and contains one or more posts, authored by the blog editors. Post information comprises an author, a body, a date and time of publishing and a URL of the full, individual article, called the permalink. A post optionally includes: comments of readers, category tags and links to referrers (trackback links). The number of comments and trackbacks, where available, can be retrieved by processing the contents of each post. Since this type of information is not standard for all blog servers, the numbers can be retrieved for a small portion of the blogosphere (research works report that less than 1% of posts offers trackbacks and comments information [15]). Topic information is available for more posts ([15] report a number close to 24%). Although the choice of topic is subjective to the author, through the combined analysis of topic and author information we may
obtain useful information from the blogosphere, such as authors that link to other authors, linked–related topics etc. The date and time that an entry was registered is another useful piece of information. Analysis of entries based on date and time will reveal more or less recent blogs, more or less active blogs and authors, and topics with a short or long lifecycle. Finally, the blogroll, the list of blogs that is usually placed in the sidebar of a blog, can be used as a list of the blogger's recommendations of other blogs. The blogroll is considered to be a fixed list of links that is updated infrequently. Blogrolls can be used to indicate the affiliated blogs of a certain blog.
3.2 Hyperlinks and the Blog Rating Model The aim of the proposed blog rating model is to adaptively assign a score to every blog based on the recommendations from other blogs. Each blog contains: a blogroll, which is a set of hyperlinks to affiliated blogs, and one or more posts, published at different times that contain hyperlinks to the posts of other blogs. The model distinguishes between these two hyperlink types as depicted in Figure 1.
Fig. 1 Hyperlink types in the blog rating model
More specifically, a blogroll hyperlink is a link in the blogroll of blog A pointing to a blog B. It denotes that A gives a permanent recommendation for B and thus contributes a constant degree to the score of B. On the contrary, a post hyperlink from an individual post A1 of blog A to a post B2 in blog B denotes a temporary interest of A to the contents of B and consequently increases the score of B only for a short period of time after the post has been published. The blogroll links of blog A increase the rating of all pointed blogs. Moreover, any new posts, which are added daily in blog A, contribute to the rating of the respective blogs they point to. As a consequence, the local rating assigned by a blog A to a blog B is the weighted sum of ratings assigned by blogroll and post hyperlinks respectively. This local rating information depicts the image of X for the part of the blogosphere pointed by X. This information is updated every time a new hyperlink appears, either in a post or in the blogroll of X. The rating assigned by a certain post hyperlink decreases as days pass and the post becomes old. By
monitoring a certain blog X for several periods, we are able to compute the accumulative local ratings assigned to all blogs pointed by blog X. The local rating information of a blog A can be enhanced by the information provided by its affiliated blogs FA (e.g., the blogs in its blogroll). The affiliated blogs are the trusted blogs of A and their opinion for the blogosphere is of interest to A. As a result, the collaborative local rating combines the direct experiences of the evaluator blog A for B with information regarding B gathered from the N affiliated witness blog sites. If we consider that in Figure 1 the blogs C and A collaborate, then the collaborative rating of blog B is a weighted sum of local ratings provided for B by A and C. In a similar manner, we are able to compute a global rating for every blog, by aggregating the local rating information of all blogs. Every new blog Y that is added to the blogosphere receives a default minimum global rating. This score increases by the number of incoming blogroll or post hyperlinks and is an indication of the blog’s credibility when it is used as a witness to other blogs. In general, when we rate a service by combining multiple witnesses, we take into account the credibility of the witness and the freshness of information. In an analogous manner, when we combine local ratings from different blogs we must consider the freshness of ratings, which corresponds to a) the freshness of links and b) the freshness of prior rating (i.e., considering the time period during which the rating was estimated) and the credibility of each individual blog, which in context of this study for the global rating formation is depicted in the blog’s global rating on a previous period. For example, the rating of Blog B, in figure 1, is subject to the local ratings from A to B and from C to B, weighted by the global ratings of A and C in the previous known period. 
The former is the sum of ratings assigned via the blogroll link and $Hyperlink_{t1}$, whereas the latter is based only on $Hyperlink_{t3}$, which however is more recent than $Hyperlink_{t1}$ (given that $t3 > t2 > t1$). If no more post links are added in the next period, the global rating of B decreases, since the freshness of $Hyperlink_{t1}$ and $Hyperlink_{t3}$ decreases. If no more post links are added for several periods, then the global rating of B is subject only to the rating assigned by the blogroll link of blog A.
4 Blog Site Rating System Formulation
Let us assume the presence of M Blog Sites (BSs) falling within the same category with respect to the topics covered and the interests shared. Let $BS = \{BS_1, BS_2, \ldots, BS_M\}$ be the set of Blog Sites in the system. In subsection 4.1, the local blog site rating formation is formally described taking into account only first hand information (i.e. what the evaluator blog site considers about the target blog site); in subsection 4.2, the blog site local rating is collaboratively formed (the evaluator blog site takes into account the opinion of other affiliated blog sites concerning the blog site under evaluation); while in subsection 4.3, a global value for a blog site is formed taking into account the view of all blog sites in the system.
4.1 Local Accumulative Blog Site Rating Formation
Concerning the local formation of the Blog Site $BS_i$ rating, the Blog Site $BS_j$ may rate $BS_i$ at time period $c$ in accordance with the following formula:

$$LABSR^{BS_j}_{t_p=c}(BS_i) = \sum_{\substack{k=c-n+1 \\ k>0}}^{c} w_{t_p=k} \cdot LBSR^{BS_j}_{t_p=k}(BS_i) \qquad (1)$$

where $LABSR^{BS_j}_{t_p=c}(BS_i)$ is the local accumulative $BS_i$ rating estimated by $BS_j$ at time period $t_p = c$, $LBSR^{BS_j}_{t_p=k}(BS_i)$ denotes the local rating the evaluator $BS_j$ attributes to the target $BS_i$ at time period $t_p = k$, and weight $w_{t_p=k}$ provides the relative significance of the $LBSR^{BS_j}_{t_p=k}(BS_i)$ factor estimated at time period $k$ to the overall $BS_i$ rating estimation by the evaluator $BS_j$.
Concerning the $LBSR^{BS_j}_{t_p=k}(BS_i)$ factor estimation, the evaluator $BS_j$ may exploit the following formula:

$$LBSR^{BS_j}_{t_p=k}(BS_i) = w_{BR} \cdot BR^{BS_j}_{t_p=k}(BS_i) + w_{EP} \cdot EP^{BS_j}_{t_p=k}(BS_i) \qquad (2)$$

As may be observed from Equation 2, the local rating of the target $BS_i$ is a weighted combination of two factors. The first factor contributing to the overall $BS_i$ rating value (i.e., $BR^{BS_j}_{t_p=k}(BS_i)$) forms the blogroll related factor. This factor is introduced on the basis that the $BS_j$ blogroll provides a list of friendly blog sites frequently accessed/read by the authors of $BS_j$. It has been assumed that $BR^{BS_j}_{t_p=k}(BS_i)$ lies within the [0,1] range, where a value close to 1 indicates that the target $BS_i$ is a friendly blog site to the evaluator $BS_j$. In the context of this study, $BR^{BS_j}_{t_p=k}(BS_i)$ is modeled as a decision variable assuming values 1 or 0 depending on whether $BS_i$ belongs to the blogroll of $BS_j$ or not at time period $k$, respectively. Alternatively, $BS_j$ could provide a rating of the friendly blog sites in the blogroll, which could be exploited in order to differentiate the $BR^{BS_j}_{t_p=k}(BS_i)$ factor for the friendly blog sites comprised in the $BS_j$ blogroll. This issue will be considered in a future version of this study.
The second factor contributing to the overall $LBSR^{BS_j}_{t_p=k}(BS_i)$ (i.e. $EP^{BS_j}_{t_p=k}(BS_i)$) depends on the fraction of $BS_j$ posts pointing to $BS_i$ at time period $k$. This factor has been assumed to lie within the [0,1] range and may be given by the following equation:

$$EP^{BS_j}_{t_p=k}(BS_i) = \frac{NoP^{BS_j}_{t_p=k}(BS_i)}{NoP^{BS_j}_{t_p=k}} \qquad (3)$$

where $NoP^{BS_j}_{t_p=k}(BS_i)$ denotes the number of posts created between time period $t_p = k-1$ and $t_p = k$ pointing to the target blog site $BS_i$ and $NoP^{BS_j}_{t_p=k}$ denotes the total number of the evaluator $BS_j$ posts created between time period $k-1$ and $k$.
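Equations (2) and (3) can be illustrated with a minimal Python sketch. The function names, the default weights of 0.5, and the choice to return 0 when the evaluator published no posts in the period are assumptions made for illustration; the text only requires that the two weights add up to 1.

```python
def post_fraction(posts_to_target: int, total_posts: int) -> float:
    """Equation (3): fraction of the evaluator's posts in the period that point to the target."""
    if total_posts == 0:
        # Assumption: with no posts in the period there is no post-based evidence.
        return 0.0
    return posts_to_target / total_posts


def local_rating(in_blogroll: bool, posts_to_target: int, total_posts: int,
                 w_br: float = 0.5, w_ep: float = 0.5) -> float:
    """Equation (2): weighted combination of the blogroll factor and the post factor."""
    if abs(w_br + w_ep - 1.0) > 1e-9:
        raise ValueError("w_br and w_ep must add up to 1")
    br = 1.0 if in_blogroll else 0.0  # BR is modeled as a 0/1 decision variable
    ep = post_fraction(posts_to_target, total_posts)
    return w_br * br + w_ep * ep


if __name__ == "__main__":
    # Target is in the blogroll and 2 of the evaluator's 10 posts point to it.
    print(local_rating(True, 2, 10, w_br=0.6, w_ep=0.4))
```

Since both factors lie in [0,1] and the weights sum to 1, the returned value also stays in [0,1].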
Weights $w_{BR}$ and $w_{EP}$ provide the relative significance of the anticipated blogroll related part and the posts related factor. It is assumed that weights $w_{BR}$ and $w_{EP}$ are normalized to add up to 1 (i.e., $w_{BR} + w_{EP} = 1$). From the aforementioned analysis, it is obvious that the $LBSR^{BS_j}_{t_p=k}(BS_i)$ factor lies within the [0,1] range.
Weights $w_{t_p=k}$ in equation (1) are normalized to add up to 1 ($\sum_{k=c-n+1,\ k>0}^{c} w_{t_p=k} = 1$) and may be given by the following equation:

$$w_{t_p=k} = \frac{w_k}{\sum_{l=1}^{n} w_l} \qquad (4)$$

where
$$w_k = \begin{cases} n - c + k, & c \ge n \\ k, & c < n \end{cases}$$

At this point it should be noted that the authors have assumed that the local rating estimation takes place at consecutive, equally distributed time intervals. For the formation of the local accumulative BS rating at a time period $c$, the evaluator considers only the $n$ most recent ratings formed. The value $n$ determines the memory of the system. A small value for the $n$ parameter means that the memory of the system is small, whereas a large value gives the system a large memory. Equation (4) in essence models the fact that more recent local BS ratings should weigh more in the overall BS rating evaluation.
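The weighting of equations (1) and (4) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, missing local ratings are treated as 0, and the raw weights are normalized over the periods actually present in the window so that they add up to 1 as the text requires (this matters when $c < n$).

```python
def window_weights(c: int, n: int) -> dict[int, float]:
    """Normalized weights w_{t_p=k} for the periods k in the memory window ending at c."""
    if c < 1 or n < 1:
        raise ValueError("c and n must be positive")
    periods = [k for k in range(c - n + 1, c + 1) if k > 0]
    # Raw weights from equation (4): linear in k, so recent periods weigh more.
    raw = {k: (n - c + k) if c >= n else k for k in periods}
    total = sum(raw.values())  # assumption: normalize over the window in use
    return {k: w / total for k, w in raw.items()}


def accumulative_rating(local_ratings: dict[int, float], c: int, n: int) -> float:
    """Equation (1): weighted sum of the local ratings of the n most recent periods."""
    weights = window_weights(c, n)
    # Assumption: a period without a recorded local rating contributes 0.
    return sum(w * local_ratings.get(k, 0.0) for k, w in weights.items())


if __name__ == "__main__":
    ratings = {3: 0.2, 4: 0.4, 5: 0.9}
    print(window_weights(5, 3))
    print(accumulative_rating(ratings, c=5, n=3))
```

With $c = 5$ and $n = 3$ the raw weights are 1, 2 and 3 for periods 3, 4 and 5, so the most recent period receives half of the total weight.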
4.2 Collaborative Local Blog Site Rating Formation
In order to estimate the rating of a target Blog Site $BS_i$, the evaluator Blog Site $BS_j$ needs to contact a set $WBS$ of $N$ witness Blog Sites ($WBS \subseteq BS$) in order to get feedback reports on the usability of the $BS_i$. The set of the $N$ witnesses is a subset of the $BS = \{BS_1, BS_2, \ldots, BS_M\}$ set and can be the blog sites in the blogroll of $BS_j$. The target $BS_i$ overall collaborative rating $CLBSR^{BS_j}_{t_p=c}(BS_i)$ may be estimated by the evaluator Blog Site $BS_j$ at time period $c$ in accordance with the following formula:

$$CLBSR^{BS_j}_{t_p=c}(BS_i) = w^{BS_j}_{t_p=c}(BS_j) \cdot LABSR^{BS_j}_{t_p=c}(BS_i) + \sum_{\substack{k=1 \\ k \ne i}}^{N} w^{BS_j}_{t_p=c}(BS_k) \cdot LABSR^{BS_k}_{t_p=c}(BS_i) \qquad (5)$$
As may be observed from equation (5), the collaborative rating of the target $BS_i$ is a weighted combination of two factors. The first factor contributing to the rating value is based on the direct experiences of the evaluator blog site $BS_j$, while the second factor depends on information regarding $BS_i$ past behaviour gathered from the $N$ witness blog sites. Weight $w^{BS_j}_{t_p=c}(BS_x)$ provides the relative significance of the rating of the target blog site $BS_i$ as formed by the blog site $BS_x$ to the overall rating estimation by the evaluator $BS_j$. In general, $w^{BS_j}_{t_p=c}(BS_x)$ is a measure of the credibility of witness $BS_x$ and may be a function of the local accumulative blog site rating attributed to each $BS_x$ by the evaluator $BS_j$. It has been assumed that weights $w^{BS_j}_{t_p=c}(BS_x)$ are normalized to add up to 1 (i.e., $w^{BS_j}_{t_p=c}(BS_j) + \sum_{k=1,\ k \ne i}^{N} w^{BS_j}_{t_p=c}(BS_k) = 1$). Thus, weight $w^{BS_j}_{t_p=c}(BS_x)$ may be given by the following equation:

$$w^{BS_j}_{t_p=c}(BS_x) = \frac{LABSR^{BS_j}_{t_p=c}(BS_x)}{\sum_{BS_x \in WBS \cup BS_j} LABSR^{BS_j}_{t_p=c}(BS_x)} \qquad (6)$$
where $LABSR^{BS_j}_{t_p=c}(BS_x)$ is the local accumulative blog site rating attributed to Blog Site $BS_x$ by the evaluator $BS_j$. One may easily conclude that for the evaluator $BS_j$ it stands $LABSR^{BS_j}_{t_p=c}(BS_j) = 1$.
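A small Python sketch may help to see how equations (5) and (6) interact. It assumes a nested dictionary `labsr[a][b]` holding the local accumulative rating that blog `a` gives to blog `b`; this data layout, the function names and the handling of missing ratings (treated as 0) are illustrative assumptions. The target is dropped from the witness list before the weights are computed, which is one way to honour the $k \ne i$ condition while keeping the weights normalized.

```python
def credibility_weights(labsr: dict[str, dict[str, float]],
                        evaluator: str, witnesses: list[str]) -> dict[str, float]:
    """Equation (6): normalize the evaluator's ratings of itself and of its witnesses."""
    ratings = {evaluator: 1.0}  # the evaluator's rating of itself equals 1
    for witness in witnesses:
        ratings[witness] = labsr.get(evaluator, {}).get(witness, 0.0)
    total = sum(ratings.values())
    return {blog: rating / total for blog, rating in ratings.items()}


def collaborative_rating(labsr: dict[str, dict[str, float]],
                         target: str, evaluator: str, witnesses: list[str]) -> float:
    """Equation (5): the evaluator's own rating plus the witnesses' ratings, weighted."""
    # Assumption: the target never acts as a witness for itself (k != i).
    witnesses = [w for w in witnesses if w != target]
    weights = credibility_weights(labsr, evaluator, witnesses)
    return sum(weight * labsr.get(blog, {}).get(target, 0.0)
               for blog, weight in weights.items())


if __name__ == "__main__":
    # A rates B and trusts witness C; C also rates B.
    labsr = {"A": {"B": 0.8, "C": 0.5}, "C": {"B": 0.4}}
    print(collaborative_rating(labsr, target="B", evaluator="A", witnesses=["C"]))
```

In the example, A weighs its own opinion twice as much as that of C, because A rates itself 1 and C only 0.5.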
At this point it should be noted that, for different blog sites, the duration of each time interval introduced in subsection 4.1 for the local accumulative blog site rating estimation may differ. As a side effect, the witness blog sites need not all have estimated their local accumulative rating of the target blog site $BS_i$ at the same time. Consider, for example, a blog site that updates its local accumulative blog site ratings once per month and a blog site that updates them once per day. In order to introduce the time effect in our mechanism and model the fact that more recent ratings should weigh more in the overall collaborative blog site rating estimation, equation (5) should be rewritten as follows:
$$CLBSR^{BS_j}_{time_c}(BS_i) = w^{BS_j}_{time_c}(BS_j) \cdot LABSR^{BS_j}_{time_c}(BS_i) + \sum_{\substack{k=1 \\ k \ne i}}^{N} w_{time_{d_k}} \cdot w^{BS_j}_{time_c}(BS_k) \cdot LABSR^{BS_k}_{time_{d_k}}(BS_i) \qquad (7)$$

where $w_{time_{d_k}}$ is a decaying parameter given by the following equation:

$$w_{time_{d_k}} = 1 - \frac{time_c - time_{d_k}}{time_c} \qquad (8)$$

In the context of this study, $w_{time_{d_k}}$ is modeled as a polynomial function. Other functions (for example exponential) could be defined as well. As may be observed from equation (7), the bigger the quantity $time_c - time_{d_k}$, the smaller the contribution of the witness blog site $BS_k$ rating provided to the overall collaborative target $BS_i$ rating formation. At this point, we assume that when at $time_c$ the collaborative rating is estimated by the evaluator $BS_j$, its local accumulative ratings have also been updated.
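The decay of equation (8) and its use in equation (7) can be sketched as below. Times are assumed to be positive numbers on a common scale (for instance, days since the start of observation); this representation, the function names and the tuple layout of the witness reports are assumptions for illustration only.

```python
def decay_weight(time_c: float, time_dk: float) -> float:
    """Equation (8): linear decay, 1 for a fresh rating and 0 for one made at time 0."""
    if time_c <= 0:
        raise ValueError("time_c must be positive")
    if not 0 <= time_dk <= time_c:
        raise ValueError("time_dk must lie between 0 and time_c")
    return 1.0 - (time_c - time_dk) / time_c


def timed_collaborative_rating(own_weight: float, own_rating: float,
                               witness_reports: list[tuple[float, float, float]],
                               time_c: float) -> float:
    """Equation (7). Each report is (credibility weight, time of rating, rating)."""
    score = own_weight * own_rating  # the evaluator's own rating is assumed up to date
    for weight, time_dk, rating in witness_reports:
        score += decay_weight(time_c, time_dk) * weight * rating
    return score


if __name__ == "__main__":
    print(decay_weight(20, 15))
    print(timed_collaborative_rating(0.5, 0.8, [(0.5, 10, 0.4)], time_c=20))
```

A rating made at three quarters of the elapsed time keeps three quarters of its weight, so older witness reports fade out gradually rather than being discarded.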
4.3 Global Blog Site Rating Formation
In order to estimate the global rating of a target Blog Site $BS_i$, a specialized blog site search engine evaluator collects the feedback reports on the usability of the $BS_i$ from the $M$ Blog Sites belonging to the set $BS = \{BS_1, BS_2, \ldots, BS_M\}$. The target $BS_i$ overall collaborative global rating $GBSR^{BS_j}_{t_p=c}(BS_i)$ may be estimated by the evaluator Blog Site $BS_j$ at time $c$ in accordance with the following formula:

$$GBSR^{BS_j}_{time_c}(BS_i) = \sum_{\substack{k=1 \\ k \ne i}}^{M} w_{time_{d_k}} \cdot w_{BS_k} \cdot LABSR^{BS_k}_{time_{d_k}}(BS_i) \qquad (9)$$
As may be observed from equation (9), the global rating of the target $BS_i$ is a weighted combination of the rating values provided by the blog sites $BS_k$, based on the direct experiences. Weight $w_{BS_k}$ provides the relative significance of the rating of the target blog site $BS_i$ as formed by the blog site $BS_k$ to the overall rating estimation by the evaluator blog site search engine. In general, $w_{BS_k}$ is a measure of the credibility of blog site $BS_k$ and may be a function of its prior global rating as estimated by the evaluator blog site search engine during the previous time. In the context of this study, weight $w_{BS_k}$ is given by the following equation:

$$w_{BS_k} = GBSR^{BS_k}_{time_{c-1}}(BS_i) \cdot \frac{NoBS(BS_k)}{M} \qquad (10)$$

where $GBSR^{BS_k}_{time_{c-1}}(BS_i)$ denotes the prior global rating of blog site $BS_k$ estimated at $time_{c-1}$ in accordance with equation (9), $NoBS(BS_k)$ denotes the number of blog sites pointing to $BS_k$ at $time_c$ and $M$ is the total number of blog sites in the system. The portion of the blog sites pointing to $BS_k$ at time $c$ has been introduced in equation (10) in order to enhance the credibility value of a witness blog site providing a rating for the blog site $BS_i$ under evaluation. Finally, in analogy to subsection 4.2, parameter $w_{time_{d_k}}$ is the decaying factor given by equation (8), introduced in order to weigh down possible outdated evaluation ratings provided.
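One iteration of the global rating of equations (9) and (10) might look like the following Python sketch. All names are hypothetical. The text states that a new blog receives a default minimum global rating but does not give its value, so the `default_prior` parameter (here 0.01) is purely an assumption, as is the numeric time scale.

```python
def credibility(prior_global: float, inlinking_blogs: int, total_blogs: int) -> float:
    """Equation (10): prior global rating scaled by the share of blogs linking to the witness."""
    return prior_global * inlinking_blogs / total_blogs


def global_rating(target: str,
                  reports: dict[str, tuple[float, float]],
                  prior_global: dict[str, float],
                  inlink_counts: dict[str, int],
                  total_blogs: int,
                  time_c: float,
                  default_prior: float = 0.01) -> float:
    """Equation (9). reports maps each rating blog to (time of rating, rating of target)."""
    score = 0.0
    for blog, (time_dk, rating) in reports.items():
        if blog == target:
            continue  # the target does not rate itself (k != i)
        decay = 1.0 - (time_c - time_dk) / time_c  # equation (8)
        # Assumption: blogs without a prior global rating get default_prior.
        weight = credibility(prior_global.get(blog, default_prior),
                             inlink_counts.get(blog, 0), total_blogs)
        score += decay * weight * rating
    return score


if __name__ == "__main__":
    reports = {"A": (10, 0.8), "C": (5, 0.4)}
    prior = {"A": 0.5, "C": 0.2}
    inlinks = {"A": 2, "C": 1}
    print(global_rating("B", reports, prior, inlinks, total_blogs=4, time_c=10))
```

Repeating this computation period after period, with each period's output used as the next period's `prior_global`, gives the iterative behaviour described in the text.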
4.4 The Semantics of the Rating Model In order to support the operation of the rating mechanism, we suggest the use of semantics in the description of post and blogroll hyperlinks. The semantic
information which will be attached to each hyperlink will allow bloggers to better describe their intentions behind creating the link, to prioritize affiliated blogs in the blogroll or even to provide topic information for the pointed posts. The rating mechanism can be adopted to update the local scores, and to employ them in providing collaborative and global scores. RDF is a popular format for describing metadata and it is used to support our rating model. As mentioned in section 4.1, the local (accumulative or not) blog site ratings are solely based on the recommendations provided by the blog itself. As a result, only the RDF file associated with the current blog is required for storing local ratings. In general, the RDF file comprises URI and ratings for each blog in the blogroll and for each blog referenced in the posts. A software entity acting on behalf of each blog is responsible for reading the RDF file, recomputing the accumulative localRating and updating the RDF with the new ratings and the new date of update. In the following, we provide a fictional example of an RDF file for a blog (e.g., my.blog.co.uk) that contains two links in the blogroll (e.g., myother.blog.co.uk and blog.co.uk/agoodone) and two post with hyperlinks to an affiliated blog (e.g., blog.co.uk/agoodone) and a non-affiliated blog (e.g., anyblog.co.uk), respectively. 0.9 2008-11-1 http://my.blog.co.uk/2008-111 http://blog.co.uk/agoodone/2008-1028 0.8 2008-12-15 http://my.blog.co.uk/2008-1215 http://anyblog.co.uk/2008-1212
Fig. 2 The RDF structure for a blog
On the other hand, the local collaborative blog site rating is subject to the blog's RDF, but also to the RDF files of all other blogs in its blogroll, assuming that the witness set is constituted by the blog sites comprised in the blogroll. Moreover, the collaborative process will take into account the rating of each external recommendation. The rating is available in the original RDF file and the external recommendations can be retrieved from the respective RDF files. In such a case the rating mechanism for a blog should process the blog's current ratings and those provided by each of the affiliated blogs. Finally, the rating mechanism must collect and process the RDF metadata files from all blogs in the set in order to calculate the global blog site rating. This process is repeated periodically, so as to keep the rating up to date. In the computation of collaborative and global rating, the mechanism should take into account three factors that affect the transitivity of rating: a) the effect of ratings provided by the affiliated blogs depends upon their credibility (i.e., the picture the evaluator blog site has formed about them), b) the rating contributed by a certain postlink decreases day by day and c) the prior rating estimated during the previous time period decreases in order to weigh down possible outdated evaluation ratings provided. The time-related factors b) and c) are captured by the time decay factor of equation 8, whereas factor a) is captured by the respective weight $w^{BS_j}_{t_p=c}(BS_x)$ in equation 5. The localRating and dateupdated values of the postlink are employed to store these factors.
5 Experimental Setup In order to demonstrate the blog rating model, we performed experiments on a sample blog dataset provided by Nielsen BuzzMetrics, Inc. The dataset spans a period before and after an important event: the London bombings (4/7/2005 – 24/7/2005). Table 1 that follows summarizes the statistics of the dataset:
Table 1 Statistics of the sample blog set
Unique blogs number          1,545,205
Links to any blog            2,138,381
Links to blogs in the set    331,068
Links to news sites          498,834
It is obvious from Table 1 that the majority of the links points to blogs outside of the initial set and a large portion of the links points to news sites. The blogs that are outside of the initial set can probably be spam blogs (splogs), which are massively pointed by blogs in the set in an attempt to improve their ranking. We perform three experiments on the same dataset: a) We find the most referenced blogs and news site for a single day using inlinks only, b) we rank sites according to the global rating using information for a single day and compare results with those of the first experiment, c) we apply the global rating model in the blogs using the posts of a single day, using different values for the memory factor and compare the position of spam blogs in the different sets of ranked results. As it is explained in the analysis of the results, our rating model penalizes the spam blogs, even for small values of the memory factor. Results in Table 2 contain the top-20 blogs ranked using the number of incoming links as the rating factor. According to these results, the most popular sites on the first and the last day in the dataset comprise news sites and spam blogs (positions 13 to 20 on 4/7/2005 and 11 to 20 on 24/7/2005). Although news sites are acceptable in the top ranked results, the spam blogs should be penalized by the rating model. Table 2 Most referenced sites in the dataset for the 4th and 24th of July 2005 Most referenced sites (4/7) Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
URL www.livejournal.com spaces.msn.com www.xanga.com www.skaterz.info news.yahoo.com news.bbc.co.uk www.nytimes.com www.cnn.com www.washingtonpost.com pics.livejournal.com www.msnbc.msn.com www.guardian.co.uk fantasy-fest-nude.blogspot.com top-play-lolita.blogspot.com lolita-top-sites.blogspot.com hardcore-lesbian-pictures.blogspot.com lesbian-kissing-pictures.blogspot.com naturist-teen-photos.blogspot.com funny-as-shit.blogspot.com really-funny-shit.blogspot.com
Most referenced sites (24/7) Inlinks 9511 2724 2503 1647 1399 1127 1092 563 560 530 451 389 376 376 376 376 376 376 376 376
Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
URL www.livejournal.com www.xanga.com spaces.msn.com news.yahoo.com www.nytimes.com news.bbc.co.uk pics.livejournal.com www.washingtonpost.com biz.yahoo.com www.bbc.co.uk miss-usa-teen.blogspot.com nude-thumbnails.blogspot.com nude-girls-thumbnails.blogspot.com asian-nude-thumbnails.blogspot.com non-nude-teen-photos.blogspot.com amateur-teen-nude.blogspot.com nude-amateur-photos.blogspot.com photos-amateur-gratuites.blogspot.com breast-pumps-reviews.blogspot.com young-naked-gay-boys.blogspot.com
Inlinks 3450 2502 1229 1070 1039 841 535 513 443 440 361 361 361 361 361 361 361 361 361 361
The first step towards correcting this problem is to use our rating model instead of the number of inlinks. The local ratings are computed on a per blog basis. Thereafter, the global rating for all sites is estimated, using the accumulative
algorithm. A useful observation is that spam blogs usually receive a large number of inlinks on the day they are created, but receive no further inlinks afterwards, so it is expected that spam blogs will receive lower ratings by our model. The results in Table 3 show the ranking of blogs in the dataset according to their global rating on the 4th of July. Since this is the first day in our dataset, the global rating is computed using the inlinks of this specific day (memory equal to zero, m=0). It is obvious from the results in Table 3 that news sites have improved their ranking against all other blogs (including spam blogs).

Table 3 Top-20 ranked sites in 4/7 using global rating (m=0)
Rank   URL                       Rank in 4/7 using inlinks
1      www.livejournal.com       1
2      news.yahoo.com            5
3      news.bbc.co.uk            6
4      www.bbc.co.uk             120
5      www.nytimes.com           7
6      www.cnn.com               8
7      www.washingtonpost.com    9
8      biz.yahoo.com             118
9      pics.livejournal.com      10
10     www.msnbc.msn.com         11
11     www.guardian.co.uk        12
12     www.latimes.com           123
13     www.usatoday.com          238
14     livejournal.com           246
15     today.reuters.com         261
16     www.sfgate.com            122
17     www.boston.com            173
18     www.forbes.com            178
19     www.timesonline.co.uk     174
20     www.newsday.com           266
As mentioned before, spam blogs usually receive a large number of links in a single day, which explains their ranking in the results of Table 2. However, these incoming links have a single origin (another spam blog), which has been created for this reason. According to equation 3, the contribution of blogs that contain many links is small and consequently spam blogs of this type receive a small local rating. However, there are still spam blogs that receive fabricated inlinks from different origins at the same time. In order to penalize these links we examine the blogosphere for several days (i.e., 20 days) using our model with memory (i.e., local accumulative blog site rating formation considering 20 time periods of one day each). In Table 4, we present the top-20 ranked blog URLs in the dataset (URLs that contain the term 'blog') for the 24th of July, which is the last date in the set. The blogs are rated using the maximum possible memory in our dataset (m=20). The rightmost column of Table 4 contains the number of inlinks for each blog on the 24th of July and the middle column contains the position of this blog on the same date, ranked using only the number of inlinks. It is obvious that normal blogs rank higher when collective memory from the previous days is employed and surpass the spam blogs.
Table 4 Most highly ranked blogs in the 24th of July (global rating, m=20)
Rank   URL                               Rank in 24/7 using inlinks   Inlinks in 24/7
1      radio.weblogs.com                 199                          88
2      blog.livedoor.jp                  222                          65
3      blogs.sun.com                     318                          26
4      postsecret.blogspot.com           348                          22
5      blog.searchenginewatch.com        1204                         4
6      www.blogathon.org                 308                          28
7      atrios.blogspot.com               302                          30
8      blogs.salon.com                   413                          17
9      www.problogger.net                523                          12
10     doc.weblogs.com                   480                          14
11     www.blogherald.com                519                          12
12     profiles.blogdrive.com            253                          47
13     powerlineblog.com                 290                          32
14     www.bloggingbaby.com              258                          45
15     googleblog.blogspot.com           444                          15
16     americaninlebanon.blogspot.com    1000                         6
17     blogs.guardian.co.uk              1431                         4
18     badhairblog.blogspot.com          496                          13
19     hurryupharry.bloghouse.net        995                          6
20     www.captainsquartersblog.com      267                          40
A point of interest in the results is that professional blogs, such as Sun's blog or Google's blog, are ranked high when global rating is employed considering accumulative local blog site rating, although they receive few links in a single day (low ranking using a single day's links). However, such blogs receive links on a daily basis, are the single targets of the post each time (in contrast to spam blogs) and are pointed to by highly rated blogs.

Table 5 The effect of memory size in spam blog global ranking
Memory    Global ranking for the first set of spam blogs (first - last in the set)   Best position for a spam blog
Inlinks   47 - 292                                                                    47
1         6298 - 12250                                                                6298
3         8207 - 19672                                                                7124
5         12157 - 21587                                                               6699
7         15019 - 23126                                                               7882
In a third set of experiments, we examine the ranking of the 53383 blogs of our blogosphere part on the 14th of July (the date was selected because it is in the middle of the period examined) using five different values for the system’s memory: a) for memory equal to zero, only the inlinks created on the specific date affect the rating; b) otherwise, we take into account the postlinks provided at most m (m=1,3,5,7) days before the 14th of July. We manually examine the set of results to find the position of the first spam blog in the global ranking of blogs. As shown in Table 5, the first of a set of spam blogs (ranging from the 49th to the 292nd position) falls below the 6298th position when the local ratings of the current day are employed in the calculation of global rating. It falls even lower for bigger values of m, although the change is smaller.
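The effect of memory on a burst of fabricated inlinks can be illustrated with a small sketch. It does not reproduce the chapter's equations: it assumes that each source blog spreads one unit of weight over the links it emits on a day, and that the rating with memory m is the plain average of the daily scores over the last m+1 days. The blog names and link data are hypothetical.

```python
from collections import defaultdict


def daily_score(links):
    """links: list of (source, target) post links observed on one day.

    Assumption: a source that emits many links contributes little per link,
    because its single unit of weight is split over all its links.
    """
    out_count = defaultdict(int)
    for source, _ in links:
        out_count[source] += 1
    score = defaultdict(float)
    for source, target in links:
        score[target] += 1.0 / out_count[source]
    return dict(score)


def score_with_memory(days, m):
    """Average the daily scores over the last m+1 days (m = memory size)."""
    window = days[-(m + 1):]
    total = defaultdict(float)
    for links in window:
        for blog, value in daily_score(links).items():
            total[blog] += value
    return {blog: value / len(window) for blog, value in total.items()}


# Hypothetical data: "steady" gets two links every day, "spam" gets a burst
# of five links from distinct sources on the last day only.
days = [
    [("a", "steady"), ("b", "steady")],
    [("a", "steady"), ("c", "steady")],
    [("b", "steady"), ("c", "steady")],
    [("a", "steady"), ("b", "steady")] + [(f"s{i}", "spam") for i in range(5)],
]
no_memory = score_with_memory(days, m=0)    # last day only
with_memory = score_with_memory(days, m=3)  # last four days
```

With m=0 the burst puts the spam blog ahead of the steadily linked blog; with m=3 the order is reversed, which is the qualitative behavior reported in Table 5.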
6 Conclusions

This work presented an iterative collaborative process to provide a global rating for a set of blogs using local rating information expressed via blogroll and post hyperlinks. The rating model is mathematically formulated, comprising local and local accumulative blog site rating formation (where the accumulative rating is calculated considering the local rating as estimated over different consecutive time periods), collaborative local blog site rating formation (where the evaluator blog site exploits information gathered from other affiliated witness blog sites) and global rating formation, incorporating the view of all blog sites in the system. Our model exploits two special features of the blogosphere: a) the difference between blogroll links, which denote a more permanent trust towards the blog being pointed to, and post links, which represent a more transient reference to a blog; b) the timestamp information of a post, which can be employed as a timestamp for a hyperlink. Additionally, a suggestion on the semantics that can be attached to each blog is also provided. An initial experimental evaluation shows that the model performs well by punishing spam blogs that receive many links from a single source and favouring blogs that receive inlinks on a regular basis. The next step of this work is to develop the architecture and the system entities that estimate and attach the rating information to blogs and that process local ratings periodically in order to update collaborative and global ratings. Future work additionally includes the incorporation of possible negative postlink recommendations.
Simulation-Based UMTS e-Learning Software
Florin Sandu and Szilárd Cserey
Abstract. This chapter describes a soft-switch-based mobile network simulation developed for the purpose of e-Learning. The subject of simulation is a 3GPP R4 mobile communication network, where two types of scenarios, MOC (Mobile-Originating Call) and MTC (Mobile-Terminated Call), are simulated. The simulator is capable of generating and sending real, pre-recorded H.248 and RANAP (Radio Access Network Application Protocol) messages to a loop-back network interface, which can be monitored using software such as Wireshark / Ethereal, so that the messages can be decoded and clearly interpreted. This brings a great benefit to professors, enabling them to better explain and show to students the behavior of specific mobile communication network elements and the phases of call establishment, call processing and call control. It is educational software that allows university students or company workers in the mobile communication field to learn, understand and study the processes, events and flows that appear in typical UMTS call establishment and call control.
1 Introduction

The Release 4 architecture of the 3rd Generation Partnership Project in mobile telecom adds two new elements to traditional network architectures: the mobile switching center server (MSC server) and the media gateway (MGW). These network elements communicate through the so-called “Mc interface” [1, 2, 3]. This architecture gives better opportunities to mobile operators, as it splits the call control functionality from the call switching functionality of the mobile network. This way, the architecture becomes much more flexible and the network becomes more scalable. For the purpose of e-Learning of such modern and complex telecom architectures, the authors integrated and completed a software environment for simulation and monitoring of specific call control and switching, allowing visualization of messaging throughout the network and signaling between network entities.

Florin Sandu
“Transilvania” University, Bd Eroilor Nr.29A, 500036 – Brasov, Romania
Tel.: +40268478705

Szilárd Cserey
Siemens Program and System Engineering, Str. M. Kogalniceanu Nr.21 Bl.C6, 500090 – Brasov, Romania
Tel.: +40765300301

M. Wallace et al. (eds.): Semantics in Adaptive and Personalized Services, SCI 279, pp. 205–231. springerlink.com © Springer-Verlag Berlin Heidelberg 2010
The GUI of this application allows the trainees to perform detailed studies in the same way as in protocol analysis, which is among the most important and resource-intensive activities in the “real world” of telecom laboratories.
2 The Simulator as e-Learning Software

2.1 Motivation

The e-Learning package was built on top of the OMNeT++ discrete event simulator and was written in C++, using the OMNeT++ Application Programming Interfaces (API) [4]. In order to implement our e-Learning concept, we chose UMTS R4, an architecture that was not implemented in any other kind of mobile network simulator. The simulator not only shows the message flows between specific network elements, but also creates realistic packets that can be viewed using protocol monitoring tools like Wireshark / Ethereal. This is the emulation part of the software, a tool that “generates” realistic RANAP and H.248 protocol messages / packets used on the Mc interface of the R4 network [5]. The messages are not generated from scratch, as they would be on real network emulators; they are extracted from pre-recorded real trace files and rebuilt to form new packets. A very important feature of the simulator, which brings great benefits for e-Learning, is that it can be run step by step, letting the trainee analyze thoroughly, and the trainer explain in more detail, every message that was sent and received [14]. The simulation can be viewed from two perspectives. One is architectural (it shows which type of message was exchanged between which types of network nodes) and is specific to the OMNeT++ simulator. The other is the perspective of protocol monitoring (using Wireshark / Ethereal or similar tools), where every packet can be analyzed in great detail, bit by bit, showing which protocol is used on each OSI layer and what its specific parameters are.
2.2 How the Simulation Environment Was Created

The software was created using the OMNeT++ simulation engine and the OMNeT++ APIs. We developed 7 new models for OMNeT++, each implementing the behavior of a specific network element. In order to have a complete simulation of the UMTS R4 mobile network, we had to create models for the following mobile network elements: User Mobile Equipment, Node B, Radio Network Controller, Circuit Switched - Media Gateway, Mobile Switching Center – Server, Gateway Mobile Switching Center – Server and a generic model representing the PSTN network [1, 4]. The simulation works only at the message-exchange level: every node can receive specific messages and can respond to specific requests. We put the emphasis on messages, message exchange and detailed call flow. This was required by our specific “reverse engineering” approach: we introduced into the simulation realistic packets, monitored in a real, operational, state-of-the-art industrial 3G network. These can be “monitored” and analyzed with the very popular Wireshark / Ethereal protocol monitor [5].
The way we enhanced our simulation was based on collecting realistic messages (difficult to synthesize off-line): packets pre-recorded by the Tektronix K1205 and K1297 protocol analyzers in the laboratories of Siemens Program and System Engineering Romania. In this way, complex packets generated by real mobile communication equipment, real Circuit Switched Media Gateways and real Mobile Switching Center Servers were brought to the attention of trainees. We created C++ software that can rebuild the messages from these real packets. The simulation environment runs in two planes: one is the OMNeT++ simulation, and the other is the packet generator software, which sends real packets, at the same time as the simulation, to a virtual loop-back adapter monitored by Wireshark / Ethereal (see figures 6, 7 and 8). The packet generator is controlled by the OMNeT++ simulation [4, 5, 6].
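The replay idea behind the packet generator can be sketched as follows. This is only an illustration under stated assumptions, not the authors' C++ tool: it sends each pre-recorded payload as one UDP datagram to the operating system's loopback address instead of the VirtNet adapter, and the function name, the choice of UDP and the payload contents are hypothetical.

```python
import socket


def replay_to_loopback(payloads, port, host="127.0.0.1"):
    """Send each pre-recorded payload as one UDP datagram to the loopback address.

    Assumption: the payloads were extracted from a trace file beforehand;
    this sketch neither parses nor rebuilds H.248 or RANAP messages.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload, (host, port))
            sent += 1
    return sent
```

A protocol monitor capturing on the loopback interface would see one datagram per payload, in the order in which the simulation triggers them.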
3 The UMTS R4 Architecture

With UMTS Release 4 (R4), the architecture of the core network circuit-switched domain was revised radically. The circuit traffic is delivered over an internal packet-switched Internet Protocol (IP) network, with connections to external networks handled via media gateways (MGW). The architecture of an R4 network is given in figure 1. The architecture of the CS (Circuit-Switched) core is described by the 3GPP TS 23.205 specification, entitled “Bearer-independent circuit-switched core network”; it is termed bearer-independent because the core network can use asynchronous transfer mode (ATM) or IP, with many different Layer 2 options. In this case, traffic entering or exiting the circuit-switched domain is controlled by the MGW [1, 2, 3]. The MGW is responsible for switching the traffic within the core network domain and for translating data between the packet-based format used within the core network and the circuit-switched data transmitted on the external PSTN or ISDN network. The MGW is controlled by the mobile switching center (MSC) server, which sends control commands to the MGW, for example to establish bearers in order to carry calls across the core network. The user data (i.e. voice traffic) within the CS-CN (Circuit Switched – Core Network) domain can be carried within ATM cells (ATM adaptation layer 2 - AAL2) or IP packets [1, 9, 10].

[Figure 1: block diagram. Labelled elements: HLR/AC, CSE, MSC-S/VLR (two), MGW (two), UTRAN, GERAN, PSTN; labelled interfaces: CAP, C, D, Iucs, A, Nc, Mc, Nb; layers marked CONTROL and TRANSPORT; legend: signalling interface, signalling and control interface]
Fig. 1 The architecture of the Release 4 UMTS mobile communication network
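The split between call control (MSC servers) and bearer switching (media gateways) can be written down as a small lookup structure. The sketch below is illustrative only: the node names MGW-1 and MGW-2 and the assignment of each gateway to a server are assumptions, and only interfaces named in this chapter are listed.

```python
# Links between node instances, keyed by the unordered pair of endpoints.
# Assumption: MGW-1 is controlled by the MSC-S and MGW-2 by the GMSC-S.
LINKS = {
    frozenset({"RNC", "MGW-1"}): "Iu",
    frozenset({"MSC-S", "MGW-1"}): "Mc",
    frozenset({"GMSC-S", "MGW-2"}): "Mc",
    frozenset({"MSC-S", "GMSC-S"}): "Nc",
    frozenset({"MGW-1", "MGW-2"}): "Nb",
}


def interface_between(a, b):
    """Return the interface name joining two nodes, or None if they are not linked."""
    return LINKS.get(frozenset({a, b}))


def controller_of(mgw):
    """Return the server that controls a media gateway over the Mc interface."""
    for pair, name in LINKS.items():
        if name == "Mc" and mgw in pair:
            (server,) = pair - {mgw}
            return server
    return None
```

For example, `interface_between("MGW-1", "MGW-2")` yields `"Nb"`, while `interface_between("MSC-S", "MGW-2")` yields `None` because, under the assumption above, the MSC-S does not control the remote gateway.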
4 Implementation of the Simulator Based on OMNeT++

As already mentioned, the e-Learning package was built on top of the OMNeT++ simulator; the code was written in C++, using different APIs from OMNeT++ that allow integration with the simulation engine. OMNeT++ is an object-oriented, modular, discrete-event network simulator, which can be used for traffic modeling of telecommunication networks, protocol modeling and other network-related simulations [4]. Our development can be considered mainly “modeling”, as we created models for each element of an R4 UMTS network and implemented their behavior from the perspective of call flows. The models have been implemented and tailored to support two kinds of test scenarios: a Mobile-Originated Call (MOC) scenario and a Mobile-Terminated Call (MTC) scenario, the most basic ones, which happen most frequently in a mobile communication network.

Fig. 2 The Network Editor

The simulation/emulation software consists of two parts. The first part is the simulation software, which contains the simulation files of every model (mobile equipment, Node B, RNC, Circuit-Switched Media Gateway, MSC-Server, GMSC-Server and PSTN network) and the network topology description file. The second part is the emulation software, which generates and sends RANAP and H.248 messages to the virtual loop-back network interface. In the OMNeT++ simulator, every model has to be described by at least one class derived from the cSimpleModule class. The behavior of a model is associated with a generic state machine, which describes the states of a specific network node. Every network node has an initialization state (INIT), where the variables are initialized; after this state the node enters the waiting state (IDLE), where it waits for incoming messages. Nodes become active when they receive a message.

[Figure 3: class diagram in which UE, NodeB, RNC, MGW, MSC_S, GMSC_S and PSTN derive from cSimpleModule]
Fig. 3 The class hierarchy of the models

[Figure 4: state diagram with the states INIT, IDLE, MSG RECV, MSG DISCR, MSG SEND and PACK]
Fig. 4 The state machine associated to the network model
Every event that happens in the simulated network must be caused by the sending and receiving of a particular message. In order for the simulation to begin, a node in the initialization state must create and send a particular message to an appropriate node. There are two kinds of messages: messages sent by a node to another node (or by a module to another module), and self-messages, used to implement counters and wake-up impulses. Because every message represents a specific event, messages are placed in a waiting queue, as events have to happen in a specific order. Another state is the state of receiving a message (MSG RECV); the message may come from another node or may be a self-message. After this state, the node enters the message analyzing state, MSG DISCR (message discrimination). The network node may or may not send a response message to the sender; if it decides to send a response, it enters the MSG SEND state. The last state is the PACK state, where the node invokes the emulation software, which creates realistic RANAP and H.248 protocol messages and sends them to a virtual loop-back network adapter provided by a free software package called VirtNet. This interface can be monitored by the Wireshark / Ethereal network analyzer.

RANAP messages are used by the MSC Server to communicate with the Radio Network Controller; these messages are forwarded by the Media Gateway to the RNC because the MSC Server is not directly interconnected with the RNC. RANAP is carried over the SIGTRAN protocol stack; the Media Gateway contains a Signaling Gateway part whose role is to forward SIGTRAN messages. The SIGTRAN (Signaling Transport) protocol stack is an adaptation of the SS7 protocol to the IP protocol, so that SS7 protocol messages can be transmitted through IP networks.
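The node states and the ordered event queue described above can be sketched in a few lines. This is not the OMNeT++ API and not the authors' code: the class and function names are hypothetical, the reply table is a stand-in for the real message discrimination logic, and the sketch assumes that PACK follows MSG SEND whenever a reply is produced.

```python
import heapq
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    IDLE = auto()
    MSG_RECV = auto()
    MSG_DISCR = auto()
    MSG_SEND = auto()
    PACK = auto()


class Node:
    """One network element driven by the generic state machine."""

    def __init__(self, name, replies):
        self.name = name
        # Hypothetical routing table: received message -> (reply, destination).
        self.replies = replies
        self.trace = [State.INIT, State.IDLE]

    def handle(self, message):
        self.trace += [State.MSG_RECV, State.MSG_DISCR]
        reply = self.replies.get(message)
        if reply is not None:
            # Assumption: the hand-over to the packet generator (PACK)
            # happens right after the reply is sent.
            self.trace += [State.MSG_SEND, State.PACK]
        self.trace.append(State.IDLE)
        return reply


def run(nodes, first_message, first_destination):
    """Deliver messages in time order; every delivery is one event."""
    queue = [(0, 0, first_destination, first_message)]
    delivered = []
    sequence = 1  # tie-breaker so equal timestamps keep insertion order
    while queue:
        time, _, destination, message = heapq.heappop(queue)
        delivered.append((time, destination, message))
        reply = nodes[destination].handle(message)
        if reply is not None:
            reply_message, reply_destination = reply
            heapq.heappush(queue, (time + 1, sequence, reply_destination, reply_message))
            sequence += 1
    return delivered


nodes = {
    "UE": Node("UE", {}),
    "MSC-S": Node("MSC-S", {"SETUP": ("CALL PROCEEDING", "UE")}),
}
delivered = run(nodes, "SETUP", "MSC-S")
```

Here the MSC-S passes through MSG RECV, MSG DISCR, MSG SEND and PACK before returning to IDLE, while the UE, which has no reply configured, goes back to IDLE directly after MSG DISCR.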
The H.248 / MEGACO (Media Gateway Control) protocol messages are used for communication (notification and control messages) between the MSC Server, which is the master/controller, and the Media Gateway, which is the slave. The functionality of nodes can be described using SDL diagrams. SDL, the “Specification and Description Language”, is used to describe the behavior of communication systems. SDL is standardized by the ITU in the Z.100 specification. Figure 5 below gives the SDL diagram that describes the functionality of the Radio Network Controller (RNC). The network nodes behave and communicate with each other as described in the 3GPP TS 23.205 specification, which contains the call flows for MOC (Mobile-Originated Call) and MTC (Mobile-Terminated Call) [1]. Figure 6 gives a generic illustration of the simulation environment. When the simulation is started from the OMNeT++ engine, packets start to circulate on the network. If messages appear on the Mc interface, a packet generator is triggered to automatically generate the corresponding real packets and send them to the VirtNet loop-back adapter, while on the other side Wireshark captures these generated packets in real time [6].
[Figure 5: SDL flowchart. Start state: WAIT FOR MESSAGE. Decisions: “Data?”, “to RNC?”, “RAB ASSIGNMENT REQ?”, “Iu Release Command?”, “from NodeB?”, “from MGW?”. Actions: CHECK AND CAST voice, CHECK AND CAST signaling, SEND TO MGW IuUP Data, SEND TO MSC-S RAB ASSIGNMENT RESP, SEND TO MSC-S Iu Release Complete, SEND MSG To MGW, SEND MSG To NodeB, NOTIFY]
Fig. 5 The SDL diagram which describes the functionality of RNC
[Figure 6: block diagram. The OMNeT++ Simulation sends a command to the PACKET GENERATOR; H.248 / RANAP packets pass from the packet generator through the VirtNet virtual loopback adapter to WIRESHARK; all components run on the OPERATING SYSTEM]
Fig. 6 The generic architecture of the e-Learning environment
Figure 7 below shows a snapshot of the running simulation.
Fig. 7 Snapshot of the OMNeT++ simulation
Figure 8 shows how packets are captured and decoded by the Wireshark / Ethereal software.
Fig. 8 Packets captured and decoded by Wireshark / Ethereal
5 Case Study: The MOC Scenario

As can be seen in figures 9a-9d and 10-12 below, which illustrate the call flows and the simulation of a Mobile-Originated Call, the call begins with a SETUP message sent by the mobile user equipment (UE) to the MSC Server. The MSC Server responds with a CALL PROCEEDING message and then sends an IAM (Initial Address Message) to the GMSC Server, which is connected to the Public Switched Telephone Network (PSTN). Through this message, the MSC Server indicates that forward bearer establishment will be used, and it sends the GMSC Server all the information about the bearer characteristics [1, 7, 8]. The MSC Server chooses a Media Gateway (MGW) to establish the bearer for the call; the IAM message also contains the identifier of this Media Gateway. The IAM message is part of the BICC (Bearer Independent Call Control) protocol. The GMSC Server decides whether the call must be forwarded to the PSTN network; it commands the MGW under its control to make an association between the IP network (on the Core Network side) and the PSTN network. To achieve this, the Media Gateway that is connected directly to the PSTN network creates two terminations. In the call flow, these terminations are T3 and T4. First the GMSC Server sends the “ADD.request( $ )” command to request that the Media Gateway create a new context, choose a termination and add it to the newly created context. The $ character is a so-called “wildcard”; in this case $ is the “CHOOSE” wildcard, which tells the Media Gateway to choose a termination. The Media Gateway responds with the “ADD.reply( T4 )” message, which contains the IDs (identifiers) of the context and the termination. The same process is carried out for the T3 termination (ADD.request( $ ) and ADD.reply( T3 )). The T3 termination resides on the Core Network side and T4 on the other side, towards the PSTN network.
A bearer to the PSTN network is created through the T4 termination. The GMSC Server forwards the IAM message to the PSTN network and sends a response back to the MSC Server: an APM (Application Transport Message, which belongs to the Bearer Information Messages) that contains information about the bearer’s characteristics. After the MSC (Mobile Switching Center) Server has received the APM message, it uses the “Establish Bearer Procedure” to request that the Media Gateway under its control create a bearer to the remote Media Gateway, which is directly connected to the PSTN network. Together with this request, the MSC Server sends the information obtained from the APM message received earlier: the “bearer address”, the “binding reference” and the “bearer characteristics”.
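The role of the CHOOSE wildcard in the ADD exchange can be illustrated with a toy gateway model. This is a sketch under stated assumptions rather than an H.248 implementation: the class, the context ID format “C1” and the pool of termination IDs are hypothetical, and the sketch assumes that the second ADD names the context returned by the first one.

```python
CHOOSE = "$"  # H.248 CHOOSE wildcard: the gateway picks the identifier


class MediaGateway:
    def __init__(self, free_terminations):
        # Hypothetical pool of termination IDs, handed out in order.
        self.free_terminations = list(free_terminations)
        self.contexts = {}  # context ID -> list of termination IDs
        self.next_context = 1

    def add(self, context=CHOOSE, termination=CHOOSE):
        """Handle an ADD.request and return the fields of the ADD.reply."""
        if context == CHOOSE:
            context = f"C{self.next_context}"
            self.next_context += 1
        if termination == CHOOSE:
            termination = self.free_terminations.pop(0)
        self.contexts.setdefault(context, []).append(termination)
        return {"context": context, "termination": termination}


remote_mgw = MediaGateway(free_terminations=["T4", "T3"])
first = remote_mgw.add()                           # ADD.request( $ ) answered with T4
second = remote_mgw.add(context=first["context"])  # ADD.request( $ ) answered with T3
```

After both requests the gateway holds one context containing T4 and T3, which corresponds to the association between the PSTN side and the Core Network side described in the text.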
[Figure 9a: sequence diagram with the lanes UE, NodeB, RNC, MSC-S, MGW-1, MGW-2, GMSC-S and PSTN. Labels in order of appearance: NAS: SETUP, CALL PROCEEDING; BICC: IAM (Initial Address Message); H.248: ADD.request( $ ), ADD.reply( T4 ), ADD.request( $ ), ADD.reply( T3 ) (bearer establishment); ISUP: IAM; BICC: APM (Bearer Information Message); H.248: ADD.request( $ ), ADD.reply( T2 ) (Establish Bearer + Change Through-Connection Procedures, bearer establishment); ADD.request( $ ), ADD.reply( T1 ) (Prepare Bearer + Change Through-Connection Procedures)]
Fig. 9a The sequence of call flows in a Mobile Originated Call scenario – part 1
[Figure 9b: sequence diagram with the lanes UE, NodeB, RNC, MSC-S, MGW-1, MGW-2, GMSC-S and PSTN. Labels in order of appearance: RANAP: RAB Assignment Request (bearer establishment); IuUP: IuUP Init, IuUP Init Ack (IuUP Initialization); RANAP: RAB Assignment Response; NbUP: NbUP Init, NbUP Init Ack (NbUP Initialization); BICC: CONTINUITY; ACM (Address Complete Message); NAS: ALERTING; ANM; ISUP]
Fig. 9b The sequence of call flows in a Mobile Originated Call scenario – part 2
[Figure 9c: sequence diagram with the lanes UE, NodeB, RNC, MSC-S, MGW-1, MGW-2, GMSC-S and PSTN. Labels in order of appearance: H.248: MOD.request( T3 ), MOD.reply( T3 ), MOD.request( T4 ), MOD.reply( T4 ); ANM (Answer Message); H.248: MOD.request( T1 ), MOD.reply( T1 ), MOD.request( T2 ), MOD.reply( T2 ) (Change Through-Connection + Activate Inter-Working Function + Activate Voice Processing Function Procedures); NAS: CONNECT, CONNECT ACKNOWLEDGE; Communication; NAS: DISCONNECT; BICC: RELEASE]
Fig. 9c The sequence of call flows in a Mobile Originated Call scenario – part 3
[Figure 9d: sequence diagram with the lanes UE, NodeB, RNC, MSC-S, MGW-1, MGW-2, GMSC-S and PSTN. Labels in order of appearance: ISUP: Release, Release Complete; NAS: RELEASE, RELEASE COMPLETE; RANAP: Iu Release Command, Iu Release Complete; H.248: SUB.request( T3 ), SUB.reply( T3 ), SUB.request( T4 ), SUB.reply( T4 ) (bearer release); H.248: SUB.request( T1 ), SUB.reply( T1 ) (Release Termination Procedure, bearer release); BICC: Release Complete; H.248: MOD.request( T2 ), MOD.reply( T2 ) (Release Bearer + Change Through Connection Procedures); SUB.request( T2 ), SUB.reply( T2 ) (Release Termination Procedure, bearer release)]
Fig. 9d The sequence of call flows in a Mobile Originated Call scenario – part 4
Fig. 10 Mobile Originated Call scenario – simulation setup
Fig. 11 Mobile Originated Call scenario – simulation running
Fig. 12 Mobile Originated Call scenario – packets captured by Wireshark
The establishment of the bearer between the two Media Gateways is done with the “ADD.request( $ )” and “ADD.reply( T2 )” messages. It can be observed that the T2 termination resides on the Core Network side. When the bearer is created, a connection is established between the T2 termination residing on the local Media Gateway and the T3 termination residing on the remote Media Gateway. At this point there is a bearer between the local and the remote Media Gateways, and another bearer between the remote Media Gateway and the PSTN network. Next comes the “Prepare Bearer Procedure”, which creates a bearer to the UMTS Terrestrial Radio Access Network (UTRAN). The MSC Server chooses the characteristics of the bearer and requests that the Media Gateway prepare for the access bearer assignment by using the “Prepare Bearer Procedure”. This procedure is accomplished with the “ADD.request( $ )” and “ADD.reply( T1 )” commands. T1 is the termination that is connected to the Radio Access Network. The MSC Server requests that the Media Gateway send the “bearer address” and “binding reference” information; the MSC Server in turn sends the bearer characteristics and asks the Media Gateway to notify it whether the bearer characteristics can be changed. For voice calls, the MSC Server sends the Media Gateway information for voice encoding. For data calls, it sends information about the “PLMN Bearer Capability”. The Media Gateway creates the T1 termination, adds it to the context and sends a response back to the MSC Server with the IDs of the context and the termination, and the IP address and port number of the termination.
After the Media Gateway has responded with the “bearer address” and “binding reference” information, the MSC Server requests that the RNC (Radio Network Controller) allocate the access bearer by sending it the “RAB Assignment Request” command. This request also contains the “bearer address” and “binding reference” information. The MSC Server is notified by the Media Gateway about the possibility of modifying the bearer’s characteristics at a later phase. This procedure is called the “Bearer Modification Support Procedure”. After this, the user plane is initialized. The user plane is a protocol stack on the Iu and Nb interfaces. The Iu interface resides between the RNC and the Media Gateway, and the Nb interface resides between two Media Gateways. The Nb UP and Iu UP protocols are set to work in the “forward bearer establishment” mode. The Media Gateway knows that “forward bearer establishment” is used because of the information previously sent by the MSC Server in the “Establish Bearer” and “Prepare Bearer” procedures. After the radio access bearer assignment, the MSC Server sends a CONTINUITY message to the GMSC Server to acknowledge the assignment of the radio access bearer. Already at the beginning of call processing, when the IAM message was sent, the MSC Server informed the GMSC Server that it would soon send the CONTINUITY message. This behavior shows that “late access bearer assignment” is not used (“late access bearer assignment” means that the bearer
assignment is done after the “alerting” and “answer” messages have been sent, which is not the case here). The called party sends an ACM (Address Complete Message) to the mobile network, which is forwarded to the local MSC Server; the MSC Server then sends an “ALERTING” message to the calling party (the party that starts the call). In this phase the called party’s phone rings, and the ringing tone is also played to the calling party. When the called party answers the call, an ANM (Answer Message) is sent to the local MSC Server. At the receipt of the ANM message, the terminations are interconnected at both the local and the remote Media Gateways. The interconnection is done with the “Change Through-Connection” procedure; after this phase, data packets can travel through the terminations in both directions. This procedure uses the “MOD.request” and “MOD.reply” commands. The same commands are used to carry out the “Activate Inter-Working Function” and “Activate Voice Processing Function” procedures. The “Inter-Working Function” procedure is used for data calls, and the “Voice Processing Function” procedure for voice calls. The “Voice Processing Function” procedure is performed at the Media Gateway and is used to ensure the acoustic quality of the voice; for data calls this feature is deactivated. The MSC Server sends the calling party a “CONNECT” message to signal the successful connection of the call. The mobile user equipment (UE) responds with a “CONNECT ACKNOWLEDGE” message. From this point on, the conversation between the two parties can start. The call is terminated by the caller in the following way: the mobile user equipment sends a “DISCONNECT” message to the MSC Server, which sends two “RELEASE” messages, one to the user equipment and one to the GMSC Server. The GMSC Server also sends a “RELEASE” message to the PSTN network.
After the resources are released in the PSTN network, the PSTN responds with a “RELEASE COMPLETE” message. The GMSC Server also releases its resources: it either deletes the terminations, or keeps them but adds them to a NULL context, and it deletes the other context. At the remote Media Gateway, resources are released at both terminations, on the PSTN side and on the Core Network side. After that, the GMSC Server sends a “RELEASE COMPLETE” message to the MSC Server. While these operations are executed at the GMSC Server, the MSC Server also frees its allocated resources and commands the RNC to delete the radio access bearer using the “Iu Release” command; the RNC responds with the “Iu Release Complete” message. This procedure is called the “Release Bearer Procedure”. After this event the T1 termination, which was allocated on the UTRAN (UMTS Terrestrial Radio Access Network) side, is deleted. This operation is executed using the “SUB.request( T1 )” command, and the procedure is called the “Release Termination Procedure”. The T2 termination, which was allocated on the Core Network side, is deleted only after the MSC Server has received the “Release Complete” message from the GMSC Server. First the connection between the terminations is removed with the “Change Through-Connection Procedure”, using the “MOD.request( T2 )” command, and
Simulation-Based UMTS e-Learning Software
223
just after the accomplishment of this operation, the T2 termination will be dismissed, with the use of the SUB.request( T2 ) command. This procedure is also called “Release Termination Procedure”. Figure 13 presents a MTC (Mobile Terminated Call) scenario – with the call arriving from the PSTN.
Fig. 13 Mobile Terminated Call scenario
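The ordering constraints of the release procedure described before Figure 13 can be summarized in a short sketch. The function name, the event strings and the action strings are hypothetical labels chosen for illustration; only the message and command names come from the text. The text says the Iu Release runs while the GMSC Server is releasing its own resources, so triggering it directly on DISCONNECT is a simplifying assumption.

```python
def msc_server_release_actions(events):
    """Return the actions of the MSC Server, in order, for a list of events.

    Event and action strings are illustrative labels, not protocol encodings.
    """
    actions = []
    for event in events:
        if event == "DISCONNECT":
            # Two RELEASE messages are sent; starting the bearer release
            # at the same moment is an assumption of this sketch.
            actions += ["RELEASE -> UE", "RELEASE -> GMSC Server", "Iu Release -> RNC"]
        elif event == "Iu Release Complete":
            # Release Termination Procedure for the UTRAN-side termination.
            actions.append("SUB.request(T1)")
        elif event == "RELEASE COMPLETE from GMSC Server":
            # T2 is first disconnected (Change Through-Connection), then removed.
            actions += ["MOD.request(T2)", "SUB.request(T2)"]
    return actions


flow = msc_server_release_actions(
    ["DISCONNECT", "Iu Release Complete", "RELEASE COMPLETE from GMSC Server"]
)
for step in flow:
    print(step)
```

The point of the sketch is the dependency: T1 can be removed as soon as the radio access bearer is gone, whereas T2 waits for the remote side to confirm its release.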
6 Interpreting Trace Files. RANAP and H.248 Messages

The simulation environment generates two types of packets: RANAP and H.248. RANAP handles communication between the MSC Server and the Radio Network Controllers; it is the UMTS signaling protocol between the Core Network and the Radio Access Network. RANAP is carried over the Iu interface, which directly connects the RNCs (Radio Network Controllers) to the CN (Core Network) [8, 12, 13]. It is mainly used for tasks such as relocation, Radio Access Bearer management and paging, and it transports signaling messages between the UE (User Equipment) and the Core Network, which is called non-access stratum signaling. The call setup in a MOC scenario begins with the SETUP message, which is sent by the mobile equipment to the core network. RANAP implements the following functions:

• Relocation: includes functions such as SRNS (Serving Radio Network Subsystem) Relocation and Hard Handover
• RAB (Radio Access Bearer) Management: Radio Access Bearers are handled using operations such as RAB set-up (possibly queuing the set-up), modification of RAB characteristics and clearing of an existing RAB
• Iu Release: releases all resources (control and user plane) of a specific Iu instance related to a certain UE
• Report Unsuccessfully Transmitted Data
• Common ID Management: sends the permanent identification of the UE from the CN to the UTRAN
• Paging: the CN pages an idle UE in order to establish a call with it
• Management of tracing
• UE-CN signaling transfer
• Security Mode Control
• Management of overload
• Reset
• Location Reporting

Fig. 14 Call initiation using SETUP
According to Figure 14, the MOC call begins with a SETUP message, which is answered with a CALL PROCEEDING message. Figure 15 shows the protocol stack of the Iu interface. Knowing this stack is important because it allows Wireshark to dissect the protocols correctly.
Fig. 15 The protocol stack of the Iu interface
Figure 16 shows the SETUP message as captured and dissected by Wireshark. It has almost the same structure as the stack illustrated above; the difference is that it is carried in a plain Ethernet frame rather than over ATM.
Fig. 16 Wireshark dissection of the SETUP message - details on RANAP
In this case the role of RANAP is to carry the signaling message between the UE and the Core Network: as can be seen, it encapsulates the SETUP message sent by the mobile equipment (DTAP – Setup). DTAP (Direct Transfer Application Part) messages transfer call control and mobility management messages to and from the MS.
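The nesting described here can be pictured with a small data model. This is only a sketch: the `Layer` class and its fields are hypothetical, and the transport layers that sit between the Ethernet frame and RANAP in the real capture are deliberately left out because the text does not list them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One protocol layer of a dissected frame (illustrative model only)."""
    name: str
    detail: str = ""
    payload: Optional["Layer"] = None


def innermost(layer):
    """Follow the payload chain down to the last encapsulated layer."""
    while layer.payload is not None:
        layer = layer.payload
    return layer


def path(layer):
    """List the layer names from the outermost to the innermost."""
    names = []
    while layer is not None:
        names.append(layer.name)
        layer = layer.payload
    return " / ".join(names)


# Intermediate transport layers are omitted on purpose (assumption of this sketch).
setup_frame = Layer(
    "Ethernet",
    payload=Layer("RANAP", "DirectTransfer", payload=Layer("DTAP", "Setup")),
)

print(path(setup_frame))
print(innermost(setup_frame).detail)
```

Reading a trace this way makes it clear that RANAP is only the carrier: the call-control content sits in the innermost DTAP layer.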
RANAP is the radio network layer signaling protocol of the Iu interface. It transfers messages between the RNC and the 3G-SGSN, or between the RNC and the 3G MSC, and it provides a signaling channel through which messages are carried transparently between the UE and the Core Network. There are 28 types of RANAP messages; this one is of the type “DirectTransfer”. Direct Transfer is used when a UE-CN signaling message has to be sent from the RNC to the CN without interpretation. The RANAP PDU (protocol data unit) is of the type “initiatingMessage”, which marks it as the message that opens a RANAP procedure. When the MSC Server receives a SETUP message, it replies with a CALL PROCEEDING. SETUP and CALL PROCEEDING are used specifically for call establishment.

The other type of message generated and captured by the simulation environment is H.248, or MEGACO, which the MSC Server uses to control one or more Media Gateways [2]. As can be seen in Figure 17, the new architecture handles call control separately from call transport, which is why the new Mc interface was introduced.
Fig. 17 The advantage of the R4 architecture
The main protocol used on this interface is H.248, which handles the communication (control, notification) between MSC Servers and Media Gateways. RANAP only passes through Mc on its way to the MSC Server: it originates at the Radio Network Controller and terminates at the MSC Server, so it does not belong to the Mc interface itself.

The H.248 architecture contains two specific elements, the termination and the context. These are abstract elements that define the status of connections inside the Media Gateway. A termination is a source or a destination of multimedia traffic, and any termination can sink or generate multiple flows. A termination can refer to a physical resource such as a time-slot of a TDM (Time-Division Multiplex) circuit, in which case it is considered a semi-permanent termination, since it exists as long as that TDM time-slot feeds it with traffic. The other type is the ephemeral termination, which is created by the “add” command [1, 2]. An ephemeral termination can represent multimedia flows such as RTP or AAL2 and can have properties such as an IP address, a port number or, for AAL2, channel IDs. Every termination of a Media Gateway has a specific 32-bit name (ID).

The context defines the connection of multiple terminations: all terminations in a context exchange multimedia traffic with one another. A termination can be connected to another simply by moving it from one context to another. There is also a special context named the “null context”; terminations placed in it are in effect disabled and are not connected to any other termination. Figure 18 shows how two networks can be interconnected using the context and termination model: two SCN (Switched Circuit Network) bearer channels are directly connected to an RTP multimedia flow from an IP network.
Fig. 18 A generic model of the Media Gateway (diagram labels: Context, Null Context, Termination, SCN bearer channel, RTP stream)
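The context and termination model can be made concrete with a toy implementation. This is a minimal sketch under stated assumptions: the class `MediaGatewayModel`, its method names and the string identifiers are all hypothetical, and real H.248 terminations carry far more state (media descriptors, events, signals) than a single context reference.

```python
class MediaGatewayModel:
    """Toy model: each termination belongs to exactly one context."""

    NULL = "null"  # stands for the null context, whose members are disconnected

    def __init__(self):
        self.context_of = {}  # termination ID -> context ID

    def add(self, termination, context):
        """Place a termination into a context (the idea behind 'add')."""
        self.context_of[termination] = context

    def move(self, termination, context):
        """Reconnect an existing termination by moving it to another context."""
        if termination not in self.context_of:
            raise KeyError(termination)
        self.context_of[termination] = context

    def subtract(self, termination, keep=False):
        """Remove a termination, or park it in the null context if keep is set."""
        if keep:
            self.context_of[termination] = self.NULL
        else:
            del self.context_of[termination]

    def peers(self, termination):
        """Return the terminations that share a context with the given one."""
        context = self.context_of[termination]
        if context == self.NULL:
            return set()
        return {
            other
            for other, ctx in self.context_of.items()
            if ctx == context and other != termination
        }


# The situation of Figure 18: two SCN bearer channels and one RTP stream.
mgw = MediaGatewayModel()
for name in ("scn-1", "scn-2", "rtp-1"):
    mgw.add(name, "ctx-1")
print(sorted(mgw.peers("rtp-1")))
```

Keeping a termination in the null context instead of deleting it mirrors the two release options mentioned earlier for the GMSC Server.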
Terminations can generate events, which are detected by Media Gateways and signaled to MSC Servers. An MSC Server can ask a Media Gateway to notify it about certain events by sending a “modify” command. When such an event occurs, the Media Gateway informs the MSC Server using the “notify” command. Figure 19 shows the dissection of an H.248 packet that contains an Add Request command, expressed in the following format:

T 6107b89 { c fffffffe { AddReq { 40000012 } } }

The AddReq message is sent by the MSC Server to the Media Gateway in order to add the termination 40000012 to the context fffffffe. The code 6107b89 is the transaction ID. The context ID is shown in hexadecimal form; the value 0xfffffffe is the reserved identifier that corresponds to the $ symbol, a wildcard meaning “choose any”, so the Media Gateway selects the context. If the * symbol were used instead of $, it would mean “select all”: with * the MSC Server can address all the terminations and contexts available in a Media Gateway.
Fig. 19 Wireshark dissection of an H.248 AddReq message
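A one-line summary in the format quoted above is easy to take apart programmatically. The sketch below is an assumption-laden illustration, not a dissector: `parse_summary` and the dictionary keys are hypothetical names, the regular expression only handles a single command with a single termination, and the mapping of 0xffffffff to `*` is the value H.248.1 reserves for "all" and is not shown in the trace of this chapter.

```python
import re

# Reserved context IDs and their wildcard symbols.
# 0xfffffffe <-> "$" comes from the text; 0xffffffff <-> "*" is taken from
# H.248.1 and does not appear in the quoted trace.
WILDCARD_CONTEXTS = {0xFFFFFFFE: "$", 0xFFFFFFFF: "*"}

# Matches: T <transaction> { c <context> { <Command> { <termination> } } }
SUMMARY = re.compile(
    r"T\s+(?P<tid>[0-9a-fA-F]+)\s*\{\s*"
    r"c\s+(?P<ctx>[0-9a-fA-F]+)\s*\{\s*"
    r"(?P<cmd>\w+)\s*\{\s*(?P<term>[0-9a-fA-F]+)\s*\}\s*\}\s*\}"
)


def parse_summary(line):
    """Split a one-command summary line into its four fields."""
    match = SUMMARY.fullmatch(line.strip())
    if match is None:
        raise ValueError("unsupported summary line: " + line)
    context_value = int(match["ctx"], 16)
    return {
        "transaction_id": match["tid"],
        # Show the wildcard symbol when the context ID is a reserved value.
        "context": WILDCARD_CONTEXTS.get(context_value, match["ctx"]),
        "command": match["cmd"],
        "termination": match["term"],
    }


print(parse_summary("T 6107b89 { c fffffffe { AddReq { 40000012 } } }"))
```

Translating the reserved context value back to its symbol makes a trace listing easier to read than the raw hexadecimal form.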
The following types of messages were used in the MOC scenario simulation: AddReq, AddReply, ModReq, ModReply, SubReq, SubReply.
AddReq: adds a termination to a context (acknowledged by AddReply)
ModReq: modifies the properties of a termination (acknowledged by ModReply)
SubReq: removes a termination from a context (acknowledged by SubReply)
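When reading a trace, a useful first check is that every request has been acknowledged. The sketch below pairs requests and replies by transaction ID, which is how H.248 correlates them; the tuple layout of the trace, the function name and the second transaction ID in the example are hypothetical.

```python
# Reply expected for each request type, as listed above.
REPLY_FOR = {"AddReq": "AddReply", "ModReq": "ModReply", "SubReq": "SubReply"}


def unacknowledged(trace):
    """Return {transaction_id: request} for requests that got no matching reply.

    trace is a list of (transaction_id, command) tuples in capture order;
    this layout is an assumption of the sketch, not a Wireshark export format.
    """
    pending = {}
    for transaction_id, command in trace:
        if command in REPLY_FOR:
            pending[transaction_id] = command
        elif REPLY_FOR.get(pending.get(transaction_id)) == command:
            del pending[transaction_id]
    return pending


trace = [
    ("6107b89", "AddReq"),
    ("6107b89", "AddReply"),
    ("6107b8a", "ModReq"),  # hypothetical transaction left without a reply
]
print(unacknowledged(trace))
```

A reply of the wrong type, or one with an unknown transaction ID, leaves the request in the pending set, which is usually what a student inspecting a trace wants to be told.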
7 Conclusions

The present approach does not cover exhaustively all the message types that can occur in a UMTS network; it offers a basic and consistent pool of messages exchanged between the MSC Server and the CS-Media Gateway. These are the new network elements that distinguish the UMTS R4 network from the traditional GSM/UMTS network, introducing “soft-switching” technology into mobile communications. This message pool is scalable, as the proof of concept was successful for the specifics of our implementation: prerecorded packets are collected and re-introduced in simulations, creating a new kind of realistic e-Learning software package. Such packages could be very useful not only for university students but also for company employees, for both initial training and updating (“delta training”).

The software has already been tested and practically validated in several laboratory works at “Transilvania” University with final-year engineering students studying mobile communications and telecom architectures. The practical work was documented and cataloged as SCOs (Shareable Content Objects) and listed in the Moodle LMS (Learning Management System) of our university for further use by teachers and tutors. The SCORM (SCO Reference Model) compliance of the meta-data attached to these “educational objects” (laboratory simulations-emulations) makes personalization of educational services possible. The semantic nature of these SCOs suits them to “catalogues”, makes them “searchable” and allows them to be “aggregated” into personalized, “tailored” learning programs (“individualized paths”). Tutors can pick and recommend parts of the experiments, and students themselves can choose subsets adapted to “beginner” or “advanced” levels. Furthermore, personalization can depend on various prerequisites (previously graduated modules, quantified in “transferable credits” systems, and/or chosen “vocational” profiles, including fee dependencies) [15, 16].

The intrinsically layered nature of protocol analysis allows these e-Learning scenarios to be adapted to different levels of difficulty. Very popular protocol monitors and network simulators were chosen. The behavioral approach (state machines specific to telecommunication standards) brings an important feature to these educational services: virtualization, which involves semantics towards a “network of information”. Students can extrapolate this perspective to understand the distributed management of future global telecom systems, controllable like “colonies” based on ontologies.
List of Abbreviations

3GPP      Third Generation Partnership Project
AAL2      ATM Adaptation Layer type 2
ANM       Answer Message
APM       Application Transport Message
ATM       Asynchronous Transfer Mode
BICC      Bearer Independent Call Control
CC        Call Control
CS        Circuit Switched
CS-CN     Circuit Switched-Core Network
GERAN     GSM/EDGE Radio Access Network
GGSN      Gateway GPRS Support Node
GMSC-S    Gateway MSC Server
HLR       Home Location Register
IAM       Initial Address Message
M3UA      MTP 3 User Adaptation
MAP       Mobile Application Part
MSC       Mobile Switching Centre
MSC-S     MSC Server
MOC       Mobile Originated Call
MTC       Mobile Terminated Call
MTP       Message Transfer Part
MTP3-B    Message Transfer Part level 3 B
MGW       Media GateWay
MEGACO    Media Gateway Control Protocol
OMNeT++   Objective Modular Network Testbed in C++
OSI       Open Systems Interconnection
NED       Network Description
NAS       Non-Access Stratum
PSTN      Public Switched Telephone Network
RAB       Radio Access Bearer
RAN       Radio Access Network
RANAP     Radio Access Network Application Part
RNC       Radio Network Controller
RTP       Real-time Transport Protocol
SCCP      Signaling Connection Control Part
SCTP      Stream Control Transmission Protocol
SIGTRAN   Signaling Transport
SGSN      Serving GPRS Support Node
TDM       Time Division Multiplexing
UMTS      Universal Mobile Telecommunications System
UTRAN     UMTS Terrestrial Radio Access Network
UE        User Equipment
VLR       Visitor Location Register
References

[1] 3GPP TS 23.205 version 4.11.0 Release 4, Universal Mobile Telecommunications System (UMTS); Bearer-independent circuit-switched core network; Stage 2, http://webapp.etsi.org/exchangefolder/ts_123205v041100p.pdf, http://www.3gpp.org
[2] ITU-T H.248.1, Gateway control protocol: Version 3, http://www.itu.int
[3] Bannister, J., Mather, P., Coope, S.: Convergence Technologies for 3G Networks: IP, UMTS, EGPRS and ATM. John Wiley & Sons, Chichester (2004)
[4] Varga, A.: OMNeT++ Discrete Event Simulation System Version 3.2 User Manual (2005), http://www.omnetpp.org
[5] Lamping, U.: Wireshark Developer’s Guide (2007), http://www.wireshark.org/
[6] Virtual Network Adapter VirtNet1.0, http://www.ntkernel.com/w&p.php?id=32
[7] Korhonen, J.: Introduction to 3G Mobile Communications, 2nd edn. Artech House (2003) ISBN 1-58053-507-0
[8] Kreher, R., Rüdebusch, T.: UMTS Signaling: UMTS Interfaces, Protocols, Message Flows and Procedures Analyzed and Explained. John Wiley & Sons, Chichester (2007)
[9] Znaty, S.: Next Generation Network (NGN) dans les réseaux mobiles (2005), http://www.efort.com
[10] Znaty, S., Dauphin, J.L.: Architecture NGN: Du NGN Téléphonie au NGN Multimédia (2005), http://www.efort.com
[11] Hillebrand, F.: GSM and UMTS, The Creation of Global Mobile Communication. Wiley, Chichester (2001)
[12] Wisely, D., Eardley, P., Burness, L.: IP for 3G-Networking Technologies for Mobile Communications. Wiley, Chichester (2002)
[13] Glisic, S.G.: Advanced Wireless Networks, 4G Technologies. Wiley, Chichester (2006)
[14] Sorensen, B., Ramachandran, S.: Simulation-Based Automated Intelligent Tutoring. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 466–474. Springer, Heidelberg (2007)
[15] Gibson, D.: New directions in e-learning: Personalization, simulation and program assessment. Invited presentation at the International Conference on Innovation in Higher Education, Kiev, Ukraine (2003), http://ali.apple.com/ali_media/Users/1000507/files/others/New_Directions_in_elearning.doc
[16] Rose, A., Eckard, D., Rubloff, G.: An application framework for creating simulation-based learning environments, University of Maryland Dept. of Computer Science Technical Report CS-TR 3907 (1998)
Author Index

Aghasaryan, Armen 23
Alexopoulos, Panos 9
Anagnostopoulos, Christos-Nikolaos 127
Anagnostopoulos, Ioannis 1, 145
Askounis, Dimitris 9
Bielikova, Maria 1
Cserey, Szilárd 205
Doukas, Charalampos 163
Felber, Pascal 73
Giannoukos, Ioannis 109
Iliou, Theodoros 127
Kafentzis, Konstantinos 9
Karpouzis, Kostas 163
Kayafas, Eleftherios 109
Kesorn, Kraisak 49
Kovárová, Alena 93
Kropf, Peter 73
Liang, Zekeng 49
Loumos, Vassili 109
Louta, Malamati 187
Lykourentzou, Ioanna 109
Maglogiannis, Ilias 163
Mignon, Sabrina 23
Mpardis, Giorgos 109
Mylonas, Phivos 1
Naudet, Yannick 23
Nikolidakis, Stefanos 145
Nikolopoulos, Vassilis 109
Poslad, Stefan 49
Sandu, Florin 205
Senot, Christophe 23
Serbu, Sabina 73
Spielvogel, Christian 73
Szalayová, Lucia 93
Toms, Yann 23
Varlamis, Iraklis 187
Vergados, Dimitrios D. 145
Wallace, Manolis 1, 9
Zoumas, Christoforos 9